cobertos/timelinize

Author	SHA1	Message	Date
Matthew Holt	7f3c90b71a	Continue WIP interactive mode	2025-10-24 00:00:43 -06:00
Matthew Holt	03d126ed68	Use PRAGMA optimize instead of ANALYZE This is supposedly a smarter way to do ANALYZE, as it only analyzes what sqlite thinks is needful. Should hopefully address some reports of too-frequent, long-running analyze queries. There was one time I noticed that the pragma didn't improve query plans, until I ran analyze specifically which did improve it, but that was using the old DB connection model where I had a single pool of mixed readers/writers, so maybe it's possible that the new pooling style (separate r/w pools) also addresses that, I dunno.	2025-10-23 11:30:05 -06:00
Matthew Holt	e75213e841	Sanitize super-future timestamps (#145)	2025-10-13 11:03:50 -06:00
Matthew Holt	20f6c4a8f5	Sanitize zero-coordinates (fix #145)	2025-10-13 10:34:00 -06:00
Matthew Holt	0d1d4311ae	Fix unnecessary errors in thumbnail generation	2025-10-10 14:25:52 -06:00
Matthew Holt	55b687a7aa	Make a hot path query more efficient	2025-10-10 07:23:49 -06:00
Matthew Holt	eaff29e1c3	Log data file download duration	2025-10-07 13:36:57 -06:00
Matthew Holt	9fc0c3e5c1	Work around Google Photos bug with missing ext on sidecar video files Also fix motion picture transcoding for data files that don't have an extension, by looking up the media type of the image	2025-10-02 18:16:24 -06:00
Matthew Holt	fb3d529228	Refactor thumbnail DB handle as well Fix error when repo property doesn't exist	2025-09-30 14:04:53 -06:00
Matthew Holt	e9a7c03c53	Fix ExFAT crashes; refactor sql.DB handling The crashes on ExFAT are caused by a bug in the MacOS ExFAT driver. It is unclear whether other OSes are affected too. https://github.com/mattn/go-sqlite3/issues/1355 We now utilize sqlite's concurrency features by creating a write pool (size 1) and a read pool, and can eliminate our own RWMutex, which prevents reads at the same time as writes. Sqlite's WAL mode allows reads concurrent with writes, and our code is much cleaner. Still need to do similar for the thumbnail DB. Also could look into using prepared statements for more efficiency gains.	2025-09-30 12:31:41 -06:00
Matthew Holt	5994da8c75	Run ANALYZE less frequently; use write lock Doubt this will fix the DB corruption errors, but, likely a good change anyway	2025-09-26 14:57:36 -06:00
Matthew Holt	039dfe5ba8	Fix and optimize entity processing; faster imports Some certain rare edge cases were problematic, like when importing a contact list / vcard dataset after importing multiple messaging data sets, and there are entities with multiple phone numbers... That, and a few other things are handled better. The loadEntities query has been cleaned up and corrected. I got rid of autolink stuff with entity_attributes in the DB because it was not useful or really correct either. Added complexity causing bugs. Imports are sometimes about 20-50% faster now.	2025-09-25 22:49:39 -06:00
Matthew Holt	1aed8ca2ca	Fix missing data files in some cases The refactored processor had a bug where small, binary data files like images < 100 KB would be buffered entirely while peeking, and wouldn't end up being saved as a file. Fixed the logic around that and simplified a bit too.	2025-09-22 14:39:02 -06:00
Matthew Holt	16a7d99fda	Actually fix map colors, kind of The rendering seems inconsistent. If I refresh the page or load the results again, it fixes the color mismatches. I can't explain why they vary like this, other than potential mapbox bugs??	2025-09-20 10:08:42 -06:00
Matthew Holt	3c40bbc182	Minor UI fixes	2025-09-19 09:37:00 -06:00
Matthew Holt	aaaed9ab8d	One more fix	2025-09-18 21:20:34 -06:00
Matthew Holt	4bd1ae8856	Optionally generate thumbnails during import This does away with the experimental generation of thumbhashes during import. It's easier to generate the thumbnails and thumbhashes at the same time. Does add a DB lock to phase1, but at this point the DB isn't the bottleneck in that phase.	2025-09-18 17:37:53 -06:00
Matthew Holt	c8c1b65ce2	Try generating thumbhashes during import pipeline Also show loading spinner for videos	2025-09-18 09:07:05 -06:00
Matthew Holt	4a7458d048	facebook: Import photo albums	2025-09-16 16:47:11 -06:00
Matthew Holt	31dd7fd6f5	Try to support multi-archive Facebook exports; fix conversation loading Conversations with more than ~6 participants should now load properly, also faster thanks to a simplified query	2025-09-16 11:26:23 -06:00
Matthew Holt	2b5fd57259	Proper support for mixed timestamps and time zones This will be a long-time WIP, but we now support full timestamps with local time offsets, absolute ones with UTC times only, and wall times only. Several other fixes/enhancements. Making an effort to display time zone in time displays throughout the app. Can now try to infer time zones during import, which is the default setting. This will take a while to fully implement but it's a good start. Just have to be really careful about date crafting/manipulation/parsing.	2025-09-12 11:17:49 -06:00
Matthew Holt	967f3ab28b	Fix panic from EXIF parsing; checkpoint resumption in Google Photos Also show what file path information we do have for some imports that lack filename and preview, on the import job page.	2025-09-05 09:39:13 -06:00
Matthew Holt	b3376b5298	Fix pipeline bugs; rethink embeddings Fixed several bugs introduced by the pipeline refactoring. Updated goexif2 fork to use my latest commit which fixes not being able to find EXIF data on some JPEG images. Embeddings now refer to the item they are for, rather than an item referring to a single embedding. This allows items to have multiple embeddings if necessary, which gives us some flexibility when models change/improve, etc. Also reworked the Python server to use a smaller model (base siglip2 instead of so400m) so that it will fit on more GPUs, including my 4070; as well as a new "DeviceManager" that ChatGPT helped me figure out, to choose GPU when it has enough memory for it, as conditions change.	2025-09-04 21:40:50 -06:00
Matthew Holt	694d2a3959	Fix data file creation bug Oops! This created data files outside the data directory.	2025-09-02 15:06:37 -06:00
Matt Holt	a85f47f1a3	Major processor refactor (#112) * Major processor refactor - New processing pipeline, vastly simplified - Several edge case bug fixes related to Google Photos (but applies generally too) - Major import speed improvements - UI bug fixes - Update dependencies The previous 3-phase pipeline would first check for an existing row in the DB, then decide what to do (insert, update, skip, etc.), then would download data file, then would update the row and apply lots of logic to see if the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that may not need to be downloaded. In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and not having the data in many cases while making decisions regarding the item/row. I was able to get all the tests to pass until the final boss: an edge case bug in Google Photos (but a very important one that happened to be exposed by my wedding album, of all things). Once that bug surfaced, I was unable to fix the problem without a rewrite of the processor. The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3. The new processing pipeline downloads the data up front in phase 1 (and there's a phase 0 that splits out some validation/sanitization logic, but is of no major consequence). This can run concurrently for the whole batch.
Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for existing row, make decisions on what to do, even rename/move the data file if needed, all in one phase, rather than split across 2 separate phases. This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about. On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size. An import of about 200,000 text messages, including media attachments, finished in about 2 minutes. My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB. Also fixed several other minor bugs/edge cases. This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge. * Round out a few corners; fix some bugs * Appease linter * Try to fix linter again * See if this works * Try again * See what actually fixed it * See if allow list is necessary for replace in go.mod * Ok fine just move it into place * Refine retrieval keys a bit * One more test	2025-09-02 11:18:39 -06:00
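
Illustrative sketch of the pipeline shape described in a85f47f1a3 (not the project's actual code): phase 1 fetches data for the whole batch concurrently, then phase 2 takes a single exclusive lock and makes every insert/skip decision with the full item in hand, deduplicating by filename+content. The names `item`, `store`, `download`, and `processBatch` are hypothetical, and an in-memory map with a mutex stands in for the database and its write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// item is a hypothetical stand-in for an imported item.
type item struct {
	filename string
	content  string // filled in by phase 1
}

// store is a hypothetical in-memory stand-in for the database.
type store struct {
	mu   sync.Mutex          // plays the role of the exclusive write lock
	rows map[string]struct{} // keyed by filename+content
}

// download simulates fetching an item's data file.
func download(it *item) { it.content = "data:" + it.filename }

// processBatch runs phase 1 concurrently for the whole batch, then
// phase 2 under one lock. It returns how many items were inserted
// and how many were skipped as duplicates.
func processBatch(s *store, batch []*item) (inserted, skipped int) {
	// Phase 1: download all data up front, one goroutine per item.
	var wg sync.WaitGroup
	for _, it := range batch {
		wg.Add(1)
		go func(it *item) {
			defer wg.Done()
			download(it)
		}(it)
	}
	wg.Wait()

	// Phase 2: with all item information available, check for an
	// existing row and decide what to do, all while holding the lock.
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, it := range batch {
		key := it.filename + "\x00" + it.content
		if _, exists := s.rows[key]; exists {
			skipped++
			continue
		}
		s.rows[key] = struct{}{}
		inserted++
	}
	return inserted, skipped
}

func main() {
	s := &store{rows: make(map[string]struct{})}
	batch := []*item{{filename: "a.jpg"}, {filename: "b.jpg"}, {filename: "a.jpg"}}
	inserted, skipped := processBatch(s, batch)
	fmt.Println(inserted, skipped) // prints "2 1"
}
```

Because the decision logic runs only after the data is present, there is no need to revisit it in a later phase, which is the simplification the commit message describes.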

25 commits