cobertos/timelinize

Author	SHA1	Message	Date
Matthew Holt	e7650c784a	Some minor changes - New config parameter "resume_jobs" which can disable auto-resuming jobs at timeline open. (closes #159) - Renamed "a" to "app" in one method using "Rename symbol" (not "Change all occurrences"), which surprisingly updated the identifier in ALL methods. That must be new. Anyway, that's the huge diff. - Minor fix to metadata merge that does a more proper nil check to avoid a panic. - Changed some omitempty to omitzero	2025-10-22 15:13:32 -06:00
Matthew Holt	3bb6af98c1	Ensure meta map is not nil (fix #160 )	2025-10-21 11:38:15 -06:00
Matthew Holt	6fa22bed8b	Support rendering of event class items	2025-10-07 09:53:31 -06:00
Matthew Holt	e9a7c03c53	Fix ExFAT crashes; refactor sql.DB handling The crashes on ExFAT are caused by a bug in the MacOS ExFAT driver. It is unclear whether other OSes are affected too. https://github.com/mattn/go-sqlite3/issues/1355 We now utilize sqlite's concurrency features by creating a write pool (size 1) and a read pool, and can eliminate our own RWMutex, which prevents reads at the same time as writes. Sqlite's WAL mode allows reads concurrent with writes, and our code is much cleaner. Still need to do similar for the thumbnail DB. Also could look into using prepared statements for more efficiency gains.	2025-09-30 12:31:41 -06:00
Matthew Holt	4bd1ae8856	Optionally generate thumbnails during import This does away with the experimental generation of thumbhashes during import. It's easier to generate the thumbnails and thumbhashes at the same time. Does add a DB lock to phase1, but at this point the DB isn't the bottleneck in that phase.	2025-09-18 17:37:53 -06:00
Matthew Holt	c8c1b65ce2	Try generating thumbhashes during import pipeline Also show loading spinner for videos	2025-09-18 09:07:05 -06:00
Matthew Holt	faac1fcb60	facebook: Import check-ins and tagged places; fix post attachments We now import check-ins and tagged places, so, mainly location-related data. Some of the concepts still need a good mapping to the project, e.g. whether to put the coordinates as the item (i.e. "I was here") or with the entity (i.e. "this place is here"). The code that imported post attachments is some of the oldest code in the base, about 10 years ago. Back then, an item could have both text and file contents, and we just combined all the various attachment data for an attachment into one related item. Now, we properly treat them as separate. Still need to add posts in groups and FB stories.	2025-09-16 23:26:11 -06:00
Matthew Holt	2b5fd57259	Proper support for mixed timestamps and time zones This will be a long-time WIP, but we now support full timestamps with local time offsets, absolute ones with UTC times only, and wall times only. Several other fixes/enhancements. Making an effort to display time zone in time displays throughout the app. Can now try to infer time zones during import, which is the default setting. This will take a while to fully implement but it's a good start. Just have to be really careful about date crafting/manipulation/parsing.	2025-09-12 11:17:49 -06:00
Matthew Holt	a0e7c0eefd	Include time_offset when updating timestamp	2025-09-04 21:55:22 -06:00
Matthew Holt	b3376b5298	Fix pipeline bugs; rethink embeddings Fixed several bugs introduced by the pipeline refactoring. Updated goexif2 fork to use my latest commit which fixes not being able to find EXIF data on some JPEG images. Embeddings now refer to the item they are for, rather than an item referring to a single embedding. This allows items to have multiple embeddings if necessary, which gives us some flexibility when models change/improve, etc. Also reworked the Python server to use a smaller model (base siglip2 instead of so400m) so that it will fit on more GPUs, including my 4070; as well as a new "DeviceManager" that ChatGPT helped me figure out, to choose GPU when it has enough memory for it, as conditions change.	2025-09-04 21:40:50 -06:00
Matt Holt	a85f47f1a3	Major processor refactor (#112 ) * Major processor refactor - New processing pipeline, vastly simplified - Several edge case bug fixes related to Google Photos (but applies generally too) - Major import speed improvements - UI bug fixes - Update dependencies The previous 3-phase pipeline would first check for an existing row in the DB, then decide what to do (insert, update, skip, etc.), then would download data file, then would update the row and apply lots of logic to see if the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that may not need to be downloaded. In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and not having the data in many cases while making decisions regarding the item/row. I was able to get all the tests to pass until the final boss, an edge case bug in Google Photos -- but a very important one that happened to be exposed by my wedding album, of all things -- exhibited, I was unable to fix the problem without a rewrite of the processor. The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3. The new processing pipeline downloads the data up front in phase 1 (and there's a phase 0 that splits out some validation/sanitization logic, but is of no major consequence). This can run concurrently for the whole batch. Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for existing row, make decisions on what to do, even rename/move the data file if needed, all in one phase, rather than split across 2 separate phases. This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about. On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size. An import of about 200,000 text messages, including media attachments, finished in about 2 minutes. My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB. Also fixed several other minor bugs/edge cases. This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge. * Round out a few corners; fix some bugs * Appease linter * Try to fix linter again * See if this works * Try again * See what actually fixed it * See if allow list is necessary for replace in go.mod * Ok fine just move it into place * Refine retrieval keys a bit * One more test	2025-09-02 11:18:39 -06:00
Matthew Holt	8388bb78b4	Fix typo	2025-06-20 08:01:37 -06:00
Matthew Holt	fa9ad482b3	Place entities from GPX sources; several other improvements/fixes Location processing is still being revised (WIP).	2025-06-09 17:18:44 -06:00
Matthew Holt	ebc731d221	Vastly speed up imports ?? (WIP)	2025-05-30 11:14:09 -06:00
Matthew Holt	31f003b3d4	Fix metadata updates for items and relationships Also relocate data files if the item's timestamp changes	2025-05-28 18:09:46 -06:00
Matthew Holt	1bd7c2a5c8	Fix several bugs related to duplicates, lat/lon tolerances, etc. Separate altitude out from latlon in unique constraints	2025-05-25 12:36:03 -06:00
Matthew Holt	d268486f55	Several import fixes; metadata merging - Quick unit tests for a function related to Google Takeout archives - We now combine existing metadata with new according to the update policy, instead of either writing all or none of incoming metadata. This merging happens before the DB update query and is a bit of a special case as the policy is applied per-key. - Special handling for corrupted timestamp in Google Photos data. This is a singular case I haven't observed more of, but seems like a reasonable heuristic. There might be thousands more out there, who knows. - Fix job creation time (milliseconds) - Hopefully make repeated imports faster by skipping duplicate items more intelligently based on update policies.	2025-05-19 12:47:18 -06:00
Matthew Holt	0d26c6eb31	Fix several bugs - Obfuscation mode enabled would set a fake phone number in smsbackuprestore's DS options, which led to bad data. Now, the UI does not auto-fill that value. But that means we need... - SMS Backup & Restore: Phone number can now be inferred from repo owner in the backend, if ds opt phone number is empty. This works even with obfuscation enabled. - Aborting a scheduled job before it starts now stays aborted. (Unless you manually restart it.) - Added a data validation error modal for DS options on the import page. For now, if smsbackuprestore has no phone number set, and the timeline repo owner doesn't have a phone number, an error will be shown.	2025-05-15 16:53:35 -06:00
Matthew Holt	ae3a5d02b0	Field update preferences allow more control over item updates	2025-05-09 10:04:03 -06:00
Matthew Holt	ffc8ad6f51	applephotos: Preserve a lot more metadata about people in photos Also infer owner entity from DB if necessary, very cool! Also fix a couple minor bugs	2025-05-05 14:48:54 -06:00
Matthew Holt	a62f4aa05a	applephotos: Initial commit of Apple Photos data source Still a WIP, but mostly there!	2025-05-05 12:07:13 -06:00
Matthew Holt	ba4635cf7e	Fix data file handling It wasn't updated properly with the big pipeline refactor	2025-05-04 13:28:20 -06:00
Matthew Holt	3d2222fce2	Fix thumbnail job size count and paging; other minor fixes Including one fix for a panic introduced by obfuscated logging during processing	2025-05-01 11:15:13 -06:00
Matthew Holt	f0697d2d6b	Refactor embedding jobs; enhance tooltips; upgrade gofakeit to v7 The gofakeit upgrade uses the new math/rand/v2 package, which uses uint64 more than int64, so we had to change a bunch of row IDs from int64 to uint64.	2025-04-24 16:33:41 -06:00
Sergio Rubio	612cae9c03	Firefox data source (#55 ) * [WiP] Firefox data source Work in progress. Implements a new Firefox datasource capable of reading its places.sqlite database to import the browser history (page visits). The implementation currently has a number of issues: * Firefox (and Firefox based) browser keeps an exclusive lock on the places.sqlite database, and we can't dump or backup it while the browser is open, at least on Linux. To work around that, we copy the database to a temporary directory and import from it. This generally works, but isn't safe, as there's a risk of database corruption when doing the hot copy. Potential alternatives: * Ask the user to close the browser while the import happens, which isn't convenient/possible if this is happening regularly in the background. * Ignore and retry, as it'll eventually succeed, in the rare case the temporary db copy is corrupted and unreadable * Something else, no expert here. * You need to point Timelinize to the places.sqlite file directly. Pointing it to the Firefox profile directory doesn't seem to work, as it doesn't seem to scan recursively or list all the directory files and pass them to Recognize. I'm probably missing something obvious here. * Missing tests (will be added) * Linter fixes * Adapt it to the new API * Send the full path to process * Simplify import process * Add datasource description * Use the URL as the item content * Add basic tests * Give the test some more time * Do not return an error if context was cancelled	2025-04-08 07:02:20 -06:00
Matthew Holt	932831db47	Refactor data sources to make them dynamic Also change the checkbox dropdown to a more interactive tomselect (type-to-search dropdown with chips) with pictures. This makes it so data sources can be added to a timeline dynamically. In the future, data sources can be implemented externally and push data to the timeline, so these need to not be rigidly hard-coded into the app and assumed to never change. This essentially adds all their info (name, title, description, image, etc) into each timeline DB.	2025-02-11 16:49:20 -07:00
Matthew Holt	d18c4cd2c8	googlelocation: Fix on-device Android format handling	2025-02-05 13:52:20 -07:00
Matt Holt	628ecc1cb3	ci: Update workflows; restore functioning CI jobs (#64 ) * ci: Attempt to fix broken CI It broke out of the blue several months ago. I think ubuntu-latest updated, but there's no PPA for libheif in that distro I guess * Try tests next * More fixing * Try again * Yada yada * Woops * I don't really know what I'm doing	2025-01-27 22:30:54 -07:00
Matthew Holt	c3dc7728a1	Add basic calendar (.ics) data source	2025-01-26 14:39:02 -07:00
Matthew Holt	29e2bc8fef	Fix iphone/imessage: Update attribute_id in DB if inserting item piecewise iMessage db may send a reaction graph for a message before sending the message itself to the pipeline, thus an empty item with only an original ID gets inserted, and later the full message item comes in, but I had neglected to add attribute_id to updateOverrides.	2025-01-09 18:09:58 -07:00
Matthew Holt	bb9151628f	WIP: new entity page; delete almost all DB indexes Imports are now 4-5x faster and queries are still just about as fast. New indexes should only be created after proving their usefulness.	2025-01-07 13:42:05 -07:00
Matthew Holt	ce297389b0	Thumbnail job streaming; WIP: interactive imports	2024-12-19 06:51:06 -07:00
Matthew Holt	294e2a72a9	Reconnect after disconnection; improve checkpointing	2024-12-17 14:27:50 -07:00
Matthew Holt	d7b1d73796	Auto-resume jobs on start; improve checkpoint performance	2024-12-16 16:23:26 -07:00
Matthew Holt	e319e1f60d	Checkpoints for Google Location; minor jobs fix	2024-12-16 10:57:34 -07:00
Matthew Holt	22628833a7	Refactor obfuscation mode and some processing logic	2024-12-13 07:19:27 -07:00
Matthew Holt	786f516696	Refine import stream	2024-12-12 10:18:28 -07:00
Matthew Holt	742a543977	WIP live stream of imported items	2024-12-12 08:26:37 -07:00
Matt Holt	746e5d6b5c	Refactored import flow, new import UI, thumbnails stored in timeline, etc. (close #3 ) (#43 ) * Schema revisions for new import flow and thumbnails * WIP settings * WIP quick schema fix * gallery: Image search using ML embeddings Still very rough around the edges, but basically works. 'uv' gets auto-installed, but currently requires restarting Timelinize before it can be used. Lots of tunings and optimizations are needed. There is much room for improvement. Still migrating from imports -> jobs, so that part of the code and schema is still a mess. * Implement search for similar items * Finish import/planning rewrite; it compiles and tests pass * Fix some bugs, probably introduce other bugs * WIP new import planning page * Fix Google Photos and Twitter recognition * Finish most of import page UI; start button still WIP * WIP: Start Import button * Fixes to jobs, thumbnail job, import job, etc. * Implement proper checkpointing support; jobs fixes	2024-12-06 11:03:29 -07:00

39 commits