cobertos/timelinize

Author	SHA1	Message	Date
Matthew Holt	faac1fcb60	facebook: Import check-ins and tagged places; fix post attachments We now import check-ins and tagged places, so, mainly location-related data. Some of the concepts still need a good mapping to the project, e.g. whether to put the coordinates as the item (i.e. "I was here") or with the entity (i.e. "this place is here"). The code that imported post attachments is some of the oldest code in the base, about 10 years ago. Back then, an item could have both text and file contents, and we just combined all the various attachment data for an attachment into one related item. Now, we properly treat them as separate. Still need to add posts in groups and FB stories.	2025-09-16 23:26:11 -06:00
Matthew Holt	4a7458d048	facebook: Import photo albums	2025-09-16 16:47:11 -06:00
Matthew Holt	31dd7fd6f5	Try to support multi-archive Facebook exports; fix conversation loading Conversations with more than ~6 participants should now load properly, also faster thanks to a simplified query	2025-09-16 11:26:23 -06:00
Matthew Holt	2b5fd57259	Proper support for mixed timestamps and time zones This will be a long-time WIP, but we now support full timestamps with local time offsets, absolute ones with UTC times only, and wall times only. Several other fixes/enhancements. Making an effort to display time zone in time displays throughout the app. Can now try to infer time zones during import, which is the default setting. This will take a while to fully implement but it's a good start. Just have to be really careful about date crafting/manipulation/parsing.	2025-09-12 11:17:49 -06:00
Matthew Holt	c9db392d20	Implement timeline settings stored in DB; toggle semantic features I don't love that the type has to be stored in the table... it would be great if we could infer it, but I don't know how that would work for strings that look like another type.	2025-09-05 16:27:17 -06:00
Matthew Holt	967f3ab28b	Fix panic from EXIF parsing; checkpoint resumption in Google Photos Also show what file path information we do have for some imports that lack filename and preview, on the import job page.	2025-09-05 09:39:13 -06:00
Matthew Holt	a0e7c0eefd	Include time_offset when updating timestamp	2025-09-04 21:55:22 -06:00
Matthew Holt	b3376b5298	Fix pipeline bugs; rethink embeddings Fixed several bugs introduced by the pipeline refactoring. Updated goexif2 fork to use my latest commit which fixes not being able to find EXIF data on some JPEG images. Embeddings now refer to the item they are for, rather than an item referring to a single embedding. This allows items to have multiple embeddings if necessary, which gives us some flexibility when models change/improve, etc. Also reworked the Python server to use a smaller model (base siglip2 instead of so400m) so that it will fit on more GPUs, including my 4070; as well as a new "DeviceManager" that ChatGPT helped me figure out, to choose GPU when it has enough memory for it, as conditions change.	2025-09-04 21:40:50 -06:00
Matthew Holt	694d2a3959	Fix data file creation bug Oops! This created data files outside the data directory.	2025-09-02 15:06:37 -06:00
Matthew Holt	55053f8096	Fixes from job logging refactoring	2025-09-02 14:14:32 -06:00
Matt Holt	a85f47f1a3	Major processor refactor (#112) * Major processor refactor - New processing pipeline, vastly simplified - Several edge case bug fixes related to Google Photos (but applies generally too) - Major import speed improvements - UI bug fixes - Update dependencies The previous 3-phase pipeline would first check for an existing row in the DB, then decide what to do (insert, update, skip, etc.), then would download the data file, then would update the row and apply lots of logic to see if the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that may not need to be downloaded. In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and not having the data in many cases while making decisions regarding the item/row. I was able to get all the tests to pass until the final boss: an edge case bug in Google Photos -- but a very important one that happened to be exposed by my wedding album, of all things -- which I was unable to fix without a rewrite of the processor. The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3. The new processing pipeline downloads the data up front in phase 1 (and there's a phase 0 that splits out some validation/sanitization logic, but is of no major consequence). This can run concurrently for the whole batch. Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for existing row, make decisions on what to do, even rename/move the data file if needed, all in one phase, rather than split across 2 separate phases. This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about. On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size. An import of about 200,000 text messages, including media attachments, finished in about 2 minutes. My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB. Also fixed several other minor bugs/edge cases. This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge. * Round out a few corners; fix some bugs * Appease linter * Try to fix linter again * See if this works * Try again * See what actually fixed it * See if allow list is necessary for replace in go.mod * Ok fine just move it into place * Refine retrieval keys a bit * One more test	2025-09-02 11:18:39 -06:00
Matthew Holt	1f73da0527	Fix lint errors	2025-08-21 15:39:36 -06:00
Matthew Holt	3b670ff3f7	Allow opening timeline from parent folder This is useful if a My Timeline subfolder is (sort-of) implicitly created for the user, and the user doesn't realize that is where their timeline is. They should be able to select the same folder to open the timeline as they did to create it.	2025-07-16 22:11:47 -06:00
Matthew Holt	a52fb35c4d	Data sources can honor job pauses; minor improvements to some errors, logs	2025-07-15 15:58:02 -06:00
Matthew Holt	b365dbbafc	Fix panics with obfuscation	2025-07-09 13:30:50 -06:00
Matthew Holt	1d59104ab7	Try using 8-bit color depth on Windows Otherwise every encoded AVIF image gets a "10-bit colour depth not supported" on Windows.	2025-07-03 13:34:39 -06:00
Matthew Holt	d0f929cdc3	Like whack-a-mole with the linter	2025-07-01 15:46:32 -06:00
Matthew Holt	336ff7fae0	Fix new lint warnings Must have been a change in golangci-lint	2025-07-01 15:41:07 -06:00
Matthew Holt	8388bb78b4	Fix typo	2025-06-20 08:01:37 -06:00
Matthew Holt	230fcb8583	Avoid inserting/updating with empty (not null) metadata	2025-06-19 09:10:18 -06:00
Matthew Holt	9d75c2895f	Optimize entity loading in hot path of import job Verified with EXPLAIN QUERY PLAN	2025-06-18 19:09:48 -06:00
Matthew Holt	bf7f0cdf3c	Upgrade python deps; get locale on Windows	2025-06-18 10:33:30 -06:00
Matthew Holt	056f813889	gpx: Mark place entity points as significant Also still allow clustering significant points, since we do preserve them, the data source can just call ClusterPoints() to get it back...	2025-06-17 21:37:38 -06:00
Matt Holt	def05a6cfa	Revise location processing and improve place entities (#101) * Revise location processing and place entities - New, more dynamic, recursive clustering algorithm - Place entities are globally unique by name - Higher spatial tolerance for coordinate attributes if entity name is the same (i.e. don't insert new attribute row for coordinate if it's sort of close to another row for that attribute -- but if name is different, then points have to be closer to not insert new attribute row) There is still a bug where clustering is too aggressive on some data. Looking into it... * Fix overly aggressive clustering (...lots of commits that fixed the CI environment which changed things without warning...)	2025-06-17 16:13:44 -06:00
Matthew Holt	4bec2e0b86	Fix lint, tweak email recognition a bit more	2025-06-10 11:19:36 -06:00
Matthew Holt	fa9ad482b3	Place entities from GPX sources; several other improvements/fixes Location processing is still being revised (WIP).	2025-06-09 17:18:44 -06:00
JP Hastings-Edrei	27a2f462cf	lint: bump golangci-lint version (#92) * lint: bump golangci-lint version - Bumps the version of golangci-lint that's used in the GitHub Action to be the most recent version (as installed with e.g. `brew install golangci-lint`, v2.1.6) - Migrates the `.golangci.toml` file, and manually moves the comments over - `errchkjson` appears to work now, so added that back into the linter (the `forbidigo` and `goheader` linters I've left commented out) * lint: remove checkers we don't like Removes two static checkers that cause code changes we don't like. * lint: remove old lint declaration apparently `gosimple` isn't available any more, so I've removed its `nolint` declaration here. * lint: swap location of `nolint:goconst` This _seems_ to be an unstable declaration, because of the parallel & nondeterministic nature of the linter. If this keeps causing trouble we can either remove the goconst linter, or change _both_ of these lines to hold `//nolint:goconst,nolintlint`.	2025-06-02 15:03:19 -06:00
Matthew Holt	41ff81ceb6	Minor enhancements, fix howStored for items deduped by data file at end of pipeline	2025-05-30 16:20:26 -06:00
Matthew Holt	0c2b069e39	Bit of cleanup/comment enhancing	2025-05-30 11:42:18 -06:00
Matthew Holt	d4b71a35eb	Forgot to add IF NOT EXISTS to new indexes	2025-05-30 11:16:20 -06:00
Matthew Holt	ebc731d221	Vastly speed up imports ?? (WIP)	2025-05-30 11:14:09 -06:00
Matthew Holt	31f003b3d4	Fix metadata updates for items and relationships Also relocate data files if the item's timestamp changes	2025-05-28 18:09:46 -06:00
Matthew Holt	863d0e978b	Detect and handle corrupt timestamps a little better	2025-05-27 11:24:08 -06:00
Matthew Holt	39afe39a27	Wow data out there be realllly bad	2025-05-25 12:51:21 -06:00
Matthew Holt	1bd7c2a5c8	Fix several bugs related to duplicates, lat/lon tolerances, etc. Separate altitude out from latlon in unique constraints	2025-05-25 12:36:03 -06:00
Matthew Holt	2b586c56da	Treat lower precision input as unknown for coordinate uncertainty Rather than treating them as significant 0s	2025-05-23 13:51:52 -06:00
Matthew Holt	9dd00b724c	Use limited decimal precision for decision to reprocess coordinates Coordinates are arbitrary precision floats, so it is silly to compare, say, 35.320366666667 against 35.320367 and have them not be equal. I have yet to test this, but it should speed up importing duplicate location points since it will skip coordinates that are within about 1 meter of each other.	2025-05-21 15:45:16 -06:00
Matthew Holt	65268b5af9	Fix import job resumption	2025-05-21 12:22:43 -06:00
Matthew Holt	d0d76473fa	Relate sidecar motion pics from Google Photos; fix related entity display on item page - Somehow I totally forgot to relate sidecar motion photos in Google Photos. (They don't use sidecars on Google phones.) - Item page now displays entities in the picture even without face coordinates	2025-05-20 11:35:15 -06:00
Matthew Holt	d268486f55	Several import fixes; metadata merging - Quick unit tests for a function related to Google Takeout archives - We now combine existing metadata with new according to the update policy, instead of either writing all or none of incoming metadata. This merging happens before the DB update query and is a bit of a special case as the policy is applied per-key. - Special handling for corrupted timestamp in Google Photos data. This is a singular case I haven't observed more of, but seems like a reasonable heuristic. There might be thousands more out there, who knows. - Fix job creation time (milliseconds) - Hopefully make repeated imports faster by skipping duplicate items more intelligently based on update policies.	2025-05-19 12:47:18 -06:00
Matthew Holt	571d469a96	Fix some dashboard charts with future data Also improve bubble size scaling on the bubble plot	2025-05-17 17:18:41 -06:00
Matthew Holt	952057f1b5	Update profile picture in corner when entity 1 gets new picture Including when entity 1 is merged into	2025-05-17 16:58:03 -06:00
Matthew Holt	9caa54dce9	Ability to filter by existence of relation (or lack thereof) This is useful on the gallery page where we do NOT want to show motion pictures. We will also need to block motion pictures from being displayed as separate items on other UI views when they do show non-root items.	2025-05-16 17:35:54 -06:00
Matthew Holt	3e311d99c3	Sort data sources in import planner; rename some DS The sorting can help imports go faster if we put DB-heavy sources first, when the database is still small. The data source names were also standardized to use snake_case like most other word-IDs in the app.	2025-05-16 11:10:23 -06:00
Matthew Holt	0d26c6eb31	Fix several bugs - Obfuscation mode enabled would set a fake phone number in smsbackuprestore's DS options, which led to bad data. Now, the UI does not auto-fill that value. But that means we need... - SMS Backup & Restore: Phone number can now be inferred from repo owner in the backend, if ds opt phone number is empty. This works even with obfuscation enabled. - Aborting a scheduled job before it starts now stays aborted. (Unless you manually restart it.) - Added a data validation error modal for DS options on the import page. For now, if smsbackuprestore has no phone number set, and the timeline repo owner doesn't have a phone number, an error will be shown.	2025-05-15 16:53:35 -06:00
Matthew Holt	4838dbd7d3	Huh, gofmt failed me	2025-05-15 13:59:20 -06:00
Matthew Holt	812cfad74d	Modernize a few lines of code	2025-05-15 13:56:30 -06:00
Matthew Holt	cedfb68fe9	Allow resuming failed jobs	2025-05-15 07:53:15 -06:00
Matthew Holt	ac794cb5f3	Fix unnecessary item updating Ignore empty/zero-value metadata keys, and consider time+zone separately since they are stored separately in the DB.	2025-05-15 06:35:56 -06:00
Matthew Holt	3960415d97	Exclude null dates in chart queries Fixes surprise nulls that cause errors	2025-05-14 14:57:37 -06:00
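
Note on commit 9dd00b724c: the message describes comparing coordinates at limited decimal precision so that values such as 35.320366666667 and 35.320367 count as equal. A minimal Go sketch of that idea follows. It is not the project's code: the names `quantize`, `sameCoord`, and `coordDecimals` are hypothetical, and 5 decimal places (roughly 1.1 m of latitude) is an assumed choice made only to match the "about 1 meter" figure in the message.

```go
package main

import (
	"fmt"
	"math"
)

// coordDecimals is an assumed setting, not taken from the repository:
// 5 decimal places of a degree is roughly 1.1 m of latitude.
const coordDecimals = 5

// quantize scales a coordinate (in degrees) by 10^decimals and rounds it
// to the nearest integer, so comparisons happen on whole numbers.
func quantize(deg float64, decimals int) int64 {
	return int64(math.Round(deg * math.Pow10(decimals)))
}

// sameCoord reports whether two coordinates are equal at the limited precision.
func sameCoord(a, b float64) bool {
	return quantize(a, coordDecimals) == quantize(b, coordDecimals)
}

func main() {
	// The two values quoted in the commit message.
	fmt.Println(sameCoord(35.320366666667, 35.320367)) // true
	// A point roughly 15 m further north.
	fmt.Println(sameCoord(35.320366666667, 35.320500)) // false
}
```

Rounding puts coordinates into fixed buckets, so two points that sit very close together but on opposite sides of a bucket boundary still compare as different. A distance-based tolerance would avoid that at the cost of a more expensive comparison.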

202 commits