Commit graph

26 commits

Author SHA1 Message Date
Matthew Holt
c8cfe5001f
Fix more lint errors 2026-01-16 23:32:10 -07:00
Dominik Roszkowski
d0cd6f3ce9
googlelocation: add Polish translation for 'Timeline' when recognizing on-device export (#143)
Small change to handle Google Location export when using the Polish language
2025-10-13 10:22:27 -06:00
Matthew Holt
6354e9740f
googlelocation: Improve locale support for on-device export names
This is far from a complete list, but for now let's try to get most of the user base covered and see how that goes.

Close #111
2025-10-09 17:06:40 -06:00
Matthew Holt
e44daa85df
Refactor location processing options
Add clustering strength parameter
2025-09-20 22:18:40 -06:00
Matt Holt
a85f47f1a3
Major processor refactor (#112)
* Major processor refactor

- New processing pipeline, vastly simplified
- Several edge case bug fixes related to Google Photos (but applies generally too)
- Major import speed improvements
- UI bug fixes
- Update dependencies

The previous 3-phase pipeline would first check for an existing row in the DB, then decide what to do (insert, update, skip, etc.), then download the data file, then update the row and apply lots of logic to see if the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that may not need to be downloaded.

In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and to making decisions about the item/row without having the data. I was able to get all the tests to pass until the final boss: an edge case bug in Google Photos, but a very important one that happened to be exposed by my wedding album, of all things. I was unable to fix that problem without a rewrite of the processor.

The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3.
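
A minimal sketch of detecting duplicates by filename+content, as described above. The `dedupKey` helper, the SHA-256 hash, and the sample album and file names are assumptions for illustration, not the project's actual code:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupKey is a hypothetical helper: it combines the filename (known only
// from the metadata) with a hash of the content (known only after download).
func dedupKey(filename string, content []byte) string {
	sum := sha256.Sum256(content)
	return filename + ":" + hex.EncodeToString(sum[:])
}

func main() {
	seen := make(map[string]bool)
	// Illustrative files only; names and contents are made up.
	files := []struct{ album, name, content string }{
		{"Wedding", "IMG_0001.jpg", "pixels"},
		{"Favorites", "IMG_0001.jpg", "pixels"},      // same media in another album
		{"Trip", "IMG_0001.jpg", "different pixels"}, // same name, different content
	}
	for _, f := range files {
		key := dedupKey(f.name, []byte(f.content))
		fmt.Printf("%s/%s duplicate=%v\n", f.album, f.name, seen[key])
		seen[key] = true
	}
}
```

The key can only be computed once both the metadata and the data file are in hand, which is why the decision cannot be made before the download.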

The new processing pipeline downloads the data up front in phase 1 (and there's a phase 0 that splits out some validation/sanitization logic, but is of no major consequence). This can run concurrently for the whole batch. Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for an existing row, make decisions on what to do, even rename/move the data file if needed, all in one phase, rather than split across 2 separate phases.
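
The shape of that pipeline can be sketched roughly as follows. This is a simplified illustration, not the actual processor: the `item` and `store` types, the `download` placeholder, and the in-memory map standing in for the DB are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// item is a hypothetical stand-in for one imported item.
type item struct {
	name string
	data []byte
}

// download is a hypothetical placeholder for fetching an item's data file.
func download(name string) []byte {
	return []byte("contents of " + name)
}

// store is a hypothetical stand-in for the DB, guarded by one write lock.
type store struct {
	mu   sync.Mutex
	rows map[string][]byte
}

// processBatch sketches the two main phases described above.
func processBatch(s *store, names []string) (inserted, skipped int) {
	// Phase 1: download the data for the whole batch concurrently.
	items := make([]item, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			items[i] = item{name: name, data: download(name)}
		}(i, name)
	}
	wg.Wait()

	// Phase 2: take the exclusive write lock, then check for existing rows
	// and decide what to do, with all item information already available.
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, it := range items {
		if _, exists := s.rows[it.name]; exists {
			skipped++
			continue
		}
		s.rows[it.name] = it.data
		inserted++
	}
	return inserted, skipped
}

func main() {
	s := &store{rows: make(map[string][]byte)}
	ins, skip := processBatch(s, []string{"a.jpg", "b.jpg", "a.jpg"})
	fmt.Println(ins, skip) // prints "2 1"
}
```

Each goroutine writes only to its own slice index, so phase 1 needs no locking; the single lock in phase 2 keeps the check-and-insert decision atomic.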

This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about.

On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size.

An import of about 200,000 text messages, including media attachments, finished in about 2 minutes.

My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB.

Also fixed several other minor bugs/edge cases.

This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge.

* Round out a few corners; fix some bugs

* Appease linter

* Try to fix linter again

* See if this works

* Try again

* See what actually fixed it

* See if allow list is necessary for replace in go.mod

* Ok fine just move it into place

* Refine retrieval keys a bit

* One more test
2025-09-02 11:18:39 -06:00
Matthew Holt
056f813889
gpx: Mark place entity points as significant
Also still allow clustering significant points, since we do preserve them, the data source can just call ClusterPoints() to get it back...
2025-06-17 21:37:38 -06:00
Matthew Holt
1c14853317
Tune path simplification a little more 2025-06-17 16:43:22 -06:00
Matt Holt
def05a6cfa
Revise location processing and improve place entities (#101)
* Revise location processing and place entities

- New, more dynamic, recursive clustering algorithm
- Place entities are globally unique by name
- Higher spatial tolerance for coordinate attributes if entity name is the same (i.e. don't insert new attribute row for coordinate if it's sort of close to another row for that attribute -- but if name is different, then points have to be closer to not insert new attribute row)
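
A rough sketch of the name-dependent tolerance from the last bullet. The threshold values, the function names, and the use of the haversine distance are assumptions chosen for illustration; the real thresholds and distance measure are not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical thresholds; the real values are not given in this commit.
const (
	sameNameToleranceMeters = 500.0
	diffNameToleranceMeters = 50.0
)

// haversineMeters returns the great-circle distance between two points.
func haversineMeters(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadius = 6371000.0 // mean Earth radius in meters
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadius * math.Asin(math.Sqrt(a))
}

// needsNewCoordinateRow reports whether a new coordinate attribute row should
// be inserted, using a looser tolerance when the entity name is the same.
func needsNewCoordinateRow(lat1, lon1, lat2, lon2 float64, sameName bool) bool {
	tolerance := diffNameToleranceMeters
	if sameName {
		tolerance = sameNameToleranceMeters
	}
	return haversineMeters(lat1, lon1, lat2, lon2) > tolerance
}

func main() {
	// The two points are roughly 111 m apart.
	fmt.Println(needsNewCoordinateRow(0, 0, 0, 0.001, true))  // prints "false"
	fmt.Println(needsNewCoordinateRow(0, 0, 0, 0.001, false)) // prints "true"
}
```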

There is still a bug where clustering is too aggressive on some data. Looking into it...

* Fix overly aggressive clustering

(...lots of commits that fixed the CI environment which changed things without warning...)
2025-06-17 16:13:44 -06:00
Matthew Holt
4bec2e0b86
Fix lint, tweak email recognition a bit more 2025-06-10 11:19:36 -06:00
Matthew Holt
fa9ad482b3
Place entities from GPX sources; several other improvements/fixes
Location processing is still being revised (WIP).
2025-06-09 17:18:44 -06:00
Matthew Holt
3e311d99c3
Sort data sources in import planner; rename some DS
The sorting can help imports go faster if we put DB-heavy sources first, when the database is still small.
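
One way such an ordering could look, as a sketch only. The `dataSource` type, its `dbHeavy` flag, and the source names are hypothetical; how the planner actually ranks sources is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// dataSource is a hypothetical planner entry.
type dataSource struct {
	name    string
	dbHeavy bool // assumed flag marking sources that hit the DB hardest
}

// sortForImport moves DB-heavy sources to the front while keeping the
// relative order of the others (stable sort).
func sortForImport(sources []dataSource) {
	sort.SliceStable(sources, func(i, j int) bool {
		return sources[i].dbHeavy && !sources[j].dbHeavy
	})
}

func main() {
	sources := []dataSource{
		{"light_a", false},
		{"heavy_a", true},
		{"light_b", false},
		{"heavy_b", true},
	}
	sortForImport(sources)
	for _, s := range sources {
		fmt.Println(s.name) // heavy_a, heavy_b, light_a, light_b
	}
}
```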

The data source names were also standardized to use snake_case like most other word-IDs in the app.
2025-05-16 11:10:23 -06:00
Matthew Holt
f0697d2d6b
Refactor embedding jobs; enhance tooltips; upgrade gofakeit to v7
The gofakeit upgrade uses the new math/rand/v2 package, which uses uint64 more than int64, so we had to change a bunch of row IDs from int64 to uint64.
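
For context, a small sketch of why unsigned IDs fit naturally with math/rand/v2. It calls the standard library directly rather than gofakeit, and the `rowID` type and `fakeRowID` helper are hypothetical:

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// rowID is a hypothetical unsigned row ID type.
type rowID uint64

// fakeRowID returns a random ID in [1, max]. It panics if max is 0,
// because rand.Uint64N requires a non-zero bound.
func fakeRowID(max uint64) rowID {
	return rowID(rand.Uint64N(max) + 1)
}

func main() {
	fmt.Println(fakeRowID(1000))
}
```

With IDs already typed as uint64, values from `rand.Uint64N` need no sign conversion.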
2025-04-24 16:33:41 -06:00
Matthew Holt
b88485a84b
A few fixes/enhancements
googlelocation: Allow iOS on-device location filename to be renamed, but it should still contain "location-history" and be a .json file.
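
A minimal sketch of that filename rule. The function name is hypothetical, and matching case-insensitively is an assumption:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// looksLikeLocationHistory reports whether a (possibly renamed) file still
// contains "location-history" in its name and has a .json extension.
func looksLikeLocationHistory(path string) bool {
	base := strings.ToLower(filepath.Base(path)) // case-insensitivity is assumed
	return strings.Contains(base, "location-history") && filepath.Ext(base) == ".json"
}

func main() {
	for _, name := range []string{
		"location-history.json",
		"my-location-history-2025.json",
		"location-history.txt",
		"timeline.json",
	} {
		fmt.Println(name, looksLikeLocationHistory(name))
	}
}
```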

- Upgrade mapbox-gl-js to 3.11

- Run thumbnail+embedding jobs even if import failed; WIP
2025-04-19 13:44:51 -06:00
Matthew Holt
73196f51ae
Refactor DirEntry, fix some bugs
Remove TopDir* functions, they aren't really relevant with our new import planner.
2025-04-02 21:52:49 -06:00
Matthew Holt
d18c4cd2c8
googlelocation: Fix on-device Android format handling 2025-02-05 13:52:20 -07:00
Matthew Holt
8437a38746
googlelocation: Fix longitude 2025-01-30 14:46:42 -07:00
Matthew Holt
c1a9abb74b
googlelocation: Support on-device Android 2025 format
(Thanks to those who helped in Discord!)
2025-01-30 13:08:26 -07:00
Matthew Holt
e34667bcce
googlelocation: Fix checkpoints 2024-12-16 20:21:16 -07:00
Matthew Holt
d7b1d73796
Auto-resume jobs on start; improve checkpoint performance 2024-12-16 16:23:26 -07:00
Matthew Holt
e319e1f60d
Checkpoints for Google Location; minor jobs fix 2024-12-16 10:57:34 -07:00
Matt Holt
746e5d6b5c
Refactored import flow, new import UI, thumbnails stored in timeline, etc. (close #3) (#43)
* Schema revisions for new import flow and thumbnails

* WIP settings

* WIP quick schema fix

* gallery: Image search using ML embeddings

Still very rough around the edges, but basically works.

'uv' gets auto-installed, but currently requires restarting Timelinize before it can be used.

Lots of tunings and optimizations are needed. There is much room for improvement.

Still migrating from imports -> jobs, so that part of the code and schema is still a mess.

* Implement search for similar items

* Finish import/planning rewrite; it compiles and tests pass

* Fix some bugs, probably introduce other bugs

* WIP new import planning page

* Fix Google Photos and Twitter recognition

* Finish most of import page UI; start button still WIP

* WIP: Start Import button

* Fixes to jobs, thumbnail job, import job, etc.

* Implement proper checkpointing support; jobs fixes
2024-12-06 11:03:29 -07:00
Matthew Holt
3066ddbeb9
Major linting overhaul
I've addressed most of the "fast" linter errors locally in my editor.

Some linters are broken or buggy.
2024-08-29 16:43:52 -06:00
Matthew Holt
21d5a2ed8e
chore: Fix some lint errors (add package comments) 2024-08-28 16:05:43 -06:00
Matthew Holt
a581abf765
geojson: Accept noncompliant positions; refactor & fix bug (fix #23) 2024-08-16 15:05:25 -06:00
Matthew Holt
e296b73d2d
googlelocation: Support on-device location history (close #20) 2024-08-14 16:31:27 -06:00
Matthew Holt
1daf6f4157
Initial open source commit 2024-08-11 08:02:27 -06:00