Commit graph

39 commits

Author SHA1 Message Date
Matthew Holt
c8cfe5001f
Fix more lint errors 2026-01-16 23:32:10 -07:00
Matthew Holt
962369382a
Hopefully appease new linter 2025-11-04 16:27:09 -07:00
Matthew Holt
19acc6cc11
Create WAL checkpoint after jobs 2025-10-23 13:20:20 -06:00
Matt Holt
41361913d1
ci: Attempt fix Linux workflows related to old libvips (#129)
Things would be so much easier if ubuntu just updated their packages

* ci: Attempt to fix Linux workflows

See if Copilot is worth its snuff

* Revert

* Try downgrading vipsgen instead

* Try again

* Try to install vips from source

* Sigh, ok try building from source

* Sigh

* sighhhh

* Sighhhhhhhhhhhhhhhhhhhh

* Try without cache for a moment

* Try caching again

* Try composite action

* Try again?

* Set shell on composite action steps...

* Update a couple other workflows

* Try to fix test job

* Some cleanup

* Add heif

* Oops

* Pointless comment but let's see if the cache worked

* Fix go builds

* Try installing pkg-config I guess?

* Try more pkg config paths?

* Tweak

* Are we there yet

* One more tweak

* Rename some things
2025-10-08 14:21:38 -06:00
Matthew Holt
e9a7c03c53
Fix ExFAT crashes; refactor sql.DB handling
The crashes on ExFAT are caused by a bug in the macOS ExFAT driver. It is unclear whether other OSes are affected too.

https://github.com/mattn/go-sqlite3/issues/1355

We now utilize sqlite's concurrency features by creating a write pool (size 1) and a read pool, which lets us eliminate our own RWMutex; that mutex prevented reads from running at the same time as writes. Sqlite's WAL mode allows reads concurrent with writes, and our code is much cleaner.

Still need to do similar for the thumbnail DB.

Also could look into using prepared statements for more efficiency gains.
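
The write pool (size 1) plus read pool arrangement described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `sqliteDSN` and `openPools` are hypothetical, the DSN parameters `_journal_mode` and `_busy_timeout` assume the mattn/go-sqlite3 driver's connection-string options, and the caller is assumed to register that driver with a blank import.

```go
package main

import (
	"database/sql"
	"fmt"
)

// sqliteDSN builds a connection string that asks for WAL journaling.
// Assumption: the parameter names follow mattn/go-sqlite3 conventions.
func sqliteDSN(path string) string {
	return fmt.Sprintf("file:%s?_journal_mode=WAL&_busy_timeout=5000", path)
}

// openPools (hypothetical helper) opens two handles on the same file:
// a write pool capped at one connection, so writes are serialized by the
// pool itself, and a read pool that may hold several connections.
// The driver must already be registered, e.g. via a blank import.
func openPools(driverName, path string, maxReaders int) (write, read *sql.DB, err error) {
	dsn := sqliteDSN(path)

	write, err = sql.Open(driverName, dsn)
	if err != nil {
		return nil, nil, fmt.Errorf("opening write pool: %w", err)
	}
	write.SetMaxOpenConns(1) // a single writer replaces a hand-rolled mutex

	read, err = sql.Open(driverName, dsn)
	if err != nil {
		write.Close()
		return nil, nil, fmt.Errorf("opening read pool: %w", err)
	}
	read.SetMaxOpenConns(maxReaders) // WAL lets these run alongside the writer

	return write, read, nil
}
```

Callers would send every INSERT/UPDATE through the write handle and every SELECT through the read handle; the pool size of 1 is what serializes writers.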
2025-09-30 12:31:41 -06:00
Matthew Holt
2b5fd57259
Proper support for mixed timestamps and time zones
This will be a long-time WIP, but we now support full timestamps with local time offsets, absolute ones with UTC times only, and wall times only.

Several other fixes/enhancements. Making an effort to display time zone in time displays throughout the app.

Can now try to infer time zones during import, which is the default setting.

This will take a while to fully implement but it's a good start. Just have to be really careful about date crafting/manipulation/parsing.
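
The three timestamp kinds mentioned in this commit (full timestamps with a local offset, absolute UTC-only times, and wall times with no zone) could be told apart along these lines. This is only a sketch under assumptions: the names `timestampKind` and `classifyTimestamp` are hypothetical, and treating a trailing `Z` as "UTC only" is an illustrative rule, not necessarily the app's.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// timestampKind is a hypothetical label for how much zone information
// a parsed timestamp carries.
type timestampKind int

const (
	fullWithOffset timestampKind = iota // instant plus local UTC offset
	absoluteUTC                         // instant known only in UTC
	wallOnly                            // clock reading with no zone at all
)

// wallLayout matches a date and time with no zone suffix.
const wallLayout = "2006-01-02T15:04:05"

// classifyTimestamp parses s and reports which of the three kinds it is.
func classifyTimestamp(s string) (time.Time, timestampKind, error) {
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		if strings.HasSuffix(s, "Z") {
			return t, absoluteUTC, nil
		}
		return t, fullWithOffset, nil
	}
	// No zone suffix: the result is a wall time. Go stores it as UTC,
	// but callers must not treat it as an absolute instant.
	if t, err := time.Parse(wallLayout, s); err == nil {
		return t, wallOnly, nil
	}
	return time.Time{}, 0, fmt.Errorf("unrecognized timestamp %q", s)
}
```

Keeping the kind next to the parsed value is what allows a display layer to decide whether showing a time zone is meaningful.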
2025-09-12 11:17:49 -06:00
Matthew Holt
b3376b5298
Fix pipeline bugs; rethink embeddings
Fixed several bugs introduced by the pipeline refactoring.

Updated goexif2 fork to use my latest commit which fixes not being able to find EXIF data on some JPEG images.

Embeddings now refer to the item they are for, rather than an item referring to a single embedding. This allows items to have multiple embeddings if necessary, which gives us some flexibility when models change/improve, etc.

Also reworked the Python server to use a smaller model (base siglip2 instead of so400m) so that it will fit on more GPUs, including my 4070; as well as a new "DeviceManager" that ChatGPT helped me figure out, to choose GPU when it has enough memory for it, as conditions change.
2025-09-04 21:40:50 -06:00
Matthew Holt
55053f8096
Fixes from job logging refactoring 2025-09-02 14:14:32 -06:00
Matt Holt
a85f47f1a3
Major processor refactor (#112)
* Major processor refactor

- New processing pipeline, vastly simplified
- Several edge case bug fixes related to Google Photos (but applies generally too)
- Major import speed improvements
- UI bug fixes
- Update dependencies

The previous 3-phase pipeline would first check for an existing row in the DB, then decide what to do (insert, update, skip, etc.), then would download data file, then would update the row and apply lots of logic to see if the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that may not need to be downloaded.

In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and to not having the data in many cases while making decisions regarding the item/row. I was able to get all the tests to pass until the final boss: an edge case bug in Google Photos, but a very important one that happened to be exposed by my wedding album, of all things. Once it surfaced, I was unable to fix the problem without a rewrite of the processor.

The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3.

The new processing pipeline downloads the data up front in phase 1 (and there's a phase 0 that splits out some validation/sanitization logic, but is of no major consequence). This can run concurrently for the whole batch. Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for existing row, make decisions on what to do, even rename/move the data file if needed, all in one phase, rather than split across 2 separate phases.

This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about.

On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size.

An import of about 200,000 text messages, including media attachments, finished in about 2 minutes.

My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB.

Also fixed several other minor bugs/edge cases.

This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge.

* Round out a few corners; fix some bugs

* Appease linter

* Try to fix linter again

* See if this works

* Try again

* See what actually fixed it

* See if allow list is necessary for replace in go.mod

* Ok fine just move it into place

* Refine retrieval keys a bit

* One more test
2025-09-02 11:18:39 -06:00
Matthew Holt
336ff7fae0
Fix new lint warnings
Must have been a change in golangci-lint
2025-07-01 15:41:07 -06:00
Matt Holt
def05a6cfa
Revise location processing and improve place entities (#101)
* Revise location processing and place entities

- New, more dynamic, recursive clustering algorithm
- Place entities are globally unique by name
- Higher spatial tolerance for coordinate attributes if entity name is the same (i.e. don't insert new attribute row for coordinate if it's sort of close to another row for that attribute -- but if name is different, then points have to be closer to not insert new attribute row)
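
The name-dependent tolerance in the last bullet can be illustrated with a small sketch. The thresholds (500 m when names match, 50 m otherwise) and the function names are assumptions chosen for illustration; the commit does not state the actual values.

```go
package main

import "math"

const (
	earthRadiusMeters = 6371000.0
	// Hypothetical tolerances: looser when the entity name is the same.
	sameNameToleranceMeters      = 500.0
	differentNameToleranceMeters = 50.0
)

// haversineMeters returns the great-circle distance between two
// latitude/longitude points given in degrees.
func haversineMeters(lat1, lon1, lat2, lon2 float64) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*
			math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusMeters * math.Asin(math.Sqrt(a))
}

// needsNewCoordinateRow reports whether a coordinate is far enough from an
// existing attribute row to warrant inserting a new one.
func needsNewCoordinateRow(sameName bool, distMeters float64) bool {
	tolerance := differentNameToleranceMeters
	if sameName {
		tolerance = sameNameToleranceMeters
	}
	return distMeters > tolerance
}
```

With these example values, a point 200 m from an existing row is folded into that row when the names match, but gets its own row when they differ.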

There is still a bug where clustering is too aggressive on some data. Looking into it...

* Fix overly aggressive clustering

(...lots of commits that fixed the CI environment which changed things without warning...)
2025-06-17 16:13:44 -06:00
Matthew Holt
65268b5af9
Fix import job resumption 2025-05-21 12:22:43 -06:00
Matthew Holt
d268486f55
Several import fixes; metadata merging
- Quick unit tests for a function related to Google Takeout archives
- We now combine existing metadata with new according to the update policy, instead of either writing all or none of incoming metadata. This merging happens before the DB update query and is a bit of a special case as the policy is applied per-key.
- Special handling for corrupted timestamp in Google Photos data. This is a singular case I haven't observed more of, but seems like a reasonable heuristic. There might be thousands more out there, who knows.
- Fix job creation time (milliseconds)
- Hopefully make repeated imports faster by skipping duplicate items more intelligently based on update policies.
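
The per-key metadata merge described in the second bullet might look roughly like this. The policy names and the `mergeMetadata` signature are hypothetical; the commit only says that the update policy is applied per key before the DB update query.

```go
package main

// updatePolicy is a hypothetical per-key rule for conflicting values.
type updatePolicy int

const (
	keepExisting   updatePolicy = iota // existing value wins on conflict
	preferIncoming                     // incoming value wins on conflict
)

// mergeMetadata combines existing and incoming metadata key by key instead
// of writing all or none of the incoming values. Keys absent from existing
// are always added; conflicts are resolved by the key's policy, falling
// back to the given default when the key has no policy of its own.
func mergeMetadata(existing, incoming map[string]any,
	policies map[string]updatePolicy, fallback updatePolicy) map[string]any {

	merged := make(map[string]any, len(existing)+len(incoming))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range incoming {
		if _, conflict := existing[k]; !conflict {
			merged[k] = v
			continue
		}
		policy, ok := policies[k]
		if !ok {
			policy = fallback
		}
		if policy == preferIncoming {
			merged[k] = v
		}
	}
	return merged
}
```

The result is a new map, so the caller can compare it against the existing row and skip the update entirely when nothing changed.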
2025-05-19 12:47:18 -06:00
Matthew Holt
0d26c6eb31
Fix several bugs
- Obfuscation mode enabled would set a fake phone number in smsbackuprestore's DS options, which led to bad data. Now, the UI does not auto-fill that value. But that means we need...

- SMS Backup & Restore: Phone number can now be inferred from repo owner in the backend, if ds opt phone number is empty. This works even with obfuscation enabled.

- Aborting a scheduled job before it starts now stays aborted. (Unless you manually restart it.)

- Added a data validation error modal for DS options on the import page. For now, if smsbackuprestore has no phone number set, and the timeline repo owner doesn't have a phone number, an error will be shown.
2025-05-15 16:53:35 -06:00
Matthew Holt
cedfb68fe9
Allow resuming failed jobs 2025-05-15 07:53:15 -06:00
Matthew Holt
360e131fff
Recover panics during jobs/imports, and support base64 pics from vCard 2025-05-14 08:29:37 -06:00
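
Recovering a panic inside a job typically follows the standard Go defer/recover pattern. A minimal sketch, assuming a hypothetical `runJob` wrapper (the project's actual job runner is not shown in this log):

```go
package main

import "fmt"

// runJob (hypothetical) runs a job function and converts a panic into an
// ordinary error, so one failing job cannot take down the whole process.
func runJob(job func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// The named return value lets the deferred function
			// replace the result after the panic unwinds.
			err = fmt.Errorf("job panicked: %v", r)
		}
	}()
	return job()
}
```

The recovered error can then be recorded as the job's failure reason like any other error.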
Matthew Holt
15c55f0a8f
Improve pause/unpause behavior 2025-05-02 08:55:27 -06:00
Matthew Holt
72c8ede971
More improvements/fixes to thumbnail jobs 2025-05-01 22:18:50 -06:00
Matthew Holt
f0697d2d6b
Refactor embedding jobs; enhance tooltips; upgrade gofakeit to v7
The gofakeit upgrade uses the new math/rand/v2 package, which uses uint64 more than int64, so we had to change a bunch of row IDs from int64 to uint64.
2025-04-24 16:33:41 -06:00
Matthew Holt
ec87974576
Refactor thumbnails jobs to dynamically page through rows by import ID 2025-04-21 16:18:23 -06:00
Matthew Holt
b88485a84b
A few fixes/enhancements
googlelocation: Allow iOS on-device location filename to be renamed, but it should still contain "location-history" and be a .json file.

- Upgrade mapbox-gl-js to 3.11

- Run thumbnail+embedding jobs even if import failed; WIP
2025-04-19 13:44:51 -06:00
Matthew Holt
c20cf838d9
Fix data sources
Currently transitioning to letting them be dynamic
2025-02-07 13:08:55 -07:00
Matthew Holt
8c5d76dfad
Actually fix lint error 2025-02-07 12:08:25 -07:00
Matthew Holt
eb7717c843
Fix lint error; remove old build script
(Cross-compilation is now documented in our project wiki)
2025-02-07 12:02:47 -07:00
Matt Holt
35c5a63be4
Refactor python server code, update schema, rename config dir (#68)
* WIP

* Finish updating changes
2025-02-07 11:34:42 -07:00
Matthew Holt
1587a11bfb
Fix dropped messages in logs (fixes job-related UIs) 2025-01-15 12:03:03 -07:00
Matthew Holt
4bd08bd91c
WIP settings endpoints; use attr as alternate display name in messages 2024-12-31 10:09:20 -07:00
Matthew Holt
3d11d65b8d
WIP settings page; #map mobility; WIP interactive imports
Settings page is started; non-functional, but location picker works.

Moving maps between container elements is improved by moving the map to the container nearest the mouse pointer, rather than the one closest to the center of the viewport. It also emits an event when the map is moved, allowing us to change/reset map configurations for certain displays.

More progress on interactive imports. More thought is needed before continuing.

Upgraded Mapbox libraries.
2024-12-26 11:51:47 -07:00
Matthew Holt
ce297389b0
Thumbnail job streaming; WIP: interactive imports 2024-12-19 06:51:06 -07:00
Matthew Holt
e34667bcce
googlelocation: Fix checkpoints 2024-12-16 20:21:16 -07:00
Matthew Holt
d7b1d73796
Auto-resume jobs on start; improve checkpoint performance 2024-12-16 16:23:26 -07:00
Matthew Holt
a4d8bc923d
Data source checkpoints; refine import concurrency
And related improvements and fixes
2024-12-15 22:40:58 -07:00
Matthew Holt
024e9a4622
Implement job start/restart 2024-12-13 14:30:13 -07:00
Matthew Holt
fcaa238634
Implement pause/unpause 2024-12-13 13:02:06 -07:00
Matthew Holt
aa12d85c22
Fix job cancellation; wire up more of job UI 2024-12-10 23:13:14 -07:00
Matthew Holt
9ce1efa117
WIP live view of active jobs 2024-12-09 21:55:44 -07:00
Matthew Holt
063501c0f9
Fix stuck thumbnail loaders 2024-12-08 08:03:20 -07:00
Matthew Holt
37461545be
Fix fs.SkipDir usage; and minor bug in NMEA
fs.SkipDir documentation is a bit unclear: does it skip the remainder of files in the directory when returned from walking a file, or does it no-op on files and only skip going INTO dirs when on a dir?

I thought it was the latter, and thus, we didn't need to check whether the current DirEntry was a directory before returning (most commonly, when we are trying to skip hidden files/folders). But nope, it's the former -- SkipDir will skip the rest of the entries in the directory, which is NOT what we want. We just want to avoid going INTO a hidden directory in our case.

So unfortunately we now have to check IsDir() before returning.

Also fixed a slight bug with NMEA processing.
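
The fs.SkipDir behavior described in this commit can be demonstrated with a short sketch: SkipDir is returned only for hidden directories, while hidden files are simply ignored. The names `visibleFiles` and `memTree` are hypothetical, and the in-memory tree from `testing/fstest` is used purely for illustration.

```go
package main

import (
	"io/fs"
	"strings"
	"testing/fstest"
)

// visibleFiles walks fsys and returns all files that are neither hidden
// nor located inside a hidden directory.
func visibleFiles(fsys fs.FS) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		// The root itself is named "." and must not count as hidden.
		if path != "." && strings.HasPrefix(d.Name(), ".") {
			if d.IsDir() {
				return fs.SkipDir // do not descend into hidden directories
			}
			// Returning SkipDir here would skip the REST of the
			// containing directory, so just ignore this one file.
			return nil
		}
		if !d.IsDir() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// memTree builds a small in-memory file tree for demonstration.
func memTree(paths ...string) fs.FS {
	tree := fstest.MapFS{}
	for _, p := range paths {
		tree[p] = &fstest.MapFile{Data: []byte("x")}
	}
	return tree
}
```

For a tree containing `.DS_Store`, `.hidden/secret.txt`, `a.txt`, and `z.txt`, the walk visits `.DS_Store` first. Without the IsDir() check, returning SkipDir at that point would drop `a.txt` and `z.txt` as well.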
2024-12-07 21:16:36 -07:00
Matt Holt
746e5d6b5c
Refactored import flow, new import UI, thumbnails stored in timeline, etc. (close #3) (#43)
* Schema revisions for new import flow and thumbnails

* WIP settings

* WIP quick schema fix

* gallery: Image search using ML embeddings

Still very rough around the edges, but basically works.

'uv' gets auto-installed, but currently requires restarting Timelinize before it can be used.

Lots of tunings and optimizations are needed. There is much room for improvement.

Still migrating from imports -> jobs, so that part of the code and schema is still a mess.

* Implement search for similar items

* Finish import/planning rewrite; it compiles and tests pass

* Fix some bugs, probably introduce other bugs

* WIP new import planning page

* Fix Google Photos and Twitter recognition

* Finish most of import page UI; start button still WIP

* WIP: Start Import button

* Fixes to jobs, thumbnail job, import job, etc.

* Implement proper checkpointing support; jobs fixes
2024-12-06 11:03:29 -07:00