Commit graph

400 commits

Author SHA1 Message Date
Matt Holt
a85f47f1a3
Major processor refactor (#112)
* Major processor refactor

- New processing pipeline, vastly simplified
- Several edge case bug fixes related to Google Photos (but applies generally too)
- Major import speed improvements
- UI bug fixes
- Update dependencies

The previous 3-phase pipeline would first check for an existing row in the DB and decide what to do (insert, update, skip, etc.), then download the data file, then update the row and apply lots of logic to determine whether the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that might not need to be downloaded.

In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and to not having the data available when making decisions about the item/row. I was able to get all the tests to pass until the final boss appeared: an edge case bug in Google Photos (but a very important one, which happened to be exposed by my wedding album, of all things). I was unable to fix that problem without a rewrite of the processor.

The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3.
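
To illustrate the filename+content rule, here is a minimal Go sketch. `dedupKey` is a hypothetical helper, not Timelinize's actual code; it assumes the original filename is read from the Google Photos metadata file and uses a SHA-256 hash of the media bytes as the content fingerprint.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupKey is a hypothetical helper, not Timelinize's API. It combines the
// original filename (as read from the metadata file) with a SHA-256 hash of
// the media bytes. The same photo exported into two album folders yields the
// same key; two different photos that happen to share a filename do not.
func dedupKey(originalFilename string, content []byte) string {
	sum := sha256.Sum256(content)
	return originalFilename + ":" + hex.EncodeToString(sum[:])
}

func main() {
	photo := []byte("example media bytes")
	inAlbumA := dedupKey("IMG_0001.jpg", photo)
	inAlbumB := dedupKey("IMG_0001.jpg", photo)
	other := dedupKey("IMG_0001.jpg", []byte("different media bytes"))
	fmt.Println(inAlbumA == inAlbumB) // true: same name and content
	fmt.Println(inAlbumA == other)    // false: same name, different content
}
```

Note that the content half of the key is only known after the data file has been read, which is why a design that decides before downloading struggles with this case.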

The new processing pipeline downloads the data up front in phase 1 (there's also a phase 0 that splits out some validation/sanitization logic, but it is of no major consequence). This can run concurrently for the whole batch. Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for an existing row, decide what to do, and even rename/move the data file if needed, all in one phase rather than split across 2 separate phases.

This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about.

On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size.

An import of about 200,000 text messages, including media attachments, finished in about 2 minutes.

My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB.

Also fixed several other minor bugs/edge cases.

This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge.

* Round out a few corners; fix some bugs

* Appease linter

* Try to fix linter again

* See if this works

* Try again

* See what actually fixed it

* See if allow list is necessary for replace in go.mod

* Ok fine just move it into place

* Refine retrieval keys a bit

* One more test
2025-09-02 11:18:39 -06:00
Matthew Holt
9554343b6f
Further tune heat map based on feedback and more sample data 2025-08-22 12:16:00 -06:00
Matthew Holt
1f73da0527
Fix lint errors 2025-08-21 15:39:36 -06:00
Matthew Holt
6c0abef275
Tuned heatmap to be more useful/accurate 2025-08-21 13:53:07 -06:00
Matthew Holt
3b670ff3f7
Allow opening timeline from parent folder
This is useful if a My Timeline subfolder is (sort-of) implicitly created for the user, and the user doesn't realize that is where their timeline is. They should be able to select the same folder to open the timeline as they did to create it.
2025-07-16 22:11:47 -06:00
Matthew Holt
a52fb35c4d
Data sources can honor job pauses; minor improvements to some errors, logs 2025-07-15 15:58:02 -06:00
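A minimal sketch of how a data source loop could honor pauses, assuming a hypothetical `pauseGate` type built on `sync.Cond` (not Timelinize's actual job API): the data source calls `Wait()` between items, and it blocks while the job is paused.

```go
package main

import (
	"fmt"
	"sync"
)

// pauseGate is a hypothetical helper: a job controller toggles it, and a
// data source calls Wait between items so it blocks while paused.
type pauseGate struct {
	mu     sync.Mutex
	cond   *sync.Cond
	paused bool
}

func newPauseGate() *pauseGate {
	g := &pauseGate{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// Pause makes subsequent Wait calls block.
func (g *pauseGate) Pause() {
	g.mu.Lock()
	g.paused = true
	g.mu.Unlock()
}

// Resume unblocks every goroutine currently waiting.
func (g *pauseGate) Resume() {
	g.mu.Lock()
	g.paused = false
	g.mu.Unlock()
	g.cond.Broadcast()
}

// Wait blocks while the gate is paused and returns immediately otherwise.
func (g *pauseGate) Wait() {
	g.mu.Lock()
	for g.paused {
		g.cond.Wait()
	}
	g.mu.Unlock()
}

func main() {
	g := newPauseGate()
	g.Pause()
	done := make(chan struct{})
	go func() {
		for i := 1; i <= 3; i++ {
			g.Wait() // honor pauses between items
			fmt.Println("processed item", i)
		}
		close(done)
	}()
	g.Resume()
	<-done
}
```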
Matthew Holt
b365dbbafc
Fix panics with obfuscation 2025-07-09 13:30:50 -06:00
Matthew Holt
eae7e1806d
Minor enhancements to logo/icon
Improves legibility and more optical balancing
2025-07-09 11:43:33 -06:00
Matthew Holt
1d59104ab7
Try using 8-bit color depth on Windows
Otherwise every encoded AVIF image triggers a "10-bit colour depth not supported" message on Windows.
2025-07-03 13:34:39 -06:00
Matthew Holt
d0f929cdc3
Like whack-a-mole with the linter 2025-07-01 15:46:32 -06:00
Matthew Holt
336ff7fae0
Fix new lint warnings
Must have been a change in golangci-lint
2025-07-01 15:41:07 -06:00
Matthew Holt
0758f9a588
Some map page improvements
- Hopefully (!?) fixed map element sizing bug on page load
- Hopefully (!?) fixed bug where polyline layers wouldn't render sometimes
- Added time labels between points
- Made marker tooltips/popups more informative, though they still require lots of work
- Made lines slightly more legible

I suspect there are still some weird/sporadic bugs in the map page... but it's harder to find them now. Not sure if good or bad, haha.
2025-07-01 14:00:57 -06:00
Nate Aune
6d5f44110e
conversations: Fix text search and add clear filter button (#107)
* Add search and filter functionality to conversations page

  - Add Search Conversations and Clear Filters buttons with icons
  - Implement text search support for conversation messages
  - Add event handlers for search button with loading feedback
  - Add clear filters functionality to reset all filter inputs
  - Support Enter key to trigger search from text input

* Remove unnecessary submit button

---------

Co-authored-by: Matthew Holt <mholt@users.noreply.github.com>
2025-06-27 13:27:33 -06:00
Matthew Holt
8388bb78b4
Fix typo 2025-06-20 08:01:37 -06:00
Matt Holt
f18d7cc5dd
WHY DOES CI KEEP BREAKING (#104)
* WHY DOES CI KEEP BREAKING

* aaaaaaa

* Oh, it's just a dynamic path

* Update comment
2025-06-19 16:23:21 -06:00
Matthew Holt
ebebafa68a
ci: See if this fixes the workflow that worked 10 minutes ago?? 2025-06-19 15:54:07 -06:00
JP Hastings-Edrei
29f1ed3176
Importer for Flighty flight information (#90) 2025-06-19 15:05:18 -06:00
Sergio Rubio
5de93bbf85
Do not try to open a browser in headless mode (#93)
* Do not try to open a browser in headless mode

When running timelinize serve without a display/desktop, you get a
harmless error in the server log output:

Error: no DISPLAY environment variable specified

This comes from xdg-open trying to open the server URL.
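
One way to detect this situation, sketched in Go under the assumption that a graphical Unix session sets `DISPLAY` (X11) or `WAYLAND_DISPLAY` (Wayland). This is a heuristic, not necessarily the check this PR uses; `hasDisplay` is a hypothetical helper and the URL is a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// hasDisplay is a rough heuristic: Windows and macOS are assumed to have a
// desktop; on other Unix-like systems, a graphical session normally sets
// DISPLAY (X11) or WAYLAND_DISPLAY (Wayland). If neither is set, xdg-open
// has nowhere to open a browser.
func hasDisplay(goos string, getenv func(string) string) bool {
	switch goos {
	case "windows", "darwin":
		return true
	}
	return getenv("DISPLAY") != "" || getenv("WAYLAND_DISPLAY") != ""
}

func main() {
	url := "http://localhost:8080" // placeholder address
	if !hasDisplay(runtime.GOOS, os.Getenv) {
		fmt.Println("headless; not opening a browser. Visit", url)
		return
	}
	fmt.Println("would open a browser at", url)
}
```

Passing `getenv` as a parameter keeps the check testable without touching the real environment.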

* Move log

---------

Co-authored-by: Matthew Holt <mholt@users.noreply.github.com>
2025-06-19 09:23:51 -06:00
Matthew Holt
230fcb8583
Avoid inserting/updating with empty (not null) metadata 2025-06-19 09:10:18 -06:00
Matthew Holt
bf4b0950ab
WIP: Add semantic search features toggle to setup page
Not yet wired up.
2025-06-18 19:10:17 -06:00
Matthew Holt
9d75c2895f
Optimize entity loading in hot path of import job
Verified with EXPLAIN QUERY PLAN
2025-06-18 19:09:48 -06:00
Matthew Holt
bf7f0cdf3c
Upgrade python deps; get locale on Windows 2025-06-18 10:33:30 -06:00
Matthew Holt
056f813889
gpx: Mark place entity points as significant
Also still allow clustering significant points; since we do preserve them, the data source can just call ClusterPoints() to get it back...
2025-06-17 21:37:38 -06:00
Matthew Holt
1c14853317
Tune path simplification a little more 2025-06-17 16:43:22 -06:00
Matt Holt
def05a6cfa
Revise location processing and improve place entities (#101)
* Revise location processing and place entities

- New, more dynamic, recursive clustering algorithm
- Place entities are globally unique by name
- Higher spatial tolerance for coordinate attributes if the entity name is the same (i.e. don't insert a new attribute row for a coordinate if it's reasonably close to another row for that attribute; but if the name is different, the points have to be closer to avoid inserting a new attribute row)
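
The name-dependent tolerance above can be sketched as follows. The thresholds (250 m and 25 m), the haversine distance, and the function names are all illustrative assumptions, not the values or code Timelinize uses.

```go
package main

import (
	"fmt"
	"math"
)

type coord struct{ Lat, Lon float64 }

// haversineMeters returns the great-circle distance between two points.
func haversineMeters(a, b coord) float64 {
	const earthRadius = 6371000.0 // mean Earth radius in meters
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(b.Lat - a.Lat)
	dLon := toRad(b.Lon - a.Lon)
	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(a.Lat))*math.Cos(toRad(b.Lat))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadius * math.Asin(math.Sqrt(h))
}

// Hypothetical thresholds, chosen only for illustration.
const (
	sameNameToleranceM = 250.0 // looser when the entity name matches
	diffNameToleranceM = 25.0  // stricter when the names differ
)

// needsNewAttributeRow reports whether an incoming coordinate is far enough
// from an existing attribute row to warrant inserting a new row.
func needsNewAttributeRow(existing, incoming coord, sameName bool) bool {
	tol := diffNameToleranceM
	if sameName {
		tol = sameNameToleranceM
	}
	return haversineMeters(existing, incoming) > tol
}

func main() {
	a := coord{40.0, -111.0}
	b := coord{40.001, -111.0} // roughly 111 m north of a
	fmt.Println(needsNewAttributeRow(a, b, true))  // false: within 250 m
	fmt.Println(needsNewAttributeRow(a, b, false)) // true: beyond 25 m
}
```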

There is still a bug where clustering is too aggressive on some data. Looking into it...

* Fix overly aggressive clustering

(...lots of commits that fixed the CI environment which changed things without warning...)
2025-06-17 16:13:44 -06:00
Henry Wilkinson
6b1618787c
Allow user to navigate to entity page for chat participants (#97) 2025-06-12 17:28:02 -06:00
Matthew Holt
4bec2e0b86
Fix lint, tweak email recognition a bit more 2025-06-10 11:19:36 -06:00
Matthew Holt
fa9ad482b3
Place entities from GPX sources; several other improvements/fixes
Location processing is still being revised (WIP).
2025-06-09 17:18:44 -06:00
Matthew Holt
457e3f48cc
Support env var for origin configuration 2025-06-02 16:58:44 -06:00
Matthew Holt
46f1c8f382
Navigate file picker when typing/pasting into path input 2025-06-02 16:52:00 -06:00
JP Hastings-Edrei
27a2f462cf
lint: bump golangci-lint version (#92)
* lint: bump golangci-lint version

- Bumps the version of golangci-lint used in the GitHub Action to the most recent version (the one installed by e.g. `brew install golangci-lint`, v2.1.6)
- Migrates the `.golangci.toml` file, and manually moves the comments over
- `errchkjson` appears to work now, so added that back into the linter (the `forbidigo` and `goheader` linters I've left commented out)

* lint: remove checkers we don't like

Removes two static checkers that cause code changes we don't like.

* lint: remove old lint declaration

Apparently `gosimple` isn't available any more, so I've removed its `nolint` declaration here.

* lint: swap location of `nolint:goconst`

This _seems_ to be an unstable declaration because of the parallel and nondeterministic nature of the linter. If this keeps causing trouble, we can either remove the goconst linter or change _both_ of these lines to hold `//nolint:goconst,nolintlint`.
2025-06-02 15:03:19 -06:00
Iris
e395b5cc32
Add dockerfile for dev environments (#78)
* feat: add dockerfile for dev environments

* feat: setup dev environment using dev containers

* fix: delete docker-compose-dev.yaml as well

* feat: add instructions for running project inside dev container
2025-06-02 14:57:06 -06:00
Matthew Holt
e6ac91870b
Custom trusted origins (close #52) 2025-06-02 10:14:26 -06:00
Henry Wilkinson
54477b5163
Only run CI jobs if the PR is not in draft status (#98)
* Only run CI jobs if the PR is not in draft status

* Run on PR marked ready
2025-06-02 09:49:42 -06:00
Henry Wilkinson
1dc4e081af
timelinize → import (#95) 2025-06-01 20:54:22 -06:00
Matt Holt
c68f1f08e2
ci: Fix macOS builds and disable fail-fast
* See how wrong the AI is

* Man... the AI is just straight-up lying to me now

* Screw it

* grumble

* sadfsfd

* asdfsdf

* See what's actually necessary

* Test test

* Still pruning

* This better not work

* Hopefully final cleanup
2025-06-01 20:53:48 -06:00
Matthew Holt
31c575727c
apple_contacts: Improve recognition a bit 2025-05-31 07:06:33 -06:00
Matthew Holt
41ff81ceb6
Minor enhancements, fix howStored for items deduped by data file at end of pipeline 2025-05-30 16:20:26 -06:00
Matthew Holt
0c2b069e39
Bit of cleanup/comment enhancing 2025-05-30 11:42:18 -06:00
Matthew Holt
d4b71a35eb
Forgot to add IF NOT EXISTS to new indexes 2025-05-30 11:16:20 -06:00
Matthew Holt
ebc731d221
Vastly speed up imports ?? (WIP) 2025-05-30 11:14:09 -06:00
Matthew Holt
6aa6e5ee80
Reset job stats when closing repo
Avoids weird side-effects when switching repos and starting new jobs
2025-05-29 09:12:23 -06:00
Matthew Holt
31f003b3d4
Fix metadata updates for items and relationships
Also relocate data files if the item's timestamp changes
2025-05-28 18:09:46 -06:00
Matthew Holt
863d0e978b
Detect and handle corrupt timestamps a little better 2025-05-27 11:24:08 -06:00
Matthew Holt
dc30f0cf50
Render either face or body coords for "includes" relations 2025-05-26 22:02:10 -06:00
Matthew Holt
ab64f1eaee
googlephotos: Implement DS checkpoint 2025-05-26 07:51:31 -06:00
Matthew Holt
39afe39a27
Wow data out there be realllly bad 2025-05-25 12:51:21 -06:00
Matthew Holt
1bd7c2a5c8
Fix several bugs related to duplicates, lat/lon tolerances, etc.
Separate altitude out from latlon in unique constraints
2025-05-25 12:36:03 -06:00
Matthew Holt
2b586c56da
Treat lower precision input as unknown for coordinate uncertainty
Rather than treating them as significant 0s
2025-05-23 13:51:52 -06:00
Matthew Holt
4830650ded
Couple of very minor UI fixes 2025-05-22 11:43:10 -06:00