cobertos/timelinize

Author	SHA1	Message	Date
Ralf Ebert	8218ab0a39	Don't use time_offset when searching items (#172) Fix gallery photo prev/next for photos with time zone When jumping between photos in the gallery with the prev/next button, the buttons were incorrectly enabled/disabled/wouldn't work even if enabled for me. This was caused by the photos having a time zone UTC+1 (time_offset = 3600). The js frontend uses formatted timestamps including the time zone (peekFromItem.timestamp in galleryFilterParams). These are converted to a unix timestamp via UTC().UnixMilli(), so the time zone is already taken care of and adding time_offset is not necessary. For the filtering in the gallery grid itself, a unix timestamp is generated client-side (something like &start=1762124400&end=1762210800) taking the system time zone into account, so this is correct without any extra logic as well.	2025-11-13 10:50:46 -07:00
Matthew Holt	e9a7c03c53	Fix ExFAT crashes; refactor sql.DB handling The crashes on ExFAT are caused by a bug in the macOS ExFAT driver. It is unclear whether other OSes are affected too. https://github.com/mattn/go-sqlite3/issues/1355 We now utilize SQLite's concurrency features by creating a write pool (size 1) and a read pool, and can eliminate our own RWMutex, which prevented reads at the same time as writes. SQLite's WAL mode allows reads concurrent with writes, and our code is much cleaner. Still need to do similar for the thumbnail DB. Also could look into using prepared statements for more efficiency gains.	2025-09-30 12:31:41 -06:00
Matthew Holt	039dfe5ba8	Fix and optimize entity processing; faster imports Certain rare edge cases were problematic, like when importing a contact list / vcard dataset after importing multiple messaging data sets, and there are entities with multiple phone numbers... That, and a few other things are handled better. The loadEntities query has been cleaned up and corrected. I got rid of autolink stuff with entity_attributes in the DB because it was not useful or really correct either; it added complexity and caused bugs. Imports are sometimes about 20-50% faster now.	2025-09-25 22:49:39 -06:00
Matthew Holt	2b5fd57259	Proper support for mixed timestamps and time zones This will be a long-time WIP, but we now support full timestamps with local time offsets, absolute ones with UTC times only, and wall times only. Several other fixes/enhancements. Making an effort to display time zone in time displays throughout the app. Can now try to infer time zones during import, which is the default setting. This will take a while to fully implement but it's a good start. Just have to be really careful about date crafting/manipulation/parsing.	2025-09-12 11:17:49 -06:00
Matthew Holt	b3376b5298	Fix pipeline bugs; rethink embeddings Fixed several bugs introduced by the pipeline refactoring. Updated goexif2 fork to use my latest commit which fixes not being able to find EXIF data on some JPEG images. Embeddings now refer to the item they are for, rather than an item referring to a single embedding. This allows items to have multiple embeddings if necessary, which gives us some flexibility when models change/improve, etc. Also reworked the Python server to use a smaller model (base siglip2 instead of so400m) so that it will fit on more GPUs, including my 4070; as well as a new "DeviceManager" that ChatGPT helped me figure out, to choose GPU when it has enough memory for it, as conditions change.	2025-09-04 21:40:50 -06:00
Matt Holt	a85f47f1a3	Major processor refactor (#112) * Major processor refactor - New processing pipeline, vastly simplified - Several edge case bug fixes related to Google Photos (but they apply generally too) - Major import speed improvements - UI bug fixes - Update dependencies The previous 3-phase pipeline would first check for an existing row in the DB, then decide what to do (insert, update, skip, etc.), then would download the data file, then would update the row and apply lots of logic to see if the row was a duplicate, etc. Very messy, actually. The reason was to avoid downloading files that may not need to be downloaded. In practice, the data almost always needs to be downloaded, and I had to keep hacking on the pipeline to handle edge cases related to concurrency and not having the data in many cases while making decisions regarding the item/row. I was able to get all the tests to pass until the final boss: an edge case bug in Google Photos, but a very important one that happened to be exposed by my wedding album, of all things. Once it surfaced, I was unable to fix the problem without a rewrite of the processor. The problem was that Google Photos splits the data and metadata into separate files, and sometimes separate archives. The filename is in the metadata, and worse yet, there are duplicates if the media appears in different albums/folders, where the only way to know they're a duplicate is by filename+content. Retrieval keys just weren't enough to solve this, and I narrowed it down to a design flaw in the processor. That flaw was downloading the data files in phase 2, after making the decisions about how to handle the item in phase 1, then having to re-apply decision logic in phase 3. The new processing pipeline downloads the data up front in phase 1 (and there's a phase 0 that splits out some validation/sanitization logic, but is of no major consequence). This can run concurrently for the whole batch.
Then in phase 2, we obtain an exclusive write lock on the DB and, now that we have ALL the item information available, we can check for existing row, make decisions on what to do, even rename/move the data file if needed, all in one phase, rather than split across 2 separate phases. This simpler pipeline still has lots of nuance, but in my testing, imports run much faster! And the code is easy to reason about. On my system (which is quite fast), I was able to import most kinds of data at a rate of over 2,000 items per second. And for media like Google Photos, it's a 10x increase from before thanks to the concurrency in phase 1: up from about 3-5/second to around 30-50/second, depending on file size. An import of about 200,000 text messages, including media attachments, finished in about 2 minutes. My Google Photos library, which used to take almost a whole day, now takes only a couple hours to import. And that's over USB. Also fixed several other minor bugs/edge cases. This is a WIP. Some more cleanup and fixes are coming. For example, my solution to fix the Google Photos import bug is currently hard-coded (it happens to work for everything else so far, but is not a good general solution). So I need to implement a general fix for that before this is ready to merge. * Round out a few corners; fix some bugs * Appease linter * Try to fix linter again * See if this works * Try again * See what actually fixed it * See if allow list is necessary for replace in go.mod * Ok fine just move it into place * Refine retrieval keys a bit * One more test	2025-09-02 11:18:39 -06:00
Matthew Holt	b365dbbafc	Fix panics with obfuscation	2025-07-09 13:30:50 -06:00
Matthew Holt	336ff7fae0	Fix new lint warnings Must have been a change in golangci-lint	2025-07-01 15:41:07 -06:00
Matt Holt	def05a6cfa	Revise location processing and improve place entities (#101) * Revise location processing and place entities - New, more dynamic, recursive clustering algorithm - Place entities are globally unique by name - Higher spatial tolerance for coordinate attributes if entity name is the same (i.e. don't insert new attribute row for coordinate if it's sort of close to another row for that attribute -- but if name is different, then points have to be closer to not insert new attribute row) There is still a bug where clustering is too aggressive on some data. Looking into it... * Fix overly aggressive clustering (...lots of commits that fixed the CI environment which changed things without warning...)	2025-06-17 16:13:44 -06:00
JP Hastings-Edrei	27a2f462cf	lint: bump golangci-lint version (#92) * lint: bump golangci-lint version - Bumps the version of golangci-lint that's used in the GitHub Action to be the most recent version (as installed with e.g. `brew install golangci-lint`, v2.1.6) - Migrates the `.golangci.toml` file, and manually moves the comments over - `errchkjson` appears to work now, so added that back into the linter (the `forbidigo` and `goheader` linters I've left commented out) * lint: remove checkers we don't like Removes two static checkers that cause code changes we don't like. * lint: remove old lint declaration apparently `gosimple` isn't available any more, so I've removed its `nolint` declaration here. * lint: swap location of `nolint:goconst` This _seems_ to be an unstable declaration, because of the parallel & nondeterministic nature of the linter. If this keeps causing trouble we can either remove the goconst linter, or change _both_ of these lines to hold `//nolint:goconst,nolintlint`.	2025-06-02 15:03:19 -06:00
Matthew Holt	9caa54dce9	Ability to filter by existence of relation (or lack thereof) This is useful on the gallery page where we do NOT want to show motion pictures. We will also need to block motion pictures from being displayed as separate items on other UI views when they do show non-root items.	2025-05-16 17:35:54 -06:00
Matthew Holt	914f24f6a6	Move calls to pythonServerReady	2025-04-28 21:08:16 -06:00
Matthew Holt	f0697d2d6b	Refactor embedding jobs; enhance tooltips; upgrade gofakeit to v7 The gofakeit upgrade uses the new math/rand/v2 package, which uses uint64 more than int64, so we had to change a bunch of row IDs from int64 to uint64.	2025-04-24 16:33:41 -06:00
Matthew Holt	b4d4f4b88e	Variety of minor UI enhancements	2025-04-13 22:51:30 -06:00
Matthew Holt	6d231fd0c2	Improved embeddings with SigLIP2; fix semantic search bug Still lots of room for improvement here, but I see way better results already.	2025-04-13 15:38:14 -06:00
Matthew Holt	a4ec710f9f	Various fixes and improvements (imessage, duplicate rows, etc)	2025-01-21 10:16:38 -07:00
Matthew Holt	bb9151628f	WIP: new entity page; delete almost all DB indexes Imports are now 4-5x faster and queries are still just about as fast. New indexes should only be created after proving their usefulness.	2025-01-07 13:42:05 -07:00
Matthew Holt	5ec783ecb8	Coupla fixes; also file picker now remembers last path	2025-01-02 20:33:50 -07:00
Matthew Holt	5816c571c6	WIP settings: save is starting to function	2024-12-31 20:58:07 -07:00
Matthew Holt	cbaa39b1b9	Implement proper ANALYZE maintenance	2024-12-13 20:55:22 -07:00
Matthew Holt	22628833a7	Refactor obfuscation mode and some processing logic	2024-12-13 07:19:27 -07:00
Matthew Holt	5844c5755b	Fix most (all?) lint warnings	2024-12-11 18:59:24 -07:00
Matthew Holt	53ca6063ab	Several fixes, performance improvements	2024-12-07 12:36:42 -07:00
Matt Holt	746e5d6b5c	Refactored import flow, new import UI, thumbnails stored in timeline, etc. (close #3) (#43) * Schema revisions for new import flow and thumbnails * WIP settings * WIP quick schema fix * gallery: Image search using ML embeddings Still very rough around the edges, but basically works. 'uv' gets auto-installed, but currently requires restarting Timelinize before it can be used. Lots of tunings and optimizations are needed. There is much room for improvement. Still migrating from imports -> jobs, so that part of the code and schema is still a mess. * Implement search for similar items * Finish import/planning rewrite; it compiles and tests pass * Fix some bugs, probably introduce other bugs * WIP new import planning page * Fix Google Photos and Twitter recognition * Finish most of import page UI; start button still WIP * WIP: Start Import button * Fixes to jobs, thumbnail job, import job, etc. * Implement proper checkpointing support; jobs fixes	2024-12-06 11:03:29 -07:00
Matthew Holt	3066ddbeb9	Major linting overhaul I've addressed most of the "fast" linters errors locally in my editor. Some linters are broken or buggy.	2024-08-29 16:43:52 -06:00
Matthew Holt	1daf6f4157	Initial open source commit	2024-08-11 08:02:27 -06:00

26 commits
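
The time zone reasoning in commit 8218ab0a39 (a formatted timestamp that already carries its offset needs no extra time_offset once converted to a Unix timestamp) can be illustrated with a minimal sketch. This is not the project's code: it uses Python rather than the Go/JS of the repository, and the names `to_unix_millis` and `EPOCH` are hypothetical, chosen only for this example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper for illustration only; not taken from the timelinize code base.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_millis(formatted: str) -> int:
    """Convert an ISO 8601 timestamp with an explicit UTC offset to Unix milliseconds."""
    parsed = datetime.fromisoformat(formatted)
    if parsed.tzinfo is None:
        raise ValueError("expected a timestamp with an explicit UTC offset")
    # Subtracting two aware datetimes compares instants, so the offset in the
    # formatted string is already accounted for; nothing more needs to be added.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


# The same instant written with a UTC+1 offset and in UTC.
local = to_unix_millis("2025-11-03T00:00:00+01:00")
utc = to_unix_millis("2025-11-02T23:00:00+00:00")

# Both spellings map to one Unix timestamp.
same_instant = local == utc

# Adding a 3600-second offset on top would shift the result by an hour,
# which is the kind of mismatch the commit message describes.
shifted = local + 3600 * 1000
```

Because the offset is part of the parsed value, `local` and `utc` are equal, while `shifted` points one hour later than the actual item.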