The refactored processor had a bug where small binary files (e.g. images under 100 KB) would be buffered entirely while peeking and would never end up being saved as a file. Fixed the logic around that and simplified it a bit too.
- iCloud logo was just a rasterized SVG, derp. Now it's a true SVG. Much smaller.
- Contact lists and vCards are pretty slow to import because they download profile pictures, so those now run last.
The rendering seems inconsistent. Refreshing the page or loading the results again fixes the color mismatches. I can't explain why they vary like this, other than potential Mapbox bugs?
This does away with the experimental generation of thumbhashes during import. It's easier to generate the thumbnails and thumbhashes at the same time.
Does add a DB lock to phase1, but at this point the DB isn't the bottleneck in that phase.
We now import check-ins and tagged places, so, mainly location-related data.
Some of the concepts still need a good mapping to the project, e.g. whether to put the coordinates on the item (i.e. "I was here") or on the entity (i.e. "this place is here").
The code that imported post attachments is some of the oldest in the codebase, about 10 years old. Back then, an item could have both text and file contents, and we just combined all the various data for an attachment into one related item. Now, we properly treat them as separate items.
Still need to add posts in groups and FB stories.
Some directories, if they had an implicit structure in the archive, would be skipped on walks after the first walk. Oops.
This should actually get all your data now.
This will be a long-time WIP, but we now support full timestamps with local time offsets, absolute ones with UTC times only, and wall times only.
Several other fixes/enhancements. Making an effort to display time zone in time displays throughout the app.
Can now try to infer time zones during import, which is the default setting.
This will take a while to fully implement but it's a good start. Just have to be really careful about date crafting/manipulation/parsing.
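To make the three timestamp flavors above concrete, here is a rough sketch, not the project's actual types. `TimeKind`, `Timestamp`, `classify`, and the rule that a trailing `Z` means "UTC only" are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// TimeKind says how much we know about a timestamp (hypothetical type).
type TimeKind int

const (
	OffsetTime   TimeKind = iota // exact instant plus the local UTC offset
	AbsoluteTime                 // exact instant, known in UTC only
	WallTime                     // clock reading only; the zone is unknown
)

// Timestamp pairs the parsed value with what kind of time it is.
type Timestamp struct {
	Kind TimeKind
	T    time.Time
}

// wallLayout is a date and time with no zone information at all.
const wallLayout = "2006-01-02T15:04:05"

// classify parses s and reports which flavor of timestamp it is.
// Assumption: a trailing "Z" means the source recorded UTC only.
func classify(s string) (Timestamp, error) {
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		if strings.HasSuffix(s, "Z") {
			return Timestamp{Kind: AbsoluteTime, T: t}, nil
		}
		return Timestamp{Kind: OffsetTime, T: t}, nil
	}
	// No zone in the input: time.Parse returns the reading labeled as UTC,
	// so Kind is what tells callers not to treat it as a real instant.
	t, err := time.Parse(wallLayout, s)
	if err != nil {
		return Timestamp{}, fmt.Errorf("unrecognized timestamp %q: %w", s, err)
	}
	return Timestamp{Kind: WallTime, T: t}, nil
}

func main() {
	inputs := []string{
		"2024-05-01T09:30:00-06:00",
		"2024-05-01T15:30:00Z",
		"2024-05-01T09:30:00",
	}
	for _, in := range inputs {
		ts, err := classify(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(in, ts.Kind)
	}
}
```

The careful part is the wall-time case: Go has no zoneless time type, so the value gets stored as if it were UTC, and anything that compares or converts it needs to check `Kind` first.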
I don't love that the type has to be stored in the table... it would be great if we could infer it, but I don't know how that would work for strings that look like another type.
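A small sketch of why inference falls over, assuming values are stored as text next to a type tag. The `row` struct and the `encode`/`decode` helpers are hypothetical, not the real schema. The string "42" and the integer 42 serialize to the same text, so only the stored type tells them apart.

```go
package main

import (
	"fmt"
	"strconv"
)

// row mimics a table row holding a value as text plus its type tag.
type row struct {
	Type  string
	Value string
}

// encode turns a Go value into its stored form.
func encode(v any) (row, error) {
	switch x := v.(type) {
	case string:
		return row{Type: "string", Value: x}, nil
	case int:
		return row{Type: "int", Value: strconv.Itoa(x)}, nil
	case bool:
		return row{Type: "bool", Value: strconv.FormatBool(x)}, nil
	}
	return row{}, fmt.Errorf("unsupported type %T", v)
}

// decode restores the original Go value using the stored type tag.
func decode(r row) (any, error) {
	switch r.Type {
	case "string":
		return r.Value, nil
	case "int":
		n, err := strconv.Atoi(r.Value)
		if err != nil {
			return nil, err
		}
		return n, nil
	case "bool":
		b, err := strconv.ParseBool(r.Value)
		if err != nil {
			return nil, err
		}
		return b, nil
	}
	return nil, fmt.Errorf("unknown type %q", r.Type)
}

func main() {
	asString, _ := encode("42")
	asInt, _ := encode(42)

	// Same stored text, different tags: inference alone cannot recover this.
	fmt.Println(asString.Value == asInt.Value, asString.Type, asInt.Type)

	v, err := decode(asString)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", v)
}
```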
Fixed several bugs introduced by the pipeline refactoring.
Updated goexif2 fork to use my latest commit, which fixes not being able to find EXIF data in some JPEG images.
Embeddings now refer to the item they are for, rather than an item referring to a single embedding. This allows items to have multiple embeddings if necessary, which gives us some flexibility when models change/improve, etc.
Also reworked the Python server to use a smaller model (base siglip2 instead of so400m) so that it fits on more GPUs, including my 4070. It also has a new "DeviceManager", which ChatGPT helped me figure out, that chooses the GPU when it has enough memory for the model, as conditions change.
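For the embedding change above, the relationship could be sketched like this. The `Embedding` struct, its fields, and `embeddingsFor` are assumptions for illustration, not the real schema. Since each embedding points at its item, one item can carry vectors from several models side by side.

```go
package main

import "fmt"

// Embedding points at the item it describes (hypothetical shape).
type Embedding struct {
	ID     int64
	ItemID int64  // the item this vector was computed for
	Model  string // which model produced it; labels here are illustrative
	Vector []float32
}

// embeddingsFor returns every embedding that refers to the given item.
func embeddingsFor(itemID int64, all []Embedding) []Embedding {
	var out []Embedding
	for _, e := range all {
		if e.ItemID == itemID {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	all := []Embedding{
		{ID: 1, ItemID: 10, Model: "so400m", Vector: []float32{0.1, 0.2}},
		{ID: 2, ItemID: 10, Model: "siglip2-base", Vector: []float32{0.3, 0.4}},
		{ID: 3, ItemID: 11, Model: "siglip2-base", Vector: []float32{0.5, 0.6}},
	}

	// Item 10 keeps its old vector while the new model's vector is added.
	for _, e := range embeddingsFor(10, all) {
		fmt.Println(e.ID, e.Model)
	}
}
```

With the reference pointing this way, switching models is an additive change: new rows get written, and old ones can be dropped whenever it's convenient.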