3 commits

Author SHA1 Message Date
Matthew Holt
962369382a
Hopefully appease new linter
2025-11-04 16:27:09 -07:00
JP Hastings-Edrei
855a0a702b
whatsapp: Fix tests & metadata keys (#88)
Well, this is embarrassing: I forgot to actually test the metadata, _and_ the keys emitted weren't correct!
2025-05-07 11:22:41 -06:00
JP Hastings-Edrei
2407333482
Add WhatsApp importer (#79)
* Add WhatsApp importer

A first pass at importing WhatsApp chat exports.

Some open questions:
- Do we want to import context messages ("you deleted this message")?
- In WhatsApp it's possible to have groups with the same participants but different group names. Is it possible to tag a conversation with a "group name" in Timelinize? If not, messages from these different conversations may end up interleaved.
- Is it safe to assume the current location for timezone analysis on import? WhatsApp exports use timezoneless timestamps, which (I've confirmed manually) are just "what the time would have been where you are now" (for me, messages sent in summer are in BST, and those sent in winter are in GMT).

Annoying quirks of the export format we should find good ways to communicate to users:
- Any caption text sent with an attachment isn't exported by WhatsApp. (The text is lost and unavailable to Timelinize; I've opened a bug with Meta, for all the good that'll do.)
- If there are silent members of a group chat, their presence isn't recorded in the data WhatsApp exports

Todo:
- I _think_ it's safe to assume there's only ever one attachment per message; if so, that would change and simplify the way I parse attachment lines. I'll keep exploring my own exports to see whether this is reasonable.
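
If the one-attachment-per-message assumption holds, attachment parsing collapses to a single match per message body. The Go sketch below assumes an iOS-style `<attached: FILENAME>` marker and an "... omitted" placeholder (possibly preceded by a left-to-right mark) for media that wasn't on the exporting device. Both formats and the `parseAttachment` helper are assumptions, not the importer's actual code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// attachedRE captures exactly one filename, reflecting the assumption that a
// message carries at most one attachment. The marker format is assumed.
var attachedRE = regexp.MustCompile(`<attached: ([^>]+)>`)

// parseAttachment (hypothetical) returns the attachment filename in a message
// body, if any, and whether the body looks like an omitted attachment.
func parseAttachment(body string) (filename string, omitted bool) {
	if m := attachedRE.FindStringSubmatch(body); m != nil {
		return m[1], false
	}
	// Strip surrounding spaces, tabs, and U+200E left-to-right marks before
	// checking for the assumed "image omitted"-style placeholder.
	trimmed := strings.Trim(body, " \t\u200e")
	return "", strings.HasSuffix(trimmed, " omitted")
}

func main() {
	for _, body := range []string{
		"<attached: 00000012-PHOTO-2025-01-01-12-00-00.jpg>",
		"\u200eimage omitted",
		"See you at 8!",
	} {
		name, omitted := parseAttachment(body)
		fmt.Printf("%q -> file=%q omitted=%v\n", body, name, omitted)
	}
}
```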

* Include polls & locations in tests

Polls are currently ignored, but I'll move them to being imported as a message, or as some special datatype, after discussion.

* Add text formatting examples, and show they're not processed

* Fix lint issues

* WhatsApp: Add Retrieval keys to messages

The key on the message isn't perfect: it will change if the person exporting their chat history renames one of the participants between exports, because that participant's name (and therefore the retrieval key) would then differ from one export to the next.

This seems as close as we can get without exported IDs, though.

(I can't find a good way to test that the retrieval key is set properly)

* WhatsApp: Polls, Locations, Metadata

- Correctly parses attachments (even those omitted because they weren't available on the device that performed the export)
- Parses Polls (only in English, for now), including adding metadata for the Poll
- Extracts location metadata (Foursquare ID for named locations, or Lat/Long)
- Adds more test data to demonstrate other kinds of messages included in exports
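
Location extraction can be sketched as URL parsing in Go. This sketch assumes dropped pins appear as `maps.google.com/?q=LAT,LONG` links and named places as `foursquare.com/v/.../ID` links. Those URL shapes, the `Pin` type, and `parsePinURL` are assumptions, and `EXAMPLE-VENUE-ID` is a placeholder:

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// Pin (hypothetical) holds either a Foursquare venue ID or raw coordinates.
type Pin struct {
	FoursquareID string
	Lat, Long    float64
	HasCoords    bool
}

// parsePinURL (hypothetical) extracts pin data from an assumed location URL.
func parsePinURL(raw string) (Pin, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return Pin{}, err
	}
	switch {
	case u.Host == "foursquare.com" || strings.HasSuffix(u.Host, ".foursquare.com"):
		// Use the last path segment as the venue ID, e.g. /v/SOME-ID -> SOME-ID.
		parts := strings.Split(strings.Trim(u.Path, "/"), "/")
		return Pin{FoursquareID: parts[len(parts)-1]}, nil
	case u.Host == "maps.google.com":
		// Split the "q" query parameter into latitude and longitude.
		latStr, longStr, ok := strings.Cut(u.Query().Get("q"), ",")
		if !ok {
			return Pin{}, errors.New("no coordinates in q parameter")
		}
		lat, err := strconv.ParseFloat(latStr, 64)
		if err != nil {
			return Pin{}, err
		}
		long, err := strconv.ParseFloat(longStr, 64)
		if err != nil {
			return Pin{}, err
		}
		return Pin{Lat: lat, Long: long, HasCoords: true}, nil
	}
	return Pin{}, fmt.Errorf("unrecognised location host %q", u.Host)
}

func main() {
	for _, raw := range []string{
		"https://maps.google.com/?q=51.5074,-0.1278",
		"https://foursquare.com/v/EXAMPLE-VENUE-ID",
	} {
		pin, err := parsePinURL(raw)
		fmt.Printf("%s -> %+v (err: %v)\n", raw, pin, err)
	}
}
```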

* WhatsApp: Handle other locales

- 🤦‍♂️ The timestamp format changes based on the locale of the device performing the export, which makes it impossible to reliably distinguish DD/MM/YYYY dates from MM/DD/YYYY dates. This parser assumes a DD/MM/YYYY date if the last group of digits is 4 long. Perhaps we need an import option for "I'm using American dates"?
- Swaps the Poll scraping structure to allow for the localised words used when the exporting phone is set to other locales (e.g. OPCIÓN instead of OPTION)
- Added a chat line test fixture to illustrate this (though normally the entire file would only ever be in a single locale)

* WhatsApp: Correct Poll Structure & fix parsing

I had incorrect POLL lines in the test fixtures; this commit fixes them, and the importer so it can read them properly.
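
Locale-aware poll parsing amounts to accepting any of several prefixes. The Go sketch below assumes option lines look like `OPTION: Pizza (2 votes)`. Only OPTION and OPCIÓN come from the notes above. The vote-count suffix, the Spanish sample line, and `parseOptionLine` are assumptions:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// optionKeywords lists the (assumed) per-locale prefix for a poll option line.
var optionKeywords = []string{"OPTION", "OPCIÓN"}

// votesRE splits "Pizza (2 votes)" into text and count, ignoring the possibly
// localised word for "votes".
var votesRE = regexp.MustCompile(`^(.*) \((\d+) [^)]*\)$`)

// PollOption is a hypothetical representation of one parsed option.
type PollOption struct {
	Text  string
	Votes int
}

// parseOptionLine recognises an option line in any of the known locales.
func parseOptionLine(line string) (PollOption, bool) {
	for _, kw := range optionKeywords {
		prefix := kw + ": "
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		rest := strings.TrimPrefix(line, prefix)
		if m := votesRE.FindStringSubmatch(rest); m != nil {
			n, _ := strconv.Atoi(m[2]) // m[2] is all digits by construction
			return PollOption{Text: m[1], Votes: n}, true
		}
		return PollOption{Text: rest}, true
	}
	return PollOption{}, false
}

func main() {
	for _, line := range []string{"OPTION: Pizza (2 votes)", "OPCIÓN: Tacos (1 voto)", "Just a normal message"} {
		opt, ok := parseOptionLine(line)
		fmt.Printf("%q -> %+v ok=%v\n", line, opt, ok)
	}
}
```

Supporting a new locale then only means adding its keyword to the list.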

* Use snake case for datasource name

Co-authored-by: Matt Holt <mholt@users.noreply.github.com>

* WhatsApp: Be cautious with matching

Be slightly less confident with matching `_chat.txt` files as WhatsApp exports!

* WhatsApp: Fix lint errors

Fix magic number linting errors

* WhatsApp: swap metadata namespaces

Switch to using "Pin" instead of "Location" to more accurately describe what's being tagged with the metadata.

---------

Co-authored-by: Matt Holt <mholt@users.noreply.github.com>
2025-05-07 09:21:39 -06:00