base-data-manager/test/fixtures/discord-json-2021-01.md

discord-json-2021-01

Manual edits

  • images -> placeholders
    • account/avatar.png
  • manually scrub folder names
    • account/applications/0000000000000

Notes about files

  • activity/
    • All the .json files are NDJSON (one JSON object per line), so some JSON tools choke on them
    • Massive files. They hang scrub.ts for a very long time (I had to run these piecemeal)
    • These files also have an incredible number of shapes and a lot of variance.
      • Instead of outputting all the shapes, I made a sort of "super-object" (a shallow merge of every record's keys) to capture the shape with jq -n '[inputs] | add' events-2021-00000-of-00001.json.tmp > unique_shape.json and then scrubbed unique_shape.json
  • messages/
    • I scrubbed these by hand to keep all the ids the same
    • There are multiple types of chats: DMs, guild channels, etc.
    • I did the csvs by hand as I have no scrubber for them
    • These are only THE EXPORTING USER'S MESSAGES, no other user's, just fyi
    • Ids in messages.csv are just the id of the message, not of any user
    • It may be possible to derive missing info about a channel from @ tags sent or possibly from attachments. Maybe...
    • 11111111111111111
      • This one has a shorter id (it's an older one)
      • Has type: 0 but there's no guild information in channel.json
      • The user name was null in index.json
      • It's a really odd one
    • 222222222222222222
      • This was a DM channel (index.json said "direct message with xxx#7777")
      • Has type: 1 and there are two recipients (just the ids) in channel.json
      • Unfortunately that's all the info in the export
    • 333333333333333333
      • This was a normal guild channel
      • type: 0 and there's guild information in channel.json
      • I kept a good set of messages around from this one to show how attachments and other stuff work
      • The last message seemed to be a link, not an attachment. Links just seem to be normal text
  • programs/
    • was empty...
  • servers/
    • Info about some of the guilds we have ids for
    • guild.json didn't really contain anything except the name
    • I kept the only guild whose audit-log.json I noticed had any info in it
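
For the activity/ notes above, here is a rough TypeScript sketch of the same "super-object" idea as the jq one-liner, in case it is handy to do inside scrub.ts. It is only an illustration: parseNdjson and superObject are hypothetical helper names, and the sample records and their keys are made up. Like jq's add on an array of objects, this is a shallow merge, so later records overwrite earlier ones on key collisions and nested shapes are not combined.

```typescript
// Hypothetical helper: parse NDJSON text (one JSON object per non-empty line).
function parseNdjson(text: string): Record<string, unknown>[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Hypothetical helper: shallow-merge every record into one object.
// Later records win when two records share a key.
function superObject(
  records: Record<string, unknown>[],
): Record<string, unknown> {
  return Object.assign({}, ...records);
}

// Made-up sample records; real activity events have many more keys.
const sampleNdjson = [
  '{"kind":"first","os":"linux"}',
  '{"kind":"second","channel":"0"}',
].join("\n");

const uniqueShape = superObject(parseNdjson(sampleNdjson));
console.log(Object.keys(uniqueShape)); // every key seen across the records
```

Because the merge is shallow, a key that holds different nested objects in different records only keeps the last one, so the result is a lower bound on the real variety of shapes.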
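
For the three messages/ channels described above, the distinction could be sketched like this in TypeScript. This is a guess at a minimal shape, not the real schema: only type is named in these notes, the field names guild and recipients are assumptions, and classifyChannel and the kind labels are hypothetical.

```typescript
// Assumed shape of messages/<channel id>/channel.json.
// Only `type` is named in the notes; `guild` and `recipients` are assumed names.
interface ChannelJson {
  id: string;
  type: number;
  guild?: { id: string; name?: string };
  recipients?: string[];
}

// Hypothetical labels for the three cases seen in this fixture.
type ChannelKind = "dm" | "guild" | "guild-without-info" | "unknown";

function classifyChannel(ch: ChannelJson): ChannelKind {
  if (ch.type === 1) return "dm"; // like 222222222222222222
  if (ch.type === 0) {
    // like 333333333333333333 (has guild info) or 11111111111111111 (does not)
    return ch.guild ? "guild" : "guild-without-info";
  }
  return "unknown"; // other chat types are not covered by this fixture
}

console.log(classifyChannel({ id: "11111111111111111", type: 0 }));
```

Keeping "guild-without-info" as its own case makes the odd older channel visible instead of silently treating it as a normal guild channel.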