Investigate whether URLs in the file table could be hashed #4
Changing this would save a bit of storage space. It would store a 64-bit integer (8 bytes storage + serial types), instead of a full URL (80-120 bytes storage + serial types).
When I did this in the reactions table, I analysed whether a birthday attack could happen. I should review those conclusions again for the file table. Tolerances could be sloppier on the reactions table. The file table may need a larger safety margin, since it would be confusing if the wrong file appeared.
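As a rough starting point for that review, the standard birthday approximation can be evaluated for a 64-bit hash. This is only a sketch: the function name is made up for illustration, and it assumes the hash output is uniformly distributed, which covers accidental collisions but not deliberately crafted ones.

```python
import math

def birthday_collision_probability(n_items: int, hash_bits: int = 64) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Uses p ~= 1 - exp(-n(n-1) / (2 * 2**bits)), which assumes a uniformly
    distributed hash and non-adversarial inputs.
    """
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2 ** hash_bits)

if __name__ == "__main__":
    # 2842 is the current row count; the larger values are hypothetical growth.
    for n in (2842, 1_000_000, 100_000_000):
        print(f"{n:>11} files -> p = {birthday_collision_probability(n):.3e}")
```

Under these assumptions the current table size gives a probability on the order of 1e-13, so accidental collisions look negligible at this scale.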
xxhash is not cryptographic. Specially crafted file names on the Discord side might be able to force an xxhash collision and cause the wrong file to be used.
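One option worth weighing here, which is not something I have decided on, is a keyed cryptographic hash truncated to 64 bits, so that someone who does not know the key cannot precompute colliding names. The sketch below uses BLAKE2b from the Python standard library as a stand-in for xxhash. The function name, the key, and the example URL are all hypothetical. It also converts the result to a signed value, because SQLite integers are signed 64-bit.

```python
import hashlib
import struct

def url_to_sqlite_int(url: str, key: bytes) -> int:
    """Map a URL to a signed 64-bit integer suitable for an INTEGER column.

    Assumption: BLAKE2b with a secret key replaces xxhash here. The key
    must stay the same for the lifetime of the table, or lookups break.
    """
    digest = hashlib.blake2b(url.encode("utf-8"), digest_size=8, key=key).digest()
    # Interpret the 8 bytes as a big-endian signed integer.
    (signed,) = struct.unpack(">q", digest)
    return signed

if __name__ == "__main__":
    secret = b"hypothetical-bridge-secret"  # placeholder, not a real key
    example = "https://cdn.discordapp.com/attachments/1/2/example.png"  # made-up URL
    print(url_to_sqlite_int(example, secret))
```

Truncating to 64 bits keeps the same birthday bound as before. The key only makes targeted collisions harder to craft from the outside.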
Stats:
My bridge has been running for around 60 days with 2842 registered files. select sum(length(discord_url)) from file gives 270091 characters stored in URLs.
If I used a 64-bit integer instead, it would store 22736 bytes for all URLs. That's about a 90% reduction. Pretty neat!
The current approach costs about 125 kB of excess storage per month. That's not terrible.
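The arithmetic behind these figures can be reproduced with a few lines. This is just a check of the numbers above. It counts URL characters as bytes, ignores SQLite serial type and page overhead, and takes a month to be 30 days.

```python
# Figures taken from the stats above.
DAYS_RUNNING = 60
FILE_ROWS = 2842
URL_BYTES = 270091          # sum(length(discord_url)), assuming 1 byte per character
BYTES_PER_HASH = 8          # one 64-bit integer per row

def storage_summary(days: int, rows: int, url_bytes: int) -> dict:
    """Compare storing full URLs against storing one 64-bit hash per row."""
    hashed_bytes = rows * BYTES_PER_HASH
    saved = url_bytes - hashed_bytes
    return {
        "hashed_bytes": hashed_bytes,
        "reduction": saved / url_bytes,
        "excess_bytes_per_30_days": saved / days * 30,
    }

if __name__ == "__main__":
    for name, value in storage_summary(DAYS_RUNNING, FILE_ROWS, URL_BYTES).items():
        print(name, value)
```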
Investigation:
This would remove the full URLs from the table, and I wouldn't be able to get them out again. If I make the switch, it will be impossible to look up a Discord URL by an MXC URL. I need to investigate whether I'd ever need to do that.
In the past, it has been useful to compare the URLs in the file table against the IDs in the emoji table. If I make the switch, this will also become impossible. I should investigate this too and see whether that is OK.