Hacker News — vinext + Cloudflare Workers

▲Designing DB partitions you don't have to babysit (explainanalyze.com)

57 points by rtolkachev 3 days ago | 8 comments

yasaheblasa 5 hours ago [-]

This topic interests me a lot, but there's a lot I've found surprising since I finally started working with Postgres-dependent apps. Why, for example, is the id a good primary key? Joins are not uncommon, but nobody searches on id in my application, and it isn't even supposed to be user-visible. I would think every possible user search would have to look at every partition's index if I did this instead of using the creation date.

jagged-chisel 2 hours ago [-]

Allow me to clarify a bit more.

You “search” for a record (perhaps based on username [and then you verify the password hash, but I digress]) and now you know the ID. Carry the ID in a session (the app doesn’t display this) and you can modify this specific user’s record. This is especially useful if you allow editing fields on which searches happen. Want to change a username? Great, you can.

If the username is your primary key and you allow users to change it, there are edge cases and nuances and headaches … just use something immutable to identify records: an autoincremented int, a UUID, etc.
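To make the point above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `orders` tables and their columns are hypothetical, made up for illustration. It looks the user up by username once, carries the id, and then renames the user without touching any referencing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,   -- immutable surrogate key
        username TEXT NOT NULL UNIQUE   -- searchable, and free to change
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))

# "Search" by username once; from here on only the id is carried around.
(user_id,) = conn.execute(
    "SELECT id FROM users WHERE username = ?", ("alice",)
).fetchone()
conn.execute(
    "INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, "widget")
)

# Renaming the user touches one row; orders still point at the same id.
conn.execute("UPDATE users SET username = ? WHERE id = ?", ("alice2", user_id))

rows = conn.execute(
    "SELECT u.username, o.item FROM orders o JOIN users u ON u.id = o.user_id"
).fetchall()
print(rows)
```

The UNIQUE constraint on `username` is the "additional index" you search on; the primary key only has to identify the row.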

yasaheblasa 4 minutes ago [-]

I'll certainly mull this over. I think the point of including the creation time in the primary key was to eliminate lookups in old partitions, i.e. if an active order can be at most 90 days old, it can only span two year-based partitions. A cache of which id is 91 days old would seem similar, but it leaks even more partition detail into app queries than saying 90 days directly. Similarly, I don't cache ids in any sort of session; an API call reusing a cache is another kind of layer violation I don't want to get into.
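The 90-day arithmetic can be sketched as follows. This only models which year-based partitions a creation-date window overlaps; it is not how Postgres implements pruning, and the `orders_YYYY` partition names are hypothetical.

```python
from datetime import date, timedelta


def year_partitions(start: date, end: date) -> list[str]:
    """Names of the year-based partitions that overlap [start, end]."""
    return [f"orders_{year}" for year in range(start.year, end.year + 1)]


def partitions_for_active_orders(today: date, max_age_days: int = 90) -> list[str]:
    """Partitions that can hold an order created within the last max_age_days."""
    oldest = today - timedelta(days=max_age_days)
    return year_partitions(oldest, today)


# Early in the year the window reaches back into the previous year's partition.
early = partitions_for_active_orders(date(2024, 2, 10))
# Later in the year a single partition covers the whole window.
late = partitions_for_active_orders(date(2024, 7, 1))
print(early, late)
```

A window shorter than a year never touches more than two of these partitions, which is the bound described above.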

jagged-chisel 4 hours ago [-]

Your primary key is there to uniquely identify a record. You need additional indexes on fields you will search on.

onchainbuilder 16 minutes ago [-]

[flagged]

djfobbz 6 hours ago [-]

Or you could just warehouse the daily data into something like ClickHouse and start fresh every day. It's built for this kind of workload and has demonstrated some absolutely insane analytical performance at massive scale. We're currently running it on a $170/month VPS, querying over 500 billion rows daily without any issues. At that point, partitioning an ever-growing OLTP table starts looking like the harder problem.

mamcx 4 hours ago [-]

This and the other comment about kdb+ misread the article. The article AND ANY use of a columnar DB show the same issue: point lookups are badly pessimized by partitioning on other columns, by columnar storage, or by both.

That is why columnar DBs, no matter how impressive, are not used for OLTP workloads (and the article's point is that you can undo the advantages of partitioning if you don't pay attention to its consequences).

djfobbz 3 hours ago [-]

[flagged]

piterrro 6 hours ago [-]

For anyone interested in the topic, I suggest reading about Snowflake ID [https://en.wikipedia.org/wiki/Snowflake_ID] or UUIDv7; the patterns from the article translate cleanly. A bigint is 64 bits where a UUID is 128. There are other caveats, but it's all about tradeoffs.
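A rough sketch of why these ids fit the article's patterns, assuming the commonly described Snowflake layout of a 41-bit millisecond timestamp, a 10-bit machine id, and a 12-bit sequence. The epoch value and function names here are illustrative assumptions, not any particular library's API.

```python
EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in Unix milliseconds
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def make_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one integer that fits a signed 64-bit bigint."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range for this epoch")
    if not 0 <= machine_id < (1 << MACHINE_BITS):
        raise ValueError("machine_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (elapsed << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def timestamp_of(snowflake: int) -> int:
    """Recover the creation time (Unix ms) from the id alone."""
    return (snowflake >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS


older = make_id(1_700_000_000_000, machine_id=1, sequence=0)
newer = make_id(1_700_000_000_001, machine_id=0, sequence=0)
print(older < newer, timestamp_of(older))
```

Because the timestamp sits in the high bits, ids sort by creation time and the creation time can be read back out of the id, so an id range maps onto a time range.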

leprechaun1066 6 hours ago [-]

Or just use kdb+ and 1bn rows a day is par for the course.
