this post was submitted on 02 May 2025
Did this because of the convincing replies in this thread. We're migrating to modern hardware and switching from SQL Server to PostgreSQL (because it's already used by the other system we work with, and there is know-how available in this domain).
But how can we then ensure that we aren't adding/processing products which are already in the "final" table, when I have no knowledge of ALL the products in that table?
This is helpful and matches what I experienced. At the peak of the overload the CPU load was pretty much zero - all the time was spent on disk reads/writes, which was caused by our poor query design/architecture.
Could you elaborate on what you mean by read replicas? Storing data in memory?
Yes, I'll swallow the pill. Thanks to the replies here I have many starting points.
RTFM is nice - but starting at page 0 is overwhelming.
Without knowing your schema, I can't answer this precisely. However, the database doesn't need to scan all rows in a table to check whether a value exists if you build an index on the relevant columns. If your products have some unique ID (or tuple of columns), you can usually build an index on those values, which means the DB maintains what is basically a lookup table for the indexed columns.
Without going into too much detail, you can think of an index as a way for a DB to make a "contains" (or "retrieve") operation drop from O(n) (check all rows) to something much faster, such as O(log n). The tradeoff is that the index takes extra space.
This comes with the added benefit that uniqueness constraints can be easily enforced on indexed columns if needed. And yes, your PK is indexed by default.
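To make that concrete, here's a minimal sketch of a unique index doing both jobs - fast existence checks and duplicate rejection. It uses SQLite for a self-contained demo, but the concept is the same in PostgreSQL; the table and column names (`products`, `sku`) are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT)")

# A unique index lets the DB check membership without scanning every row,
# and enforces that the same product can't be inserted twice.
conn.execute("CREATE UNIQUE INDEX idx_products_sku ON products (sku)")

conn.execute("INSERT INTO products VALUES ('A-1', 'widget')")

# Skip rows whose key already exists; in PostgreSQL the equivalent is
# INSERT ... ON CONFLICT (sku) DO NOTHING.
conn.execute("INSERT OR IGNORE INTO products VALUES ('A-1', 'widget again')")

count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count)  # the duplicate was ignored, so only one row exists
```

So for your question above: index the product ID in the "final" table, and let the database reject (or silently skip) anything already present, rather than trying to track the full set of products yourself.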
Read more about indexes in Postgres's docs. It actually has pretty readable documentation in my experience. Or read a book on indexes, watch a video, etc. The concept is universal.
This highly depends on your needs. I'll link PG's docs on replication though.
If you're migrating right now, I wouldn't think about this too much. Replicas are basically duplicates of your database hosted on different servers (ideally in different data centers, or even different regions if possible). Replicas work together to stay in sync, and depending on the kind of replica and the kind of query, any replica may be able to handle an incoming query (rather than everything going through a single central database).
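For a rough idea of what this looks like in PostgreSQL specifically, streaming replication starts with a couple of settings on the primary (a sketch based on PG's docs; your actual values will differ):

```ini
# postgresql.conf on the primary
wal_level = replica        # ship enough WAL for a standby to replay
max_wal_senders = 5        # connection slots reserved for replicas/backups
```

After allowing the standby's replication user in the primary's pg_hba.conf, the standby is typically initialized with `pg_basebackup -R`, which writes `standby.signal` and `primary_conninfo` so it follows the primary. But as said, this can wait until after the migration.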
If all you need are backups, though, then replicas could be overkill. Either way, you usually don't want prod data stored on a single machine. I would talk to your management about backup requirements and potentially availability/uptime requirements.