this post was submitted on 02 May 2025
17 points (87.0% liked)

Programming

hi my dears, I have an issue at work where we have to work with millions (~150 million) of product data points. We are using SQL Server because it was available in-house for development. However, with various tables growing beyond 10 million rows the server becomes quite slow and wait/buffer time climbs above 7000 ms/sec, which is tearing down our whole setup of microservices that continuously read, write and delete from the tables. All the Stack Overflow answers lead to: it's complex, read a 2000-page book.

the thing is, my queries are not that complex. they simply go through the whole table to identify duplicates, which are then not processed further, because the processing takes time (which we thought would be the bottleneck). but the time saved by not processing duplicates now seems smaller than the time it takes to compare each batch against the SQL table. the other culprit is that our server runs on an HDD, which at ~150 MB/s read and write is probably at its limit.
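One pattern that avoids comparing each batch against the full table is to let the database enforce uniqueness itself: a unique index on the dedup key, with inserts that skip conflicting rows, so the index (a B-tree probe) rejects duplicates instead of a scan. A minimal sketch using SQLite as a stand-in; on SQL Server the analogous tools would be a unique index (e.g. with `IGNORE_DUP_KEY`) or a `MERGE`. Table and column names here are made up:

```python
import sqlite3

# Sketch: let a unique index reject duplicates instead of scanning
# the whole table per batch. Table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, payload TEXT)")

batch = [("A1", "x"), ("A2", "y"), ("A1", "x-dup")]  # "A1" appears twice
# INSERT OR IGNORE skips rows that would violate the unique key;
# the check is an index probe, not a full-table comparison.
conn.executemany("INSERT OR IGNORE INTO products VALUES (?, ?)", batch)
conn.commit()

kept = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(kept)  # 2
```

The same idea moves the dedup cost from "scan N rows per batch" to "one index lookup per inserted row".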

the question is: is there a wizard move to bypass any of my restrictions, or is a change in the setup and algorithm inevitable?

edit: I know that my question seems broad, but as I am new to database architecture I welcome any input and discussion, since the topic is a lifetime of know-how in itself. thanks for every feedback.

[–] Xanza@lemm.ee 24 points 2 weeks ago* (last edited 2 weeks ago) (2 children)

All the Stack Overflow answers lead to: it's complex, read a 2000-page book.

This is an exceptionally good answer, and you're doing everything possible to avoid following it, when you could probably have been halfway through the book by now. Database administration is a profession, not a job. It requires specialized training to do well, and doing everything possible to avoid that training and knowledge won't help you one bit.

my queries are not that complex.

It doesn't matter. Your database is very complex.

they simply go through the whole table to identify duplicates

You search 10 million records on every request and you wonder why it's slow?
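To make that concrete: a lookup on an unindexed column is a full table scan, while the same lookup after `CREATE INDEX` becomes an index search. A small SQLite sketch (illustrative names) showing the query plan before and after adding an index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 ((i, "p") for i in range(10_000)))

# No index: the duplicate check visits every row (plan says "SCAN").
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM products WHERE id = ?", (42,)
).fetchone()[-1]

conn.execute("CREATE INDEX idx_products_id ON products(id)")

# With the index, the same probe is a B-tree search using idx_products_id.
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM products WHERE id = ?", (42,)
).fetchone()[-1]
print(before)
print(after)
```

SQL Server's execution plans show the same shift, from Table Scan to Index Seek, which is the difference between O(N) per check and O(log N).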

is there a wizard move to bypass any of my restrictions, or is a change in the setup and algorithm inevitable?

No. Database administration is very difficult. Reading that 2000 page book is essential for setting up infrastructure to avoid a monolithic setup like this in the first place.

the other culprit is that our server runs on an HDD, which at ~150 MB/s read and write is probably at its limit.

lol wtf

Realistically, this setup is 10 years too old. How large is your database? Is there any reason why it can't be run in memory? 10 million rows isn't insurmountable. Full text with a moderate number of tables could be ~10 GB; there's no reason that can't be run in memory with Redis or another in-memory database, or that you can't move to a more modern in-memory solution like Dice.
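On the Redis route, deduplication is typically a SET: `SADD` returns 1 when the member is new and 0 for a duplicate, so the check is an O(1) in-memory operation. Here's a pure-Python stand-in for that pattern (with the redis-py client the call would be `r.sadd("seen_skus", sku)`; all names are illustrative):

```python
# Mimic Redis SADD semantics with an in-process set: returns 1 if the
# member was newly added, 0 if it was already present (a duplicate).
seen = set()

def sadd(key_set, member):
    """Stand-in for Redis SADD on a single member."""
    if member in key_set:
        return 0
    key_set.add(member)
    return 1

incoming = ["A1", "A2", "A1", "A3"]  # "A1" is a duplicate
fresh = [sku for sku in incoming if sadd(seen, sku)]
print(fresh)  # ['A1', 'A2', 'A3']
```

Only `fresh` items go on to the expensive processing step, with no table scan involved.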

Your biggest problem is the lack of deduplication and normalization in your database design. If it's not fixed now, it'll simply get worse YOY until it's unusable. Either spend the time and money now, or spend even more time and money later to fix it. 🤷‍♂️
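A toy illustration of the normalization point: attributes that repeat across rows get their own table and are referenced by key, so each fact is stored exactly once. A hypothetical supplier/product split, again sketched in SQLite:

```python
import sqlite3

# Hypothetical normalization: supplier details live in their own table,
# and products reference them by id instead of repeating the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE products (
    sku         TEXT PRIMARY KEY,
    supplier_id INTEGER NOT NULL REFERENCES suppliers(id)
);
""")
conn.execute("INSERT INTO suppliers (name) VALUES ('Acme')")
sid = conn.execute(
    "SELECT id FROM suppliers WHERE name = 'Acme'"
).fetchone()[0]
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("A1", sid), ("A2", sid)])

# The supplier is stored once, no matter how many products reference it.
n = conn.execute("SELECT COUNT(*) FROM suppliers").fetchone()[0]
print(n)  # 1
```

Beyond saving space, this gives the unique constraint a small, well-defined key to enforce, which is exactly what makes dedup cheap.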

tl;dr: RTFM.

[–] PoisonedPrisonPanda@discuss.tchncs.de 4 points 1 day ago (1 children)

Realistically, this setup is 10 years too old

thanks for this input. This was the winning argument for my boss to migrate to a modern server. While I admit that I see many flaws in our design, we are now working on refactoring our architecture and our approach itself.

Thanks also to the numerous other answers for leading me in the right direction (hopefully).

[–] Xanza@lemm.ee 2 points 1 day ago (1 children)

This was the winning argument for my boss for migrating to a modern server.

Exceptionally good news! Glad it's working out. Be sure to make a new post when you decide what you go with, I'm sure people here would enjoy hearing about your approach.

Well, for now the system is not yet running on the new hardware.

We are now pondering whether to migrate everything as it is to the new hardware and then optimize/refactor,

or to refactor first (or at least develop a plan) and then improve during the migration...

Be sure to make a new post when you decide what you go with, I'm sure people here would enjoy hearing about your approach.

Nice to hear. Thanks. I will share updates.

[–] msage@programming.dev 3 points 2 weeks ago (2 children)

Sort of harsh approach, but I get it.

Though I learned the most when I was working with a lot of data and running into performance issues.

Studying Postgres in that job was the absolute best part, I learned so much, and now I can't find a problem Postgres can't fix.

There was a running joke in my last office that I was paid to promote Pg because every time MySQL fucked something up, I would bring up how Postgres would solve it. I even did several presentations.

Then we migrated to Postgres and suddenly everything is stable as a rock, even under worse conditions and way more data.

I just love Postgres so much.

Sort of harsh approach, but I get it.

Yeah. To me it feels like we used a power tool as a hammer. Brute force in the wrong way.

As an update: I was able to convince my people to migrate to a modern server, and altogether we are also switching from SQL Server to PostgreSQL. During this migration we are also trying to refactor our workflow, since it was flawed by design.

So, many thanks for the input.

[–] towerful@programming.dev 3 points 2 weeks ago (1 children)

Sometimes it feels like postgres is cheating (in a good way)

[–] msage@programming.dev 2 points 2 weeks ago

Compared to MySQL most definitely.

Granted, Oracle has pushed some fresh air into it, but it still has a long way to go.