this post was submitted on 26 Feb 2026
397 points (99.0% liked)

[–] Zorque@lemmy.world 30 points 1 day ago (1 children)

That implies it'd even be available to consumers.

[–] Trilogy3452@lemmy.world 3 points 1 day ago (2 children)

Or even be released in any form (I'm assuming here AI hardware doesn't use DDR memory but some other related type)

[–] PabloSexcrowbar@piefed.social 7 points 22 hours ago (1 children)

Fingers crossed that we might see HBM as system RAM finally.

[–] tal@lemmy.today 1 points 2 hours ago

I'm not sure that memory (and I'm speaking more broadly than HBM, even) optimized for running neural nets on parallel compute hardware and memory optimized for conventional CPUs overlap all that much. I think that, setting aside the HBM question, if we long-term wind up with dedicated parallel compute hardware running neural nets, we may very well wind up with different sorts of memory optimized for different things.

So, if you're running neural nets, you have extremely predictable access patterns. Software could tell you what its next 10 GB of accesses to the neural net are going to be. That means that latency is basically a total non-factor for neural net memory, because the software can request data in huge batches and do other things in the meantime.
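A toy sketch of that latency hiding (the helper names here are hypothetical, not any real runtime's API): because the reads are known in advance, a runtime can start fetching the next layer's weights while still computing on the current layer, so read latency overlaps with work instead of stalling it.

```python
from concurrent.futures import ThreadPoolExecutor

def run_layers(layers, fetch, compute):
    """fetch(layer) -> Future starting an async weight read;
    compute(layer, buf) consumes the fetched buffer."""
    pending = fetch(layers[0])              # start the first read up front
    for i, layer in enumerate(layers):
        buf = pending.result()              # read overlapped with prior compute
        if i + 1 < len(layers):
            pending = fetch(layers[i + 1])  # kick off the next read immediately
        compute(layer, buf)                 # work while the next read is in flight

# Tiny demo: "fetching" just returns the layer name tagged as weights.
pool = ThreadPoolExecutor(max_workers=1)
order = []
run_layers(
    ["l0", "l1", "l2"],
    fetch=lambda l: pool.submit(lambda: f"{l}-weights"),
    compute=lambda l, buf: order.append((l, buf)),
)
pool.shutdown()
```

The same double-buffering idea is why the hardware can tolerate slow-but-wide memory: as long as a fetch finishes before its compute step needs it, per-access latency never lands on the critical path.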

That's not the case for a lot of the memory used for, say, playing video games. Part of the reason PCs (as opposed to servers) don't use registered memory (which makes it easier to handle more memory), aside from hardware vendors using price discrimination, is that it increases latency a little bit, which is bad when you're running software where you don't know what memory you're going to need next and have a critical path that relies on that memory.

On the other hand, parallel compute hardware doing neural nets is extremely sensitive to bandwidth. It wants as much as it can possibly get, and that's where its bottleneck is today. Back on your home computer, a lot of software is oriented around doing operations in serial, and that's more prone to not saturate the memory bus.
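As a rough back-of-the-envelope illustration (all numbers here are illustrative assumptions, not vendor specs): if every decode step of a dense model has to stream the full set of weights through the compute units, then memory bandwidth divided by model size puts a hard ceiling on tokens per second, regardless of how fast the compute is.

```python
# Roofline for memory-bound token generation: each step streams all
# weights once, so bandwidth / model size bounds tokens per second.
def max_tokens_per_sec(params_billions, bytes_per_param, bandwidth_gb_s):
    model_size_gb = params_billions * bytes_per_param  # GB of weights to stream
    return bandwidth_gb_s / model_size_gb

# Example: a 70B-parameter model at FP16 (2 bytes/param) is ~140 GB;
# on ~3000 GB/s of aggregate HBM bandwidth the ceiling is ~21 tokens/s.
print(round(max_tokens_per_sec(70, 2, 3000), 1))
```

That arithmetic is why the bottleneck lands on bandwidth rather than latency or compute for this workload.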

I'd bet that neural net parallel compute hardware does way more reading than writing of memory, because edge weights don't change at runtime (on current models! That could change!).

*searches*

Yeah.

https://arxiv.org/html/2501.09605v1

> AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and read bandwidth, and also has significant energy per bit overheads. It is also expensive, with lower yield than DRAM due to manufacturing complexity.

But there are probably a lot of workloads where your CPU wants to do a ton of writes.

I'd bet that cache coherency isn't a huge issue for neural net parallel compute hardware, because it's going to be a while until any value computed by one part of the hardware is needed again, at least until we reach the point where we can parallel-compute an entire layer in one go (which... I suppose we could theoretically do. Someone just posted something that I commented on about someone making an ASIC with Llama edge weights hard-coded into the silicon, which is probably a step in that direction). But with CPUs, a big problem is making sure that a value written by one CPU core reaches another CPU core, so that the second doesn't use a stale value. That's gonna impact the kind of memory controller design that's optimal.

[–] EtherWhack@lemmy.world 4 points 22 hours ago (1 children)

Not sure if the AI side is any different, but all the data and compute servers I've built used DDR DIMMs, only they needed to be ECC to even POST.

[–] Dultas@lemmy.world 3 points 18 hours ago

If it was server hardware, they probably needed more than just ECC. They were probably RDIMMs or LRDIMMs, which have built-in registers to handle addressing larger amounts of memory.