this post was submitted on 07 May 2025
retrocomputing

Discussions on vintage and retrocomputing

[–] xyzzy@lemm.ee 17 points 1 month ago

The 1B parameter version of Llama 3.2 showed even slower results at 0.0093 tokens per second, based on the partial model run with data stored on disk.

I mean, cool? They got a C inference library to compile under an older C standard, and the 1B model predictably runs like trash. At 0.0093 tokens per second, it would take hours to produce anything meaningful.
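For scale, a quick back-of-the-envelope estimate, assuming the quoted 0.0093 tokens/s rate holds steady (the function name here is just for illustration):

```python
# Rough time estimate for generating text at the quoted rate.
RATE = 0.0093  # tokens per second, from the quoted benchmark

def generation_time_hours(tokens: int, rate: float = RATE) -> float:
    """Estimated hours to generate `tokens` tokens at `rate` tokens/s."""
    return tokens / rate / 3600

# Even a short ~100-token reply would take about three hours.
print(f"{generation_time_hours(100):.2f} hours")  # ~2.99
```

So "hours to do anything meaningful" is, if anything, generous: a full paragraph of output would run well past that.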