37 points (100.0% liked)
submitted 08 Apr 2024 (5 months ago) by ylai@lemmy.ml to c/localllama@sh.itjust.works
1 comment
[-] fhein@lemmy.world 15 points 5 months ago* (last edited 5 months ago)

Very nice speedups for people running CPU inference on supported hardware, but unfortunately it does not help the CPU+GPU split case, according to a comment on one of the PRs. That person says that for prompt evaluation, where these kernels would make a difference, llama.cpp performs all the calculations on the GPU. And token generation is memory-bandwidth bound (IO-bound), so the faster CPU calculation becomes negligible.
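To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It uses the usual roofline-style approximation (every generated token streams the full weight set from RAM, so tokens/s is capped by bandwidth divided by model size); the 4 GB model and 50 GB/s figures are illustrative assumptions, not measurements from the PR.

```python
# Rough ceiling on token-generation speed when weight streaming is the
# bottleneck: every token reads all model weights from memory once, so
# throughput <= memory bandwidth / model size, no matter how fast the
# matmul kernels are. Numbers below are illustrative assumptions.

def tokens_per_second_ceiling(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when memory bandwidth is the limit."""
    return bandwidth_gb_s / model_size_gb

# Example: a ~7B model quantized to ~4 GB on dual-channel DDR4 (~50 GB/s).
print(tokens_per_second_ceiling(4.0, 50.0))  # ~12.5 tok/s ceiling
# Faster CPU compute cannot raise this ceiling; only more bandwidth
# (or keeping the weights in faster GPU memory) can.
```

Which is why these kernels shine for prompt evaluation (compute-bound, many tokens processed per weight read) but barely move single-token generation.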
