this post was submitted on 22 May 2026
673 points (98.8% liked)

Technology


I am using llama.cpp with Qwen 3.6 27B MTP and a 64k context window on a 4090. OpenCode talks to it, and it in turn talks to the Unity game engine via MCP. I am getting 80 to 112 tokens/second, about 90 on average, which is shocking to me as it really does feel as fast as cloud AI (well, faster for me, as I am in Vietnam and round trips to US data centers really add up in a session). The only real issue is that you pretty much have to one-shot prompts, as follow-up prompts will easily go over the context window size. If I cannot one-shot a prompt then I use cloud AI, but that is very rare for my use case. Maybe 1 in 50 or so, and only when the task touches a lot of large scripts and scenes.
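
A rough sketch of that local-vs-cloud decision, for anyone who wants to check it before sending a follow-up. It assumes a crude 4-characters-per-token estimate (not a real tokenizer), treats 64k as 65,536 tokens, and reserves a hypothetical 8,000 tokens for the reply. The function names are made up for illustration and are not part of llama.cpp or OpenCode.

```python
# Rough context-budget check. Everything except the 64k window is an assumption.
CONTEXT_WINDOW = 64 * 1024   # 64k tokens, taken here as 65,536
CHARS_PER_TOKEN = 4          # crude heuristic, not a real tokenizer
RESERVED_FOR_REPLY = 8_000   # hypothetical headroom for the model's output


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_window(messages: list[str]) -> bool:
    """True if the accumulated messages still leave room for a reply."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + RESERVED_FOR_REPLY <= CONTEXT_WINDOW


def route(messages: list[str]) -> str:
    """Pick 'local' when the session fits, otherwise fall back to 'cloud'."""
    return "local" if fits_in_window(messages) else "cloud"
```

Calling `route(history + [next_prompt])` before each turn would flag the sessions where large scripts and scenes push the total past the window. A real tokenizer would give tighter numbers than the character heuristic.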