A $200 ChatGPT subscription could cost OpenAI $14,000 if you actually used it to its full potential (www.techspot.com)

submitted 1 month ago by sanitation@lemmy.today to c/technology@lemmy.world

[–] TechLich@lemmy.world 4 points 1 month ago (1 children)

While this advice is true for all models, when it comes to agentic tasks (add this small feature/write this test harness/find bugs/suggest improvements), open source models are still way behind, vibe code or not.

Claude Fable or even Opus in an editor like Zed have a 1 million token context window and will "think" through the goals of the application, test their changes, work through debugging processes the way a programmer would, stop to ask for clarification, check diagnostic tools and linters, prompt to run test code, etc.

Llama, Gemma, Qwen, etc. do lack a lot of the world knowledge needed to get the goals of the application, but they also just don't have the debugging skills, won't test their code, don't always tool call correctly, and get confused as the context increases, and nobody has enough VRAM to run large context sizes locally.

They can do autocomplete on small functions but aren't really there for more complex tasks.

On top of that, the biggest problem is that the best open source models are trained and released by the same giant tech conglomerates that have an interest in not competing with their own products. Qwen is Alibaba, Llama is Meta, gpt-oss is OpenAI. Even the more "independent" ones, Kimi (Moonshot) and GLM (z.ai), are mostly funded by Alibaba and Tencent. They're released for research and marketing purposes and to please their corporate backers with inflated stock. Almost nobody has the resources to train new models from scratch. People make lots of merges and fine-tunes, but AI is not democratised the way that traditional programming tools have been.

Maybe some day there will be enough cheap compute for open source communities to pool together resources to build competing models but they're not really there yet :(

[–] MalReynolds@slrpnk.net 3 points 1 month ago

Context management is a huge part of making smaller models viable (and likely a big part of making frontier models better). Tricks like structured context libraries for thinking improve things a lot. I like approaches that output something like an Obsidian vault, which lets you dig in and correct bad assumptions easily, even if it's a bit slower. It's a useful deliverable that can (mostly) be reused with updated models.
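To make the vault idea concrete, here is a rough sketch of what I mean, not any particular framework's implementation. It assumes a vault is just a folder of Markdown notes with `[[wikilinks]]`, and it uses a character count as a crude stand-in for a token budget. The names `write_note` and `build_context` are made up for illustration.

```python
from pathlib import Path


def write_note(vault: Path, title: str, body: str, links: list[str]) -> Path:
    """Write one assumption or decision as a Markdown note with [[wikilinks]]."""
    vault.mkdir(parents=True, exist_ok=True)
    related = " ".join(f"[[{name}]]" for name in links)
    path = vault / f"{title}.md"
    path.write_text(f"# {title}\n\n{body}\n\nRelated: {related}\n", encoding="utf-8")
    return path


def build_context(vault: Path, budget_chars: int) -> str:
    """Concatenate notes in file-name order until the character budget is spent.

    Characters are only a rough proxy for tokens (an assumption of this sketch).
    """
    parts: list[str] = []
    used = 0
    for path in sorted(vault.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        if used + len(text) > budget_chars:
            break  # stop before overflowing the model's context
        parts.append(text)
        used += len(text)
    return "\n".join(parts)
```

Because the notes are plain files, you can open one, fix a bad assumption by hand, and the next call to `build_context` picks up the correction. A real setup would probably rank notes by relevance rather than by file name.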

Things like 'the debugging skills, won’t test their code, don’t always tool call correctly' are tangibly improving model to model, framework to framework, and are problems that will be solved in time, but yes they need handholding ATM.

Things like 'test their changes, work through debugging processes the way a programmer would, stop to ask for clarification, check diagnostic tools and linters, prompt to run test code' are mostly down to framework, not model (except for failing to tool call, which is improving, with failures falling at a respectable rate).

That said, sure, frontier models get more done in one go. Personally I'm fine with only a 3-4x force multiplier instead of 10x to keep it local, but YMMV. For a business with resources for a bigger server it'll be more like 8x. Remember that some businesses handle sensitive data and can't (or damn well shouldn't) use frontier models, so the market is there.

'Maybe some day there will be enough cheap compute for open source communities to pool together resources to build competing models but they’re not really there yet :('

Not wrong. Decentralized inference is mostly solved (with latency penalties), but without decentralized training, true democratization will remain out of reach. Hopefully a breakthrough will ensue, but until then we are dependent on the kindness of corporations (or on them rugpulling competitors).

This could also be a part of the RAMpocalypse thing, 'if there's not a moat I'll fucking dig one, damn everyone else' (and damn SamA). I doubt that's sustainable long term, but it might get them through to IPO, more's the pity.