this post was submitted on 30 May 2026

70 points (100.0% liked)

Microsoft’s quiet Claude Code retreat and the real cost of enterprise AI (thenextweb.com)

submitted 2 days ago by RedWizard@hexbear.net to c/news@hexbear.net

[–] Infamousblt@hexbear.net 66 points 1 day ago (1 children)

A few months ago at my job they told us all to use AI as much as we could and they would be measuring and talking to the bottom 10% of AI users to figure out "how we can help you use more AI".

This month we ran out of tokens and they changed the policy to "please use cheap models only" and "please be more thoughtful in your AI usage so everyone has enough tokens."

So which is it? Use it as much as possible anywhere I can, or use it only where I know it will give me a good result?

As someone who is forced to use AI to hit my token quotas, I'll tell you that there's a reason the cheap models are cheap.

The bubble is bursting faster than I expected. Companies are going to be cutting back. They'll be forced to re-hire real people. They'll be forced to only use AI where it is actually cost effective instead of where they dream it might be cost effective.

[–] Lenins_Dumbbell@lemmygrad.ml 1 points 18 hours ago

Don't be ridiculous. Companies won't be hiring anyone anytime soon. The C-suites have made huge bonuses by cutting "labour costs" the past few years. Admitting now that they fucked up is practically a guarantee that they'll get sacked.

The only option for most companies now is a government bailout.

[–] chgxvjh@hexbear.net 34 points 1 day ago (1 children)

It's kind of weird that they are using Claude Code given how heavily they are invested in OpenAI.

[–] jackmaoist@hexbear.net 14 points 1 day ago

I think everyone just realized that OpenAI is pure trash. They were first to the market and that's their only distinguishing factor. They're not good at anything else.

[–] hellinkilla@hexbear.net 21 points 1 day ago (1 children)

Is this one of the places that gave workers targets for minimum AI use?

From how I understand the article, it's saying that the AI is doing much more effort per given request than previously. I guess all the extra work is put in place by the vendor to justify a price hike. But is there a technical reason why this couldn't be rolled back or restrained within the system?

[–] ZWQbpkzl@hexbear.net 33 points 1 day ago (5 children)

AI is doing much more effort per given request than previously

More or less, and the result is that it generally does accomplish the task now. At first, if you asked an AI tool to implement a feature, it would spit out what the average solution would be, completely ignorant of your existing codebase unless you told it.

Some models allowed 'Thinking' where it would talk over its response and iterate on how it should respond before responding. Better results from the same sized model, but more token usage per request

Then 'RAG' allowed the AI to read parts of your codebase and maybe even do a google search. Now it's got fresh code in its context specific to your problem and not buried in some weights from training data. You get really good results but that's a lot more tokens parsed. And way more tokens in the context window. If you run out of context window, the session crashes.
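For anyone who wants to see the shape of that retrieval step, here is a rough Python sketch. It is not how any particular tool does it: real systems usually rank by embeddings rather than keyword overlap, and the file names, the word-count "tokenizer" and the budget are all made up for illustration.

```python
def overlap(query_words, text):
    """Count how many query words appear in the text."""
    return len(query_words & set(text.lower().split()))


def retrieve(query, files, budget_tokens):
    """Pick the most relevant files that still fit in the context budget."""
    query_words = set(query.lower().split())
    ranked = sorted(files, key=lambda name: overlap(query_words, files[name]), reverse=True)
    picked, used = [], 0
    for name in ranked:
        if overlap(query_words, files[name]) == 0:
            continue  # irrelevant file, don't spend context on it
        cost = len(files[name].split())  # crude stand-in for a real tokenizer
        if used + cost <= budget_tokens:
            picked.append(name)
            used += cost
    return picked, used


# Hypothetical mini "codebase"
files = {
    "auth.py": "def login user password check token",
    "db.py": "def connect database pool",
    "README": "project setup instructions",
}

picked, used = retrieve("fix login token check", files, budget_tokens=10)
print(picked, used)
```

Everything that gets picked is pasted into the prompt, which is why better retrieval means more tokens per request.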

Now agentic coding allows your AI session (or Agent) to spin up sub-agents to answer specific questions for it. So if the AI wants to match house style, it will spin up a sub-agent to read your source files and provide a summarized style guide to the main session. The sub-agent can fill up its own context window, but only the summary lands in the main agent's context window. This allows the AI to bypass the context window limit and consume more tokens.
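A toy bit of accounting makes the trade-off visible. This is only a sketch in Python with invented numbers (the 200k limit, the file sizes and the summary size are all assumptions), not a description of how Claude Code or any other harness actually meters tokens.

```python
CONTEXT_LIMIT = 200_000   # hypothetical context window, in tokens
MAIN_ALREADY_USED = 150_000  # hypothetical: tokens the main session has used so far


def read_inline(file_tokens):
    """Main session reads every file itself."""
    main_context = MAIN_ALREADY_USED + sum(file_tokens)
    new_tokens = sum(file_tokens)
    return main_context, new_tokens


def read_via_subagent(file_tokens, summary_tokens):
    """A sub-agent reads the files in a fresh window and returns a summary."""
    subagent_context = sum(file_tokens)               # lives in the sub-agent's window
    main_context = MAIN_ALREADY_USED + summary_tokens  # main only sees the summary
    new_tokens = subagent_context + summary_tokens     # but both get processed
    return main_context, new_tokens


files = [20_000, 25_000, 30_000]  # hypothetical source files, in tokens

inline_ctx, inline_tokens = read_inline(files)
agent_ctx, agent_tokens = read_via_subagent(files, summary_tokens=2_000)

print(inline_ctx > CONTEXT_LIMIT)  # inline read overflows the main window
print(agent_ctx <= CONTEXT_LIMIT)  # sub-agent route fits
print(agent_tokens - inline_tokens)  # extra tokens spent on the summary
```

With these made-up numbers the main session stays under the limit only on the sub-agent route, and the total tokens processed go up rather than down.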

More and more and more tokens. Anthropic's Claude is the most expensive per token by a long shot: $30 per million tokens (via Cursor), and it will aggressively use as many tokens as possible to produce the best results in the field. The average developer doesn't care about cost because that comes from the employer. They only care about quality and speed.
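To put the $30 per million figure in perspective, here is some back-of-the-envelope arithmetic in Python. Only the price comes from above; the usage per developer, the headcount and the number of working days are assumptions picked purely for illustration.

```python
PRICE_PER_MILLION_USD = 30.0  # the per-token rate quoted above


def monthly_cost(tokens_per_dev_per_day, devs, working_days=20):
    """Estimated monthly spend in USD for a team at a flat per-token rate."""
    total_tokens = tokens_per_dev_per_day * devs * working_days
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD


# Hypothetical team: 100 devs, each burning 2 million tokens a day
cost = monthly_cost(tokens_per_dev_per_day=2_000_000, devs=100)
print(f"${cost:,.0f} per month")
```

Under those assumptions the bill is $120,000 a month, and it scales linearly with every extra token the harness decides to read.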

AI providers are currently taking a loss at their current token rates and client companies are finding those rates too high to be profitable. Unless some token efficiency unlock happens there will be a correction.

fwiw I suspect the solution will be to decrease the model size to lower token rates. Models that fit on a 64GB MacBook can compete with the monstrosities that run in data centers when you give them all the improvements I listed above.

[–] segfault11@hexbear.net 27 points 1 day ago (1 children)

Now agentic coding allows your AI session (or Agent) to spin up sub-agents to answer specific questions for it

even AI is making AI do its job 🙄

[–] AnarchoAnarchist@hexbear.net 17 points 1 day ago (1 children)

Nobody wants to work anymore

[–] SchillMenaker@hexbear.net 5 points 1 day ago

It's six-fingered turtles all the way down

[–] jack@hexbear.net 9 points 1 day ago (1 children)

They only care about quality and speed.

Are you saying I can't have both speed, quality, and low cost? Wtf

[–] hellinkilla@hexbear.net 6 points 1 day ago

To have fast + cheap + good, you have to turn the triangle upside down: 🔻

[–] krakhead@hexbear.net 12 points 1 day ago (4 children)

So with cheaper models being easily accessible and viable, is junior software engineering cooked in your opinion?

[–] darkmode@hexbear.net 15 points 1 day ago

I think for a lot of companies the sugar rush from AI is ending. Coworkers I know who have been interviewing still get leetcode-style problem solving questions, standard system design interviews, etc. The job market feels tough for SWEs rn because it is invariably compared to the era of 0% interest rate hiring. Persistence and a solid base of knowledge will land you something decent, but getting a beginner job is absolutely tougher than before.

[–] fox@hexbear.net 11 points 1 day ago (1 children)

It's going to be a bad time for software devs for the next two to five years imo. Not just because the business idiots think they can stop hiring junior devs but because mass layoffs mean junior devs are competing with middle and senior devs for the same positions. However there's always going to be a need for juniors because there's no senior devs without junior devs training to become senior.

[–] hellinkilla@hexbear.net 8 points 1 day ago (2 children)

there's always going to be a need for juniors because there's no senior devs without junior devs training to become senior.

Sounds like development jobs will soon be like other skilled industries (trades, healthcare, social services etc).

You always see stuff about a "shortage" of workers, yet everyone I know is constantly out of work, working part time/casual, or stringing together multiple jobs. What the employers want is people with 20 years of experience and lots of training in there. But nobody wants to hire a fresh young person for 20 years and pay them to upskill regularly. Older people either don't want to teach/train or it can't be integrated with their existing jobs.

Everyone wants to reap but nobody wants to sow. Employers will do anything to avoid having to invest in people because it indicates a commitment to their value that makes workers uppity and demanding. It's more important that the workers feel off balance and insecure all the time.

[–] imogen_underscore@hexbear.net 2 points 19 hours ago (1 children)

programmers have been thoroughly proletarianised i think. this was the goal of "learn to code" and all the hyping up of the career path as a free ticket to $$$

[–] hellinkilla@hexbear.net 1 points 18 hours ago

That was the goal but it was never enough.

"Learn to code" was like telling people to buy a household spinning wheel to earn extra income out of the home during early capitalism.

Now someone has invented the spinning jenny and they're building mills everywhere. The proling can really get going.

Trawling all the internet and existing code to develop the models was digital primitive accumulation I guess?

Look out for Jacquard.

[–] queermunist@lemmy.ml 8 points 1 day ago* (last edited 1 day ago)

I think the whole reason we see stuff about a """shortage""" of skilled trades workers is to encourage young people to go into the skilled trades, which increases the supply and thus increases precarity and reduces wages. They tell everyone to go into plumbing because they want plumbers to be cheaper.

[–] ZWQbpkzl@hexbear.net 6 points 1 day ago* (last edited 1 day ago)

I don't think cheaper models will cook devs more than AI has already. Cheaper local models mean junior devs can get the means of production on their laptop without a Claude or Cursor license from an employer. If anything it's mitigating.

AI in general is going to cook the junior code monkey at big firms. Senior devs are going to write their designs and diagrams and give that to an AI system to develop instead of a team of junior devs. It'll become more PhD or go home like Physics. Smaller more flexible firms will still need devs.

[–] chgxvjh@hexbear.net 8 points 1 day ago

I don't think so, not more than the rest of the industry.

[–] christian@hexbear.net 5 points 1 day ago (1 children)

Then 'RAG' allowed the AI to read parts of your codebase and maybe even do a google search.

I'm sure it's reassuring to know that the entire process isn't just having the AI talk to itself nonstop and that it does reach out to get opinions from a second AI from time-to-time.

[–] ZWQbpkzl@hexbear.net 3 points 22 hours ago

Often they do that within the model's architecture itself. With a Mixture of Experts (MoE) strategy, only a few expert sub-networks run for each token, so you save drastically on compute per token even though the full set of experts still has to be stored. It's how DeepSeek was able to break into the market.
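A minimal sketch of top-k expert routing in Python, for the curious. It is a toy: the "experts" are scalar functions and the router scores are hard-coded, whereas a real MoE layer learns a gate that computes scores from the input. None of this is taken from DeepSeek's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_scores, experts, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Eight toy experts; expert i just multiplies its input by (i + 1).
experts = [lambda x, a=a: a * x for a in range(1, 9)]
scores = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.5, 0.3]  # made-up router scores

y, used = moe_forward(1.0, scores, experts, k=2)
print(y, used)
```

All eight experts exist, but only two of them are evaluated for this input, which is where the per-token savings come from.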

[–] Assian_Candor@hexbear.net 7 points 1 day ago* (last edited 1 day ago) (1 children)

I actually prefer gpt 5.5 to opus 4.7 for coding (haven't tried 4.8 yet because what am I, made of money?) and it is way way more token efficient.

When Claude says it's "bloviating" it really means it

A big part of the problem is the harness, which you allude to. Claude locks you into Claude Code, which is bloated and absolute ass at context management.

Once the Chinese models reach parity with the current generation models it'll be a race to the bottom. Deepseek v-4 pro is right there but not quite. The models now are strong enough to be generalist problem solvers. Anything stronger will only benefit niche applications.

I'd like to see something like the gpt chat interface within a coding harness where the model is capable of selecting what to delegate to based on the task. This is where we will go in the future. A lot of enterprises incinerate tokens with Claude because people are using opus to write emails or whatever. It's like taking a Ferrari to the grocery store.

These will become commodities though, I think, at which point it's all about the integrations. You can already see the big providers pivoting into these value-added services, with GPT leaning into the consumer market with apps and Claude going heavy on business/enterprise.

[–] ZWQbpkzl@hexbear.net 7 points 1 day ago (2 children)

It's been a few months but I gave the then latest Claude Opus and GPT Max the same task and they both used the same absurd amount of tokens just to translate some curl commands to some other code. Gemini did about the same with like 10% of the tokens, so I'm a little impressed by them. I believe Claude is burning so many tokens just reading the entire code base to match house style.

I'd like to see something like the gpt chat interface within a coding harness where the model is capable of selecting what to delegate to based on the task.

You can do this with multi-model harnesses like opencode or pi by restricting the model for each agent you've configured. You pick the model per agent, not the AI. But you could probably define duplicates of each agent per model if you wanted.
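Conceptually it boils down to a lookup table like the Python sketch below. To be clear, this is not the config format of opencode or pi: the agent names, model names and the `model_for` helper are all placeholders invented to show the idea.

```python
# Hypothetical agent-to-model table; every name here is a placeholder.
AGENT_MODELS = {
    "planner": "big-expensive-model",
    "coder": "mid-tier-model",
    "email-writer": "small-local-model",
}


def model_for(agent, default="small-local-model"):
    """Return the model pinned to an agent, falling back to the cheap one."""
    return AGENT_MODELS.get(agent, default)


print(model_for("planner"))
print(model_for("email-writer"))
print(model_for("unknown-agent"))
```

The choice is made by whoever writes the table, so the expensive model only gets called for the agents you explicitly pin it to.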

I personally think Apple has the right long term idea with making consumer hardware that can run adequate local models. All the data centers can get fucked. I haven't experimented with AI writing emails, but I suspect Gemma E2 Flash can do it just fine.

[–] hellinkilla@hexbear.net 4 points 1 day ago (1 children)

hardware that can run adequate local models

I don't use AI for anything except translating language. I just use the local models. I have no fucking idea how an entire language (and the info needed to properly transform to another language) can be contained in a file of 10-70 MB. But it works adequately on my trashy old hardware.

Also tried a bit of text to speech, transcription and similar. Those were more of a pain in the ass and needed a lot more stuff to download and set up but it isn't clear to me if that is the nature of it or maybe the front end user experience just hasn't been polished yet. But my laptop can handle it anyway. It seems like these companies really want the SAAS model and that is directing how they build. More than it being an inherent requirement.

To me it kind of seems like most individual people have a limited set of tasks they want to accomplish so you could just have whatever you need locally. Probably could even generate a custom local model and send it updates from an interface of some kind. If you want something to write emails or whatever you have it pull and digest what it needs once and then by incremental upgrades after that.

It's hard for me to imagine individual users really need so much power from a data center because it's not like you are using the whole thing yourself. Probably data centers are really for huge comprehensive tasks that humans could never do unaided, like analyzing massive datasets.

[–] ZWQbpkzl@hexbear.net 3 points 21 hours ago* (last edited 21 hours ago) (1 children)

Yeah you're kind of getting where I think or at least am hoping the market will go. You can already retrain public models locally but the tooling is either way too low level, or completely vibe coded.

I think the main contradiction in the market is that AI models are too big and expensive to run profitably. But if they make them more efficient then the models can run on consumer hardware and that fucks up the SaaS AI business.

Don't worry, all the data centers will be put back to use mining crypto as god intended.

[–] DornerStan@lemmygrad.ml 1 points 17 hours ago

Too expensive to run profitably + efficient models undermine the SaaS model + Chinese tech is always threatening to surpass them for a fraction of the cost. Even if they try to gatekeep efficient and local models, deepseek is right there.

[–] Assian_Candor@hexbear.net 4 points 1 day ago* (last edited 1 day ago) (1 children)

Yeah probably a better pi setup would be to create more granular agent types. I tried creating a worker powered by deepseek but found it wasn't really that good. Something like an agent team spec (use a technical product manager agent to coordinate modules, review the worker output, and to make necessary corrections) then you could use a frontier model to plan and issue work instructions to the TPM agent

Not sure if this would be more economical in practice but would be interesting to try

Long run I think you will always need the data centers to handle big training loads but we might go back to on-prem computing for enterprises to run frontier models. Or even edge nodes that have enough muscle that folks can subscribe to locally.

Something in your city you can sub to for $30 a month that lets you run any flagship open source model could be very compelling

[–] ZWQbpkzl@hexbear.net 1 points 21 hours ago

Something in your city you can sub to for $30 a month that lets you run any flagship open source model could be very compelling

The admins at db0 have some sort of AI service mesh running already, so that's way more viable than you'd think.