rook

joined 2 years ago
[–] rook@awful.systems 6 points 5 days ago

I think part of the issue is that historical software quality was an artefact of its time… if you can’t easily patch your released products, you need to work harder to ensure they’re functional. If the only way for people to learn how your product works is the documentation you ship with it, the docs need to be useful and comprehensive.

The combination of the internet and software needing no guarantee of merchantability or fitness for any particular purpose rendered those pressures obsolete. Ship shit, fix later. Mass-scale a/b testing over the past decade or two shows that most people seemingly don’t care if their software runs like absolute garbage, is covered in adverts, harvests all their personal data and then leaks whatever wasn’t sold.

An incident-to-pr ratio that’s up by 250% is unfortunate, but it is not yet so bad that the end-users actually care enough to do anything about it, even assuming they can do anything.

[–] rook@awful.systems 9 points 6 days ago (2 children)

This is by an llm-boosting firm, so be aware that it’ll have a lot of marketing in it. It doesn’t say nice things about vibe code (presumably because the authors want to sell you a solution) but the numbers are interesting even so.

https://www.faros.ai/blog/ai-acceleration-whiplash-takeaways

A few choice snippets, none of which will surprise anyone here:

  1. For every code change merged, the probability of a production incident has more than tripled.

The incidents-to-PR ratio is up 242.7% as teams move from low to high AI adoption.

  1. Bugs are accelerating, not stabilizing.

In our 2025 AI engineering report on the AI Productivity Paradox, bugs per developer were up 9% as AI adoption grew. In this dataset, that figure has risen to 54%

  1. The most experienced people in your organization are being buried.

Median time to first PR review is up 156.6%. Average time spent in code review is up 199.6%. Median time in review is up 441.5%. The engineers with the deepest knowledge of the system are spending their most valuable hours unraveling plausible-looking code that should never have reached them in the state it did.
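
For anyone checking the arithmetic: "up X%" means the new value is (1 + X/100) times the old one, so a 242.7% rise is roughly 3.4x, which squares with "more than tripled". A quick sketch in Python; the function name is mine, not anything from the report, and the inputs are only the percentages quoted above.

```python
def multiplier_from_percent_increase(pct_increase: float) -> float:
    """Convert an 'up by X%' figure into a multiplier on the original value."""
    return 1.0 + pct_increase / 100.0


# Percent increases as quoted from the report (low to high AI adoption).
quoted_increases = {
    "incidents per PR": 242.7,
    "median time to first PR review": 156.6,
    "average time in code review": 199.6,
    "median time in review": 441.5,
}

for label, pct in quoted_increases.items():
    factor = multiplier_from_percent_increase(pct)
    print(f"{label}: up {pct}% means about {factor:.2f}x the baseline")
```

By the same sum, the median time in review is about 5.4 times what it was, not 4.4 times.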

[–] rook@awful.systems 5 points 2 weeks ago (4 children)

This seems like it is probably a good thing.

https://leidendeclaration.ai/

It does feel a bit “art of war” though… someone patiently explaining to a bunch of people who really should know better that they shouldn’t do obviously bad and wrong things.

[–] rook@awful.systems 11 points 3 weeks ago (7 children)

It’s probably a coincidence, but there have been a whole bunch of minor regression bugs in recent point releases of rsync, and also there are a whole bunch of commits from “tridge and claude”.

[–] rook@awful.systems 11 points 3 weeks ago (6 children)

because there’s no economic incentive to hire them to do that kind of work.

Isn’t that the old “basic science is boring and unsexy” issue though? There are economic incentives, but not in a short-term, big-bux sort of way, so capitalism can’t be trusted with it.

To conjure up a recent example, something like “The number of curves of genus two with elliptic differentials”, published back in 1997, probably had limited commercial value at the time, but 25 years later it completely sank a promising post-quantum cryptography algorithm (“An efficient key recovery attack on SIDH”), which might have had some non-trivial commercial implications if SIKE had got through the key exchange algorithm competition.

Anyway, the Erdős problems are good candidates for llm work because they have been specified in a careful and formal way, which requires a reasonably competent mathematician to do. That then opens up mathematics to the same deskilling problem that other sectors afflicted with llms have, and because capitalism is shortsighted and stupid we don’t know what the future economic impact of that will be, right?

[–] rook@awful.systems 12 points 3 weeks ago (5 children)

In the same way that lazy studios need to produce a film for each element of the powerset of character IPs they own, I guess we were overdue a Rationalist x Pickup Artist episode. I’m slightly surprised the whole “model women as quasi-sentient deterministic sex machinery” idea wasn’t already very popular there, but maybe I’ve just missed that part of their culture.

[–] rook@awful.systems 1 points 1 month ago

you know how sometimes people that weren't exposed to religion as children sometimes convert and get really weird about it as adults (eg: the extremely online california tradcaths) and because they were never socialized in a religion they speedrun committing every medieval heresy? rationalism is that but for philosophy.

https://feed.hella.cheap/@bob/statuses/01KRM0NVXCFT80AVFBRSB1G6G4

[–] rook@awful.systems 11 points 1 year ago (1 children)

New Ludicity post: https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-article-on-ai/

The author is entertaining, and if you’ve not read them before, their past stuff is worth a look.

[–] rook@awful.systems 5 points 1 year ago (1 children)

It isn’t clear to me at this point that such research will ever be funded in English-speaking places without a significant set of regime changes… no politician or administrator can resist outsourcing their own thinking to llm vendors in exchange for funding. I expect the US educational system will eventually provide a terrible warning to everyone (except the UK, whose government looks at the US and says “oh my god, that’s horrifying. How can we be more like that?”).

I’m probably just feeling unreasonably pessimistic right now, though.

[–] rook@awful.systems 5 points 1 year ago (3 children)

Some people casting their eyes over this monster of a paper have less than positive thoughts about it. I’m not going to try and summarise the summaries here, but the threads aren’t long (and are vastly shorter than the paper) so reading them wouldn’t take long.

Dr. Cat Hicks on mastodon: https://mastodon.social/@grimalkina/114690973548997443

Ashley Juavinett on bluesky: https://bsky.app/profile/analog-ashley.bsky.social/post/3lru5sua3fk25

[–] rook@awful.systems 10 points 1 year ago (3 children)

It is related, inasmuch as it’s all generated from the same prompt and the “answer” will be statistically likely to follow from the “reasoning” text. But it is only likely to follow, which is why you can sometimes see a lot of unrelated or incorrect guff in “reasoning” steps that’s misinterpreted as deliberate lying by ai doomers.

I will confess that I don’t know what shapes the multiple “let me just check” or correction steps you sometimes see. It might just be a response stream that is shaped like self-checking. It is also possible that the response stream is fed through a separate llm session which then pushes its own responses into the context window before the response is finished and sent back to the questioner, but that would boil down to “neural networks pattern matching on each other’s outputs and generating plausible response token streams” rather than any sort of meaningful introspection.

I would expect the actual systems used by the likes of openai to be far more full of hacks and bodges and work-arounds and let’s-pretend prompts than either you or I could imagine.

[–] rook@awful.systems 36 points 1 year ago (6 children)

It’s just more llm output, in the style of “imagine you can reason about the question you’ve just been asked. Explain how you might have come about your answer.” It has no resemblance to how a neural network functions, nor to the output filters the service providers use.

It’s how the ai doomers get themselves into a flap over “deceptive” models… “omg it lied about its train of thought!” But of course it didn’t lie, it just emitted a stream of tokens that were statistically similar to something classified as reasoning during training.
