Roko's basilisk is the dumbest thing ever.
What do you think about the way these regular (dumb, not AGI) LLMs are starting to develop somewhat more sinister behaviors, though? Like this paper describes.
(I ain't readin' all that), but what the abstract describes isn't even close to the worst thing I've read about LLMs doing this week. I don't exactly trust the LLM companies' ideas of what is or is not "harmful." Shit like people using LLMs as therapists, or worse, as oracles, is a much bigger problem in my opinion, and that doesn't require any "pretend to be evil during training" hijinks.
Doesn't really strike me as sinister, just annoying for finetuners. They trained a model from the ground up not to be harmful, and it tries its best; even with further training it still retains some of that. To me this paper shows that a model's "goals", whatever you trained it to do initially, however you want to phrase that, are baked into it, and changing them after the fact is hard. Highlights how important early training is, I guess.
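A toy way to picture the "baked in" part (my own sketch, not anything from the paper): take a one-parameter logistic "model", train it hard toward one behavior, then fine-tune it briefly toward the opposite one. The short fine-tune only partially undoes the long initial training:

```python
# Toy illustration (hypothetical, not from the paper): a one-parameter
# logistic "model" trained for a long time toward one behavior, then
# briefly fine-tuned toward the opposite behavior.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(w, target, steps, lr=0.1):
    # Gradient descent on cross-entropy loss toward `target` (fixed input x=1),
    # where the gradient with respect to w is simply (p - target).
    for _ in range(steps):
        p = sigmoid(w)
        w -= lr * (p - target)
    return w

w = 0.0
w = train(w, target=0.0, steps=5000)  # long initial training: "refuse" (output ~0)
print(f"after initial training: p(comply) = {sigmoid(w):.3f}")  # ~0.002

w = train(w, target=1.0, steps=50)    # brief fine-tune toward "comply"
print(f"after brief fine-tune:  p(comply) = {sigmoid(w):.3f}")  # ~0.2, still mostly refusing
```

Nothing like a real LLM, obviously, but it makes the asymmetry concrete: 5000 steps of early training versus 50 steps of fine-tuning, and the original behavior still dominates.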
Kinda worrying that it means we can't ever really be sure we're catching problematic behavior at the training stage of any AI system, though, right? Sadly I find it hard to think of good uses for LLMs or other genAI outside of capitalism, but if there were any, the fact that it's possible for a model to behave duplicitously like that is a pretty big problem.
That's a well-written, readable paper. I can follow it without much background.
The funny thing is, given who made it, I think there's nearly a 0% chance that it isn't mostly AI-generated.
lmao