Got the pointer to this from Allison Parrish, who says it better than I could:

it's a very compelling paper, with a super clever methodology, and (i'm paraphrasing/extrapolating) shows that "alignment" strategies like RLHF only work to ensure that it never seems like a white person is saying something overtly racist, rather than addressing the actual prejudice baked into the model.

eestileib@sh.itjust.works 14 points 2 months ago

Yes, this is what they are designed to do when used in hiring and criminal justice contexts. They would not be getting used if they did anything else.

Nicely demonstrated by the researchers, but can anybody say they are surprised?

antifuchs@awful.systems 14 points 2 months ago

I don’t think anyone is surprised, but brace yourself for the next round of OpenAI and peers claiming to fix this issue.

this post was submitted on 30 Aug 2024
42 points (100.0% liked)

TechTakes
