196

5075 readers

2602 users here now

Community Rules

You must post before you leave

Be nice. Assume others have good intent (within reason).

Block or ignore posts, comments, and users that irritate you in some way rather than engaging. Report if they are actually breaking community rules.

Use content warnings and/or mark as NSFW when appropriate. Most posts with content warnings likely need to be marked NSFW.

Most 196 posts are memes, shitposts, cute images, or even just recent things that happened, etc. There is no real theme, but try to avoid posts that are very inflammatory, offensive, very low quality, or very "off topic".

Bigotry is not allowed, this includes (but is not limited to): Homophobia, Transphobia, Racism, Sexism, Abelism, Classism, or discrimination based on things like Ethnicity, Nationality, Language, or Religion.

Avoid shilling for corporations, posting advertisements, or promoting exploitation of workers.

Proselytization, support, or defense of authoritarianism is not welcome. This includes but is not limited to: imperialism, nationalism, genocide denial, ethnic or racial supremacy, fascism, Nazism, Marxism-Leninism, Maoism, etc.

Avoid AI generated content.

Avoid misinformation.

Avoid incomprehensible posts.

No threats or personal attacks.

No spam.

Moderator Guidelines

Moderator Guidelines

Don’t be mean to users. Be gentle or neutral.
Most moderator actions which have a modlog message should include your username.
When in doubt about whether or not a user is problematic, send them a DM.
Don’t waste time debating/arguing with problematic users.
Assume the best, but don’t tolerate sealioning/just asking questions/concern trolling.
Ask another mod to take over cases you struggle with, if you get tired, or when things get personal.
Ask the other mods for advice when things get complicated.
Share everything you do in the mod matrix, both so several mods aren't unknowingly handling the same issues, but also so you can receive feedback on what you intend to do.
Don't rush mod actions. If a case doesn't need to be handled right away, consider taking a short break before getting to it. This is to say, cool down and make room for feedback.
Don’t perform too much moderation in the comments, except if you want a verdict to be public or to ask people to dial a convo down/stop. Single comment warnings are okay.
Send users concise DMs about verdicts about them, such as bans etc, except in cases where it is clear we don’t want them at all, such as obvious transphobes. No need to notify someone they haven’t been banned of course.
Explain to a user why their behavior is problematic and how it is distressing others rather than engage with whatever they are saying. Ask them to avoid this in the future and send them packing if they do not comply.
First warn users, then temp ban them, then finally perma ban them when they break the rules or act inappropriately. Skip steps if necessary.
Use neutral statements like “this statement can be considered transphobic” rather than “you are being transphobic”.
No large decisions or actions without community input (polls or meta posts f.ex.).
Large internal decisions (such as ousting a mod) might require a vote, needing more than 50% of the votes to pass. Also consider asking the community for feedback.
Remember you are a voluntary moderator. You don’t get paid. Take a break when you need one. Perhaps ask another moderator to step in if necessary.

founded 11 months ago

MODERATORS

SoleInvictus@lemmy.blahaj.zone

will_steal_your_username@lemmy.blahaj.zone

TheCoolerMia@lemmy.blahaj.zone

kittenzrulz123@lemmy.blahaj.zone

rockSlayer@lemmy.world

JoMiran@lemmy.ml

TotallynotJessica@lemmy.blahaj.zone

erotador@lemmy.blahaj.zone

Arkhive@lemmy.blahaj.zone

BadJojo@lemmy.blahaj.zone

rockSlayer@lemmy.blahaj.zone

WillStealYourUsername@piefed.blahaj.zone

kittenzrulz123@piefed.blahaj.zone

kittenzrulz123@lemmy.dbzer0.com

162

llm poisoning rule (feddit.org)

submitted 1 week ago* (last edited 4 days ago) by UnGlasierteGurke@feddit.org to c/onehundredninetysix@lemmy.blahaj.zone

6 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] Gladaed@feddit.org 9 points 1 week ago (2 children)

To be fair, your old style of writing excessively special probably would damage training sets. That being said you can't use Lemmy to poison them, effectively.

Also: this is not a riff. It's. Ok being weird.

[–] Smorty@lemmy.blahaj.zone 2 points 6 days ago

hmmm... see - i dont believe this is how it goes.

we all know LMs predict patterns, but most of todays "poisoning" attempts at LMs were done byhaving a certain keyword ttigger a random string of characters afterward,..

so in that way, its easy to tell if a certain bit of data was poisoned, even for an LM itself. and by todays standards, every bit of training data is already being filtered, changed and optimized fir training, like how when qwen 3 coder was trained, alibaba group used their older qwen 2.5 coder to clean the training data to be less "noisy", and it worked!

when peeps say "lm poisoning", they usually refer to this anthropic post about the topuc released two months ago.

best case: we find a token combination which is frequently used while running the model, rare to find in the post-training data (the instruction tuning dataset) AND is very rare to occur in the prettaining data (the internet source text)... and thats rather limiting.

best case: we poison it well so that the model behaves differently enough for us to ne happy, and too obscurely for the model devs to notice.

so we gotta be very sneaky with the poisoning..... again tho, mayb some new better technique came up and is now going to make it easier-

[–] ApertureUA@lemmy.today 2 points 1 week ago* (last edited 1 week ago)

Not sure about the <|endoftext|> but the rest of the writing quirks I see here are also the ones I see edgy 13 year olds using nowadays (no offense intended). I guess the new is the well forgotten old, or however that phrase went.