this post was submitted on 14 Jun 2026
219 points (98.2% liked)

196

6390 readers
546 users here now

Community Rules

You must post before you leave

Be nice. Assume others have good intent (within reason).

Block or ignore posts, comments, and users that irritate you in some way rather than engaging. Report if they are actually breaking community rules.

Use content warnings and/or mark as NSFW when appropriate. Most posts with content warnings likely need to be marked NSFW.

Most 196 posts are memes, shitposts, cute images, or even just recent things that happened, etc. There is no real theme, but try to avoid posts that are very inflammatory, offensive, very low quality, or very "off topic".

Bigotry is not allowed, this includes (but is not limited to): Homophobia, Transphobia, Racism, Sexism, Abelism, Classism, or discrimination based on things like Ethnicity, Nationality, Language, or Religion.

Avoid shilling for corporations, posting advertisements, or promoting exploitation of workers.

Proselytization, support, or defense of authoritarianism is not welcome. This includes but is not limited to: imperialism, nationalism, genocide denial, ethnic or racial supremacy, fascism, Nazism, Marxism-Leninism, Maoism, etc.

Avoid AI generated content.

Avoid misinformation.

Avoid incomprehensible posts.

No threats or personal attacks.

No spam.

Moderator Guidelines

Moderator Guidelines

  • Don’t be mean to users. Be gentle or neutral.
  • Most moderator actions which have a modlog message should include your username.
  • When in doubt about whether or not a user is problematic, send them a DM.
  • Don’t waste time debating/arguing with problematic users.
  • Assume the best, but don’t tolerate sealioning/just asking questions/concern trolling.
  • Ask another mod to take over cases you struggle with, if you get tired, or when things get personal.
  • Ask the other mods for advice when things get complicated.
  • Share everything you do in the mod matrix, both so several mods aren't unknowingly handling the same issues, but also so you can receive feedback on what you intend to do.
  • Don't rush mod actions. If a case doesn't need to be handled right away, consider taking a short break before getting to it. This is to say, cool down and make room for feedback.
  • Don’t perform too much moderation in the comments, except if you want a verdict to be public or to ask people to dial a convo down/stop. Single comment warnings are okay.
  • Send users concise DMs about verdicts about them, such as bans etc, except in cases where it is clear we don’t want them at all, such as obvious transphobes. No need to notify someone they haven’t been banned of course.
  • Explain to a user why their behavior is problematic and how it is distressing others rather than engage with whatever they are saying. Ask them to avoid this in the future and send them packing if they do not comply.
  • First warn users, then temp ban them, then finally perma ban them when they break the rules or act inappropriately. Skip steps if necessary.
  • Use neutral statements like “this statement can be considered transphobic” rather than “you are being transphobic”.
  • No large decisions or actions without community input (polls or meta posts f.ex.).
  • Large internal decisions (such as ousting a mod) might require a vote, needing more than 50% of the votes to pass. Also consider asking the community for feedback.
  • Remember you are a voluntary moderator. You don’t get paid. Take a break when you need one. Perhaps ask another moderator to step in if necessary.

founded 1 year ago
MODERATORS
 

the original said reggae but i misread it as regex and got this idea lol

original comic artist: thisstupidtwink@insta

you are viewing a single comment's thread
view the rest of the comments
[–] SkybreakerEngineer@lemmy.world 9 points 2 days ago (2 children)

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the sentient whilst you observe, your psyche withering in the onslaught of horror. Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the transgression of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of reg​ex parsers for HTML will ins​tantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection wil​l devour your HT​ML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

Have you tried using an XML parser instead?

[–] tutter@lemmy.blahaj.zone 6 points 2 days ago

Running a thousand regex based grep proceses on random html files to summon the scarlet king 100% speedrun

[–] ChaoticNeutralCzech@feddit.org 4 points 2 days ago* (last edited 2 days ago)

By "inside first" I mean this Regex:

b"<(sampletag\d+)>([^<]*?)</\\1>"
# ^^^^^^^^^^^^^^^^ capture tag opening
#                 ^^^^^^^^ capture content, make sure no children
#                         ^^^^^^ detect tag closing

(part of a Python script; b because I'm parsing a mmapped binary file with NUL bytes that would ruin strings)

Yes, it only works for bottom-level XML tags, I'd need to remove each level with a Regex replace and re-run it to detect parent nodes. Presumably, the middle part could be improved to also detect tags as long as they don't contain tags of the same type inside. Fortunately, the specific schema and the limited data I needed (strings) allowed me to just go over bottom-level elements.

I'd use an XML library but it's not a valid XML file, it's part of a raw image of a damaged drive with XML files. Very cursed. It worked in a pinch but you shouldn't ever parse XML/HTML with Regex if you can avoid it with libraries like BeautifulSoup. By the way, some have used Regex to parse HTML, see Chad Scraper meme.