this post was submitted on 20 Jun 2026

355 points (97.8% liked)



Low-skilled attacker used Claude, Codex to breach 14 companies (www.helpnetsecurity.com)

submitted 1 day ago by sanitation@lemmy.today to c/technology@lemmy.world

81 comments


[–] beveradb@sh.itjust.works 15 points 23 hours ago (3 children)

Most people on Lemmy seem to condemn the use of LLMs for anything at all. I wonder what those folks think of this stance: should companies use the tools or not?

[–] marzhall@lemmy.world 4 points 14 hours ago (2 children)

Security research has used "fuzzing" to find holes in software for quite a while: you send completely random payloads at the target, and it has found real exploits. LLMs just seem like "educated" fuzzing. I don't see why anyone would complain about adding them to your suite.
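For anyone unfamiliar with the term, here's a minimal sketch of plain random ("dumb") fuzzing. The target `parse_record` is a hypothetical toy parser with a planted bug (it trusts a length byte), and `fuzz` is a hypothetical helper, not any real tool's API. A `ValueError` counts as graceful rejection; any other exception is treated as a finding.

```python
import random
from typing import Callable, List


def parse_record(data: bytes) -> int:
    """Hypothetical toy target: first byte is a length, the rest is payload."""
    if not data:
        raise ValueError("empty input")  # expected, graceful rejection
    length = data[0]
    payload = data[1:]
    # Planted bug: indexes by the declared length without checking the real size.
    return payload[length - 1] if length else 0


def fuzz(target: Callable[[bytes], object], iterations: int = 1000,
         max_len: int = 8, seed: int = 0) -> List[bytes]:
    """Throw random byte strings at `target` and collect inputs that crash it."""
    rng = random.Random(seed)  # seeded so any crash can be replayed
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # the target rejected the input cleanly
        except Exception:
            crashes.append(data)  # unexpected failure: worth investigating
    return crashes
```

Coverage-guided fuzzers such as AFL or libFuzzer mutate inputs that reach new code paths instead of drawing them blindly, which is already a step toward the "educated" fuzzing described above.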

[–] borari@lemmy.dbzer0.com 2 points 9 hours ago* (last edited 9 hours ago)

I’ve been fucking around with using Claude to solve CTF challenges. I’m using a harness built around a custom agent I wrote that progressively loads a specific skill for the challenge category: cryptography, binary exploitation, reverse engineering, forensics, etc.

It’s solving the simple shit in <1m using Sonnet. In ~20 minutes it’s solved some shit that I couldn’t figure out at all within the time limit we had during the CTF. There have been 2 challenges where, after about 25 minutes, I killed the agent working on it and switched to Opus, and Opus solved them in about 20m. One crypto challenge was so math-heavy I never would have figured it out. One bin exp challenge didn’t provide a local binary; everything was remote. There was a catch I never would have solved because it was remote-only and I couldn’t debug it locally.

It’s fucking scary good at solving these things. I just prompt with “use to solve ./category/challenge/“ and it does everything. It’s definitely akin to a fuzzer that can be used for way more than just finding crashes and memory leaks. It takes some work and understanding to make it context/token efficient, I think, but it lowers the bar so tremendously that I definitely see why there’s concern here. And again, it’s solving most of these things with Sonnet, not even Opus and definitely not Fable.
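The per-category skill loading described above might look roughly like this. The harness itself isn't shown, so the category directory names, the `SKILLS` mapping, the skill file paths, `skill_for`, and `build_context` are all hypothetical; the only idea being illustrated is that just the skill matching `./category/challenge/` gets pulled into context, which keeps token use down.

```python
from pathlib import Path
from typing import Callable

# Hypothetical mapping from category directory name to a skill file.
SKILLS = {
    "crypto": "skills/cryptography.md",
    "pwn": "skills/binary_exploitation.md",
    "rev": "skills/reverse_engineering.md",
    "forensics": "skills/forensics.md",
}


def skill_for(challenge_dir: str) -> str:
    """Map ./category/challenge/ to the skill file for that category."""
    category = Path(challenge_dir).parent.name  # "./crypto/rsa/" -> "crypto"
    if category not in SKILLS:
        raise ValueError(f"no skill registered for category {category!r}")
    return SKILLS[category]


def build_context(
    challenge_dir: str,
    base_prompt: str,
    read: Callable[[str], str] = lambda p: Path(p).read_text(),
) -> str:
    """Append only the matching skill so other categories don't cost tokens."""
    return f"{base_prompt}\n\n{read(skill_for(challenge_dir))}"
```

Passing `read` as a parameter is just a convenience so the selection logic can be exercised without real skill files on disk.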

All told, this feels like the same panic that happened when Metasploit was first released/demoed at DEF CON back in the day.

[–] ozymandias117@lemmy.world 2 points 9 hours ago

As long as they produce a PoC, like fuzzing tools do, I don't think anyone is complaining.

The problem with slop reports is the theoretical attacks: they nearly always turn out to be impossible, they waste time, and they make it harder to find the real issues that need investigation.
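One way to make the "PoC or it didn't happen" rule concrete is a triage gate that only escalates reports whose proof of concept actually reproduces. `Report`, its `poc` field, and `triage` are hypothetical names for illustration; in practice a PoC would run in a sandbox against a test build, not inline like this.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Report:
    """Hypothetical vulnerability report; `poc` returns True if the bug reproduces."""
    title: str
    poc: Optional[Callable[[], bool]] = None


def triage(reports: List[Report]) -> Tuple[List[Report], List[Report]]:
    """Split reports into (confirmed, needs_poc).

    Only reports with a working PoC reach a maintainer; theoretical ones
    go back to the submitter instead of eating review time.
    """
    confirmed, needs_poc = [], []
    for report in reports:
        if report.poc is not None and report.poc():
            confirmed.append(report)
        else:
            needs_poc.append(report)
    return confirmed, needs_poc
```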

[–] village604@adultswim.fan 14 points 23 hours ago (2 children)

Cybersecurity is actually one of the few fields that can benefit from AI. There are companies like Horizon3 who are using it alongside their other threat models to do continuous pen testing.

[–] Chronographs@lemmy.zip 11 points 22 hours ago (1 children)

Yeah imo the one thing ai is legitimately useful for is finding answers to difficult problems that can be trivially verified as correct.
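The "trivially verified" pattern can be sketched as generate-and-check: an unreliable source (standing in for a model) proposes candidates, many of them wrong, and a cheap exact check keeps only a correct one. Factoring is used here purely as an illustrative assumption, because checking `p * q == n` is easy even when finding `p` and `q` is hard; `first_verified` and `is_factorization` are hypothetical helpers.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_verified(candidates: Iterable[T], verify: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate that passes the cheap check, or None."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def is_factorization(n: int) -> Callable[[Tuple[int, int]], bool]:
    """Build a checker for 'p * q == n with a non-trivial p'."""
    return lambda pq: 1 < pq[0] < n and pq[0] * pq[1] == n


# Guesses from an unreliable source: each wrong one costs only a multiplication.
guesses = [(3, 5), (1, 77), (7, 11)]
print(first_verified(guesses, is_factorization(77)))  # → (7, 11)
```

Because the check is exact, wrong guesses are harmless noise that gets discarded, not false positives that a human has to chase.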

[–] MalReynolds@slrpnk.net 3 points 20 hours ago

In this case hallucinations actually help...

[–] Duke_Nukem_1990@feddit.org -2 points 19 hours ago (1 children)

Gonna take a guess here that what's used in cybersecurity is not LLMs but one of the more useful machine learning applications. Just a nitpick, because today "AI" and "LLM" are sadly synonymous.

[–] boonhet@sopuli.xyz 9 points 18 hours ago (1 children)

No, LLMs can definitely be useful for cyber too. It's the whole reason the US government banned Claude Fable for export.

An LLM can do more than just try existing exploits like a script kiddie: with iteration it can try variations, and if you know what runs on the server, it can inspect the source for potential exploits.

They can also look at your setup and say what issues they see (reverse proxy config, etc).

Doesn't replace an expert, but can be useful for a first pass before you get the highly paid people involved.

[–] Duke_Nukem_1990@feddit.org 2 points 18 hours ago

You know what, fair enough. I don't know enough about that particular one.

[–] DeadDigger@lemmy.zip 4 points 22 hours ago (1 children)

Well, the problem is that curl, for example, got flooded with generated security reports where only 5% had any real security potential. So your LLM will basically flood you with false positives.

[–] ByteJunk@lemmy.world 5 points 21 hours ago (1 children)

If 5% of the reports are genuine security vulnerabilities that they wouldn't have found otherwise, that's looking like a big win to me, not sure how you see it differently.

[–] frongt@lemmy.zip 3 points 21 hours ago (1 children)

The problem is identifying which 5%. Nobody wants to filter that much AI slop.

[–] AwesomeLowlander@sh.itjust.works 7 points 20 hours ago (3 children)

If you're working for a company's cybersec, that's your job. And a much preferable one to waiting for an attacker to do it for you.

[–] borari@lemmy.dbzer0.com 3 points 9 hours ago

If you’re submitting a vulnerability to a public repo, that’s also your job. These slop reports that are wasting maintainers’ time should never have been submitted. The person tasking the LLM is out of their depth and can’t be the human in the loop who verifies the vulnerability report before submitting, because they don’t have the required knowledge. It’s a shame, because if people with the requisite knowledge were the ones submitting, the ratio of valid reports to noise would be way higher than 5% and open source maintainers wouldn’t be feeling burned the fuck out.

[–] ByteJunk@lemmy.world 5 points 13 hours ago

Exactly. If you go through 100 tickets and find 5 real vulnerabilities to patch, that sounds incredibly good...

[–] frongt@lemmy.zip 1 points 10 hours ago

Sure, but nobody wants to do that, even at fair pay. Unpaid open source volunteer projects REALLY don't want to do that, and risk burning out what is typically a solo main dev.