this post was submitted on 07 Jun 2026
122 points (85.1% liked)

Technology

[–] Mondez@lemdro.id 101 points 1 day ago (4 children)

What these articles never say is how many hallucinated bugs the LLM found that either weren't real or weren't actually exploitable. The LLM didn't find these with any confidence; it highlighted areas of interest that actual security researchers then needed to investigate and confirm or rule out.

[–] hard_zero1@discuss.tchncs.de 20 points 1 day ago* (last edited 1 day ago) (1 children)

In the article it says the ffmpeg vulns were found by an "autonomous" agent and that it produced a proof-of-concept for each. So what do you base your claims on? They seem quite contrary to that.

Even if there was still a lot of human work involved, it seems that LLM agents can help a lot with security research, as the number of (real) zero-days being found recently (with the help of AI) seems to be spiking (judging from what I read, e.g. here on Lemmy, or from the number of security updates for my distro).

[–] Mondez@lemdro.id 4 points 1 day ago (1 children)

It states they were produced, which I'm taking to mean found, and it's ambiguously worded, so it's unclear whether the article is actually claiming it generated a PoC for all of them. It doesn't say how many hallucinated results, if any, were produced or how much effort was needed to fully confirm them; it basically downplays the human involvement.

It's great that so many bugs are being found and squashed, but it's implied LLMs are doing all the work when what they are actually doing is assisting as a tool.

[–] hard_zero1@discuss.tchncs.de 5 points 1 day ago* (last edited 1 day ago)

I agree that the wording is a bit ambiguous; I interpreted it the way that seems more natural to me. In the post by the researcher(s) themselves, it says in the tldr paragraph that the "agent produces concrete, reproducible PoC inputs to confirm its findings" but also that they (probably humans) "explored the exploitability of the issues and developed a PoC demonstrating a RCE exploit primitive". Apparently it finds the vulnerabilities very concretely, but humans were involved for the full-blown exploit. It also doesn't say much about the number of false positives.

I'm not in the business, so I can't tell how much work such agents are actually saving. Since the articles don't say much about the amount of human involvement, the impression they convey probably depends strongly on the reader's knowledge. But in my opinion it is a bit of a stretch to say this is downplaying it. It should be noted, though, that the article probably sources its information from a post by the company selling that AI.

With that information, the "without any confidence" and "area of interest" parts of your previous post still seem misleading.

[–] Cocodapuf@lemmy.world 2 points 1 day ago (1 children)

What these articles never say is how many hallucinated bugs the LLM found that either weren't real or weren't actually exploitable.

It literally wouldn't matter if it did.

The fact that it found exploitable bugs means that these bugs need to be addressed. To be clear, I care much more about the security flaws and fixing them than how they were discovered.

[–] wholookshere@lemmy.blahaj.zone 8 points 18 hours ago (1 children)

I feel like you missed the forest for the trees.

The question is how many were made up?

[–] Cocodapuf@lemmy.world 0 points 17 hours ago (1 children)

I saw that, and you're right, I wasn't answering that question. What I was saying was that I thought the question was irrelevant and ignoring a bigger issue.

[–] wholookshere@lemmy.blahaj.zone 2 points 17 hours ago

I disagree that it's ignoring the bigger problem, which is that slop like this is overwhelming devs, who are pushed to get fixes out ASAP as reports come in faster than they can fix them.

So now we have AI bug reports feeding AI bug fixes in a lot of projects.

The assumption that what AI finds is correct in the first place is... probably wrong.

It makes stuff up all the bloody time, so how many of these bugs were made up, or not actually bugs?