this post was submitted on 31 May 2025
84 points (78.4% liked)

[–] WalnutLum@lemmy.ml 60 points 1 week ago (15 children)

The researcher's blog post is a more interesting read.

Important points here about benchmarking:

o3 finds the kerberos authentication vulnerability in the benchmark in 8 of the 100 runs. In another 66 of the runs o3 concludes there is no bug present in the code (false negatives), and the remaining 28 reports are false positives. For comparison, Claude Sonnet 3.7 finds it 3 out of 100 runs and Claude Sonnet 3.5 does not find it in 100 runs.

o3 finds the kerberos authentication vulnerability in 1 out of 100 runs with this larger number of input tokens, so a clear drop in performance, but it does still find it. More interestingly however, in the output from the other runs I found a report for a similar, but novel, vulnerability that I did not previously know about. This vulnerability is also due to a free of sess->user, but this time in the session logoff handler.

I'm not sure a signal-to-noise ratio of 1:100 is, uh... great.
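For anyone who wants to put numbers on that, here is a rough sketch that turns the benchmark counts quoted above into a detection rate and a report precision. The function name and the metric names are my own, not the researcher's. Note that the quoted counts (8 + 66 + 28) add up to 102 rather than 100, so the sketch uses the sum of the counts as the denominator instead of assuming 100. The larger-context run is left out because the quote only gives the 1-in-100 hit rate, not the breakdown of the other runs.

```python
def report_stats(true_pos, false_pos, no_bug):
    """Summarise a batch of runs from three raw counts.

    true_pos:  runs that reported the real vulnerability
    false_pos: runs that reported a bug that is not there
    no_bug:    runs that concluded the code has no bug (misses)
    """
    total = true_pos + false_pos + no_bug
    reports = true_pos + false_pos  # runs a human has to triage
    return {
        "total_runs": total,
        # share of all runs that found the real bug
        "detection_rate": true_pos / total,
        # share of triaged reports that were correct
        "report_precision": true_pos / reports if reports else 0.0,
        # wrong reports read per correct one
        "false_reports_per_hit": false_pos / true_pos if true_pos else float("inf"),
    }


# Counts as quoted from the blog post for the o3 benchmark run.
benchmark = report_stats(true_pos=8, false_pos=28, no_bug=66)

for name, value in benchmark.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

On these counts the reviewer reads 3.5 wrong reports for every correct one in the benchmark setting, which is a friendlier ratio than 1:100, but it only covers the small-context run.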

[–] drspod@lemmy.ml 24 points 1 week ago (7 children)

If the researcher had spent as much time auditing the code as he did evaluating the merit of hundreds of incorrect LLM reports, he would no doubt have found the second vulnerability himself.

[–] DarkDarkHouse 1 points 1 week ago (4 children)

And if Gutenberg had just written faster, he would've produced more books in the first week?

[–] WalnutLum@lemmy.ml 5 points 1 week ago (1 children)

I'm not sure the Gutenberg press would have been the literary revolution it was if it had produced only one readable copy for every 100 printed.

[–] DarkDarkHouse -1 points 1 week ago

I agree it's not brilliant, but it's early days. If you're looking to mechanise a process like finding bugs, you have to start somewhere: determine how to measure success, set performance baselines and all that.
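As one possible way to set a baseline, here is a minimal sketch that puts a 95% Wilson score interval around the per-model hit rates quoted earlier in the thread (8, 3 and 0 finds out of 100 runs). The choice of the Wilson interval and the helper name are my assumptions, not anything from the blog post. The point is only that with 100 runs per model the intervals are wide, so a baseline needs to come with error bars.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Finds per 100 runs, as quoted in the thread.
quoted_hits = {
    "o3": 8,
    "Claude Sonnet 3.7": 3,
    "Claude Sonnet 3.5": 0,
}

intervals = {model: wilson_interval(hits, 100) for model, hits in quoted_hits.items()}

for model, (low, high) in intervals.items():
    print(f"{model}: {quoted_hits[model]}/100, 95% interval {low:.3f} to {high:.3f}")
```

If the intervals for two models overlap heavily, 100 runs are not enough to say which one is the better baseline.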
