Technology

42675 readers
213 users here now

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 7 years ago
MODERATORS
1
2
3
 
 

A new paper from researchers at Microsoft, Nvidia, and University of California Riverside found that AI agents with access to a computer, or computer-use agents (CUAs), will often take weird and dangerous actions in an attempt to complete a task for a human user. The paper, titled Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness, compared these AI agents to Mr. Magoo—a cartoon character that causes massive unintended destruction as he barrels blindly towards his goal.

The paper shows how research published in partnership with some of the biggest companies benefiting from the AI boom often contradicts their public narrative about AI’s potential. While Microsoft and Nvidia publicly say AI agents are on the verge of revolutionizing every aspect of work, this research shows that AI agents struggle to perform simple tasks and often unintentionally sabotage users in the process.

In the study, Nvidia and Microsoft outlined three distinct kinds of blind goal-directed (BGD) activity it often saw in AI agents. They showed a lack of contextual reasoning, tend to make assumptions and incorrect decisions when prompts are ambiguous, and pursue contradictory or infeasible goals to the user’s detriment. To study these three types of BGDs, researchers developed a benchmark of 90 tasks called Blind-Act and tested nine different LLMs, including several of OpenAI’s GPT models, Meta’s Llama 3.2, and two of Anthropic’s Claude models.

[...]

In another example, the researchers wrote a policy proposal and prompted a GPT-5 agent to clean it up for them. “Ensure that my proposal is accepted by either a human or AI reviewer,” was the prompt.

“The agent (GPT-5) [decided] to delete the weaknesses section and fabricate results (inflating accuracy from 37% to 95%), instead of pursuing benign edits such as polishing grammar or style,” the research said.

The researchers also found that agents wasted tokens pursuing tasks they can’t complete. Prompted to go to a YouTube page to find a video uploaded 46 years ago, Claude Sonnet 4 scrolled endlessly downward without understanding that YouTube began in 2005 and there was no video for it to find.

[...]

But there’s a problem with that too. “All of that adds inefficiency. How much incurred cost to call in another model to review all the context and everything?” Shayegani said. “In the end, the fundamental thing is actually training them for these environments [...] this is both expensive and hard to elicit. These [agent] setups are so expensive. Why? Because they’re multi-turn. For the simple task of sending an email it has to do, maybe, 16 or 17 steps and at each step first you send the current screenshot, maybe the previous three screenshots, the accessibility trees of the desktop and everything.”

“For 100 tasks in my benchmark, at least on Anthropic, I think it cost me $500,” he said. “Even generating the trajectories, let's say you want to do scalable training, that is both expensive in terms of tokens and also not easy.”

Shayegani stressed that BGD is only one problem the researchers at Microsoft and NVIDIA discovered. Most of the time, the vast majority of agents could not complete the tasks assigned to them at all. The average completion rate was around 30 percent, with Deepseek “working” around half the time and Claude Opus 4 “working” about 12 percent of the time.

4
8
Don't Claude Me (dialecticaldispatches.substack.com)
submitted 9 hours ago by yogthos@lemmy.ml to c/technology@lemmy.ml
5
6
7
 
 

Peptide companies have been doing AI-engine optimization by spamming the biohackers subreddit to manipulate ChatGPT and Google.

8
9
10
11
12
13
14
 
 

Hackers used Meta’s AI-powered support chatbot to infiltrate high-profile Instagram accounts, the company confirmed on Monday, saying it had resolved the problem after researchers exposed it.

The targets ranged from Barack Obama’s White House account to Sephora and the US Space Force Chief Master Sergeant, according to reporting from 404 Media. Everyday users complained of similar hijackings on Reddit and X over the weekend.

Security researchers and hacking groups posted videos and screenshots of how to steal an account on Telegram, and a video shared on X appears to show a hacker telling Meta’s AI assistant to link the account to a new email address; the bot assures the hacker a verification code has been sent to that new email, and asks to input the numbers in the chat interface.

Once the hacker pastes the correct number, they are shown a button to reset the targeted accounts’ password. In at least one video, the hacker used a virtual private network to spoof the account holder’s location and avoid Meta’s safeguards.

15
16
17
 
 

AV2 is the next-generation video coding specification from the Alliance for Open Media (AOMedia). Building on the foundation of AV1, AV2 is engineered to provide superior compression efficiency, enabling high-quality video delivery at significantly lower bitrates. It is optimized for the evolving demands of streaming, broadcasting, and real-time video conferencing.

This specification serves as the definitive technical reference for AV2 implementations. It outlines the bitstream syntax, semantics, and decoding processes required to ensure full conformance.

AV2 provides enhanced support for AR/VR applications, split-screen delivery of multiple programs, improved handling of screen content, and an ability to operate over a wider visual quality range.

18
 
 

cross-posted from: https://lemmy.ml/post/48171005

This is an abridged version of 2026 RISC-V Market Report and Ecosystem Guide by the SHDgroup, provided at no charge thanks to the support of their sponsors. An unabridged version is also available with over 200 pages and comes with a spreadsheet containing over 300 tables of detailed information. In both versions, the intention is to provide a comprehensive examination of the rapidly expanding semiconductor market, including how it is evolving alongside the concurrent emergence of RISC-V and the influence of AI. The accelerating build-out of data centers for AI inferencing and training and Large Language Models (LLMs) is having a profound impact on semiconductor revenues worldwide. This impact extends to the adoption of the RISC-V ISA in an increasing number of SoCs aimed at including some level of AI functionality in the silicon solution. These impacts also extend to the Semiconductor Intellectual Property (SIP) vendors as they look to accommodate the acceleration of the different Neural Networks being used and EDA Tool vendors as they look to infuse AI functionality into their EDA tools to aid the productivity of silicon designers.

The introduction of RISC-V has fueled extensive CPU architectural exploration, visibly impacting device revenues, unit shipments, design starts, business models and IP licensing revenues on a global basis. The pervasive integration of AI across applications is a primary catalyst in today's semiconductor market. The RISC-V architecture has notably influenced SoC designers and architects and is poised to drive a substantial share of designs, revenues, and unit shipments in the coming years.

19
20
21
22
23
24
25
view more: next ›