AI


Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality. The distinction between the former and the latter categories is often revealed by the acronym chosen.

founded 5 years ago
submitted 2 days ago* (last edited 2 days ago) by yogthos@lemmy.ml to c/artificial_intel@lemmy.ml
 
 

A new paper from Moonshot AI tackles a key bottleneck in how language models handle depth. Standard residual connections simply add up the outputs of all previous layers with fixed, uniform weights. That uniform addition makes hidden states grow uncontrollably as the network gets deeper, so the contributions of early layers end up diluted by the time the data reaches the end of the model.

This is the same issue older recurrent neural networks faced across time steps before attention mechanisms came along, and the authors tackle it in a similar way. Instead of a fixed accumulation, their attention residuals apply softmax attention over the outputs of the preceding layers. Every layer gets a learned pseudo-query vector that lets it select which earlier representations it actually needs. This allows the network to retrieve information from anywhere in its depth depending on the specific input.
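To make the idea concrete, here is a minimal sketch in plain Python, not the paper's implementation. It assumes the raw layer outputs serve as both keys and values and that scores are scaled by the square root of the dimension; the function names `plain_residual` and `attn_residual` are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def plain_residual(prev_outputs):
    # Standard residual stream: a uniform sum of every earlier output.
    d = len(prev_outputs[0])
    return [sum(h[i] for h in prev_outputs) for i in range(d)]

def attn_residual(prev_outputs, pseudo_query):
    # Assumption: earlier outputs act as both keys and values, and
    # pseudo_query stands in for the layer's learned query vector.
    d = len(pseudo_query)
    scores = [dot(pseudo_query, h) / math.sqrt(d) for h in prev_outputs]
    weights = softmax(scores)
    mixed = [sum(w * h[i] for w, h in zip(weights, prev_outputs))
             for i in range(d)]
    return mixed, weights

if __name__ == "__main__":
    outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    mixed, weights = attn_residual(outputs, [4.0, 0.0])
    print("uniform sum:", plain_residual(outputs))
    print("attention weights:", weights)
    print("attention mix:", mixed)
```

Because the softmax weights sum to one, the mix is a weighted average of earlier outputs rather than a sum that keeps growing with depth, which is the contrast the post describes.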

Applying this over every individual layer is called Full AttnRes, and it comes with a big catch: saving all those individual layer outputs creates memory and communication bottlenecks during large-scale distributed training, because the overhead scales linearly with the number of layers. To make the architecture usable in practice, they grouped the layers into blocks and summed the outputs inside each block. The cross-layer attention is then applied only over these compressed block-level summaries rather than over every single layer, drastically reducing the memory and communication footprint.
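A rough sketch of the block idea, again in plain Python and under my own assumptions: `block_summaries` and `attend` are hypothetical helper names, the block size is arbitrary, and the caching and scheduling tricks from the paper are not modeled at all.

```python
import math

def block_summaries(layer_outputs, block_size):
    # Sum the layer outputs inside each block so that only one vector
    # per block has to be kept around for cross-layer attention.
    d = len(layer_outputs[0])
    summaries = []
    for start in range(0, len(layer_outputs), block_size):
        block = layer_outputs[start:start + block_size]
        summaries.append([sum(h[i] for h in block) for i in range(d)])
    return summaries

def attend(vectors, pseudo_query):
    # Softmax attention over the given vectors (assumed to be both
    # keys and values), returning the weighted mix.
    d = len(pseudo_query)
    scores = [sum(q * x for q, x in zip(pseudo_query, v)) / math.sqrt(d)
              for v in vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(d)]

if __name__ == "__main__":
    layers = [[float(i), 1.0] for i in range(8)]
    summaries = block_summaries(layers, block_size=4)
    print("stored vectors:", len(layers), "->", len(summaries))
    print("mix over blocks:", attend(summaries, [0.0, 0.0]))
```

With 8 layers and a block size of 4, attention runs over 2 stored vectors instead of 8, so the stored state grows with the number of blocks rather than the number of layers.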

By combining the block structure with a cross-stage caching system and a two-phase computation strategy, the setup becomes a practical drop-in replacement with almost no training overhead. Their experimental results show that the performance boost holds up consistently across different model sizes.

 
 

I ended up creating one of the most advanced high-tech AI trailers. I think this is going to be the future of filmmaking, and AI will take over. Just look at this, it’s only 2 minutes: https://youtu.be/4Q2b1-CyC2Q You can leave your honest opinion in the comments. This is just way too good, and the characters speak with realism. What do y’all think, are we all cooked now? For me, I think this is pretty amazing.

 
 

cross-posted from: https://lemmy.world/post/47499598

We’ll soon get a chance to see whether evil corp Google, frankly our last hope, can still distinguish AI-generated content from human-made content 🤖

Here’s how I would rank the detection difficulty: 1️⃣ Text 2️⃣ Code 3️⃣ Images 4️⃣ Gifs 5️⃣ Videos If they already fail at level 5, we have a SERIOUS problem.

 
 

cross-posted from: https://sopuli.xyz/post/46320128

Literally speaking, there's some truth to Olah's musing. AI systems are not cold – Blackwell chips idle at 32 to 38°C. They are not calculating – they're bad at math. And they're not robots – AI models are specialized binary blobs of tensors and metadata that can be instantiated across multiple servers.

But the notion that there's some AI mystery in the spiritual sense is just hot garbage. 

AI systems are indeed "made from us, from our words" and that is why Anthropic and its rivals have been named in more than 100 lawsuits. One of the reasons those systems remain mysterious is that Anthropic and its rivals don't disclose where they got their training data.

 
 

Here is an excerpt of David Gerard and Amy Castor's post about it:

So what does this mean? What follows from this?

An encyclical isn’t a set of religious directives. It’s a position paper. The Pope has not charged Catholics worldwide to burn down the data centres. Cool as that would be.

Catholics probably can’t go into work and claim a religious exemption from Claude Code. Though the encyclical does give Catholics who hate AI an excellent set of talking points to answer that AI bro who just doesn’t stop.

The encyclical does not have direct consequences. Functionally, this is just a letter. But it’s a letter that’s in every newspaper this week. It’s going to be influential.

The Pope calls for government regulation to stop AI abuses. And up against that, we have a ton of money. But the encyclical will still give politicians a bit of think tank input on things they have to consider politically.

Even the rich tech bros are treating this encyclical as a threat. The AI companies lobbied the Vatican quite hard in the leadup to the encyclical. We’re not sure they got a lot of what they wanted. [Politico]

But again, an encyclical is just a letter. Pope Francis did a quite good encyclical on climate change in 2015. Then he followed that up in 2023 annoyed that nothing much had been done. There’s only so much a letter, even from the Pope, can do in the face of the money.

This encyclical will help swing the vibe against AI, however. Maybe J.D. Vance will excommunicate the Pope. You know he wants to.

 
 

cross-posted from: https://lemmy.world/post/47166185

cross-posted from: https://lemmy.world/post/47165287

It's pretty interesting to see how China can order a US company to unwind the purchase of another company in Singapore, after it was already complete.

🇺🇸 Meta > 🇸🇬 Manus ❌ 🇨🇳 Government

#SingaporeWashing

 
 

The change is a result of MTP support landing in llama.cpp. The Qwen3.6 Unsloth GGUFs are now out of experimental mode: llama.cpp has merged many PRs, and MTP is now properly supported in Unsloth.

https://unsloth.ai/docs/models/qwen3.6#mtp-guide

 
 

Delta-mem tackles a really annoying problem with current LLMs dealing with long contexts. Usually, when we want an agent or assistant to remember things over a long conversation, we just shove all the past text into the prompt. The problem is that standard attention gets computationally expensive as the context grows, and the models often suffer from context rot, where they forget or ignore the middle of the context anyway. Other approaches like RAG or LoRA edits either bring in noisy retrieval steps or lock the memory into static weights that do not update well on the fly.

The authors built something called delta-mem, which keeps the main LLM completely frozen and bolts on a tiny dynamic memory state. Instead of saving raw text, it compresses the history into a really small 8x8 matrix representing associative memory. As new tokens come in, it updates this matrix using a delta learning rule, which checks whether the current memory can predict the new information and writes only the residual difference into the state. It also has a forget gate to handle old info naturally. When the model generates a response, it reads from this compressed state to tweak the query and output of the standard attention mechanism. It's a clever way to inject memory directly into the forward pass without touching the core weights.
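For anyone who wants to see the delta rule in action, here is a toy sketch in plain Python. It is the textbook form of the rule, not the authors' code: the scalar `beta` (write strength) and `forget` (decay) parameters, the function names, and the choice to apply the decay after computing the prediction are all my assumptions. How the state feeds back into the attention query and output is not modeled here.

```python
def matvec(S, k):
    # Read from the associative memory: S @ k.
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_write(S, k, v, beta=1.0, forget=1.0):
    # Predict the value from the current memory, then write only the
    # part the memory got wrong (the residual), as an outer product.
    pred = matvec(S, k)
    residual = [v_i - p_i for v_i, p_i in zip(v, pred)]
    return [[forget * s_ij + beta * r_i * k_j for s_ij, k_j in zip(row, k)]
            for row, r_i in zip(S, residual)]

if __name__ == "__main__":
    dim = 8  # matches the 8x8 state mentioned in the post
    S = [[0.0] * dim for _ in range(dim)]
    key = [1.0] + [0.0] * (dim - 1)
    value = [float(i + 1) for i in range(dim)]

    S = delta_write(S, key, value)
    print("read back:", matvec(S, key))

    # Writing the same pair again changes nothing: the memory already
    # predicts the value, so the residual is zero.
    print("unchanged:", delta_write(S, key, value) == S)
```

The second write is the interesting part: information the memory can already predict costs nothing to "store" again, which is why the state can stay so small.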

They also tested a few ways to write to this memory. You can update it token by token which is great for local details but prone to noise. You can average out a whole message segment and write that which smooths things out for stronger models. Or you can split the memory into multiple parallel states so facts and task progress do not overwrite each other which turned out to be really helpful for smaller backbones.
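The first two write strategies can be sketched the same way. This is only an illustration under assumptions: I average both keys and values over the segment, the names `write_tokenwise` and `write_segment_mean` are made up, and the multi-state variant is left out because the post does not say how writes get routed between states.

```python
def matvec(S, k):
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_write(S, k, v, beta=1.0):
    # Delta rule: write only the residual the memory failed to predict.
    pred = matvec(S, k)
    residual = [v_i - p_i for v_i, p_i in zip(v, pred)]
    return [[s_ij + beta * r_i * k_j for s_ij, k_j in zip(row, k)]
            for row, r_i in zip(S, residual)]

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def write_tokenwise(S, pairs, beta=1.0):
    # One update per (key, value) pair: keeps local detail, but a noisy
    # last token can overwrite what came before.
    for k, v in pairs:
        S = delta_write(S, k, v, beta)
    return S

def write_segment_mean(S, pairs, beta=1.0):
    # One update per segment, using the averaged key and value
    # (assumption: both are averaged the same way).
    k_avg = mean([k for k, _ in pairs])
    v_avg = mean([v for _, v in pairs])
    return delta_write(S, k_avg, v_avg, beta)

if __name__ == "__main__":
    empty = [[0.0, 0.0], [0.0, 0.0]]
    # Two noisy observations of the same fact under the same key.
    pairs = [([1.0, 0.0], [2.0, 0.0]), ([1.0, 0.0], [4.0, 0.0])]
    print("token-wise read:", matvec(write_tokenwise(empty, pairs), [1.0, 0.0]))
    print("segment read:", matvec(write_segment_mean(empty, pairs), [1.0, 0.0]))
```

In this toy case the token-wise write ends up holding the last noisy value, while the segment write holds the average, which matches the "prone to noise" versus "smooths things out" trade-off described above.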

They tested it on Qwen models and it bumped the average scores significantly especially on memory heavy benchmarks like LoCoMo and Memory Agent Bench. The coolest finding is the context recovery test. They actually deleted the explicit textual history from the prompt and the model could still answer multi-hop questions using just the compressed 8x8 state. It heavily implies that we might not need massive million token context windows if we can figure out how to compress and stream memory directly into the attention layers efficiently. Plus the parameter overhead is microscopic at roughly 0.12 percent of the backbone size.

 
 

The AI Layoff Bill Is Coming Due, And CTOs Are Going To Pay It Twice

>We keep talking about vibe coding and AI adoption as if the only question is whether developers will be replaced. That framing misses the story. The story is that a specific kind of executive, the one who needs a progressive headline every quarter, has been running an uncontrolled experiment on your workforce. The data on that experiment is now in, and it is not flattering.

https://www.forbes.com/councils/forbestechcouncil/2026/05/14/the-ai-layoff-bill-is-coming-due-and-ctos-are-going-to-pay-it-twice/
@artificial_intel

 
 

After testing about 40 platforms the pattern is always the same. Memory is the most marketed and least delivered feature. Most apps claim to remember you but reset completely between sessions or just store what you typed in your profile manually. The ones that actually carry conversational context across weeks are rare and the difference in experience is significant. An AI that references something you mentioned three weeks ago without prompting feels qualitatively different from one treating every session like a first meeting. Just published a full breakdown: medium.com/@companaya/nomi-ai-review-2026-is-it-worth-it-tested-c91811dcb24a
