Technology


This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 7 years ago

X has refused to take down dozens of social media posts reported as “hate, abuse or harassment” in which prominent UK politicians, including Kemi Badenoch, have been racially abused.

In May, researchers from the social inclusion thinktank British Future reported 30 posts from this year in which the Conservative party leader was called the N-word. In each case the researchers used the platform’s “hate, abuse or harassment” reporting option. X refused to act in the majority of cases, despite repeated requests.

The Guardian understands X routinely takes action only when posts are reported to it as illegal under the UK’s Online Safety Act. In those cases, it restricts visibility in the UK, leaving the post unrestricted in other jurisdictions.

Anthropic said it will “abruptly disable” its most advanced AI models for all users after the US government ordered it to suspend access to the models for foreign nationals, citing national security concerns.

The company received the export control directive to suspend access to Fable 5 and Mythos 5 for all foreign nationals, without being given specific details of the national security concern, Anthropic said in a statement.

It is Anthropic’s understanding that the government believes there is a method of bypassing, or “jailbreaking”, a safeguard that would prevent Fable 5 from being used in identifying software vulnerabilities, the company said.

There's a really interesting quirk in modern LLM architectures that a lot of people have been noticing lately, which the paper calls the Curse of Depth. Basically, if you look at popular models like Llama, Qwen, or DeepSeek, you will find that the deeper layers contribute surprisingly little. You can prune away large chunks of the later transformer blocks with little loss in model performance. The representations in these deep layers end up looking practically identical to each other, and it's a massive waste of GPU hours, because we are training billions of parameters that end up doing almost nothing.

The authors trace the root cause directly to Pre-Layer Normalization (Pre-LN). Pre-LN makes training massive transformers way more stable than the old Post-LN setups, but the catch is that as data passes through more and more Pre-LN layers, the output variance grows exponentially with depth. Because of how the math works out, this exploding variance pushes the derivative of each deep block toward an identity matrix, turning the block into a pass-through that cannot learn any meaningful new transformation.
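To make the pass-through argument a bit more concrete, here is a minimal pure-Python sketch (not from the paper). It assumes the usual Pre-LN block form x + F(LN(x)), and relies on the fact that layer norm ignores the overall scale of its input: the larger the incoming activations, the less a fixed-size nudge to the input changes LN(x), so the F(LN(x)) branch contributes less and less to the block's derivative and only the identity path is left. The names `layer_norm`, `ln_sensitivity`, and `base` are made up for illustration.

```python
import math

def layer_norm(x, eps=1e-12):
    """Plain layer norm over a list of floats (no learnable gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def ln_sensitivity(x, delta=1e-3):
    """How far LN(x) moves when the first coordinate of x is nudged by delta."""
    bumped = [x[0] + delta] + list(x[1:])
    before, after = layer_norm(x), layer_norm(bumped)
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(before, after)))

# Hypothetical activation vector; only its scale changes between runs.
base = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]

for scale in (1.0, 10.0, 100.0):
    x = [scale * v for v in base]
    # The sensitivity shrinks roughly in proportion to 1/scale.
    print(scale, ln_sensitivity(x))
```

This only illustrates the mechanism on a single vector. It does not reproduce the variance growth across a real stack of layers.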

And it turns out that the problem can be fixed with a remarkably simple tweak called Layer Norm Scaling. They literally just scale the output of each layer norm by the inverse square root of the layer's depth, i.e. multiply it by 1/sqrt(l) for layer l. This stops the variance from blowing up as you go deeper into the network. Because the variance stays under control, the deep layers actually wake up and start contributing to the representation learning.
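As a rough sketch of what that scaling looks like in code, the snippet below applies a plain layer norm and then multiplies the result by 1/sqrt(layer_index). It is an illustration under assumptions, not the authors' implementation: the layer index is assumed to start at 1, the learnable gain and bias of a real layer norm are left out, and the function names are hypothetical.

```python
import math

def layer_norm(x, eps=1e-5):
    """Plain layer norm over a list of floats (no learnable gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def scaled_layer_norm(x, layer_index, eps=1e-5):
    """Layer norm output multiplied by 1/sqrt(layer_index).

    layer_index is assumed to be 1-based (first block = 1).
    """
    if layer_index < 1:
        raise ValueError("layer_index must be >= 1")
    factor = 1.0 / math.sqrt(layer_index)
    return [factor * v for v in layer_norm(x, eps)]

def variance(x):
    """Population variance of a list of floats."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

# Hypothetical activation vector fed to layers at different depths.
x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]

for layer_index in (1, 4, 16):
    out = scaled_layer_norm(x, layer_index)
    # The output variance comes out close to 1 / layer_index.
    print(layer_index, variance(out))
```

In a real model the same factor would be applied to the normalized activations inside each transformer block, with the block's own depth as `layer_index`.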

They tested this trick on models ranging from tiny 130M-parameter setups all the way to 7B-parameter models. In every case, Layer Norm Scaling beat standard Pre-LN and other normalization tricks. The pre-training loss drops significantly, and those gains carry over into supervised fine-tuning tasks. Best of all, it requires zero new hyperparameters or learnable weights. It is just a clean mathematical fix to a fundamental architectural flaw.

cross-posted from: https://hexbear.net/post/8729236

A German court has ruled that Google is directly liable for what its AI search overviews say. Previous case law shielding search engine operators from liability doesn't apply to AI overviews.
