Technology


A tech news sub for communists


PE-AV - Audiovisual Perception with Code

  • Meta's perception encoder for audio-visual understanding with open code release.
  • Processes both visual and audio information to isolate sound sources.
  • Paper | Code

https://preview.redd.it/k6lp7cgbou8g1.png?width=1456&format=png&auto=webp&s=f928bbd8d184e9094e7130cb36adff5f51830a80

T5Gemma 2 - Open Encoder-Decoder

  • Next generation encoder-decoder model with full open-source weights.
  • Combines bidirectional understanding with flexible text generation.
  • Blog | Model
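
To make the encoder-decoder split concrete, here is a minimal inference sketch using the standard Hugging Face seq2seq interface. The model id is a placeholder, not the confirmed T5Gemma 2 repository name; the linked blog and model card give the real one.

```python
# Minimal encoder-decoder inference sketch using the standard Hugging Face
# seq2seq API. The model id below is a placeholder, not the confirmed
# T5Gemma 2 repository name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "google/t5gemma-2-placeholder"  # assumption: substitute the real checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

# The encoder reads the whole input bidirectionally; the decoder then
# generates text token by token conditioned on that encoding.
inputs = tokenizer("Summarize: open encoder-decoder models pair bidirectional "
                   "understanding with flexible generation.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```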

Qwen-Image-Layered - Open Image Decomposition

  • Decomposes images into editable RGBA layers with full model release.
  • Each layer can be independently manipulated for precise editing.
  • Hugging Face | Paper | Demo

https://reddit.com/link/1ptg2x9/video/72skjufkou8g1/player
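
The point of the RGBA-layer decomposition is that each layer can be edited and recomposited independently. A minimal Pillow sketch of that recompositing step, assuming the model has already exported per-layer RGBA PNGs (the filenames here are hypothetical):

```python
# Recomposite RGBA layers exported by a layer-decomposition model.
# The layer filenames are hypothetical; any stack of same-size RGBA
# PNGs (background first, foreground last) works.
from PIL import Image

layer_paths = ["layer_0_background.png", "layer_1_subject.png", "layer_2_text.png"]
layers = [Image.open(p).convert("RGBA") for p in layer_paths]

# Edit one layer independently, e.g. make the text layer semi-transparent
# without touching the subject or background.
faded_alpha = layers[2].getchannel("A").point(lambda a: a // 2)
layers[2].putalpha(faded_alpha)

# Alpha-composite back to front: each layer is blended over the
# accumulated canvas using its own alpha channel.
canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
for layer in layers:
    canvas = Image.alpha_composite(canvas, layer)

canvas.save("recomposited.png")
```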

N3D-VLM - Open 3D Vision-Language Model

  • Native 3D spatial reasoning with open weights and code.
  • Understands depth and spatial relationships without 2D distortions.
  • GitHub | Model

https://reddit.com/link/1ptg2x9/video/h1npuq1mou8g1/player

Generative Refocusing - Open Depth Control

  • Controls depth of field in images with full code release.
  • Simulates camera focus changes through 3D scene inference.
  • Website | Demo | Paper | GitHub
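
The released system infers a full 3D scene before changing focus, but the underlying idea, that blur should grow with distance from the chosen focal plane, can be illustrated with a classical depth-of-field approximation over a depth map. A rough sketch, with hypothetical input files:

```python
# Classical depth-of-field approximation over a depth map: pixels far from
# the chosen focal depth get progressively stronger blur. This only
# illustrates the idea; the actual system infers a 3D scene rather than
# blurring a flat depth map.
import cv2
import numpy as np

image = cv2.imread("photo.png").astype(np.float32)              # hypothetical input
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE) / 255.0   # 0 = near, 1 = far

focal_depth = 0.4   # depth plane to keep sharp
max_sigma = 8.0     # blur strength at the largest focus error
levels = 6

# Precompute a small stack of increasingly blurred copies, then pick a
# level per pixel from the focus error |depth - focal_depth|.
stack = [image] + [cv2.GaussianBlur(image, (0, 0), max_sigma * i / (levels - 1))
                   for i in range(1, levels)]
error = np.abs(depth - focal_depth)
index = np.clip((error / error.max()) * (levels - 1), 0, levels - 1).astype(int)

result = np.zeros_like(image)
for i, blurred in enumerate(stack):
    mask = (index == i)
    result[mask] = blurred[mask]

cv2.imwrite("refocused.png", result.astype(np.uint8))
```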

StereoPilot - Open 2D to 3D Conversion

  • Converts 2D videos to stereo 3D with open model and code.
  • Full source release for VR content creation.
  • Website | Model | GitHub | Paper

https://reddit.com/link/1ptg2x9/video/homrv9tmou8g1/player
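
For context on what 2D-to-3D conversion involves, the naive classical baseline is depth-image-based rendering: shift each pixel horizontally by a disparity proportional to its depth to synthesize the second eye's view. StereoPilot's learned model replaces this and fills the holes it leaves behind; the sketch below uses synthetic data so it runs standalone.

```python
# Naive depth-image-based rendering (DIBR): synthesize a right-eye view by
# shifting pixels horizontally in proportion to depth. Real 2D-to-3D systems
# additionally inpaint the disocclusion holes this leaves behind.
import numpy as np

def synthesize_right_view(left, depth, max_disparity=24):
    """left: (H, W, 3) uint8 image; depth: (H, W) floats in [0, 1], 1 = near."""
    h, w, _ = left.shape
    right = np.zeros_like(left)
    disparity = (depth * max_disparity).astype(int)   # nearer pixels shift more
    xs = np.arange(w)
    for y in range(h):
        target_x = np.clip(xs - disparity[y], 0, w - 1)
        right[y, target_x] = left[y, xs]              # later writes win on collisions
    return right   # zeros remain where no source pixel landed (holes)

# Synthetic example so the sketch runs standalone.
left = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)
depth = np.tile(np.linspace(1.0, 0.0, 160), (120, 1))   # near on the left, far on the right
right = synthesize_right_view(left, depth)
```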

Chatterbox Turbo - MIT Licensed TTS

  • State-of-the-art text-to-speech under permissive MIT license.
  • No commercial restrictions or cloud dependencies.
  • Hugging Face

https://reddit.com/link/1ptg2x9/video/iceqr03jou8g1/player

FunctionGemma - Open Function Calling

  • Lightweight 270M parameter model for function calling with full weights.
  • Creates specialized function calling models without commercial restrictions.
  • Model
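
A function-calling model only needs to emit a structured tool call; the surrounding plumbing is ordinary code. A model-agnostic sketch of that loop, where `call_model` is a stub standing in for the actual FunctionGemma inference call:

```python
# Model-agnostic function-calling plumbing: the model is asked to emit a JSON
# object naming a tool and its arguments; everything else is ordinary dispatch
# code. `call_model` is a stub standing in for the real 270M checkpoint.
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}, 22 C",   # toy implementation
}

TOOL_SCHEMA = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}]

def call_model(prompt: str) -> str:
    # Stub: a real deployment would run the model here and return its raw
    # text. This canned reply keeps the sketch runnable.
    return '{"name": "get_weather", "arguments": {"city": "Beijing"}}'

def run_turn(user_message: str) -> str:
    prompt = f"Tools: {json.dumps(TOOL_SCHEMA)}\nUser: {user_message}\nCall:"
    call = json.loads(call_model(prompt))           # expect {"name": ..., "arguments": {...}}
    return TOOLS[call["name"]](**call["arguments"])

print(run_turn("What is the weather in Beijing?"))
```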

FoundationMotion - Open Motion Analysis

  • Labels spatial movement in videos with full code and dataset release.
  • Automatic motion pattern identification without manual annotation.
  • Paper | GitHub | Demo | Dataset
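
As a point of comparison for what automatic motion labelling extracts, the classical label-free baseline is dense optical flow between consecutive frames, which already yields per-pixel motion vectors. A short OpenCV sketch; the video path is hypothetical, and the released system uses learned models rather than this:

```python
# Classical baseline for unlabeled motion analysis: dense optical flow between
# consecutive frames gives a per-pixel motion vector field. The video path is
# hypothetical.
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Crude "motion label": dominant direction of the fastest-moving pixels.
    moving = magnitude > np.percentile(magnitude, 95)
    mean_angle = float(np.degrees(angle[moving].mean())) if moving.any() else 0.0
    print(f"fast pixels: {moving.sum():6d}  dominant direction: {mean_angle:5.1f} deg")
    prev_gray = gray

cap.release()
```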

DeContext - Open Image Protection

  • Protects images from unwanted AI edits with open-source implementation.
  • Adds imperceptible perturbations that block manipulation while preserving quality.
  • Website | Paper | GitHub
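
Methods in this family generally add a small adversarial perturbation that disrupts an editing model's internal features while staying below a pixel budget. A minimal PGD-style sketch of that idea against a generic feature encoder; this is not DeContext's actual objective, encoder, or code:

```python
# Minimal PGD-style sketch of a protective perturbation: nudge the image
# within an L-infinity budget so a feature encoder's embedding drifts away
# from its original value. Illustrative only; not DeContext's method.
import torch
import torchvision.models as models

encoder = models.resnet18(weights=None).eval()   # stand-in feature extractor

def protect(image: torch.Tensor, epsilon=8 / 255, steps=20, alpha=2 / 255):
    """image: (1, 3, H, W) float tensor in [0, 1]."""
    with torch.no_grad():
        target = encoder(image)                  # original features to push away from
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = -torch.nn.functional.mse_loss(encoder(image + delta), target)
        loss.backward()
        with torch.no_grad():
            delta += alpha * (-delta.grad).sign()      # ascend the feature-drift objective
            delta.clamp_(-epsilon, epsilon)            # keep the change imperceptible
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels in valid range
        delta.grad.zero_()
    return (image + delta).detach()

protected = protect(torch.rand(1, 3, 224, 224))
```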

EgoX - Open Perspective Transformation

  • Transforms third-person videos to first-person with full code release.
  • Maintains spatial coherence during viewpoint conversion.
  • Website | Paper | GitHub

https://reddit.com/link/1ptg2x9/video/2h8x59qpou8g1/player

Step-GUI - Open GUI Automation

  • SOTA GUI automation with self-evolving pipeline and open weights.
  • Full code and model release for interface control.
  • Paper | GitHub | Model
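
The agent loop around a GUI-control model is simple: capture the screen, ask the model for the next action, execute it, repeat. A skeleton of that loop, where `choose_action` is a stub standing in for the Step-GUI model and the pyautogui calls handle execution:

```python
# Skeleton of a GUI-automation agent loop: capture the screen, ask a model for
# the next action, execute it. `choose_action` is a stub standing in for the
# actual Step-GUI model; the pyautogui calls are the execution side.
import time
import pyautogui

def choose_action(screenshot, goal):
    # Stub: a real agent would send the screenshot and goal to the model and
    # parse its reply. This canned action keeps the sketch self-contained.
    return {"type": "click", "x": 100, "y": 200}

def run_agent(goal, max_steps=5):
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()          # PIL image of the current screen
        action = choose_action(screenshot, goal)
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"], interval=0.02)
        elif action["type"] == "done":
            break
        time.sleep(0.5)                              # let the UI settle before the next look

run_agent("open the settings dialog")
```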

IC-Effect - Open Video Effects

  • Applies video effects through in-context learning with code release.
  • Learns effect patterns from examples without fine-tuning.
  • Website | GitHub | Paper

cross-posted from: https://lemmygrad.ml/post/10152204

I really enjoy seeing that people are indeed working on anticapitalist licensing. I can already see the argument that it will not be enough, or that it may even be a hindrance. I personally think it is a great step in the right direction, but only the first.

Feel free to disagree, criticize and make suggestions.

o7


Beijing is pouring vast resources into fusion research, while the U.S. wants private industry to lead the way. The winner could reshape civilization.

Archive link: https://archive.is/20251216100920/https://www.nytimes.com/2025/12/13/climate/china-us-fusion-energy.html


Moore Threads just pulled the wraps off its next-gen Flower Harbor architecture — and the numbers are brutal for US chip dominance. 15× gaming performance. 50× ray tracing. 64× AI compute. 4× memory capacity.

Their Lushan gaming GPU jumps straight into modern territory with full DirectX 12 Ultimate, AI-driven rendering, and next-gen ray tracing — areas US firms claimed China would never master. Meanwhile, the Huashan AI GPU is being benchmarked directly against NVIDIA Hopper and Blackwell, matching bandwidth, pushing memory access beyond the B200, and scaling to 100,000-GPU clusters.

Even today’s Moore Threads S5000 is already hitting 1,000 tokens/sec decode and 4,000 tokens/sec prefill on DeepSeek models — squarely in Hopper territory.

Washington bet export controls would freeze China in time. Instead, Chinese companies rebuilt the stack, rewrote the architecture, and are now sprinting.


Jesus Christ, they scraped 99.6% of all of Spotify.


Chinese scientists have unveiled an optical computing chip that outperformed Nvidia’s leading AI hardware by over a hundredfold in speed and energy efficiency – particularly for generative tasks such as video production and image synthesis.

The LightGen chip was developed by a team from Shanghai Jiao Tong University and Tsinghua University, harnessing the speed of light to execute complex artificial intelligence workloads.

With more than 2 million photonic neurons integrated into a compact chip, LightGen can generate high-resolution images, including 3D scenes, and create videos.

The research, led by Professor Chen Yitong from Shanghai Jiao Tong University, was published in the journal Science on Friday.

Chen said LightGen could be “further scaled up” and added: “It provides a new way to bridge the new chip architectures to daily complicated AI without impairment of performance and with speed and efficiency that are orders of magnitude greater, for sustainable AI.”

With artificial intelligence advancing rapidly, generative AI can now produce realistic images and even videos – but it needs immense computing capacity and consumes large amounts of energy.

As a result, scientists have turned to photonic computing as conventional electronic chips reach their limits.

Traditional computers rely on the flow of electrons to send and process information, while photonic computing uses laser pulses instead of electrons, performing operations at the speed of light.

Optical signals also have the advantage of minimising power consumption and offering rapid responses to user requests.

However, although photonic computing systems have shown potential in specific tasks, they previously struggled to handle high-complexity generative AI tasks – such as synthesising images and generating videos – because of limitations in their computing architecture and underdeveloped training algorithms.

The LightGen team’s work focused on developing three areas: building a new architecture, developing a novel training algorithm and giving the chip high integration density.

Architecturally, the team created an “optical latent space” – similar to an expandable “highway hub” for light – where data can flow rapidly in its most compact form, allowing for the efficient compression and reconstruction of information, according to the study.

The researchers also developed a generative training algorithm that, compared with conventional versions, removed the need for massive labelled data sets.

Instead, they used an unsupervised training algorithm that allowed LightGen to learn and create by discerning statistical patterns in data along similar lines to the human learning process.
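
As a purely digital analogue of those two ideas, a compact latent bottleneck and label-free training by reconstruction, a toy denoising autoencoder shows what "learning statistical patterns without labelled data" means in practice. It has no connection to the photonic hardware itself:

```python
# Toy digital analogue: a small latent bottleneck trained by denoising
# reconstruction, with no labels involved. Illustrative only; unrelated to
# the photonic implementation described in the paper.
import torch
import torch.nn as nn

model = nn.Sequential(                  # encoder: compress to a small latent code
    nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 16),                  # the "latent space" bottleneck
    nn.Linear(16, 64), nn.ReLU(),       # decoder: reconstruct from the code
    nn.Linear(64, 784),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    x = torch.rand(32, 784)             # stand-in for unlabeled image batches
    noisy = x + 0.1 * torch.randn_like(x)
    loss = nn.functional.mse_loss(model(noisy), x)   # denoise and reconstruct; no labels used
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```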

The team packed more than 2 million photonic “neurons” onto a chip of 136.5 sq mm (0.2 square inches), constructing a sophisticated network capable of handling high-resolution image generation.

Experiments highlighted some of LightGen’s abilities, including the generation of animal images at 512×512 pixel resolution with diverse categories, colours, expressions and backgrounds, which were rich in detail and logically correct.

The study said: “LightGen experimentally implemented high-resolution semantic image generation, denoising [making grainy images appear cleaner and sharper], style transfer, three-dimensional generation and manipulation.”

At a conservative estimate, LightGen achieved a system computing speed of 3.57×10⁴ Tera Operations Per Second (TOPS) and an energy efficiency of 6.64×10² TOPS/watt.

This meant its overall performance surpassed that of leading electronic chips, such as Nvidia’s market-leading A100, by more than a hundredfold.
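
A back-of-envelope check of that comparison, using public A100 datasheet figures as assumptions (roughly 312 dense FP16/BF16 TFLOPS and a 400 W SXM power envelope); optical and electronic "operations" are not counted identically, so this is a plausibility check rather than a like-for-like benchmark:

```python
# Back-of-envelope check of the "hundredfold" claim. The A100 figures are
# public datasheet numbers used here as assumptions; operation counting
# differs between optical and electronic hardware.
lightgen_tops = 3.57e4            # reported system compute, TOPS
lightgen_tops_per_watt = 6.64e2   # reported energy efficiency, TOPS/W

a100_tops = 312.0                 # assumed dense FP16/BF16 throughput, TFLOPS
a100_watts = 400.0                # assumed SXM board power
a100_tops_per_watt = a100_tops / a100_watts   # ~0.78 TOPS/W

print(f"speed ratio:      {lightgen_tops / a100_tops:,.0f}x")                     # ~114x
print(f"efficiency ratio: {lightgen_tops_per_watt / a100_tops_per_watt:,.0f}x")   # ~850x
```

Both ratios come out above one hundred, consistent with the article's claim.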

“The improvement in computing speed and energy efficiency of LightGen corresponded well with the experimentally measured end-to-end reduction in time and energy cost when LightGen experimentally achieved generation quality comparable with that of real-world electronic AI models on Nvidia A100,” the paper said.

The researchers said LightGen could mark a significant shift in the hardware used for generative AI by making photonic computing a core platform capable of independently executing complex creative tasks.

They added that its extraordinary energy efficiency also offered a practical pathway to alleviate the growing energy demands of AI computing.
