this post was submitted on 24 Nov 2025
496 points (98.8% liked)

Open Source


"Apertus: a fully open, transparent, multilingual language model

On 2 September, EPFL, ETH Zurich and the Swiss National Supercomputing Centre (CSCS) released Apertus, Switzerland’s first large-scale, open, multilingual language model — a milestone in generative AI for transparency and diversity.

Researchers from EPFL, ETH Zurich and CSCS have developed the large language model Apertus – it is one of the largest open LLMs and a basic technology on which others can build.

In brief: Researchers at EPFL, ETH Zurich and CSCS have developed Apertus, a fully open Large Language Model (LLM) – one of the largest of its kind. As a foundational technology, Apertus enables innovation and strengthens AI expertise across research, society and industry by allowing others to build upon it. Apertus is currently available through strategic partner Swisscom, the AI platform Hugging Face, and the Public AI network. ...

The model is named Apertus – Latin for “open” – highlighting its distinctive feature: the entire development process, including its architecture, model weights, and training data and recipes, is openly accessible and fully documented.

AI researchers, professionals, and experienced enthusiasts can either access the model through the strategic partner Swisscom or download it from Hugging Face – a platform for AI models and applications – and deploy it for their own projects. Apertus is freely available in two sizes – with 8 billion and 70 billion parameters – the smaller model being more appropriate for individual use. Both models are released under a permissive open-source license, allowing use in education and research as well as broad societal and commercial applications. ...

Trained on 15 trillion tokens across more than 1,000 languages – 40% of the data is non-English – Apertus includes many languages that have so far been underrepresented in LLMs, such as Swiss German, Romansh, and many others. ...

Furthermore, for people outside of Switzerland, the Public AI Inference Utility will make Apertus accessible as part of a global movement for public AI. "Currently, Apertus is the leading public AI model: a model built by public institutions, for the public interest. It is our best proof yet that AI can be a form of public infrastructure like highways, water, or electricity," says Joshua Tan, Lead Maintainer of the Public AI Inference Utility."

top 50 comments
[–] frongt@lemmy.zip 65 points 1 month ago (5 children)

Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data and other undesired content before training begins.

Available doesn't mean licensed for AI training.

[–] schnurrito@discuss.tchncs.de 33 points 1 month ago (6 children)

And yet it is still a legally unsettled question whether LLM training requires a copyright license at all; and in my opinion no one should want that to be the case. Why would people on the Internet want to argue for an expansion of copyright law?

[–] Fedizen@lemmy.world 35 points 1 month ago

Saying an expensive product that requires servers to run is the only thing exempt from copyright is just handing a bunch of giant corporations a get out of jail free card.

Either reform copyright so more things are public domain or require AI companies to pursue licenses to training data.

Giving an unfair exemption to copyright laws solely to giant tech companies is just another corporate handout.

[–] finalarbiter@lemmy.dbzer0.com 18 points 1 month ago* (last edited 1 month ago)

What I want is consistency, either apply the law equally and fairly or reform the whole system. Nobody, especially not big business, should be getting special carve-outs to be exempt from copyright infringement outside of 'fair use' considerations.

In my ideal world, IP law would be framed to protect novel ideas just long enough for inventors or creators to capitalize on their ideas and prevent outright 1:1 copying without any sort of innovative or transformational changes. It would also discourage squatting on things like patents- patent squatting and the like should lead to losing rights.

[–] Cethin@lemmy.zip 14 points 1 month ago (2 children)

As with all things, nuance and context are required. I don't think we should be taxing poor people that heavily (if at all), but does that mean I should be against taxing the ultra-wealthy more? Obviously not.

I support copyright when it protects developers and doesn't hinder users, hobbyists, or the average person. I don't support it when it only helps massive companies, who can manipulate the law to shield themselves from competition while nothing stops them from stealing from the masses. They can afford to pay. If AI is actually as valuable as they say, the price of paying for the training data is trivial.

Copyright shouldn't only be helpful to big businesses. It should be most helpful to the average person. We have the opposite here. I support modifying copyright law to bind big businesses and liberate individuals. I don't need to be totally against it like you imply.

[–] chicken@lemmy.dbzer0.com 6 points 1 month ago* (last edited 1 month ago) (2 children)

But we can't afford to pay. I don't think open models like the one in the OP article would be developed and released for free to the public if there was a complex process of paying billions of dollars to rightsholders in order to do so. That sort of model would favor a monopoly of centralized services run only by the biggest companies.

[–] partofthevoice@lemmy.zip 5 points 1 month ago* (last edited 1 month ago) (1 children)

Sadly, we’ll most likely see an influx of regulation right when it’s broadly accessible to the general public to run locally.

[–] Cethin@lemmy.zip 3 points 1 month ago (5 children)

Yeah, most likely, and it'll only bind users and protect the businesses, as always.

It already is broadly accessible to the general public. They just don't know about it or just accept using one of the cloud versions. It's trivial to get up and running at this point.

[–] frongt@lemmy.zip 6 points 1 month ago (2 children)

Why would it be an expansion? If you're using someone else's work, why wouldn't you need a license? If I write a book and publish it under CC-BY-NC, should Google be allowed to take my work for their commercial product without compensation or even attribution? Should Microsoft be allowed to create closed-source commercial Copilot off GPL source code?

[–] schnurrito@discuss.tchncs.de 6 points 1 month ago (1 children)

It's an expansion to say that LLM training constitutes a derivative work. You are of course entitled to your opinion that it should be the case; all I can say to that is that in the 2000s and 2010s nearly everyone on the Internet tended to argue for more limitations, not further expansions, of copyright law, and I wonder what happened to that attitude.

[–] frongt@lemmy.zip 5 points 1 month ago* (last edited 1 month ago) (2 children)

Well, this being the open source community, I would expect most people here to be on the side of respecting the rights of content creators. Like I said, if I write some GPL software, I don't think Microsoft should be able to disrespect my license just because they're disrespecting everyone else's licenses too through automation at scale.

Edit: forgot to mention, since their product is wholly dependent on the other works, that's the very definition of a derivative work. While you could argue it's transformative, it certainly fails the other tests for fair use.

[–] General_Effort@lemmy.world 2 points 4 weeks ago

I find it very unexpected. It used to be understood that IP laws favor monopolies. E.g., I don't remember the open source community being on the side of Oracle in https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_Inc.

Maybe it just passed me by.

[–] BaroqueInMind@piefed.social 3 points 1 month ago* (last edited 1 month ago)

BSD license allows for this and still thrives (PS5 OS, Apple iOS and MacOS [Darwin], TrueNAS, OPNsense, and several enterprise-level commercial router operating systems use it and contribute significant code back into BSD project to ensure CVE safety). I'm not agreeing with it, just providing an alternate perspective.

[–] JackbyDev@programming.dev 4 points 1 month ago (1 children)

"I didn't steal and distribute your work, I just made a machine distill it down and able to copy everything meaningful about it!"

[–] benagain@lemmy.ml 20 points 1 month ago

"Your honor, my archive of Linux ISOs was acquired under the pretense that they were 'publicly available' and the copyright holders didn't 'opt-out' using the 'up-for-grabs.txt' standard I invented."

[–] exu@feditown.com 11 points 1 month ago

Still much better, especially with respecting opt-outs, than most other LLMs

[–] pennomi@lemmy.world 4 points 1 month ago

Legally, it seems it does, at least in the US and EU. I assume China too.

Whether or not it should is a different argument, but copyright is a legal framework, not an ethical one.

[–] E_coli42@lemmy.world 21 points 4 weeks ago (2 children)

Is this hosted somewhere? Maybe distributed? I would love a privacy respecting distributed LLM chatbot.

[–] xcjs@programming.dev 19 points 4 weeks ago* (last edited 4 weeks ago) (2 children)

In case you're not aware, there are a decent number of open weight (and some open source) large language models.

The Ollama project makes it very approachable to download and use these models.
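A minimal sketch of that workflow, assuming Ollama is installed locally (the model tag below is illustrative; whether Apertus itself is published to the Ollama registry isn't something this thread confirms):

```shell
# Download an open-weight 8B model, then chat with it entirely locally.
ollama pull llama3.1:8b
ollama run llama3.1:8b "What does 'open weights' mean?"
```

The same two commands work for any model in the registry; Ollama handles quantized downloads and GPU/CPU placement automatically.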

[–] Xylight@lemdro.id 11 points 4 weeks ago (13 children)

Ollama has taken a bad turn lately (such is the nature of VC backed software). Maybe recommend kobold.cpp for LLM noobs instead

[–] xcjs@programming.dev 4 points 4 weeks ago

I'm keeping an eye on Ollama's service offerings - I don't think they're in enshittification territory yet, but I definitely share the concern.

I still don't believe the other LLM engines out there have reached an equivalent ease of use compared to Ollama, and I still recommend it for now. If nothing else, it can be a stepping stone to other solutions for some.

[–] PandaInSpace@kbin.earth 3 points 4 weeks ago (1 children)

Other than Apertus, are there any truly open source models? Mainly what I want to know is which models list their training data publicly, to ensure no theft of art and stuff. (I replied to your comment as you seem to know about these models; I have no clue about this stuff.)

[–] xcjs@programming.dev 3 points 4 weeks ago* (last edited 4 weeks ago)

Deepseek R1 and OpenThinker are two more examples. There's also SmolLM, which I believe also open sources its training data and ensures proper licensing for it.

[–] Cooper8@feddit.online 8 points 4 weeks ago

Links are in the article. Hugging Face and Swisscom host it.

[–] ABetterTomorrow@sh.itjust.works 11 points 1 month ago (3 children)

I can’t find any hardware requirements for this. What will it take to run this smoothly?

[–] ArsonButCute@lemmy.dbzer0.com 13 points 1 month ago* (last edited 4 weeks ago) (2 children)

8B parameter models are relatively fast on 3rd-gen RTX hardware with at least 8 GB of VRAM; CPU inferencing is slower and requires boatloads of RAM, but is doable on older hardware. These really aren't designed to run on consumer hardware, but the 8B model should do fine on relatively powerful consumer hardware.

If you have something that would've been a high end gaming rig 4 years ago, you're good.

If you wanna be more specific, check huggingface, they have charts. If you're using linux with nvidia hardware you'll be better off doing CPU inferencing.

Edit: Omg y'all, I didn't think I needed to include my sources, but this is quite literally a huge issue on Nvidia. Nvidia works fine on Linux, but you're limited to whatever VRAM is on your video card; no RAM sharing. Y'all can disagree all you want, but those are the facts. That's why AMD and CPU inferencing are more reliable and allow for higher context limits. They are not faster, though.

Sources for nvidia stuff https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/618

https://forums.developer.nvidia.com/t/shared-vram-on-linux-super-huge-problem/336867/

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/758

https://forums.opensuse.org/t/is-anyone-getting-vram-backed-by-system-memory-with-nvidia-drivers/185902

[–] Jakeroxs@sh.itjust.works 2 points 4 weeks ago (8 children)

Disagree on Linux nvidia support, it works fine

[–] ABetterTomorrow@sh.itjust.works 2 points 4 weeks ago

Thanks for the reply. I've never been on the HF site, and doing it on mobile for the first time I feel lost. I couldn't find it, but I'm sure I will.

[–] General_Effort@lemmy.world 9 points 4 weeks ago

For fastest inference, you want to fit the entire model in VRAM. Plus, you need a few GB extra for context.

Context means the text (+images, etc) it works on. That's the chat log, in the case of a chatbot, plus any texts you might want summarized/translated/ask questions about.

Models can be quantized, which is a kind of lossy compression. They get smaller but also dumber. As with JPGs, the quality loss is insignificant at first and absolutely worth it.
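As a toy illustration of that lossy compression (a pure-Python sketch, not how real LLM quantizers like llama.cpp's work): symmetric 8-bit quantization stores one shared scale factor and rounds each weight to an integer step, so the values come back only approximately.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.813, -1.24, 0.002, 0.57, -0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each weight now takes 8 bits instead of 32, and the round-trip error
# is bounded by half a quantization step (scale / 2) -- small relative
# to the weights themselves, like JPG artifacts.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2
```

Real quantization schemes are fancier (per-block scales, mixed precision for sensitive layers), but the size/quality trade-off is the same idea.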

Inference can be split between GPU and CPU, substituting normal RAM for VRAM. That makes it slower, but it will probably still feel smooth.

Basically, it's all trade-offs between quality, context size, and speed.
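Those trade-offs can be put into rough numbers. A back-of-the-envelope sketch (the layer/head figures are illustrative, Llama-style assumptions, not Apertus's published config, and real usage adds runtime overhead on top):

```python
def model_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_val=2):
    """Approximate KV-cache size for a given context length (2x: keys + values)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val / 1e9

print(model_gb(8e9, 16))   # 8B model at fp16   -> 16.0 GB
print(model_gb(8e9, 4))    # same model, 4-bit  ->  4.0 GB
print(model_gb(70e9, 4))   # 70B model, 4-bit   -> 35.0 GB
print(kv_cache_gb(32, 8, 128, 8192))  # ~8k context -> ~1.07 GB extra
```

This is why a quantized 8B model fits comfortably on a mid-range consumer GPU, while the 70B model still wants multiple GPUs or heavy CPU offloading even when quantized.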
