this post was submitted on 22 Feb 2026
Fuck AI
Ah I see what you're getting at.
I'd like to preface by apologising, because this became a very lengthy comment. I've written a TL;DR at the bottom that I think carries the main point across; all the rest is a semi-technical, rather loosey-goosey rundown of how language models work. I just hope it's coherent enough for someone to understand what I'm trying to convey.
So without further ado.
A language model doesn't really train on text. It trains on what are called tokens. As you feed it training data, it goes through a tokeniser before it reaches the ML algorithm. Huggingface has a functional browser-based example here.
A tokeniser essentially splits up the input characters (including whitespace, tabulators, carriage returns etc.) and assigns them numerical identifiers. This is done for the entire dataset before you train the model.
It could look something like this:

`Once upon a time.`

Thus, while you read `"strawberry"` as its own thing, an LLM might get the input `1, 496, 675, 15717, 1`. In essence, instead of checking each character individually, you end up with a large dictionary of character groupings with numerical equivalents, allowing you to do maths with them.
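That splitting-and-numbering step can be sketched in a few lines. The vocabulary and token IDs below are invented purely for illustration; real tokenisers (BPE and friends) learn subword vocabularies from data rather than using a hand-written table:

```python
# Toy tokeniser sketch. The vocabulary and IDs are made up for illustration;
# real tokenisers learn their subword vocabulary from the training corpus.
vocab = {"Once": 0, " upon": 1, " a": 2, " time": 3, ".": 4,
         "straw": 5, "berry": 6}

def tokenise(text, vocab):
    """Greedy longest-match tokenisation over the toy vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        # Try vocabulary entries longest-first so "straw" wins over shorter matches.
        match = None
        for piece, pid in sorted(vocab.items(), key=lambda kv: -len(kv[0])):
            if text.startswith(piece, i):
                match = (piece, pid)
                break
        if match is None:
            raise ValueError(f"no token covers {text[i:]!r}")
        ids.append(match[1])
        i += len(match[0])
    return ids

print(tokenise("Once upon a time.", vocab))  # -> [0, 1, 2, 3, 4]
print(tokenise("strawberry", vocab))         # -> [5, 6]
```

Note how "strawberry" comes out as two IDs, not eleven characters: the model never sees individual letters, which is exactly why letter-counting questions trip it up.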
Which is what you do. After tokenisation, the algorithm generates embeddings, which are essentially meant to capture the semantics of language: tokens represent individual building blocks of language, and embeddings are what define the relationships between these tokens. These are stored in something called a tensor, which is in essence a multi-dimensional map. Just like how we map locations in 2D/3D space, machine learning algorithms map "concepts" in sometimes many-hundred-dimensional maps.
Embeddings are how an LLM can infer that the words "conceal" and "hide" are related, and that the former is generally considered fancier than the latter. I can almost guarantee that if you were to ask an LLM to rephrase

`Jane stashed the goods behind the crapper`

in a fancier, more professional manner, it'd come up with something like

`Jane concealed the items in the bathroom`

This is in part what makes it so hard to glean information from a model: you can't just open up the weights and extract the original training data. It's been chunked, processed, and categorised, and what you end up with is just many different pointers to and between a (relatively) few tokens.
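The "conceal"/"hide" relationship is typically measured as cosine similarity between embedding vectors. The 3-dimensional vectors below are made up for the sketch; real models learn hundreds of dimensions:

```python
import numpy as np

# Toy embedding vectors, invented for illustration. Real models learn these
# during training, in hundreds of dimensions; 3 are enough to show the idea.
embeddings = {
    "conceal": np.array([0.9, 0.8, 0.1]),
    "hide":    np.array([0.85, 0.75, 0.2]),
    "banana":  np.array([0.1, 0.05, 0.9]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way (related concepts);
    # values near 0 mean the concepts are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["conceal"], embeddings["hide"]))    # high
print(cosine_similarity(embeddings["conceal"], embeddings["banana"]))  # low
```

Nothing about "conceal" or "hide" as words survives in there, only their positions relative to everything else, which is why you can't read the training data back out.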
For a very long time the context window of these models was very small, and as a result you ended up with outputs that weren't very related to one another. I'm sure you've seen those memes where someone goes "Type `I wish` and press the middle word suggestion on your keyboard and see what you get", and they usually spiral off into nonsense.

That's where the transformer architecture (the T in GPT) came into play. In short, it allowed the models to have a larger "working memory", and thus they could retain and extend that semantic context further. They could build more advanced networks of relationships, and it's the source of the current "AI" craze. The models started inferring more distant relationships between words, which is what has given rise to this illusion of intelligence.
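The mechanism behind that larger "working memory" is scaled dot-product attention: every token gets to look at every other token in the window at once, instead of only its immediate neighbours. A rough sketch, with toy shapes and random values standing in for the learned projections and multiple heads of a real transformer:

```python
import numpy as np

def softmax(x):
    # Subtracting the row max keeps exp() numerically stable.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each token's output is a weighted mix
    of every token's value vector, so distant tokens influence each other
    directly instead of information fading over distance."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to each other token
    weights = softmax(scores)        # each row is a probability distribution (sums to 1)
    return weights @ V

# Toy setup: 4 tokens, 8-dimensional vectors, random stand-ins for
# what would normally be learned query/key/value projections.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
Q = rng.standard_normal((seq_len, d_model))
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

out = attention(Q, K, V)
print(out.shape)  # (4, 8): one context-mixed vector per token
```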
Once you have a trained model, it's very hard to modify. You can train auxiliary models to bias it in various directions, and you can write system prompts to try and coax the model into a certain kind of output, but since it isn't actually a thinking thing, it can still go off script. You can also do a sort of reverse engineering, toggling certain neurons in the model on and off to see how one concept might relate to another, though just like with biological brains, a single neuron doesn't typically handle a single thing, so this is a very time-consuming task.
In the end, the model you train is entirely deterministic, because it's all mathematics. Computers are by their very nature deterministic. The model you train isn't intelligent, and given a particular input it will always produce the same output.
If you've played Minecraft you're probably familiar with the concept of seeds. Just like an LLM, Minecraft's world generation algorithm is deterministic, and if you provide a particular seed value for the randomiser, it will always produce the same world. If you don't input a seed value the game generates a random value and uses that, which is why whenever you start a new world you'll always end up with something new.
That's basically what LLMs do too. When selecting words to continue the given input, a process called stochastic sampling is used. In essence, for each input token it gets a bunch of probable tokens that might follow, organises these into a probability distribution shaped by a variable called temperature, and then selects a token from that distribution.
The temperature value essentially controls how randomly it can select words. The lower the temperature setting is the more curved the distribution gets. With a really low temperature setting the deterministic nature of the model shines through. As the temperature increases, the curve flattens and more random tokens might get selected.
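The seed-and-temperature behaviour can be sketched like this; the tokens and logit values below are invented for illustration (a real model produces one logit per vocabulary entry):

```python
import numpy as np

# Invented next-token candidates and raw model scores (logits).
tokens = ["the", "a", "my", "banana"]
logits = np.array([3.0, 2.0, 1.0, -1.0])

def sample(logits, temperature, seed=None):
    """Temperature-scaled sampling: low temperature sharpens the
    distribution (near-deterministic), high temperature flattens it
    (more random tokens get picked)."""
    rng = np.random.default_rng(seed)
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # softmax, numerically stable
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Like a Minecraft seed: same seed + same input -> same output, every time.
print(tokens[sample(logits, temperature=0.1, seed=42)])
print(tokens[sample(logits, temperature=0.1, seed=42)])  # identical

# At high temperature the flattened distribution lets unlikely
# tokens like "banana" through occasionally.
print(tokens[sample(logits, temperature=5.0, seed=7)])
```

The apparent "creativity" of a chatbot lives entirely in that seed and temperature; strip them away and you're back to the same deterministic maths every time.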
At this point, the big "AI" companies have basically sucked the data well dry. They're trying to find more ways of making more data to train on, because what gave them the biggest, most remarkable progress in the past was increasing the quantity of training data. More and more LLM generated text is making it into these models, and existing patterns get reinforced.
TL;DR
I've written this entire comment myself. It is in a sense a mirror of me as a person: the way I punctuate things, the words I choose, the structure in which I've decided to describe things. You can infer bits and pieces about me from it: I've obviously had an interest in machine learning for a while; given the markdown usage I'm perhaps a bit more technically inclined; I might not be a native English speaker, but I have a preference for British English.
Now, I could feed this entire comment through an LLM and you'd get a coherent output. It'd likely change my verbiage, fix the way I punctuate things, perhaps restructure things and make the text overall neater.
However, anything that was me in this text would be lost. There'd no longer be a person to infer anything about. No choices were made in the process of outputting the text. There is no inherent preference on anything because it's all just normalised pseudo-random output from a weighted probability matrix based on a corpus of as much text as whoever trained the LLM could get their hands on, be that legally or otherwise.
That is, I think, essentially what the article is talking about.
Thanks for the comprehensive write-up! I guess that makes a lot of sense. I mean, if we're just talking about regular AI assistant output, sure, I see that as well.

I also have an additional issue with how these things are tuned... I never liked the tone, especially the one ChatGPT has. It is way too repetitive, in an annoyingly generic tone, mixed with know-it-all vibes. But it doesn't know it all. And then it talks to me like I'm 4 years old, and it's my helpful sycophant. It outputs 3 pages of text for any simple task/question and there's about no substance to all the many sentences. Unless it decides to lecture me on ethics... saying my email is phrased way too harshly. And then it goes ahead and replaces my witty sarcasm with some bland phraseology like we're doing some customer support hell...

I see no reason to use it as a tool to "refine" my emails. Though I think that's mainly due to the role-playing as a "helpful assistant", which people seem to prefer?! Not sure if that's necessarily in the maths. But it's enough to deter me... Well, that, and the fact that it removes key information in some ill-suited attempt to "summarize" or brush over important paragraphs.