this post was submitted on 06 Oct 2023
2954 points (98.2% liked)
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
It's not about feeling intellectually superior; words matter. I'll grant you one thing: it's definitely "artificial", but it's not intelligence!
LLMs are an evolution of Markov chains. We have known how to create something similar to LLMs for decades, getting close to a century; we just lacked the raw horsepower and the literal hundreds of terabytes of data needed to get there. Anyone who knows how Markov chains work can figure out how an LLM works.
I'm not downplaying the development needed to get an LLM up and running. Yes, it's harder than just taking the algorithm for a Markov chain, but the real evolution is how much computing power we can shove into a small amount of space now.
Calling LLMs AI would be the same as calling a web crawler AI, or a moderation bot, or many similar things.
I recommend reading about the Chinese room thought experiment.
LLMs are not Markovian: the new word doesn't depend only on the previous one, it depends on the previous n words, where n is the context length. I.e., LLMs have a memory that makes the generation process non-Markovian.
You are probably thinking of reinforcement learning, which is most often modeled as a Markov decision process.
Yes, as I said, it's an EVOLUTION of Markov chains, but the idea is the same. As you pointed out, one major difference is that instead of accounting for only the last 1-5 words, it accounts for a larger context window. The LSTM is just a parlor trick. Read the paper on the original transformer model: https://browse.arxiv.org/pdf/1706.03762.pdf
A Markov chain models a process as a transition between states, where the transition probabilities depend only on the current state.
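To make that definition concrete, here is a minimal sketch of a word-level Markov chain text generator. The toy corpus and the helper names `build_chain` and `generate` are made up for illustration; the point is only that each step looks at the current word and nothing earlier.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def generate(chain, start, length, rng):
    """Walk the chain; the next word is drawn using only the current word."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        out.append(rng.choice(followers))
    return out


# Toy corpus (an assumption for this example, not real training data).
corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus)
sample = generate(chain, "the", 8, random.Random(0))
print(" ".join(sample))
```

Picking uniformly from the list of observed followers is the same as sampling from the empirical transition probabilities, since repeated followers appear multiple times in the list.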
An LLM is ideally less a Markov chain and more like a discrete Langevin dynamics, as both have a memory (the attention mechanism for LLMs, inertia for LD) and both have noise controlled by a parameter (temperature in both cases; the name "temperature" in the LLM context is derived directly from thermodynamics).
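As a rough illustration of the temperature parameter, here is a sketch of how next-token scores are commonly turned into probabilities with a temperature-scaled softmax. The logits are made-up numbers and `softmax_with_temperature` is a hypothetical helper, not code from any particular model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature controls the noise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens (an assumption for this example).
logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)     # sharper, less random
neutral = softmax_with_temperature(logits, 1.0)  # plain softmax
hot = softmax_with_temperature(logits, 2.0)      # flatter, more random
print(cold, neutral, hot)
```

Low temperature concentrates probability on the top-scoring token, while high temperature spreads it out, which is the same role temperature plays in a Boltzmann distribution.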
As far as I remember, the original attention paper doesn't reference Markov processes.
I am not saying one cannot explain it starting from a Markov chain; it is just that saying we could have done it decades ago but didn't have the horsepower and the data is wrong. We didn't have the method to simulate writing. We now have a decent one, and the horsepower to train on a lot of data.
I think we're splitting hairs here. Look, you're technically correct, but none of what you said disproves my point, does it? Perhaps I should edit my comment to make it even clearer that it's not EXACTLY the same technology, but I don't think you'd argue with me that it's an evolution of it, right?
Common reinforcement learning methods definitely are.
Are LLMs an evolution of Markov chains, in the way any method that is not a Markov chain could be? I would say not directly. They clearly share concepts, as does any method for simulating stochastic processes, and LLMs are definitely more recent than Markov processes. Beyond that, anyone can decide for themselves what the inspirations were.
What I wanted to say is that, really, we are discussing a unique new method for LLMs, one that is not just "old stuff, more data".
This is my main point.