running the LLM, which is done with a random number generator and a massive matrix of probable next words.
Not true. Inference is done by feeding the context into the pre-trained neural network (technically a transformer network, not your daddy's old multilayer perceptron), which outputs logprobs over the possible next tokens; the next token is then selected based on those likelihoods. If it were just frequency-based RNG, the responses wouldn't have any semantics and would sound more like traditional Markov chains (like when you mash the suggestion button on predictive text and it spits out grammatical but meaningless gibberish).
If it were just selecting random words from a matrix of probabilities, without the network and its attention layers, it would also be waaay faster and easier to run on a potato.
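To make the selection step concrete, here's a minimal Python sketch, not how any particular LLM is actually implemented. `toy_model` is a hypothetical stand-in for the transformer forward pass (a real one runs attention over the whole context), and the vocabulary and logit values are made up. The point is that the RNG only comes in at the very end, picking from a distribution the network computed for this specific context.

```python
import math
import random

VOCAB = ["mat", "moon", "bank"]  # made-up three-word vocabulary

def toy_model(context):
    # Hypothetical stand-in for a transformer forward pass:
    # returns one logit per vocabulary entry, depending on the context.
    if "cat" in context:
        return [3.0, 0.5, 0.1]
    if "river" in context:
        return [0.1, 0.5, 3.0]
    return [1.0, 1.0, 1.0]

def softmax(logits, temperature=1.0):
    # Turn logits into probabilities; lower temperature sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, vocab, temperature=1.0, rng=random):
    # The only place randomness is used: a weighted pick over the model's output.
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

next_token = sample_next_token(toy_model("the cat sat on the"), VOCAB)
```

Swap the context and the distribution changes with it, which is exactly what a context-free frequency table can't do.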
The stuff about human learning also isn't quite right. There are different types of "learning" and different kinds of memory.
Physiologically, sleep is generally understood to be required to form long-term memory (e.g. as described in this paper).
The previous commenter was analogising human short and mid-term memory with LLM context windows (and things like vector databases), and long-term memory with retraining/merging/fine-tuning of LLMs. It's not exactly the same, but the analogy holds. Brain behaviour is a big influence on, and inspiration for, how machine learning techniques are designed.
Human memory is also notoriously inaccurate and unreliable, and tasks done by humans often need to be double-checked and externally verified.
This isn't to say LLMs are trustworthy or reliable. They are not. It's more that humans think much more highly of themselves than is really warranted.
This is a small terminology misconception. The LLM is not doing "training" during inference. It's still a "machine learning" system, because the learning happened when the model was trained.
In terms of learning/retaining information in the short/mid term while the user is using it: as the context grows, the model retains that information for the current session. In a lot of systems, sections of that context are then summarised and stored, indexed by a vector, so they can be retrieved into future contexts that have similar semantics. That's why some systems seem to be able to "remember" things from previous "conversations". Your message is vectorised, and that vector is then used to look up similar past interactions. The model isn't fine-tuning on that, so it's not "long term" memory, but the model can take it into account in future interactions.
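A rough Python sketch of that lookup, under heavy assumptions: `embed` here is a toy bag-of-words counter standing in for a real learned embedding model, and `MemoryStore` and `build_prompt` are hypothetical names, not any vendor's API. It only illustrates the shape of the idea: store summaries with a vector, vectorise the new message, pull back the most similar summaries, and put them in the context.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: word counts instead of a learned vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class MemoryStore:
    # Hypothetical store of past-session summaries, each indexed by its vector.
    def __init__(self):
        self.items = []

    def add(self, summary):
        self.items.append((embed(summary), summary))

    def retrieve(self, query, k=1):
        # Vectorise the query and return the k most similar stored summaries.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [summary for _, summary in ranked[:k]]

def build_prompt(store, user_message, k=1):
    # Retrieved summaries are placed in the context; no model weights change.
    recalled = store.retrieve(user_message, k)
    lines = ["Relevant notes from earlier sessions:"] + recalled
    lines.append("User: " + user_message)
    return "\n".join(lines)

store = MemoryStore()
store.add("user has a dog named Rex")
store.add("user prefers python over java")
prompt = build_prompt(store, "what is my dog called")
```

Nothing in the model itself is updated here; the "memory" lives entirely in what gets pasted into the next context window.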
AI companies do then use that (and full conversation histories) to regularly fine-tune the models, as well as train new ones. The models might not be freshly trained every day, but it certainly happens more often than you might think.
They're a little more reliable than that and are getting significantly more capable at an alarming rate. We absolutely agree that they shouldn't be trusted and are not very accurate (nor are most humans trustworthy or accurate), but I also think it's dangerous to underestimate them.