this post was submitted on 25 Jan 2026
Technology
A tech news sub for communists
founded 3 years ago
The more I look into this space, the more I think the main limiting factor is the context window. Test-time compute is key to squeezing extra performance out of smaller models, but context rot is still a problem, especially with techniques like GQA or MQA sacrificing quality for memory usage. Of course, you can just use MHA instead, but the high memory usage defeats the point of a small local model, and it still suffers from context rot to a lesser degree. I'm not sure this problem can be fixed while staying with transformers.
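To put rough numbers on that memory tradeoff, here's a back-of-the-envelope sketch. The dimensions are made up but plausible (32 layers, 32 heads, head dim 128, fp16, 32k context); it just compares the KV cache footprint of MHA against GQA with 8-way head sharing:

```haskell
-- Back-of-the-envelope KV cache size. Dimensions are illustrative, not any
-- particular model: per token, every layer stores a key and a value vector
-- for each KV head.
kvCacheBytes :: Int -> Int -> Int -> Int -> Int -> Int
kvCacheBytes layers kvHeads headDim bytesPerElem contextLen =
  2 * layers * kvHeads * headDim * bytesPerElem * contextLen

main :: IO ()
main = do
  let layers  = 32
      heads   = 32
      headDim = 128
      fp16    = 2        -- bytes per element
      ctx     = 32768    -- tokens of context
      mib n   = show (n `div` (1024 * 1024)) ++ " MiB"
  -- MHA: one KV head per attention head.
  putStrLn ("MHA: " ++ mib (kvCacheBytes layers heads headDim fp16 ctx))
  -- GQA: each KV head shared by a group of 8 query heads.
  putStrLn ("GQA: " ++ mib (kvCacheBytes layers (heads `div` 8) headDim fp16 ctx))
```

With these made-up dimensions it works out to roughly 16 GiB of cache for MHA versus 2 GiB for GQA at a 32k context, which is the whole quality-for-memory tradeoff in a nutshell.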
I expect that the approach going forward will be to break up large applications into small components that can be reasoned about in isolation. Such components can then be composed together to accomplish tasks of increased complexity. I like to think of this as a Lego model of software development. Each component can be viewed as a Lego block, and we can compose these Lego blocks in many different ways as we solve different problems.
The problem being solved can be expressed as a workflow represented by a graph where the nodes compute the state and the edges represent transitions between states. Each time we enter a node in this graph, we look at the input, decide what additional data we may need, run the computation, and transition to the next state. Each node in the graph is a Lego block that accomplishes a particular task. These nodes are then connected by a layer of code that governs the data flow. Individual agents can then manage each of these components, and only have to worry about a fixed-scope problem. A separate agent can then manage the connections between these components and the overall data flow within the application.
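Here's a minimal sketch of what I mean, in Haskell. The names (`Node`, `Workflow`, `runWorkflow`, the toy nodes) are all made up for illustration; the point is just that each node does one fixed-scope job and a thin coordinator layer follows the transitions:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

type State = Map String String          -- toy shared state: key/value pairs
data Step  = Next String | Done         -- which node to visit next, or stop

-- A node inspects the state, does its one job, and says where to go next.
data Node = Node
  { nodeName :: String
  , runNode  :: State -> IO (State, Step)
  }

type Workflow = Map String Node

-- The coordinator layer: all it knows is how to follow transitions.
runWorkflow :: Workflow -> String -> State -> IO State
runWorkflow wf name st =
  case Map.lookup name wf of
    Nothing   -> pure st
    Just node -> do
      (st', step) <- runNode node st
      case step of
        Done      -> pure st'
        Next next -> runWorkflow wf next st'

-- Two toy nodes: fetch some input, then transform it.
fetchNode, transformNode :: Node
fetchNode = Node "fetch" $ \st ->
  pure (Map.insert "raw" "hello" st, Next "transform")
transformNode = Node "transform" $ \st ->
  pure (Map.insert "out" (maybe "" reverse (Map.lookup "raw" st)) st, Done)

main :: IO ()
main = do
  let wf = Map.fromList [(nodeName n, n) | n <- [fetchNode, transformNode]]
  final <- runWorkflow wf "fetch" Map.empty
  print (Map.toList final)
```

An agent maintaining `fetchNode` never has to know anything about `transformNode`; it only has to honour the node contract.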
Incidentally, this approach has already become popular in the industry with things like microservices, which create hard boundaries by limiting shared state within the application. The internal mechanics of individual components are abstracted over by the API, and they're only a concern for the agent working on them. The idea is to keep individual contexts small, which helps keep agents sane by avoiding context rot. Each agent has a small and relatively simple program to maintain that does one thing, while a high-level coordinator manages the flow of data through the whole system. And this whole architecture can then be layered as well. You can take a whole system like this, treat it as a black box where you only care about the surface-level API, and plug it into a higher-level graph. So, you can keep scaling indefinitely.
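Sticking with the made-up types from the sketch above, the layering idea is just that a whole workflow can be wrapped back up as a single node, so a higher-level graph treats it as a black box with its own entry point:

```haskell
-- Wrap an entire (hypothetical) Workflow as one Node so a higher-level graph
-- only sees its surface: run it from its entry point, then carry on.
asNode :: String -> Workflow -> String -> Step -> Node
asNode name wf entry andThen = Node name $ \st -> do
  st' <- runWorkflow wf entry st
  pure (st', andThen)
```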
What you'd really want to do is design a system specifically around this workflow instead of trying to make something general purpose. Create as many restrictions as you can to keep it on track. The key to making the whole thing work will be a strong specification language that allows encoding the constraints you want. Languages like Haskell get you a lot of the way there, since you can encode complex rules directly in the type system. Once you have a contract the LLM has to fulfill, it can't cheat and pretend that it did the work without actually doing it.
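As a toy illustration of that kind of contract (all the names here are invented), an abstract type with a smart constructor means any generated code has to go through validation before it can even type-check against the function it's supposed to feed:

```haskell
-- Hypothetical contract encoded in types: `ship` only accepts a ValidatedOrder,
-- and the only way to obtain one is through `validate`.
data Order = Order { item :: String, qty :: Int } deriving Show

-- In a real codebase this constructor would be hidden behind the module's
-- export list, so `validate` is the only way in.
newtype ValidatedOrder = ValidatedOrder Order deriving Show

validate :: Order -> Either String ValidatedOrder
validate o
  | null (item o) = Left "item must be named"
  | qty o <= 0    = Left "quantity must be positive"
  | otherwise     = Right (ValidatedOrder o)

-- The contract generated code has to satisfy.
ship :: ValidatedOrder -> IO ()
ship (ValidatedOrder o) = putStrLn ("shipping " ++ show o)

main :: IO ()
main = either putStrLn ship (validate (Order "widget" 3))
```

If the agent skips the validation step, the program simply doesn't compile, so there's no way to hand in work that looks done but isn't.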
There would also be a massive benefit to picking a single language like Lisp or Erlang which has a simple grammar and a live REPL. This sort of environment allows for fast iteration, and lets the agent try things and see feedback immediately. Erlang/OTP in particular is very appealing here since it's already designed around the concept of micro processes, where you can literally spin up millions of them. They're all isolated and communicate via message passing, so there's no shared state between them. The OTP system manages their supervision, restarts them when they fail, handles hot swapping running processes with new versions, and so on.
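Erlang itself would be the natural fit, but just to sketch the shape of that model in the same language as the examples above, here's a rough Haskell analogue: cheap isolated workers that share nothing and only talk over channels. This is not OTP, and it has none of the supervision or hot-swapping, it's just the message-passing skeleton:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

data Msg = Work Int | Stop

-- Each worker is isolated: no shared state, only messages in and out.
worker :: Int -> Chan Msg -> Chan String -> IO ()
worker wid inbox results = loop
  where
    loop = do
      msg <- readChan inbox
      case msg of
        Stop   -> pure ()
        Work n -> do
          writeChan results
            ("worker " ++ show wid ++ ": " ++ show n ++ "^2 = " ++ show (n * n))
          loop

main :: IO ()
main = do
  inbox   <- newChan
  results <- newChan
  forM_ [1 .. 4] $ \wid -> forkIO (worker wid inbox results)  -- cheap to spawn
  forM_ [1 .. 8] $ \n -> writeChan inbox (Work n)              -- hand out work
  replicateM 8 (readChan results) >>= mapM_ putStrLn           -- collect replies
  forM_ [1 .. 4 :: Int] $ \_ -> writeChan inbox Stop           -- ask workers to quit
```

In Erlang/OTP the workers would be supervised processes that get restarted on failure and can be hot swapped, which is exactly the machinery you'd want wrapped around agent-managed components.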