technology

Hexbear Code-Op (hexbear.net)
submitted 10 months ago* (last edited 10 months ago) by RedWizard@hexbear.net to c/technology@hexbear.net
 
 

Where to find the Code-Op

Wow, thanks for the stickies! Love all the activity in this thread. I love our coding comrades!


Hey fellow Hexbearions! I have no idea what I'm doing! However, born out of the conversations in the comments of this little thing I posted the other day, I have created an org on GitHub that I think we can use to share, highlight, and collaborate on code and projects from comrades here and abroad.

  • I know we have several bots that float around this instance, and I've always wondered who maintains them and where their code is hosted. It would be cool to keep a fork of those bots in this org, for example.
  • I've already added a fork of @WhyEssEff@hexbear.net's Emoji repo as another example.
  • The projects don't need to be Hexbear or Lemmy related, either. I've moved my aPC-Json repo into the org just as an example, and intend to use the code written by @invalidusernamelol@hexbear.net to play around with adding ICS files to the repo.
  • We have numerous comrades looking at mainlining some flavor of Linux and bailing on Windows; maybe we could create some collaborative documentation that helps onboard the Linux-curious.
  • I've been thinking a lot recently about leftist communication online and building community spaces, which will ultimately intersect with self-hosting. Documenting various tools and providing Docker Compose files to easily get people off and running could be useful.

I don't know a lot about GitHub Orgs, so I should get on that, I guess. That said, I'm open to all suggestions and input on how best to use this space I've created.

Also, I made (what I think is) a neat emblem for the whole thing:

Todos

  • Mirror repos to both GitHub and Codeberg
  • Create process for adding new repos to the mirror process
  • Create a more detailed profile README on GitHub.

Done


  • ~~Recover from whatever this sickness is the dang kids gave me from daycare.~~

AI agents face a serious volume problem when studying documents. Reading a single 1,000-line file consumes about 10,000 tokens, and token consumption costs both money and time. Codebases with dozens or hundreds of files, the common case for real-world projects, easily exceed 100,000 tokens when the whole thing must be considered: the agent has to read each file, comprehend it, and work out the interrelationships among files. And when a task requires multiple passes over the same documents, perhaps one pass to map the structure and another to mine the details, the costs multiply rapidly.

Matryoshka is a document-analysis tool that achieves over 80% token savings while still supporting interactive, exploratory analysis. Its key idea is to cache past analysis results and reuse them, so the same document lines never have to be processed twice. The design draws on recent research into recursive language models and on retrieval-augmented generation, with a focus on efficiency. We'll see how Matryoshka unifies these ideas into a single system that maintains persistent analytical state, then look at real-world results from analyzing the anki-connect codebase.


The Problem: Context Rot and Token Costs

A common task is to analyze a codebase to answer a question such as “What is the API surface of this project?” Such work includes identifying and cataloguing all the entry points the codebase exposes.

Traditional approach:

  1. Read all source files into context (~95,000 tokens for a medium project)
  2. The LLM analyzes the entire codebase’s structure and component relationships
  3. For follow-up questions, the full context is round-tripped every turn

This creates two problems:

Token Costs Compound

The entire context has to go to the API on every turn. In a 10-turn conversation about a 7,000-line codebase, the system may process close to a million tokens, and most of them are the same document contents dutifully resent over and over. This redundancy is a massive waste: it forces the model to reprocess identical blocks of text instead of concentrating its capacity on what is actually new.
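The arithmetic behind that claim can be sketched directly. This is a back-of-envelope estimate; the 13.5 tokens-per-line figure and the 500-token per-turn overhead are assumptions chosen to match the numbers above, not measurements:

```typescript
// Rough cost of round-tripping a static context: every turn resends the
// whole codebase plus a small question/answer exchange.
const TOKENS_PER_LINE = 13.5; // assumed average for source code
const codebaseTokens = Math.round(7_000 * TOKENS_PER_LINE); // 94,500

function totalTokens(turns: number, perTurnOverhead = 500): number {
  // Each turn resends the full codebase plus the new question/answer.
  return turns * (codebaseTokens + perTurnOverhead);
}

console.log(totalTokens(10)); // 950,000 tokens for a 10-turn session
```

Almost all of those 950,000 tokens are the same file contents, sent ten times.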

Context Rot Degrades Quality

As described in the Recursive Language Models paper, even the most capable models exhibit context degradation: their performance declines as input length grows. The deterioration is task-dependent and tied to task complexity. In information-dense settings, where the correct output requires synthesizing facts scattered widely across the prompt, the decline can be especially steep, sometimes at relatively modest context lengths. It reflects the model's failure to maintain the connections among many informational fragments long before it reaches its maximum token capacity.

The authors argue against stuffing entire documents into the prompt, since this clutters the model's context and degrades its performance. Instead, documents should be treated as external environments the LLM interacts with: querying them, navigating their structured sections, and retrieving specific information as needed. The document becomes a separate knowledge base, freeing the model from having to hold everything in context.


Prior Work: Two Key Insights

Matryoshka builds on two research directions:

Recursive Language Models (RLM)

The RLM paper introduces a methodology that treats documents as external state against which queries can be issued step by step, without loading the documents entirely. Symbolic operations (search, filter, aggregate) are issued against this state, and only the specific, relevant results come back, keeping the context window small while permitting analysis of arbitrarily large documents.

The key point is that the documents stay outside the model; only search results enter the context. The model never sees complete files: it issues a search and retrieves just the information it needs.

Barliman: Synthesis from Examples

Barliman, a tool developed by William Byrd and Greg Rosenblatt, shows that program synthesis is possible without precise code specifications. Instead, the user supplies input/output examples, and a solver engine built on relational programming in the spirit of miniKanren synthesizes functions that satisfy them. The system interprets the examples as relational constraints and searches for a program that meets them all, which makes it possible to describe what is desired through concrete test cases.

The approach is simply to show examples of the behavior one wants and let the system derive the implementation on its own. The emphasis shifts from writing long, detailed step-by-step recipes to declaring, through examples, what the desired goal is.


Matryoshka: Combining the Insights

Matryoshka turns these insights into a working system for LLM agents: a practical tool that lets an agent decompose a challenging analysis task into a sequence of smaller, more manageable queries.

1. Nucleus: A Declarative Query Language

Instead of issuing commands, the LLM describes what it wants, using Nucleus, a simple S-expression query language. This changes the focus from describing each step to specifying the desired outcome.

(grep "class ")           ; Find all class definitions
(count RESULTS)           ; Count them
(map RESULTS (lambda x    ; Extract class names
  (match x "class (\\w+)" 1)))

In practice, the declarative interface stays robust even when the LLM phrases its intent with different vocabulary or structure, because the system resolves the underlying intent of a request rather than its surface wording.

2. Pointer-Based State

The key new insight is that results can be separated from the context: they are stored in the REPL state instead of being returned to the model.

When the agent runs (grep "def ") and gets 150 matches:

  • Traditional tools: All 150 lines are fed into context, and round-tripped every turn
  • Matryoshka: Binds matches to RESULTS in the REPL, returning only "Found 150 results"

The variable RESULTS is bound to the actual value in the REPL. The binding acts as a pointer to the data's location in the server's memory: subsequent operations (queries, filters, updates) dereference it, but the data itself never enters the conversation:

Turn 1: (grep "def ")         → Server stores 150 matches as RESULTS
                              → Context gets: "Found 150 results"

Turn 2: (count RESULTS)       → Server counts its local RESULTS
                              → Context gets: "150"

Turn 3: (filter RESULTS ...)  → Server filters locally
                              → Context gets: "Filtered to 42 results"

The LLM never sees the 150 function definitions, only the aggregated answers.
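The mechanics can be illustrated with a minimal sketch. This is not Matryoshka's actual implementation, just the pattern: full results live in a server-side environment, and only short summaries cross back into the conversation.

```typescript
// Server-side environment: bindings like RESULTS hold the real data.
type Env = Map<string, string[]>;

function grep(env: Env, lines: string[], pattern: string): string {
  const matches = lines.filter((l) => new RegExp(pattern).test(l));
  env.set("RESULTS", matches);              // full data stays server-side
  return `Found ${matches.length} results`; // only the summary leaves
}

function count(env: Env, name: string): string {
  // Dereference the binding locally; return just the number.
  return String(env.get(name)?.length ?? 0);
}

const env: Env = new Map();
const doc = ["def a():", "x = 1", "def b():", "def c():"];
console.log(grep(env, doc, "def ")); // "Found 3 results"
console.log(count(env, "RESULTS"));  // "3"
```

The three matched lines never appear in either return value; the model works entirely through the RESULTS pointer.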

3. Synthesis from Examples

When queries need custom parsing, Matryoshka synthesizes functions from examples:

(synthesize_extractor
  "$1,250.00" 1250.00
  "€500" 500
  "$89.99" 89.99)

The synthesizer learns the pattern directly from the examples, extracting numeric values from the currency strings without any hand-written regex.
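One way to realize this, sketched here as candidate enumeration rather than the relational search described above (the candidate list and the `synthesizeExtractor` name are assumptions for illustration):

```typescript
type Example = [input: string, expected: number];

// A small library of candidate extraction patterns to try in order.
const CANDIDATES: RegExp[] = [
  /(\d+\.\d+)/,                       // plain decimal
  /(\d+(?:\.\d+)?)/,                  // integer or decimal
  /(\d{1,3}(?:,\d{3})*(?:\.\d+)?)/,   // thousands separators allowed
];

function synthesizeExtractor(
  examples: Example[]
): ((s: string) => number) | null {
  for (const re of CANDIDATES) {
    const fn = (s: string) =>
      parseFloat((s.match(re)?.[1] ?? "").replace(/,/g, ""));
    // Keep the first candidate that reproduces every example.
    if (examples.every(([inp, out]) => fn(inp) === out)) return fn;
  }
  return null; // no candidate satisfies all examples
}

const extract = synthesizeExtractor([
  ["$1,250.00", 1250.0],
  ["€500", 500],
  ["$89.99", 89.99],
]);
console.log(extract?.("$42.50")); // 42.5
```

The examples act as the specification: the first two patterns fail on "$1,250.00" (they capture 250 or 1), so the thousands-separator pattern is selected.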


The Lifecycle

A typical Matryoshka session:

1. Load Document

(load "./plugin/__init__.py")
→ "Loaded: 2,244 lines, 71.5 KB"

The document is parsed and stored server-side. Only metadata enters the context.

2. Query Incrementally

(grep "@util.api")
→ "Found 122 results, bound to RESULTS"
   [402] @util.api()
   [407] @util.api()
   ... (showing first 20)

Each query returns a preview plus the count. Full data stays on server.

3. Chain Operations

(count RESULTS)           → 122
(filter RESULTS ...)      → "Filtered to 45 results"
(map RESULTS ...)         → Transforms bound to RESULTS

Operations chain through the RESULTS binding. Each step refines without re-querying.

4. Close Session

(close)
→ "Session closed, memory freed"

Sessions auto-expire after 10 minutes of inactivity.


How Agents Discover and Use Matryoshka

Matryoshka integrates with LLM agents via the Model Context Protocol (MCP).

Tool Discovery

When the agent starts, it launches Matryoshka as an MCP server and receives a tool manifest:

{
  "tools": [
    {
      "name": "lattice_load",
      "description": "Load a document for analysis..."
    },
    {
      "name": "lattice_query",
      "description": "Execute a Nucleus query..."
    },
    {
      "name": "lattice_help",
      "description": "Get Nucleus command reference..."
    }
  ]
}

The agent sees the available tools and their descriptions. When a user asks to analyze a file, it decides which tools to use based on the task.

Guided Discovery

The lattice_help tool returns a command reference, teaching the LLM the query language on-demand:

; Search commands
(grep "pattern")              ; Regex search
(fuzzy_search "query" 10)     ; Fuzzy match, top N
(lines 10 20)                 ; Get line range

; Aggregation
(count RESULTS)               ; Count items
(sum RESULTS)                 ; Sum numeric values

; Transformation
(map RESULTS fn)              ; Transform each item
(filter RESULTS pred)         ; Keep matching items

The agent learns capabilities incrementally rather than needing upfront training.

Session Flow

User: "How many API endpoints does anki-connect have?"

Agent: [Calls lattice_load("plugin/__init__.py")]
        → "Loaded: 2,244 lines"

Agent: [Calls lattice_query('(grep "@util.api")')]
        → "Found 122 results"

Agent: [Calls lattice_query('(count RESULTS)')]
        → "122"

Agent: "The anki-connect plugin exposes 122 API endpoints,
         decorated with @util.api()."

State persists across tool invocations within a conversation: once a document is loaded, its content stays in memory, and the results of every executed query remain available for later use.


Real-World Example: Analyzing anki-connect

Let's walk through a complete analysis of the anki-connect Anki plugin, a real-world codebase with 7,770 lines across 17 files.

The Task

"Analyze the anki-connect codebase: find all classes, count API endpoints, extract configuration defaults, and document the architecture."

The Workflow

The agent uses Matryoshka's prompt hints to accomplish the following workflow:

  1. Discover files with Glob
  2. Read small files directly (<300 lines)
  3. Use Matryoshka for large files (>500 lines)
  4. Aggregate across all files
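The size thresholds in steps 2 and 3 amount to a simple routing function. A sketch (the `chooseStrategy` name is mine, and the handling of the 300-500 line middle band, which the hints above leave open, is an assumption):

```typescript
function chooseStrategy(lines: number): "read" | "matryoshka" | "either" {
  if (lines < 300) return "read";       // small: a full read is cheap
  if (lines > 500) return "matryoshka"; // large: query, don't load
  return "either";                      // mid-sized: agent decides
}

console.log(chooseStrategy(107));  // "read"       (util.py)
console.log(chooseStrategy(2244)); // "matryoshka" (__init__.py)
```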

Step 1: File Discovery

Glob **/*.py → 15 Python files
Glob **/*.md → 2 markdown files

File sizes:
  plugin/__init__.py    2,244 lines  → Matryoshka
  plugin/edit.py          458 lines  → Read directly
  plugin/web.py           301 lines  → Read directly
  plugin/util.py          107 lines  → Read directly
  README.md             4,660 lines  → Matryoshka
  tests/*.py           11 files      → Skip (tests)

Step 2: Read Small Files

Reading util.py (107 lines) reveals configuration defaults:

DEFAULT_CONFIG = {
    'apiKey': None,
    'apiLogPath': None,
    'apiPollInterval': 25,
    'apiVersion': 6,
    'webBacklog': 5,
    'webBindAddress': '127.0.0.1',
    'webBindPort': 8765,
    'webCorsOrigin': None,
    'webCorsOriginList': ['http://localhost/'],
    'ignoreOriginList': [],
    'webTimeout': 10000,
}

Reading web.py (301 lines) reveals the server architecture:

  • Classes: WebRequest, WebClient, WebServer
  • JSON-RPC style API with jsonschema validation
  • CORS support with configurable origins

Step 3: Query Large Files with Matryoshka

; Load the main plugin file
(load "plugin/__init__.py")
→ "Loaded: 2,244 lines, 71.5 KB"

; Find all classes
(grep "^class ")
→ "Found 1 result: [65] class AnkiConnect:"

; Count methods
(grep "def \\w+\\(self")
→ "Found 148 results"

; Count API endpoints
(grep "@util.api")
→ "Found 122 results"

; Load README for documentation
(load "README.md")
→ "Loaded: 4,660 lines, 107.2 KB"

; Find documented action categories
(grep "^### ")
→ "Found 13 sections"
   [176] ### Card Actions
   [784] ### Deck Actions
   [1231] ### Graphical Actions
   ...

Complete Findings

Metric                   Value
Total files              17 (15 .py + 2 .md)
Total lines              7,770
Classes                  8 (1 main + 3 web + 4 edit)
Instance methods         148
API endpoints            122
Config settings          11
Imports                  48
Documentation sections   8 categories, 120 endpoints

Token Usage Comparison

Approach          Lines Processed   Tokens Used   Coverage
Read everything   7,770             ~95,000       100%
Matryoshka only   6,904             ~6,500        65%
Hybrid            7,770             ~17,000       100%

The hybrid method achieves an 82% token savings while retaining 100% coverage. It combines two strategies: full reads that preserve the unique details of small files, and queries that compress the redundant bulk of large ones.

The pure Matryoshka approach misses details from small files (configuration defaults, web server classes), because the agent only uses the tool to query large ones. The hybrid workflow reads small files in full while leveraging Matryoshka for the big ones, a divide-and-conquer strategy. All that's needed is an explicit hint to the agent about which strategy to use.
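The headline percentage follows directly from the comparison table:

```typescript
// Token savings of the hybrid workflow relative to reading everything.
const fullRead = 95_000; // ~tokens to read all 7,770 lines
const hybrid = 17_000;   // ~tokens used by the hybrid workflow
const savings = 1 - hybrid / fullRead;

console.log(`${Math.round(savings * 100)}% token savings`); // "82% token savings"
```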

Why Hybrid Works

Small files (<300 lines) contain critical details:

  • util.py: All configuration defaults, the API decorator implementation
  • web.py: Server architecture, CORS handling, request schema

These fit comfortably in context, and there's no need to do anything different. Matryoshka adds value for:

  • __init__.py (2,244 lines): Query specific patterns without loading everything
  • README.md (4,660 lines): Search documentation sections on demand

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Adapters                             │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────────┐ │
│  │   Pipe   │  │   HTTP   │  │   MCP Server          │ │
│  └────┬─────┘  └────┬─────┘  └───────────┬───────────┘ │
│       │             │                     │             │
│       └─────────────┴─────────────────────┘             │
│                          │                               │
│                ┌─────────┴─────────┐                    │
│                │   LatticeTool     │                    │
│                │   (Stateful)      │                    │
│                │   • Document      │                    │
│                │   • Bindings      │                    │
│                │   • Session       │                    │
│                └─────────┬─────────┘                    │
│                          │                               │
│                ┌─────────┴─────────┐                    │
│                │  NucleusEngine    │                    │
│                │  • Parser         │                    │
│                │  • Type Checker   │                    │
│                │  • Evaluator      │                    │
│                └─────────┬─────────┘                    │
│                          │                               │
│                ┌─────────┴─────────┐                    │
│                │    Synthesis      │                    │
│                │  • Regex          │                    │
│                │  • Extractors     │                    │
│                │  • miniKanren     │                    │
│                └───────────────────┘                    │
└─────────────────────────────────────────────────────────┘

Getting Started

Install from npm:

npm install matryoshka-rlm

As MCP Server

Add to your MCP configuration:

{
  "mcpServers": {
    "lattice": {
      "command": "npx",
      "args": ["lattice-mcp"]
    }
  }
}

Programmatic Use

import { NucleusEngine } from "matryoshka-rlm";

const engine = new NucleusEngine();
await engine.loadFile("./document.txt");

const result = engine.execute('(grep "pattern")');
console.log(result.value); // Array of matches

Interactive REPL

npx lattice-repl
lattice> :load ./data.txt
lattice> (grep "ERROR")
lattice> (count RESULTS)

Conclusion

Matryoshka embodies the principle, emerging from RLM research, that documents should be treated as external environments rather than as context to be parsed. This changes the model's role from passive reader to active agent: it navigates and interrogates a document to extract specific information, much as a programmer browses code. Combined with Barliman-style synthesis from examples and pointer-based state management, it achieves:

  • 82% token savings on real-world codebase analysis
  • 100% coverage when combined with direct reads for small files
  • Incremental exploration where each query builds on previous results
  • No context rot because documents stay outside the model

Note that variable bindings such as RESULTS refer to REPL state rather than holding data in the model's context. The queries sent to the server carry only these pointers, placeholders indicating where the actual computation should occur; the server performs the substantive work and returns only the distilled results.

source here: https://git.sr.ht/~yogthos/matryoshka


GM CEO Mary Barra says fully electric vehicles are the true end goal, arguing that PHEVs add cost and complexity without delivering the full benefits of EVs. Her stance reinforces GM’s belief that long-term investment should focus on pure EV platforms, not transitional powertrains.


Energy insecurity caused by geopolitical conflict will only lead to diminishing LNG and oil imports in key markets around the world.


cross-posted from: https://news.abolish.capital/post/20103

If you’re a typical American, you get home from work and start flipping switches and turning knobs — doing laundry, cooking dinner, watching TV. With so many other folks doing the same, the strain on the electrical grid in residential areas is highest at this time. That demand will only grow as the world moves away from fossil fuels, with more people buying induction stoves, heat pumps, and electric vehicles.

That’s a challenge for utilities, which are already managing creaky grids across the United States, all while trying to meet a growing demand for power. So they’re now trying to turn EVs from a burden into a boon. More and more models, for instance, feature “vehicle-to-grid,” or V2G, capabilities, meaning they can send power to the grid as needed. Others are experimenting with what’s called active managed charging, in which algorithms stagger when EVs charge, instead of them all drawing energy as soon as their owners plug in. The idea is for some people to charge later, but still have a full battery when they leave for work in the morning.

A new report from the Brattle Group, an economic and energy consultancy, prepared for EnergyHub, which develops such technology, used real-world data from EV owners in Washington state to demonstrate the potential of this approach for both utilities and drivers. It found that an active managed charging program saves up to $400 per EV each year, while the vehicles were still always fully charged in the morning. Utilities benefit too, as the redistributed demand results in less of a spike in the early evening. That, in turn, means a utility can delay the costly upgrades it needs to accommodate increased electrification, saving ratepayers money.

Active managed charging works in conjunction with something called “time of use,” in which a utility charges different rates depending on the time of day. Between 4 pm and 9 pm, when demand is high, rates are also high. But after 9 pm, they fall. EV owners who wait until later in the evening to charge pay less for the same electricity.

Time-of-use pricing discourages energy use when demand is highest, lightening the load and reducing how much electricity utilities need to generate. But there’s nothing stopping everyone from plugging in as soon as cheaper rates kick in at 9 p.m. As EV adoption grows, that coordination problem can create a new spike in demand. “An EV can be on its own twice the peak load of a typical home,” said Akhilesh Ramakrishnan, managing energy associate at the Brattle Group. “You get to the point where they start needing to be managed differently.”

That’s where active managed charging comes in. Using an app, an EV owner indicates when they need their car to be charged, and how much charge their battery needs for the day. (The app also learns over time to predict when a vehicle will unplug.) When they get home at 6 pm, the owner can plug in, but the car won’t begin to charge. Instead, the system waits until some point in the night to turn on the juice, leaving enough time to fully charge the vehicle by the indicated hour. “If customers don’t believe that we’re going to get them there, then they’re not going to allow us to control their vehicle effectively,” said Freddie Hall, a data scientist at EnergyHub.

The typical driver only goes 30 miles in a day, Hall added, requiring about two hours of charging each night. By actively managing many cars across neighborhoods, the system can more evenly distribute demand throughout the night: Folks will leave for work earlier or later than their neighbors, vehicles with bigger batteries will need more time to charge, and some will be almost empty while others may need to top up.


They’re all still getting the lower prices with time of use rates, but they’re not taxing the grid by all charging at 9 pm. “The results are actually very, very promising in terms of reducing the peak loads,” said Jan Kleissl, director of the Center for Energy Research at the University of California, San Diego, who wasn’t involved in the report. “It shows big potential for reducing costs of EV charging in general.”

Active managed charging would allow the grid to accommodate twice the number of EVs before a utility has to start upgrading the system to handle the added load, according to the report. (And consider all the additional demand for energy from things like data centers.) Those costs inevitably get passed down to all ratepayers. But, the report notes, active managed charging could delay those upgrades by up to a decade. “As EVs grow, if you don’t implement these solutions, there’s going to be a lot more upgrades, and that’s going to lead to rate impacts for everyone,” Ramakrishnan said.

At the same time, EVs could help reduce those rates in the long term, thanks to V2G, a separate emerging technology. It allows a utility to call on EVs sitting in garages as a vast network of backup power. So when demand surges, those vehicles can send power to the grid for others to use, or just power the house they’re sitting in, essentially removing the structure from the grid and lowering demand. (And think of all the fleets of electric vehicles, like school buses, with huge batteries to use as additional power.) With all that backup energy, utilities might not need to build as many costly battery facilities of their own, projects that ratepayers wouldn’t need to foot the bill for.

Active managed charging and V2G could work in concert, with some batteries draining at 6 pm as they provide energy, then recharging later at night. But that ballet will require more large-scale experimentation. “How are we going to fit in discharging a battery, as well as charging it overnight?” Hall said. “Because you do want it available the next day.”

To cut greenhouse gas emissions as quickly as possible, the world needs more EVs. Now it’s just a matter of making them benefit the grid instead of taxing it.

This story was originally published by Grist with the headline This tech could keep EVs from stressing the grid — and save everyone money on Jan 15, 2026.




TAIPEI, Jan 14 (Reuters) - Taiwan prosecutors have issued an arrest warrant for the chief executive officer of Chinese smartphone maker OnePlus, alleging he was involved in illegal business and recruitment activities in Taiwan. Taiwan’s Shilin District Prosecutors Office said in a document it had indicted two Taiwanese citizens for helping OnePlus CEO Pete Lau illegally operate a business and recruit more than 70 employees in Taiwan. The allegations fall under Taiwanese law governing relations with China.

The document, dated November 2025, was first reported by Taiwan local media on Tuesday. Over 70 employees were hired in Taiwan to conduct smartphone software application research and development, verification and testing for the Chinese smartphone maker, prosecutors said. OnePlus is headquartered in the southern Chinese city of Shenzhen. It became an independent sub-brand under Oppo in 2021, according to its website. Oppo and OnePlus did not immediately respond to requests for comment from Reuters. Reuters could not reach Lau for comment. Beijing claims democratically governed Taiwan as its own territory and has never renounced the use of force to bring the island under its control. Taiwan rejects China’s sovereignty claims and says only the island’s people can decide their future.

But Taiwan's tech expertise has made it a magnet for Chinese companies seeking talent, prompting Taiwanese authorities to block such efforts, which they say have included using shell companies registered in Hong Kong or foreign entities, or dispatching staff through hiring agencies to conceal their identities. In August 2025, Taiwan authorities said they were investigating 16 Chinese companies for allegedly poaching semiconductor and other high-tech talent, amid growing concerns over technology outflows.

(This story has been corrected to clarify that the document from Taiwan’s Shilin District Prosecutors Office was issued last November, not on Tuesday, in paragraphs 1, 2 and 3)

Reporting by Wen-Yee Lee; Editing by Stephen Coates


Most people in the field know that models usually fall apart after a few hundred steps because small errors keep adding up until the whole process is ruined. The paper proposes a system called MAKER, which uses a strategy they call massively decomposed agentic processes. Instead of asking one big model to do everything, they break the entire task down into the smallest possible pieces so each microagent only has to worry about one single move.

For their main test they used a twenty disk version of the Towers of Hanoi puzzle which actually requires over a million individual moves to finish. They found that even small models can be super reliable if you set them up correctly. One of the main tricks they used is a voting system where multiple agents solve the same tiny subtask and the system only moves forward once one answer gets a specific number of votes more than the others. This acts like a safety net that catches random mistakes before they can mess up the rest of the chain.
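The margin-based voting rule described above can be sketched like this (an illustrative reconstruction, not the paper's code; `voteUntilMargin` is a hypothetical name, and here the samples are pre-collected rather than drawn on demand):

```typescript
// Accept an answer only once the leading candidate is `margin` votes
// ahead of every other candidate; otherwise signal that more samples
// are needed.
function voteUntilMargin(samples: string[], margin: number): string | null {
  const counts = new Map<string, number>();
  for (const s of samples) {
    const c = (counts.get(s) ?? 0) + 1;
    counts.set(s, c);
    const bestOther = Math.max(
      0,
      ...[...counts.entries()].filter(([k]) => k !== s).map(([, v]) => v)
    );
    if (c - bestOther >= margin) return s; // leader is far enough ahead
  }
  return null; // no decision yet; keep sampling
}

console.log(voteUntilMargin(["A", "B", "A", "A"], 2)); // "A"
```

A single stray "B" cannot win here; the chain only advances once agreement outweighs disagreement by the required margin, which is what catches random mistakes.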

Another interesting part of their approach is red flagging which is basically just throwing away any response that looks suspicious or weird. If a model starts rambling for too long or messes up the formatting they just discard that attempt and try again because those kinds of behaviors usually mean the model is confused and likely to make a logic error. By combining this extreme level of task breakdown with constant voting and quick discarding of bad samples they managed to complete the entire million step process with zero errors.

And it turns out that you do not even need the most expensive or smartest models to do this since relatively small ones performed just as well for these tiny steps. Scaling up AI reliability might be more about how we organize the work rather than just making the models bigger and bigger. They even did some extra tests with difficult math problems like large digit multiplication and found that the same recursive decomposition and voting logic worked there as well.
