
Test-time training (TTT) significantly enhances language models' abstract reasoning, improving accuracy by up to 6x on the Abstraction and Reasoning Corpus (ARC). Key factors for successful TTT include initial fine-tuning, auxiliary tasks, and per-instance training. Applying TTT to an 8B-parameter model boosts accuracy to 53% on ARC's public validation set, nearly 25 points better than previous public neural approaches. Ensembling with recent program-generation methods achieves 61.9% accuracy, matching the average human score. This suggests that, in addition to explicit symbolic search, test-time training on few-shot examples significantly improves abstract reasoning in neural language models.

SocialistDovahkiin@hexbear.net 6 points 1 week ago

achieving similar statistical accuracy when training on large datasets that probably contain the answers to a lot of these benchmark tasks doesn't seem too impressive

The training datasets don't contain the answers, because the benchmark is diverse enough. That's why other models struggled to match human performance until the approach outlined in the paper was applied. This is the benchmark: https://liusida.github.io/ARC/

this post was submitted on 15 Nov 2024
