"Prompt Gisting:" Train two models such that given inputs "Translate French" and "G2>The cat," then G1 and G2 represent the entire instruction. (arxiv.org)

submitted 3 years ago by goosethe to c/math


cross-posted from: https://lemmy.sdf.org/post/36227

Abstract: "Prompting is now the primary way to utilize the multitask capabilities of language models (LMs), but prompts occupy valuable space in the input context window, and re-encoding the same prompt is computationally inefficient. Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of 'gist' tokens which can be reused for compute efficiency. Gist models can be easily trained as part of instruction finetuning via a restricted attention mask that encourages prompt compression. On decoder (LLaMA-7B) and encoder-decoder (FLAN-T5-XXL) LMs, gisting enables up to 26x compression of prompts, resulting in up to 40% FLOPs reductions, 4.2% wall time speedups, storage savings, and minimal loss in output quality."
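For anyone wondering what a "restricted attention mask" might look like in practice, here is a rough sketch for the decoder-only case. It is not the authors' code. The function name `gist_causal_mask`, the toy sequence layout, and the exact blocking rule (tokens after the gist span cannot attend to anything before it) are my own assumptions based on the abstract's description.

```python
def gist_causal_mask(seq_len, gist_start, gist_end):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions [gist_start, gist_end) hold the gist tokens.
    Assumption (not taken from the authors' code): tokens after the gist
    span may not look at anything before it, so whatever they need from
    the prompt has to be squeezed into the gist tokens.
    """
    if not 0 <= gist_start < gist_end <= seq_len:
        raise ValueError("gist span must lie inside the sequence")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i  # ordinary left-to-right attention
            blocked = i >= gist_end and j < gist_start  # hide the raw prompt
            row.append(causal and not blocked)
        mask.append(row)
    return mask


# Toy layout: 2 prompt tokens, 2 gist tokens, 2 input tokens.
mask = gist_causal_mask(seq_len=6, gist_start=2, gist_end=4)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

The gist tokens themselves still see the prompt, while the trailing input tokens see only the gist tokens and each other. That is the sense in which, under this assumption, the prompt could be dropped and only the gist activations kept for reuse.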
