Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]
Neural KV-cache compaction — using learned compression rather than heuristic eviction — is one of the more credible paths to running long-context LLMs without bleeding GPU memory. Cartridges and STILL are two recent...
April 21, 2026 • 12 min read