Dnotitia Unveils STAR-KV, Achieving Up to 20x KV Cache Compression, Selected as an ICML 2026 Spotlight Paper
Introduces a low-rank-based approach to compressing the KV cache, one of the key bottlenecks in long-context AI. Speeds up attention computation by up to 6.9x and overall generation throughput by up to 3.1x, moving beyond memory savings to faster inference. Selected as a Spotlight paper at ICML...