Scene_Cast2 7 hours ago

Given our modern understanding of how LLMs work (like the recent Anthropic interpretability work), I wonder whether those insights could be used to quantize better. For example, we know that LLMs encode concepts in the directions (rotations) of activation vectors spread across several neurons, not in their magnitudes.

Bringing this up because the abstract (and the mention of rotations) reminded me of recent LLM interpretability posts.
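To make the direction-vs-magnitude point concrete, here's a minimal illustrative sketch (not from the paper or the Anthropic work): if a concept lives in the direction of an activation vector, cosine similarity is unchanged when the vector is rescaled, even though the vectors themselves are far apart.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(8)  # hypothetical activation vector
w = 3.0 * v                 # same direction, very different magnitude

def cosine(a, b):
    # Cosine similarity depends only on direction, not scale.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(v, w))                  # ~1.0: the "concept" is preserved
print(float(np.linalg.norm(v - w))) # > 0: the magnitudes clearly differ
```

The hypothetical intuition for quantization would be that errors along a vector's direction matter more than errors in its norm.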

cs702 8 hours ago

The OP looks like good work, but it's definitely not a quick read. The authors claim theoretical breakthroughs that enable:

* a data-free LLM quantization method which they claim outperforms all prior data-free approaches, including NF4; and

* a method, claimed to be optimal, for finding non-uniform per-layer quantization levels that satisfy a given compression constraint in the "medium bitwidth" regime.
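For readers unfamiliar with data-free quantization, here is a hedged toy sketch of the general idea (not the paper's method): scale weights by their absolute maximum, then snap each one to the nearest entry of a small, fixed, non-uniform set of levels. NF4 works in this spirit, with 16 levels placed at quantiles of a normal distribution; the level grid below is purely illustrative.

```python
import numpy as np

def quantize(weights, levels):
    # Per-tensor absmax scaling maps weights into [-1, 1].
    scale = np.abs(weights).max()
    normed = weights / scale
    # Snap each normalized weight to its nearest level (brute force).
    idx = np.abs(normed[:, None] - levels[None, :]).argmin(axis=1)
    # Return dequantized values, integer codes, and the scale.
    return levels[idx] * scale, idx, scale

levels = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])  # toy non-uniform grid
w = np.array([0.9, -0.1, 0.31, -0.62])
deq, codes, scale = quantize(w, levels)
```

Data-free here means the levels and scale come only from the weights themselves, with no calibration data; the paper's contribution, as I read the abstract, is in how those levels are chosen.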

They demonstrate improved accuracy-compression trade-offs on popular LLMs.

Thank you for sharing this on HN.