🤖 Harold's Notes

Search

❯

❯

❯

❯

Post training quantization (PTQ)

❯

EoRA - LoRA-style adapter quantization error compensation

EoRA - LoRA-style adapter quantization error compensation

Oct 19, 20251 min read

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
It’s a plug-in LoRA-style adapter that reconstructs the errors introduced by compression (especially 3- / 4-bit GPTQ) without any gradient-based fine-tuning
Main idea is:
- Compressed model + small calibration set → project the compression error into the activation eigenspace → fit a low-rank adapter where large eigen-values get priority.

Without fusion, this scheme leads to a noticeable increase in latency. This is primarily because input and output must transfer between L2 cache and DRAM twice as often compared to that without a low-rank residual path, shifting the inference process from being computation-bound to memory-bound.

This is how they fuse it into the quantized linear

Graph View

Backlinks

No backlinks found

Created with Quartz v4.2.3 © 2025