Unweight: how we compressed an LLM 22% without sacrificing quality
2026-04-17
Mari Galicer, Ivan Nikulin, Chris Branch

Running inference within 50ms of 95% of the world's Internet-connected population means being ruthlessly efficient with GPU memory. Last year we improved memory u…