Cloudflare · Apr 17, 2026
Unweight: how we compressed an LLM 22% without sacrificing quality
Tags: database-design, llm, rust, model-inference, quantization, nvidia-h100, memory-compression, gpu-kernels