Google Research at I/O 2025 showcases the "research to reality" transition, highlighting how years of foundational breakthroughs are now being integrated into Gemini models and specialized products. By focusing on multimodal capabilities, pedagogy, and extreme model efficiency, Google aims to democratize access to advanced AI while ensuring it remains grounded and useful across global contexts.
## Specialized Healthcare Models: MedGemma and AMIE
* **MedGemma:** This new open model, based on Gemma 3, is optimized for multimodal medical tasks such as radiology image analysis and clinical data summarization. It is available in 4B and 27B sizes, performing similarly to much larger models on the MedQA benchmark while remaining small enough for efficient local fine-tuning.
* **AMIE (Articulate Medical Intelligence Explorer):** A research AI agent designed for diagnostic medical reasoning. Its latest multimodal version can now interpret and reason about visual medical information, such as skin lesions or medical imaging, to assist clinicians in diagnostic accuracy.
## Educational Optimization through LearnLM
* **Gemini 2.5 Pro Integration:** The LearnLM family of models, developed with educational experts, is now integrated into Gemini 2.5 Pro. This fine-tuning enhances STEM reasoning, multimodal understanding, and pedagogical feedback.
* **Interactive Learning Tools:** A new research-optimized quiz experience lets students generate custom assessments from their own notes, offering specific feedback on right and wrong answers rather than simply supplying solutions.
* **Global Assessment Pilots:** Through partnerships like the one with Kayma, Google is testing the automatic assessment of short and long-form content in regions like Ghana to scale quality educational tools.
## Multilingual Expansion and On-Device Gemma Models
* **Gemma 3 and 3n:** Research breakthroughs have expanded Gemma 3’s support to over 140 languages. The introduction of **Gemma 3n** targets extreme efficiency, capable of running on devices with as little as 2GB of RAM while maintaining low latency and low energy consumption.
* **ECLeKTic Benchmark:** To assist the developer community, Google introduced this novel benchmark specifically for evaluating how well large language models transfer knowledge across different languages.
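To make the idea concrete, here is a toy sketch of what a cross-lingual knowledge-transfer score can look like. This is an illustrative metric only, not the actual ECLeKTic protocol; the `answers` structure and the `transfer_score` function are assumptions for the example.

```python
from typing import Dict


def transfer_score(answers: Dict[str, Dict[str, bool]], source: str = "en") -> float:
    """Toy cross-lingual transfer metric.

    `answers[fact][lang]` is True when the model answered that fact's
    question correctly when asked in that language. The score is the
    fraction of facts the model knows in the source language that it
    also answers correctly in every other language it was asked in.
    """
    # Facts the model answers correctly in the source language.
    known = [fact for fact, by_lang in answers.items() if by_lang.get(source)]
    if not known:
        return 0.0
    # Of those, how many transfer to all the other languages?
    transferred = sum(
        all(ok for lang, ok in answers[fact].items() if lang != source)
        for fact in known
    )
    return transferred / len(known)
```

A model that memorizes a fact only in its training language would score low here even with perfect source-language accuracy, which is the failure mode a benchmark like ECLeKTic is designed to surface.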
## Model Efficiency and Factuality in Search
* **Inference Techniques:** Google Research continues to set industry standards for model speed and accessibility through technical innovations like **speculative decoding** and **cascades**, which reduce the computational cost of generating high-quality responses.
* **Grounded Outputs:** Significant focus remains on factual consistency, ensuring that the AI models powering features like AI Overviews in Search provide reliable and grounded information to users.
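The two inference techniques above can be sketched in a few lines. This is an illustrative greedy variant under simplified assumptions, not Google's production implementation: `target` and `draft` stand in for an expensive and a cheap next-token predictor, and verification is done sequentially here rather than in a single batched forward pass.

```python
from typing import Callable, List


def speculative_decode(
    target: Callable[[List[int]], int],  # expensive model: context -> next token
    draft: Callable[[List[int]], int],   # cheap model: context -> next token
    prompt: List[int],
    max_new: int = 8,
    k: int = 4,                          # tokens drafted per round
) -> List[int]:
    """Greedy speculative decoding sketch: the draft model proposes up to
    k tokens, the target model verifies them, matching prefixes are
    accepted, and the first mismatch is replaced by the target's token."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft k candidate tokens with the cheap model.
        candidates, ctx = [], list(out)
        for _ in range(k):
            token = draft(ctx)
            candidates.append(token)
            ctx.append(token)
        # 2. Verify drafts against the target model.
        for token in candidates:
            want = target(out)
            if token == want:
                out.append(token)  # draft accepted for free
            else:
                out.append(want)   # mismatch: take the target's token
                break
            if len(out) - len(prompt) >= max_new:
                break
    return out[len(prompt):]


def cascade(query, small, large, threshold: float = 0.8):
    """Model cascade sketch: answer with the cheap model unless its
    confidence falls below the threshold, then escalate to the large one."""
    answer, confidence = small(query)
    return answer if confidence >= threshold else large(query)
```

In both cases the quality of the final output is set by the large model; the small model only determines how much of the work can be done cheaply, which is where the cost savings come from.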
As Google continues to shrink the gap between laboratory breakthroughs and consumer products, the emphasis remains on making high-performance AI accessible on low-cost hardware and across diverse linguistic landscapes. Developers and researchers can now leverage these specialized tools via platforms like Hugging Face and Vertex AI to build more targeted, efficient applications.