theoretical-computer-science

2 posts

google

Gemini provides automated feedback for theoretical computer scientists at STOC 2026

Google Research launched an experimental program for the STOC 2026 conference using a specialized Gemini model to provide automated, rigorous feedback on theoretical computer science submissions. By identifying critical logical errors and proof gaps within a 24-hour window, the tool demonstrated that advanced AI can serve as a powerful pre-vetting collaborator for high-level mathematical research. The overwhelmingly positive reception from authors indicates that AI can effectively augment the human peer-review process by improving paper quality before formal submission.

## Advanced Reasoning via Inference Scaling

- The tool utilized an advanced version of Gemini 2.5 Deep Think specifically optimized for mathematical rigor.
- It employed inference scaling methods, allowing the model to explore and combine multiple candidate solutions and reasoning traces simultaneously.
- This non-linear approach to problem solving helps the model focus on the most salient technical issues while significantly reducing the likelihood of hallucinations.

## Structured Technical Feedback

- Feedback was delivered in a structured format that opened with a high-level summary of the paper's core contributions.
- The model provided a detailed analysis of potential mistakes, specifically targeting errors in lemmas, theorems, and proofs.
- Authors also received a categorized list of minor corrections, such as inconsistent variable naming and typographical errors.

## Identified Technical Issues and Impact

- The pilot saw high engagement, with over 80% of STOC 2026 submitters opting in to the AI-generated review.
- The tool identified "critical bugs" and calculation errors that had evaded human authors for months.
- Survey results showed that 97% of participants found the feedback helpful, and 81% reported that the tool improved the overall clarity and readability of their work.
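The three-part feedback format described above (summary, potential mistakes, minor corrections) could be modeled as a simple record type. This is purely an illustrative sketch; the actual schema used by the Gemini tool is not published, and all names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class PaperFeedback:
    """Hypothetical container mirroring the three-part feedback format."""
    summary: str  # high-level summary of the paper's core contributions
    potential_mistakes: list[str] = field(default_factory=list)  # suspected errors in lemmas, theorems, proofs
    minor_corrections: list[str] = field(default_factory=list)   # e.g. inconsistent variable names, typos

# Example instance (fabricated content, for shape only).
fb = PaperFeedback(
    summary="Improves the approximation ratio for problem X.",
    potential_mistakes=["Lemma 3.2: the union bound appears to assume independence."],
    minor_corrections=["'eps' is also written as 'epsilon' in Section 4."],
)
```

A structured record like this is what makes the feedback easy to triage: authors can verify each `potential_mistakes` entry before acting on it.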
## Expert Verification and Hallucinations

- Because the users were domain experts, they could act as a filter, distinguishing deep technical insights from occasional model hallucinations.
- While the model sometimes struggled to parse complex notation or interpret figures, authors valued its "neutral tone" and the speed of the two-day turnaround.
- The feedback served as a starting point for human verification, allowing researchers to refine their arguments rather than follow the model's output blindly.

## Future Outlook and Educational Potential

- Beyond professional research, 75% of surveyed authors see significant educational value in using the tool to train students in mathematical rigor.
- The experiment's success has led 88% of participants to express interest in having continuous access to such a tool throughout the research and drafting process.

The success of the STOC 2026 pilot suggests that researchers should consider integrating specialized LLMs early in the drafting phase to catch "embarrassing" or logic-breaking errors. While the human expert remains the final arbiter of truth, these tools provide a valuable layer of automated verification that can accelerate the pace of scientific discovery.

google

AI as a research partner: Advancing theoretical computer science with AlphaEvolve

AlphaEvolve, an LLM-powered coding agent developed by Google DeepMind, facilitates mathematical discovery by evolving code to find complex combinatorial structures that are difficult to design manually. By utilizing a "lifting" technique, the system discovers finite structures that can be plugged into existing proof frameworks to establish new universal theorems in complexity theory. This methodology has produced state-of-the-art results for the MAX-4-CUT problem and tightened bounds on the hardness of certifying properties of random graphs.

## The Role of AlphaEvolve in Mathematical Research

* The system uses an iterative feedback loop to mutate code snippets, evaluating the resulting mathematical structures and refining the code toward better solutions.
* AlphaEvolve operates as a tool-based assistant that generates specific proof elements, which can then be verified automatically by computer programs to ensure mathematical correctness.
* By focusing on verifiable finite structures, the agent sidesteps the common "hallucination" issues of LLMs: the final output is a computationally certified object rather than a speculative text-based proof.

## Bridging Finite Discovery and Universal Statements through Lifting

* Theoretical computer science often requires proofs that hold for all problem sizes ($\forall n$), a scale that AI systems typically struggle to address directly.
* The "lifting" technique treats a proof as a modular structure in which a specific finite component, such as a combinatorial gadget, can be replaced with a more efficient version while the rest of the proof stays intact.
* When AlphaEvolve finds a superior finite structure, the improvement is "lifted" through the existing mathematical framework to yield a stronger universal theorem, without requiring a human to redesign the entire logical architecture.
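The evaluate-and-refine cycle described above can be sketched as a generic evolutionary loop. This toy version maximizes the number of ones in a bitstring rather than scoring real combinatorial gadgets, so it illustrates only the shape of the search, not AlphaEvolve's actual machinery:

```python
import random

def evolve(initial, mutate, score, generations=200, population=20):
    """Toy evolutionary loop: mutate candidates, keep the best scorers,
    and iterate (a minimal stand-in for an evaluate-and-refine cycle)."""
    pool = [initial]
    for _ in range(generations):
        children = [mutate(random.choice(pool)) for _ in range(population)]
        # Elitist selection: retain only the top-scoring candidates.
        pool = sorted(pool + children, key=score, reverse=True)[:population]
    return pool[0]

# Toy objective: evolve an 8-bit string toward all ones.
best = evolve(
    initial=[0] * 8,
    mutate=lambda s: [b ^ (random.random() < 0.2) for b in s],  # flip each bit w.p. 0.2
    score=sum,
)
```

In AlphaEvolve the candidates are code snippets, the scorer runs the code and evaluates the mathematical structure it produces, and the mutations are proposed by an LLM; the selection pressure toward higher scores is the common element.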
## Optimizing Gadget Reductions and MAX-k-CUT

* Researchers applied the agent to "gadget reductions," recipes that map known intractable problems to new ones in order to prove computational hardness (NP-hardness).
* AlphaEvolve discovered complex gadgets that were previously unknown because they were too intricate for researchers to construct by hand.
* These discoveries led to a new state-of-the-art inapproximability result for the MAX-4-CUT problem, giving more precise limits on how accurately any efficient algorithm can solve it.

## Advancing Average-Case Hardness in Random Graphs

* The agent was tasked with uncovering structures related to the average-case hardness of certifying properties of random graphs.
* By evolving better combinatorial structures for these instances, the team tightened existing mathematical bounds, giving a clearer picture of when certain graph properties become computationally intractable to verify.

This research demonstrates that LLM-based agents can serve as genuine research partners by focusing on the discovery of verifiable, finite components within broader theoretical frameworks. For researchers in mathematics and computer science, the "lifting" approach provides a practical roadmap for using AI on bottleneck problems that were previously limited by manual construction.
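For intuition about the MAX-4-CUT objective discussed above: the problem asks for a partition of a graph's vertices into 4 classes that maximizes the number of edges crossing between classes. A brute-force scorer makes the definition concrete on tiny instances (illustration only; the problem is NP-hard in general, and this is unrelated to the gadget constructions themselves):

```python
from itertools import product

def cut_value(edges, coloring):
    """Count edges whose endpoints land in different classes."""
    return sum(coloring[u] != coloring[v] for u, v in edges)

def max_k_cut(n, edges, k=4):
    """Exhaustive optimum over all k**n colorings; tiny graphs only."""
    return max(cut_value(edges, c) for c in product(range(k), repeat=n))

# Complete graph K5 has 10 edges; with only 4 colors, the pigeonhole
# principle forces at least one monochromatic edge, so the optimum is 9.
k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
opt = max_k_cut(5, k5, k=4)  # opt == 9
```

Hardness results like the one described above bound how close any polynomial-time algorithm can get to this optimum on worst-case inputs.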