
Toss

Automating Service Vulnerability Analysis Using LLMs

Toss has developed a high-precision automated vulnerability analysis system by integrating Large Language Models (LLMs) with traditional security testing tools. By evolving the architecture from a simple prompt-based approach to a multi-agent system built on open-source models and static analysis, the team achieved over 95% accuracy in threat detection. The project demonstrates that moving beyond a technical proof of concept requires solving real-world constraints such as context window limits, output consistency, and long-term financial sustainability.

### Navigating Large Codebases with MCP

* Initial attempts to use RAG (Retrieval Augmented Generation) and repository compression tools failed because the LLM could not maintain complex code relationships within token limits.
* The team implemented a "SourceCode Browse MCP" (Model Context Protocol) server that lets the LLM agent query the codebase dynamically.
* By indexing the code, the agent can issue targeted tool calls to find function definitions or variable usages only when necessary, effectively bypassing context window restrictions (a rough sketch of this browse-on-demand pattern appears after this summary).

### Ensuring Consistency via SAST Integration

* Testing revealed that standalone LLMs produced inconsistent results, often missing known vulnerabilities or hallucinating findings across different runs.
* To solve this, the team integrated Semgrep, a Static Application Security Testing (SAST) tool, to identify all potential "source-to-sink" paths.
* Semgrep was chosen over CodeQL for its lighter resource footprint and faster execution; it acts as a structured roadmap that ensures the LLM analyzes every suspicious input path without omission (see the Semgrep handoff sketch below).

### Optimizing Costs with Multi-Agent Architectures

* Analyzing every possible code path identified by the SAST tool was prohibitively expensive due to high token consumption.
* The workflow was divided among three specialized agents: a Discovery Agent to filter out irrelevant paths, an Analysis Agent to perform deep logic checks, and a Verification Agent to confirm findings (illustrated in the pipeline sketch below).
* This "sieve" strategy ensured that the most resource-intensive analysis ran only on high-probability vulnerabilities, significantly reducing operational costs.

### Transitioning to Open Models for Sustainability

* Scaling the system to hundreds of services and daily PRs made proprietary cloud models financially unviable.
* After benchmarking models such as Llama 3.1 and GPT-OSS, the team selected **Qwen3:30B** for its 100% coverage rate and high true-positive accuracy in vulnerability detection.
* To bridge the performance gap between open-source and proprietary models, the team relied on careful prompt engineering, one-shot examples, and enforced structured JSON outputs to improve reliability (see the structured-output sketch below).

To build a production-ready AI security tool, teams should focus on the synergy between specialized open-source models and traditional static analysis tools. This hybrid approach provides a cost-effective and sustainable way to achieve enterprise-grade accuracy while maintaining full control over the analysis infrastructure.
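The original post does not include code, but the browse-on-demand idea can be approximated with a small index-and-lookup layer. The sketch below is a minimal Python illustration, not Toss's actual SourceCode Browse MCP; `build_index`, `find_definition`, and `find_usages` are hypothetical names for the kind of tools an agent would call instead of loading the whole repository into its context window.

```python
"""Minimal sketch of an on-demand code-browsing tool layer (illustrative only,
not Toss's SourceCode Browse MCP). An agent calls these lookups instead of
receiving the entire repository in its prompt."""
import ast
from pathlib import Path


def build_index(repo_root: str) -> dict[str, list[tuple[str, int]]]:
    """Map function/class names to (file, line) locations across the repo."""
    index: dict[str, list[tuple[str, int]]] = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files the parser cannot handle
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index.setdefault(node.name, []).append((str(path), node.lineno))
    return index


def find_definition(index: dict, name: str) -> list[tuple[str, int]]:
    """Tool call: where is this symbol defined?"""
    return index.get(name, [])


def find_usages(repo_root: str, name: str) -> list[tuple[str, int, str]]:
    """Tool call: which lines mention this symbol? (simple text search)"""
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except UnicodeDecodeError:
            continue
        for lineno, line in enumerate(lines, start=1):
            if name in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    idx = build_index(".")
    print(find_definition(idx, "build_index"))
```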
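To make the "structured roadmap" handoff concrete, the next sketch runs the Semgrep CLI and converts its JSON report into a worklist the agents can iterate over. The ruleset name and the exact JSON field names are assumptions based on Semgrep's documented output and should be verified against the installed version.

```python
"""Sketch: run Semgrep and turn its findings into a worklist for LLM review.
Assumes the `semgrep` CLI is on PATH; JSON field names follow Semgrep's
documented output format but may vary by version."""
import json
import subprocess


def semgrep_findings(target_dir: str, ruleset: str = "p/default") -> list[dict]:
    """Run a Semgrep scan and flatten the results into simple dicts."""
    proc = subprocess.run(
        ["semgrep", "scan", "--config", ruleset, "--json", target_dir],
        capture_output=True, text=True, check=False,
    )
    report = json.loads(proc.stdout)
    worklist = []
    for result in report.get("results", []):
        worklist.append({
            "rule": result.get("check_id"),
            "file": result.get("path"),
            "start_line": result.get("start", {}).get("line"),
            "end_line": result.get("end", {}).get("line"),
            "message": result.get("extra", {}).get("message", ""),
        })
    return worklist


if __name__ == "__main__":
    for finding in semgrep_findings("./src"):
        print(f"{finding['file']}:{finding['start_line']}  {finding['rule']}")
```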
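The three-agent sieve can be expressed as a short pipeline in which each stage only sees what the previous one lets through. Everything below is schematic: `call_llm`, the model names, and the prompts are placeholders rather than Toss's actual agents.

```python
"""Sketch of the 'sieve' pipeline: a cheap Discovery pass filters the SAST
worklist, a deeper Analysis pass runs only on survivors, and a Verification
pass confirms final findings. `call_llm` stands in for any chat backend."""
from typing import Callable

LLM = Callable[[str, str], str]  # (model_name, prompt) -> raw model text


def discovery_agent(call_llm: LLM, finding: dict, code_snippet: str) -> bool:
    """Cheap filter: is this path worth deeper analysis at all?"""
    prompt = (
        "Does the following code plausibly reach the flagged sink with "
        f"attacker-controlled input? Answer YES or NO.\n\n{code_snippet}"
    )
    return "YES" in call_llm("small-model", prompt).upper()


def analysis_agent(call_llm: LLM, finding: dict, code_snippet: str) -> str:
    """Expensive pass: trace the logic and describe the vulnerability, if any."""
    prompt = f"Trace data flow for rule {finding['rule']} and explain the risk:\n{code_snippet}"
    return call_llm("large-model", prompt)


def verification_agent(call_llm: LLM, analysis: str) -> bool:
    """Second opinion on the analysis to cut hallucinated findings."""
    prompt = (
        "Is this vulnerability report internally consistent and exploitable? "
        f"Answer YES or NO.\n\n{analysis}"
    )
    return "YES" in call_llm("large-model", prompt).upper()


def run_pipeline(call_llm: LLM, worklist: list[dict], load_snippet) -> list[dict]:
    """Apply the sieve: only high-probability paths reach the expensive stages."""
    confirmed = []
    for finding in worklist:
        snippet = load_snippet(finding)                  # fetch the relevant code
        if not discovery_agent(call_llm, finding, snippet):
            continue                                     # drop cheap negatives early
        report = analysis_agent(call_llm, finding, snippet)
        if verification_agent(call_llm, report):
            confirmed.append({**finding, "report": report})
    return confirmed
```

The cost saving comes from ordering: the cheapest check runs on every finding, while the token-heavy analysis and verification only run on the small fraction that survives the first filter.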
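For the structured-output side, one common pattern is to serve the open model behind an OpenAI-compatible endpoint and demand JSON, stabilized with a one-shot example. The endpoint URL, model tag, response schema, and `response_format` support below are all assumptions about the serving setup, not details from the post.

```python
"""Sketch: query a locally hosted open model through an OpenAI-compatible
endpoint and force a structured JSON verdict. URL, model tag, and schema are
illustrative; whether `response_format` is honored depends on the server."""
import json
import requests

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # hypothetical local server

ONE_SHOT_EXAMPLE = {
    "role": "user",
    "content": 'Example finding: SQL query built by string concatenation.\n'
               'Expected answer: {"vulnerable": true, "type": "SQL injection", "confidence": 0.9}',
}


def classify_finding(code_snippet: str) -> dict:
    """Ask the model for a JSON verdict on one suspicious code path."""
    payload = {
        "model": "qwen3:30b",
        "messages": [
            {"role": "system",
             "content": "You are a security reviewer. Reply with JSON only, "
                        "using keys: vulnerable (bool), type (str), confidence (float)."},
            ONE_SHOT_EXAMPLE,  # one-shot example to stabilize the output format
            {"role": "user", "content": f"Review this code path:\n{code_snippet}"},
        ],
        "response_format": {"type": "json_object"},  # if the server supports it
        "temperature": 0,
    }
    resp = requests.post(ENDPOINT, json=payload, timeout=120)
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Malformed output is treated as a parse failure rather than a finding.
        return {"vulnerable": None, "type": "parse_error", "confidence": 0.0}
```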