*이 글은 연구 개발망에서 진행된 내용을 바탕으로 합니다. 안녕하세요. 토스 Security Researcher 표상영입니다. 지난 글에서는 LLM을 이용해 서비스 취약점 분석을 자동화하면서 마주했던 문제점과 그에 대한 해결책들을 간단히 소개드렸습니다. 이전 글을 작성한 시점부터 벌써 3개월이 지났는데요. 불과 몇 달 사이에 AI의 취약점 분석 능력은 정말 높은 수준으로 올라왔습니다. 이렇게 가파른 기술 발전 속도에 따라, AI를 대하는 저의 자세와 생각도 많이 바뀌게 되었어요. 이번 글에서는…
AI Security for Apps is now generally available 2026-03-11 Liam Reese Zhiyuan Zheng Catherine Newcomb Cloudflare’s AI Security for Apps detects and mitigates threats to AI-powered applications. Today, we're announcing that it is generally available. We’re shipping with new capab…
Complexity is a choice. SASE migrations shouldn’t take years. 2026-03-09 Warnessa Weaver For years, the cybersecurity industry has accepted a grim reality: migrating to a zero trust architecture is a marathon of misery. CIOs have been conditioned to expect multi-year deployment…
From the endpoint to the prompt: a unified data security vision in Cloudflare One 2026-03-06 Alex Dunbrack Cloudflare One has grown a lot over the years. What started with securing traffic at the network now spans the endpoint and SaaS applications – because that’s where work ha…
Modernizing with agile SASE: a Cloudflare One blog takeover 2026-03-02 Warnessa Weaver Yumna Moazzam Return to office has stalled for many, and the “new normal” for what the corporate network means is constantly changing. In 2026, your office may be a coffee shop, your workforce…
Developing AI products introduces unique security vulnerabilities that extend beyond traditional software risks, ranging from package hallucinations to sophisticated indirect prompt injections. To mitigate these threats, organizations must move away from trusting LLM-generated content and instead implement rigorous validation, automated threat modeling, and input/output guardrails. The following summary details the specific risks and mitigation strategies identified by LY Corporation’s security engineering team.
## Slopsquatting and Package Hallucinations
- AI models frequently hallucinate non-existent library or package names when providing coding instructions (e.g., suggesting `huggingface-cli` instead of the correct `huggingface_hub[cli]`).
- Attackers exploit this by registering these hallucinated names on public registries to distribute malware to unsuspecting developers.
- Mitigation requires developers to manually verify all AI-suggested commands and dependencies before execution in any environment.
## Prompt Injection and Arbitrary Code Execution
- As seen in CVE-2024-5565 (Vanna AI), attackers can inject malicious instructions into prompts to force the application to execute arbitrary code.
- This vulnerability arises when developers grant LLMs the autonomy to generate and run logic within the application context without sufficient isolation.
- Mitigation involves treating LLM outputs as untrusted data, sanitizing user inputs, and strictly limiting the LLM's ability to execute system-level commands.
## Indirect Prompt Injection in Integrated AI
- AI assistants integrated into office environments (like Gemini for Workspace) are susceptible to indirect prompt injections hidden within emails or documents.
- A malicious email can contain "system-like" instructions that trick the AI into hiding content, redirecting users to phishing sites, or leaking data from other files.
- Mitigation requires the implementation of robust guardrails that scan both the input data (the content being processed) and the generated output for instructional anomalies.
## Permission Risks in AI Agents and MCP
- The use of Model Context Protocol (MCP) and coding agents creates risks where an agent might overstep its intended scope.
- If an agent has broad access to a developer's environment, a malicious prompt in a public repository could trick the agent into accessing or leaking sensitive data (such as salary info or private keys) from a private repository.
- Mitigation centers on the principle of least privilege, ensuring AI agents are restricted to specific, scoped directories and repositories.
## Embedding Inversion and Vector Store Vulnerabilities
- Attacks targeting the retrieval phase of RAG (Retrieval-Augmented Generation) systems can lead to data leaks.
- Embedding Inversion techniques may allow attackers to reconstruct original sensitive text from the vector embeddings stored in a database.
- Securing AI products requires protecting the integrity of the vector store and ensuring that retrieved context does not bypass security filters.
## Automated Security Assessment Tools
- To scale security, LY Corporation is developing internal tools like "ConA" for automated threat modeling and "LAVA" for automated vulnerability assessment.
- These tools aim to identify AI-specific risks during the design and development phases rather than relying solely on manual reviews.
Effective AI security requires a shift in mindset: treat every LLM response as a potential security risk. Developers should adopt automated threat modeling and implement strict input/output validation layers to protect both the application infrastructure and user data from evolving AI-based exploits.