The tech industry is shifting from Software 1.0 (explicit logic) and 2.0 (neural networks) into Software 3.0, where natural language prompts and autonomous agents act as the primary programming interface. While Large Language Models (LLMs) are the engines of this era, they require a "Harness"—a structured environment of tools and protocols—to perform real-world tasks effectively. This evolution does not render traditional engineering obsolete; instead, it demonstrates that robust architectural principles like layered design and separation of powers are essential for building reliable AI agents.
### The Evolution of Software 3.0
* Software 1.0 is defined by explicit "How" logic written in languages like Python or Java, while Software 2.0 focuses on weights and data in neural networks.
* Software 3.0, popularized by Andrej Karpathy, moves to "What" logic, where natural language prompts drive the execution.
* The "Harness" concept is critical: just as a horse needs a harness to be useful to a human, an LLM needs tools (CLI, API access, file systems) to move from a chatbot to a functional agent like Claude Code.
### Mapping Agent Architecture to Traditional Layers
* **Slash Commands as Controllers:** Tools like `/review` or `/refactor` act as entry points for user requests, similar to REST controllers in Spring or Express.
* **Sub-agents as the Service Layer:** Sub-agents coordinate multiple skills and maintain independent context, mirroring how services orchestrate domain objects and repositories.
* **Skills as Domain Components:** Following the Single Responsibility Principle (SRP), individual skills should handle one clear task (e.g., "generating tests") to prevent logic bloat.
* **MCP as Infrastructure/Adapters:** The Model Context Protocol (MCP) functions like the Repository or Adapter pattern, abstracting external systems like databases and APIs from the core logic.
* **CLAUDE.md as Configuration:** Project-specific rules and tech stacks are stored in metadata files, acting as the `package.json` or `pom.xml` of the agent environment.
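The layer mapping above can be sketched in plain Python. All class and function names here (`DocsAdapter`, `TestGenSkill`, `ReviewAgent`, `slash_command`) are hypothetical illustrations of the pattern, not part of any real agent framework:

```python
class DocsAdapter:
    """Infrastructure layer: stands in for an MCP server / repository."""
    def fetch(self, path: str) -> str:
        return f"<contents of {path}>"

class TestGenSkill:
    """Domain layer: one skill, one clear responsibility (SRP)."""
    def run(self, source: str) -> str:
        return f"tests for: {source}"

class ReviewAgent:
    """Service layer: a sub-agent that orchestrates skills and adapters."""
    def __init__(self, adapter: DocsAdapter, skill: TestGenSkill):
        self.adapter = adapter
        self.skill = skill

    def handle(self, path: str) -> str:
        source = self.adapter.fetch(path)  # infrastructure call
        return self.skill.run(source)      # domain logic

def slash_command(command: str, arg: str) -> str:
    """Controller layer: routes a /command to the right sub-agent."""
    routes = {"/review": ReviewAgent(DocsAdapter(), TestGenSkill())}
    return routes[command].handle(arg)
```

The point of the sketch is the dependency direction: the controller knows only the sub-agent, the sub-agent composes skills and adapters, and external systems stay behind the adapter, exactly as in a layered web application.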
### From Exceptions to Questions
* Traditional Software 1.0 must have every branch of logic predefined; when an unknown state is reached, the system throws an exception or fails.
* Software 3.0 introduces Human-in-the-Loop (HITL), where "Exceptions" become "Questions," allowing the agent to ask for clarification on high-risk or ambiguous tasks.
* Effective agent design requires identifying when to act autonomously (reversible, low-risk tasks) versus when to delegate decisions to a human (deployments, deletions, or high-cost API calls).
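The "exceptions become questions" pattern can be sketched as follows. The risk classification below (`REVERSIBLE`, `HIGH_RISK`) is a hypothetical example, not a fixed rule set:

```python
REVERSIBLE = {"format_code", "generate_tests"}  # act autonomously
HIGH_RISK = {"deploy", "delete_branch"}         # delegate to a human

def execute(action: str, ask_human) -> str:
    """ask_human is any callable that returns True/False for a question."""
    if action in REVERSIBLE:
        return f"done: {action}"
    # High-risk or unknown state: Software 1.0 would raise an exception
    # here; Software 3.0 turns the exception into a question.
    question = f"Proceed with '{action}'? It is risky or unrecognized."
    if ask_human(question):
        return f"done: {action} (human-approved)"
    return f"skipped: {action}"
```

In a real agent, `ask_human` would surface the question through the chat interface and block until the user answers; the structural idea is that the unknown branch ends in a question, not a crash.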
### Managing Constraints: Tokens and Complexity
* In Software 3.0, tokens represent the "memory" (RAM) of the system; large codebases can lead to "token explosion," causing context overflow or high costs.
* Deterministic logic should be moved to external scripts rather than being interpreted by the LLM every time to save tokens and ensure consistency.
* To avoid "Skill Explosion" (similar to Class Explosion), developers should use "Progressive Disclosure," providing the agent with a high-level entry point and only loading detailed task knowledge when specifically required.
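Progressive disclosure can be sketched as a two-tier lookup: only a one-line index of skills enters the base context, and a skill's full playbook is loaded when it is actually invoked. The skill names and texts are hypothetical:

```python
SKILL_INDEX = {
    "test-gen": "Generate unit tests for a module.",
    "refactor": "Restructure code without changing behavior.",
}

SKILL_DETAILS = {
    "test-gen": "Full playbook: inspect public functions, cover edge cases, ...",
    "refactor": "Full playbook: find duplication, extract helpers, ...",
}

def system_prompt() -> str:
    # Only ~1 line per skill goes into the base context window.
    lines = [f"- {name}: {summary}" for name, summary in SKILL_INDEX.items()]
    return "Available skills:\n" + "\n".join(lines)

def load_skill(name: str) -> str:
    # Detailed instructions enter the context only on invocation.
    return SKILL_DETAILS[name]
```

The token cost of the base prompt then grows with the number of skills, not with the total size of their documentation.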
Traditional software engineering expertise—specifically in cohesion, coupling, and abstraction—is the most valuable asset when transitioning to Software 3.0. By treating prompt engineering and agent orchestration with the same architectural rigor as 1.0 code, developers can build agents that are scalable, maintainable, and truly useful.
Developing AI products introduces unique security vulnerabilities that extend beyond traditional software risks, ranging from package hallucinations to sophisticated indirect prompt injections. To mitigate these threats, organizations must move away from trusting LLM-generated content and instead implement rigorous validation, automated threat modeling, and input/output guardrails. The following summary details the specific risks and mitigation strategies identified by LY Corporation’s security engineering team.
## Slopsquatting and Package Hallucinations
- AI models frequently hallucinate non-existent library or package names when providing coding instructions (e.g., suggesting `huggingface-cli` instead of the correct `huggingface_hub[cli]`).
- Attackers exploit this by registering these hallucinated names on public registries to distribute malware to unsuspecting developers.
- Mitigation requires developers to manually verify all AI-suggested commands and dependencies before execution in any environment.
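One way to automate part of that verification is to refuse any AI-suggested package that is not already pinned in a reviewed lockfile. The lockfile contents below are a hypothetical example:

```python
REVIEWED_LOCKFILE = {"requests", "huggingface_hub"}

def vet_install(suggested: str) -> bool:
    """Return True only if the suggested package is already reviewed."""
    # Normalize the name the way PyPI does (case, dashes vs. underscores).
    normalized = suggested.lower().replace("-", "_")
    allowed = {p.lower().replace("-", "_") for p in REVIEWED_LOCKFILE}
    if normalized not in allowed:
        print(f"BLOCKED: '{suggested}' is not in the reviewed lockfile.")
        return False
    return True
```

Under this policy the hallucinated `huggingface-cli` is blocked while the real `huggingface_hub` passes, because the allowlist was reviewed by a human rather than generated by the model.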
## Prompt Injection and Arbitrary Code Execution
- As seen in CVE-2024-5565 (Vanna AI), attackers can inject malicious instructions into prompts to force the application to execute arbitrary code.
- This vulnerability arises when developers grant LLMs the autonomy to generate and run logic within the application context without sufficient isolation.
- Mitigation involves treating LLM outputs as untrusted data, sanitizing user inputs, and strictly limiting the LLM's ability to execute system-level commands.
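Treating LLM output as untrusted data can look like the sketch below, where generated SQL must pass a narrow allowlist check before execution instead of being run directly. The checks are illustrative, not a complete SQL sanitizer:

```python
import re

# Reject statements that mutate data or chain multiple statements.
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|exec)\b|;", re.I)

def safe_to_run(generated_sql: str) -> bool:
    """Allow only single, read-only SELECT statements."""
    sql = generated_sql.strip()
    return sql.lower().startswith("select") and not FORBIDDEN.search(sql)
```

A production system would go further (parameterized queries, a read-only database role, a real SQL parser), but the principle is the same: validation happens outside the model, in deterministic code.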
## Indirect Prompt Injection in Integrated AI
- AI assistants integrated into office environments (like Gemini for Workspace) are susceptible to indirect prompt injections hidden within emails or documents.
- A malicious email can contain "system-like" instructions that trick the AI into hiding content, redirecting users to phishing sites, or leaking data from other files.
- Mitigation requires the implementation of robust guardrails that scan both the input data (the content being processed) and the generated output for instructional anomalies.
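An input-side guardrail of this kind can be sketched as a scan of incoming content for "system-like" instruction patterns before the assistant processes it. The pattern list is a hypothetical starting point, not a complete filter:

```python
import re

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"do not (show|reveal|mention) this",
]

def flag_injection(document_text: str) -> list[str]:
    """Return the patterns that matched, for human or policy review."""
    text = document_text.lower()
    return [p for p in INJECTION_PATTERNS if re.search(p, text)]
```

Pattern matching alone is easy to evade, which is why the article pairs it with output-side scanning: flagged inputs are quarantined, and generated output is checked again before it reaches the user.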
## Permission Risks in AI Agents and MCP
- The use of Model Context Protocol (MCP) and coding agents creates risks where an agent might overstep its intended scope.
- If an agent has broad access to a developer's environment, a malicious prompt in a public repository could trick the agent into accessing or leaking sensitive data (such as salary info or private keys) from a private repository.
- Mitigation centers on the principle of least privilege, ensuring AI agents are restricted to specific, scoped directories and repositories.
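Least-privilege scoping for an agent's file access can be sketched as a check that every requested path resolves inside an explicitly allowed root. The directory names are hypothetical:

```python
from pathlib import Path

ALLOWED_ROOT = Path("/work/public-repo").resolve()

def check_access(requested: str) -> bool:
    """Allow access only to paths inside the sandboxed root."""
    target = (ALLOWED_ROOT / requested).resolve()
    # Rejects traversal out of the sandbox, e.g. "../private-repo/keys".
    return target.is_relative_to(ALLOWED_ROOT)
```

Resolving the path before the containment check is what defeats `..` traversal; the same scoping idea applies to which repositories an MCP server is allowed to read.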
## Embedding Inversion and Vector Store Vulnerabilities
- Attacks targeting the retrieval phase of RAG (Retrieval-Augmented Generation) systems can lead to data leaks.
- Embedding Inversion techniques may allow attackers to reconstruct original sensitive text from the vector embeddings stored in a database.
- Securing AI products requires protecting the integrity of the vector store and ensuring that retrieved context does not bypass security filters.
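One retrieval-phase safeguard consistent with that point is to pass retrieved chunks through an access-control filter before they reach the LLM, so context never bypasses document permissions. The store contents and ACL groups below are hypothetical:

```python
VECTOR_STORE = [
    {"text": "Public roadmap for Q3.", "acl": {"everyone"}},
    {"text": "Salary bands (confidential).", "acl": {"hr"}},
]

def retrieve(query: str, user_groups: set[str]) -> list[str]:
    """Return only chunks the requesting user is allowed to see."""
    # A real system would first rank chunks by embedding similarity;
    # here every chunk "matches" so the focus stays on the ACL filter.
    return [c["text"] for c in VECTOR_STORE if c["acl"] & user_groups]
```

Filtering at retrieval time, rather than trusting the model to withhold sensitive context, keeps the permission boundary in deterministic code.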
## Automated Security Assessment Tools
- To scale security, LY Corporation is developing internal tools like "ConA" for automated threat modeling and "LAVA" for automated vulnerability assessment.
- These tools aim to identify AI-specific risks during the design and development phases rather than relying solely on manual reviews.
Effective AI security requires a shift in mindset: treat every LLM response as a potential security risk. Developers should adopt automated threat modeling and implement strict input/output validation layers to protect both the application infrastructure and user data from evolving AI-based exploits.
Tech-Verse 2025 showcased LY Corporation’s strategic shift toward an AI-integrated ecosystem following the merger of LINE and Yahoo Japan. The event focused on the practical hurdles of deploying generative AI, concluding that the transition from experimental models to production-ready services requires sophisticated evaluation frameworks and deep contextual integration into developer workflows.
## AI-Driven Engineering with Ark Developer
LY Corporation’s internal "Ark Developer" solution demonstrates how AI can be embedded directly into the software development life cycle.
* The system utilizes a Retrieval-Augmented Generation (RAG) based code assistant to handle tasks such as code completion, security reviews, and automated test generation.
* Rather than treating codebases as simple text documents, the tool performs graph analysis on directory structures to maintain structural context during code synthesis.
* Real-world application includes seamless integration with GitHub for automated Pull Request (PR) creation; internal users report higher satisfaction than with off-the-shelf tools like GitHub Copilot.
## Quantifying Quality in Generative AI
A significant portion of the technical discussion centered on moving away from subjective "vibes-based" assessments toward rigorous, multi-faceted evaluation of AI outputs.
* To measure the quality of generated images, developers utilized traditional metrics like Fréchet Inception Distance (FID) and Inception Score (IS) alongside LAION’s Aesthetic Score.
* Advanced evaluation techniques were introduced, including CLIP-IQA, Q-Align, and Visual Question Answering (VQA) based on video-language models to analyze image accuracy.
* Technical challenges in image translation and inpainting were highlighted, specifically the difficulty of restoring layout and text structures naturally after optical character recognition (OCR) and translation.
## Global Technical Exchange and Implementation
The conference served as a collaborative hub for engineers across Japan, Taiwan, and Korea to discuss the implementation of emerging standards like the Model Context Protocol (MCP).
* Sessions emphasized the "how-to" of overcoming deployment hurdles rather than just following technical trends.
* Poster sessions (Product Street) and interactive Q&A segments allowed developers to share localized insights on LLM agent performance and agentic workflows.
* The recurring theme across diverse teams was that the "evaluation and verification" stage is now the primary driver of quality in generative AI services.
For organizations looking to scale AI, the key recommendation is to move beyond simple implementation and invest in "evaluation-driven development." By building internal tools that leverage graph-based context and quantitative metrics like Aesthetic Scores and VQA, teams can ensure that generative outputs meet professional service standards.