human-in-the-loop

3 posts

toss

Welcoming the Era of Software 3.0

The tech industry is shifting from Software 1.0 (explicit logic) and 2.0 (neural networks) into Software 3.0, where natural language prompts and autonomous agents act as the primary programming interface. While Large Language Models (LLMs) are the engines of this era, they require a "Harness" (a structured environment of tools and protocols) to perform real-world tasks effectively. This evolution does not render traditional engineering obsolete; instead, it shows that robust architectural principles like layered design and separation of concerns are essential for building reliable AI agents.

### The Evolution of Software 3.0

* Software 1.0 is defined by explicit "How" logic written in languages like Python or Java, while Software 2.0 encodes behavior in the weights of neural networks trained on data.
* Software 3.0, a term popularized by Andrej Karpathy, moves to "What" logic, where natural language prompts drive execution.
* The "Harness" concept is critical: just as a horse needs a harness to be useful to a human, an LLM needs tools (CLI, API access, file systems) to move from a chatbot to a functional agent like Claude Code.

### Mapping Agent Architecture to Traditional Layers

* **Slash Commands as Controllers:** Commands like `/review` or `/refactor` act as entry points for user requests, similar to REST controllers in Spring or Express.
* **Sub-agents as the Service Layer:** Sub-agents coordinate multiple skills and maintain independent context, mirroring how services orchestrate domain objects and repositories.
* **Skills as Domain Components:** Following the Single Responsibility Principle (SRP), an individual skill should handle one clear task (e.g., "generating tests") to prevent logic bloat.
* **MCP as Infrastructure/Adapters:** The Model Context Protocol (MCP) functions like the Repository or Adapter pattern, abstracting external systems such as databases and APIs away from the core logic.
* **CLAUDE.md as Configuration:** Project-specific rules and tech stacks are stored in metadata files, acting as the `package.json` or `pom.xml` of the agent environment.

### From Exceptions to Questions

* Traditional 1.0 software must have every branch of logic predefined; if an unknown state is reached, the system throws an exception or fails.
* Software 3.0 introduces Human-in-the-Loop (HITL), where "Exceptions" become "Questions": the agent can ask for clarification on high-risk or ambiguous tasks.
* Effective agent design requires identifying when to act autonomously (reversible, low-risk tasks) and when to delegate decisions to a human (deployments, deletions, or high-cost API calls).

### Managing Constraints: Tokens and Complexity

* In Software 3.0, tokens are the "memory" (RAM) of the system; large codebases can lead to "token explosion," causing context overflow or high costs.
* Deterministic logic should be moved to external scripts rather than being re-interpreted by the LLM on every run, both to save tokens and to ensure consistent results.
* To avoid "Skill Explosion" (analogous to Class Explosion), developers should use "Progressive Disclosure": give the agent a high-level entry point and load detailed task knowledge only when it is specifically required.

Traditional software engineering expertise, specifically in cohesion, coupling, and abstraction, is the most valuable asset when transitioning to Software 3.0. By treating prompt engineering and agent orchestration with the same architectural rigor as 1.0 code, developers can build agents that are scalable, maintainable, and truly useful.
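The "Exceptions become Questions" pattern above can be sketched in a few lines of Python. This is a hypothetical illustration, not the Claude Code implementation: the action names, the `ask_human` helper, and the `execute` wrapper are all assumptions made for the example.

```python
# Hypothetical sketch: high-risk actions pause for a human answer instead of
# raising an exception; low-risk actions run autonomously.
HIGH_RISK_ACTIONS = {"deploy", "delete_data", "bulk_api_call"}  # illustrative names

def ask_human(question: str) -> bool:
    """Delegate an ambiguous or high-risk decision to a human reviewer."""
    return input(f"{question} [y/N] ").strip().lower() == "y"

def execute(action: str, run, human_approves=ask_human) -> str:
    """Run low-risk tasks directly; turn high-risk 'exceptions' into questions."""
    if action in HIGH_RISK_ACTIONS:
        # Software 1.0 would raise here; Software 3.0 asks and waits.
        if not human_approves(f"About to run high-risk action '{action}'. Proceed?"):
            return "skipped"
    run()  # reversible, low-risk work proceeds without interruption
    return "done"
```

In a real agent, `human_approves` would surface the question through the chat interface rather than `input()`, and the risk classification would come from project rules (for example, policies recorded in CLAUDE.md) rather than a hard-coded set.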

kakao

AI_TOP_100

The AI TOP 100 contest was designed to shift the focus from evaluating AI model performance to measuring human proficiency in solving real-world problems through AI collaboration. By prioritizing the "problem-solving process" over the final output alone, the organizers sought to identify individuals who can define clear goals and navigate the technical limitations of current AI tools. The conclusion of this initiative suggests that true AI literacy is the ability to maintain a "human-in-the-loop" workflow in which human intuition guides AI execution and verification.

### Core Philosophy of Human-AI Collaboration

* **Human-in-the-Loop:** The contest emphasizes a cycle of human analysis, AI problem-solving, and human verification. This ensures that the human remains the "pilot" who directs the AI engine and takes responsibility for the quality of the result.
* **Strategic Intervention:** Participants were encouraged to provide the AI with structural context it might struggle to perceive (such as complex table relationships) and to pre-process data to improve AI accuracy.
* **Task Delegation:** For complex iterative tasks, such as generating images for a montage, solvers were expected to build automated pipelines using AI agents to handle repetitive feedback loops while focusing human effort on higher-level strategy.

### Designing Against "One-Shot" Solutions

* **Low Barrier, High Ceiling:** Problems were designed to be intuitive enough for anyone to understand but complex enough to prevent "one-shot" solutions (the "click-and-solve" trap).
* **Targeting Technical Weaknesses:** Organizers intentionally embedded technical hurdles that current LLMs struggle with, forcing participants to demonstrate how they bridge the gap between AI limitations and a correct answer.
* **The Difficulty Ladder:** To account for varying domain expertise (e.g., OCR experience), problems used a multi-part structure. This included "Easy" starting questions to build momentum and "Medium" hint questions that guided participants toward solving the more difficult "Killer" components.

### The 4-Pattern Problem Framework

* **P1 - Insight (Analysis & Definition):** Identifying meaningful opportunities or problems within complex, unstructured data.
* **P2 - Action (Implementation & Automation):** Developing functional code or workflows to execute a defined solution.
* **P3 - Persuasion (Strategy & Creativity):** Generating logical and creative content to communicate technical solutions to non-technical stakeholders.
* **P4 - Decision (Optimization):** Making optimal choices and simulations to maximize goals under specific constraints.

### Quality Assurance and Score Calibration

* **4-Stage Pipeline:** Problems moved from Ideation to Drafting (testing for one-shot immunity), then to Candidate (analyzing abuse vulnerabilities), and finally to a Final selection based on difficulty balance.
* **Cross-Model Validation:** Internal and alpha testers solved problems using various models, including Claude, GPT, and Gemini, to ensure that no single tool could bypass the intended human-led process.
* **Effort-Based Scoring:** Instead of uniform points, scores were calibrated to the "effort cost" and human competency each problem demanded. This resulted in varying total points per problem, better reflecting the true difficulty of each task.

In an era of rapidly evolving AI, the ability to "use" a tool is becoming less valuable than the ability to "collaborate" with it. This shift requires building automated pipelines and taking a "difficulty ladder" approach to complex, multi-stage problems that AI cannot yet solve in a single iteration.
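The automated feedback pipelines described under Task Delegation can be sketched as a simple loop: an AI step runs repeatedly against a cheap automatic check, and only unresolved cases escalate to the human. The `refine` function and its `generate`/`score` parameters are hypothetical stand-ins for real model calls, not part of the contest's actual tooling.

```python
# Hypothetical sketch of a human-in-the-loop refinement pipeline: iterate the
# AI on its own output, auto-accept when a quality check passes, and escalate
# to a human only when repetition fails to converge.
def refine(prompt, generate, score, threshold=0.8, max_rounds=5):
    """Loop the AI step until quality passes or the round budget runs out."""
    candidate = generate(prompt)
    for _ in range(max_rounds):
        quality = score(candidate)
        if quality >= threshold:
            return candidate, "auto-accepted"
        # Feed the score back so the next attempt can improve on it.
        candidate = generate(f"{prompt} (previous attempt scored {quality:.2f})")
    # The feedback loop did not converge: hand the case to human judgment.
    return candidate, "needs-human-review"
```

The design point is the split of labor: the machine absorbs the repetitive generate-and-check cycles, while the human only sees the cases the automatic check could not resolve.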

line

IUI 2025

The IUI 2025 conference highlighted a significant shift in the AI landscape, moving away from a sole focus on model performance toward "human-centered AI" that prioritizes collaboration, ethics, and user agency. The consensus across key sessions was that for AI to be sustainable and trustworthy, it must transcend simple automation and become a tool that augments human perception and decision-making through transparent, interactive, and socially aware design.

## Reality Design and Human Augmentation

The concept of "Reality Design" suggests that Human-Computer Interaction (HCI) research must expand beyond screen-based interfaces to design reality itself. As AI, sensors, and wearables become integrated into daily life, technology can directly augment human perception, cognition, and memory.

* Memory extension: Systems can record and reconstruct personal experiences, helping users recall details in educational or professional settings.
* Sensory augmentation: Technologies like selective hearing or slow-motion visual playback can enhance a user's natural observational powers.
* Cognitive balance: While AI can assist with task difficulty (e.g., collaborative Lego building), designers must ensure that automation does not erode the human will to learn or remember, echoing historical warnings about technology-induced "forgetfulness."

## Bridging the Socio-technical Gap in AI Transparency

Transparency in AI, particularly in high-risk areas like finance or medicine, should not be limited to exposing mathematical model weights. It must bridge the gap between technical complexity and human understanding by focusing on user goals and social contexts.

* Multi-faceted communication: Effective transparency involves model reporting (Model Cards), sharing safety evaluation results, and providing linguistic or visual cues for uncertainty rather than just numerical scores.
* Counterfactual explanations: Users develop better-calibrated trust when they can see how a decision would have changed if specific input conditions were different.
* Interaction-based transparency: Transparency must be coupled with control, allowing users to act as "adjusters" whose feedback the model reflects in its future outputs.

## Interactive Machine Learning and Human-in-the-Loop

The framework of Interactive Machine Learning (IML) challenges the traditional view of AI as a static black box trained on fixed data. Instead, it proposes an interactive loop in which the user and the model grow together through continuous feedback.

* User-driven training: Users should be able to inspect model classifications, correct errors, and have those corrections immediately influence the model's learning path.
* Beyond automation: This approach reframes AI from a replacement for human labor into a collaborative partner that adapts to specific user behaviors and professional expertise.
* Impact on specialized tools: Modern applications include educational platforms where students manipulate data directly and research tools that integrate human intuition into large-scale data analysis.

## Collaborative Systems in Specialized Professional Contexts

Practical applications of human-centered AI are being realized in sensitive fields like child counseling, where AI assists experts without replacing the human element.

* Counselor-AI transcription: Systems designed for counseling analysis let AI handle the heavy lifting of transcription while counselors manage the nuance and contextual editing.
* Efficiency through partnership: By reducing administrative burdens, these systems free professionals to spend more time on high-level cognitive tasks and emotional support, demonstrating the value of AI as supportive infrastructure.

The future of AI development requires moving beyond isolated technical optimization to embrace the complexity of the human experience. Organizations and developers should focus on creating systems where transparency is a tool for "appropriate trust" and where design empowers human capabilities rather than simply automating them.
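The user-driven training loop described under Interactive Machine Learning can be sketched with a deliberately tiny model: the user inspects a prediction, supplies a correction, and that correction immediately changes future predictions. The nearest-centroid classifier below is a stand-in chosen for clarity, not a method from the conference sessions.

```python
# Minimal, stdlib-only sketch of an interactive machine learning loop:
# each human correction is folded into the model at once, so the very next
# prediction can already reflect it.
from collections import defaultdict

class InteractiveClassifier:
    """Nearest-centroid classifier over 2D points, updated one correction at a time."""

    def __init__(self):
        self.sums = defaultdict(lambda: [0.0, 0.0])  # per-label coordinate sums
        self.counts = defaultdict(int)               # per-label example counts

    def predict(self, point):
        """Return the label whose centroid is closest, or None before any feedback."""
        if not self.counts:
            return None
        def sq_dist(label):
            s, n = self.sums[label], self.counts[label]
            cx, cy = s[0] / n, s[1] / n
            return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
        return min(self.counts, key=sq_dist)

    def correct(self, point, true_label):
        """Human feedback: shift the corrected label's centroid immediately."""
        self.sums[true_label][0] += point[0]
        self.sums[true_label][1] += point[1]
        self.counts[true_label] += 1
```

The point of the sketch is the shape of the loop, not the model: `predict` exposes the current classification for inspection, and `correct` makes the user's fix take effect without a separate retraining step.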