python

4 posts

meta

Python Typing Survey 2025: Code Quality and Flexibility As Top Reasons for Typing Adoption - Engineering at Meta

The 2025 Typed Python Survey highlights that type hinting has transitioned from an optional feature to a core development standard, with 86% of respondents reporting frequent usage. While mid-career developers show the highest enthusiasm for typing, the ecosystem faces ongoing friction from tooling fragmentation and the complexity of advanced type logic. Overall, the community is pushing for a more robust system that mirrors the expressive power of TypeScript while maintaining Python’s hallmark flexibility.

## Respondent Demographics and Adoption Trends

* The survey analyzed responses from 1,241 developers, the majority of whom are highly experienced, with nearly half reporting over a decade of Python expertise.
* Adoption is highest among developers with 5–10 years of experience (93%), whereas junior developers (83%) and those with over 10 years of experience (80%) show slightly lower usage rates.
* The lower adoption among seniors is attributed to the management of legacy codebases and long-standing habits formed before type hints were introduced to the language.

## Primary Drivers for Typing Adoption

* **Incremental Integration:** Developers value the "gradual typing" approach, which allows them to add types to existing projects at their own pace without breaking the codebase.
* **Improved Tooling and IDE Support:** Typing significantly enhances the developer experience by enabling more accurate autocomplete, jump-to-definition, and inline documentation in IDEs.
* **Bug Prevention and Readability:** Type hints act as living documentation that helps catch subtle bugs during refactoring and makes complex codebases easier for teams to reason about.
* **Library Compatibility:** Features like Protocols and Generics are highly appreciated, particularly for their synergy with modern libraries like Pydantic and FastAPI that utilize type annotations at runtime.
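To illustrate why Protocols fit gradual adoption so well, here is a minimal sketch (the `SupportsArea` protocol and `Circle` class are illustrative, not from the survey): a class conforms by shape alone, so existing untyped code can satisfy a Protocol without being modified.

```python
from typing import Protocol


class SupportsArea(Protocol):
    """Structural interface: any object with an `area() -> float` method conforms."""

    def area(self) -> float: ...


class Circle:
    # Circle never declares that it implements SupportsArea; the match is structural.
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return 3.14159 * self.radius**2


def total_area(shapes: list[SupportsArea]) -> float:
    return sum(shape.area() for shape in shapes)


print(total_area([Circle(1.0), Circle(2.0)]))
```

Because the check is structural rather than nominal, a type checker such as Mypy or Pyright accepts `Circle` wherever `SupportsArea` is expected, which is what makes retrofitting types onto legacy code incremental rather than invasive.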
## Technical Pain Points and Ecosystem Friction

* **Third-Party Integration:** A major hurdle is the inconsistent quality or total absence of type stubs in massive libraries like NumPy, Pandas, and Django.
* **Tooling Fragmentation:** Developers expressed frustration over inconsistencies between major type checkers like Mypy and Pyright, as well as the slow performance of Mypy in large projects.
* **Conceptual Complexity:** Advanced features such as variance (co/contravariance), decorators, and complex nested Generics remain difficult for many developers to implement correctly.
* **Runtime Limitations:** Because Python does not enforce types at the interpreter level, some developers find it difficult to justify the verbosity of typing when it offers no native runtime guarantees.

## Most Requested Type System Enhancements

* **TypeScript Parity:** There is strong demand for features found in TypeScript, specifically Intersection types (using the `&` operator), Mapped types, and Conditional types.
* **Utility Types:** Developers are looking for built-in utilities like `Pick`, `Omit`, and `keyof` to handle dictionary shapes more effectively.
* **Improved Structural Typing:** While `TypedDict` exists, respondents want more flexible, anonymous structural typing to handle complex data structures without excessive boilerplate.
* **Performance and Enforcement:** There is a recurring request for an official, high-performance built-in type checker and optional runtime enforcement to bridge the gap between static analysis and execution.

As the Python type system continues to mature, developers should prioritize incremental adoption in shared libraries and internal APIs to maximize the benefits of static analysis. While waiting for more advanced features like intersection types, focusing on tooling consistency—such as aligning team standards around a specific type checker—can mitigate much of the friction identified in the 2025 survey.
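The boilerplate behind the `Pick`/`Omit` request is easy to see in standard `TypedDict` code. In the hypothetical sketch below (the `User`/`UserSummary` shapes are invented for illustration), the narrowed shape must be declared by hand, whereas TypeScript would express it as `Pick<User, "id" | "name">`:

```python
from typing import TypedDict


class User(TypedDict):
    id: int
    name: str
    email: str


class UserSummary(TypedDict):
    # Python has no built-in `Pick`, so the narrowed shape is spelled out again;
    # any change to User's field types must be mirrored here manually.
    id: int
    name: str


def summarize(user: User) -> UserSummary:
    return {"id": user["id"], "name": user["name"]}


print(summarize({"id": 1, "name": "Ada", "email": "ada@example.com"}))
```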

google

DS-STAR: A state-of-the-art versatile data science agent

DS-STAR is an advanced autonomous data science agent developed to handle the complexity and heterogeneity of real-world data tasks, ranging from statistical analysis to visualization. By integrating a specialized file analysis module with an iterative planning and verification loop, the system can interpret unstructured data and refine its reasoning steps dynamically based on execution feedback. This architecture allows DS-STAR to achieve state-of-the-art performance on major industry benchmarks, effectively bridging the gap between natural language queries and executable, verified code.

## Comprehensive Data File Analysis

The framework addresses a major limitation of current agents—the over-reliance on structured CSV files—by implementing a dedicated analysis stage for diverse data formats.

* The system automatically scans a directory to extract context from heterogeneous formats, including JSON, unstructured text, and markdown files.
* A Python-based analysis script generates a textual summary of the data structure and content, which serves as the foundational context for the planning phase.
* This module ensures the agent can navigate complex, multi-file environments where critical information is often spread across non-relational sources.

## Iterative Planning and Verification Architecture

DS-STAR utilizes a sophisticated loop involving four specialized roles to mimic the workflow of a human expert conducting sequential analysis.

* **Planner and Coder:** A Planner agent establishes high-level objectives, which a Coder agent then translates into executable Python scripts.
* **LLM-based Verification:** A Verifier agent acts as a judge, assessing whether the generated code and its output are sufficient to solve the problem or if the reasoning is flawed.
* **Dynamic Routing:** If the Verifier identifies gaps, a Router agent guides the refinement process by adding new steps or correcting errors, allowing the cycle to repeat for up to 10 rounds.
* **Intermediate Review:** The agent reviews intermediate results before proceeding to the next step, similar to how data scientists use interactive environments like Google Colab.

## Benchmarking and State-of-the-Art Performance

The effectiveness of the DS-STAR framework was validated through rigorous testing against existing agents like AutoGen and DA-Agent.

* The agent secured the top rank on the public DABStep leaderboard, raising accuracy from 41.0% to 45.2% compared to previous best-performing models.
* Performance gains were consistent across other benchmarks, including KramaBench (39.8% to 44.7%) and DA-Code (37.0% to 38.5%).
* DS-STAR showed a significant advantage in "hard" tasks—those requiring the synthesis of information from multiple, varied data sources—demonstrating its superior versatility in complex environments.

By automating the time-intensive tasks of data wrangling and verification, DS-STAR provides a robust template for the next generation of AI assistants. Organizations looking to scale their data science capabilities should consider adopting iterative agentic workflows that prioritize multi-format data understanding and self-correcting execution loops.

google

MLE-STAR: A state-of-the-art machine learning engineering agent

MLE-STAR is a state-of-the-art machine learning engineering agent designed to automate complex ML tasks by treating them as iterative code optimization challenges. Unlike previous agents that rely solely on an LLM’s internal knowledge, MLE-STAR integrates external web searches and targeted ablation studies to pinpoint and refine specific pipeline components. This approach allows the agent to achieve high-performance results, evidenced by its ability to win medals in 63% of Kaggle competitions within the MLE-Bench-Lite benchmark.

## External Knowledge and Targeted Ablation

The core of MLE-STAR’s effectiveness lies in its ability to move beyond generic machine learning libraries by incorporating external research and specific performance testing.

* The agent uses web search to retrieve task-specific, state-of-the-art models and approaches rather than defaulting to familiar libraries like scikit-learn.
* Instead of modifying an entire script at once, the system conducts an ablation study to evaluate the impact of individual pipeline components, such as feature engineering or model selection.
* By identifying which code blocks have the most significant impact on performance, the agent can focus its reasoning and optimization efforts where they are most needed.

## Iterative Refinement and Intelligent Ensembling

Once the critical components are identified, MLE-STAR employs a specialized refinement process to maximize the effectiveness of the generated solution.

* Targeted code blocks undergo iterative refinement based on LLM-suggested plans that incorporate feedback from prior experimental failures and successes.
* The agent features a unique ensembling strategy where it proposes multiple candidate solutions and then designs its own method to merge them.
* Rather than using simple validation-score voting, the agent iteratively improves the ensemble strategy itself, treating the combination of models as a distinct optimization task.
## Robustness and Safety Verification

To ensure the generated code is both functional and reliable for real-world deployment, MLE-STAR incorporates three specialized diagnostic modules.

* **Debugging Agent:** Automatically analyzes tracebacks and execution errors in Python scripts to provide iterative corrections.
* **Data Leakage Checker:** Reviews the solution script prior to execution to ensure the model does not improperly access test dataset information during the training phase.
* **Data Usage Checker:** Analyzes whether the script is utilizing all available data sources, preventing the agent from overlooking complex data formats in favor of simpler files like CSVs.

By combining external grounding with a granular, component-based optimization strategy, MLE-STAR represents a significant shift in automated machine learning. For organizations looking to scale their ML workflows, such an agent suggests a future where the role of the engineer shifts from manual coding to high-level supervision of autonomous agents that can navigate the vast landscape of research and data engineering.
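The ablation idea that drives MLE-STAR's targeting can be sketched in a few lines. This is a toy model, not MLE-STAR's code: `run_pipeline` stands in for an actual train-and-validate run, and the component names and score contributions are invented for illustration. The pattern, score the full pipeline, then re-score with each component disabled, is what ranks components by impact.

```python
def run_pipeline(components: dict[str, bool]) -> float:
    """Stand-in for training + validation; returns a simulated validation score."""
    score = 0.5
    if components.get("feature_engineering"):
        score += 0.15
    if components.get("model_selection"):
        score += 0.25
    if components.get("augmentation"):
        score += 0.05
    return score


def ablation_study(components: list[str]) -> dict[str, float]:
    baseline = run_pipeline({c: True for c in components})
    impact = {}
    for target in components:
        # Disable exactly one component and measure the drop in score.
        ablated = run_pipeline({c: c != target for c in components})
        impact[target] = baseline - ablated
    return impact


impact = ablation_study(["feature_engineering", "model_selection", "augmentation"])
# The component whose removal costs the most is the most promising refinement target.
print(max(impact, key=impact.get))
```

Focusing refinement on the highest-impact block is what distinguishes this from whole-script rewriting: the agent spends its optimization budget where the ablation says it matters.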

line

Implementing a RAG-Based Bot to

To address the operational burden of handling repetitive user inquiries for the AWX automation platform, LY Corporation developed a support bot utilizing Retrieval-Augmented Generation (RAG). By combining internal documentation with historical Slack thread data, the system provides automated, context-aware answers that significantly reduce manual SRE intervention. This approach enhances service reliability by ensuring users receive immediate assistance while allowing engineers to focus on high-priority development tasks.

### Technical Infrastructure and Stack

* **Slack Integration**: The bot is built using the **Bolt for Python** framework to handle real-time interactions within the company’s communication channels.
* **LLM Orchestration**: **LangChain** is used to manage the RAG pipeline; the developers suggest transitioning to LangGraph for teams requiring more complex multi-agent workflows.
* **Embedding Model**: The **paraphrase-multilingual-mpnet-base-v2** (SBERT) model was selected to support multi-language inquiries from LY Corporation’s global workforce.
* **Vector Database**: **OpenSearch** serves as the vector store, chosen for its availability as an internal PaaS and its efficiency in handling high-dimensional data.
* **Large Language Model**: The system utilizes **OpenAI (ChatGPT) Enterprise**, which ensures business data privacy by preventing the model from training on internal inputs.

### Enhancing LLM Accuracy through RAG and Vector Search

* **Overcoming LLM Limits**: Traditional LLMs suffer from "hallucinations," a lack of up-to-date information, and opaque sourcing; RAG addresses this by providing the model with specific, trusted context during the prompt phase.
* **Embedding and Vectorization**: Textual data from wikis and chats is converted into high-dimensional vectors, where semantically similar phrases (e.g., "Buy" and "Purchase") are stored in close proximity.
* **k-NN Retrieval**: When a user asks a question, the bot uses **k-Nearest Neighbors (k-NN)** algorithms to retrieve the top *k* most relevant snippets of information from the vector database.
* **Contextual Generation**: Rather than relying on its internal training data, the LLM generates a response based specifically on the retrieved snippets, leading to higher accuracy and domain-specific relevance.

### AWX Support Bot Workflow and Data Sources

* **Multi-Source Indexing**: The bot references two main data streams: the official internal AWX guide wiki and historical Slack inquiry threads where previous solutions were discussed.
* **Automated First Response**: The workflow begins when a user submits a query via a Slack workflow; the bot immediately processes the request and provides an initial AI-generated answer.
* **Human-in-the-Loop Validation**: After receiving an answer, users can click "Issue Resolved" to close the ticket or "Call AWX Admin" if the AI's response was insufficient.
* **Efficiency Gains**: This tiered approach filters out "RTFM" (Read The F***ing Manual) style questions, ensuring that human administrators only spend time on unique or complex technical issues.

Implementing a RAG-based support bot is a highly effective strategy for SRE teams looking to scale their internal support without increasing headcount. For the best results, organizations should focus on maintaining clean internal documentation and selecting embedding models that reflect the linguistic diversity of their specific workforce.
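The embed-then-retrieve step at the heart of this pipeline can be sketched in a few lines of NumPy. This is a toy illustration, not LY Corporation's implementation: a real deployment would embed text with a model such as paraphrase-multilingual-mpnet-base-v2 and query OpenSearch's k-NN index, while here the document "embeddings" are tiny hand-made vectors and the snippets are invented AWX-flavored examples.

```python
import numpy as np

# Toy corpus: each snippet is paired with a hand-made 3-d "embedding".
documents = {
    "To restart a job, open the AWX UI and click Relaunch.": np.array([0.9, 0.1, 0.0]),
    "Credentials are managed under Resources > Credentials.": np.array([0.1, 0.9, 0.1]),
    "Job templates define playbook, inventory, and credentials.": np.array([0.2, 0.3, 0.9]),
}


def top_k(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Return the k snippets whose embeddings are most cosine-similar to the query."""

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(documents, key=lambda d: cosine(query_vec, documents[d]), reverse=True)
    return ranked[:k]


# A query vector close to the first document's embedding retrieves it first;
# the retrieved snippets would then be injected into the LLM prompt as context.
snippets = top_k(np.array([0.8, 0.2, 0.1]), k=2)
print(snippets[0])
```

The design choice to pass only these top-*k* snippets to the LLM, rather than the whole corpus, is what keeps answers grounded in trusted internal sources and makes their provenance traceable.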