AWS / gen-ai


Announcing Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs

Amazon has announced the general availability of EC2 G7e instances, a new hardware tier powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs designed for generative AI and high-end graphics. These instances deliver up to 2.3 times the inference performance of their G6e predecessors while providing significant upgrades to memory and bandwidth. The launch aims to provide a cost-effective option for running medium-sized AI models and complex spatial computing workloads at scale.

**Blackwell GPU and Memory Advancements**

* The G7e instances feature NVIDIA RTX PRO 6000 Blackwell GPUs, which provide twice the memory and 1.85 times the memory bandwidth of the G6e generation.
* Each GPU provides 96 GB of memory, allowing users to run medium-sized models (such as those with up to 70 billion parameters) on a single GPU using FP8 precision.
* The architecture is optimized for both spatial computing and scientific workloads, offering the highest graphics performance currently available in the EC2 portfolio.

**High-Speed Connectivity and Multi-GPU Scaling**

* To support large-scale models, G7e instances use NVIDIA GPUDirect P2P, enabling direct, low-latency communication between GPUs over PCIe interconnects.
* These instances offer four times the inter-GPU bandwidth of the L40S GPUs found in G6e instances, enabling more efficient data transfer in multi-GPU configurations.
* Total GPU memory scales up to 768 GB within a single node, supporting massive inference tasks across eight interconnected GPUs.

**Networking and Storage Performance**

* G7e instances provide up to 1,600 Gbps of network bandwidth, a four-fold increase over previous generations, making them suitable for small-scale multi-node clusters.
* Support for NVIDIA GPUDirect Remote Direct Memory Access (RDMA) via Elastic Fabric Adapter (EFA) reduces latency for remote GPU-to-GPU communication.
* The instances support GPUDirect Storage with Amazon FSx for Lustre, reaching throughput of up to 1.2 Tbps for rapid model loading and data processing.

**System Specifications and Configurations**

* Under the hood, G7e instances are powered by Intel Emerald Rapids processors and support up to 192 vCPUs and 2,048 GiB of system memory.
* Local storage options include up to 15.2 TB of NVMe SSD capacity for high-speed data caching and local processing.
* The instance family ranges from the g7e.2xlarge (1 GPU, 8 vCPUs) to the g7e.48xlarge (8 GPUs, 192 vCPUs).

For developers ready to transition to Blackwell-based architecture, these instances are accessible through AWS Deep Learning AMIs (DLAMI). They represent a major step forward for organizations that need to balance the high memory requirements of modern LLMs with the cost efficiencies of the G-series instance family.
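
As a back-of-the-envelope check on the claim that a 70-billion-parameter model fits on a single 96 GB GPU at FP8 (one byte per weight), here is a minimal sketch; the headroom figure ignores framework overhead and is an illustrative estimate, not a published specification.

```python
def fp8_weights_gb(num_params: float) -> float:
    """Approximate weight footprint at FP8: one byte per parameter, in decimal GB."""
    return num_params / 1e9

params = 70e9  # 70 billion parameters
weights_gb = fp8_weights_gb(params)  # ~70 GB of weights

print(f"FP8 weights: {weights_gb:.0f} GB")
# Remaining memory must hold the KV cache and activations at inference time.
print(f"Headroom on a 96 GB GPU: {96 - weights_gb:.0f} GB (approx.)")
```

The same arithmetic explains why an 8-GPU node with 768 GB of total GPU memory can serve far larger models when weights are sharded across the PCIe-connected GPUs.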

AWS Weekly Roundup: AWS re:Invent keynote recap, on-demand videos, and more (December 8, 2025)

The December 8, 2025, AWS Weekly Roundup recaps the major themes from AWS re:Invent, signaling a significant industry transition from AI assistants to autonomous AI agents. While technical innovation in infrastructure remains a priority, the event underscored that developers remain at the heart of the AWS mission, empowered by new tools to automate complex tasks using natural language. This shift represents a "renaissance" in cloud computing, where purpose-built infrastructure is now designed to support the non-deterministic nature of agentic workloads.

## Community Recognition and the Now Go Build Award

* Raphael Francis Quisumbing (Rafi) from the Philippines was honored with the Now Go Build Award, presented by Werner Vogels.
* A veteran of the ecosystem, Quisumbing has served as an AWS Hero since 2015 and has co-led the AWS User Group Philippines for over a decade.
* The recognition emphasizes AWS's continued focus on community dedication and the role of individual builders in empowering regional developer ecosystems.

## The Evolution from AI Assistants to Agents

* AWS CEO Matt Garman identified AI agents as the next major inflection point for the industry, moving beyond simple chat interfaces to systems that perform tasks and automate workflows.
* Dr. Swami Sivasubramanian highlighted a paradigm shift in which natural language serves as the primary interface for describing complex goals.
* These agents are designed to autonomously generate plans, write the necessary code, and call various tools to execute complete solutions without constant human intervention.
* AWS is prioritizing the development of production-ready infrastructure that is secure and scalable enough to handle the non-deterministic behavior of these AI agents.

## Core Infrastructure and the Developer Renaissance

* Despite the focus on AI, AWS reaffirmed that its core mission remains the "freedom to invent," keeping developers central to its 20-year strategy.
* Leaders Peter DeSantis and Dave Brown reinforced that the foundational attributes of security, availability, and performance remain the non-negotiable pillars of the AWS cloud.
* The integration of AI agents is framed as a way to finally realize material business returns on AI investments by moving from experimental use cases to automated business logic.

To maximize the value of these updates, organizations should begin evaluating how to transition from simple LLM implementations to agentic frameworks that can execute end-to-end business processes. Reviewing the on-demand keynote sessions from re:Invent 2025 is recommended for technical teams looking to implement the latest secure, agent-ready infrastructure.

Amazon Bedrock adds reinforcement fine-tuning, simplifying how developers build smarter, more accurate AI models

Amazon Bedrock has introduced reinforcement fine-tuning, a new model customization capability that allows developers to build more accurate and cost-effective AI models using feedback-driven training. By replacing the requirement for massive labeled datasets with reward signals, the platform enables average accuracy gains of 66% while automating the complex infrastructure typically associated with advanced machine learning. This approach allows organizations to optimize smaller, faster models for specific business needs without sacrificing performance or incurring the high costs of larger model variants.

**Challenges of Traditional Model Customization**

* Traditional fine-tuning often requires massive, high-quality labeled datasets and expensive human annotation, which can be a significant barrier for many organizations.
* Developers previously had to choose between settling for generic "out-of-the-box" results and managing the high costs and complexity of large-scale infrastructure.
* The high barrier to entry for advanced reinforcement learning techniques often demanded specialized ML expertise that many development teams lack.

**Mechanics of Reinforcement Fine-Tuning**

* The system uses an iterative feedback loop in which models improve based on reward signals that judge the quality of responses against specific business requirements.
* Reinforcement Learning with Verifiable Rewards (RLVR) uses rule-based graders to provide objective feedback for tasks such as mathematics or code generation (a minimal example appears below).
* Reinforcement Learning from AI Feedback (RLAIF) uses AI-driven evaluations to help models understand preference and quality without manual human intervention.
* The workflow can be powered by existing API logs within Amazon Bedrock or by uploading training datasets, eliminating the need for complex infrastructure setup.

**Performance and Security Advantages**

* The technique achieves an average accuracy improvement of 66% over base models, enabling smaller models to perform at the level of much larger alternatives.
* Current support includes the Amazon Nova 2 Lite model, which helps developers optimize for both speed and price-to-performance.
* All training data and customization processes remain within the secure AWS environment, ensuring that proprietary data is protected and compliant with organizational security standards.

Developers should consider reinforcement fine-tuning as a primary strategy for optimizing smaller models like Amazon Nova 2 Lite to achieve high-tier performance at a lower cost. This capability is particularly recommended for specialized tasks like reasoning and coding, where objective reward functions can be used to rapidly iterate and improve model accuracy.
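
To make the RLVR idea concrete, here is a minimal sketch of a rule-based grader for a math task. The function shape and names are illustrative assumptions, not the Bedrock grader API; the point is that a verifiable check can replace human labels as the reward signal.

```python
import re

def grade_math_answer(model_output: str, expected: float, tol: float = 1e-6) -> float:
    """Rule-based grader: reward 1.0 if the final number in the
    model's output matches the expected answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - expected) <= tol else 0.0

# In a reinforcement fine-tuning loop, the trainer samples responses,
# scores each with a grader like this, and reinforces high-reward behavior.
print(grade_math_answer("The answer is 42.", 42.0))  # 1.0
print(grade_math_answer("Roughly 41", 42.0))         # 0.0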

New serverless customization in Amazon SageMaker AI accelerates model fine-tuning

Amazon SageMaker AI has introduced a new serverless customization capability designed to accelerate the fine-tuning of popular models like Llama, DeepSeek, and Amazon Nova. By automating resource provisioning and providing an intuitive interface for advanced reinforcement learning techniques, this feature reduces the model customization lifecycle from months to days. The end-to-end workflow lets developers focus on model performance rather than infrastructure management, from initial training through to final deployment.

**Automated Infrastructure and Model Support**

* The service provides a serverless environment in which SageMaker AI automatically selects and provisions compute resources based on the specific model architecture and dataset size.
* Supported models include a broad range of high-performance options such as Amazon Nova, DeepSeek, GPT-OSS, Meta Llama, and Qwen.
* The feature is accessible directly through the Amazon SageMaker Studio interface, allowing users to manage their entire model catalog in one location.

**Advanced Customization and Reinforcement Learning**

* Users can choose from several fine-tuning techniques, including traditional Supervised Fine-Tuning (SFT) and more advanced methods.
* The platform supports modern optimization techniques such as Direct Preference Optimization (DPO), Reinforcement Learning from Verifiable Rewards (RLVR), and Reinforcement Learning from AI Feedback (RLAIF).
* To simplify the process, SageMaker AI provides recommended defaults for hyperparameters like batch size, learning rate, and epochs based on the selected tuning technique.

**Experiment Tracking and Security**

* The workflow introduces a serverless MLflow application, enabling seamless experiment tracking and performance monitoring without additional setup.
* Advanced configuration options allow fine-grained control over network encryption and storage volume encryption to ensure data security.
* The "Continue customization" feature supports iterative tuning, where users can adjust hyperparameters or apply different techniques to an existing customized model.

**Evaluation and Deployment Flexibility**

* Built-in evaluation tools allow developers to compare the performance of their customized models against the original base models to verify improvements.
* Once a model is finalized, it can be deployed with a few clicks to either Amazon SageMaker or Amazon Bedrock.
* A centralized "My Models" dashboard tracks all custom iterations, providing detailed logs and status updates for every training and evaluation job.

This serverless approach is well suited to teams that need to adapt large language models to specific domains quickly without the operational overhead of managing GPU clusters. By using the integrated evaluation and multi-platform deployment options, organizations can move from experimentation to production-ready AI more efficiently.
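
To illustrate the kind of data a DPO job consumes, here is a minimal sketch that writes preference pairs as JSONL. The exact schema SageMaker AI expects may differ, so treat the field names (`prompt`, `chosen`, `rejected`) and the file name as assumptions for illustration only.

```python
import json

# Illustrative DPO preference pairs: each record pairs a prompt with a
# preferred ("chosen") and a dispreferred ("rejected") completion, which
# is the signal DPO optimizes against instead of absolute labels.
pairs = [
    {
        "prompt": "Summarize our refund policy in one sentence.",
        "chosen": "Refunds are issued within 14 days of purchase with a valid receipt.",
        "rejected": "We have a refund policy. It is good.",
    },
]

with open("dpo_train.jsonl", "w") as f:
    for record in pairs:
        f.write(json.dumps(record) + "\n")
```

A dataset like this would then be uploaded through the SageMaker Studio customization workflow, with the recommended hyperparameter defaults as a starting point.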

Amazon Bedrock AgentCore adds quality evaluations and policy controls for deploying trusted AI agents

AWS has introduced several new capabilities to Amazon Bedrock AgentCore designed to remove the trust and quality barriers that often prevent AI agents from moving into production environments. These updates, which include granular policy controls and sophisticated evaluation tools, allow developers to implement strict operational boundaries and monitor real-world performance at scale. By balancing agent autonomy with centralized verification, AgentCore provides a secure framework for deploying highly capable agents across enterprise workflows.

**Governance through Policy in AgentCore**

* This feature establishes clear boundaries for agent actions by intercepting tool calls via the AgentCore Gateway before they are executed (a minimal sketch of the pattern appears below).
* By operating outside of the agent's internal reasoning loop, the policy layer acts as an independent verification system that treats the agent as an autonomous actor requiring permission.
* Developers can define fine-grained permissions to ensure agents do not access sensitive data inappropriately or take unauthorized actions in external systems.

**Quality Monitoring with AgentCore Evaluations**

* The new evaluation framework allows teams to monitor the quality of AI agents based on actual behavior rather than theoretical simulations.
* Built-in evaluators provide standardized metrics for critical dimensions such as helpfulness and correctness.
* Organizations can also implement custom evaluators to ensure agents meet specific business-logic requirements and industry-specific compliance standards.

**Enhanced Memory and Communication Features**

* New episodic functionality in AgentCore Memory introduces a long-term strategy that allows agents to learn from past experiences and apply successful solutions to similar future tasks.
* Bidirectional streaming in the AgentCore Runtime supports the deployment of advanced voice agents capable of handling natural, simultaneous conversation flows.
* These enhancements focus on improving consistency and user experience, enabling agents to handle complex, multi-turn interactions with higher reliability.

**Real-World Application and Performance**

* The AgentCore SDK has seen rapid adoption with over 2 million downloads, supporting diverse use cases from content generation at the PGA TOUR to financial data analysis at Workday.
* Case studies highlight significant operational gains, such as a 1,000 percent increase in content writing speed and a 50 percent reduction in problem resolution time through improved observability.
* The platform emphasizes 100 percent traceability of agent decisions, which is critical for organizations transitioning from reactive to proactive AI-driven operations.

To successfully scale AI agents, organizations should transition from simple prompt engineering to a robust agentic architecture. Leveraging these new policy and evaluation tools will allow development teams to maintain the control and visibility required for customer-facing and mission-critical deployments.
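
The policy layer described above sits between the agent and its tools. The following sketch shows the general pattern of checking a tool call against an allow-list before execution; the class and method names are illustrative assumptions, not the AgentCore Gateway API.

```python
from typing import Any, Callable

class ToolPolicyGateway:
    """Illustrative gateway that verifies tool calls against a policy
    before executing them, outside the agent's reasoning loop."""

    def __init__(self, allowed_tools: set[str]) -> None:
        self.allowed_tools = allowed_tools
        self.tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def invoke(self, name: str, **kwargs: Any) -> Any:
        # Independent verification: the agent is treated as an autonomous
        # actor that must be granted permission on every call.
        if name not in self.allowed_tools:
            raise PermissionError(f"Tool '{name}' denied by policy")
        return self.tools[name](**kwargs)

gateway = ToolPolicyGateway(allowed_tools={"lookup_order"})
gateway.register("lookup_order", lambda order_id: {"id": order_id, "status": "shipped"})
gateway.register("delete_account", lambda user_id: "deleted")

print(gateway.invoke("lookup_order", order_id="123"))  # allowed
# gateway.invoke("delete_account", user_id="42")       # raises PermissionError
```

Because the check happens at the gateway rather than in the agent's prompt, a misbehaving or jailbroken agent still cannot reach tools the policy has not granted.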

Amazon OpenSearch Service improves vector database performance and cost with GPU acceleration and auto-optimization

Amazon OpenSearch Service has introduced serverless GPU acceleration and auto-optimization features designed to enhance the performance and cost-efficiency of large-scale vector databases. These updates allow users to build vector indexes up to ten times faster at a quarter of the traditional indexing cost, enabling the creation of billion-scale databases in under an hour. By automating complex tuning processes, OpenSearch Service simplifies the deployment of generative AI and high-speed search applications.

### GPU Acceleration for Rapid Indexing

The new serverless GPU acceleration streamlines the creation of vector data structures by offloading intensive workloads to specialized hardware.

* **Performance Gains:** Indexing speed is increased by 10x compared to non-GPU configurations, significantly reducing the time-to-market for data-heavy applications.
* **Cost Efficiency:** Indexing costs are reduced to approximately 25% of standard costs, and users pay only for active processing through OpenSearch Compute Units (OCUs) rather than idle instance time.
* **Serverless Management:** There is no need to provision or manage GPU instances manually; OpenSearch Service automatically detects acceleration opportunities and isolates workloads within the user's Amazon VPC.
* **Operational Scope:** Acceleration is automatically applied to both initial indexing and subsequent force-merge operations.

### Automated Vector Index Optimization

Auto-optimization removes the requirement for deep vector expertise by automatically balancing competing performance metrics.

* **Simplified Tuning:** The system replaces manual index tuning, which can traditionally take weeks, with automated configurations.
* **Resource Balancing:** The tool finds the optimal trade-off between search latency, search quality (recall), and memory requirements.
* **Improved Accuracy:** Users can achieve higher recall and better cost savings than with default, unoptimized index configurations.

### Configuration and Integration

These features can be enabled on new or existing OpenSearch Service domains and Serverless collections through the AWS Console or CLI.

* **CLI Activation:** Users can enable acceleration on existing domains using the `update-domain-config` command with the `--aiml-options` flag set to enable `ServerlessVectorAcceleration`.
* **Index Settings:** To leverage GPU processing, users must create a vector index with specific settings, notably setting `index.knn.remote_index_build.enabled` to `true` (see the sketch below).
* **Supported Workloads:** The service supports standard OpenSearch operations, including the Bulk API for adding vector data and text embeddings.

For organizations managing large-scale vector workloads for Retrieval-Augmented Generation (RAG) or semantic search, enabling GPU acceleration is a highly recommended step to reduce operational overhead. Developers should transition existing indexes to include the `remote_index_build` setting to take immediate advantage of the improved speed and reduced OCU pricing.
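
The index setting named above is applied at index creation time. Here is a minimal sketch using the opensearch-py client; the endpoint, credentials, index name, and embedding dimension are placeholder assumptions, and only the `index.knn.remote_index_build.enabled` setting comes from the announcement itself.

```python
from opensearchpy import OpenSearch

# Placeholder endpoint and credentials; substitute your domain's values.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

# Create a k-NN vector index with GPU-backed remote index builds enabled,
# per the index.knn.remote_index_build.enabled setting described above.
client.indices.create(
    index="my-vectors",
    body={
        "settings": {
            "index.knn": True,
            "index.knn.remote_index_build.enabled": True,
        },
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 768}
            }
        },
    },
)
```

Documents added afterward through the standard Bulk API are then indexed on GPU capacity billed through OCUs rather than on provisioned instances.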

Amazon Bedrock adds 18 fully managed open weight models, including the new Mistral Large 3 and Ministral 3 models

Amazon Bedrock has significantly expanded its generative AI offerings by adding 18 new fully managed open-weight models from providers including Google, Mistral AI, NVIDIA, and OpenAI. The update brings the platform's total to nearly 100 serverless models, allowing developers to leverage a broad spectrum of specialized capabilities through a single, unified API. By providing access to these high-performing models without requiring infrastructure changes, AWS enables organizations to rapidly evaluate and deploy the most cost-effective and capable tools for their specific workloads.

### Specialized Mistral AI Releases

The launch features four new models from Mistral AI, headlined by Mistral Large 3 and the edge-optimized Ministral series.

* **Mistral Large 3:** Optimized for long-context tasks, multimodal reasoning, and instruction reliability, making it suitable for complex coding assistance and multilingual enterprise knowledge work.
* **Ministral 3 (3B, 8B, and 14B):** These models are specifically designed for edge-optimized deployments on a single GPU.
* **Use Cases:** While the 3B model excels at real-time translation and data extraction on low-resource devices, the 14B version is built for advanced local agentic workflows where privacy and hardware constraints are primary concerns.

### Broadened Model Provider Portfolio

Beyond the Mistral updates, AWS has integrated several other open-weight options to address diverse industry requirements, ranging from mobile applications to global scaling.

* **Google Gemma 3 4B:** An efficient multimodal model designed to run locally on laptops, supporting on-device AI and multilingual processing.
* **Global Provider Support:** The expansion includes models from MiniMax AI, Moonshot AI, NVIDIA, OpenAI, and Qwen, ensuring a competitive variety of reasoning and processing capabilities.
* **Multimodal Capabilities:** Many of the new additions support vision-based tasks, such as image captioning and document understanding, alongside traditional text-based functions.

### Streamlined AI Development and Integration

The primary technical advantage of this update is the ability to swap between diverse models using the Amazon Bedrock unified API (a minimal example follows this summary).

* **Infrastructure Consistency:** Developers can switch to newer, more efficient models without rewriting application code or managing underlying servers.
* **Evaluation and Deployment:** The serverless architecture allows immediate testing of different model sizes (such as moving from 3B to 14B) to find the optimal balance between performance and latency.
* **Enterprise Tooling:** These models integrate with existing Bedrock features, allowing for simplified agentic workflows and tool-use implementations.

To take full advantage of these updates, developers should use the Bedrock console to experiment with the new Mistral and Gemma models for edge and multimodal use cases. The unified API structure makes it practical to run A/B tests between these open-weight models and established industry favorites to optimize for specific cost and performance targets.
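
Because the unified API decouples application code from the model, switching models is a one-line change. Here is a minimal sketch using boto3's Converse API; the two model IDs are hypothetical placeholders for illustration, not confirmed Bedrock identifiers, so check the Bedrock console for the real ones.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    """Send the same prompt to any Bedrock model via the unified Converse API."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = "Extract the invoice total from: 'Total due: $1,240.50'"

# Hypothetical model IDs for an A/B comparison across open-weight models.
for model_id in ("mistral.ministral-3-8b-v1:0", "google.gemma-3-4b-v1:0"):
    print(model_id, "->", ask(model_id, prompt))
```

The same loop extends naturally to larger comparisons, which is what makes A/B testing across the nearly 100 serverless models practical without any infrastructure changes.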