ai-inference

aws

Announcing Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs | AWS News Blog

Amazon has announced the general availability of EC2 G7e instances, a new hardware tier powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs designed for generative AI and high-end graphics. These instances deliver up to 2.3 times the inference performance of their G6e predecessors while providing significant upgrades to memory and bandwidth. This launch aims to provide a cost-effective solution for running medium-sized AI models and complex spatial computing workloads at scale.

**Blackwell GPU and Memory Advancements**

* The G7e instances feature NVIDIA RTX PRO 6000 Blackwell GPUs, which provide twice the memory and 1.85 times the memory bandwidth of the G6e generation.
* Each GPU provides 96 GB of memory, allowing users to run medium-sized models (such as those with up to 70 billion parameters) on a single GPU using FP8 precision.
* The architecture is optimized for both spatial computing and scientific workloads, offering the highest graphics performance currently available in the EC2 portfolio.

**High-Speed Connectivity and Multi-GPU Scaling**

* To support large-scale models, G7e instances utilize NVIDIA GPUDirect P2P, enabling direct communication between GPUs over PCIe interconnects with minimal latency.
* These instances offer four times the inter-GPU bandwidth compared to the L40S GPUs found in G6e instances, facilitating more efficient data transfer in multi-GPU configurations.
* Total GPU memory can scale up to 768 GB within a single node, supporting massive inference tasks across eight interconnected GPUs.

**Networking and Storage Performance**

* G7e instances provide up to 1,600 Gbps of network bandwidth, a four-fold increase over previous generations, making them suitable for small-scale multi-node clusters.
* Support for NVIDIA GPUDirect Remote Direct Memory Access (RDMA) via Elastic Fabric Adapter (EFA) reduces latency for remote GPU-to-GPU communication.
* The instances support GPUDirect Storage with Amazon FSx for Lustre, achieving throughput speeds up to 1.2 Tbps to ensure rapid model loading and data processing.

**System Specifications and Configurations**

* Under the hood, G7e instances are powered by Intel Emerald Rapids processors and support up to 192 vCPUs and 2,048 GiB of system memory.
* Local storage options include up to 15.2 TB of NVMe SSD capacity to handle high-speed data caching and local processing.
* The instance family ranges from the g7e.2xlarge (1 GPU, 8 vCPUs) to the g7e.48xlarge (8 GPUs, 192 vCPUs).

For developers ready to transition to Blackwell-based architecture, these instances are accessible through AWS Deep Learning AMIs (DLAMI). They represent a major step forward for organizations needing to balance the high memory requirements of modern LLMs with the cost efficiencies of the G-series instance family.
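
The single-GPU claim in this summary (a 70-billion-parameter model at FP8 within 96 GB) and the 768 GB per-node total both follow from simple arithmetic. The sketch below is a rough check rather than a sizing tool: it counts model weights only, and the function names are hypothetical helpers introduced for illustration. KV cache, activations, and framework overhead are deliberately ignored, so real headroom will be smaller.

```python
# Back-of-the-envelope check of the memory figures quoted in the summary.
# ASSUMPTION: weights only. KV cache, activations, and runtime overhead are
# NOT counted, so actual usable headroom is smaller than shown here.

GPU_MEMORY_GB = 96   # per-GPU memory stated in the post
GPUS_PER_NODE = 8    # largest configuration (g7e.48xlarge)

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes-per-GB
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billions: float, precision: str) -> bool:
    """True if the weights alone fit in a single GPU's memory."""
    return weight_footprint_gb(params_billions, precision) <= GPU_MEMORY_GB


def node_memory_gb() -> int:
    """Total GPU memory across all GPUs in the largest instance."""
    return GPU_MEMORY_GB * GPUS_PER_NODE


if __name__ == "__main__":
    for precision in ("fp8", "fp16"):
        size = weight_footprint_gb(70, precision)
        fits = fits_on_one_gpu(70, precision)
        print(f"70B @ {precision}: {size:.0f} GB, fits on one GPU: {fits}")
    print(f"Total GPU memory per node: {node_memory_gb()} GB")
```

At FP8 the weights of a 70B model occupy roughly 70 GB, which leaves some room within 96 GB; at FP16 the same model needs about 140 GB and would have to be split across GPUs.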

aws

Amazon EC2 X8i instances powered by custom Intel Xeon 6 processors are generally available for memory-intensive workloads | AWS News Blog

Amazon has announced the general availability of EC2 X8i instances, specifically engineered for memory-intensive workloads such as SAP HANA, large-scale databases, and data analytics. Powered by custom Intel Xeon 6 processors with a 3.9 GHz all-core turbo frequency, these instances provide a significant performance leap over the previous X2i generation. By offering up to 6 TB of memory and substantial improvements in throughput, X8i instances represent the highest-performing Intel-based memory-optimized option in the AWS cloud.

### Performance Enhancements and Processor Architecture

* **Custom Silicon:** The instances utilize custom Intel Xeon 6 processors available exclusively on AWS, delivering the fastest memory bandwidth among comparable Intel cloud processors.
* **Memory and Bandwidth:** X8i provides 1.5 times more memory capacity (up to 6 TB) and 3.4 times more memory bandwidth compared to previous-generation X2i instances.
* **Workload Benchmarks:** Real-world performance gains include a 50% increase in SAP Application Performance Standard (SAPS), 47% faster PostgreSQL performance, 88% faster Memcached performance, and a 46% boost in AI inference.

### Scalable Instance Sizes and Throughput

* **Flexible Sizing:** The instances are available in 14 sizes, including new larger formats such as the 48xlarge, 64xlarge, and 96xlarge.
* **Bare Metal Options:** Two bare metal sizes (metal-48xl and metal-96xl) are available for workloads requiring direct access to physical hardware resources.
* **Networking and Storage:** The architecture supports up to 100 Gbps of network bandwidth with Elastic Fabric Adapter (EFA) support and up to 80 Gbps of Amazon EBS throughput.
* **Bandwidth Control:** Support for Instance Bandwidth Configuration (IBC) allows users to customize the allocation of performance between networking and EBS to suit specific application needs.

### Cost Efficiency and Use Cases

* **Licensing Optimization:** In preview testing, customers like Orion reduced SQL Server licensing costs by 50% by maintaining performance thresholds with fewer active cores compared to older instance types.
* **Enterprise Applications:** The instances are SAP-certified, making them ideal for RISE with SAP and other high-demand ERP environments.
* **Broad Utility:** Beyond databases, the instances are optimized for Electronic Design Automation (EDA) and complex data analytics that require massive memory footprints.

For organizations managing massive datasets or expensive licensed database software, migrating to X8i instances offers a clear path to both performance optimization and infrastructure cost reduction. These instances are currently available in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) regions through On-Demand, Spot, and Reserved purchasing models.
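
The licensing saving mentioned in this summary rests on a simple relationship: when software is licensed per core, the bill scales with the number of licensed cores, so reaching the same performance with half the cores halves the cost. The sketch below illustrates that arithmetic under a plain per-core model. The core counts and the price are hypothetical placeholders, not figures from the post or from any price list, and real licensing terms (minimum core counts, license packs, edition differences) are not modeled.

```python
# Illustrative per-core licensing arithmetic.
# ASSUMPTION: cost is strictly proportional to licensed cores. The core
# counts and price below are HYPOTHETICAL and chosen only for illustration.


def license_cost(licensed_cores: int, price_per_core: float) -> float:
    """Cost under a simple per-core licensing model."""
    return licensed_cores * price_per_core


def savings_fraction(old_cores: int, new_cores: int) -> float:
    """Fraction of licensing cost saved when the licensed core count drops."""
    return 1 - new_cores / old_cores


if __name__ == "__main__":
    price = 1000.0          # hypothetical price per licensed core
    old_cores = 64          # hypothetical core count on an older instance
    new_cores = 32          # hypothetical: same performance, half the cores
    print("before:", license_cost(old_cores, price))
    print("after: ", license_cost(new_cores, price))
    print(f"saved:  {savings_fraction(old_cores, new_cores):.0%}")
```

Under this model the saving depends only on the ratio of core counts, not on the price itself, which is why faster cores can translate directly into lower licensing spend.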