AWS announces Parallel Computing Service (AWS PCS)

On Aug 28, 2024, AWS announced Parallel Computing Service (AWS PCS), which lets customers who build scientific and engineering models quickly and easily set up and manage high-performance computing (HPC) infrastructure to accelerate R&D at scale. AWS PCS is designed for a wide range of traditional and emerging compute- or data-intensive engineering and scientific workloads in areas such as computational fluid dynamics, weather modeling, finite element analysis, electronic design automation, and reservoir simulation, and it lets users prepare, execute, and analyze simulations and computations in familiar ways.

Marvel Fusion, Maxar, RONIN, and The National Renewable Energy Laboratory were among the first customers and partners to use AWS Parallel Computing Service

SEATTLE–(BUSINESS WIRE)– Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), announced the general availability of AWS Parallel Computing Service. This new managed service supports customers in easily setting up and managing high-performance computing (HPC) clusters so they can run scientific and engineering workloads at virtually any scale on AWS. The service makes it easy for system administrators to build clusters using Amazon Elastic Compute Cloud (Amazon EC2) instances, low-latency networking, and storage optimized for HPC workloads. With AWS Parallel Computing Service, scientists and engineers can swiftly scale simulations to validate models and designs. At the same time, system administrators and integrators can build and maintain HPC clusters on AWS using Slurm, the most widespread open-source HPC workload manager. This service accelerates innovation in areas such as fast-tracking drug discovery, uncovering genomic insights, building engineering designs, running weather applications, and building scientific and engineering models.

“Managing HPC workloads, particularly the most complex and challenging extreme-scale workloads, is extraordinarily difficult. Our aim is that every scientist and engineer using AWS Parallel Computing Service, regardless of organization size, is the most productive person in their field because they have the same top-tier HPC capabilities as large enterprises to solve the world’s toughest challenges any time they need to and at any scale.”

Ian Colle, director of advanced compute and simulation at AWS

AWS has a history of innovation in supporting HPC workloads, including releases like the open source cluster orchestration toolkit AWS ParallelCluster, fully managed batch computing service AWS Batch, low latency network interconnect Elastic Fabric Adapter, Amazon FSx for Lustre high-performance storage, and dedicated AMD, Intel, and Graviton-based HPC compute instances, the latter delivering up to 65% better price-performance over comparable compute optimized x86-based instances.

In November 2018, AWS introduced AWS ParallelCluster, an AWS-supported open-source cluster management tool that supports the deployment and management of HPC clusters in the AWS Cloud. With AWS ParallelCluster, customers can quickly build and deploy proof-of-concept and production HPC compute environments. They can use the AWS ParallelCluster command line interface (CLI), API, Python library, and a user interface installed from open-source packages. With ParallelCluster, customers themselves are responsible for updates, which can include tearing down and redeploying clusters.

A large number of customers from a wide range of industries have migrated their HPC workloads to AWS to fast-track drug discovery, uncover genomic insights, maximize energy resources, and spin up supercomputers with millions of cores. Building on this adoption, AWS continues to innovate in HPC by releasing a fully managed, comprehensive HPC service that eliminates the undifferentiated heavy lifting of creating and managing HPC clusters.

AWS PCS simplifies HPC environments by having AWS manage them, and it is accessible through the AWS Management Console, AWS SDK, and AWS Command Line Interface (AWS CLI). System administrators can create managed Slurm clusters that use their own compute and storage configurations, identity, and job allocation preferences. AWS PCS uses Slurm, a highly scalable, fault-tolerant job scheduler used across a wide range of HPC customers, for scheduling and orchestrating simulations. End users such as scientists, researchers, and engineers can log in to AWS PCS clusters to run and manage HPC jobs, use interactive software on virtual desktops, and access data. They can bring their workloads to AWS PCS quickly, without significant effort to port code.
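Because AWS PCS clusters run Slurm, end users submit work with ordinary Slurm batch scripts. The following is a minimal sketch, not an official AWS example, of how such a script could be generated in Python. The `#SBATCH` directives are standard Slurm options; the helper name `build_sbatch_script`, the job name, the partition name `demo`, and the `./solver` command are hypothetical placeholders.

```python
def build_sbatch_script(job_name, nodes, ntasks_per_node, time_limit,
                        command, partition=None):
    """Return the text of a simple Slurm batch script.

    All arguments are illustrative; adjust them to the cluster's
    actual queues (Slurm partitions) and the application being run.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --time={time_limit}",
    ]
    if partition is not None:
        # Only emit a partition directive when one is requested.
        lines.append(f"#SBATCH --partition={partition}")
    lines.append("")
    # srun launches the command across the allocated tasks.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sbatch_script("cfd-demo", 2, 4, "00:30:00",
                              "./solver input.dat", partition="demo"))
```

A user would typically save the generated text to a file on a login node and submit it with `sbatch`, then monitor it with `squeue`.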



Getting started with AWS Parallel Computing Service

To try out AWS PCS, you can follow the tutorial for creating a simple cluster in the AWS documentation. First, you create a virtual private cloud (VPC) with an AWS CloudFormation template, along with shared storage in Amazon Elastic File System (Amazon EFS), within your account and in the AWS Region where you will try AWS PCS. To learn more, visit Create a VPC and Create shared storage in the AWS documentation.
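Once the VPC exists, a cluster can also be created through the AWS SDK. The sketch below assembles a request for the `create_cluster` operation of the boto3 `pcs` client. The field names reflect one reading of that API and should be checked against the current boto3 reference before use; the helper names, cluster name, subnet ID, and security group ID are hypothetical placeholders. The Slurm version default comes from the launch announcement.

```python
def build_create_cluster_request(cluster_name, subnet_id, security_group_id,
                                 slurm_version="23.11", size="SMALL"):
    """Assemble keyword arguments for a PCS create_cluster call.

    Assumption: the field names (clusterName, scheduler, size,
    networking) match the boto3 'pcs' client; verify before relying
    on them.
    """
    return {
        "clusterName": cluster_name,
        "scheduler": {"type": "SLURM", "version": slurm_version},
        "size": size,
        "networking": {
            "subnetIds": [subnet_id],
            "securityGroupIds": [security_group_id],
        },
    }


def create_cluster(request, region_name):
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("pcs", region_name=region_name)
    return client.create_cluster(**request)


if __name__ == "__main__":
    # Placeholder IDs; replace with values from your own VPC.
    print(build_create_cluster_request("demo-cluster",
                                       "subnet-0123456789abcdef0",
                                       "sg-0123456789abcdef0"))
```

Keeping the request builder separate from the network call makes the payload easy to inspect or unit test before any billable resources are created.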

Things to know

Here are a few things that you should know about this service:

Slurm versions – AWS PCS initially supports Slurm 23.11 and offers mechanisms designed to help customers upgrade their Slurm major versions once new versions are added. Moreover, AWS PCS is designed to automatically update the Slurm controller with patch versions. To learn more, visit Slurm versions in the AWS documentation.
Capacity Reservations – You can reserve EC2 capacity in a particular Availability Zone and for a specific duration using On-Demand Capacity Reservations to ensure that you have the necessary compute capacity available when you need it. To learn more, visit Capacity Reservations in the AWS documentation.
Network file systems – You can attach network storage volumes where data and files can be written and accessed, including Amazon FSx for NetApp ONTAP, Amazon FSx for OpenZFS, and Amazon File Cache as well as Amazon EFS and Amazon FSx for Lustre. You can also use self-managed volumes, such as NFS servers. To learn more, visit Network file systems in the AWS documentation.
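As an illustration of the Capacity Reservations item above, the following sketch builds the arguments for the EC2 `create_capacity_reservation` API for a fixed duration in one Availability Zone. The helper name, instance type, Availability Zone, and instance count are illustrative values chosen here, not recommendations from AWS.

```python
from datetime import datetime, timedelta, timezone


def build_capacity_reservation_request(instance_type, availability_zone,
                                       instance_count, duration_hours,
                                       now=None):
    """Build keyword arguments for EC2 create_capacity_reservation.

    The reservation ends duration_hours after 'now' (UTC by default).
    """
    start = now or datetime.now(timezone.utc)
    return {
        "InstanceType": instance_type,
        "InstancePlatform": "Linux/UNIX",
        "AvailabilityZone": availability_zone,
        "InstanceCount": instance_count,
        # "limited" means the reservation expires at EndDate.
        "EndDateType": "limited",
        "EndDate": start + timedelta(hours=duration_hours),
    }


if __name__ == "__main__":
    # Illustrative values only.
    request = build_capacity_reservation_request(
        "hpc6a.48xlarge", "us-east-2b", 4, duration_hours=8)
    print(request)
    # To submit (requires boto3 and credentials; this reserves
    # billable capacity):
    #   import boto3
    #   boto3.client("ec2").create_capacity_reservation(**request)
```

How a reservation is then associated with an AWS PCS compute node group is described in the Capacity Reservations page of the AWS documentation.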

Availability

AWS Parallel Computing Service is now available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) Regions.

AWS PCS launches all resources in the user’s AWS account. You will be billed accordingly for those resources. For more information, see the AWS PCS Pricing page.