Generative AI
Advanced
English
Amazon SageMaker helps data scientists prepare, build, train, deploy, and monitor machine learning (ML) models. SageMaker brings together a broad set of capabilities, including access to distributed training libraries, open source models, and foundation models (FMs).
This course introduces experienced data scientists and ML engineers to the challenges of building small to large language models (LLMs). You will learn about the different storage and ingestion options available to process large amounts of text. The course also discusses the use of distributed training libraries and Amazon SageMaker HyperPod to reduce training time by splitting workloads across more than a thousand AI accelerators. You will learn about the different approaches to aligning LLMs with human feedback and how to perform that alignment using SageMaker. The course also discusses the challenges of deploying LLMs and the optimizations available to improve inference performance.
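As a taste of the tooling covered later in the course, a multi-machine training job can be launched from the SageMaker Python SDK with the SageMaker distributed data parallel library enabled. The sketch below is illustrative only; the training script, role ARN, S3 URI, and framework versions are placeholders, not course materials.

```python
# A minimal sketch of a distributed training job with the SageMaker Python SDK.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                # hypothetical training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",   # placeholder IAM role
    instance_type="ml.p4d.24xlarge",
    instance_count=2,                                      # scale out across machines
    framework_version="2.0",                               # illustrative version
    py_version="py310",
    # Enable the SageMaker distributed data parallel library
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit({"train": "s3://amzn-s3-demo-bucket/train/"})  # placeholder S3 URI
```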
Activities
This course includes text instruction, illustrative graphics, knowledge check questions, and video demonstrations of labs you can run in your own Amazon Web Services (AWS) account.
Note: The files used in the video demonstrations in this course are available in the Building Language Models on AWS Git repository.
Course objectives
After completing this course, data scientists can confidently build, train, and tune performant language models on AWS using SageMaker.
In this course, you will learn to do the following:
Intended audience
Prerequisites
Course outline
Module 1: Course Series Introduction
Large Language Model Basics
Course Series Outline
Module 2: Addressing the Challenges of Building Language Models
Common Challenges
Multi-Machine Training Solutions
Performance Optimization Solutions
Wrap Up
Module 3: Using Amazon SageMaker for Training Language Models
Configuring SageMaker Studio
SageMaker Infrastructure
Working with the SageMaker Python SDK (sketched below)
Wrap Up
Demonstration 1: Setting up Amazon SageMaker Studio
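The SageMaker Python SDK lesson in this module can be previewed with a short sketch. The session, execution role, and default bucket calls below are standard SDK entry points; the setup assumes the code runs inside SageMaker Studio.

```python
# A minimal sketch of typical SageMaker Python SDK setup inside SageMaker Studio.
import sagemaker

session = sagemaker.Session()           # wraps boto3 and tracks the active region
role = sagemaker.get_execution_role()   # IAM role attached to the Studio user profile
bucket = session.default_bucket()       # default S3 bucket for job artifacts

print(f"Region: {session.boto_region_name}, bucket: {bucket}")
```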
Module 4: Ingesting Language Model Data
Preparing Data
Analyzing Data Ingestion Options (sketched below)
Wrap Up
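As an illustration of the ingestion options this module compares, the SageMaker Python SDK lets you choose an input mode per data channel. The S3 prefix below is a placeholder.

```python
# A sketch of selecting a data ingestion option with the SageMaker Python SDK.
# FastFile mode streams objects from S3 on demand instead of copying them to
# the training instance up front (File mode) or using Pipe-mode streaming.
from sagemaker.inputs import TrainingInput

train_input = TrainingInput(
    s3_data="s3://amzn-s3-demo-bucket/corpus/",  # placeholder S3 prefix
    input_mode="FastFile",
)
# Passed to an estimator as: estimator.fit({"train": train_input})
```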
Module 5: Training Large Language Models
Creating a SageMaker Training Job
Optimizing Your SageMaker Training Job
Using Distributed Training on SageMaker
Wrap Up
Demonstration 2: Training Your First Language Model with Amazon SageMaker
Demonstration 3: Launching a SageMaker Training Job Using the @remote Decorator (sketched below)
Demonstration 4: Distributed Training with SageMaker Model Parallel
Demonstration 5: Fine-Tune and Deploy Your Own LLM Using AWS Custom Silicon
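Demonstration 3 uses the SageMaker Python SDK's @remote decorator, which runs an annotated Python function as a SageMaker training job instead of locally. A minimal sketch follows; the instance type and function are placeholders for illustration.

```python
# A minimal sketch of the @remote decorator from the SageMaker Python SDK.
from sagemaker.remote_function import remote

@remote(instance_type="ml.m5.xlarge")  # placeholder instance type
def tokenize_and_count(texts):
    # Executes remotely as a training job; the SDK packages dependencies for you.
    return sum(len(t.split()) for t in texts)

# Calling the function launches the remote job and returns its result.
print(tokenize_and_count(["hello world", "building language models on AWS"]))
```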
Module 6: Aligning Language Models to Human Feedback
Aligning Language Models
Wrap Up
Demonstration 6: End-to-End Preference Alignment with RLHF and Multi-Adapter PPO
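Demonstration 6 works with RLHF-style PPO, at the core of which is a KL-penalized reward that keeps the policy close to a frozen reference model. The PyTorch sketch below illustrates that computation only; it is not the course's training code, and the tensors are toy values.

```python
# A sketch of the KL-penalized reward used in RLHF-style PPO.
# reward:        scalar preference score per sequence (e.g., from a reward model)
# logprobs:      per-token log-probs of the sampled response under the policy
# ref_logprobs:  per-token log-probs under the frozen reference model
import torch

def kl_penalized_reward(reward, logprobs, ref_logprobs, beta=0.1):
    kl = (logprobs - ref_logprobs).sum(dim=-1)   # sequence-level KL estimate
    return reward - beta * kl                    # penalizes drift from the reference

reward = torch.tensor([1.2])
logprobs = torch.log(torch.tensor([[0.5, 0.4, 0.3]]))
ref_logprobs = torch.log(torch.tensor([[0.5, 0.5, 0.4]]))
print(kl_penalized_reward(reward, logprobs, ref_logprobs))
```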
Module 7: Building Language Models with SageMaker HyperPod
Amazon SageMaker HyperPod (sketched below)
Wrap Up
Demonstration 7: Building Your Own Language Models with SageMaker HyperPod
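SageMaker HyperPod clusters are created through the SageMaker control plane. The boto3 sketch below is a hedged illustration; the cluster name, instance group, lifecycle script location, and role ARN are all placeholders.

```python
# A sketch of creating a SageMaker HyperPod cluster with boto3.
# All names, counts, and ARNs are placeholders for illustration.
import boto3

sm = boto3.client("sagemaker")
sm.create_cluster(
    ClusterName="llm-training-cluster",
    InstanceGroups=[
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.p4d.24xlarge",
            "InstanceCount": 8,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://amzn-s3-demo-bucket/lifecycle/",  # placeholder
                "OnCreate": "on_create.sh",                            # placeholder script
            },
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodRole",  # placeholder
        }
    ],
)
```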
Module 8: Deploying Language Models
Deploying a Model in SageMaker
Deploying Models for Inference
Deploying Large Language Models for Inference
Additional Considerations
Wrap Up
Demonstration 8: Deploy Mixtral-8x7B on Amazon SageMaker Using the LMI Container
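Demonstration 8 deploys with the Large Model Inference (LMI) container. The sketch below is a hedged illustration using the generic sagemaker.Model class; the image URI, role, and instance type are placeholders, and the OPTION_* environment variables mirror LMI serving properties.

```python
# A sketch of deploying an LLM behind a SageMaker endpoint with an LMI container.
import sagemaker
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

model = sagemaker.Model(
    image_uri="<lmi-container-image-uri>",                  # placeholder LMI image URI
    role="arn:aws:iam::111122223333:role/SageMakerRole",    # placeholder IAM role
    env={
        "HF_MODEL_ID": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "OPTION_TENSOR_PARALLEL_DEGREE": "4",               # shard the model across 4 GPUs
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",                         # placeholder multi-GPU instance
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
print(predictor.predict({"inputs": "What is Amazon SageMaker?"}))
```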
Module 9: Call to Action and Additional Resources
Review