AI Infrastructure
Overview
This course provides a comprehensive overview of the
infrastructure and technologies required to build, deploy, and
manage AI systems, with a focus on Large Language Models (LLMs).
Students will gain a deep understanding of the AI workflow, from
data acquisition and model training to deployment and monitoring.
The course covers essential aspects of AI infrastructure, including
machine learning pipelines, generative AI techniques, LLM
infrastructure components, and LLM operations.
Instructor: Ioannis Papapanagiotou, PhD
Course Objectives
Upon successful completion of this course, students will be able to:
- Explain AI infrastructure components, understand AI workflows, and architect an AI system (C1).
- Demonstrate how to run AI systems in production with common frameworks based on MLOps and AIOps (C2).
- Build applications that leverage Generative AI and Large Language Models (LLMs) (C3).
- Identify which model to use and how to deploy it for a given use case, including small LLMs, multimodal capabilities, and a variety of Large Language Models (C4).
Key Topics
The course is structured around the following key topics:
- AI Infrastructure Fundamentals: Covering the core components of AI infrastructure, AI workflows, AI compute, AI application frameworks, and cloud vs. on-premises AI infrastructure. Students will learn to define AI infrastructure components, explain AI workflows, and architect AI systems.
- ML Infrastructure: Focusing on the components of ML infrastructure, including ML pipelines, model building, data challenges, MLOps, ML feature stores, and ML model stores. Students will learn to explain ML infrastructure components and implement ML pipelines and MLOps practices.
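The ML pipeline pattern mentioned above can be sketched as chained preprocessing and model steps behind a single fit/predict interface. The class and step names here are illustrative stand-ins, not from any specific framework:

```python
class Scaler:
    """Min-max scales a list of numeric features to [0, 1]."""
    def fit(self, xs):
        self.lo, self.hi = min(xs), max(xs)
        return self

    def transform(self, xs):
        span = (self.hi - self.lo) or 1.0
        return [(x - self.lo) / span for x in xs]


class ThresholdModel:
    """Predicts 1 when a scaled feature exceeds the training mean."""
    def fit(self, xs, ys):
        self.cut = sum(xs) / len(xs)
        return self

    def predict(self, xs):
        return [1 if x > self.cut else 0 for x in xs]


class Pipeline:
    """Runs every transform step in order, then delegates to the model."""
    def __init__(self, steps, model):
        self.steps, self.model = steps, model

    def fit(self, xs, ys):
        for step in self.steps:
            xs = step.fit(xs).transform(xs)
        self.model.fit(xs, ys)
        return self

    def predict(self, xs):
        for step in self.steps:
            xs = step.transform(xs)
        return self.model.predict(xs)


pipe = Pipeline([Scaler()], ThresholdModel())
pipe.fit([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
print(pipe.predict([1.5, 3.5]))  # low value -> 0, high value -> 1
```

Production frameworks add persistence, scheduling, and monitoring on top of this same chained-steps idea, which is why pipelines are a core unit of MLOps.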
- Generative AI: Exploring Generative AI concepts, the Transformer architecture and its applications, LLM parameters, retrieval augmentation, small LLMs, embedding models, and large multimodal models. Students will learn Retrieval Augmented Generation (RAG), the capabilities and limitations of LLMs, and how to combine these to build applications.
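The RAG pattern covered in this topic can be illustrated in miniature: embed a query, retrieve the most similar document, and assemble a grounded prompt. The toy `embed()` function and the document set below are assumptions for illustration; a real system would use a learned embedding model and a vector database:

```python
import math
from collections import Counter

# Toy document collection standing in for a real corpus.
DOCS = [
    "Feature stores serve precomputed features to training and inference.",
    "Model stores version trained models and their metadata.",
    "RAG retrieves relevant documents to ground an LLM's answer.",
]

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use a learned model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs):
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

query = "What does a feature store do?"
context = retrieve(query, DOCS)
# The retrieved context is prepended so the LLM's answer stays grounded.
prompt = f"Answer using only this context:\n{context}\nQuestion: {query}"
print(prompt)
```

The same retrieve-then-prompt structure scales up directly: swap the toy embedding for an embedding model and the list scan for an approximate-nearest-neighbor index.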
- LLM Infrastructure: Detailing the data, model, deployment, and interface layers; key takeaways; and model gardens (AWS Bedrock vs. Google Vertex AI, and AWS Bedrock/AWS SageMaker). Students will learn data requirements for LLMs, LLM architectures, and when and how to use different LLMs and multimodal capabilities.
- LLM Operations: Covering LLMOps, LLM security, LLMs in production, and LLM hallucinations. Students will learn LLMOps concepts and practices, security risks and their mitigation, and the ethical implications of LLMs.
Hands-on Labs/Assignments
The course includes hands-on labs and assignments designed to reinforce the concepts learned:
- Homework #1: ML Infrastructure
- Homework #2: Generative AI
- Homework #3: LLM Infrastructure
- Homework #4: LLM Operations
Miscellaneous