AI Infrastructure

Overview

This course provides a comprehensive overview of the infrastructure and technologies required to build, deploy, and manage AI systems, with a focus on Large Language Models (LLMs). Students will gain a deep understanding of the AI workflow, from data acquisition and model training to deployment and monitoring. The course covers essential aspects of AI infrastructure, including machine learning pipelines, generative AI techniques, LLM infrastructure components, and LLM operations.

Instructor: Ioannis Papapanagiotou, PhD

Course Objectives

Upon successful completion of this course, students will be able to:

Explain AI infrastructure components, understand AI workflows, and architect AI systems (C1).
Demonstrate how to run AI systems in production using common MLOps and AIOps frameworks (C2).
Build applications that leverage Generative AI and Large Language Models (LLMs) (C3).
Identify the appropriate model for a use case and how to deploy it, whether that involves one or more small LLMs, multimodal capabilities, or a variety of Large Language Models (C4).

Key Topics

The course is structured around the following key topics:

AI Infrastructure Fundamentals: Covering the core components of AI infrastructure, AI workflows, AI components, AI compute, AI application frameworks, and Cloud vs On-Prem AI Infrastructure. Students will learn to define AI infrastructure components, explain AI workflows, and architect AI systems.
ML Infrastructure: Focusing on the components of ML infrastructure, including ML pipelines, model building, data challenges, MLOps, ML feature stores, and ML model stores. Students will learn to explain ML infrastructure components and implement ML pipelines and MLOps practices.
Generative AI: Exploring Generative AI concepts, the Transformer Architecture and its applications, LLM parameters, Retrieval Augmentation, Small LLMs, Embedding Models, and Large Multimodal Models. Students will learn Retrieval Augmented Generation (RAG), the capabilities and limitations of LLMs, and how to combine these to build applications.
LLM Infrastructure: Detailing the data layer, model layer, deployment layer, and interface layer, along with key takeaways; Model Gardens (AWS Bedrock vs. Google Vertex AI); and AWS Bedrock/AWS SageMaker. Students will learn the data requirements for LLMs, LLM architectures, and when and how to use different LLMs and multimodal capabilities.
LLM Operations: Covering LLM Operations, LLM Security, LLMOps, LLMs in Production, and LLM Hallucinations. Students will learn LLMOps concepts and practices, security risks and their mitigation, and the ethical implications of LLMs.

Hands on Labs/Assignments

The course includes hands-on labs and assignments designed to reinforce the concepts learned:

Homework #1: ML Infrastructure
Homework #2: Generative AI
Homework #3: LLM Infrastructure
Homework #4: LLM Operations

Miscellaneous

Syllabus ECE595: AI Infrastructure