AI Infrastructure                                
       


       

Overview 

       

This course provides a comprehensive overview of the infrastructure and technologies required to build, deploy, and manage AI systems, with a focus on Large Language Models (LLMs). Students will gain a deep understanding of the AI workflow, from data acquisition and model training to deployment and monitoring. The course covers essential aspects of AI infrastructure, including machine learning pipelines, generative AI techniques, LLM infrastructure components, and LLM operations.

       

Instructor: Ioannis Papapanagiotou, PhD

       

Course Objectives

       

Upon successful completion of this course, students will be able to:

       
             
  •            

    Explain the components of AI infrastructure, understand AI workflows, and be able to architect AI systems (C1).

             
  •            

    Demonstrate how to run AI systems in production with common frameworks based on MLOps and AIOps (C2).

             
  •            

    Build applications that leverage Generative AI and Large Language Models (LLMs) (C3).

             
  •            

    Identify which model fits a given use case and how to deploy it, drawing on one or more of small LLMs, multimodal capabilities, and a variety of Large Language Models (C4).

             
       

Key Topics

       

The course is structured around the following key topics:

       
             
  •            

    AI Infrastructure Fundamentals: Covering the core components of AI infrastructure, AI workflows, AI components, AI compute, AI application frameworks, and Cloud vs On-Prem AI Infrastructure. Students will learn to define AI infrastructure components, explain AI workflows, and architect AI systems.

             
  •            

    ML Infrastructure: Focusing on the components of ML infrastructure, including ML pipelines, model building, data challenges, MLOps, ML feature stores, and ML model stores. Students will learn to explain ML infrastructure components and implement ML pipelines and MLOps practices.

             
  •            

    Generative AI: Exploring Generative AI concepts, Transformer Architecture, applications of Transformer Architecture, LLM parameters, Retrieval Augmentation, Small LLMs, Embedding Models, and Large Multimodal Models. Students will learn Retrieval Augmented Generation (RAG), the capabilities and limitations of LLMs, and how to combine these to build applications.

             
  •            

    LLM Infrastructure: Detailing the data layer, model layer, deployment layer, and interface layer, along with key takeaways, Model Gardens (AWS Bedrock vs Google Vertex AI), and AWS Bedrock/AWS SageMaker. Students will learn data requirements for LLMs, LLM architectures, and when and how to use different LLMs and multimodal capabilities.

             
  •            

    LLM Operations: Covering LLM Operations, LLM Security, LLMOps, LLM in Production, and LLM Hallucinations. Students will learn LLMOps concepts and practices, security risks and mitigation, and ethical implications of LLMs.

             
       

Hands-on Labs/Assignments

       

The course includes hands-on labs and assignments designed to reinforce the concepts learned:

       
             
  •            

    Homework #1: ML Infrastructure

             
  •            

    Homework #2: Generative AI

             
  •            

    Homework #3: LLM Infrastructure

             
  •            

    Homework #4: LLM Operations

             
       

Miscellaneous