Lectures - CSc 59866-E: Senior Project I - AI Agents for Decision Making in the Real World / Spring 2026

January 26, 2026
tl;dr: Introduction to Senior Project I
[slides]

Introduction to Senior Project I.

January 28, 2026
tl;dr: Overview of Class Expectations
[slides]

Overview of course expectations, deliverables, and grading criteria.

February 2, 2026
tl;dr: Research Methods and Basic AI Evaluation
[slides]

Suggested Readings:

How to Read a Paper (S. Keshav)

Overview of research methods, how to read research papers, and an introduction to AI evaluation metrics (Precision, Recall, F-Measure).
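
As a rough illustration of the three metrics named above, the following sketch computes them from raw counts. The function name precision_recall_f1 and the example counts are hypothetical and are not taken from the lecture slides; "F-Measure" is assumed here to mean the balanced F1 score.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 misses.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Precision penalizes false alarms while recall penalizes misses, so the two can diverge sharply on imbalanced data; F1 summarizes both in one number.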

February 4, 2026
tl;dr: Single vs. Multi-Agent AI and Complexity
[slides]

Introduction to Single-Agent vs. Multi-Agent systems, GridWorld examples, Markov Decision Processes (MDP), and the curse of dimensionality in multi-agent environments.

February 9, 2026
tl;dr: Generalizability and Reward Modeling
[slides]

Overview of Generalizability in AI Agents versus simple memorization. The lecture covers Reward Modeling strategies, including the trade-offs between Dense (hackable) and Sparse (slow) rewards, as well as advanced techniques like Curriculum Learning and Inverse Reinforcement Learning (IRL).
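
The Dense versus Sparse trade-off can be sketched with two reward functions for a GridWorld-style task. This is only an illustrative sketch: the function names, the (row, col) position encoding, and the choice of negative Manhattan distance as the dense signal are assumptions, not the lecture's definitions.

```python
def sparse_reward(position, goal):
    """Reward only on reaching the goal: hard to exploit, but slow to learn from."""
    return 1.0 if position == goal else 0.0


def dense_reward(position, goal):
    """Shaped reward at every step: negative Manhattan distance to the goal.

    Gives frequent feedback, but an agent may learn to exploit the
    shaping term instead of solving the intended task.
    """
    return -float(abs(position[0] - goal[0]) + abs(position[1] - goal[1]))


goal = (2, 2)
for pos in [(0, 0), (1, 2), (2, 2)]:
    print(pos, sparse_reward(pos, goal), dense_reward(pos, goal))
```

With the sparse variant the agent sees a nonzero signal only at the goal cell, whereas the dense variant changes at every step.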

February 11, 2026
tl;dr: Distributed Processing for AI Agents: Modalities (tabular, graphical, multi-modal)
[slides]

Overview of the hierarchy of decision-making models, moving from simple MDPs to Partially Observable Stochastic Games (POSG). The lecture also explores the necessity of distributed processing for AI agents, covering Data versus Model Parallelism and the differences between Synchronous and Asynchronous execution.

February 18, 2026
tl;dr: Systems: Latency and Bandwidth balancing
[slides]

Exploration of the physical limits of AI agents regarding latency, bandwidth, and throughput. The lecture contrasts Cloud versus Edge architectures and discusses engineering solutions for system constraints, such as model compression, quantization, and activation offloading, particularly in latency-critical scenarios like autonomous braking.

February 23, 2026
tl;dr: Deep Reinforcement Learning (RL) & Multi-Agent Deep Reinforcement Learning (MARL): Training/Finetuning AI Models, Transformers, Q Learning, Deep Q Networks and the Bellman Equation
[slides]

The Transformer architecture and its applications in Deep Reinforcement Learning (RL) and Multi-Agent Deep Reinforcement Learning (MARL). The lecture covers training and finetuning AI models, the Bellman Equation, Q Learning, and Deep Q Networks, providing insights into their roles in decision-making processes.
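
For orientation, a minimal tabular Q-Learning step based on the Bellman optimality target might look like the sketch below. The dictionary-based Q-table, the function name q_learning_update, and the default alpha and gamma values are assumptions made for illustration. A Deep Q Network replaces the table with a neural network, which is not shown here.

```python
def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-Learning update in place and return the new value.

    q maps (state, action) pairs to estimated returns; missing entries count as 0.
    """
    # Bellman optimality target: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
    best_next = 0.0 if done else max(
        q.get((next_state, a), 0.0) for a in range(n_actions)
    )
    td_target = reward + gamma * best_next
    old_value = q.get((state, action), 0.0)
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] = old_value + alpha * (td_target - old_value)
    return q[(state, action)]


# Hypothetical two-state, two-action example.
q_table = {(1, 0): 2.0}
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0,
                              next_state=1, n_actions=2, alpha=0.5, gamma=0.9)
print(new_value)
```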

February 25, 2026
tl;dr: Reinforcement Learning (RL) & Multi-Agent Deep Reinforcement Learning (MARL): REINFORCE, Independent Q Learning and Convergence
[slides]

Walkthroughs of Deep Q-Learning (DQL) and experience buffers, and a comparison of Q-Learning with REINFORCE.

March 2, 2026
tl;dr: Sample Complexity, Generalizability, IQL, Actor-Critic, MADDPG, and Imitation Learning
[slides]

Discussion on Sample Complexity, Generalizability, Independent Q Learning (IQL), Actor-Critic methods, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), and Imitation Learning methods (Behavioral Cloning, DAgger) in the context of Reinforcement Learning (RL) and Multi-Agent Deep Reinforcement Learning (MARL).

March 4, 2026
tl;dr: Decision Making Algorithms for AI Agents like AlphaGo (RL + MCTS), TRPO, PPO, DPO, GRPO, StrateGo, and Intelligent AI Delegation
[slides]

Overview of decision-making algorithms for AI agents, including AlphaGo (RL + MCTS), TRPO, PPO, DPO, GRPO, StrateGo, and Intelligent AI Delegation.

March 9, 2026
tl;dr: Robustness, Consistency and Risk Mitigation in AI Agents & Offline Reinforcement Learning and Inverse Reinforcement Learning
[slides]

Discussion on Robustness, Consistency and Risk Mitigation in AI Agents, Offline Reinforcement Learning, and Inverse Reinforcement Learning (IRL) in the context of Reinforcement Learning (RL) and Multi-Agent Deep Reinforcement Learning (MARL).

March 11, 2026
tl;dr: Model Predictive Control (MPC): Real-time constraints, mistake correction and daily control tasks by AI Agents
[slides]

Discussion on Model Predictive Control (MPC), covering real-time constraints, mistake correction, and daily control tasks by AI agents, in the context of Reinforcement Learning (RL) and Multi-Agent Deep Reinforcement Learning (MARL).

March 16, 2026
tl;dr: Game Theory for AI Agents
[slides]

Discussion on Game Theory for AI Agents in the context of Reinforcement Learning (RL) and Multi-Agent Deep Reinforcement Learning (MARL).

March 18, 2026
tl;dr: Evolutionary Learning, Mean-Field Learning, Self-play for self-improvement, and Continual Learning: Communication paradigms for AI Agents to Adapt in Real-Time; Decentralized AI: When and How to Scale AI Agents
[slides]

Discussion on Evolutionary Learning, Mean-Field Learning, Self-play for self-improvement, and Continual Learning, including communication paradigms that let AI agents adapt in real time, and on Decentralized AI (when and how to scale AI agents), in the context of Reinforcement Learning (RL) and Multi-Agent Deep Reinforcement Learning (MARL).

March 23, 2026
tl;dr: Hands-on Session: Implementing Multi-Agent Deep Reinforcement Learning Algorithms
[slides]

Discussion on implementing Multi-Agent Deep Reinforcement Learning algorithms.

March 25, 2026
tl;dr: Agentic Systems Efficiency: Energy, Network bandwidth, memory, chip utilization and power optimizations
[slides]

Discussion on optimizing agentic systems for efficiency in terms of energy, network bandwidth, memory, chip utilization, and power.

March 30, 2026
tl;dr: Multimodal Agentic Evaluation: Post-training, Inference, Qualitative and Quantitative decision-making assessment (Trustworthy and Explainable Agents)
[slides]

Discussion on evaluating multimodal agentic systems post-training, including inference, qualitative and quantitative decision-making assessment, with a focus on trustworthy and explainable agents.

April 13, 2026
tl;dr: AI Agents in 6G / 7G Networks: Opportunities and Challenges
[slides]

Exploration of the role of AI agents in the context of 6G and 7G networks, discussing the opportunities they present as well as the challenges that need to be addressed for successful integration and deployment.

April 15, 2026
tl;dr: AI Agents for Scientific Discovery
[slides]

Exploration of AI agents applied to scientific discovery, detailing generative diffusion models for drug discovery, Bayesian Optimization for molecular search, and the automation of materials discovery through robotic Self-Driving Laboratories.

April 20, 2026
tl;dr: AI Agents for Smart Grid Power Orchestration
[slides]

Discussion on the transition to agentic smart grids using decentralized multi-agent frameworks, focusing on balancing economic dispatch constraints, utilizing the Model Context Protocol (MCP), and ensuring safe reinforcement learning via Digital Twins.

April 22, 2026
tl;dr: Robotic, AR/VR, and Autonomous Transportation Agents
[slides]

Examination of agents operating in the physical and spatial world, covering robotic imitation learning and Vision-Language-Action (VLA) models, generative world models and causal reasoning for autonomous driving, and proactive interventions in AR/VR environments.

April 27, 2026
tl;dr: Software Coding Agents, Agentic RAG, and Multimodal Agents
[slides]

Overview of agents navigating digital spaces, detailing Agentic RAG to reduce hallucinations, the Reflexion framework for software coding, digital embodiment for web interactions, and grounding multimodal models in real-world physical affordances.

April 29, 2026
tl;dr: Supply Chain Logistics, Stock Portfolios, and Physical AI Agents
[slides]

Analysis of applying distributed multi-agent systems to real-world logistics and volatile stock portfolio orchestration, alongside an overview of foundation models for physical robotics and the challenge of closing the sim-to-real gap.

May 4, 2026
tl;dr: Human-Agent and Agent-Agent Coordination and Competition
[slides]

Exploration of the scaling laws and architectures of multi-agent coordination, mixed-motive dilemmas where agents cooperate to compete, and human-agent collaboration through zero-shot adaptation and generative partner modeling.

May 6, 2026
tl;dr: Advanced Agent Competition and Human-Agent Alignment
[slides]

In-depth review of agent evaluation of non-binding deals using expected utility, the GAMMA framework for simulating diverse human behavior, and enhancing human-agent alignment via iterative simulation sandboxes and explicit goal abstractions.

May 11, 2026
tl;dr: Dynamic Memory and Standardized Protocols for Real-World Agents
[slides]

Discussion on overcoming limited context windows using episodic and semantic dynamic memory, featuring the Zettelkasten-style A-Mem architecture, and utilizing Elastic Weight Consolidation to prevent catastrophic forgetting during agentic adaptation.
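
As a sketch of how Elastic Weight Consolidation discourages forgetting, the quadratic penalty below pulls each parameter back toward its value after the earlier task, weighted by an importance estimate (the diagonal Fisher information in the standard formulation). The function name ewc_penalty, the flat-list parameter representation, and the example numbers are assumptions for illustration. In practice the penalty is added to the new task's loss inside a deep learning framework.

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Return (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2.

    params:        current parameter values
    anchor_params: parameter values learned on the earlier task
    fisher:        per-parameter importance weights (diagonal Fisher estimate)
    lam:           strength of the constraint relative to the new task's loss
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


# Hypothetical values: the first parameter is important (F=4.0) and has drifted by 1.0.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 1.0], lam=0.5)
print(penalty)
```

Parameters with a large importance weight are expensive to move, while unimportant ones remain free to adapt to the new task.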

May 13, 2026
tl;dr: Standardized Protocols and Industrial Governance for Agents
[slides]

Examination of interoperability in multi-agent systems via the Model Context Protocol (MCP) and Agent-to-Agent (A2A) protocol, concluding with the strict industrial governance laws required for deploying trustworthy autonomous operations.