The third rung. The on-call mindset, observability for AI systems, reliability and fallback chains, security in AI-built systems, architecture for AI applications, compliance for regulated environments, documentation that survives, working in codebases you did not write, and team practices in the AI-native era. Capstone: each student leads a multi-service project with three to five other students through to production deployment with monitoring, runbooks, and an incident playbook.
What changes when you move from "ships features" to "owns the system in production". The on-call mindset, the incident-response mindset, the post-mortem discipline.
Logs, metrics, traces. PostHog, Sentry, OpenTelemetry, LangSmith, Helicone. What to alert on, what to leave as a dashboard. Cost monitoring. Error budgets and SLOs.
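A worked example makes the error-budget arithmetic concrete. The following is a minimal Python sketch, assuming a request-based SLO measured over a 30-day window. `SLO`, `budget_remaining`, `burn_rate` and `should_page` are hypothetical names and are not part of any of the tools listed above. The paging rule (page when one hour at the current error rate would spend 2% of the month's budget) is an assumed default, in the spirit of the burn-rate alerting described in Google's SRE Workbook.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    target: float                   # e.g. 0.999: 99.9% of requests must succeed
    window_hours: float = 30 * 24   # the rolling window the SLO is judged over

    @property
    def allowed_error_rate(self) -> float:
        # The error budget, as a fraction of all requests in the window.
        return 1.0 - self.target


def budget_remaining(slo: SLO, total: int, failed: int) -> float:
    """Fraction of the window's error budget still unspent (negative if overspent)."""
    allowed_failures = slo.allowed_error_rate * total
    return 1.0 - failed / allowed_failures


def burn_rate(slo: SLO, total: int, failed: int) -> float:
    """Observed error rate relative to the rate that exactly spends the budget.

    1.0 spends the budget by the end of the window; 10.0 spends it ten times faster.
    """
    return (failed / total) / slo.allowed_error_rate


def should_page(slo: SLO, total: int, failed: int,
                budget_fraction: float = 0.02, lookback_hours: float = 1.0) -> bool:
    """Page if, at this burn rate, the lookback period alone would consume
    `budget_fraction` of the whole window's budget (assumes roughly even traffic)."""
    threshold = budget_fraction * slo.window_hours / lookback_hours  # 14.4 by default
    return burn_rate(slo, total, failed) >= threshold


slo = SLO(target=0.999)
print(round(budget_remaining(slo, total=1_000_000, failed=250), 3))  # 0.75
print(should_page(slo, total=10_000, failed=500))  # True: 5% errors, burn rate ~50
print(should_page(slo, total=10_000, failed=5))    # False: burn rate ~0.5
```

The split between alerting and dashboards falls out of the threshold. A burn fast enough to spend a meaningful slice of the month's budget within an hour wakes someone up. A burn below the sustainable rate is something to watch on a dashboard, not a page.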
Multi-provider fallback chains in depth using Marktrader's cascade as the worked example. Circuit breakers, retries with backoff, idempotency. Graceful degradation when the AI fails.
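The cascade's moving parts fit in a few dozen lines. This is not Marktrader's implementation. It is a minimal, single-threaded Python sketch with the following assumptions:

- Each provider is a hypothetical callable.
- Retries use exponential backoff with full jitter.
- A per-provider circuit breaker skips a provider after repeated failed requests.
- A static fallback string stands in for graceful degradation.
- The idempotency key is passed on the assumption that the downstream service, or your own request log, can use it to deduplicate a repeated attempt. Not every provider API accepts one.

```python
import random
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


class ProviderError(Exception):
    """Raised by a provider call that failed (timeout, 5xx, rate limit, ...)."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3   # consecutive failed requests before opening
    reset_after: float = 30.0    # seconds to stay open before allowing a trial call
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        # Closed, or open long enough that a trial call is allowed (half-open).
        return self.opened_at is None or now - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # (re)open; a failed trial call restarts the cool-down


@dataclass
class Provider:
    name: str
    call: Callable[[str, str], str]  # hypothetical: (prompt, idempotency_key) -> text
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def call_with_retries(provider: Provider, prompt: str, key: str,
                      attempts: int = 3, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    for attempt in range(attempts - 1):
        try:
            return provider.call(prompt, key)
        except ProviderError:
            # Exponential backoff with full jitter: wait 0..base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return provider.call(prompt, key)  # final attempt: let the error propagate


def complete(prompt: str, providers: list[Provider], fallback_text: str,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> tuple[str, str]:
    # One idempotency key per logical request, reused across retries and providers.
    key = str(uuid.uuid4())
    for provider in providers:
        if not provider.breaker.allow(clock()):
            continue  # circuit open: skip straight to the next provider
        try:
            text = call_with_retries(provider, prompt, key, sleep=sleep)
        except ProviderError:
            provider.breaker.record_failure(clock())
            continue
        provider.breaker.record_success()
        return provider.name, text
    # Graceful degradation: every provider failed or was skipped.
    return "fallback", fallback_text
```

The breaker counts one failure per logical request, after retries are exhausted, so a single transient error does not open it. Once the cool-down has elapsed, this sketch lets every call through until one succeeds or fails. A production version would add a lock around the breaker state and limit half-open traffic to a single trial call.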
Prompt injection. Data leakage through model context. Authorisation: the AI is not a permission boundary. Input/output validation. Secrets management, KMS, BYOK patterns. Provider-side compliance posture.
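The principle that the AI is not a permission boundary is easiest to see where a model's proposed tool call meets your code. The following sketch assumes a hypothetical billing tool set (`read_invoice`, `refund_invoice`) and a hypothetical `invoice_owner` lookup. It illustrates one point: the decision rests on the authenticated user and server-side data, and the model's arguments are validated as untrusted input.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    pass


@dataclass(frozen=True)
class User:
    """The authenticated caller, established by your auth layer, not by the model."""
    user_id: str
    roles: frozenset[str]


@dataclass
class ToolCall:
    """A tool invocation proposed by the model. Every field is untrusted input."""
    name: str
    args: dict


# Hypothetical tool set. The required role is fixed in code, never taken from a prompt.
REQUIRED_ROLE = {
    "read_invoice": "billing:read",
    "refund_invoice": "billing:write",
}


def authorise(user: User, call: ToolCall, invoice_owner: dict[str, str]) -> str:
    """Return the validated invoice id, or raise Forbidden."""
    required = REQUIRED_ROLE.get(call.name)
    if required is None:
        raise Forbidden(f"unknown tool {call.name!r}")
    if required not in user.roles:
        raise Forbidden(f"{user.user_id} lacks {required}")
    # Validate the model's arguments like any other untrusted request body.
    invoice_id = call.args.get("invoice_id")
    if not isinstance(invoice_id, str) or not invoice_id.isalnum():
        raise Forbidden("invalid invoice_id")
    # Ownership comes from server-side data, not from anything the model asserted.
    if invoice_owner.get(invoice_id) != user.user_id:
        raise Forbidden("invoice does not belong to this user")
    return invoice_id
```

A prompt-injected model can request `refund_invoice` on someone else's invoice as often as it likes. The check fails the same way every time, because nothing in it reads the prompt.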
Monolith vs services vs serverless. Edge vs origin. Database choice. Queues, jobs, workers. Caching layers. The build-vs-buy decision tree for AI applications.
GDPR. EU AI Act high-risk classifications. UK FCA expectations. NHS DSPT. CQC framework. WCAG 2.2. Age-verification and consent flows.
READMEs that match the code. Architecture Decision Records. OpenAPI specs. Runbooks the on-call engineer can actually use at three in the morning.
Adding features to codebases you did not write. Using AI to summarise, navigate, refactor. Migration patterns: the strangler fig, parallel build, big-bang cutover.
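The strangler fig reduces to a routing decision in front of two systems. Here is a minimal sketch, assuming HTTP-style paths and two hypothetical handlers. In practice the same rule usually lives in a reverse proxy or API gateway rather than in application code. Longest-prefix matching lets a sub-path stay on, or move back to, the legacy system while its parent has already migrated.

```python
from typing import Callable

Handler = Callable[[str], str]


class StranglerRouter:
    """Send each request to the legacy or the new system, one path prefix at a time."""

    def __init__(self, legacy: Handler, modern: Handler) -> None:
        self.handlers = {"legacy": legacy, "modern": modern}
        self.routes: dict[str, str] = {}  # path prefix -> "legacy" or "modern"

    def route(self, prefix: str, target: str) -> None:
        if target not in self.handlers:
            raise ValueError(f"unknown target {target!r}")
        self.routes[prefix] = target

    def target_for(self, path: str) -> str:
        # Segment-aware match, so "/orders" covers "/orders/1" but not "/ordersheet".
        matches = [p for p in self.routes
                   if path == p or path.startswith(p.rstrip("/") + "/")]
        if not matches:
            return "legacy"  # anything not yet migrated stays where it was
        return self.routes[max(matches, key=len)]  # longest prefix wins

    def handle(self, path: str) -> str:
        return self.handlers[self.target_for(path)](path)


router = StranglerRouter(legacy=lambda p: "legacy " + p, modern=lambda p: "modern " + p)
router.route("/orders", "modern")            # migrate the orders area
router.route("/orders/export", "legacy")     # ...except exports, not rebuilt yet
print(router.handle("/orders/42"))           # modern /orders/42
print(router.handle("/orders/export/csv"))   # legacy /orders/export/csv
print(router.handle("/customers/7"))         # legacy /customers/7
```

Rolling a route back is a one-line change. That is the main practical advantage of the strangler fig over a big-bang cutover.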
Code review when everyone has an AI assistant. Pair-programming patterns. Leading a team where every engineer works with AI. Sharing CLAUDE.md/AGENTS.md context across a team.
Each student designs an architecture, leads a team of three to five other students implementing it, and ships the result to production with monitoring, runbooks, and an incident playbook.