Your agent learns.
You get certified.
Train real agents in live environments. Watch reward curves converge. Earn certifications that prove you understand policy gradients beyond the textbook.
Episode Reward
+324.6
MuJoCo Humanoid v4 — mid-stride

Numbers from the last training run
Not projected. Not estimated. Pulled from the dashboard 27 minutes ago.
Agents trained this month
across 47 environments
Certification pass rate
on first attempt
Faster curriculum completion
vs traditional self-study
Pick your environment.
Start the run.

CartPole — Policy Gradient Foundations
Start here. Balance a pole on a cart. Sounds trivial — the reward curve will humble you before it teaches you.
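For a feel of what "policy gradient foundations" means in practice, here is a minimal sketch of the REINFORCE surrogate loss for one episode. It assumes you already have the per-step log-probabilities of the actions your policy took; the function names and the default `gamma` are illustrative, not part of any course API.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step can reuse the next step's return.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE surrogate loss: -mean(log pi(a_t | s_t) * G_t).

    log_probs: log-probability of each action actually taken (assumed given).
    rewards:   reward received at each step (CartPole gives +1 per step).
    """
    returns = discounted_returns(rewards, gamma)
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)
```

In a real run you would compute this loss with an autodiff library so gradients flow back into the policy; the plain-Python version only shows the arithmetic.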

LunarLander — Proximal Policy Optimization
Land a spacecraft using continuous thrust control. PPO's clipped objective will click when you watch the agent overcorrect, then learn not to.
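The clipped objective mentioned above can be sketched in a few lines. This is a simplified, framework-free version of the standard PPO clipped surrogate; the function name, argument names, and the `clip_eps=0.2` default are assumptions for illustration.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    For each sample: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a | s) / pi_old(a | s).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any gain from moving the ratio past the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio of 1.5 is credited only as 1.2, so one update cannot push the policy arbitrarily far toward a single lucky action. That cap is what damps the overcorrection.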

MuJoCo Humanoid — Sim-to-Real Transfer
Train a 17-DOF humanoid to walk. Then transfer the policy to hardware. This is the capstone that gets you hired.

Quant Trading Agent — Market Microstructure
Build an agent that reads Level 2 order books and executes trades. The reward function is a modified Sharpe ratio.
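The exact modification to the Sharpe ratio is not spelled out here, so the sketch below uses a plain Sharpe ratio over a window of per-step returns as a stand-in. The function name, the `risk_free` parameter, and the `eps` guard are all assumptions for illustration.

```python
import statistics


def sharpe_reward(step_returns, risk_free=0.0, eps=1e-8):
    """Plain Sharpe ratio over a window of per-step returns.

    A stand-in for the course's "modified" version, whose details are not given.
    """
    # A sample standard deviation needs at least two observations.
    if len(step_returns) < 2:
        return 0.0
    excess = [r - risk_free for r in step_returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)
    # eps keeps the division finite when the window has zero volatility.
    return mean_excess / (volatility + eps)
```

Rewarding return per unit of volatility, rather than raw profit, discourages the agent from chasing gains with outsized risk.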

Multi-Agent Coordination — Cooperative RL
Five agents. One shared reward. Watch emergent communication protocols develop by episode 3,000.
Free tier unlocks CartPole, LunarLander, and the sandbox dashboard immediately
What convergence gets you
Certification outcomes, salary data, and capstone projects from engineers who shipped.
+$45k median increase
After Epoch certification.
n=847 · self-reported · US market · 2025
38 companies hiring
Epoch-certified engineers get fast-tracked at these firms.






Certifications issued
Q1 2026
Avg completion: 6.4 weeks
Capstone projects
Real projects. Real reward curves. Real hires.

Bipedal Walker with Curriculum Learning

Priya Krishnamurthy
ML Engineer → Robotics at Waymo

Market-Making Agent with Risk Constraints

Marcus Webb
Quant Dev → RL Researcher at Two Sigma

Sim-to-Real Locomotion Transfer

Yuki Tanaka
Robotics Researcher → Lead at Boston Dynamics
Full Curriculum Map
9 weeks · 3 tracks · 15 lab environments
The reward curve won't wait.
Neither should you.
Free tier. Three environments. Sandbox dashboard. No credit card. Your first agent could be converging in 30 minutes.