Andrej Karpathy: Methodologies, Key Decisions & Mental Models

Andrej Karpathy

Educator and engineer who distills AI's essence through the simplest possible code

Andrej Karpathy is one of the most influential AI researchers and educators of his generation. Trained under Fei-Fei Li at Stanford, he co-founded OpenAI, then led Tesla's Autopilot perception team as Director of AI, driving the neural-network-first architecture of FSD. He rejoined OpenAI in 2023 and departed in 2024 to focus on AI education. His minimalist projects (nanoGPT, micrograd) and his YouTube lecture series expose the core of deep learning in as little code as possible, influencing millions of learners worldwide. His Software 2.0 thesis, that neural networks will replace large parts of hand-written software, has become a foundational paradigm in AI engineering.

Methodologies

Learn by Implementing from Scratch - Start from the most basic mathematical operations and, without relying on libraries, write every line of code yourself until the system runs.
Software 2.0 Migration Assessment Framework - Systematically assess which software modules are suitable for neural network replacement, prioritizing modules with abundant data and complex rules.
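
As an illustration of the from-scratch methodology, here is a minimal sketch of a scalar automatic-differentiation engine written in the spirit of micrograd, using only Python's standard library. It is not the project's actual source: the class layout, the method names, and the choice of supported operations (addition, multiplication, tanh) are assumptions made for this example.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so that each node is processed
        # only after every node that depends on it.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # dy/dy = 1
        for node in reversed(order):
            node._backward()


# Usage: y = x * w + b, then ask for dy/dx, dy/dw, dy/db.
x = Value(2.0)
w = Value(-3.0)
b = Value(1.0)
y = x * w + b
y.backward()
print(x.grad, w.grad, b.grad)  # prints: -3.0 2.0 1.0
```

Gradients are accumulated with `+=` rather than assigned, so a value that is used more than once in an expression (for example `a * a`) receives the sum of the contributions from each use.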

Key decisions and timeline

2011-09 Entered Stanford PhD Program under Fei-Fei Li - Choosing a rapidly growing field and a top mentor is a strong path to building research impact; technical intuition matters more than following the mainstream.
2015-01 Led Stanford CS231n, Creating the Benchmark for Deep Learning Education - Openly sharing knowledge does not diminish competitive advantage but builds long-term influence and reputation; teaching is the best way to learn.
2015-12 Co-Founded OpenAI - Joining the right organization at a critical technological paradigm shift can determine long-term impact more than individual technical ability.

Beliefs and mental models

Belief 1 - The core thesis of Software 2.0: traditional software has humans write explicit rules, while in Software 2.0 neural networks learn rules automatically from data. Large swaths of software will be represented as weight files rather than source code.
Belief 2 - Real understanding of deep learning comes from hands-on implementation, not abstract formula derivation. The best teaching path is to show students a working neural network first, then explain the mathematics behind it; code is the best teaching medium.
Belief 3 - nanoGPT reproduces GPT-2 training with a training loop and a model definition of roughly 300 lines each; micrograd implements a backpropagation engine and a small neural network library in about 150 lines. Minimal code is not a compromise but a precise grasp of essence. Complexity is the enemy of engineering; a problem that can be solved simply should not be given a complex solution.
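
The Software 2.0 contrast in Belief 1 can be made concrete with a small sketch. The same decision is expressed twice: once as a hand-written rule, and once as two numbers (a weight and a bias) fitted to labelled examples by a one-neuron perceptron. The threshold of 0.5, the sample data, and all function names are hypothetical choices for this illustration; a real migration would involve a far larger model and dataset.

```python
def rule_v1(x):
    """Software 1.0: a person writes the decision rule as source code."""
    return 1 if x > 0.5 else 0


def predict(weights, x):
    """Software 2.0: the rule is whatever the weights (w, b) encode."""
    w, b = weights
    return 1 if w * x + b > 0 else 0


def train(samples, labels, lr=0.1, max_epochs=1000):
    """Fit a one-neuron perceptron; stop once an epoch makes no mistakes."""
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict((w, b), x)  # -1, 0, or +1
            if error != 0:
                # Nudge the weights toward classifying this sample correctly.
                w += lr * error * x
                b += lr * error
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


# Labelled data stands in for the specification; here the labels happen to
# come from the hand-written rule, but the learner never sees its source.
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [rule_v1(x) for x in samples]

weights = train(samples, labels)
print("learned weights:", weights)
print("matches the hand-written rule on the data:",
      [predict(weights, x) for x in samples] == labels)
```

In the second version, the behavior of the program is changed by editing the data and retraining, not by editing `predict`, which is the sense in which the weights act as the program.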