Phase 01: 1957 – 1969
The Perceptron & Symbolic Dawn
Frank Rosenblatt's single-layer neural network proves machines can learn from examples — and then Minsky & Papert prove its limits.
The Perceptron was among the first artificial neural networks implemented in hardware. With just one layer of weighted connections, it could learn to classify simple patterns via the perceptron convergence procedure: bump the weights whenever an example is misclassified, a rule guaranteed to converge when the classes are linearly separable. Rosenblatt's public demonstrations ignited the first AI boom, but in 1969 Minsky and Papert's book Perceptrons proved the single-layer model could not solve XOR or any other non-linearly-separable problem. The funding collapse that followed became known as the first AI winter.
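A minimal sketch of the learning rule, assuming toy NumPy data (the helper names below are illustrative, not part of Rosenblatt's hardware), makes both the convergence procedure and the XOR limitation concrete:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge weights only on misclassified points."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):          # yi is +1 or -1
            if yi * (xi @ w + b) <= 0:    # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b > 0, 1, -1)

# Linearly separable AND-style labels: the rule converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y_and)
print(predict(X, w, b))                    # [-1 -1 -1  1]

# XOR labels are not linearly separable: no single-layer solution exists,
# so at least one point stays misclassified no matter how long we train.
y_xor = np.array([-1, 1, 1, -1])
w, b = train_perceptron(X, y_xor, epochs=1000)
print(predict(X, w, b), "vs target", y_xor)
```

On the separable AND labels the rule settles on a correct decision boundary; on XOR it keeps cycling forever, which is exactly the limitation Minsky and Papert formalized.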
Milestones
- 1943 — McCulloch & Pitts: threshold neuron model
- 1957 — Rosenblatt: Mark I Perceptron
- 1969 — Minsky & Papert: Perceptrons (book)
- 1970s — First AI winter
Phase 02: 1970 – 1989
Expert Systems & Symbolic AI
Hand-written IF-THEN rule bases encode domain expertise. MYCIN, DENDRAL, and XCON prove commercial viability — then hit the knowledge-acquisition wall.
Expert systems set learning aside in favor of explicit, hand-written symbolic rules. MYCIN (medical diagnosis), DENDRAL (chemistry), and XCON (DEC computer configuration) were early successes. But maintaining hand-curated rule bases scaled poorly: every new domain required months of interviews between knowledge engineers and domain experts, and rule conflicts grew combinatorially. The rule-based paradigm never generalized. The late-1980s collapse of the Lisp machine market triggered the second AI winter.
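A toy forward-chaining sketch shows the flavor of the paradigm: knowledge lives in explicit IF-THEN rules, and inference is just firing whichever rules match the known facts until nothing new can be derived. The rules below are invented placeholders, and MYCIN itself actually reasoned backwards from hypotheses with certainty factors, so this is only the general shape:

```python
# Toy forward-chaining rule engine: each rule is (antecedent facts, consequent fact).
# The rules are illustrative placeholders, not real MYCIN knowledge.
RULES = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis", "gram_negative_stain"}, "suspect_e_coli"),
    ({"suspect_e_coli"}, "recommend_antibiotic_X"),
]

def forward_chain(facts, rules):
    """Fire matching rules until no new facts can be derived (a fixed point)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

print(forward_chain({"fever", "stiff_neck", "gram_negative_stain"}, RULES))
# -> includes 'suspect_meningitis', 'suspect_e_coli', 'recommend_antibiotic_X'
```

The knowledge-acquisition wall is already visible at this scale: every new conclusion needs another hand-written rule, and interactions between rules have to be debugged by hand.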
Milestones
- 1972 — MYCIN medical expert system
- 1980 — XCON at Digital Equipment Corp
- 1986 — Rumelhart backpropagation paper
- 1987 — Lisp machine market collapse
Phase 03: 1990 – 2011
Statistical ML & the Rise of Data
Support vector machines, random forests, and Bayesian methods quietly take over. The internet ships training data. Netflix Prize crowdsources ML.
The 1990s and 2000s were the golden age of statistical learning. Support vector machines (Cortes & Vapnik, 1995), random forests (Breiman, 2001), and gradient boosting reshaped pattern recognition. PageRank redefined web search (1998). Meanwhile, the web generated unprecedented volumes of labeled data; ImageNet (Deng et al., 2009) would become the catalyst for the next phase. AI was rebranded as machine learning and embedded quietly into search, spam filters, and recommendation engines.
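Of the era's algorithms, PageRank is the easiest to sketch end to end. The snippet below runs power iteration on a hypothetical four-page link graph; the 0.85 damping factor follows the original paper, but the graph itself is made up:

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100):
    """Power iteration on the damped link matrix; adj[i, j] = 1 if page i links to page j."""
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    # Column-stochastic transition matrix; dangling pages link uniformly everywhere.
    M = np.where(out_degree > 0, adj / np.maximum(out_degree, 1), 1.0 / n).T
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * (M @ rank)
    return rank

# Hypothetical 4-page web: page 0 is linked to by all the others, so it ranks highest.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
], dtype=float)
print(pagerank(adj).round(3))
```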
Milestones
- 1995 — Support Vector Machines
- 1997 — Deep Blue defeats Kasparov
- 1998 — PageRank / Google
- 2009 — ImageNet dataset released
Phase 04: 2012 – 2016
The Deep Learning Revolution
AlexNet cuts the ImageNet top-5 error from 26% to 15% overnight. GPUs + big data + backprop converge. Convolutional and recurrent nets eat computer vision, speech, and translation.
AlexNet (Krizhevsky, Sutskever, Hinton, 2012) won the ImageNet competition by a decisive margin using GPU-accelerated convolutional networks. Within two years deep learning had displaced hand-engineered features across computer vision. Recurrent nets and LSTMs powered speech recognition and machine translation. DeepMind's AlphaGo (2016) defeated Lee Sedol, demonstrating deep reinforcement learning at a superhuman level. The NVIDIA GPU became the central compute primitive of modern AI.
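The operation that gives these networks their name is simple enough to sketch in NumPy: one small set of weights slides over the whole image, so the same feature detector is reused at every location. The edge-detector kernel below is a classic hand-picked filter, not a learned AlexNet weight, and real networks stack many such layers with nonlinearities in between:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most DL libraries):
    one small kernel slides over the image, so its weights are shared everywhere."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: the same 9 weights respond to an edge anywhere in the image.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
print(conv2d(image, sobel_x))
```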
Milestones
- 2012 — AlexNet wins ImageNet
- 2014 — Generative Adversarial Networks
- 2015 — ResNet-152: skip connections
- 2016 — AlphaGo beats Lee Sedol
Phase 05: 2017 – 2022
Transformers & Foundation Models
"Attention Is All You Need" replaces recurrence with self-attention. BERT, GPT, CLIP, and Diffusion models collapse the boundary between modalities.
The 2017 Transformer paper introduced self-attention as a universal sequence-modeling primitive. BERT (Google, 2018) and GPT-3 (OpenAI, 2020) demonstrated that scaling parameter count and training data produced emergent capabilities: translation, summarization, and eventually few-shot reasoning. CLIP (2021) and diffusion models (Stable Diffusion, 2022) extended the Transformer paradigm to vision and generation. ChatGPT's November 2022 release made foundation models a mass-market phenomenon.
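The self-attention primitive itself fits in a few lines of NumPy. This single-head sketch omits the learned query/key/value projections, masking, and multi-head machinery of a full Transformer, but it shows the core idea: every position builds its output as a softmax-weighted mix of every other position.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention for one head, with identity Q/K/V projections."""
    Q, K, V = X, X, X                                 # real Transformers use learned projections
    d_k = X.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # (seq_len, d_model)

# Toy sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(self_attention(X).shape)                        # (4, 8)
```

Because each token attends to all others in one parallel step rather than through a recurrent chain, the computation maps cleanly onto GPUs, which is a large part of why attention displaced recurrence.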
Milestones
- 2017 — Attention Is All You Need
- 2020 — GPT-3: 175B parameters
- 2022 — Stable Diffusion / DALL·E 2
- 2022 — ChatGPT public launch
Phase 06: 2023 – Horizon
Agentic AI & the Trillion-Parameter Horizon
Multi-modal models, tool-use, long context, and autonomous agents. Frontier systems cross 1T parameters. Orchestration, evaluation, and alignment become the new bottlenecks.
The current era is defined by agents that plan, use tools, and execute multi-step workflows. Claude, GPT-4o, and Gemini 3 exceed human performance on many expert-level benchmarks. Model context windows have grown from 4K → 1M+ tokens. Frontier labs are scaling toward trillion-parameter mixture-of-experts architectures while simultaneously researching constitutional AI, RLHF, and scalable oversight. The open question of this decade: how do we align and evaluate systems that exceed human expertise in narrow domains?
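What "agentic" means in practice is easiest to see as a loop: the model proposes either a tool call or a final answer, an orchestrator executes the tool, and the result is fed back into the conversation. The sketch below is a generic, hypothetical orchestration loop with stand-in tools and a scripted model, not any particular lab's SDK:

```python
# Hypothetical agent loop: the tool registry and `model` interface are stand-ins.
TOOLS = {
    "search": lambda query: f"search results for {query!r}",
    "calculator": lambda expr: str(eval(expr)),   # toy only; never eval untrusted input
}

def run_agent(model, task, max_steps=10):
    """Alternate between model decisions and tool executions until a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)                   # dict describing the next step
        if action["type"] == "final_answer":
            return action["content"]
        result = TOOLS[action["tool"]](action["input"])
        history.append({"role": "tool", "tool": action["tool"], "content": result})
    return "stopped: step budget exhausted"

def scripted_model(history):
    """Stand-in for an LLM: calls the calculator once, then answers."""
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "tool": "calculator", "input": "17 * 24"}
    return {"type": "final_answer", "content": f"The result is {history[-1]['content']}"}

print(run_agent(scripted_model, "What is 17 * 24?"))   # -> The result is 408
```

This is why orchestration, evaluation, and alignment become the new bottlenecks: each extra step in the loop multiplies the ways a run can fail, and judging a multi-step trajectory is much harder than judging a single answer.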
Milestones
- 2023 — GPT-4 & multi-modal frontier
- 2024 — 1M-token context windows
- 2025 — Claude Opus 4 & agentic workflows
- 2026 — 1T-parameter frontier models