Karpathy Guidelines

Implement Andrej Karpathy's neural network training guidelines through automated pipeline integration

Karpathy Guidelines is a community skill for implementing machine learning engineering best practices inspired by Andrej Karpathy, covering systematic debugging, training pipeline verification, and practical approaches to neural network development.

What Is This?

Overview

Karpathy Guidelines provides structured approaches to neural network training and debugging based on practical ML engineering principles. It covers a progressive development methodology of starting simple and adding complexity incrementally, systematic overfitting verification, learning rate finding, loss curve analysis, and diagnostics for common training failures. The skill translates the intuition of experienced ML practitioners into repeatable checklists that reduce debugging time.

Who Should Use This

This skill serves ML engineers debugging training pipelines that produce unexpected results, researchers setting up new training experiments who want to avoid common pitfalls, and teams establishing ML development standards that ensure consistent quality across their training workflows.

Why Use It?

Problems It Solves

Training failures are difficult to diagnose because many components interact: data preprocessing, model architecture, loss functions, optimizers, and hyperparameters. Developers often jump to complex architectures before verifying that a basic setup works correctly. Training runs that appear to progress but produce poor final results waste compute hours, and without systematic verification steps, subtle bugs in data loading or label processing go undetected until evaluation.

Core Highlights

Progressive complexity builds from the simplest possible baseline before adding layers, regularization, and architectural features. Overfit-first methodology verifies that the model can memorize a small batch before training on the full dataset. Loss curve analysis identifies common failure patterns like underfitting, overfitting, and learning rate issues. Data verification checks confirm that labels are correct, preprocessing is faithful, and batches are properly constructed.
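
As a minimal sketch of the loss curve analysis highlight, the helper below classifies a recorded loss history into a few common failure patterns; the diagnose_loss_curve name and the thresholds are illustrative assumptions rather than part of the skill itself.

def diagnose_loss_curve(losses: list[float]) -> str:
    # Illustrative heuristics only; a real diagnosis should also consider the
    # validation loss, the learning rate schedule, and the batch size.
    if any(l != l or l == float("inf") for l in losses):  # l != l detects NaN
        return "diverged: NaN/inf loss, check learning rate and data scaling"
    if losses[-1] > losses[0]:
        return "loss increasing: learning rate is likely too high"
    reduction = (losses[0] - losses[-1]) / max(losses[0], 1e-12)
    if reduction < 0.05:
        return "barely learning: learning rate too low or model underfitting"
    return "loss decreasing: compare against validation loss to rule out overfitting"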

How to Use It?

Basic Usage

import torch
import torch.nn as nn

class TrainingVerifier:
    def __init__(self, model: nn.Module, loss_fn, optimizer):
        self.model = model
        self.loss_fn = loss_fn
        self.optimizer = optimizer

    def verify_overfit_single_batch(self, batch_x, batch_y,
                                     max_steps: int = 100) -> dict:
        # Overfit-first check: a correctly wired pipeline should drive the
        # loss on a single small batch close to zero within a few hundred steps.
        self.model.train()
        losses = []
        for step in range(max_steps):
            self.optimizer.zero_grad()
            output = self.model(batch_x)
            loss = self.loss_fn(output, batch_y)
            loss.backward()
            self.optimizer.step()
            losses.append(loss.item())
        return {
            "initial_loss": losses[0],
            "final_loss": losses[-1],
            # Treat a ~99% reduction of the initial loss as successful memorization.
            "converged": losses[-1] < losses[0] * 0.01,
            "loss_history": losses
        }

    def check_gradients(self, batch_x, batch_y) -> dict:
        # Inspect per-parameter gradient statistics on one batch to spot
        # vanishing, exploding, or dead (all-zero) gradients.
        self.model.train()
        self.optimizer.zero_grad()  # avoid mixing in gradients from earlier checks
        output = self.model(batch_x)
        loss = self.loss_fn(output, batch_y)
        loss.backward()
        grad_stats = {}
        for name, param in self.model.named_parameters():
            if param.grad is not None:
                grad_stats[name] = {
                    "mean": param.grad.mean().item(),
                    "std": param.grad.std().item(),
                    "zero_pct": (param.grad == 0).float().mean().item()
                }
        return grad_stats
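
A minimal usage sketch for the verifier above, assuming a tiny classification model and a synthetic batch; the model, shapes, and optimizer settings here are placeholder assumptions, not requirements of the skill.

import torch
import torch.nn as nn

# Tiny stand-in model and synthetic batch, purely for illustration.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
batch_x = torch.randn(32, 1, 28, 28)
batch_y = torch.randint(0, 10, (32,))

verifier = TrainingVerifier(
    model=model,
    loss_fn=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
)

report = verifier.verify_overfit_single_batch(batch_x, batch_y, max_steps=200)
print("single-batch overfit converged:", report["converged"])
print("gradient stats for:", list(verifier.check_gradients(batch_x, batch_y)))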

Real-World Examples

import torch

class TrainingChecklist:
    def __init__(self):
        self.checks: list[dict] = []

    def verify_data(self, dataset, num_samples: int = 5) -> dict:
        # Pull a few raw samples and confirm that shapes are consistent and
        # labels look sensible before any training starts.
        samples = [dataset[i] for i in range(num_samples)]
        shapes = [s[0].shape for s in samples]
        labels = [s[1] for s in samples]
        result = {
            "consistent_shapes": len(set(str(s) for s in shapes)) == 1,
            "sample_shapes": shapes,
            "label_samples": labels
        }
        self.checks.append({"type": "data", "result": result})
        return result

    def verify_initial_loss(self, model, loss_fn, batch_x, batch_y,
                            num_classes: int) -> dict:
        # With softmax cross-entropy and uniform random predictions, the
        # expected initial loss is -log(1/num_classes) = log(num_classes).
        model.eval()
        with torch.no_grad():
            output = model(batch_x)
            loss = loss_fn(output, batch_y).item()
        expected = -torch.log(torch.tensor(1.0 / num_classes)).item()
        result = {
            "actual_loss": round(loss, 4),
            "expected_loss": round(expected, 4),
            "reasonable": abs(loss - expected) < expected * 0.5
        }
        self.checks.append({"type": "initial_loss", "result": result})
        return result
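
A short usage sketch for the checklist, again with synthetic stand-ins; any map-style dataset returning (tensor, label) pairs would work in place of the TensorDataset used here.

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

# Synthetic dataset, model, and batch, for illustration only.
dataset = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
batch_x = torch.randn(8, 3, 32, 32)
batch_y = torch.randint(0, 10, (8,))

checklist = TrainingChecklist()
print(checklist.verify_data(dataset))
print(checklist.verify_initial_loss(model, nn.CrossEntropyLoss(),
                                    batch_x, batch_y, num_classes=10))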

Advanced Tips

Always verify that the initial loss matches the expected value for random predictions before starting training. Visualize a few training samples after all preprocessing to confirm that augmentations and normalization preserve label correctness. Use gradient norm tracking to detect vanishing or exploding gradients early in training.
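
The gradient norm tracking mentioned above could look roughly like the sketch below; the global_grad_norm helper and the thresholds in the comment are assumptions for illustration.

import torch.nn as nn

def global_grad_norm(model: nn.Module) -> float:
    # L2 norm over all parameter gradients; call after loss.backward().
    total = 0.0
    for param in model.parameters():
        if param.grad is not None:
            total += param.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Inside the training loop, after loss.backward() and before optimizer.step():
#     norm = global_grad_norm(model)
#     if norm > 1e3 or norm < 1e-7:
#         print(f"step {step}: suspicious gradient norm {norm:.3e}")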

When to Use It?

Use Cases

Debug a training pipeline that produces unexpectedly high loss or fails to converge. Verify a new model architecture works correctly before launching expensive full training runs. Establish training verification checklists for team ML development standards.

Related Topics

Neural network debugging techniques, training loop diagnostics, learning rate scheduling, gradient flow analysis, and ML experiment reproducibility.

Important Notes

Requirements

A PyTorch or equivalent deep learning framework for model training. A small batch of training data for verification steps. Understanding of expected loss values for the chosen loss function and number of classes.
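
For example, with softmax cross-entropy a model making uniform random predictions over 10 classes should start at a loss of about ln(10) ≈ 2.30; an initial loss far from that value usually points to a bug in the output layer, the loss setup, or the labels.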

Usage Recommendations

Do: run all verification checks before starting expensive training runs; start with the simplest possible model and confirm it works before adding complexity; monitor loss curves throughout training to catch issues early.

Don't: skip the single-batch overfit test that catches fundamental pipeline bugs; add regularization before confirming the model can overfit the training data; or trust that preprocessing is correct without visually inspecting processed samples.

Limitations

Verification checklists catch common bugs but cannot prevent all training issues. Some training problems only manifest at scale with the full dataset and many epochs. The guidelines focus on supervised learning patterns and may not apply directly to reinforcement learning or generative model training.