Weights And Biases
Automate and integrate Weights & Biases for ML experiment tracking
Weights And Biases is a community skill for using the W&B MLOps platform, covering experiment tracking, hyperparameter sweeps, model versioning, dataset management, and collaborative machine-learning workflow organization.
What Is This?
Overview
Weights And Biases provides guidance on using the W&B platform for machine learning experiment management and collaboration. It covers five areas. Experiment tracking logs metrics, hyperparameters, system resource usage, and model outputs, with automatic versioning for complete reproducibility of training runs. Hyperparameter sweeps define search spaces and optimization strategies for systematically finding optimal model configurations using Bayesian and grid search methods. Model versioning registers trained model artifacts with metadata, lineage tracking, and stage management for promoting models from development to production. Dataset management versions training datasets with automatic deduplication, integrity verification, and lineage linking to the experiments that consumed them. Collaborative dashboards create shared workspaces with custom visualizations, comparison tables, and reports for team-wide experiment analysis. The skill helps ML engineers organize their complete training workflow.
Who Should Use This
This skill serves machine learning engineers tracking training experiments, teams managing model development pipelines from training to deployment, and researchers comparing experimental results across collaborative projects.
Why Use It?
Problems It Solves
Tracking experiment results in spreadsheets or log files becomes unmanageable as the number of training runs grows. Hyperparameter tuning without systematic search wastes compute resources on redundant or poorly chosen configurations. Reproducing previous training runs is difficult without complete records of code versions, data versions, and configuration parameters. Comparing model performance across team members requires a shared platform with standardized metric logging.
Core Highlights
Experiment tracker logs metrics and parameters with automatic run versioning. Sweep engine orchestrates systematic hyperparameter searches with Bayesian optimization. Model registry versions artifacts with lineage and stage management. Team dashboard creates shared visualizations for collaborative experiment analysis.
How to Use It?
Basic Usage
import wandb

# Initialize a run and record hyperparameters for reproducibility.
wandb.init(
    project='my-project',
    config={
        'learning_rate': 1e-3,
        'batch_size': 32,
        'epochs': 10,
        'architecture': 'resnet50',
    })

# train() and evaluate() are user-defined training helpers.
for epoch in range(10):
    train_loss = train(model, loader)
    val_loss, val_acc = evaluate(model, val_loader)
    wandb.log({
        'train/loss': train_loss,
        'val/loss': val_loss,
        'val/accuracy': val_acc,
        'epoch': epoch,
    })

wandb.finish()

Real-World Examples
import wandb

# Define a Bayesian-optimization sweep over the search space.
sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val/accuracy', 'goal': 'maximize'},
    'parameters': {
        'learning_rate': {'min': 1e-5, 'max': 1e-2},
        'batch_size': {'values': [16, 32, 64]},
        'optimizer': {'values': ['adam', 'sgd', 'adamw']},
    },
}

sweep_id = wandb.sweep(sweep_config, project='my-project')

# build_model(), train_epoch(), and evaluate() are user-defined helpers.
def train_sweep():
    wandb.init()
    cfg = wandb.config
    model = build_model(cfg)
    for epoch in range(10):
        loss = train_epoch(model, cfg)
        acc = evaluate(model)
        wandb.log({'val/accuracy': acc})

# Launch 50 runs, each sampling a configuration from the sweep.
wandb.agent(sweep_id, train_sweep, count=50)

Advanced Tips
Use W&B Artifacts to version datasets alongside model checkpoints for complete experiment lineage tracking. Create custom W&B Reports to share experiment findings with team members using interactive charts and markdown annotations. Integrate W&B callbacks with training frameworks like PyTorch Lightning or Hugging Face Transformers for automatic metric logging.
When to Use It?
Use Cases
Track hundreds of training experiments across a research team with automatic metric comparison dashboards. Run Bayesian hyperparameter sweeps to find optimal configurations while minimizing wasted compute resources. Version and register trained models with full lineage from dataset to production deployment.
Related Topics
MLOps, experiment tracking, hyperparameter tuning, model registry, artifact versioning, Bayesian optimization, and machine learning workflows.
Important Notes
Requirements
Python with the wandb package installed for logging experiments and managing artifacts on the W&B platform. A W&B account with API key authentication for uploading run data to the cloud-hosted or self-hosted dashboard. A compatible training framework such as PyTorch, TensorFlow, or JAX for integrating metric logging into existing training loops.
Usage Recommendations
Do: log all relevant hyperparameters at run initialization so experiments are fully reproducible from their configuration records. Use wandb.watch to automatically log model gradients and parameter distributions during training for debugging. Organize experiments into projects with consistent naming conventions for easier filtering and comparison.
Don't: log excessively large artifacts like full datasets on every run, since storage costs accumulate and uploads slow training. Don't forget to call wandb.finish at the end of training, since incomplete runs create orphaned data in the dashboard. Don't skip tagging runs with meaningful notes, since unlabeled experiments become impossible to distinguish weeks later.
Limitations
Cloud-hosted W&B requires network connectivity for uploading experiment data, which may not be available in restricted training environments. Large-scale sweeps with many concurrent agents can exceed API rate limits, requiring careful scheduling of parallel training runs. The free tier has storage and collaboration limits that may require upgrading for teams with extensive experiment history and large artifact volumes.