Jupyter Notebook
Automate and integrate Jupyter Notebook workflows and data processes
Jupyter Notebook is a community skill for creating, managing, and automating Jupyter notebooks programmatically, covering cell execution, kernel management, output extraction, integration into data science workflows, CI pipelines, and automated reporting systems.
What Is This?
Overview
Jupyter Notebook provides patterns for working with Jupyter notebooks beyond the interactive browser interface. It covers programmatic notebook creation and modification, headless execution through nbconvert and papermill, output extraction for reporting, and kernel lifecycle management. The skill enables teams to treat notebooks as executable documents that integrate into automated workflows rather than existing only as interactive tools.
Who Should Use This
This skill serves data scientists who need to automate notebook execution for scheduled reports, ML engineers integrating notebooks into training pipelines, and platform teams building notebook execution services that run user-submitted notebooks in managed environments.
Why Use It?
Problems It Solves
Running notebooks manually through the browser does not scale for recurring analysis tasks. Parameterizing notebooks for different datasets or configurations requires editing cells by hand each time. Extracting results from executed notebooks into downstream systems involves manual copy-paste that introduces errors. Notebook execution in CI environments requires headless operation that the standard Jupyter interface does not provide.
Core Highlights
Papermill integration parameterizes and executes notebooks with injected variables for batch processing across datasets. Headless execution via nbconvert runs notebooks without a browser and captures all cell outputs including figures. Programmatic notebook construction creates notebooks from code using the nbformat library. Output extraction pulls specific cell results, dataframes, and images from executed notebooks for reporting.
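The programmatic-construction highlight can be sketched with nbformat's v4 helpers; the output filename here is illustrative:

```python
import nbformat
from nbformat.v4 import new_notebook, new_code_cell, new_markdown_cell

# Build a notebook entirely in code; cells are ordinary nbformat nodes.
nb = new_notebook()
nb.cells.append(new_markdown_cell("# Generated Report"))
nb.cells.append(new_code_cell("x = 1 + 1\nprint(x)"))

# Writing validates the document against the v4 notebook schema.
nbformat.write(nb, "generated.ipynb")
```

A notebook built this way can then be passed through the same headless execution path as a hand-authored one.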
How to Use It?
Basic Usage
```python
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from pathlib import Path

def execute_notebook(input_path: str, output_path: str,
                     timeout: int = 600) -> nbformat.NotebookNode:
    """Execute a notebook headlessly and write the executed copy to output_path."""
    nb = nbformat.read(input_path, as_version=4)
    ep = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
    # Execute relative to the notebook's own directory so data paths resolve.
    ep.preprocess(nb, {"metadata": {"path": str(Path(input_path).parent)}})
    with open(output_path, "w") as f:
        nbformat.write(nb, f)
    return nb

def extract_outputs(nb: nbformat.NotebookNode) -> list[dict]:
    """Collect every output from each executed code cell."""
    results = []
    for i, cell in enumerate(nb.cells):
        if cell.cell_type == "code" and cell.outputs:
            for output in cell.outputs:
                results.append({
                    "cell": i,
                    "type": output.output_type,
                    # Stream outputs carry "text"; rich outputs carry "data".
                    "data": output.get("text", output.get("data", {})),
                })
    return results
```
Real-World Examples
```python
import papermill as pm
from datetime import date
from pathlib import Path

class ReportRunner:
    def __init__(self, template_path: str, output_dir: str):
        self.template = template_path
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def run_daily(self, dataset: str) -> Path:
        today = date.today().isoformat()
        # Include the dataset name so same-day runs don't overwrite each other.
        output = self.output_dir / f"report_{Path(dataset).stem}_{today}.ipynb"
        pm.execute_notebook(
            self.template, str(output),
            parameters={"dataset_path": dataset, "report_date": today},
        )
        return output

    def run_batch(self, datasets: list[str]) -> list[Path]:
        return [self.run_daily(ds) for ds in datasets]

runner = ReportRunner("templates/analysis.ipynb", "reports/")
outputs = runner.run_batch(["data/q1.csv", "data/q2.csv"])
for out in outputs:
    print(f"Generated: {out}")
```
Advanced Tips
Tag cells with metadata to control which cells execute during parameterized runs. Use nbconvert exporters to convert executed notebooks to HTML or PDF for non-technical stakeholders. Implement timeout handling per cell rather than per notebook to prevent a single slow cell from failing the entire execution.
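The nbconvert export tip can be sketched as follows; a tiny in-memory notebook stands in for a real executed one, and `exclude_input` hides code cells for non-technical readers:

```python
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell
from nbconvert import HTMLExporter

# Stand-in for an executed notebook loaded from disk.
nb = new_notebook(cells=[new_markdown_cell("## Quarterly Summary")])

exporter = HTMLExporter()
exporter.exclude_input = True  # show outputs only, hide code cells
body, resources = exporter.from_notebook_node(nb)

with open("report.html", "w") as f:
    f.write(body)
```

Swapping `HTMLExporter` for `PDFExporter` produces PDFs instead, at the cost of a LaTeX toolchain in the environment.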
When to Use It?
Use Cases
Automate weekly data analysis reports that run the same notebook with updated datasets. Build ML experiment tracking by executing training notebooks with different hyperparameter sets through papermill. Create documentation sites from notebooks that are executed and converted to HTML during the build process.
Related Topics
Papermill parameterized execution, nbconvert output formats, JupyterHub multi-user deployments, notebook version control strategies, and data pipeline orchestration.
Important Notes
Requirements
Python with jupyter, nbformat, and nbconvert packages installed. Papermill is needed for parameterized execution. A Jupyter kernel matching the notebook language must be available in the execution environment.
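A minimal environment setup might look like the following; versions are left unpinned here, but pinning them is recommended for reproducible execution:

```shell
pip install jupyter nbformat nbconvert papermill
# Register the kernel referenced by ExecutePreprocessor(kernel_name="python3")
python -m ipykernel install --user --name python3
```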
Usage Recommendations
Do: pin all dependency versions in notebooks to ensure reproducible execution across environments. Use papermill parameters instead of editing notebook cells directly for configuration changes. Store executed notebooks alongside their outputs for audit trails.
Don't: execute untrusted notebooks without sandboxing, since code cells run with full system access. Don't ignore cell execution order dependencies, which can cause failures in headless mode. Don't skip output validation after automated execution; silent errors otherwise go unnoticed.
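The output-validation advice can be sketched as a small check that scans an executed notebook for error outputs; the function name is illustrative:

```python
import nbformat

def find_errors(nb: nbformat.NotebookNode) -> list[str]:
    """Collect error name/message pairs from an executed notebook."""
    errors = []
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue
        for out in cell.get("outputs", []):
            if out.output_type == "error":
                errors.append(f"{out.ename}: {out.evalue}")
    return errors
```

Running such a check after every automated execution turns silent cell failures into explicit pipeline failures.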
Limitations
Notebooks with interactive widgets do not function in headless execution mode. Large output cells such as full dataframe displays increase notebook file size significantly. Kernel startup time adds overhead to batch execution when processing many short notebooks. Cell execution order assumptions may break when notebooks are modified outside the standard sequential flow.