Git Cleanup

Automate the removal of stale branches and optimize local repository storage to keep Git repositories clean

Git Cleanup is a community skill for maintaining clean Git repositories by removing stale branches, pruning unreachable objects, managing large files, optimizing repository size, and enforcing branch hygiene policies for development teams.

What Is This?

Overview

Git Cleanup provides tools for reducing repository clutter and optimizing storage usage. It covers five areas. Stale branch removal identifies and deletes local and remote branches that have been merged or abandoned. Object pruning removes unreachable Git objects left behind by rebases, amended commits, and force pushes. Large file management identifies oversized files in history and either migrates them to Git LFS or removes them by rewriting history. Repository size optimization runs garbage collection and repacking to reduce disk usage. Branch hygiene policies enforce naming conventions and automatic cleanup of feature branches after merge. The skill helps teams maintain responsive repositories that are fast to clone and operate on.

Who Should Use This

This skill serves development teams with repositories that have accumulated hundreds of stale branches, DevOps engineers managing repository infrastructure and storage, and individual developers cleaning up personal repositories before sharing or archiving.

Why Use It?

Problems It Solves

Repositories accumulate merged branches, which makes branch listings unwieldy. Large binary files committed by accident inflate repository size permanently, because Git preserves all history. Unreachable objects from rebases and amended commits consume disk space without purpose. Clone times increase as the repository grows with accumulated history.

Core Highlights

Branch cleaner identifies merged and stale branches on both local and remote repositories. Object pruner removes unreachable objects after verifying no references point to them. File scanner finds large files in git history sorted by size impact. Repository optimizer runs gc and repack with aggressive settings to minimize storage footprint.

How to Use It?

Basic Usage

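import subprocess


class BranchCleaner:
    """Find and delete local branches already merged into a base branch."""

    def merged_branches(self, base: str = 'main') -> list[str]:
        # List local branches whose tips are reachable from `base`.
        result = subprocess.run(
            ['git', 'branch', '--merged', base],
            capture_output=True, text=True)
        branches = []
        for line in result.stdout.splitlines():
            name = line.strip()
            # Skip blanks, the checked-out branch ('*' prefix), branches
            # checked out in another worktree ('+' prefix), and the base.
            if name and not name.startswith(('*', '+')) and name != base:
                branches.append(name)
        return branches

    def delete_local(self, branches: list[str]) -> list[str]:
        deleted = []
        for b in branches:
            # `-d` (unlike `-D`) refuses to delete a branch that Git does
            # not consider fully merged, so failures are simply skipped.
            result = subprocess.run(
                ['git', 'branch', '-d', b],
                capture_output=True, text=True)
            if result.returncode == 0:
                deleted.append(b)
        return deleted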
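
The same idea can be extended to remote branches, which the skill also covers. The following is a minimal sketch, not part of the original skill: the function names parse_remote_merged and delete_remote, the remote name origin, and the dry-run default are all assumptions chosen for illustration. It parses the output of git branch -r --merged and builds one git push --delete command per branch, executing them only when dry_run is turned off.

```python
import subprocess


def parse_remote_merged(output: str, remote: str = "origin",
                        base: str = "main") -> list[str]:
    """Extract branch names from `git branch -r --merged <remote>/<base>` output."""
    prefix = f"{remote}/"
    names = []
    for line in output.splitlines():
        ref = line.strip()
        # Skip blanks, symbolic refs such as "origin/HEAD -> origin/main",
        # and branches that belong to other remotes.
        if not ref or " -> " in ref or not ref.startswith(prefix):
            continue
        name = ref[len(prefix):]
        if name != base:
            names.append(name)
    return names


def delete_remote(names: list[str], remote: str = "origin",
                  dry_run: bool = True) -> list[list[str]]:
    """Build one delete command per branch; run them only if dry_run is False."""
    commands = [["git", "push", remote, "--delete", name] for name in names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Defaulting to a dry run means the returned command list can be reviewed, or checked against open pull requests, before anything is removed from the remote.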

Real-World Examples

import subprocess


class RepoAnalyzer:
    """Report the largest blobs in history and basic object-store stats."""

    def large_files(self, top_n: int = 20) -> list[dict]:
        # Enumerate every object reachable from any ref as "<hash> <path>".
        listing = subprocess.run(
            ['git', 'rev-list', '--objects', '--all'],
            capture_output=True, text=True)
        # Resolve type and size for all objects in a single cat-file
        # process instead of spawning one process per object.
        batch = subprocess.run(
            ['git', 'cat-file',
             '--batch-check=%(objecttype) %(objectsize) %(rest)'],
            input=listing.stdout, capture_output=True, text=True)
        sized = []
        for line in batch.stdout.splitlines():
            parts = line.split(None, 2)
            # Keep only blobs (file contents); commits and trees are skipped.
            if len(parts) == 3 and parts[0] == 'blob':
                sized.append({'path': parts[2], 'size': int(parts[1])})
        sized.sort(key=lambda x: x['size'], reverse=True)
        return sized[:top_n]

    def repo_stats(self) -> dict:
        count = subprocess.run(
            ['git', 'count-objects', '-v'],
            capture_output=True, text=True)
        stats = {}
        for line in count.stdout.splitlines():
            # Split on the first ': ' only, and ignore malformed lines.
            key, sep, val = line.partition(': ')
            if sep:
                stats[key] = val
        return stats
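
The raw values returned by repo_stats are strings, so a small helper can make them easier to compare before and after a cleanup. This is a rough sketch under two assumptions: the helper name summarize_stats is hypothetical, and git count-objects -v is run without -H, in which case Git reports the size and size-pack fields in KiB.

```python
def summarize_stats(stats: dict[str, str]) -> dict[str, float]:
    """Convert the KiB fields from `git count-objects -v` into MiB totals."""
    loose_kib = int(stats.get("size", 0))        # loose objects on disk
    packed_kib = int(stats.get("size-pack", 0))  # objects stored in packfiles
    return {
        "loose_mib": loose_kib / 1024,
        "packed_mib": packed_kib / 1024,
        "total_mib": (loose_kib + packed_kib) / 1024,
    }
```

Calling it on the stats captured before and after garbage collection gives a simple measure of how much space was reclaimed.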

Advanced Tips

Use git filter-repo instead of git filter-branch for history rewriting as it is faster and produces cleaner results. Schedule automated cleanup in CI pipelines to prune merged branches and run garbage collection regularly. Before deleting remote branches, verify they are not referenced by open pull requests or deployments.

When to Use It?

Use Cases

Remove all local branches that have been merged into the main branch after a release cycle. Identify and remove large binary files that were accidentally committed to repository history. Run repository optimization to reduce clone times for a frequently forked open source project.

Related Topics

Git maintenance, repository management, branch hygiene, Git LFS, garbage collection, history rewriting, and CI automation.

Important Notes

Requirements

Git version 2.30 or newer for modern maintenance commands. git filter-repo, which is installed separately from Git, for safe history rewriting operations. Write access to remote repositories for remote branch cleanup.

Usage Recommendations

Do: create a backup or mirror clone before performing history rewriting operations that cannot be undone. Use dry-run flags when available to preview cleanup actions before execution. Coordinate with team members before deleting remote branches or rewriting shared history.

Don't: force delete branches that have not been merged without confirming the work is saved elsewhere. Don't run aggressive garbage collection on shared servers during peak usage hours. Don't rewrite history on branches that other developers have based work on without coordination.

Limitations

History rewriting requires all collaborators to re-clone or carefully rebase their local copies. Garbage collection may not immediately reclaim space while objects are still referenced by reflogs. By default, reflog entries expire after 90 days (30 days for entries no longer reachable from the branch tip), and git gc keeps unreachable loose objects until they are at least two weeks old. Large file removal from history is irreversible once rewritten history is force pushed to the remote.
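
When space has to be reclaimed immediately, for example right after a history rewrite, the reflogs can be expired explicitly before garbage collection. The sketch below uses assumed helper names (reclaim_commands, reclaim_space) and builds the two commands commonly used for this. Expiring reflogs discards the safety net that would otherwise allow recovery of old commits, so it should only run once a backup exists.

```python
import subprocess


def reclaim_commands(aggressive: bool = False) -> list[list[str]]:
    """Build the commands that drop reflog entries and prune right away."""
    gc = ["git", "gc", "--prune=now"]
    if aggressive:
        # Slower, more thorough repack; best avoided on busy shared servers.
        gc.append("--aggressive")
    return [
        ["git", "reflog", "expire", "--expire=now", "--all"],
        gc,
    ]


def reclaim_space(repo_path: str = ".", aggressive: bool = False) -> None:
    """Run the reclaim commands inside repo_path, stopping on the first failure."""
    for cmd in reclaim_commands(aggressive):
        subprocess.run(cmd, cwd=repo_path, check=True)
```

Keeping command construction separate from execution makes it possible to print the commands for review before running them.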