Semgrep

Streamline static analysis by automating Semgrep scans and integrating security rules into CI/CD pipelines

Semgrep is a community skill for static analysis and code pattern matching using the Semgrep tool, covering rule-based scanning, security auditing, code quality enforcement, custom pattern creation, and CI integration for automated code review.

What Is This?

Overview

Semgrep scans source code with pattern-based rules that detect bugs, security vulnerabilities, and style violations. This skill covers five areas: rule-based scanning with a syntax-aware query language, security auditing with curated rule sets that flag common vulnerability patterns, code quality enforcement that validates coding standards across projects, custom rules for project-specific conventions, and CI integration that runs scans automatically on pull requests and commits. Together they help teams enforce code standards consistently.

Who Should Use This

This skill serves security engineers implementing automated code scanning, developers enforcing coding standards in team projects, and DevOps teams integrating static analysis into build pipelines.

Why Use It?

Problems It Solves

Manual code review cannot consistently catch all instances of known vulnerability patterns across large codebases. Traditional linters lack the ability to match complex multi-line code structures. Enforcing internal coding conventions requires custom tooling that is expensive to build from scratch. Security scanning tools often produce excessive false positives that slow development.

Core Highlights

Pattern matcher finds code structures using syntax-aware queries. Security scanner detects vulnerabilities from curated rule packs. Style enforcer validates coding standards across repositories. CI runner automates scanning on pull requests and merges.

How to Use It?

Basic Usage

rules:
  - id: no-eval
    patterns:
      - pattern: eval(...)
    message: >
      Avoid eval() as it can execute arbitrary code.
      Use ast.literal_eval() to parse literal values safely.
    languages: [python]
    severity: ERROR

  - id: sql-injection
    patterns:
      # $CURSOR matches any receiver, not only a variable named "cursor"
      - pattern: $CURSOR.execute(f"...", ...)
    message: >
      Use parameterized queries instead of f-strings
      to prevent SQL injection.
    languages: [python]
    severity: ERROR

  - id: no-print-debug
    patterns:
      - pattern: print("debug", ...)
    message: >
      Remove debug print statements before merging.
    languages: [python]
    severity: WARNING
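To make the sql-injection rule concrete, here is a small self-contained Python sketch (using an in-memory sqlite3 database; the table and function names are hypothetical) contrasting the call shape the rule is meant to flag with the parameterized fix it recommends. The unsafe function should match the f-string pattern; the safe one passes the value as a bound parameter and should not.

```python
import sqlite3

# In-memory database so the example runs without any setup.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (name TEXT)")
cursor.execute("INSERT INTO users VALUES (?)", ("alice",))


def find_user_unsafe(name: str) -> list:
    # Flagged by sql-injection: user input is spliced into the SQL text via an f-string.
    cursor.execute(f"SELECT name FROM users WHERE name = '{name}'")
    return cursor.fetchall()


def find_user_safe(name: str) -> list:
    # Not flagged: the value travels separately as a bound parameter.
    cursor.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # the injected OR clause makes the query return every row
print(find_user_safe(payload))    # no user is literally named "x' OR '1'='1"
```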

Real-World Examples

import json
import subprocess


class SemgrepRunner:
    def __init__(self, config: str = 'auto'):
        # A registry pack such as 'p/python', a local rules file, or 'auto'.
        self.config = config

    def scan(self, target: str) -> dict:
        # --json makes Semgrep print a machine-readable report on stdout.
        result = subprocess.run(
            ['semgrep', 'scan', '--config', self.config, '--json', target],
            capture_output=True,
            text=True,
        )
        # Exit code 0 means the scan completed (1 is used for findings only when
        # --error is passed); 2 and above mean the scan itself failed, in which
        # case stdout may not contain a JSON report.
        if result.returncode not in (0, 1):
            raise RuntimeError(f'semgrep failed: {result.stderr.strip()}')
        return json.loads(result.stdout)

    def summarize(self, results: dict) -> dict:
        findings = results.get('results', [])
        by_severity = {}
        for finding in findings:
            sev = finding['extra']['severity']
            by_severity.setdefault(sev, []).append({
                'rule': finding['check_id'],
                'file': finding['path'],
                'line': finding['start']['line'],
            })
        return {'total': len(findings), 'by_severity': by_severity}


runner = SemgrepRunner('p/python')
output = runner.scan('src')
report = runner.summarize(output)
print(f'Found {report["total"]} issues')
for sev, items in report['by_severity'].items():
    print(f'  {sev}: {len(items)}')

Advanced Tips

Use metavariables such as $X in patterns to capture code elements; reusing the same metavariable name requires every occurrence to bind to the same expression. Combine pattern-inside and pattern-not operators to narrow matches to specific code contexts. Organize rules into rulesets by category for modular CI configuration.
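As a sketch of how these operators combine, the Python file below carries a hypothetical rule (subprocess-shell-true, not part of any published pack) in its header comments, followed by three functions annotated with whether that rule would be expected to match. The metavariables $FN and $CMD capture the call and its command, pattern-inside limits matches to function bodies, and pattern-not removes calls whose command is a string literal. The expected match results are based on the pattern semantics described above and have not been verified against a specific Semgrep version.

```python
# Hypothetical rule, kept in comments so this file stays valid Python:
#
# rules:
#   - id: subprocess-shell-true
#     languages: [python]
#     severity: ERROR
#     message: subprocess.$FN() runs a non-literal command through the shell.
#     patterns:
#       - pattern-inside: |
#           def $HANDLER(...):
#             ...
#       - pattern: subprocess.$FN($CMD, ..., shell=True, ...)
#       - pattern-not: subprocess.$FN("...", ..., shell=True, ...)
import subprocess


def run_user_command(cmd: str) -> str:
    # Expected match: inside a function, shell=True, and the command is a variable.
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def run_fixed_command() -> str:
    # Expected no match: pattern-not excludes commands that are string literals.
    return subprocess.run("echo fixed", shell=True, capture_output=True, text=True).stdout


def run_argv(args: list) -> str:
    # Expected no match: without shell=True the main pattern never applies.
    return subprocess.run(args, capture_output=True, text=True).stdout
```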

When to Use It?

Use Cases

Scan a Python project for SQL injection vulnerabilities using curated security rules. Enforce a team convention that prohibits certain function calls in production code paths. Run automated security scans on every pull request in a GitHub Actions workflow.
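For the pull-request use case, a CI step needs to turn Semgrep's JSON report into a pass/fail exit code. The sketch below is one possible gate, assuming the ERROR / WARNING / INFO severity names used in the rules above; the names gate and blocking_findings are hypothetical helpers, not part of Semgrep. It blocks only on ERROR findings so that warnings stay advisory.

```python
from typing import Any, Dict, List

# Severities that fail the build; all other findings are reported but advisory.
BLOCKING_SEVERITIES = {"ERROR"}


def blocking_findings(results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Select findings from a Semgrep --json report whose severity should block a merge."""
    return [
        f for f in results.get("results", [])
        if f["extra"]["severity"] in BLOCKING_SEVERITIES
    ]


def gate(results: Dict[str, Any]) -> int:
    """Print each blocking finding and return a CI exit code (1 = fail, 0 = pass)."""
    blocking = blocking_findings(results)
    for f in blocking:
        print(f"{f['path']}:{f['start']['line']}: {f['check_id']}")
    return 1 if blocking else 0
```

In a workflow step this could be wired to the runner from the Real-World Examples, for instance sys.exit(gate(SemgrepRunner('p/python').scan('src'))). If any finding at all should fail the build, Semgrep's own --error flag, which makes the CLI exit with code 1 when findings exist, is a simpler alternative.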

Related Topics

Static analysis, SAST, code quality, linting, security scanning, pattern matching, and CI/CD automation.

Important Notes

Requirements

Semgrep CLI installed via pip or a package manager. Rule configuration files in YAML format or references to community rule packs. Supported languages include Python, JavaScript, Go, Java, and many others.
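Before wiring scans into scripts or CI, it can help to confirm the CLI is actually on PATH. This is a minimal preflight sketch; semgrep_version is a hypothetical helper, and it relies only on shutil.which and the semgrep --version command.

```python
import shutil
import subprocess
from typing import Optional


def semgrep_version() -> Optional[str]:
    """Return the installed Semgrep version string, or None if the CLI is not on PATH."""
    exe = shutil.which("semgrep")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


version = semgrep_version()
print(version or "Semgrep not found; install it with: pip install semgrep")
```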

Usage Recommendations

Do: start with community rule packs for common languages before writing custom rules. Review findings and tune rules to reduce false positives in your specific codebase. Integrate scanning into CI so every change is checked automatically.
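Reviewing a finding usually ends in one of two outcomes: fix the code, or record a reviewed exception. The sketch below shows both for the no-eval rule, assuming findings are reported under that rule ID. ast.literal_eval only accepts Python literals, so it avoids the rule entirely, while Semgrep's inline nosemgrep comment suppresses the finding on a single line that has been judged safe. DEFAULTS_SOURCE and both function names are hypothetical.

```python
import ast

# Trusted configuration text bundled with the application, not user input.
DEFAULTS_SOURCE = "{'retries': 3, 'timeout': 10}"


def load_defaults() -> dict:
    # Preferred fix: literal-only parsing, so the no-eval rule has nothing to match.
    return ast.literal_eval(DEFAULTS_SOURCE)


def load_defaults_legacy() -> dict:
    # Reviewed exception: the input is a trusted constant, so the finding is suppressed here only.
    return eval(DEFAULTS_SOURCE)  # nosemgrep: no-eval
```

Suppressions are best kept rare and explained in a nearby comment; if the same exception keeps recurring, narrowing the rule with pattern-not is usually the better tuning step.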

Don't: write overly broad patterns that match legitimate code and flood reviewers with false positives; ignore severity levels, since errors should block merges while warnings can stay advisory; or skip testing custom rules against known positive and negative examples.
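Semgrep's rule-testing mode (semgrep --test) checks a rule against an annotated target file: a "ruleid:" comment marks the next line as a required match and an "ok:" comment marks it as a required non-match. The sketch below is a hypothetical target for the no-print-debug rule; in the simplest layout it would sit next to the rule file with the same base name (for example no-print-debug.yaml and no-print-debug.py), though the exact directory conventions are worth confirming in the Semgrep testing docs.

```python
def process(items: list) -> int:
    # ruleid: no-print-debug
    print("debug", items)
    # ok: no-print-debug
    print("processed", len(items))
    return len(items)
```

Keeping at least one positive and one negative case per rule makes it obvious when a later edit to the pattern silently widens or narrows what it catches.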

Limitations

Pattern matching operates on syntax structure and cannot track values through complex data flow across function boundaries. Rule writing requires understanding of the Semgrep pattern syntax which has a learning curve. Performance may degrade on very large monorepos with thousands of files scanned simultaneously.
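The data-flow limitation is easy to reproduce with the sql-injection rule from Basic Usage. In this sketch (hypothetical function names, in-memory sqlite3) the f-string is built in a helper and only the returned value reaches execute(), so a purely syntactic pattern that expects an f-string literal inside the call should not fire, even though the query is still injectable.

```python
import sqlite3


def build_query(name: str) -> str:
    # The f-string lives here, away from the execute() call.
    return f"SELECT name FROM users WHERE name = '{name}'"


def find_user(cursor: sqlite3.Cursor, name: str) -> list:
    # Still injectable, but the argument is a function call rather than an f-string
    # literal, so the syntax-only sql-injection pattern does not match this line.
    cursor.execute(build_query(name))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT)")
cur.execute("INSERT INTO users VALUES (?)", ("alice",))
```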