Semgrep Rule Creator

Automate the generation and integration of custom Semgrep rules to enhance static code analysis and security

Semgrep Rule Creator is a community skill for authoring custom Semgrep rules, covering pattern syntax, metavariable constraints, taint tracking, fix suggestions, and rule testing for project-specific static analysis.

What Is This?

Overview

Semgrep Rule Creator provides guidance for writing custom static analysis rules that detect code patterns specific to your projects. It covers five areas: pattern syntax, which describes the code structures to match in Semgrep's query language; metavariable constraints, which restrict matches based on the type or value of a captured expression; taint tracking, which follows data from sources to sinks to detect vulnerabilities; fix suggestions, which attach an automated correction to a finding; and rule testing, which validates rules against example code snippets.

Who Should Use This

This skill serves security engineers building custom vulnerability detectors, platform teams enforcing internal API usage patterns, and developers creating automated checks for project conventions.

Why Use It?

Problems It Solves

Community rule packs may not cover project-specific patterns and internal API conventions. Writing effective rules requires an understanding of Semgrep's pattern language and operators. Detecting data flow vulnerabilities requires taint analysis configuration that goes beyond simple pattern matching. Rules that have not been properly tested may produce false positives or miss real issues in production code.

Core Highlights

The pattern builder constructs syntax-aware expressions for matching code. The constraint editor restricts the types and values that metavariables may match. The taint configurator defines source-to-sink data flow tracking. The test harness validates rules against positive and negative examples.

How to Use It?

Basic Usage

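Save the following as rules.yaml. The first rule flags pickle deserialization that is not guarded by a trust check; the second flags Flask route handlers that lack an authentication decorator.

rules:
  - id: unsafe-deserialize
    patterns:
      - pattern: pickle.loads($DATA)
      # is_trusted() stands for a project-specific validation helper.
      - pattern-not-inside: |
          if is_trusted($DATA):
              ...
    message: >
      Deserializing untrusted data with pickle can lead to remote
      code execution. Validate the data source first or switch to a
      data-only format such as JSON.
    languages: [python]
    severity: ERROR
    # json.loads is only a drop-in replacement when the payload is JSON;
    # review each suggested fix before applying it.
    fix: json.loads($DATA)

  - id: require-auth
    patterns:
      - pattern: |
          @app.route(...)
          def $FUNC(...):
              ...
      - pattern-not: |
          @app.route(...)
          @login_required
          def $FUNC(...):
              ...
    message: >
      Route handler $FUNC lacks @login_required. Add the decorator
      directly below @app.route.
    languages: [python]
    severity: WARNING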
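To see why a syntax-aware pattern such as pickle.loads($DATA) is sturdier than a text search, the toy below imitates it with Python's standard ast module: it matches on the parsed syntax tree, so spacing and line breaks do not matter, comments and string literals never match, and the single argument is captured the way $DATA would be. This is only an illustration under those assumptions; find_calls is a hypothetical helper and says nothing about how Semgrep is implemented internally.

```python
import ast


def find_calls(source: str, module: str, func: str) -> list[tuple[int, str]]:
    """Toy stand-in for the Semgrep pattern `module.func($ARG)`.

    Returns (line, captured argument) for every call with exactly one
    positional argument. Matching is done on Python's syntax tree, so
    spacing and line breaks are irrelevant, and text inside comments or
    string literals can never match.
    """
    matches = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == func
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == module
            and len(node.args) == 1
            and not node.keywords
        ):
            # ast.unparse (Python 3.9+) renders the captured argument,
            # much like the text bound to $DATA in a Semgrep finding.
            matches.append((node.lineno, ast.unparse(node.args[0])))
    return sorted(matches)


SNIPPET = """
import pickle
# pickle.loads(comment) is only a comment
msg = "pickle.loads(text) inside a string"
obj = pickle . loads(
    payload
)
pickle.loads(request.data)
getattr(pickle, "loads")(blob)
"""

print(find_calls(SNIPPET, "pickle", "loads"))  # prints [(5, 'payload'), (8, 'request.data')]
```

The getattr(pickle, "loads")(blob) call on the last line is not reported. A purely syntactic matcher cannot see through that kind of indirection, and Semgrep patterns would typically miss it as well, which is the limitation described under Limitations.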

Real-World Examples

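The harness below writes each snippet to a temporary file, runs semgrep scan with JSON output, and compares the number of findings with the expected outcome.

import json
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# File suffix per language so Semgrep selects the right parser.
SUFFIXES = {
    'python': '.py',
    'javascript': '.js',
    'go': '.go',
}


class RuleTester:
    """Runs a Semgrep rule file against small code snippets."""

    def __init__(self, rule_file: str):
        self.rule = rule_file

    def test_code(
        self,
        code: str,
        lang: str = 'python',
        expect_match: bool = True,
        rule_id: Optional[str] = None,
    ) -> dict:
        suffix = SUFFIXES.get(lang, '.py')
        with tempfile.NamedTemporaryFile(
            suffix=suffix, mode='w', delete=False
        ) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run(
                ['semgrep', 'scan', '--config', self.rule,
                 '--json', path],
                capture_output=True,
                text=True,
            )
        finally:
            # Delete the snippet even if Semgrep fails to start.
            Path(path).unlink()
        # Exit codes above 1 mean Semgrep itself failed
        # (for example, an invalid rule file).
        if result.returncode > 1:
            raise RuntimeError(f'semgrep failed: {result.stderr}')
        results = json.loads(result.stdout).get('results', [])
        if rule_id is not None:
            # Keep only this rule's findings; check_id may carry
            # a prefix derived from the rule file's path.
            results = [
                r for r in results
                if r['check_id'] == rule_id
                or r['check_id'].endswith('.' + rule_id)
            ]
        found = len(results)
        return {
            'passed': (found > 0) == expect_match,
            'matches': found,
            'expected': expect_match,
        }


tester = RuleTester('rules.yaml')
r1 = tester.test_code(
    'import pickle\npickle.loads(data)\n',
    expect_match=True,
    rule_id='unsafe-deserialize')
r2 = tester.test_code(
    'import json\njson.loads(data)\n',
    expect_match=False,
    rule_id='unsafe-deserialize')
print(f'Positive: {r1["passed"]}')
print(f'Negative: {r2["passed"]}')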
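When a rule file holds several rules, counting every entry in results mixes their findings together. The helper below is a minimal sketch that keeps only one rule's findings and also returns Semgrep's errors list, so a broken rule is not mistaken for clean code. It reads only the results, check_id, and errors fields of the JSON output. The sample input is hand-written rather than captured from a real run, and the assumption that locally loaded rule ids may carry a dotted, path-derived prefix (something like rules.unsafe-deserialize) should be checked against your Semgrep version; findings_for_rule is a hypothetical name.

```python
import json
from typing import Any


def findings_for_rule(
    semgrep_json: str, rule_id: str
) -> tuple[list[dict[str, Any]], list[Any]]:
    """Return (findings for rule_id, errors) from `semgrep scan --json` output."""
    data = json.loads(semgrep_json)
    findings = []
    for result in data.get("results", []):
        check_id = result.get("check_id", "")
        # Accept the bare id or the id behind a path-derived prefix.
        if check_id == rule_id or check_id.endswith("." + rule_id):
            findings.append(result)
    return findings, data.get("errors", [])


# Hand-written sample with only the fields this helper reads.
sample = json.dumps({
    "results": [
        {"check_id": "rules.unsafe-deserialize", "start": {"line": 2}},
        {"check_id": "rules.require-auth", "start": {"line": 5}},
    ],
    "errors": [],
})
found, errors = findings_for_rule(sample, "unsafe-deserialize")
print(len(found), errors)  # prints: 1 []
```

Matching on the "." + rule_id suffix rather than a plain substring keeps a short id such as auth from accidentally matching require-auth.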

Advanced Tips

Use pattern-inside to restrict matches to a specific context, such as the body of a function or class, for targeted scanning. Add metavariable-comparison clauses to enforce numeric constraints on captured values, for example flagging only port numbers below 1024; several clauses in the same patterns block must all hold. Write negative test cases alongside positive ones to verify that rules do not produce false positives.
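These tips can be combined in one rule and checked with Semgrep's built-in test runner, which pairs a rule file with a same-named code file annotated with # ruleid: <id> (the next line must match) and # ok: <id> (the next line must not match), and reports mismatches when you run semgrep --test on the directory. The sketch below writes such a pair. The rule id privileged-port, the set_port function, and write_rule_test are hypothetical, and the rule has not been validated against every Semgrep release.

```python
import tempfile
from pathlib import Path
from typing import Optional

RULE_ID = "privileged-port"  # hypothetical rule id for illustration

# pattern-inside limits matches to function bodies; metavariable-comparison
# keeps only calls whose argument is below 1024.
RULE_YAML = """\
rules:
  - id: privileged-port
    patterns:
      - pattern-inside: |
          def $FUNC(...):
              ...
      - pattern: set_port($PORT)
      - metavariable-comparison:
          metavariable: $PORT
          comparison: $PORT < 1024
    message: $FUNC binds privileged port $PORT.
    languages: [python]
    severity: WARNING
"""

# Annotated fixture: each annotation applies to the line that follows it.
# The top-level call is "ok" because pattern-inside requires a function body.
FIXTURE = """\
def start_admin():
    # ruleid: privileged-port
    set_port(80)


def start_app():
    # ok: privileged-port
    set_port(8080)


# ok: privileged-port
set_port(22)
"""


def write_rule_test(directory: Optional[Path] = None) -> tuple[Path, Path]:
    """Write <id>.yaml and <id>.py side by side so `semgrep --test` pairs them."""
    if directory is None:
        directory = Path(tempfile.mkdtemp(prefix="semgrep-rule-test-"))
    directory.mkdir(parents=True, exist_ok=True)
    rule_path = directory / f"{RULE_ID}.yaml"
    test_path = directory / f"{RULE_ID}.py"
    rule_path.write_text(RULE_YAML)
    test_path.write_text(FIXTURE)
    return rule_path, test_path


rule_path, test_path = write_rule_test()
print(f"Run: semgrep --test {rule_path.parent}")
```

Keeping the negative cases in the same fixture as the positive one means a rule change that starts flagging set_port(8080) or top-level calls fails the test immediately.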

When to Use It?

Use Cases

Create a rule that detects unsafe deserialization calls without proper validation guards. Enforce that all API route handlers include authentication decorators. Build a taint tracking rule that follows user input from request parameters to database queries.
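A taint rule for the last use case could look like the sketch below, which writes the rule next to a fixture annotated with # ruleid: and # ok: comments, the format Semgrep's test runner (semgrep --test) expects. It relies on Semgrep's taint-mode keys (mode: taint, pattern-sources, pattern-sanitizers, pattern-sinks) and on focus-metavariable to narrow the sink to the query argument. The rule id, the choice of Flask's request.args.get as the only source, int(...) as the only sanitizer, and the helper name write_taint_rule_test are illustrative assumptions; confirm the expectations in the fixture by actually running semgrep --test.

```python
import tempfile
from pathlib import Path
from typing import Optional

RULE_ID = "sqli-from-request"  # hypothetical rule id

# Taint mode tracks values from the source through assignments and
# expressions and reports them only when they reach the sink.
# focus-metavariable narrows the sink to the query argument, so a
# parameterized query (tainted data in the second argument) is not flagged.
RULE_YAML = """\
rules:
  - id: sqli-from-request
    mode: taint
    pattern-sources:
      - pattern: flask.request.args.get(...)
    pattern-sanitizers:
      - pattern: int(...)
    pattern-sinks:
      - patterns:
          - pattern: $CURSOR.execute($QUERY, ...)
          - focus-metavariable: $QUERY
    message: >
      A request parameter reaches a SQL query string.
      Use a parameterized query instead.
    languages: [python]
    severity: ERROR
"""

FIXTURE = """\
from flask import request


def by_name(cursor):
    name = request.args.get("name")
    query = f"SELECT * FROM users WHERE name = '{name}'"
    # ruleid: sqli-from-request
    cursor.execute(query)


def by_name_parameterized(cursor):
    name = request.args.get("name")
    # ok: sqli-from-request
    cursor.execute("SELECT * FROM users WHERE name = %s", (name,))


def by_id(cursor):
    user_id = int(request.args.get("id"))
    # ok: sqli-from-request
    cursor.execute("SELECT * FROM users WHERE id = " + str(user_id))
"""


def write_taint_rule_test(directory: Optional[Path] = None) -> tuple[Path, Path]:
    """Write <id>.yaml and <id>.py side by side so `semgrep --test` pairs them."""
    if directory is None:
        directory = Path(tempfile.mkdtemp(prefix="semgrep-taint-test-"))
    directory.mkdir(parents=True, exist_ok=True)
    rule_path = directory / f"{RULE_ID}.yaml"
    test_path = directory / f"{RULE_ID}.py"
    rule_path.write_text(RULE_YAML)
    test_path.write_text(FIXTURE)
    return rule_path, test_path


rule_path, test_path = write_taint_rule_test()
print(f"Run: semgrep --test {rule_path.parent}")
```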

Related Topics

Semgrep, static analysis, rule authoring, pattern matching, code quality, security rules, and custom linting.

Important Notes

Requirements

The Semgrep CLI, installed locally for running and testing rules. Familiarity with Semgrep pattern syntax, including operators such as pattern-not and pattern-inside and metavariables such as $DATA. Example code snippets representing both positive and negative matches for testing.
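A quick pre-flight check can confirm that the CLI is on PATH before any rule tests run. The sketch below calls semgrep --version and returns None instead of raising when the tool is missing or fails; semgrep_version is a hypothetical helper name. Common installation routes include python3 -m pip install semgrep and brew install semgrep.

```python
import shutil
import subprocess
from typing import Optional


def semgrep_version(timeout: float = 60.0) -> Optional[str]:
    """Return the output of `semgrep --version`, or None if Semgrep is unavailable."""
    exe = shutil.which("semgrep")  # full path of the CLI, or None if not on PATH
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = semgrep_version()
print(version or "Semgrep CLI not found on PATH")
```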

Usage Recommendations

Do: test rules against real files from your codebase to check their accuracy before deploying them to CI. Use the fix field to suggest automated corrections that developers can apply directly. Document the rule's intent in the message field, together with clear remediation guidance.
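Semgrep can apply fix suggestions itself: semgrep scan --config rules.yaml --autofix rewrites files in place, and adding --dryrun prints the changes without writing them. If you would rather surface fixes as review comments, the JSON output can be mined instead. The sketch below assumes each finding carries its rendered fix text under extra.fix when the rule defines fix, which matches the output format as we understand it but should be verified against your Semgrep version. FixSuggestion and suggested_fixes are hypothetical names, and the sample input is hand-written.

```python
import json
from typing import NamedTuple


class FixSuggestion(NamedTuple):
    """One suggested replacement attached to a Semgrep finding."""
    path: str
    line: int
    check_id: str
    replacement: str


def suggested_fixes(semgrep_json: str) -> list[FixSuggestion]:
    """Collect fix text from `semgrep scan --json` output.

    Assumes the fix text is stored under extra.fix; findings from rules
    without a fix field are skipped.
    """
    data = json.loads(semgrep_json)
    suggestions = []
    for result in data.get("results", []):
        fix = result.get("extra", {}).get("fix")
        if fix is None:
            continue
        suggestions.append(FixSuggestion(
            path=result["path"],
            line=result["start"]["line"],
            check_id=result["check_id"],
            replacement=fix,
        ))
    return suggestions


# Hand-written sample containing only the fields read above.
sample = json.dumps({"results": [
    {"check_id": "rules.unsafe-deserialize", "path": "app.py",
     "start": {"line": 7}, "extra": {"fix": "json.loads(blob)"}},
    {"check_id": "rules.require-auth", "path": "app.py",
     "start": {"line": 12}, "extra": {}},
]})
for s in suggested_fixes(sample):
    print(f"{s.path}:{s.line} [{s.check_id}] suggest: {s.replacement}")
```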

Don't: create overly complex pattern combinations when a simpler rule achieves the same detection coverage. Deploy rules without testing against both matching and non-matching code samples. Rely on pattern matching alone for data flow vulnerabilities that require taint analysis.
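Pattern matching alone falls short because untrusted data usually travels through intermediate variables before reaching a sink, so a pattern like cursor.execute(request.args.get(...)) misses handlers such as the one in the example below. The toy sketches the idea behind taint tracking with Python's ast module: it marks names assigned from a source call as tainted, propagates the mark through straight-line assignments, treats a sanitizer such as int() as producing clean data, and reports sink calls whose first argument is tainted. It ignores branches, loops, and calls between functions, all names are hypothetical, and it is not how Semgrep's taint engine works.

```python
import ast


def _dotted(node: ast.AST) -> str:
    """Render a Name/Attribute chain such as request.args.get as a string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def _is_tainted(expr: ast.AST, tainted: set, source: str, sanitizers: frozenset) -> bool:
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    if isinstance(expr, ast.Call):
        if _dotted(expr.func) == source:
            return True  # the source call itself produces untrusted data
        if isinstance(expr.func, ast.Name) and expr.func.id in sanitizers:
            return False  # e.g. int(...) yields a clean value
    # Concatenation, f-strings, other calls, tuples: tainted if any part is.
    return any(
        _is_tainted(child, tainted, source, sanitizers)
        for child in ast.iter_child_nodes(expr)
    )


def find_tainted_sinks(
    code: str,
    source: str,
    sink_method: str,
    sanitizers: frozenset = frozenset({"int"}),
) -> list[int]:
    """Return lines of <obj>.<sink_method>(arg, ...) calls whose first arg is tainted."""
    hits = []
    for fn in ast.walk(ast.parse(code)):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        tainted: set = set()
        for stmt in fn.body:
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                continue  # nested definitions are analyzed on their own
            # Report sink calls anywhere in this statement.
            for node in ast.walk(stmt):
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == sink_method
                    and node.args
                    and _is_tainted(node.args[0], tainted, source, sanitizers)
                ):
                    hits.append(node.lineno)
            # Only top-level `name = expr` assignments propagate taint.
            if (
                isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
            ):
                name = stmt.targets[0].id
                if _is_tainted(stmt.value, tainted, source, sanitizers):
                    tainted.add(name)
                else:
                    tainted.discard(name)  # clean reassignment clears taint
    return sorted(hits)


EXAMPLE = '''
def handler(cursor):
    name = request.args.get("name")
    query = f"SELECT * FROM users WHERE name = '{name}'"
    cursor.execute(query)
    cursor.execute("SELECT * FROM users WHERE name = %s", (name,))
    user_id = int(request.args.get("id"))
    cursor.execute("SELECT * FROM users WHERE id = " + str(user_id))
'''

print(find_tainted_sinks(EXAMPLE, "request.args.get", "execute"))  # prints [5]
```

Only line 5 is reported: the parameterized query on line 6 keeps the tainted value out of the first argument, and int() sanitizes the value used on line 8. Dropping int from the sanitizer set makes line 8 a finding too.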

Limitations

Semgrep patterns match syntax structure and cannot reason about runtime behavior or dynamic values. Taint tracking has limitations with indirect data flow through complex data transformations. Custom rules require maintenance as codebases evolve and API patterns change over time.