Semgrep Rule Creator
Automate the generation and integration of custom Semgrep rules to enhance static code analysis and security
Semgrep Rule Creator is a community skill for authoring custom Semgrep rules, covering pattern syntax, metavariable constraints, taint tracking, fix suggestions, and rule testing for project-specific static analysis.
What Is This?
Overview
Semgrep Rule Creator provides guidance for writing custom static analysis rules that detect specific code patterns in your projects. It covers five areas: pattern syntax, which defines the code structures to match in Semgrep's pattern language; metavariable constraints, which restrict matches based on the types or values of captured metavariables; taint tracking, which follows data flow from sources to sinks to detect vulnerabilities; fix suggestions, which attach automated code corrections to findings; and rule testing, which validates rules against example code snippets.
Who Should Use This
This skill serves security engineers building custom vulnerability detectors, platform teams enforcing internal API usage patterns, and developers creating automated checks for project conventions.
Why Use It?
Problems It Solves
Community rule packs may not cover project-specific patterns and internal API conventions. Writing effective rules requires understanding of Semgrep's pattern language and operators. Detecting data flow vulnerabilities needs taint analysis configuration beyond simple pattern matching. Rules without proper testing may produce false positives or miss real issues in production code.
Core Highlights
Pattern builder constructs syntax-aware code matching expressions. Constraint editor restricts metavariable types and values. Taint configurator defines source-to-sink data flow tracking. Test harness validates rules against positive and negative examples.
How to Use It?
Basic Usage
rules:
  - id: unsafe-deserialize
    patterns:
      - pattern: |
          pickle.loads($DATA)
      - pattern-not-inside: |
          if is_trusted($DATA):
              ...
    message: >
      Deserializing untrusted data with pickle can lead to
      remote code execution.
    languages: [python]
    severity: ERROR
    fix: |
      json.loads($DATA)
  - id: require-auth
    patterns:
      - pattern: |
          @app.route(...)
          def $FUNC(...):
              ...
      - pattern-not: |
          @app.route(...)
          @login_required
          def $FUNC(...):
              ...
    message: >
      Route handler $FUNC lacks @login_required.
    languages: [python]
    severity: WARNING
Real-World Examples
import json
import subprocess
import tempfile
from pathlib import Path


class RuleTester:
    """Runs a Semgrep rule file against small code snippets."""

    def __init__(self, rule_file: str):
        self.rule = rule_file

    def test_code(self, code: str, lang: str = 'python',
                  expect_match: bool = True) -> dict:
        # Pick a file extension so Semgrep selects the right parser.
        suffix = {
            'python': '.py',
            'javascript': '.js',
            'go': '.go',
        }.get(lang, '.py')
        # Write the snippet to a temporary file that Semgrep can scan.
        with tempfile.NamedTemporaryFile(
                suffix=suffix, mode='w', delete=False) as f:
            f.write(code)
            path = f.name
        try:
            result = subprocess.run(
                ['semgrep', 'scan', '--config', self.rule, '--json', path],
                capture_output=True, text=True)
            data = json.loads(result.stdout)
        finally:
            # Remove the temporary file even if Semgrep fails or prints no JSON.
            Path(path).unlink()
        found = len(data.get('results', []))
        # The test passes when "at least one finding" agrees with the expectation.
        passed = (found > 0) == expect_match
        return {
            'passed': passed,
            'matches': found,
            'expected': expect_match,
        }


tester = RuleTester('rules.yaml')
r1 = tester.test_code('pickle.loads(data)', expect_match=True)
r2 = tester.test_code('json.loads(data)', expect_match=False)
print(f'Positive: {r1["passed"]}')
print(f'Negative: {r2["passed"]}')
Advanced Tips
Use pattern-inside to restrict matches to specific function or class contexts for targeted scanning. Chain metavariable-comparison operators to enforce numeric constraints on captured values. Write negative test cases alongside positive ones to verify rules do not produce false positives.
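The first two tips can be sketched together in one rule. The example below is a minimal illustration, not a vetted rule: the rule id `long-sleep-in-handler`, the 5-second threshold, the file name, and the `write_rule` helper are all hypothetical. The rule text is kept as a Python string and written to disk so that the resulting path can be handed to a harness such as the RuleTester class shown earlier.

```python
from pathlib import Path

# Hypothetical rule: pattern-inside limits matches to Flask route handlers,
# and metavariable-comparison keeps only sleeps longer than 5 seconds.
SLEEP_RULE = """\
rules:
  - id: long-sleep-in-handler
    patterns:
      - pattern-inside: |
          @app.route(...)
          def $FUNC(...):
              ...
      - pattern: time.sleep($SECONDS)
      - metavariable-comparison:
          metavariable: $SECONDS
          comparison: $SECONDS > 5
    message: Route handler $FUNC sleeps for more than 5 seconds.
    languages: [python]
    severity: WARNING
"""


def write_rule(rule_text: str, directory: str,
               name: str = "long-sleep.yaml") -> Path:
    """Write the rule text into `directory` and return the file path."""
    path = Path(directory) / name
    path.write_text(rule_text, encoding="utf-8")
    return path
```

A call such as `write_rule(SLEEP_RULE, "rules")` yields a path that can be passed to `semgrep scan --config`. A `time.sleep(10)` inside a decorated handler is the intended positive case, while `time.sleep(1)` in the same handler or `time.sleep(10)` outside any handler are the intended negative cases.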
When to Use It?
Use Cases
Create a rule that detects unsafe deserialization calls without proper validation guards. Enforce that all API route handlers include authentication decorators. Build a taint tracking rule that follows user input from request parameters to database queries.
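The third use case might look roughly like the sketch below. It assumes a Flask application and a DB-API style cursor; the rule id, the choice of `int(...)` as a sanitizer, the file names, and the `write_fixture` helper are illustrative assumptions rather than part of this skill. The sink uses `focus-metavariable` so that only the query text counts as the sink, not values passed as bound parameters.

```python
from pathlib import Path

# Hypothetical taint-mode rule: request parameters are sources, and the
# query-text argument of cursor.execute() is the sink.
TAINT_RULE = """\
rules:
  - id: request-param-to-sql
    mode: taint
    pattern-sources:
      - pattern: flask.request.args.get(...)
    pattern-sanitizers:
      - pattern: int(...)
    pattern-sinks:
      - patterns:
          - pattern: $CURSOR.execute($QUERY, ...)
          - focus-metavariable: $QUERY
    message: Request parameter flows into a SQL query string.
    languages: [python]
    severity: ERROR
"""

# Example target code. The first execute() call is the intended finding;
# the second passes the value as a bound parameter and is intended to pass.
TARGET = '''\
from flask import request


def lookup(cursor):
    name = request.args.get("name")
    # Tainted value is concatenated into the query text.
    cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
    # Tainted value is passed separately from the query text.
    cursor.execute("SELECT * FROM users WHERE name = ?", (name,))
'''


def write_fixture(directory: str) -> tuple[Path, Path]:
    """Write the rule and the target file, returning both paths."""
    rule_path = Path(directory) / "taint-rule.yaml"
    target_path = Path(directory) / "target.py"
    rule_path.write_text(TAINT_RULE, encoding="utf-8")
    target_path.write_text(TARGET, encoding="utf-8")
    return rule_path, target_path
```

After `write_fixture(".")`, running `semgrep scan --config taint-rule.yaml target.py` is one way to check whether the rule behaves as intended on the two calls.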
Related Topics
Semgrep, static analysis, rule authoring, pattern matching, code quality, security rules, and custom linting.
Important Notes
Requirements
Semgrep CLI installed for running and testing rules locally. Understanding of Semgrep pattern syntax including operators like pattern-not, pattern-inside, and metavariables. Example code snippets representing both positive and negative matches for testing.
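As an illustration of such snippets, the sketch below pairs positive and negative cases for the unsafe-deserialize rule from Basic Usage, using Semgrep's test annotations: a `# ruleid:` comment marks a line the rule should report, and a `# ok:` comment marks a line it should not. The function names and the stub `is_trusted` guard are hypothetical. The file is assumed to sit next to the rule file under the same base name (for example rules.py beside rules.yaml) so that Semgrep's test mode (`semgrep --test`, though the exact invocation may vary by version) can pair them.

```python
import json
import pickle


def is_trusted(blob: bytes) -> bool:
    """Hypothetical guard; the rule only checks that the call is wrapped in it."""
    return False


def load_unguarded(blob: bytes):
    # ruleid: unsafe-deserialize
    return pickle.loads(blob)


def load_guarded(blob: bytes):
    if is_trusted(blob):
        # ok: unsafe-deserialize
        return pickle.loads(blob)
    return None


def load_json(text: str):
    # ok: unsafe-deserialize
    return json.loads(text)
```

Keeping the guarded and unguarded calls side by side exercises both the `pattern` and the `pattern-not-inside` clauses of the rule.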
Usage Recommendations
Do: test rules against real codebase files to validate accuracy before deploying to CI. Use the fix field to suggest automated corrections that developers can apply directly. Document rule intent in the message field with clear remediation guidance.
Don't: create overly complex pattern combinations when a simpler rule achieves the same detection coverage. Deploy rules without testing against both matching and non-matching code samples. Rely on pattern matching alone for data flow vulnerabilities that require taint analysis.
Limitations
Semgrep patterns match syntax structure and cannot reason about runtime behavior or dynamic values. Taint tracking has limitations with indirect data flow through complex data transformations. Custom rules require maintenance as codebases evolve and API patterns change over time.