SARIF Parsing
Automate and integrate SARIF Parsing to streamline security analysis results
SARIF Parsing is a community skill for processing Static Analysis Results Interchange Format files, covering SARIF file parsing, result aggregation, rule extraction, code location mapping, and report generation for security and code quality tooling.
What Is This?
Overview
SARIF Parsing provides tools for reading and processing SARIF files, which standardize static analysis tool output. It covers five areas. SARIF file parsing reads the JSON-based format and extracts results, rules, and tool information from analysis runs. Result aggregation combines findings from multiple tools into a unified, deduplicated view. Rule extraction maps result identifiers to rule descriptions, severity levels, and help text. Code location mapping resolves file paths, line numbers, and code regions from SARIF location objects. Report generation creates summary reports with severity distributions and trend tracking. Together, these help teams process static analysis results programmatically.
Who Should Use This
This skill serves security engineers processing scanner output from multiple tools, DevOps teams integrating static analysis into CI pipelines, and developers building dashboards for code quality metrics.
Why Use It?
Problems It Solves
Different static analysis tools produce output in different formats, making aggregation and comparison difficult. Extracting actionable information from raw SARIF files requires understanding the deeply nested JSON structure. Deduplicating findings across multiple tools that detect the same issue requires matching by location and rule. Tracking analysis trends over time requires structured data extraction from each scan run.
Core Highlights
The SARIF reader parses the standardized format and extracts structured results. The result aggregator combines findings across multiple tools and runs. The rule mapper connects result IDs to human-readable descriptions and severity. The location resolver maps findings to specific file paths and code regions.
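As a rough illustration of what a location resolver has to do, the sketch below turns a SARIF artifactLocation into a full URI. It assumes the run declares its base locations in originalUriBaseIds and that the location refers to one of them through uriBaseId. The function name resolve_uri and the sample paths are hypothetical and are not part of the skill itself.

```python
from urllib.parse import urljoin


def resolve_uri(artifact_location: dict, base_ids: dict) -> str:
    """Resolve an artifactLocation against the run's originalUriBaseIds.

    Follows uriBaseId links until no further base is declared. Unknown or
    missing base IDs leave the URI as it was written in the file.
    """
    uri = artifact_location.get('uri', '')
    base_id = artifact_location.get('uriBaseId')
    seen = set()  # guards against circular base ID references
    while base_id and base_id in base_ids and base_id not in seen:
        seen.add(base_id)
        base = base_ids[base_id]
        uri = urljoin(base.get('uri', ''), uri)
        base_id = base.get('uriBaseId')
    return uri


# Hypothetical sample data shaped like run['originalUriBaseIds'].
bases = {'SRCROOT': {'uri': 'file:///work/repo/'}}
location = {'uri': 'src/app.py', 'uriBaseId': 'SRCROOT'}
print(resolve_uri(location, bases))
```

Keeping the unresolved relative URI when a base is unknown means findings still group by path even when a tool omits originalUriBaseIds.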
How to Use It?
Basic Usage
import json

def parse_sarif(filepath: str) -> list[dict]:
    """Flatten every result in a SARIF log into a list of plain dicts."""
    with open(filepath, encoding='utf-8') as f:
        sarif = json.load(f)
    findings = []
    for run in sarif.get('runs', []):
        driver = run['tool']['driver']
        tool = driver['name']
        # Index rule metadata by ID so each result can be joined to its rule.
        rules = {r['id']: r for r in driver.get('rules', [])}
        for result in run.get('results', []):
            rule_id = result.get('ruleId', '')
            rule = rules.get(rule_id, {})
            # 'locations' may be missing or empty; use the first entry if any.
            locs = result.get('locations') or [{}]
            loc = locs[0].get('physicalLocation', {})
            # A result without 'level' falls back to the rule's default,
            # and to 'warning' when the rule does not declare one.
            default_level = rule.get(
                'defaultConfiguration', {}).get('level', 'warning')
            findings.append({
                'tool': tool,
                'rule': rule_id,
                'severity': result.get('level', default_level),
                'message': result.get('message', {}).get('text', ''),
                'file': loc.get('artifactLocation', {}).get('uri', ''),
                'line': loc.get('region', {}).get('startLine', 0),
            })
    return findings

results = parse_sarif('scan.sarif')
for r in results[:5]:
    print(f'{r["severity"]}: {r["file"]}:{r["line"]} {r["rule"]}')
Real-World Examples
import json
from collections import Counter

class SARIFAggregator:
    """Collect findings from several SARIF files and summarize them."""

    def __init__(self):
        self.findings = []

    def load(self, filepath: str):
        """Append every result in the file to self.findings."""
        with open(filepath, encoding='utf-8') as f:
            sarif = json.load(f)
        for run in sarif.get('runs', []):
            tool = run['tool']['driver']['name']
            for result in run.get('results', []):
                self.findings.append({
                    'tool': tool,
                    'rule': result.get('ruleId'),
                    'level': result.get('level', 'warning'),
                })

    def summary(self):
        """Return total, per-severity, and per-tool finding counts."""
        by_sev = Counter(f['level'] for f in self.findings)
        by_tool = Counter(f['tool'] for f in self.findings)
        return {
            'total': len(self.findings),
            'by_severity': dict(by_sev),
            'by_tool': dict(by_tool),
        }

agg = SARIFAggregator()
agg.load('semgrep.sarif')
agg.load('codeql.sarif')
report = agg.summary()
print(f'Total: {report["total"]}')
for sev, count in report['by_severity'].items():
    print(f'  {sev}: {count}')
Advanced Tips
Deduplicate findings by combining file path and line number as a key when multiple tools detect the same issue. Map SARIF severity levels to your organization's priority system for consistent triage. Use the SARIF fingerprints or partialFingerprints properties, when a tool emits them, for more reliable deduplication across scan runs.
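One possible way to express both deduplication strategies is a single function that takes the key as a parameter. The sketch assumes finding dicts shaped like the output of parse_sarif in Basic Usage, plus an optional 'fingerprints' entry copied from each result. That entry, the helper names, and the sample fingerprint key are hypothetical.

```python
def location_key(finding: dict) -> tuple:
    """Cross-tool key: same file and same start line."""
    return (finding.get('file', ''), finding.get('line', 0))


def fingerprint_key(finding: dict) -> tuple:
    """Cross-run key: tool, rule, and the tool-provided fingerprints.

    Falls back to the location key when no fingerprints were recorded.
    """
    prints = finding.get('fingerprints') or {}
    if not prints:
        return ('loc',) + location_key(finding)
    return ('fp', finding.get('tool', ''), finding.get('rule', ''),
            tuple(sorted(prints.items())))


def dedupe(findings: list[dict], key=location_key) -> list[dict]:
    """Keep the first finding seen for each key, preserving input order."""
    unique = {}
    for finding in findings:
        unique.setdefault(key(finding), finding)
    return list(unique.values())


# Two tools reporting the same line collapse into one finding.
merged = dedupe([
    {'tool': 'semgrep', 'rule': 'r1', 'file': 'app.py', 'line': 10},
    {'tool': 'codeql', 'rule': 'q7', 'file': 'app.py', 'line': 10},
])
print(len(merged))
```

The fingerprint key includes the tool name because fingerprints from different scanners are not expected to be comparable with each other.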
When to Use It?
Use Cases
Parse SARIF output from security scanners to extract findings with file locations and severity. Aggregate results from multiple static analysis tools into a unified report. Track the number and severity of findings across builds in a CI pipeline.
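For the build-tracking use case, a minimal sketch is to compare severity counts between two scans. It assumes both inputs are lists of finding dicts with a 'severity' key, as returned by parse_sarif in Basic Usage. The function name severity_delta and the sample data are hypothetical.

```python
from collections import Counter


def severity_delta(previous: list[dict], current: list[dict]) -> dict[str, int]:
    """Return the change in finding count per severity level.

    Positive numbers mean the current build has more findings at that level.
    """
    before = Counter(f['severity'] for f in previous)
    after = Counter(f['severity'] for f in current)
    levels = sorted(set(before) | set(after))
    return {level: after[level] - before[level] for level in levels}


# Hypothetical findings from two consecutive builds.
last_build = [{'severity': 'error'}, {'severity': 'warning'}]
this_build = [{'severity': 'error'}, {'severity': 'error'}]
for level, change in severity_delta(last_build, this_build).items():
    print(f'{level}: {change:+d}')
```

A CI job could fail the build when the delta for 'error' is positive, though the threshold is a policy decision for each team.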
Related Topics
SARIF, static analysis, code scanning, security tools, code quality, CI integration, and vulnerability management.
Important Notes
Requirements
SARIF version 2.1.0 files from compatible static analysis tools. JSON parsing library for reading the SARIF format structure. Source code repository access for validating file path references in SARIF location data.
Usage Recommendations
Do: validate SARIF file structure before processing since some tools produce non-standard output. Use rule metadata to provide context when displaying findings to developers. Track findings across builds to identify trends and regressions.
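A lightweight structural check before processing might look like the sketch below. It is not full schema validation: it only tests the handful of properties the examples on this page rely on, and the choice of checks is an assumption about what this pipeline needs. The names check_sarif and load_sarif are hypothetical.

```python
import json

SUPPORTED_VERSION = '2.1.0'


def check_sarif(doc) -> list[str]:
    """Return a list of structural problems; an empty list means usable."""
    if not isinstance(doc, dict):
        return ['top-level value is not a JSON object']
    problems = []
    if doc.get('version') != SUPPORTED_VERSION:
        problems.append(f"unsupported version: {doc.get('version')!r}")
    runs = doc.get('runs')
    if not isinstance(runs, list):
        problems.append("'runs' is missing or not an array")
        return problems
    for i, run in enumerate(runs):
        # Walk tool -> driver -> name defensively; any level may be absent.
        tool = run.get('tool') if isinstance(run, dict) else None
        driver = tool.get('driver') if isinstance(tool, dict) else None
        name = driver.get('name') if isinstance(driver, dict) else None
        if not isinstance(name, str) or not name:
            problems.append(f'runs[{i}] has no tool.driver.name')
    return problems


def load_sarif(filepath: str) -> dict:
    """Read and check a SARIF file, raising ValueError on any failure."""
    try:
        with open(filepath, encoding='utf-8') as f:
            doc = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        raise ValueError(f'cannot read {filepath}: {exc}') from exc
    problems = check_sarif(doc)
    if problems:
        raise ValueError(f'{filepath}: ' + '; '.join(problems))
    return doc
```

Raising a single exception type lets a pipeline skip or quarantine a bad file with one except clause instead of handling I/O, JSON, and structure errors separately.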
Don't: treat all findings as equal severity, since SARIF level values carry important priority information. Don't parse SARIF files without error handling, since malformed files from some tools may cause parsing failures. Don't ignore the tool version information, since rule definitions may change between versions.
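To keep level information visible during triage, the SARIF level values can be mapped onto an internal priority scale. The sketch below uses a made-up four-step scale in which 1 is most urgent. The mapping, the function names, and the 'severity' key (taken from parse_sarif in Basic Usage) are assumptions to adapt to your own process.

```python
# Hypothetical mapping from SARIF level values to an internal priority,
# where 1 is the most urgent.
PRIORITY_BY_LEVEL = {'error': 1, 'warning': 2, 'note': 3, 'none': 4}


def priority(finding: dict) -> int:
    """Look up the priority for a finding; unknown levels count as warnings."""
    level = finding.get('severity', 'warning')
    return PRIORITY_BY_LEVEL.get(level, PRIORITY_BY_LEVEL['warning'])


def triage_order(findings: list[dict]) -> list[dict]:
    """Sort findings so the most urgent come first (stable for ties)."""
    return sorted(findings, key=priority)


sample = [
    {'rule': 'style-1', 'severity': 'note'},
    {'rule': 'sqli-3', 'severity': 'error'},
    {'rule': 'unused-2', 'severity': 'warning'},
]
for finding in triage_order(sample):
    print(priority(finding), finding['rule'])
```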
Limitations
SARIF file structure varies between tools, with some using optional fields differently. Deduplication across tools is imperfect, since different scanners may report the same issue at different locations. Large SARIF files from comprehensive scans may require streaming parsers for memory-efficient processing.