Variant Analysis

Automate and integrate Variant Analysis for scalable detection and triage of vulnerability variants across codebases

Variant Analysis is a community skill for analyzing code variants and vulnerability patterns across codebases, covering CodeQL query writing, pattern matching, taint analysis, security bug detection, and systematic variant discovery for software security auditing.

What Is This?

Overview

Variant Analysis provides guidance on finding variations of known vulnerabilities across large codebases using query-based analysis tools. It covers five areas. CodeQL query writing defines structural code patterns in a declarative query language in order to search for vulnerable code. Pattern matching identifies similar code structures across files and repositories based on AST shapes and data flow characteristics. Taint analysis tracks untrusted input from sources, through transformations, to sensitive sinks where it could cause harm. Security bug detection identifies common vulnerability classes such as injection, buffer overflow, and authentication bypass. Systematic discovery finds all instances of a vulnerability pattern after an initial bug is identified. Together, these help security auditors find related vulnerabilities systematically.
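To make the source-to-sink idea concrete, here is a deliberately tiny taint tracker written in Python. It is a toy sketch, not how CodeQL works internally. It assumes that the builtin input() is the only source, that any .execute(...) method call is a sink, and that the analyzed code is straight-line, top-level statements. The names SOURCES, SINK_METHODS, is_tainted, and find_tainted_sinks are hypothetical.

```python
import ast

SOURCES = {"input"}         # assumption: builtin input() yields untrusted data
SINK_METHODS = {"execute"}  # assumption: any .execute(...) call is a SQL sink


def is_tainted(expr, tainted):
    """True if expr reads a tainted variable or calls a source directly."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in SOURCES):
            return True
    return False


def find_tainted_sinks(source):
    """Return line numbers of sink calls that receive tainted arguments."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        # Check every call in this statement against the current taint state.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr in SINK_METHODS
                    and any(is_tainted(arg, tainted) for arg in node.args)):
                findings.append(node.lineno)
        # Propagate taint through simple assignments (the "transformations").
        if isinstance(stmt, ast.Assign):
            value_is_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_is_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


EXAMPLE = '''\
name = input()
query = "SELECT * FROM users WHERE name = '" + name + "'"
cursor.execute(query)
safe = "SELECT 1"
cursor.execute(safe)
'''

print(find_tainted_sinks(EXAMPLE))  # [3]
```

Only the first execute call is reported, because its argument is built from the value returned by input(). A real engine additionally follows data across function calls, containers, and library models, which is what the query-based tooling is for.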

Who Should Use This

This skill serves security researchers hunting for vulnerability variants, code auditors reviewing large codebases for pattern violations, and development teams scanning for known-bad coding patterns.

Why Use It?

Problems It Solves

Finding all instances of a vulnerability pattern manually in large codebases is impractical and error-prone. Grep-based searches miss semantic variations that use different syntax for the same vulnerable pattern. After fixing one bug, related variants often remain undiscovered in other parts of the code. Taint tracking through complex call chains requires automated data flow analysis.
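The gap between text search and structural search can be illustrated with a small Python sketch. It compares a regular expression for the literal text os.system( against an AST walk that resolves import aliases. The sample code and the helper names grep_hits and ast_hits are made up for illustration, and the alias handling is intentionally minimal.

```python
import ast
import re

SAMPLE = '''\
import os
import os as shell
from os import system as run

def a(cmd):
    os.system(cmd)

def b(cmd):
    shell.system(cmd)

def c(cmd):
    run(cmd)
'''


def grep_hits(source):
    """Line numbers where the literal text 'os.system(' appears."""
    return [i for i, line in enumerate(source.splitlines(), 1)
            if re.search(r"\bos\.system\(", line)]


def ast_hits(source):
    """Line numbers of calls that resolve to os.system via import aliases."""
    tree = ast.parse(source)
    module_aliases = set()  # local names bound to the os module
    func_aliases = set()    # local names bound directly to os.system
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "os":
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == "os":
            for alias in node.names:
                if alias.name == "system":
                    func_aliases.add(alias.asname or alias.name)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "system"
                and isinstance(func.value, ast.Name)
                and func.value.id in module_aliases):
            hits.append(node.lineno)
        elif isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append(node.lineno)
    return sorted(hits)


print(grep_hits(SAMPLE))  # [6]
print(ast_hits(SAMPLE))   # [6, 9, 12]
```

The text search finds one of the three calls. The structural search finds all three because it matches on what the call refers to rather than how it is spelled.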

Core Highlights

The query engine searches code using structural pattern matching. The taint tracker follows untrusted data from source to sink. The variant finder discovers related bugs from initial examples. The pattern library provides queries for common vulnerability classes.

How to Use It?

Basic Usage

// CodeQL: find SQL injection variants in Python code
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts

module SqlInjection implements DataFlow::ConfigSig {
  // Sources: untrusted input arriving from a remote client
  predicate isSource(DataFlow::Node src) { src instanceof RemoteFlowSource }

  // Sinks: the SQL text passed to a modeled query execution
  predicate isSink(DataFlow::Node sink) {
    exists(SqlExecution exec | sink = exec.getSql())
  }
}

from DataFlow::Node source, DataFlow::Node sink
where TaintTracking::Global<SqlInjection>::flow(source, sink)
select sink, "SQL injection from $@ to $@.", source, "user input", sink, "query"

Real-World Examples

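// Find command injection candidates: calls to any `.system(...)` attribute
// (for example os.system) whose argument is reached from a remote source.
// DataFlow::localFlow only follows flow within a single function.
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts

from Call call, DataFlow::Node arg
where
  call.getFunc().(Attribute).getName() = "system" and
  arg.asExpr() = call.getAnArg() and
  exists(RemoteFlowSource rfs | DataFlow::localFlow(rfs, arg))
select call, "Potential command injection via $@.", arg, "user input"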

Advanced Tips

Start variant analysis from a known vulnerability and generalize the pattern by abstracting specific variable names and call targets. Use CodeQL path queries to visualize data flow paths from source to sink for easier triage. Build a library of reusable query modules for common source and sink patterns across projects.

When to Use It?

Use Cases

Search for SQL injection variants across a large Python web application after finding one instance. Audit a codebase for command injection patterns using taint tracking from request handlers. Build custom CodeQL queries for project-specific security patterns and API misuse.

Important Notes

Requirements

You need the CodeQL CLI and its database creation tools, which build queryable code representations from source repositories. You also need a working understanding of the CodeQL query language, including predicates, classes, and the data flow libraries. Finally, you need source code access so that CodeQL databases can be created from the target project.
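As a rough sketch of how the CLI steps fit together, the Python helpers below build the argument lists for creating a database and running a query against it. The directory names, the query path, and the helper names are placeholders chosen for illustration. The sketch assumes the codeql executable is on PATH, and the commands are only executed if you call run yourself.

```python
import subprocess


def create_database_cmd(db_dir, source_root, language="python"):
    """Arguments for building a CodeQL database from a source tree."""
    return ["codeql", "database", "create", db_dir,
            f"--language={language}", f"--source-root={source_root}"]


def analyze_cmd(db_dir, query_path, output="results.sarif"):
    """Arguments for running a query (or query suite) and writing SARIF."""
    return ["codeql", "database", "analyze", db_dir, query_path,
            "--format=sarif-latest", f"--output={output}"]


def run(cmd):
    """Execute a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Hypothetical usage; "app-db", "./app", and the query path are placeholders:
# run(create_database_cmd("app-db", "./app"))
# run(analyze_cmd("app-db", "queries/sql-injection.ql"))
```

Note that codeql database analyze expects queries to carry metadata such as @kind and @id in a leading comment. For ad hoc queries without metadata, codeql query run is an alternative.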

Usage Recommendations

Do: start with existing CodeQL queries from the public library and customize them for your target patterns. Test queries against known vulnerable code samples to verify detection accuracy before running on production codebases. Use path queries to understand complete data flow paths.
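A known-vulnerable sample for testing a query might look like the Python fixture below. It is a hypothetical test file, not part of any real project. It contains two syntactic variants of the same SQL injection bug plus a parameterized version that a well-tuned query should not flag. It uses an in-memory sqlite3 database so the difference in behavior can be checked directly.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def lookup_concat(conn, name):
    # Vulnerable variant 1: string concatenation
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def lookup_fstring(conn, name):
    # Vulnerable variant 2: same bug, different syntax (f-string)
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def lookup_safe(conn, name):
    # Safe: parameterized query, the input never becomes SQL text
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)).fetchall()


PAYLOAD = "x' OR '1'='1"

conn = make_db()
print(len(lookup_concat(conn, PAYLOAD)))   # 2 (every row leaks)
print(len(lookup_fstring(conn, PAYLOAD)))  # 2
print(len(lookup_safe(conn, PAYLOAD)))     # 0
```

A query that reports both vulnerable functions and stays silent on lookup_safe is behaving as intended on this sample. A query that only reports lookup_concat is too tied to one syntax.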

Don't: rely solely on automated results without manual review, since false positives are common in taint analysis. Don't write overly specific queries that only match exact syntax patterns, since variants often use different coding styles. Don't skip sanitizer modeling, since ignoring input validation produces excessive false positives.

Limitations

Static analysis cannot detect vulnerabilities that depend on runtime state or configuration values. For compiled languages, CodeQL database creation generally requires a working build of the project. Complex taint propagation through serialization boundaries or inter-process communication may not be tracked.
