Query

Executes SQL or natural language queries against DuckDB databases and ad-hoc files

What Is This?

Overview

The Query skill enables developers and data professionals to execute SQL queries directly against DuckDB databases or ad-hoc against raw files such as CSV, Parquet, and JSON. It accepts both standard SQL syntax and natural language questions, translating them into precise DuckDB-compatible statements using DuckDB Friendly SQL idioms. This makes data exploration fast and accessible without requiring a separate database server or complex setup.

Built on top of DuckDB, one of the fastest in-process analytical databases available, this skill integrates seamlessly into development workflows. Whether you are inspecting a local data file, running aggregations on a connected database, or asking plain-English questions about your data, the Query skill handles the translation and execution in a single step.

The skill operates through the Bash tool, resolving state from a local configuration file to determine the correct execution mode. It supports both persistent database connections and stateless file-based queries, giving developers flexibility depending on their project requirements.

Who Should Use This

  • Data engineers who need to quickly inspect and validate pipeline outputs stored in Parquet or CSV files
  • Backend developers integrating DuckDB into applications who want to test queries interactively during development
  • Analysts working with large local datasets who prefer SQL over Python or R for data exploration
  • DevOps engineers auditing log files or structured exports without importing data into a full database system
  • Data scientists prototyping transformations before moving them into production pipelines
  • Technical writers and documentation authors who need to verify data examples used in guides and tutorials

Why Use It?

Problems It Solves

  • Eliminates the need to spin up a database server just to run a quick query against a local file
  • Removes the friction of writing boilerplate connection code every time you want to inspect data
  • Bridges the gap between natural language questions and executable SQL for users less comfortable with query syntax
  • Reduces context switching by keeping data queries inside the same development environment and workflow
  • Avoids data export and import cycles when working with files that DuckDB can query directly

Core Highlights

  • Accepts raw SQL or natural language input for maximum flexibility
  • Supports querying CSV, Parquet, JSON, and other file formats without importing them first
  • Uses DuckDB Friendly SQL idioms for cleaner, more readable query syntax
  • Resolves database state automatically from a local state file
  • Operates entirely in-process with no server dependency
  • Integrates with Bash for scripting and automation
  • Handles both persistent database connections and one-off file queries
  • Provides fast analytical performance suitable for large datasets

How to Use It?

Basic Usage

To run a SQL query against an attached DuckDB database, pass the query directly as an argument:

query "SELECT * FROM orders LIMIT 10"

To query a file directly without a persistent database, use the --file flag:

query "SELECT COUNT(*) FROM read_csv_auto('data.csv')" --file data.csv

You can also ask a natural language question and the skill will translate it into SQL:

query "How many orders were placed in the last 30 days?"

Specific Scenarios

Scenario 1: Inspecting a Parquet export from a data pipeline

After a pipeline run produces a Parquet file, you can validate its contents immediately:

query "SELECT schema_name, COUNT(*) FROM read_parquet('output.parquet') GROUP BY schema_name"

Scenario 2: Aggregating log data stored as JSON

query "SELECT level, COUNT(*) as count FROM read_json_auto('app.log.json') GROUP BY level ORDER BY count DESC"

Real-World Examples

A backend developer testing a new reporting feature can run aggregation queries against a local DuckDB database to verify results before writing application code. A data engineer can spot-check a freshly generated CSV export by querying it directly without loading it into any system. An analyst can ask plain-English questions about sales data and receive accurate SQL-backed results in seconds.

When to Use It?

Use Cases

  • Validating data pipeline outputs before promoting to production
  • Exploring unfamiliar datasets quickly during onboarding or audits
  • Prototyping SQL transformations for ETL processes
  • Running ad-hoc reports against local exports without a BI tool
  • Debugging data quality issues in CSV or JSON files
  • Testing query performance and optimization strategies locally
  • Answering one-off business questions from structured file exports

Important Notes

Requirements

  • DuckDB must be installed and accessible in the system PATH
  • The Bash tool must be available in the execution environment
  • A valid state file must exist for persistent database connection mode
  • Input files must be accessible at the specified path with appropriate read permissions