Read File
Reads and explores CSV, JSON, Parquet, Excel, and other data files locally or from S3/HTTPS
What Is This?
Overview
Read File is a DuckDB-powered skill designed to open, inspect, and analyze data files across a wide range of formats without requiring manual configuration or format-specific tooling. It accepts a filename or URL as input, automatically resolves the path, and uses DuckDB's extension-based format detection to handle the file appropriately. Supported formats include CSV, JSON, Parquet, Avro, Excel, and spatial data files, whether stored locally or hosted remotely on S3 or HTTPS endpoints.
The skill operates through a simple argument pattern: provide a filename or URL, optionally followed by a question about the data. If no question is supplied, the skill defaults to describing the dataset, giving you an immediate structural overview. This makes it useful for both quick inspections and deeper analytical queries without switching between tools or writing boilerplate code.
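The argument pattern described above can be sketched in a few lines of Python. This is only an illustration of the "filename first, optional question second, describe by default" behavior. The function name `parse_args` and the `DEFAULT_QUESTION` text are hypothetical and are not taken from the skill itself.

```python
# Hypothetical default prompt; the skill's real default wording is not documented here.
DEFAULT_QUESTION = "Describe the data"


def parse_args(argv):
    """Split an argument list into (source, question).

    The first argument is the filename or URL. Anything after it is
    treated as the question; if nothing follows, fall back to the default.
    """
    if not argv:
        raise ValueError("expected a filename or URL as the first argument")
    source = argv[0]
    question = " ".join(argv[1:]).strip() or DEFAULT_QUESTION
    return source, question


# No question supplied: the default description request is used.
print(parse_args(["data/sales_2024.csv"]))
# Question supplied: it is passed through unchanged.
print(parse_args(["s3://my-bucket/events/logs.parquet", "How many rows?"]))
```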
Built on DuckDB, the skill benefits from a high-performance analytical engine capable of processing large files efficiently. Because DuckDB handles format detection through its extension system, users do not need to install separate parsers or configure readers manually, provided the relevant DuckDB extensions are available. The skill hides that complexity.
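To give a rough sense of what is being abstracted, the sketch below maps each supported format to the DuckDB extension and table function that typically reads it in recent DuckDB releases. The mapping and the helper `setup_statements` are assumptions made for illustration. How the skill actually selects a reader is not documented here.

```python
# Format -> (DuckDB extension, table function).
# Assumption: names reflect recent DuckDB releases; the skill may do this differently.
# CSV reading is part of DuckDB core, so no extension is listed for it.
READERS = {
    "csv": (None, "read_csv"),
    "json": ("json", "read_json"),
    "parquet": ("parquet", "read_parquet"),
    "avro": ("avro", "read_avro"),
    "excel": ("excel", "read_xlsx"),
    "spatial": ("spatial", "ST_Read"),
}


def setup_statements(fmt):
    """Return the SQL statements that make the reader for `fmt` available."""
    extension, _function = READERS[fmt]
    if extension is None:
        return []
    return [f"INSTALL {extension};", f"LOAD {extension};"]


for fmt in READERS:
    print(fmt, setup_statements(fmt))
```

DuckDB can often load common extensions automatically, so explicit `INSTALL` and `LOAD` statements may be unnecessary in practice.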
Who Should Use This
- Data engineers who need to quickly inspect incoming data files before building pipelines or transformations.
- Data analysts who work with files from multiple sources and formats and want a single consistent interface for exploration.
- Backend developers integrating external data feeds who need to validate file structure and content before writing parsing logic.
Why Use It?
Problems It Solves
- Eliminates the need to install and configure format-specific tools for each file type encountered during development.
- Removes the friction of writing boilerplate DuckDB or pandas code just to inspect a file's structure and sample rows.
- Lets you query remote files on S3 or HTTPS in place, without first downloading them to a local machine.
- Reduces context switching by providing a single skill that handles CSV, JSON, Parquet, Avro, Excel, and spatial formats uniformly.
- Answers ad hoc questions about a dataset without requiring a full analysis environment.
Core Highlights
- Supports local and remote files, including S3 URIs and HTTPS URLs.
- Detects formats automatically through DuckDB's extension system, with no manual configuration required.
- Accepts an optional natural language question about the data alongside the filename.
- Defaults to a full data description when no question is provided.
- Handles CSV, JSON, Parquet, Avro, Excel, and spatial file formats.
- Powered by DuckDB for fast, in-process analytical query execution.
- Works through the Bash tool, keeping the execution environment lightweight.
- Does not rely on the file extension alone; the skill resolves the format from content and context.
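As an illustration of the local versus remote distinction in the list above, the following sketch classifies an input by its URL scheme. The function names are hypothetical. The only DuckDB-specific assumption is that remote S3 and HTTP(S) reads go through DuckDB's `httpfs` extension.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Return "s3", "http", or "local" based on the URL scheme."""
    scheme = urlparse(source).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    # No scheme, or an unrecognized one such as a Windows drive letter.
    return "local"


def needs_httpfs(source):
    """Assumption: remote reads rely on DuckDB's httpfs extension."""
    return classify_source(source) != "local"


for src in (
    "data/sales_2024.csv",
    "s3://my-bucket/events/logs.parquet",
    "https://example.com/data/users.json",
):
    print(src, classify_source(src), needs_httpfs(src))
```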
How to Use It?
Basic Usage
The skill takes a filename or URL as its first argument and an optional question as the second argument.
## Describe a local CSV file
read-file data/sales_2024.csv
## Ask a specific question about a remote Parquet file
read-file s3://my-bucket/events/logs.parquet "What are the top 10 event types by count?"
## Inspect a JSON file served over HTTPS
read-file https://example.com/data/users.json "How many records are there?"
Specific Scenarios
Scenario 1: Validating an incoming data file before pipeline ingestion
A data engineer receives a new CSV file from a vendor. Before writing a pipeline, they run the skill to confirm column names, data types, and row counts.
read-file /tmp/vendor_export.csv "Show column names and data types"
Scenario 2: Auditing a remote Parquet file in S3
A platform engineer needs to verify the schema of a file stored in a production S3 bucket without downloading it.
read-file s3://prod-data/warehouse/orders.parquet "Describe the schema"
Real-World Examples
- A data analyst receives an Excel file from a business team and uses the skill to extract a summary of numeric columns before building a report.
- A backend developer queries a remote JSON feed to confirm the structure matches the expected API contract before writing a parser.
- A data scientist profiles a large Parquet dataset on S3 to check for null values and distribution before starting feature engineering.
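Schema and profiling requests like the ones above plausibly come down to DuckDB's `DESCRIBE` and `SUMMARIZE` statements. The sketch below only builds the SQL text and does not execute it. The helper names are hypothetical, and whether the skill issues exactly these queries is an assumption.

```python
def quote_literal(path):
    """Quote a path as a SQL string literal, doubling embedded single quotes."""
    return "'" + path.replace("'", "''") + "'"


def describe_sql(path):
    """Column names and types for the file at `path`."""
    return f"DESCRIBE SELECT * FROM {quote_literal(path)};"


def summarize_sql(path):
    """Per-column statistics, including null percentage and quantiles."""
    return f"SUMMARIZE SELECT * FROM {quote_literal(path)};"


# Querying a bare path relies on DuckDB recognizing the file type;
# other formats may need an explicit reader function instead.
print(describe_sql("/tmp/vendor_export.csv"))
print(summarize_sql("s3://prod-data/warehouse/orders.parquet"))
```

The resulting statements could be passed to the DuckDB CLI or any DuckDB client.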
Important Notes
Requirements
- DuckDB must be installed and accessible in the execution environment.
- Relevant DuckDB extensions must be available for non-standard formats such as Avro, Excel, and spatial files.
- Remote file access requires appropriate network permissions and, for S3, valid credentials configured in the environment.
- The Bash tool must be available, because the skill executes through shell commands.
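A simple preflight check along these lines can surface missing requirements before a run. This is a sketch under stated assumptions. The function `preflight` is hypothetical, and looking only at the `AWS_ACCESS_KEY_ID` and `AWS_PROFILE` environment variables is a simplification, since credentials may also come from config files or instance roles.

```python
import os
import shutil


def preflight(source, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    env = os.environ if env is None else env
    problems = []
    # The skill shells out to DuckDB, so the CLI must be on PATH.
    if which("duckdb") is None:
        problems.append("duckdb CLI not found on PATH")
    # Assumption: S3 credentials are supplied through environment variables.
    if source.startswith("s3://") and not (
        env.get("AWS_ACCESS_KEY_ID") or env.get("AWS_PROFILE")
    ):
        problems.append("no AWS credentials found in environment variables")
    return problems


print(preflight("s3://prod-data/warehouse/orders.parquet"))
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.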