Firecrawl Search

Web search and scraping via the Firecrawl API. Use when you need to search the web, scrape websites, or extract content from URLs.

Firecrawl Search is a community skill for web scraping and search, covering JavaScript-rendered page scraping, full site crawling, search result aggregation, URL content extraction, and structured data retrieval through the Firecrawl API.

What Is This?

Overview

Firecrawl Search provides comprehensive web scraping and search capabilities through the Firecrawl API, designed for reliable data extraction. It covers JavaScript page scraping that handles dynamic content rendered by frameworks like React and Vue; full site crawling that recursively follows links to extract content from entire websites with configurable depth limits; search functionality that queries the web and aggregates results from multiple sources; URL content extraction that pulls clean text and structured data from individual pages with HTML stripping; and data formatting that converts scraped content into JSON, markdown, or plain text for downstream processing. The skill helps developers access web data programmatically without maintaining scraping infrastructure.

Who Should Use This

This skill serves data engineers building web scraping pipelines, AI agents needing access to current web content, and researchers collecting data from websites for analysis and monitoring purposes.

Why Use It?

Problems It Solves

Scraping JavaScript-heavy websites with traditional HTTP requests fails because content is rendered client-side after page load. Building reliable web scraping infrastructure requires managing headless browsers, handling rate limits, and dealing with anti-bot measures. Extracting clean text from HTML pages involves parsing complex structures and removing navigation, ads, and boilerplate content. Crawling entire websites at scale requires distributed systems, queue management, and deduplication logic that most teams lack the resources to build and maintain properly.

Core Highlights

JavaScript scraper handles dynamic content from React, Vue, and other modern frameworks. Site crawler recursively follows links to extract entire websites with depth control. Search engine queries web sources and aggregates results. Content extractor pulls clean text and structured data from HTML pages.

How to Use It?

Basic Usage

import os
import requests

# Read the API key from the environment and build the auth header.
api_key = os.environ['FIRECRAWL_API_KEY']
headers = {'Authorization': f'Bearer {api_key}'}

# Scrape a single page; the extracted text comes back under 'data'.
resp = requests.post(
    'https://api.firecrawl.dev/v0/scrape',
    headers=headers,
    json={'url': 'https://example.com/page'},
)
resp.raise_for_status()
content = resp.json()['data']['content']
print(content)
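Indexing `resp.json()['data']['content']` directly raises a `KeyError` if the response shape ever differs. A small defensive helper makes the parsing tolerant; `extract_field` is a hypothetical name introduced here, not part of the Firecrawl API, and the payload below mimics the response shape shown above:

```python
from typing import Any


def extract_field(payload: dict, *path: str, default: Any = None) -> Any:
    """Walk a nested dict safely, returning `default` if any key is missing."""
    node: Any = payload
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Payload shaped like the scrape response above (illustrative values).
payload = {'data': {'content': 'Example Domain...'}}
print(extract_field(payload, 'data', 'content'))               # Example Domain...
print(extract_field(payload, 'data', 'markdown', default=''))  # '' if absent
```

This keeps a pipeline running when a field is missing instead of crashing mid-batch, which matters given the note below that scraped content structure does not remain constant.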

Real-World Examples

# Crawl an entire site: follow links up to three levels deep, stop at
# 100 pages, and keep only the main article content on each page.
crawl_req = {
    'url': 'https://docs.example.com',
    'crawlerOptions': {'maxDepth': 3, 'limit': 100},
    'pageOptions': {'onlyMainContent': True},
}
crawl_resp = requests.post(
    'https://api.firecrawl.dev/v0/crawl',
    headers=headers,
    json=crawl_req,
)
job_id = crawl_resp.json()['jobId']

# Crawls run asynchronously; check progress via the status endpoint.
status = requests.get(
    f'https://api.firecrawl.dev/v0/crawl/status/{job_id}',
    headers=headers,
).json()

# Search the web and collect the top results.
search_req = {'query': 'Python tutorials', 'limit': 10}
search_resp = requests.post(
    'https://api.firecrawl.dev/v0/search',
    headers=headers,
    json=search_req,
)
results = search_resp.json()['data']

Advanced Tips

Use the onlyMainContent option to extract article text while filtering out navigation, ads, and boilerplate for cleaner data. Implement crawl status polling with exponential backoff since large crawls can take minutes to complete. Cache scraped content locally with timestamps to reduce API calls and costs when data does not change frequently, implementing refresh logic based on your update requirements.
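The polling-with-backoff advice could be sketched as follows. The status endpoint path comes from the crawl example above; the `wait_for_crawl` helper, the retry count, and the `status == 'completed'` check are illustrative assumptions rather than documented API behavior:

```python
import time

import requests


def backoff_delays(base: float = 2.0, cap: float = 60.0, retries: int = 6):
    """Yield exponentially growing poll intervals, capped at `cap` seconds."""
    for attempt in range(retries):
        yield min(base * (2 ** attempt), cap)


def wait_for_crawl(job_id: str, headers: dict):
    """Poll the crawl status endpoint until the job finishes or retries run out."""
    url = f'https://api.firecrawl.dev/v0/crawl/status/{job_id}'
    for delay in backoff_delays():
        status = requests.get(url, headers=headers).json()
        # 'completed' is an assumed terminal value; adjust to the actual API.
        if status.get('status') == 'completed':
            return status.get('data')
        time.sleep(delay)
    raise TimeoutError(f'crawl {job_id} did not finish in time')
```

Doubling the interval each round keeps early polls responsive for small crawls while avoiding hammering the status endpoint during multi-minute jobs.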

When to Use It?

Use Cases

Build web monitoring systems that track competitor websites for price changes, product launches, and content updates. Create AI agents that answer questions using current web content by scraping and indexing relevant sources. Collect training data for machine learning models by crawling documentation sites and extracting structured information.
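For the monitoring use case, a simple way to detect content updates is to fingerprint each scrape and compare against the previous snapshot. The helper names below are hypothetical, and the scraped text would come from the scrape endpoint shown earlier:

```python
import hashlib
from typing import Optional


def content_fingerprint(text: str) -> str:
    """Stable digest of page text, used to detect changes between scrapes."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def has_changed(new_text: str, previous_fingerprint: Optional[str]) -> bool:
    """True when the page differs from the last stored snapshot."""
    return content_fingerprint(new_text) != previous_fingerprint


old = content_fingerprint('Price: $19.99')
print(has_changed('Price: $19.99', old))  # False
print(has_changed('Price: $24.99', old))  # True
```

Storing only the digest rather than full page text keeps the monitoring state small, at the cost of not knowing *what* changed, only that something did.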

Related Topics

Web scraping, data extraction, content crawling, headless browsers, HTML parsing, and web automation.

Important Notes

Requirements

A Firecrawl API key configured in environment variables for authenticating scraping requests. Network access to Firecrawl API endpoints for submitting scrape, crawl, and search operations. Understanding of target website structures and robots.txt policies to ensure ethical scraping practices.

Usage Recommendations

Do: respect robots.txt and terms of service when scraping websites to avoid legal issues. Implement rate limiting and caching to reduce API costs and avoid overwhelming target servers. Use the onlyMainContent option to get cleaner text extraction and reduce data processing overhead.

Don't: scrape websites excessively without considering their bandwidth and server resources. Store API keys in code repositories or expose them publicly since they grant scraping access. Assume scraped content structure remains constant since websites change layouts and markup frequently.
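The caching recommendation above could look like this minimal in-memory sketch; `TimedCache` is a hypothetical helper, not part of any Firecrawl client:

```python
import time
from typing import Optional


class TimedCache:
    """In-memory cache that serves scraped content until it goes stale."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict = {}  # url -> (timestamp, content)

    def get(self, url: str) -> Optional[str]:
        entry = self._store.get(url)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None  # missing or stale: caller should re-scrape

    def put(self, url: str, content: str) -> None:
        self._store[url] = (time.monotonic(), content)
```

A caller checks `get()` first and only hits the API on a miss, which both cuts per-request costs and reduces load on target servers; for anything beyond a single process, the same pattern applies with a shared store such as Redis or a database table.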

Limitations

Firecrawl API has usage quotas and per-request pricing that can accumulate costs with high-volume scraping. Some websites employ sophisticated bot detection that may block scraping attempts despite JavaScript rendering. Crawling large sites with thousands of pages can be time-consuming and expensive, requiring careful scope definition and limits.