
Playwright Scraper Skill
Playwright-based web scraping skill with anti-bot protection for reliable data extraction
Playwright Scraper Skill is a community skill for advanced web scraping, covering anti-bot protection bypass, dynamic content extraction, form automation, screenshot capture, and complex site navigation for robust data collection from modern web applications.
What Is This?
Overview
Playwright Scraper Skill gives AI agents and data collection tools web scraping capabilities built on Playwright browser automation, with built-in handling for anti-bot protection. It covers five areas. Anti-bot bypass techniques handle CAPTCHAs, fingerprint detection, and rate limiting through realistic browser behavior patterns and request timing. Dynamic content extraction waits for JavaScript-rendered elements and AJAX-loaded data before scraping. Form automation fills inputs, selects dropdown options, and submits multi-step forms across complex workflows. Screenshot capture documents page state and visual elements for verification. Complex navigation handles single-page application routing, infinite scroll, and pagination patterns. The skill has been tested on sites with sophisticated bot detection systems, including e-commerce platforms and content aggregators that actively block automated access.
Who Should Use This
This skill serves data collection engineers building robust scrapers, market research teams extracting competitor data, and AI agents requiring structured web data from protected sites. It is also well suited for QA engineers who need to automate and verify web interactions as part of testing pipelines.
Why Use It?
Problems It Solves
Modern websites use sophisticated bot detection that blocks simple HTTP scraping attempts. JavaScript-rendered content does not appear in the initial HTML and requires browser execution to extract. Multi-step workflows with form submissions and authentication cannot be scraped with static requests. Building a custom scraper with anti-bot handling requires extensive browser automation knowledge and ongoing maintenance as detection methods evolve. Scaling scraping operations across many target sites demands infrastructure for managing browser instances and running concurrent extraction jobs. Verifying scraping accuracy is difficult without visual confirmation of page state during extraction.
Core Highlights
The anti-bot handler bypasses detection through realistic browser behavior patterns. The content extractor waits for dynamic JavaScript rendering before scraping. The form automator handles multi-step submissions and complex input sequences. The screenshot tool captures visual page state for verification and debugging.
How to Use It?
Basic Usage
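from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Navigate and wait for the product elements to render
    page.goto('https://site.com')
    page.wait_for_selector('.product')

    # Extract the text of every matching element
    products = page.query_selector_all('.product')
    for product in products:  # named `product` so it does not shadow `p`
        print(product.text_content())

    # Release the browser process once extraction is done
    browser.close()

Real-World Examples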
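from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Example 1: log in through a form and wait for the post-login redirect
    page.goto('https://app.com/login')
    page.fill('#email', 'user@test.com')
    page.fill('#password', 'pass')
    page.click('button[type=submit]')
    page.wait_for_url('**/dashboard')

    # Example 2: scroll to the bottom to trigger lazy loading, then capture a screenshot
    page.goto('https://store.com')
    page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    page.wait_for_timeout(2000)
    page.screenshot(path='verification.png')

    # Example 3: step down an infinite-scroll page five times, pausing for new content
    for _ in range(5):
        page.evaluate('window.scrollBy(0, 1000)')
        page.wait_for_timeout(1000)

    browser.close()

Advanced Tips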
Use stealth plugins to mask Playwright automation signatures that websites detect through JavaScript fingerprinting. Implement random delays between actions to mimic human browsing patterns and avoid rate limiting. Rotate user agents and viewport sizes across scraping sessions to appear as different users and reduce detection risk. Additionally, consider intercepting and blocking unnecessary network requests such as images and fonts to improve scraping speed and reduce resource consumption during large-scale extraction runs.
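The delay, rotation, and request-blocking tips above can be sketched with a few small helpers. This is a minimal illustration, not the skill's own implementation: the helper names, the user agent strings, the viewport sizes, and the set of blocked resource types are all assumptions chosen for the example. The helpers only rely on the Playwright calls `page.wait_for_timeout`, `page.route`, `route.abort`, and `route.continue_`.

```python
import random

# Illustrative values only (assumptions); a real run would use a maintained list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]
VIEWPORTS = [(1366, 768), (1536, 864), (1920, 1080)]
BLOCKED_RESOURCE_TYPES = frozenset({"image", "font", "media"})


def pick_context_options(rng=random):
    """Return keyword arguments for browser.new_context() with a random profile."""
    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
    }


def human_delay(page, low_ms=400, high_ms=1500, rng=random):
    """Pause for a random interval so actions are not evenly spaced."""
    delay_ms = rng.uniform(low_ms, high_ms)
    page.wait_for_timeout(delay_ms)
    return delay_ms


def block_heavy_resources(page, blocked=BLOCKED_RESOURCE_TYPES):
    """Abort requests whose resource type is in `blocked`; let the rest through."""
    def handle(route):
        if route.request.resource_type in blocked:
            route.abort()
        else:
            route.continue_()

    page.route("**/*", handle)
```

With a launched `browser`, one possible use is `context = browser.new_context(**pick_context_options())`, then `page = context.new_page()`, then `block_heavy_resources(page)` before the first `page.goto`, with `human_delay(page)` between actions. Blocking images also makes screenshots less useful for verification, so the blocked set may need adjusting per job.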
When to Use It?
Use Cases
Scrape competitor pricing and product data from e-commerce sites with bot protection for market intelligence. Extract job postings from employment websites that render listings dynamically with JavaScript frameworks. Collect social media public profile data from platforms with infinite scroll and anti-automation measures for research analysis.
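For the infinite-scroll case, a fixed number of scrolls can stop too early or waste time. One common alternative, sketched here under assumptions, is to keep scrolling until the page height stops growing. The function name `scroll_until_stable` and its default limits are hypothetical, and the fixed pause between scrolls is a simplification; a site-specific wait for a known element would be more reliable where one exists.

```python
def scroll_until_stable(page, max_rounds=20, pause_ms=1000):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed. `max_rounds` caps the loop so a
    feed that never ends cannot keep the scraper running forever.
    """
    last_height = page.evaluate("document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)  # give lazy-loaded items time to render
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == last_height:
            return round_no  # nothing new was loaded by this scroll
        last_height = new_height
    return max_rounds
```

After the function returns, the loaded items can be collected with the same `page.query_selector_all` call used in Basic Usage.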
Related Topics
Web scraping, browser automation, anti-bot bypass, Playwright, data extraction, dynamic content handling, and bot detection evasion.
Important Notes
Requirements
Playwright installed with browser binaries for Chromium, Firefox, or WebKit execution. Sufficient system resources including memory and CPU for browser process execution during scraping. Understanding of HTML selectors and page structure for accurate data extraction targeting.
Usage Recommendations
Do: implement rate limiting and random delays to avoid overwhelming target servers and triggering defenses; take screenshots at critical steps to verify that the scraping logic extracts the correct data; use wait conditions for dynamic elements rather than fixed timeouts for reliability.
Don't: scrape websites in violation of their terms of service or robots.txt directives; run scrapers without error handling, since websites change structure and break selectors; leave browser processes running after scraping completes, since they consume significant system resources.
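The error-handling and cleanup recommendations can be combined in a short sketch. The names `run_with_retries` and `scrape_titles`, the retry count, and the doubling backoff are assumptions made for illustration, not part of the skill. The Playwright calls used (`wait_for_selector`, `locator(...).all_text_contents()`, `browser.close()`) are standard sync API methods.

```python
import time


def run_with_retries(task, attempts=3, base_delay_s=1.0,
                     retry_on=(Exception,), sleep=time.sleep):
    """Call task() until it succeeds or `attempts` is exhausted.

    Waits base_delay_s, then 2x, then 4x between failed attempts, and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))


def scrape_titles(url, selector):
    """Open a page, collect text for `selector`, and always close the browser."""
    # Imported here so the retry helper above stays usable without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_selector(selector)  # wait condition, not a fixed sleep
            return page.locator(selector).all_text_contents()
        finally:
            browser.close()  # runs even when a selector times out
```

A call such as `run_with_retries(lambda: scrape_titles('https://site.com', '.product'))` would retry the whole scrape on any exception. Passing `retry_on=(playwright.sync_api.TimeoutError,)` narrows retries to timeouts, which is usually the safer choice.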
Limitations
Advanced CAPTCHA systems may still block automated access despite anti-bot measures. Scraping is significantly slower than API access because of full browser rendering overhead. Website structure changes break selectors, so scrapers require regular maintenance. Some sites implement server-side bot detection that cannot be bypassed with client-side techniques alone.
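One way to soften the selector-breakage limitation is to try several candidate selectors in order and report which one matched. This is a sketch under assumptions: `first_matching_text` is a hypothetical helper, and the selectors in the usage note are made up. It relies only on `page.query_selector`, which returns `None` when nothing matches.

```python
def first_matching_text(page, selectors):
    """Return (selector, text) for the first selector that matches, else None."""
    for selector in selectors:
        element = page.query_selector(selector)
        if element is not None:
            return selector, element.text_content()
    return None
```

For example, `first_matching_text(page, ['.product-title', 'h1.title', 'h1'])` falls back from a specific class to a generic heading. Logging which selector matched gives an early signal that the page layout has changed.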