Playwright Explore Website
playwright-explore-website skill for entertainment & gaming
Playwright Explore Website is an AI skill that guides developers in using Playwright to programmatically explore and analyze websites. It covers page navigation, content extraction, link discovery, screenshot capture, accessibility auditing, and site mapping for building automated web exploration scripts that systematically traverse and document website structure and content.
What Is This?
Overview
Playwright Explore Website provides patterns for systematically navigating websites using Playwright browser automation. It covers page-by-page crawling with depth control, content extraction from rendered pages including dynamically loaded content, link graph construction for understanding site structure, screenshot capture at multiple viewports, accessibility attribute collection, and performance metric gathering. Each exploration pattern handles real-world challenges like infinite scroll, lazy loading, and single-page application routing.
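The depth-controlled crawling and link tracking described above can be sketched as a small queue that remembers how far each URL is from the start page. This is a minimal sketch, not the skill's own implementation: createFrontier, normalizeUrl, and QueuedUrl are hypothetical names introduced here, and the only normalization assumed is dropping the URL fragment.

```typescript
// Hypothetical helper (not part of Playwright): a breadth-first crawl
// frontier that enforces a depth cap and skips duplicates and other origins.
interface QueuedUrl {
  url: string;
  depth: number;
}

// Drop the #fragment so /page and /page#section count as the same page.
function normalizeUrl(raw: string): string {
  const parsed = new URL(raw);
  parsed.hash = '';
  return parsed.toString();
}

function createFrontier(startUrl: string, maxDepth: number) {
  const origin = new URL(startUrl).origin;
  const seen = new Set<string>();
  const queue: QueuedUrl[] = [];

  // Returns true only when the URL was actually added to the queue.
  function enqueue(rawUrl: string, depth: number): boolean {
    if (depth > maxDepth) return false;
    let url: string;
    try {
      url = normalizeUrl(rawUrl);
    } catch {
      return false; // not a parseable absolute URL
    }
    if (new URL(url).origin !== origin || seen.has(url)) return false;
    seen.add(url);
    queue.push({ url, depth });
    return true;
  }

  enqueue(startUrl, 0);

  return {
    // Next page to visit, or undefined when the frontier is empty.
    next: (): QueuedUrl | undefined => queue.shift(),
    // Queue links found on `from`, one level deeper; returns how many were new.
    addLinks: (from: QueuedUrl, links: string[]): number =>
      links.filter(link => enqueue(link, from.depth + 1)).length,
  };
}
```

In a crawl loop, each item returned by next() would be opened with page.goto, and the links extracted from that page would be passed back through addLinks.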
Who Should Use This
This skill serves QA engineers conducting comprehensive site audits, SEO specialists analyzing website structure and content, content migration teams cataloging pages before platform changes, and developers building monitoring tools that track website changes over time.
Why Use It?
Problems It Solves
Manually exploring large websites to understand their structure is impractical. Static crawlers miss content that requires JavaScript execution, client-side routing, or user interaction to reveal. Traditional HTTP-based scrapers cannot access single-page applications where content is rendered dynamically. Documenting site structure for audits or migrations requires systematic approaches that manual browsing cannot achieve.
Core Highlights
The skill uses Playwright's real browser rendering to access all content including JavaScript-rendered pages. It handles SPA navigation, infinite scroll pagination, lazy-loaded images and content sections, and authentication-gated areas. Exploration results include structured data about page content, navigation structure, visual appearance, and accessibility compliance.
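One way to handle the infinite scroll and lazy loading mentioned above is to keep scrolling to the bottom until the page height stops growing. The sketch below is illustrative only: scrollUntilStable and the ScrollablePage interface are hypothetical names, the interface is assumed to be satisfied by a Playwright Page (which exposes evaluate and waitForTimeout), and the round limit and settle delay are arbitrary defaults.

```typescript
// Minimal structural type; a Playwright Page is assumed to satisfy it.
interface ScrollablePage {
  evaluate(expression: string): Promise<unknown>;
  waitForTimeout(ms: number): Promise<void>;
}

// Scrolls to the bottom repeatedly until the document height stops changing
// or maxRounds is reached. Returns how many rounds saw a new height.
async function scrollUntilStable(
  page: ScrollablePage,
  maxRounds = 20,
  settleMs = 500
): Promise<number> {
  let previousHeight = -1;
  for (let round = 0; round < maxRounds; round++) {
    // A single expression: scroll first, then evaluate to the current height.
    const height = Number(
      await page.evaluate(
        '(window.scrollTo(0, document.body.scrollHeight), document.body.scrollHeight)'
      )
    );
    if (height === previousHeight) return round;
    previousHeight = height;
    // Give lazy-loaded content time to arrive before measuring again.
    await page.waitForTimeout(settleMs);
  }
  return maxRounds;
}
```

The maxRounds cap matters on feeds that never end; without it the loop would run for as long as the site keeps producing content.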
How to Use It?
Basic Usage
import { chromium } from 'playwright';

async function exploreSite(startUrl: string, maxPages: number) {
  const browser = await chromium.launch();
  const siteMap: Array<{ url: string; title: string; links: number }> = [];
  try {
    const page = await browser.newPage();
    const origin = new URL(startUrl).origin;
    const visited = new Set<string>();
    const queue = [startUrl];
    while (queue.length > 0 && visited.size < maxPages) {
      const url = queue.shift()!;
      if (visited.has(url)) continue;
      visited.add(url);
      await page.goto(url, { waitUntil: 'networkidle' });
      const title = await page.title();
      // The callback runs inside the browser, so Node-side values such as
      // `origin` must be passed as an argument instead of being captured.
      const links = await page.$$eval('a[href]', (anchors, siteOrigin) =>
        anchors.map(a => (a as HTMLAnchorElement).href)
          .filter(href => href.startsWith(siteOrigin + '/')),
        origin);
      siteMap.push({ url, title, links: links.length });
      // Queue only links that are neither visited nor already waiting.
      for (const link of links) {
        if (!visited.has(link) && !queue.includes(link)) queue.push(link);
      }
    }
  } finally {
    await browser.close();
  }
  return siteMap;
}
}
Real-World Examples
// Screenshot every page at multiple viewport sizes.
// Assumes `chromium` is imported from 'playwright' as in Basic Usage.
async function captureResponsive(urls: string[]) {
  const browser = await chromium.launch();
  const viewports = [
    { name: 'mobile', width: 375, height: 812 },
    { name: 'tablet', width: 768, height: 1024 },
    { name: 'desktop', width: 1440, height: 900 },
  ];
  for (const url of urls) {
    // Trim leading and trailing slashes first; otherwise the root path "/"
    // becomes "_" and the 'home' fallback never applies.
    const slug = new URL(url).pathname.replace(/^\/+|\/+$/g, '').replace(/\//g, '_') || 'home';
    for (const vp of viewports) {
      const page = await browser.newPage({ viewport: { width: vp.width, height: vp.height } });
      await page.goto(url, { waitUntil: 'networkidle' });
      await page.screenshot({
        path: `screenshots/${slug}_${vp.name}.png`,
        fullPage: true,
      });
      await page.close();
    }
  }
  await browser.close();
}
Advanced Tips
Use page.route to block unnecessary resource loading like images and fonts when you only need content structure, significantly speeding up exploration. Implement request interception to log all API calls a page makes, revealing the data sources behind dynamic content. Set reasonable timeouts and retry logic for pages that load slowly or intermittently fail.
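The resource-blocking tip can be sketched as a single route handler that aborts requests by resource type. This is a hedged sketch rather than a definitive setup: blockHeavyResources, RouteLike, and RoutablePage are hypothetical names, the interfaces are assumed to be satisfied by Playwright's Route and Page, and the default list of blocked types is an assumption to adjust per site.

```typescript
// Minimal structural types; Playwright's Route and Page are assumed to fit.
interface RouteLike {
  request(): { resourceType(): string };
  abort(): Promise<void>;
  continue(): Promise<void>;
}

interface RoutablePage {
  route(pattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Assumed default: resource types that rarely matter for content structure.
const DEFAULT_BLOCKED_TYPES = ['image', 'font', 'media'];

// Registers one handler for every request and aborts the blocked types.
async function blockHeavyResources(
  page: RoutablePage,
  blocked: string[] = DEFAULT_BLOCKED_TYPES
): Promise<void> {
  const blockedTypes = new Set(blocked);
  await page.route('**/*', route =>
    blockedTypes.has(route.request().resourceType())
      ? route.abort()
      : route.continue()
  );
}
```

Calling blockHeavyResources(page) once before the first page.goto should be enough, since the route stays registered on that page. Avoid it for screenshot runs, where images and fonts are part of what is being captured.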
When to Use It?
Use Cases
Use Playwright Explore Website when conducting comprehensive site audits that need to cover every page, when building visual regression monitoring that captures screenshots across viewports, when cataloging website content before a platform migration, or when analyzing competitor websites to understand their content structure.
Related Topics
Playwright browser automation, web scraping best practices, site mapping tools, SEO auditing, visual regression testing, accessibility auditing tools, and web performance monitoring all complement the website exploration workflow.
Important Notes
Requirements
Playwright with at least one browser engine installed. Node.js 18 or later for recent Playwright releases; check the system requirements for the version you install. Sufficient disk space for screenshots when capturing full-page images across multiple viewports. When exploring external websites, respect robots.txt and apply rate limiting.
Usage Recommendations
Do: implement rate limiting between page visits to avoid overloading target servers. Set a maximum page count to prevent unbounded crawling on large sites. Filter explored URLs by domain to avoid following external links during site-specific exploration.
Don't: explore websites without permission from the site owner, as automated access may violate terms of service; bypass authentication requirements to reach protected areas; or run exploration scripts at full speed against production websites during peak traffic hours.
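The rate-limiting recommendation can be sketched as a small throttle that enforces a minimum gap between page visits. The names msUntilNextVisit and createThrottle are hypothetical, and the interval is an assumption to tune for the target site.

```typescript
// How long to wait before the next visit, given when the last one happened.
function msUntilNextVisit(
  lastVisitMs: number | null,
  nowMs: number,
  minIntervalMs: number
): number {
  if (lastVisitMs === null) return 0; // first visit needs no delay
  return Math.max(0, lastVisitMs + minIntervalMs - nowMs);
}

// Returns a function to await before each page.goto call.
function createThrottle(minIntervalMs: number) {
  let lastVisitMs: number | null = null;
  return async function waitTurn(): Promise<void> {
    const delay = msUntilNextVisit(lastVisitMs, Date.now(), minIntervalMs);
    if (delay > 0) {
      await new Promise<void>(resolve => setTimeout(resolve, delay));
    }
    lastVisitMs = Date.now();
  };
}
```

For example, creating const waitTurn = createThrottle(1000) and awaiting waitTurn() at the top of the crawl loop would keep visits at least one second apart.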
Limitations
Exploration depth is limited by the configured page count and available resources. Some website content behind interactive elements like modals or accordions may require explicit interaction scripts. CAPTCHAs and bot detection systems can block automated exploration. Very large websites with thousands of pages require distributed exploration strategies.