Dev Browser Skill
What Is This
The Dev Browser Skill is a browser automation tool on the Happycapy Skills platform, built for interactive, persistent web automation workflows. With it, users can programmatically navigate websites, fill out forms, capture screenshots, extract web data, automate repetitive browser tasks, and test web applications. Unlike one-shot automation scripts, Dev Browser Skill keeps page state alive across script executions, so complex multi-step workflows can be built incrementally. This persistence is especially useful when a task runs in stages or when the browser session must be preserved between operations.
The skill supports two operational modes: a standalone mode that launches a fresh Chromium browser instance, and an extension mode that connects to the user’s existing Chrome browser. This flexibility allows users to choose the most appropriate environment for their automation needs, whether they require a clean session or need to leverage an already authenticated browser profile.
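To make the idea of persistence concrete, here is a minimal in-memory model of a long-running server that hands out named pages. It is an illustrative sketch under stated assumptions, not the skill's implementation: `PageRegistry`, `getPage`, `closePage`, and the plain-object "page" (with `url` and `cookies` fields) are hypothetical stand-ins for real browser page handles.

```javascript
// Hypothetical model of a long-running server that keeps named pages alive
// between script runs. Plain objects stand in for real browser pages.
class PageRegistry {
  constructor() {
    // name -> page state; lives as long as the server process
    this.pages = new Map();
  }

  // Return the existing page for `name`, creating a blank one on first use.
  getPage(name) {
    if (!this.pages.has(name)) {
      this.pages.set(name, { name, url: "about:blank", cookies: {} });
    }
    return this.pages.get(name);
  }

  // Forget a page once its workflow is finished.
  closePage(name) {
    return this.pages.delete(name);
  }
}

// First script run: navigate and record a session cookie.
function firstRun(registry) {
  const page = registry.getPage("app");
  page.url = "https://example.com/dashboard";
  page.cookies.session = "abc123";
}

// Second script run: sees the state left by the first run.
function secondRun(registry) {
  const page = registry.getPage("app");
  return { url: page.url, loggedIn: "session" in page.cookies };
}

const registry = new PageRegistry();
firstRun(registry);
const state = secondRun(registry);
console.log(state.url, state.loggedIn); // prints https://example.com/dashboard true
```

Because the registry lives in the server process rather than in either script, the second run sees the URL and cookie left behind by the first.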
Why Use It
Dev Browser Skill addresses several common challenges faced by developers, testers, and data professionals who need to interact with web interfaces programmatically:
- Persistent State: Unlike stateless automation scripts, this skill keeps the browser session alive, making it easier to build and debug workflows that depend on navigation history, cookies, or logged-in sessions.
- Incremental Development: Users can write and test small, focused scripts, then extend them to cover more complex workflows. Because each step is verified before the next is added, failures surface at the step where they occur.
- Selector Accuracy: For local or source-available web applications, you can read the source code and write precise selectors directly, which makes the automation more robust.
- AI-Assisted Element Discovery: When dealing with unknown or dynamically generated layouts, built-in helper functions like getAISnapshot() enable users to discover interactable elements efficiently.
- Flexible Execution: Whether the task is navigating to a URL, clicking buttons, extracting data, or taking screenshots, the skill provides a unified interface for diverse browser automation tasks.
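The selector-accuracy and AI-discovery points above combine into a simple rule: use a stable attribute from the source when one exists, otherwise fall back to snapshot-based discovery. The sketch below assumes you have already read an element's attributes from the source; `chooseSelector`, `quoteAttr`, and the `{ tag, id, name, type }` input shape are hypothetical, and the priority order (id, then name, then type) is one reasonable choice rather than anything the skill prescribes.

```javascript
// Hypothetical helper: turn attributes read from an app's source into a CSS
// selector, or report that snapshot-based discovery should be used instead.
function quoteAttr(value) {
  // Escape backslashes and double quotes for a CSS attribute value.
  return `"${String(value).replace(/["\\]/g, "\\$&")}"`;
}

function chooseSelector(el) {
  const tag = el.tag || "";
  if (el.id) {
    // Simple ids can use the #id shorthand; others need an attribute selector.
    return /^[A-Za-z][\w-]*$/.test(el.id)
      ? { strategy: "source", selector: `${tag}#${el.id}` }
      : { strategy: "source", selector: `${tag}[id=${quoteAttr(el.id)}]` };
  }
  if (el.name) {
    return { strategy: "source", selector: `${tag}[name=${quoteAttr(el.name)}]` };
  }
  if (el.type) {
    return { strategy: "source", selector: `${tag}[type=${quoteAttr(el.type)}]` };
  }
  // Nothing stable to anchor on: fall back to the AI snapshot.
  return { strategy: "snapshot", selector: null };
}

console.log(chooseSelector({ tag: "button", id: "submit" }).selector); // prints button#submit
console.log(chooseSelector({ tag: "input", name: "username" }).selector); // prints input[name="username"]
console.log(chooseSelector({ tag: "div" }).strategy); // prints snapshot
```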
How to Use It
Setup
Dev Browser Skill can be operated in two distinct modes. If unsure, consult the end user about their specific requirements.
Standalone Mode (Default)
This mode launches a new, isolated Chromium browser instance, ensuring a clean environment for every session. To start the server, run the following command:
./skills/dev-browser/server.sh &
Add the --headless flag to run the browser without a visible window:
./skills/dev-browser/server.sh --headless &
Note: Always wait for the Ready message before executing any automation scripts to ensure the browser is fully initialized.
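If a wrapper script starts the server itself, it has to block until the Ready message appears. The following Node.js sketch assumes the server prints a line containing the word "Ready" to stdout; the exact message text and output stream are not documented here. `createReadyDetector` and `startServer` are hypothetical names, and the 60-second timeout is an arbitrary default.

```javascript
const { spawn } = require("node:child_process");

// Returns a feed(chunk) function that reports true once `marker` has been seen,
// even if the marker is split across two chunks.
function createReadyDetector(marker = "Ready") {
  let tail = "";
  let ready = false;
  return function feed(chunk) {
    if (ready) return true;
    const text = tail + String(chunk);
    ready = text.includes(marker);
    // Keep just enough trailing text to match a marker split across chunks.
    tail = marker.length > 1 ? text.slice(1 - marker.length) : "";
    return ready;
  };
}

// Start the standalone server (optionally headless) and resolve with the
// child process once it reports Ready; reject on early exit or timeout.
function startServer({ headless = false, timeoutMs = 60000 } = {}) {
  const args = headless ? ["--headless"] : [];
  const child = spawn("./skills/dev-browser/server.sh", args, {
    stdio: ["ignore", "pipe", "inherit"],
  });
  const feed = createReadyDetector();

  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      child.kill();
      reject(new Error("server did not report Ready in time"));
    }, timeoutMs);
    const fail = (err) => {
      clearTimeout(timer);
      reject(err);
    };
    child.on("error", fail); // e.g. the script could not be found
    child.on("exit", (code) => fail(new Error(`server exited with code ${code}`)));
    child.stdout.on("data", (chunk) => {
      if (feed(chunk)) {
        clearTimeout(timer);
        resolve(child);
      }
    });
  });
}
```

A caller would run something like `startServer({ headless: true }).then(() => { /* run automation scripts */ })`. The detector keeps a short tail of earlier output so a message split across two `data` chunks is still recognized.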
Extension Mode
This mode connects to the user’s active Chrome browser. Use this when the user is already authenticated on websites and wants automation to occur within their logged-in session. This mode is ideal for tasks that require access to resources behind authentication walls or sessions that cannot be recreated programmatically.
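The choice between the two modes can be written down as a small decision helper. This is a hypothetical sketch that mirrors the guidance in this document (extension mode for tasks that depend on an existing login, standalone mode otherwise); `planLaunch` and its `needsExistingLogin` and `headless` fields are illustrative, and the user should still confirm the mode. No command is produced for extension mode because this document does not describe how that connection is started.

```javascript
// Hypothetical launch planner mirroring this document's guidance:
// extension mode when the task relies on the user's existing login,
// standalone mode (optionally headless) otherwise.
function planLaunch({ needsExistingLogin = false, headless = false } = {}) {
  if (needsExistingLogin) {
    // Extension mode reuses the user's running Chrome; how that connection is
    // made is not covered here, so no command is returned.
    return { mode: "extension", command: null };
  }
  const parts = ["./skills/dev-browser/server.sh"];
  if (headless) parts.push("--headless");
  parts.push("&"); // run the server in the background
  return { mode: "standalone", command: parts.join(" ") };
}

console.log(planLaunch({ headless: true }).command); // prints ./skills/dev-browser/server.sh --headless &
console.log(planLaunch({ needsExistingLogin: true }).mode); // prints extension
```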
Writing Automation Scripts
Automation with Dev Browser Skill is request-driven: requests containing trigger phrases such as “go to [url]”, “click on”, “fill out the form”, “take a screenshot”, “scrape”, “automate”, “test the website”, or “log into” activate the skill, which then carries out the corresponding browser actions.
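As a rough illustration of phrase-based triggering, the sketch below maps a request to an action name using the trigger phrases listed above. The real skill's matching logic is not documented; `routeRequest`, `TRIGGERS`, and the action names are hypothetical, and plain substring matching is a deliberate simplification.

```javascript
// Hypothetical router: maps a natural-language request to an action name
// using the trigger phrases listed in this document. First match wins.
const TRIGGERS = [
  { phrase: "go to", action: "navigate" },
  { phrase: "click on", action: "click" },
  { phrase: "fill out the form", action: "fillForm" },
  { phrase: "take a screenshot", action: "screenshot" },
  { phrase: "scrape", action: "extract" },
  { phrase: "automate", action: "automate" },
  { phrase: "test the website", action: "test" },
  { phrase: "log into", action: "login" },
];

function routeRequest(text) {
  const lower = text.toLowerCase();
  const hit = TRIGGERS.find((t) => lower.includes(t.phrase));
  if (!hit) return null; // not a browser-automation request
  const result = { action: hit.action };
  if (hit.action === "navigate") {
    // Pull the first http(s) URL out of a "go to [url]" request.
    const match = text.match(/https?:\/\/\S+/);
    result.url = match ? match[0] : null;
  }
  return result;
}

console.log(routeRequest("Take a screenshot of the pricing page").action); // prints screenshot
console.log(routeRequest("Go to https://example.com/login").url); // prints https://example.com/login
```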
For local or source-available sites, inspect the HTML to identify precise selectors:
// Selectors taken directly from the app's source
await page.goto('http://localhost:8080');
await page.type('input[name="username"]', 'testuser');
await page.click('button#submit');
For unknown or dynamically generated layouts, use the AI snapshot utilities:
const snapshot = await getAISnapshot();
const loginButton = selectSnapshotRef(snapshot, { text: 'Login' });
await page.click(loginButton.selector);
To capture a screenshot for verification:
await page.screenshot({ path: 'current-page.png' });
Workflow Example
Suppose you need to log into a web application and extract some table data:
// Fill in and submit the login form
await page.goto('https://example.com/login');
await page.type('input[name="user"]', 'alice');
await page.type('input[name="pass"]', 'securePassword');
// Start waiting for navigation before clicking so a fast redirect is not missed
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type="submit"]'),
]);
// Collect every row of table.data as an array of trimmed cell strings
const tableData = await page.evaluate(() =>
  Array.from(document.querySelectorAll('table.data tr'), row =>
    Array.from(row.cells, cell => cell.textContent.trim())
  )
);
console.log(tableData);
When to Use It
Dev Browser Skill is particularly valuable in the following scenarios:
- Automating repetitive browser tasks such as data entry, web scraping, or form submissions.
- Testing web applications in a controlled, scriptable environment, with the ability to capture screenshots and persist session state.
- Interacting with authenticated web sessions where the user is already logged in and manual session recreation is impractical.
- Incremental workflow development where tasks need to be broken down and tested in stages, with state preserved between script runs.
- Extracting structured data from web pages, especially when selector accuracy is critical and source code is available.
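For the data-extraction case, the workflow example's `page.evaluate()` call returns an array of rows, each an array of cell strings. A common next step is to turn those rows into objects keyed by column name. This sketch assumes the first row holds the headers; `rowsToRecords` is a hypothetical helper, not part of the skill.

```javascript
// Hypothetical post-processing step: turn the array-of-arrays returned by
// page.evaluate() into objects keyed by the header row (assumed to be row 0).
function rowsToRecords(rows) {
  if (rows.length === 0) return [];
  const headers = rows[0].map((h) => (h ?? "").trim());
  return rows
    .slice(1)
    .filter((row) => row.some((cell) => (cell ?? "").trim() !== "")) // skip blank rows
    .map((row) =>
      // Missing cells in short rows become empty strings.
      Object.fromEntries(headers.map((h, i) => [h, (row[i] ?? "").trim()]))
    );
}

const rows = [
  ["Name", "Role"],
  [" Alice ", "Admin"],
  ["", ""],
  ["Bob", "Viewer"],
];
console.log(JSON.stringify(rowsToRecords(rows))); // prints [{"Name":"Alice","Role":"Admin"},{"Name":"Bob","Role":"Viewer"}]
```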
Important Notes
- Always verify which operational mode to use with the user. Standalone mode is safest for isolated tasks, while extension mode is suitable for authenticated workflows.
- Respect privacy and security when automating personal or sensitive sessions in extension mode.
- Wait for the server’s Ready message before running scripts to avoid premature execution.
- Use precise selectors for stability, especially on local/source-available sites.
- When automating unknown layouts, leverage AI snapshot tools to improve reliability.
- Incremental script development is encouraged: test each step before scaling up to more complex workflows.
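Incremental development is easier with a harness that runs named steps in order and reports exactly where a workflow stopped. This is a hypothetical sketch: `runSteps` and the `{ name, run }` step shape are illustrative, and the steps in the usage example only mutate a plain context object where real steps would call `page` methods.

```javascript
// Hypothetical harness for incremental development: run named steps in
// order, stop at the first failure, and report how far the workflow got.
async function runSteps(steps, context = {}) {
  const completed = [];
  for (const { name, run } of steps) {
    try {
      await run(context); // works for both sync and async steps
      completed.push(name);
    } catch (err) {
      return { ok: false, completed, failedStep: name, error: err.message };
    }
  }
  return { ok: true, completed };
}

const steps = [
  { name: "open login page", run: (ctx) => { ctx.url = "https://example.com/login"; } },
  { name: "submit credentials", run: (ctx) => { ctx.loggedIn = true; } },
  {
    name: "read table",
    run: (ctx) => {
      if (!ctx.tableFound) throw new Error("table.data not found");
    },
  },
];

runSteps(steps).then((report) => console.log(report.failedStep)); // prints read table
```

Each step is awaited, so the same harness accepts async page actions; stopping at the first failure keeps later steps from running against an unexpected page state.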
With Dev Browser Skill, you can automate complex browser-based tasks, keep session state between scripts, and streamline both development and testing of web applications.