Dev Browser Skill

What Is This

The Dev Browser Skill is a browser automation tool on the Happycapy Skills platform, built for interactive, persistent web automation workflows. With it, users can programmatically navigate websites, fill out forms, capture screenshots, extract web data, automate repetitive browser tasks, and test web applications. Unlike stateless automation scripts, Dev Browser Skill maintains page state across script executions, so complex, multi-step workflows can be built incrementally. This persistence is especially useful when tasks run in stages or when the browser session must be preserved between operations.

The skill supports two operational modes: a standalone mode that launches a fresh Chromium browser instance, and an extension mode that connects to the user’s existing Chrome browser. This flexibility allows users to choose the most appropriate environment for their automation needs, whether they require a clean session or need to leverage an already authenticated browser profile.

Why Use It

Dev Browser Skill addresses several common challenges faced by developers, testers, and data professionals who need to interact with web interfaces programmatically:

  • Persistent State: Unlike stateless automation scripts, this skill keeps the browser session alive, making it easier to build and debug workflows that depend on navigation history, cookies, or logged-in sessions.
  • Incremental Development: Users can write and test small, focused scripts, then expand them to cover more complex workflows as needed. This approach reduces risk and improves reliability.
  • Selector Accuracy: For local or source-available web applications, the skill allows direct inspection of the source code, enabling precise selector targeting and robust automation.
  • AI-Assisted Element Discovery: When dealing with unknown or dynamically generated layouts, built-in helper functions like getAISnapshot() enable users to discover interactable elements efficiently.
  • Flexible Execution: Whether the task is navigating to a URL, clicking buttons, extracting data, or taking screenshots, the skill provides a unified interface for diverse browser automation tasks.

How to Use It

Setup

Dev Browser Skill can be operated in two distinct modes. If unsure, consult the end user about their specific requirements.

Standalone Mode (Default)

This mode launches a new, isolated Chromium browser instance, ensuring a clean environment for every session. To start the server, run the following command:

./skills/dev-browser/server.sh &

Add the --headless flag for non-interactive, background operation:

./skills/dev-browser/server.sh --headless &

Note: Always wait for the Ready message before executing any automation scripts to ensure the browser is fully initialized.
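One way to enforce this wait from a script is to watch the server's output for the Ready text. The sketch below is illustrative only: createReadyDetector is a hypothetical helper, not part of Dev Browser Skill, and it assumes the server prints a line containing the word "Ready". It keeps a short tail of earlier output so the marker is still found when it is split across two chunks.

```javascript
// Hypothetical helper (not part of Dev Browser Skill).
// Assumption: the server writes a line containing "Ready" once it is usable.
function createReadyDetector(marker = 'Ready') {
  let tail = '';    // last few characters of earlier output
  let seen = false; // becomes true once the marker has appeared

  return function feed(chunk) {
    if (seen) return true;
    const text = tail + String(chunk);
    if (text.includes(marker)) {
      seen = true;
      return true;
    }
    // Keep just enough characters to catch a marker split across chunks.
    const keep = marker.length - 1;
    tail = keep > 0 ? text.slice(-keep) : '';
    return false;
  };
}

// Simulated output arriving in two chunks, with the marker split between them.
const feed = createReadyDetector();
console.log(feed('Starting browser...\nRea')); // false
console.log(feed('dy\n'));                     // true
```

In practice, feed could be attached to the stdout 'data' events of a process started with Node's child_process.spawn, and automation scripts would run only after it first returns true.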

Extension Mode

This mode connects to the user’s active Chrome browser. Use this when the user is already authenticated on websites and wants automation to occur within their logged-in session. This mode is ideal for tasks that require access to resources behind authentication walls or sessions that cannot be recreated programmatically.

Writing Automation Scripts

Automation with Dev Browser Skill is command-driven. Trigger phrases such as “go to [url]”, “click on”, “fill out the form”, “take a screenshot”, “scrape”, “automate”, “test the website”, or “log into” initiate corresponding actions.

For local or source-available sites, inspect the HTML to identify precise selectors:

// Open the local app, fill in the username, then submit the form.
await page.goto('http://localhost:8080');
await page.type('input[name="username"]', 'testuser');
await page.click('button#submit');
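When attribute values are copied from source code, they sometimes contain quotes or backslashes that break a hand-written selector. The sketch below shows one way to build attribute selectors safely. attrSelector is a hypothetical helper, not part of Dev Browser Skill, and it handles only backslashes and double quotes, not newlines or other control characters.

```javascript
// Hypothetical helper (not part of Dev Browser Skill): builds a CSS
// attribute selector such as input[name="username"], escaping backslashes
// and double quotes inside the attribute value.
function attrSelector(tag, attr, value) {
  const escaped = String(value)
    .replace(/\\/g, '\\\\') // escape backslashes first
    .replace(/"/g, '\\"');  // then escape double quotes
  return `${tag}[${attr}="${escaped}"]`;
}

console.log(attrSelector('input', 'name', 'username'));
// input[name="username"]
console.log(attrSelector('button', 'aria-label', 'Save "draft"'));
// button[aria-label="Save \"draft\""]
```

The resulting string can be passed wherever a selector is expected, for example as the first argument to page.click.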

For unknown or dynamically generated layouts, use the AI snapshot utilities:

const snapshot = await getAISnapshot();
const loginButton = selectSnapshotRef(snapshot, { text: 'Login' });
await page.click(loginButton.selector);

To capture a screenshot for verification:

await page.screenshot({ path: 'current-page.png' });

Workflow Example

Suppose you need to log into a web application and extract some table data:

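// Open the login page and enter the credentials.
await page.goto('https://example.com/login');
await page.type('input[name="user"]', 'alice');
await page.type('input[name="pass"]', 'securePassword');

// Start waiting for the navigation before clicking, so a fast redirect
// cannot finish before the wait begins.
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type="submit"]'),
]);

// Collect the text of every cell, one inner array per table row.
const tableData = await page.evaluate(() =>
  Array.from(document.querySelectorAll('table.data tr'), row =>
    Array.from(row.cells, cell => cell.textContent)
  )
);
console.log(tableData);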
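The tableData value above is an array of rows, each row an array of cell text. If the first row of the table holds column headers, the rows can be turned into keyed objects for easier downstream use. This is a minimal sketch under that assumption: rowsToObjects is a hypothetical helper, not part of Dev Browser Skill, and the sample values are made up for illustration.

```javascript
// Hypothetical helper (not part of Dev Browser Skill): converts rows of
// cell text into objects keyed by the header row.
// Assumption: the first row contains the column names.
function rowsToObjects(rows) {
  if (rows.length === 0) return [];
  const [header, ...body] = rows;
  const keys = header.map(cell => cell.trim());
  return body.map(row =>
    // Missing cells become empty strings; surrounding whitespace is trimmed.
    Object.fromEntries(keys.map((key, i) => [key, (row[i] ?? '').trim()]))
  );
}

// Sample data in the same shape as tableData (values are invented).
const sample = [
  ['Name', 'Role'],
  [' Alice ', 'Admin'],
  ['Bob', 'Viewer'],
];
console.log(rowsToObjects(sample));
```

Doing this conversion outside page.evaluate keeps the in-page code small and lets the transformation be tested without a browser.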

When to Use It

Dev Browser Skill is particularly valuable in the following scenarios:

  • Automating repetitive browser tasks such as data entry, web scraping, or form submissions.
  • Testing web applications in a controlled, scriptable environment, with the ability to capture screenshots and persist session state.
  • Interacting with authenticated web sessions where the user is already logged in and manual session recreation is impractical.
  • Incremental workflow development where tasks need to be broken down and tested in stages, with state preserved between script runs.
  • Extracting structured data from web pages, especially when selector accuracy is critical and source code is available.

Important Notes

  • Always verify which operational mode to use with the user. Standalone mode is safest for isolated tasks, while extension mode is suitable for authenticated workflows.
  • Respect privacy and security when automating personal or sensitive sessions in extension mode.
  • Wait for the server’s Ready message before running scripts to avoid premature execution.
  • Use precise selectors for stability, especially on local/source-available sites.
  • When automating unknown layouts, leverage AI snapshot tools to improve reliability.
  • Incremental script development is encouraged: test each step before scaling up to more complex workflows.

By leveraging Dev Browser Skill, you can automate complex browser-based tasks with confidence, maintain session persistence, and streamline both development and testing of web applications.