browser_action

The browser_action tool provides web automation through a Puppeteer-controlled browser. VJSP can use it to launch a browser, navigate to websites, click page elements, enter text, and scroll pages, with a screenshot returned after each action as visual feedback.

Parameter Description

The tool accepts the following parameters:

  • action (required): The type of operation to perform, with valid values:
    • launch: Start a new browser session and navigate to the specified URL
    • click: Perform a click operation at the specified X, Y coordinates on the page
    • type: Type text using the keyboard
    • scroll_down: Scroll down by one page height
    • scroll_up: Scroll up by one page height
    • close: Terminate the current browser session
  • url (optional): The URL to navigate to; used with the launch action
  • coordinate (optional): The X, Y coordinates to click; used with the click action (example format: "450,300")
  • text (optional): The text to enter; used with the type action
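
The pairing of actions and parameters above can be sketched as a small validator. This is only an illustration: the function names are hypothetical, and the rule that launch, click, and type each require their matching parameter is an assumption drawn from the descriptions above, not the tool's actual validation code.

```python
from typing import Optional, Tuple

# Action names taken from the parameter list above.
VALID_ACTIONS = {"launch", "click", "type", "scroll_down", "scroll_up", "close"}


def parse_coordinate(value: str) -> Tuple[int, int]:
    """Parse an "x,y" string such as "450,300" into integer coordinates."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"coordinate must look like 'x,y', got {value!r}")
    x, y = (int(part.strip()) for part in parts)
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative")
    return x, y


def validate_request(action: str,
                     url: Optional[str] = None,
                     coordinate: Optional[str] = None,
                     text: Optional[str] = None) -> dict:
    """Check that an action carries the parameter it needs (assumed rules)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    request = {"action": action}
    if action == "launch":
        if not url:
            raise ValueError("launch needs a url")
        request["url"] = url
    elif action == "click":
        if coordinate is None:
            raise ValueError("click needs a coordinate")
        request["coordinate"] = parse_coordinate(coordinate)
    elif action == "type":
        if text is None:
            raise ValueError("type needs text")
        request["text"] = text
    # scroll_down, scroll_up, and close take no extra parameters.
    return request
```

For example, `validate_request("click", coordinate="450,300")` yields a request whose coordinate is the tuple `(450, 300)`, while `validate_request("click")` raises a `ValueError`.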

Purpose

This tool creates an automated browser session that VJSP controls to navigate websites, interact with page elements, and carry out tasks that require browser automation. Each action returns a screenshot of the current page state, so every step can be verified visually.

Applicable Scenarios

  • When VJSP needs to interact with web applications or websites
  • When conducting automated user interface (UI) or web functionality testing
  • When capturing web page screenshots for record keeping or analysis
  • When visually demonstrating web workflows

Core Features

  • Captures a screenshot and the console logs after each action, for visual feedback and debugging
  • Supports the complete workflow, from launching the browser through page interaction to closing the session
  • Supports precise coordinate-based interaction, keyboard input, and page scrolling
  • Keeps browser sessions consistent with built-in page-load detection
  • Operates in two modes: local (an isolated Puppeteer instance) and remote (a connection to an already running Chrome browser)
  • Handles errors gracefully, cleaning up the session automatically and reporting detailed error information
  • Supports multiple screenshot formats and quality settings to optimize visual output
  • Tracks interaction state through position indicators and an action history

Browser Operation Modes

The tool offers two distinct operation modes to suit different automation needs:

Local Browser Mode (Default)

  • Downloads and manages a local Chromium instance automatically through Puppeteer
  • Creates a fresh browser environment on every launch
  • Has no access to existing user profiles, cookies, or browser extensions
  • Behaves consistently and predictably in a sandboxed environment
  • Closes the browser process completely when the session ends

Remote Browser Mode

  • Connects to a Chrome/Chromium instance that has remote debugging enabled
  • Can access the existing browser state, cookies, and installed extensions
  • Starts faster because it reuses an existing browser process
  • Supports connecting to browsers running in Docker containers or on remote servers
  • Only disconnects when the session ends; the target browser is not closed
  • Requires Chrome to be running with the remote debugging port open (default port: 9222)
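
Before relying on remote mode, it can help to confirm that a debuggable Chrome instance is actually listening. The sketch below queries the `/json/version` HTTP endpoint that Chrome exposes when started with `--remote-debugging-port`. How VJSP itself discovers or connects to the browser is not described here, so treat this purely as a manual check; the function names are hypothetical.

```python
import json
from typing import Optional
from urllib.request import urlopen


def version_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Build the URL of Chrome's remote-debugging version endpoint."""
    return f"http://{host}:{port}/json/version"


def find_websocket_url(host: str = "localhost",
                       port: int = 9222,
                       timeout: float = 2.0) -> Optional[str]:
    """Return the browser's webSocketDebuggerUrl, or None if unreachable."""
    try:
        with urlopen(version_endpoint(host, port), timeout=timeout) as response:
            info = json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response that is not valid JSON.
        return None
    return info.get("webSocketDebuggerUrl")
```

If `find_websocket_url()` returns `None`, Chrome is most likely not running with remote debugging enabled on that host and port.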

Function Limitations

  • While a browser session is active, only the browser_action tool can be used
  • Coordinates are relative to the viewport, not to the full page
  • Click targets must be visible within the viewport
  • The browser session must be closed explicitly before other tools can be used
  • The browser window size is configurable (default: 900x600)
  • Direct interaction with browser developer tools (DevTools) is not supported
  • Browser sessions are temporary and do not persist across VJSP restarts
  • Only Chrome/Chromium is supported; Firefox and Safari are not yet supported
  • Local mode cannot access existing cookies; remote mode requires a Chrome instance with remote debugging enabled
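
Because click coordinates are viewport-relative, a position known in page coordinates has to be adjusted by the current scroll offset, and it is only clickable if the result falls inside the viewport. The following is a minimal sketch of that arithmetic, assuming the default 900x600 window; the function name and the idea of tracking scroll offsets yourself are illustrative assumptions, not part of the tool.

```python
from typing import Optional

VIEWPORT_WIDTH = 900   # default window width listed above
VIEWPORT_HEIGHT = 600  # default window height listed above


def to_viewport_coordinate(page_x: int, page_y: int,
                           scroll_x: int = 0, scroll_y: int = 0,
                           width: int = VIEWPORT_WIDTH,
                           height: int = VIEWPORT_HEIGHT) -> Optional[str]:
    """Convert page-absolute coordinates to an "x,y" viewport string.

    Returns None when the point is outside the visible viewport, in which
    case the page would need to be scrolled before clicking.
    """
    view_x = page_x - scroll_x
    view_y = page_y - scroll_y
    if 0 <= view_x < width and 0 <= view_y < height:
        return f"{view_x},{view_y}"
    return None
```

A point at page position (450, 900) is off-screen at the top of the page, but after scrolling down by one 600-pixel page height it maps to the viewport coordinate "450,300".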

Execution Principle

When the browser_action tool is invoked, it follows this process:

  1. Operation Validation and Browser Management

    • Validates the required parameters for the requested action
    • For launch: initializes a browser session (local Puppeteer instance or remote Chrome connection)
    • For interactions: reuses the existing browser session
    • For close: terminates the browser or disconnects from it, depending on the mode
  2. Page Interaction and Stability Assurance

    • Uses the waitTillHTMLStable algorithm to detect DOM stability and ensure the page has finished loading
    • Executes the requested actions (navigation, clicking, typing, scrolling) in the appropriate order
    • Monitors network activity after a click and waits for page navigation to complete when necessary
  3. Visual Feedback Generation

    • Captures optimized screenshots in WebP format, falling back to PNG
    • Records browser console logs for debugging
    • Tracks the mouse position and maintains a paginated history of actions
  4. Session Lifecycle Management

    • Maintains browser state across multiple actions
    • Catches errors during execution and cleans up resources automatically
    • Enforces the required workflow sequence: launch → interactions → close
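
The internals of waitTillHTMLStable are not described here, but the general idea of DOM stability detection can be sketched as polling the page HTML until its size stops changing. In the sketch below, the function name, the polling interval, the number of stable checks, and the timeout are all assumptions chosen for illustration, and `get_html` stands in for whatever call returns the current page content.

```python
import time
from typing import Callable


def wait_until_stable(get_html: Callable[[], str],
                      interval: float = 0.5,
                      required_stable_checks: int = 3,
                      timeout: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the page HTML until its length stops changing.

    Returns True once the length has matched the previous poll for
    `required_stable_checks` consecutive polls, or False if the polls
    allowed by `timeout` run out first.
    """
    max_checks = max(1, int(timeout / interval))
    last_size = None
    stable_checks = 0
    for _ in range(max_checks):
        size = len(get_html())
        if last_size is not None and size == last_size:
            stable_checks += 1
            if stable_checks >= required_stable_checks:
                return True
        else:
            # The content changed (or this is the first poll): start over.
            stable_checks = 0
        last_size = size
        sleep(interval)
    return False
```

Comparing sizes rather than full HTML strings keeps each poll cheap, at the cost of missing changes that happen to leave the length untouched. The `sleep` parameter exists only so the loop can be exercised without real delays.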

Workflow Execution Sequence

Browser interactions must follow this sequence:

  1. Session Initialization: Every browser workflow must start with a launch action
  2. Interaction Phase: Any number of click, type, and scroll actions can be performed
  3. Session Termination: Every browser workflow must end with a close action
  4. Tool Switching: Other tools can be used only after the browser session has been closed
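
The sequence above behaves like a small state machine. The sketch below models it with a hypothetical `BrowserWorkflow` class; it is not the tool's implementation, and rejecting a second launch while a session is active is an assumption made here for simplicity.

```python
class BrowserWorkflow:
    """Tracks whether an action is allowed at the current point in a session."""

    INTERACTIONS = {"click", "type", "scroll_down", "scroll_up"}

    def __init__(self) -> None:
        self.active = False
        self.history = []

    def perform(self, action: str) -> None:
        """Record an action, raising if it violates the workflow order."""
        if action == "launch":
            if self.active:
                # Assumption: a session must be closed before relaunching.
                raise RuntimeError("a session is already active; close it first")
            self.active = True
        elif action in self.INTERACTIONS:
            if not self.active:
                raise RuntimeError(f"{action} requires a launch first")
        elif action == "close":
            if not self.active:
                raise RuntimeError("no active session to close")
            self.active = False
        else:
            raise ValueError(f"unknown action: {action!r}")
        self.history.append(action)

    def can_use_other_tools(self) -> bool:
        """Other tools are available only when no browser session is open."""
        return not self.active
```

Running launch, click, type, and close in that order leaves `can_use_other_tools()` returning `True`, whereas calling `perform("click")` on a fresh instance raises a `RuntimeError`.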

Typical Application Examples

  • Web form submission: VJSP launches a browser and navigates to the form page, fills in each field with type actions, and clicks to submit the form
  • Responsive website testing: VJSP navigates to the target site and scrolls to check how different areas of the page render
  • Web application screenshot capture: VJSP navigates to different pages of the application, and the screenshot captured at each step serves as a record
  • E-commerce checkout demonstration: VJSP walks through the full process from product selection to payment confirmation

Usage Examples

Launch a browser and navigate to the specified website:

<browser_action>
<action>launch</action>
<url>https://example.com</url>
</browser_action>

Perform a click operation at the specified coordinates (e.g., click a page button):

<browser_action>
<action>click</action>
<coordinate>450,300</coordinate>
</browser_action>

Enter text into a focused input box:

<browser_action>
<action>type</action>
<text>Hello, World!</text>
</browser_action>

Scroll down the page to view more content:

<browser_action>
<action>scroll_down</action>
</browser_action>

Close the current browser session:

<browser_action>
<action>close</action>
</browser_action>