browser_action

The browser_action tool enables web automation and interaction through a Puppeteer-controlled browser. It lets VJSP launch browsers, navigate to target websites, click page elements, enter text, scroll pages, and receive visual feedback through screenshots.

Parameter Description

The tool accepts the following parameters:

action (required): The operation to perform. Valid values:
- launch: Start a new browser session and navigate to the specified URL
- click: Click at the specified X,Y coordinates on the page
- type: Type text with the keyboard
- scroll_down: Scroll down by one page height
- scroll_up: Scroll up by one page height
- close: Terminate the current browser session
url (optional): The URL to navigate to; used with the launch action
coordinate (optional): The X,Y coordinates to click; used with the click action (example format: "450,300")
text (optional): The text to enter; used with the type action
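
The parameter rules above can be sketched as a small validator. This is a minimal illustration, not the tool's actual validation code: the function name `validate_browser_action` and the assumption that each action needs its companion parameter (url for launch, coordinate for click, text for type) are hypothetical.

```python
VALID_ACTIONS = {"launch", "click", "type", "scroll_down", "scroll_up", "close"}

# Assumption: each action needs its companion parameter, as the parameter
# descriptions suggest; the tool's real validation rules may differ.
REQUIRED_PARAM = {"launch": "url", "click": "coordinate", "type": "text"}


def validate_browser_action(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    action = params.get("action")
    if action is None:
        return ["missing required parameter: action"]
    if action not in VALID_ACTIONS:
        return [f"unknown action: {action!r}"]
    problems = []
    needed = REQUIRED_PARAM.get(action)
    if needed and not params.get(needed):
        problems.append(f"action {action!r} needs parameter {needed!r}")
    return problems
```

For example, `validate_browser_action({"action": "click"})` reports that the coordinate parameter is missing, while a bare close request passes.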

Purpose

The tool creates an automated browser session that VJSP controls to navigate websites, interact with page elements, and carry out tasks that require browser automation. Each operation produces a screenshot of the current page state so the process can be verified visually.

Applicable Scenarios

When VJSP needs to interact with web applications or websites
When conducting automated user interface (UI) or web functionality testing
When capturing web page screenshots for documentation or analysis
When demonstrating web workflows visually

Core Features

Generate a screenshot and capture console logs after each operation, providing visual feedback and a basis for debugging
Support the full automation workflow, from browser launch through page interaction to session close
Support precise coordinate-based interaction, keyboard input, and page scrolling
Detect page load completion automatically to keep the browser session consistent
Support two operating modes: local mode (standalone Puppeteer instance) and remote mode (connects to an already running Chrome browser)
Handle errors gracefully, automatically cleaning up sessions and reporting detailed error information
Support multiple screenshot formats and quality settings to optimize visual output
Track interaction state through position indicators and operation history

Browser Operation Modes

The tool provides two distinct operation modes for different automation scenarios:

Local Browser Mode (Default)

Automatically download and manage a local Chromium instance through Puppeteer
Create a fresh browser environment on each launch
Cannot access existing user profiles, cookies, or browser extensions
Provide consistent, predictable behavior in a sandboxed environment
Close the browser process completely when the session ends

Remote Browser Mode

Connect to a Chrome/Chromium instance with remote debugging enabled
Can access the existing browser state, cookies, and installed extensions
Reuse the existing browser process for faster startup
Support connecting to browser instances in Docker containers or on remote servers
Only disconnect when the session ends; the target browser stays open
Require a Chrome browser with the remote debugging port enabled (default port: 9222)
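
To illustrate what remote mode depends on: a Chrome instance started with the `--remote-debugging-port=9222` flag exposes an HTTP discovery endpoint at `/json/version`, whose JSON response includes a `webSocketDebuggerUrl` field. The sketch below queries that endpoint using only the Python standard library. How VJSP itself discovers and connects to the browser is not described here, so the helper names are hypothetical and this is only a way to check that a debuggable browser is reachable.

```python
import json
import urllib.request

DEFAULT_DEBUG_PORT = 9222  # default remote debugging port noted above


def devtools_version_url(host: str = "localhost", port: int = DEFAULT_DEBUG_PORT) -> str:
    """Build the URL of Chrome's /json/version discovery endpoint."""
    return f"http://{host}:{port}/json/version"


def fetch_websocket_endpoint(host: str = "localhost", port: int = DEFAULT_DEBUG_PORT,
                             timeout: float = 3.0) -> str:
    """Ask a running Chrome for its WebSocket debugger URL.

    Raises OSError (for example URLError) if nothing is listening on the port.
    """
    with urllib.request.urlopen(devtools_version_url(host, port), timeout=timeout) as resp:
        info = json.load(resp)
    return info["webSocketDebuggerUrl"]
```

Calling `fetch_websocket_endpoint()` against a machine without a debuggable Chrome raises an error, which is a quick way to tell whether remote mode can be used at all.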

Function Limitations

While a browser session is active, only the browser_action tool can be called
Coordinates are relative to the viewport, not to the full page
Clicks can only target elements that are visible within the viewport
The current browser session must be closed explicitly before switching to other tools
The browser window size is configurable (default size: 900x600)
Does not support direct interaction with browser developer tools (DevTools)
Browser sessions are temporary and do not persist across VJSP restarts
Only compatible with Chrome/Chromium browsers; Firefox and Safari are not yet supported
Local mode cannot access existing cookies; remote mode requires a Chrome browser with remote debugging enabled
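
The coordinate limits above can be checked before a click is sent. The following is a minimal sketch, assuming the default 900x600 viewport and treating the edges as inclusive; the function name `parse_coordinate` and the inclusive bounds are assumptions, not documented behavior.

```python
DEFAULT_VIEWPORT = (900, 600)  # default window size listed above


def parse_coordinate(value: str,
                     viewport: tuple[int, int] = DEFAULT_VIEWPORT) -> tuple[int, int]:
    """Parse an "x,y" string and check that the point lies inside the viewport."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'x,y', got {value!r}")
    # int() raises ValueError on non-numeric input
    x, y = (int(p.strip()) for p in parts)
    width, height = viewport
    # Assumption: edges are inclusive; the real tool may be stricter.
    if not (0 <= x <= width and 0 <= y <= height):
        raise ValueError(f"({x}, {y}) is outside the {width}x{height} viewport")
    return x, y
```

Because coordinates are viewport-relative, an element below the fold has to be scrolled into view first; a value such as "450,1200" is rejected by this check rather than interpreted as a page position.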

Execution Principle

When the browser_action tool is called, it proceeds as follows:

Operation Validation and Browser Management
- Validate the required parameters for the requested operation
- For launch: initialize a browser session (local Puppeteer instance or remote Chrome connection)
- For interactive operations: reuse the established browser session
- For close: terminate the browser or disconnect from it, depending on the mode
Page Interaction and Stability Assurance
- Detect DOM stability with the waitTillHTMLStable algorithm to ensure the page has finished loading
- Execute the requested operations (navigation, clicking, input, scrolling) in the appropriate order
- Monitor network activity after a click and wait for page navigation to complete if necessary
Visual Feedback Generation
- Capture optimized screenshots in WebP format, falling back to PNG
- Record browser console logs as a basis for debugging
- Track the mouse position and maintain a paginated history of operations
Session Lifecycle Management
- Maintain browser runtime state across multiple operations
- Catch errors during execution and automatically clean up system resources
- Enforce the required workflow sequence: launch → interaction → close
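
The DOM stability step can be pictured as a polling loop. The internals of waitTillHTMLStable are not specified here, so the sketch below shows only the general technique: poll a size measurement until it stops changing for several consecutive checks or a timeout expires. The function name, the `get_size` callback, and all default values are assumptions for illustration.

```python
import time
from typing import Callable


def wait_until_stable(get_size: Callable[[], int], interval: float = 1.0,
                      stable_checks: int = 3, timeout: float = 10.0) -> bool:
    """Poll get_size() until it returns the same value stable_checks times in a row.

    Returns True once stable, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    streak = 0
    while time.monotonic() < deadline:
        size = get_size()
        if size == last_size:
            streak += 1
        else:
            streak = 1  # first observation of a new value
            last_size = size
        if streak >= stable_checks:
            return True
        time.sleep(interval)
    return False
```

In a real browser integration, `get_size` would return something like the length of the page's current HTML. The timeout keeps pages that never settle, such as those with constantly updating content, from blocking the session indefinitely.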

Workflow Execution Sequence

Browser automation must follow this execution sequence:

Session Initialization: Every browser automation workflow must start with a launch operation
Interaction Execution Phase: Any number of click, type, and scroll operations can follow
Session Termination: Every browser automation workflow must end with a close operation
Tool Switching: Once the browser session is closed, other tools can be called
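
The sequence above amounts to a two-state machine: no session, or an active session. Here is a minimal sketch of a guard that rejects out-of-order operations. The class name `BrowserSessionGuard` is hypothetical, and allowing a launch at any time is an assumption; the source does not say how a second launch during an active session is handled.

```python
INTERACTIONS = {"click", "type", "scroll_down", "scroll_up"}


class BrowserSessionGuard:
    """Tracks whether a session is open and rejects out-of-order operations."""

    def __init__(self) -> None:
        self.active = False

    def check(self, action: str) -> None:
        """Raise if the action is not allowed in the current state."""
        if action == "launch":
            self.active = True  # assumption: a launch always (re)starts the session
        elif action in INTERACTIONS:
            if not self.active:
                raise RuntimeError(f"{action} requires a launch first")
        elif action == "close":
            if not self.active:
                raise RuntimeError("no browser session to close")
            self.active = False
        else:
            raise ValueError(f"unknown action: {action!r}")

    def can_use_other_tools(self) -> bool:
        """Other tools are available only when no browser session is active."""
        return not self.active
```

Feeding the guard launch, click, type, close in order succeeds and leaves `can_use_other_tools()` true again, whereas a click before any launch raises an error.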

Typical Application Examples

Web form submission: VJSP launches a browser, navigates to the form page, fills in each field with type operations, and clicks to submit the form
Responsive website testing: VJSP navigates to the target site and scrolls through the page to check how different areas adapt
Web application screenshot capture: VJSP navigates to different pages of the application and keeps the screenshot from each step as a record
E-commerce checkout demonstration: VJSP simulates the full process from product selection to payment confirmation

Usage Examples

Launch a browser and navigate to the specified website:

<browser_action>
<action>launch</action>
<url>https://example.com</url>
</browser_action>

Perform a click operation at the specified coordinates (e.g., click a page button):

<browser_action>
<action>click</action>
<coordinate>450,300</coordinate>
</browser_action>

Enter text into a focused input box:

<browser_action>
<action>type</action>
<text>Hello, World!</text>
</browser_action>

Scroll down the page to view more content:

<browser_action>
<action>scroll_down</action>
</browser_action>

Close the current browser session:

<browser_action>
<action>close</action>
</browser_action>
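
When requests like the ones above are generated by a script, a small helper can assemble them. This is a sketch only: the function name `build_browser_action` is hypothetical, and values are inserted verbatim because the source does not say whether the parser expects XML escaping.

```python
def build_browser_action(action: str, **params: str) -> str:
    """Assemble a browser_action request in the format shown in the examples."""
    lines = ["<browser_action>", f"<action>{action}</action>"]
    for name, value in params.items():
        # Assumption: values are inserted as-is, without XML escaping.
        lines.append(f"<{name}>{value}</{name}>")
    lines.append("</browser_action>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_browser_action("launch", url="https://example.com"))
```

Run as a script, this prints the same request as the first usage example.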
