browser_action
The browser_action tool provides web automation through a Puppeteer-controlled browser. It lets VJSP launch a browser, navigate to websites, click page elements, enter text, scroll pages, and receive visual feedback through screenshots.
Parameter Description
This tool accepts the following parameters with constraints and value rules:
- action (required): The type of operation to perform. Valid values:
  - launch: Start a new browser session and navigate to the specified URL
  - click: Perform a click at the specified X,Y coordinates on the page
  - type: Perform keyboard text input
  - scroll_down: Scroll down by one page height
  - scroll_up: Scroll up by one page height
  - close: Terminate the current browser session
- url (optional): The target URL for browser navigation; used with the launch action
- coordinate (optional): The X,Y coordinates to click; used with the click action (example format: "450,300")
- text (optional): The text to enter; used with the type action
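The parameter rules above can be sketched as a small validation routine. This is a minimal illustration, not the tool's actual implementation: the BrowserActionParams type and the validateParams function are hypothetical names, and the rule that each optional parameter becomes mandatory for its matching action is an assumption inferred from the descriptions above.

```typescript
// Hypothetical types mirroring the documented parameters.
type BrowserAction = "launch" | "click" | "type" | "scroll_down" | "scroll_up" | "close";

interface BrowserActionParams {
  action: BrowserAction;
  url?: string;        // used with launch
  coordinate?: string; // used with click, e.g. "450,300"
  text?: string;       // used with type
}

// Returns a list of problems; an empty list means the call looks valid.
// Assumption: each action needs its matching parameter to be present.
function validateParams(params: BrowserActionParams): string[] {
  const errors: string[] = [];
  if (params.action === "launch" && !params.url) {
    errors.push("launch requires url");
  }
  if (params.action === "click") {
    if (!params.coordinate) {
      errors.push("click requires coordinate");
    } else if (!/^\d+,\d+$/.test(params.coordinate)) {
      errors.push('coordinate must look like "450,300"');
    }
  }
  if (params.action === "type" && params.text === undefined) {
    errors.push("type requires text");
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue with a call at once.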
Functionality Positioning
This tool creates an automated browser session controlled by VJSP, enabling website navigation, page element interaction, and execution of all tasks requiring browser automation support. Each operation generates a screenshot of the current page state, supporting visual validation of the process.
Applicable Scenarios
- When VJSP needs to interact with web applications or websites
- When conducting automated user interface (UI) or web functionality testing
- When capturing web page screenshots for retention or analysis
- When visually demonstrating web business processes
Core Features
- Generate screenshots and capture console logs after each operation, providing visual feedback and a basis for debugging
- Support end-to-end automation, from browser startup through page interaction to session closure
- Support precise coordinate-based interaction, keyboard input, and page scrolling
- Built-in page load detection to keep the browser session consistent
- Support two operating modes: local mode (independent Puppeteer instance) and remote mode (connection to an already running Chrome browser)
- Handle errors gracefully, automatically cleaning up sessions and outputting detailed error information
- Support multiple screenshot formats and quality settings to optimize visual output
- Track interaction state end to end through position identification and operation history records
Browser Operation Modes
This tool provides two independent operation modes to adapt to different automation scenario requirements:
Local Browser Mode (Default)
- Automatically download and manage local Chromium instances through Puppeteer
- Create a brand new browser runtime environment for each startup operation
- Cannot access existing local user profiles, cookies, or browser extensions
- Behave consistently and predictably in a sandboxed environment
- Automatically and completely close browser processes when the session is terminated
Remote Browser Mode
- Connect to Chrome/Chromium browser instances with remote debugging enabled
- Can access existing browser runtime states, cookies, and installed extensions
- Reuse existing browser processes to optimize startup speed
- Support connection to browser instances in Docker containers or on remote servers
- Disconnect from the target browser when the session ends, without closing it
- Require a Chrome browser with the remote debugging port enabled (default port: 9222)
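The difference between the two modes at session end can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: BrowserMode, BrowserHandle, and endSession are hypothetical names, and BrowserHandle only models the close() and disconnect() methods that Puppeteer's Browser object provides.

```typescript
type BrowserMode = "local" | "remote";

// Minimal shape modeled on Puppeteer's Browser object: close() shuts the
// browser down, disconnect() only detaches from it.
interface BrowserHandle {
  close(): Promise<void>;
  disconnect(): Promise<void> | void;
}

// Ends a session the way each mode is described above: local mode closes
// the browser process, remote mode leaves the target browser running.
async function endSession(mode: BrowserMode, browser: BrowserHandle): Promise<void> {
  if (mode === "local") {
    await browser.close();
  } else {
    await browser.disconnect();
  }
}
```

With Puppeteer itself, the handle would typically come from puppeteer.launch() in local mode or from puppeteer.connect({ browserURL: "http://localhost:9222" }) in remote mode; the localhost host name is an assumption, while the port is the default stated above.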
Function Limitations
- During an active browser session, only the browser_action tool can be called
- Browser coordinates are viewport-relative, not absolute page coordinates
- Click operations can only target visible elements within the viewport
- The current browser session must be explicitly closed before switching to other tools
- Browser window size is configurable (default size: 900x600)
- Does not support direct interaction with browser developer tools (DevTools)
- Browser sessions are temporary and cannot be persisted after VJSP restart
- Only compatible with Chrome/Chromium browsers, not yet supporting Firefox or Safari
- Local mode cannot access existing cookies; remote mode depends on Chrome browser with debugging enabled
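Because coordinates are viewport-relative and clicks can only reach visible elements, a caller may want to check a coordinate against the viewport before clicking. The sketch below assumes the default 900x600 window; parseCoordinate and isInsideViewport are hypothetical helper names, not part of the tool.

```typescript
interface Point { x: number; y: number; }
interface Viewport { width: number; height: number; }

// Default window size stated in the limitations above.
const DEFAULT_VIEWPORT: Viewport = { width: 900, height: 600 };

// Parses an "x,y" string such as "450,300"; returns null when malformed.
function parseCoordinate(value: string): Point | null {
  const match = /^\s*(\d+)\s*,\s*(\d+)\s*$/.exec(value);
  if (match === null) return null;
  return { x: Number(match[1]), y: Number(match[2]) };
}

// True when the point lies inside the viewport (origin at the top-left corner).
function isInsideViewport(point: Point, viewport: Viewport = DEFAULT_VIEWPORT): boolean {
  return point.x >= 0 && point.x < viewport.width
    && point.y >= 0 && point.y < viewport.height;
}
```

A point such as 450,900 falls below the default viewport, so the page would need a scroll_down before the target element can be clicked.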
Execution Principle
When the browser_action tool is called, it executes operations according to the following process:
Operation Validation and Browser Management
- Validate the required parameters for the requested operation
- launch operation: Initialize the browser session (local Puppeteer instance or remote Chrome connection)
- Interactive operations: Reuse the established browser session
- close operation: Terminate the browser or disconnect from it, depending on the mode
Page Interaction and Stability Assurance
- Detect DOM stability using the waitTillHTMLStable algorithm to ensure the page has fully loaded
- Execute the requested operations (navigation, clicking, input, scrolling) in a reasonable sequence
- Monitor network activity after a click and wait for page navigation to complete if necessary
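The text names waitTillHTMLStable but does not describe how it works. One common way to implement this kind of check is to sample the size of the page's HTML at intervals and treat the page as stable once the size stays unchanged for several consecutive samples. The sketch below models only that counting logic; the HtmlStabilityTracker class and the threshold of three samples are assumptions, not documented behavior.

```typescript
// Tracks successive HTML-size samples and reports when they stop changing.
class HtmlStabilityTracker {
  private lastSize: number | null = null;
  private stableCount = 0;
  private readonly requiredStableSamples: number;

  // requiredStableSamples is an assumed threshold, not a documented value.
  constructor(requiredStableSamples: number = 3) {
    this.requiredStableSamples = requiredStableSamples;
  }

  // Feed one sample; returns true once the size has repeated often enough.
  observe(htmlSize: number): boolean {
    if (this.lastSize !== null && htmlSize === this.lastSize) {
      this.stableCount += 1;
    } else {
      this.stableCount = 0; // the page changed, so start counting again
    }
    this.lastSize = htmlSize;
    return this.stableCount >= this.requiredStableSamples;
  }
}
```

In a Puppeteer session the samples could come from the length of the string returned by page.content(), polled on a timer and bounded by an overall timeout so a constantly changing page cannot block the workflow.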
Visual Feedback Generation
- Capture optimized screenshots in WebP format, falling back to PNG when necessary
- Record browser console logs to provide a basis for problem debugging
- Track mouse position and maintain paginated operation history records
Session Lifecycle Management
- Maintain browser runtime state across multiple operations
- Capture errors during execution and automatically clean up system resources
- Enforce the required workflow sequence: launch → interaction → close
Workflow Execution Sequence
Browser automation must strictly follow this execution sequence:
- Session Initialization: Every browser automation workflow must start with a launch operation
- Interaction Execution: Multiple click, type, and scroll operations can then be executed in succession
- Session Termination: Every browser automation workflow must end with a close operation
- Tool Switching: Other tools can be called only after the browser session is closed
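The sequence above behaves like a two-state machine: no session, or an open session. The following sketch shows one way to enforce it; BrowserWorkflowGuard is a hypothetical name, and rejecting a second launch while a session is open is an assumption the text does not confirm.

```typescript
type WorkflowAction = "launch" | "click" | "type" | "scroll_down" | "scroll_up" | "close";

// Hypothetical guard that rejects actions issued out of order.
class BrowserWorkflowGuard {
  private sessionOpen = false;

  // Throws when the action violates the launch -> interaction -> close order.
  apply(action: WorkflowAction): void {
    if (action === "launch") {
      // Assumption: a second launch is rejected until the session is closed.
      if (this.sessionOpen) {
        throw new Error("a session is already open; close it first");
      }
      this.sessionOpen = true;
      return;
    }
    if (!this.sessionOpen) {
      throw new Error(`${action} requires a launch first`);
    }
    if (action === "close") {
      this.sessionOpen = false;
    }
  }

  // Other tools may run only while no browser session is open.
  canUseOtherTools(): boolean {
    return !this.sessionOpen;
  }
}
```

Calling apply("click") on a fresh guard throws, while launch, any number of interactions, and then close pass through and leave canUseOtherTools() returning true again.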
Typical Application Examples
- Web form submission: VJSP launches a browser, navigates to the form page, fills in each field with type operations, and clicks to submit the form
- Responsive website testing: VJSP navigates to the target site and checks how different page areas adapt by scrolling
- Web application screenshot capture: VJSP navigates to different pages of the application and captures a screenshot at each step for the record
- E-commerce checkout demonstration: VJSP simulates the full process from product selection to payment confirmation
Usage Examples
Launch a browser and navigate to the specified website:
<browser_action>
<action>launch</action>
<url>https://example.com</url>
</browser_action>
Perform a click operation at the specified coordinates (e.g., click a page button):
<browser_action>
<action>click</action>
<coordinate>450,300</coordinate>
</browser_action>
Enter text into a focused input box:
<browser_action>
<action>type</action>
<text>Hello, World!</text>
</browser_action>
Scroll down the page to view more content:
<browser_action>
<action>scroll_down</action>
</browser_action>
Close the current browser session:
<browser_action>
<action>close</action>
</browser_action>
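The individual calls above can be chained into one complete session. As a rough sketch, the helper below renders a list of calls in the same XML-style format; ToolCall and renderBrowserAction are hypothetical names, and inserting values verbatim without XML escaping is an assumption.

```typescript
interface ToolCall {
  action: string;
  url?: string;
  coordinate?: string;
  text?: string;
}

// Renders one call in the XML-style format used by the examples above.
// Assumption: values are inserted verbatim, without XML escaping.
function renderBrowserAction(call: ToolCall): string {
  const lines = ["<browser_action>", `<action>${call.action}</action>`];
  if (call.url !== undefined) lines.push(`<url>${call.url}</url>`);
  if (call.coordinate !== undefined) lines.push(`<coordinate>${call.coordinate}</coordinate>`);
  if (call.text !== undefined) lines.push(`<text>${call.text}</text>`);
  lines.push("</browser_action>");
  return lines.join("\n");
}

// A full launch -> interaction -> close session built from the examples above.
const sessionCalls: ToolCall[] = [
  { action: "launch", url: "https://example.com" },
  { action: "click", coordinate: "450,300" },
  { action: "type", text: "Hello, World!" },
  { action: "scroll_down" },
  { action: "close" },
];
const renderedSession: string[] = sessionCalls.map((call) => renderBrowserAction(call));
```

Each entry of renderedSession is one tool call, issued one at a time so that the screenshot returned after each step can inform the next.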