Browser API
The Browser API lets you control a browser programmatically: navigate to websites, interact with page elements, extract content, and capture screenshots.
Overview
The browser module enables you to:
- Navigate: Open new pages and navigate to URLs
- Extract Content: Get HTML, text, markdown, and other page content
- Interact: Click elements, type text, scroll pages
- Capture: Take screenshots and snapshots
- Monitor: Get browser information and page statistics
Quick Start Example
// Wait for the Codebolt connection before issuing browser commands
await codebolt.waitForConnection();
// Create a new page
await codebolt.browser.newPage();
// Navigate to a website
await codebolt.browser.goToPage('https://example.com');
// Extract page content
const content = await codebolt.browser.getContent();
console.log('Page content:', content);
// Take a screenshot
const screenshot = await codebolt.browser.screenshot();
console.log('Screenshot captured');
// Close the browser
await codebolt.browser.close();
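The quick start above leaves the page open if any step throws. A minimal sketch of one way to guard against that, assuming the `browser` object passed in exposes the same `newPage`, `goToPage`, and `close` methods used above. `withPage` is a hypothetical helper name, not part of the Codebolt API:

```javascript
// Hypothetical helper: runs `task` against a fresh page and always closes it.
// `browser` is assumed to expose newPage, goToPage, and close as used above.
async function withPage(browser, url, task) {
  await browser.newPage();
  try {
    await browser.goToPage(url);
    // Hand the browser to the caller's task and return whatever it produces.
    return await task(browser);
  } finally {
    // Runs whether the task succeeded or threw.
    await browser.close();
  }
}
```

It could be called as `await withPage(codebolt.browser, 'https://example.com', (b) => b.getContent())`. Any error from the task still propagates to the caller after the page is closed.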
Response Structure
All browser API functions return responses with a consistent structure:
{
  event: 'browserActionResponse',
  eventId: 'actionName_timestamp',
  payload: {
    content: 'response data',
    viewport: { width: 767, height: 577 },
    currentUrl: 'https://current-page-url.com'
  },
  type: 'specificResponseType'
}
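Because the shape is shared, callers can unwrap responses in one place. The following is a sketch only: `readPayload` is a hypothetical helper, and it assumes every response carries the `event` and `payload` fields shown above. Individual functions may omit some payload fields, so missing ones default to `null`.

```javascript
// Hypothetical helper: validates a browser response and returns its payload fields.
// Field names (event, payload, content, viewport, currentUrl) follow the structure above.
function readPayload(response) {
  if (!response || response.event !== 'browserActionResponse') {
    throw new Error('Unexpected browser response');
  }
  // Fall back to an empty object so destructuring is safe when payload is absent.
  const { content = null, viewport = null, currentUrl = null } = response.payload || {};
  return { content, viewport, currentUrl };
}
```

For example, `const { content, currentUrl } = readPayload(await codebolt.browser.getContent());` would give the extracted text and the URL it came from, assuming `getContent` returns this structure.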
Available Functions
- newPage - Creates a new browser page or tab for web automation.
- getUrl - Gets the current URL of the active browser page.
- goToPage - Navigates the browser to a specific URL.
- screenshot - Captures a screenshot of the current page as base64 image data.
- getHTML - Retrieves the complete HTML source code of the current page.
- getMarkdown - Converts the current page content to Markdown format.
- getContent - Extracts the visible text content from the current page.
- extractText - Extracts clean, formatted text from the current page.
- getSnapShot - Takes a visual snapshot of the current page (similar to screenshot).
- getBrowserInfo - Gets detailed browser information including viewport, performance, and page statistics.
- scroll - Scrolls the page in a specified direction by a given number of pixels.
- type - Types text into a specific input element on the page.
- click - Clicks on a specific element using its element ID.
- enter - Simulates pressing the Enter key on the current page.
- search - Performs a search by typing a query into a search input element.
- close - Closes the current browser page or tab.
- getPDF - Retrieves PDF content from the current page.
- pdfToText - Converts PDF content on the current page to readable text.
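The interaction functions listed above can be combined into a short flow. This is a sketch under stated assumptions: the argument order `type(elementId, text)`, `click(elementId)`, and `scroll(direction, pixels)` is inferred from the descriptions rather than confirmed, and `searchAndRead` plus the element IDs are hypothetical. Check each function's own reference page before relying on it.

```javascript
// Sketch of a type, click, scroll, read flow.
// Assumed signatures (not confirmed by this page):
//   type(elementId, text), click(elementId), scroll(direction, pixels)
async function searchAndRead(browser, inputId, buttonId, query) {
  await browser.type(inputId, query);   // fill the search box
  await browser.click(buttonId);        // submit via the button
  await browser.scroll('down', 500);    // bring more results into view
  return browser.getMarkdown();         // read the page as Markdown
}
```

A call might look like `await searchAndRead(codebolt.browser, 'search-input', 'search-button', 'web automation')`, where both element IDs are placeholders for IDs taken from the target page.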