Browser API

The Browser API lets you control a browser programmatically: navigate to websites, interact with page elements, extract content, and capture screenshots.

Overview

The browser module enables you to:

  • Navigate: Open new pages and navigate to URLs
  • Extract Content: Get HTML, text, markdown, and other page content
  • Interact: Click elements, type text, scroll pages
  • Capture: Take screenshots and snapshots
  • Monitor: Get browser information and page statistics

Quick Start Example

// Initialize browser session
await codebolt.waitForConnection();

// Create a new page
await codebolt.browser.newPage();

// Navigate to a website
await codebolt.browser.goToPage('https://example.com');

// Extract page content
const content = await codebolt.browser.getContent();
console.log('Page content:', content);

// Take a screenshot
const screenshot = await codebolt.browser.screenshot();
console.log('Screenshot captured');

// Close the browser
codebolt.browser.close();
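
If any step in the sequence above throws, the final close call is never reached. One way to guard against that is a small wrapper that closes the page in a `finally` block. The sketch below is not part of the Browser API: `withPage` and `createFakeBrowser` are hypothetical names, and the `browser` argument is assumed to be an object exposing the `newPage`, `goToPage`, `getContent`, and `close` functions used in the Quick Start (for example `codebolt.browser`). A stand-in object is used so the snippet runs without a live session.

```javascript
// Hypothetical helper (not part of the Browser API): opens a page, runs a task,
// and closes the page even if the task throws.
async function withPage(browser, url, task) {
  await browser.newPage();
  try {
    await browser.goToPage(url);
    return await task(browser);
  } finally {
    // Runs on success and on failure alike.
    await browser.close();
  }
}

// Stand-in for codebolt.browser that only records which functions were called.
function createFakeBrowser() {
  const calls = [];
  return {
    calls,
    newPage: async () => { calls.push('newPage'); },
    goToPage: async (url) => { calls.push(`goToPage:${url}`); },
    getContent: async () => {
      calls.push('getContent');
      return { payload: { content: 'stub content' } };
    },
    close: () => { calls.push('close'); },
  };
}

async function demo() {
  const browser = createFakeBrowser();
  const response = await withPage(browser, 'https://example.com', (b) => b.getContent());
  console.log(browser.calls.join(' -> '));
  console.log(response.payload.content);
  return browser;
}

demo();
```

With a real session, the same helper could presumably be called as `withPage(codebolt.browser, url, task)` after `codebolt.waitForConnection()` resolves.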

Response Structure

All browser API functions return responses with a consistent structure:

{
  event: 'browserActionResponse',
  eventId: 'actionName_timestamp',
  payload: {
    content: 'response data',
    viewport: { width: 767, height: 577 },
    currentUrl: 'https://current-page-url.com'
  },
  type: 'specificResponseType'
}
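
Because the structure is shared, any response can be read the same way. As a minimal sketch, the hypothetical helper below (`summarizeResponse` is not part of the API) pulls out the fields shown above and tolerates missing ones. The sample object simply mirrors the documented structure with placeholder values.

```javascript
// Hypothetical helper: collect the commonly used fields of a browser response.
// Field names follow the documented response structure.
function summarizeResponse(response) {
  const payload = (response && response.payload) || {};
  const viewport = payload.viewport || {};
  const hasSize = viewport.width !== undefined && viewport.height !== undefined;
  return {
    type: response ? response.type : undefined,
    url: payload.currentUrl,
    content: payload.content,
    // Viewport rendered as "WIDTHxHEIGHT" when both dimensions are present.
    viewport: hasSize ? `${viewport.width}x${viewport.height}` : undefined,
  };
}

// Sample object mirroring the documented structure (placeholder values).
const sample = {
  event: 'browserActionResponse',
  eventId: 'actionName_timestamp',
  payload: {
    content: 'response data',
    viewport: { width: 767, height: 577 },
    currentUrl: 'https://current-page-url.com',
  },
  type: 'specificResponseType',
};

const summary = summarizeResponse(sample);
console.log(summary.url, summary.viewport);
```

Returning `undefined` for absent fields, rather than throwing, keeps the helper usable for calls whose payload carries only some of these fields.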

The browser module provides the following functions:

  • newPage - Creates a new browser page or tab for web automation.
  • getUrl - Gets the current URL of the active browser page.
  • goToPage - Navigates the browser to a specific URL.
  • screenshot - Captures a screenshot of the current page as base64 image data.
  • getHTML - Retrieves the complete HTML source code of the current page.
  • getMarkdown - Converts the current page content to Markdown format.
  • getContent - Extracts the visible text content from the current page.
  • extractText - Extracts clean, formatted text from the current page.
  • getSnapShot - Takes a visual snapshot of the current page (similar to screenshot).
  • getBrowserInfo - Gets detailed browser information including viewport, performance, and page statistics.
  • scroll - Scrolls the page in a specified direction by a given number of pixels.
  • type - Types text into a specific input element on the page.
  • click - Clicks on a specific element using its element ID.
  • enter - Simulates pressing the Enter key on the current page.
  • search - Performs a search by typing a query into a search input element.
  • close - Closes the current browser page or tab.
  • getPDF - Retrieves PDF content from the current page.
  • pdfToText - Converts PDF content on the current page to readable text.
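
The interaction functions in this list can be chained into a simple flow: type a query, press Enter, scroll, then read the page. The sketch below illustrates that sequence under explicit assumptions. The argument order for `type` (element ID, then text) and `scroll` (direction, then pixels) is inferred from the descriptions above and not confirmed here. `searchAndRead`, `createRecordingBrowser`, and the element ID `'search-box'` are hypothetical. A recording stand-in replaces `codebolt.browser` so the snippet runs without a live session.

```javascript
// Hypothetical helper (not part of the Browser API): type a query, submit it,
// scroll down, and return the page as Markdown.
async function searchAndRead(browser, inputId, query) {
  await browser.type(inputId, query);  // assumed order: element ID, then text
  await browser.enter();               // press Enter to submit
  await browser.scroll('down', 500);   // assumed order: direction, then pixels
  return browser.getMarkdown();
}

// Stand-in for codebolt.browser that records each call and its arguments.
function createRecordingBrowser() {
  const calls = [];
  const record = (name, result) => async (...args) => {
    calls.push([name, ...args]);
    return result;
  };
  return {
    calls,
    type: record('type'),
    enter: record('enter'),
    scroll: record('scroll'),
    getMarkdown: record('getMarkdown', { payload: { content: '# Results' } }),
  };
}

async function runExample() {
  const browser = createRecordingBrowser();
  const response = await searchAndRead(browser, 'search-box', 'browser automation');
  console.log(browser.calls.map(([name]) => name).join(' -> '));
  console.log(response.payload.content);
  return { browser, response };
}

runExample();
```

Passing the browser object in as a parameter keeps the flow testable. Check the individual function pages for the actual signatures before using it against a real session.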