Browser API

The Browser API lets you control a browser programmatically: you can navigate to websites, interact with page elements, extract content, and capture screenshots.

Overview

The browser module enables you to:

Navigate: Open new pages and navigate to URLs
Extract Content: Get HTML, text, markdown, and other page content
Interact: Click elements, type text, scroll pages
Capture: Take screenshots and snapshots
Monitor: Get browser information and page statistics

Quick Start Example

// Initialize browser session
await codebolt.waitForConnection();

// Create a new page
await codebolt.browser.newPage();

// Navigate to a website
await codebolt.browser.goToPage('https://example.com');

// Extract page content
const content = await codebolt.browser.getContent();
console.log('Page content:', content);

// Take a screenshot
const screenshot = await codebolt.browser.screenshot();
console.log('Screenshot captured');

// Close the browser (awaited for consistency with the calls above)
await codebolt.browser.close();

Response Structure

All browser API functions return responses with a consistent structure:

{
  event: 'browserActionResponse',
  eventId: 'actionName_timestamp',
  payload: {
    content: 'response data',
    viewport: { width: 767, height: 577 },
    currentUrl: 'https://current-page-url.com'
  },
  type: 'specificResponseType'
}
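
Because every response shares this shape, a small helper can pull out the commonly used fields in one place. The sketch below is illustrative only: `summarizeBrowserResponse` is a hypothetical name, not part of the Codebolt API, and it assumes that `payload.content` is a string and that any of the fields shown above may be missing.

```javascript
// Hypothetical helper (not part of the Codebolt API): condenses a response
// of the documented shape into a few commonly needed fields.
function summarizeBrowserResponse(response) {
  const payload = (response && response.payload) || {};
  const viewport = payload.viewport || {};
  const hasViewport =
    typeof viewport.width === 'number' && typeof viewport.height === 'number';

  return {
    // Response type, e.g. the value of the top-level `type` field
    type: (response && response.type) || null,
    // URL of the page the action ran against
    url: payload.currentUrl || null,
    // Viewport rendered as "WIDTHxHEIGHT", or null if not reported
    viewport: hasViewport ? `${viewport.width}x${viewport.height}` : null,
    // Assumption: content is a string; anything else counts as empty
    contentLength:
      typeof payload.content === 'string' ? payload.content.length : 0,
  };
}

// Sample object copied from the structure documented above
const sampleResponse = {
  event: 'browserActionResponse',
  eventId: 'actionName_timestamp',
  payload: {
    content: 'response data',
    viewport: { width: 767, height: 577 },
    currentUrl: 'https://current-page-url.com',
  },
  type: 'specificResponseType',
};

console.log(summarizeBrowserResponse(sampleResponse));
```

In a live session the same helper could be applied to the value returned by a call such as `codebolt.browser.getContent()`, provided that call follows the structure above.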

newPage - Creates a new browser page or tab for web automation.
getUrl - Gets the current URL of the active browser page.
goToPage - Navigates the browser to a specific URL.
screenshot - Captures a screenshot of the current page as base64 image data.
getHTML - Retrieves the complete HTML source code of the current page.
getMarkdown - Converts the current page content to Markdown format.
getContent - Extracts the visible text content from the current page.
extractText - Extracts clean, formatted text from the current page.
getSnapShot - Takes a visual snapshot of the current page (similar to screenshot).
getBrowserInfo - Gets detailed browser information including viewport, performance, and page statistics.
scroll - Scrolls the page in a specified direction by a given number of pixels.
type - Types text into a specific input element on the page.
click - Clicks on a specific element using its element ID.
enter - Simulates pressing the Enter key on the current page.
search - Performs a search by typing a query into a search input element.
close - Closes the current browser page or tab.
getPDF - Retrieves PDF content from the current page.
pdfToText - Converts PDF content on the current page to readable text.
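
Several of the functions listed above are typically used together. The sketch below combines `newPage`, `goToPage`, `getMarkdown`, `screenshot`, and `close` into one routine that always closes the page, even if a step fails. `capturePage` and `makeStubBrowser` are hypothetical names introduced for illustration. The sketch assumes that `getMarkdown` and `screenshot` take no arguments and that every call can be awaited; neither is confirmed by this page. The browser object is passed in as a parameter so the routine can be tried against a stub without a running Codebolt session.

```javascript
// Hypothetical helper: opens a page, captures Markdown and a screenshot,
// and closes the page in a finally block so it is released on failure too.
async function capturePage(browser, url) {
  await browser.newPage();
  try {
    await browser.goToPage(url);
    const markdown = await browser.getMarkdown();   // assumed: no arguments
    const screenshot = await browser.screenshot();  // assumed: no arguments
    return { markdown, screenshot };
  } finally {
    await browser.close();
  }
}

// Stub standing in for codebolt.browser; it records the order of calls
// and returns canned values instead of driving a real browser.
function makeStubBrowser() {
  const calls = [];
  const record = (name, result) => async () => {
    calls.push(name);
    return result;
  };
  return {
    calls,
    newPage: record('newPage'),
    goToPage: record('goToPage'),
    getMarkdown: record('getMarkdown', '# Example Domain'),
    screenshot: record('screenshot', 'base64-image-data'),
    close: record('close'),
  };
}

// Demonstration against the stub
const demoBrowser = makeStubBrowser();
capturePage(demoBrowser, 'https://example.com').then((result) => {
  console.log(demoBrowser.calls.join(' -> '));
  console.log(result.markdown);
});
```

In a real session, `codebolt.browser` would be passed in place of the stub, after `codebolt.waitForConnection()` has resolved.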
