inference
codebolt.llm.inference(message: LLMInferenceParams): Promise<LLMResponse>
Sends an inference request to the LLM using the OpenAI message format, with support for tool calling. The model is selected based on the provided llmrole parameter. If no specific model is configured for that role, the request falls back to the current agent's default model, and finally to the application-wide default LLM.
Parameters
| Name | Type | Description |
|---|---|---|
| message | LLMInferenceParams | The inference parameters object containing messages, tools, and configuration options. |
| message.messages | Message[] | Array of conversation messages with roles ('user', 'assistant', 'tool', 'system') and content. |
| message.tools | Tool[] | Optional: Available tools for the model to use. Each tool has a type and a function definition. |
| message.tool_choice | string \| object | Optional: How the model should use tools. Can be 'auto', 'none', 'required', or an object specifying a particular function. |
| message.llmrole | string | The LLM role that determines which model handles the request. |
| message.max_tokens | number | Optional: Maximum number of tokens to generate in the response. |
| message.temperature | number | Optional: Sampling temperature (0.0 to 2.0). Higher values make output more random. |
| message.stream | boolean | Optional: Whether to stream the response. Defaults to false. |
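For reference, the request shape can be pictured as the following TypeScript sketch. This is an illustrative reconstruction from the table above, not the SDK's actual type declarations; field names follow the OpenAI message format.

// Illustrative sketch of the request shape described above
// (not the SDK's actual type declarations).
interface Message {
  role: 'user' | 'assistant' | 'tool' | 'system';
  content: string | null;
  tool_calls?: {
    id: string;
    type: 'function';
    function: { name: string; arguments: string }; // arguments is a JSON string
  }[];
  tool_call_id?: string; // required when role is 'tool'
}

interface Tool {
  type: 'function';
  function: { name: string; description?: string; parameters?: object };
}

interface LLMInferenceParams {
  messages: Message[];
  tools?: Tool[];
  tool_choice?: 'auto' | 'none' | 'required' | object;
  llmrole: string;
  max_tokens?: number;
  temperature?: number;
  stream?: boolean;
}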
Response Structure
The method returns a Promise that resolves to an LLMResponse object with the following properties:
- `type` (string): Always "llmResponse".
- `content` (string): The actual response content from the LLM.
- `role` (string): Always "assistant" for LLM responses.
- `model` (string, optional): The specific model used for the inference.
- `usage` (object, optional): Token usage information, including:
  - `prompt_tokens` (number): Number of tokens in the prompt.
  - `completion_tokens` (number): Number of tokens in the completion.
  - `total_tokens` (number): Total tokens used (prompt + completion).
- `finish_reason` (string, optional): Reason why the model stopped generating (e.g., "stop", "length", "tool_calls").
- `choices` (array, optional): Array of completion choices, each with a message and finish_reason.
- `success` (boolean, optional): Indicates whether the operation was successful.
- `message` (string, optional): Additional information about the response.
- `error` (string, optional): Error details if the operation failed.
- `messageId` (string, optional): Unique identifier for the message.
- `threadId` (string, optional): Thread identifier for conversation context.
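A minimal sketch of reading these fields, using only the properties documented above:

const res = await codebolt.llm.inference({
  messages: [{ role: 'user', content: 'Ping' }],
  llmrole: 'assistant'
});
console.log(res.content); // the assistant's reply
if (res.usage) {
  // Documented usage fields, useful for monitoring and billing
  console.log(`Tokens: ${res.usage.total_tokens} total ` +
    `(${res.usage.prompt_tokens} prompt, ${res.usage.completion_tokens} completion)`);
}
if (res.finish_reason === 'length') {
  console.warn("Response was truncated by max_tokens");
}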
Examples
// Example 1: Basic inference with simple message
const response = await codebolt.llm.inference({
messages: [
{
role: 'user',
content: 'Hello! This is a test message. Please respond with a simple greeting.'
}
],
llmrole: 'assistant'
});
console.log("Response:", response.content);
// Example 2: Multi-turn conversation
const conversationResponse = await codebolt.llm.inference({
messages: [
{
role: 'system',
content: 'You are a helpful coding assistant.'
},
{
role: 'user',
content: 'Write a simple JavaScript function that adds two numbers.'
}
],
llmrole: 'assistant',
max_tokens: 500,
temperature: 0.7
});
console.log("Code Response:", conversationResponse.content);
// Example 3: Using tools with the LLM
const toolResponse = await codebolt.llm.inference({
messages: [
{
role: 'user',
content: 'What is the weather like today?'
}
],
tools: [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get the current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'The city and state, e.g. San Francisco, CA'
}
},
required: ['location']
}
}
}
],
tool_choice: 'auto',
llmrole: 'assistant'
});
console.log("Tool Response:", toolResponse);
// Example 4: Forcing tool usage
const forcedToolResponse = await codebolt.llm.inference({
messages: [
{
role: 'user',
content: 'Calculate the sum of 25 and 17'
}
],
tools: [
{
type: 'function',
function: {
name: 'calculate',
description: 'Perform mathematical calculations',
parameters: {
type: 'object',
properties: {
operation: { type: 'string' },
numbers: { type: 'array', items: { type: 'number' } }
},
required: ['operation', 'numbers']
}
}
}
],
tool_choice: 'required',
llmrole: 'assistant'
});
console.log("Forced Tool Response:", forcedToolResponse);
// Example 5: Handling tool call responses
const toolCallResponse = await codebolt.llm.inference({
messages: [
{
role: 'user',
content: 'What files are in the current directory?'
},
{
role: 'assistant',
content: null,
tool_calls: [
{
id: 'call_123',
type: 'function',
function: {
name: 'list_files',
arguments: '{"path": "."}'
}
}
]
},
{
role: 'tool',
tool_call_id: 'call_123',
content: 'file1.txt, file2.js, folder1/'
}
],
llmrole: 'assistant'
});
console.log("Tool Call Response:", toolCallResponse.content);
// Example 6: Streaming response (if supported)
const streamResponse = await codebolt.llm.inference({
messages: [
{
role: 'user',
content: 'Write a detailed explanation of machine learning.'
}
],
llmrole: 'assistant',
stream: true,
max_tokens: 1000
});
console.log("Stream Response:", streamResponse);
// Example 7: Error handling
try {
const errorResponse = await codebolt.llm.inference({
messages: [
{
role: 'user',
content: 'Test message'
}
],
llmrole: 'invalid_role'
});
console.log("Response:", errorResponse);
} catch (error) {
console.error("Error:", error);
}
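Failures may also surface on the resolved object rather than as a thrown error; the response structure above documents success and error fields for this. A sketch combining both checks:

try {
  const res = await codebolt.llm.inference({
    messages: [{ role: 'user', content: 'Test message' }],
    llmrole: 'assistant'
  });
  if (res.success === false) {
    // Operation-level failure reported via the documented error field
    console.error("LLM error:", res.error);
  } else {
    console.log("Response:", res.content);
  }
} catch (error) {
  // Transport-level failure (network issues, invalid role, etc.)
  console.error("Request failed:", error);
}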
// Example 8: Complex conversation with system prompt
const complexResponse = await codebolt.llm.inference({
messages: [
{
role: 'system',
content: 'You are an expert software architect. Provide detailed technical explanations.'
},
{
role: 'user',
content: 'Explain the differences between microservices and monolithic architecture.'
}
],
llmrole: 'assistant',
temperature: 0.3,
max_tokens: 800
});
console.log("Complex Response:", complexResponse.content);
Common Use Cases
- Conversational AI: Build chatbots and interactive assistants
- Code Generation: Generate, review, and explain code
- Content Creation: Write articles, documentation, and creative content
- Tool Integration: Use LLMs with external tools and APIs
- Data Analysis: Analyze and interpret data with AI assistance
- Problem Solving: Get help with complex technical problems
Notes
- The `messages` array maintains conversation history and context
- `llmrole` determines which model variant is used for the request
- Tool integration allows LLMs to perform actions and access external data
- `temperature` controls response randomness (0.0 = deterministic, 2.0 = very random)
- `max_tokens` limits response length to manage costs and performance
- The response includes detailed usage information for monitoring and billing
- Error handling is important, as LLM requests can fail for various reasons
- System messages help define the AI's behavior and personality