# Generate a Chat Completion

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled by setting `"stream": false`. The final response object includes statistics and additional data from the request.

## Endpoint

```shell
POST /api/chat
```

## Parameters

| Parameter  | Type   | Description                                                  | Required |
| ---------- | ------ | ------------------------------------------------------------ | -------- |
| `model`    | string | The model name                                               | Yes      |
| `messages` | array  | The messages of the chat, used to keep chat memory           | Yes      |
| `tools`    | array  | A list of tools, in JSON, for the model to use if supported  | No       |
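The table above doesn't spell out the tool schema. As a sketch, Ollama's tool support follows the OpenAI-style function-definition format, so an entry in `tools` might look like the following (the `get_current_weather` function and its parameters are purely illustrative):

```json
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The location to get the weather for"
        }
      },
      "required": ["location"]
    }
  }
}
```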

### Message Object

| Field        | Type   | Description                                                         | Required |
| ------------ | ------ | ------------------------------------------------------------------- | -------- |
| `role`       | string | The role of the message: `system`, `user`, `assistant`, or `tool`   | Yes      |
| `content`    | string | The content of the message                                          | Yes      |
| `images`     | array  | A list of images to include in the message (for multimodal models)  | No       |
| `tool_calls` | array  | A list of tools, in JSON, that the model wants to use               | No       |
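For example, a user message attaching an image for a multimodal model might look like this; each entry in `images` is a base64-encoded image, truncated here for brevity:

```json
{
  "role": "user",
  "content": "what is in this image?",
  "images": ["iVBORw0KGgoAAAANSUhEUgAA..."]
}
```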

### Advanced Parameters (Optional)

| Parameter    | Type          | Description                                                                          |
| ------------ | ------------- | ------------------------------------------------------------------------------------ |
| `format`     | string/object | The format to return a response in. Can be `json` or a JSON schema                   |
| `options`    | object        | Additional model parameters (`temperature`, etc.)                                    |
| `stream`     | boolean       | If `false`, the response will be returned as a single object rather than a stream    |
| `keep_alive` | string        | Controls how long the model will stay loaded in memory (default: `5m`)               |
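Putting these together, the request below asks for JSON output, pins `temperature` to 0 for more deterministic replies, and keeps the model loaded for ten minutes after the call; the values are illustrative:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue? Respond using JSON"
    }
  ],
  "format": "json",
  "options": {
    "temperature": 0
  },
  "keep_alive": "10m"
}'
```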

## Response

The API returns a stream of JSON objects by default. Each object contains:

| Field        | Type    | Description                                   |
| ------------ | ------- | --------------------------------------------- |
| `model`      | string  | The model name                                |
| `created_at` | string  | The timestamp when the response was created   |
| `message`    | object  | Contains role, content, and other message data |
| `done`       | boolean | Whether the generation is complete            |

The final response in the stream includes additional statistics:

| Field                  | Type   | Description                                   |
| ---------------------- | ------ | --------------------------------------------- |
| `total_duration`       | number | Total time spent on the request (nanoseconds) |
| `load_duration`        | number | Time spent loading the model (nanoseconds)    |
| `prompt_eval_count`    | number | Number of tokens in the prompt                |
| `prompt_eval_duration` | number | Time spent evaluating the prompt (nanoseconds) |
| `eval_count`           | number | Number of tokens in the response              |
| `eval_duration`        | number | Time spent generating the response (nanoseconds) |
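Since the durations are reported in nanoseconds, generation throughput can be computed directly from the final object: tokens per second ≈ `eval_count` / `eval_duration` × 10⁹. For the example response below, that works out to 282 / 4535599000 × 10⁹ ≈ 62 tokens per second.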

## Examples

### Chat Request (Streaming)

#### Request

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "due to rayleigh scattering."
    },
    {
      "role": "user",
      "content": "how is that different than mie scattering?"
    }
  ]
}'
```

#### Response

A stream of JSON objects is returned:

```json
{
  "model": "llama3.2",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "message": {
    "role": "assistant",
    "content": "The",
    "images": null
  },
  "done": false
}
```

Subsequent chunks follow the same shape, each carrying the next fragment of the message. The final object has `"done": true` and includes the statistics:

```json
{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}
```
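### Chat Request (No Streaming)

For comparison, setting `"stream": false` returns the whole reply in a single object that combines the complete `message` with the final statistics. The sketch below shows the shape; the content and timing values in the response are placeholders, not real output.

#### Request

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "stream": false
}'
```

#### Response

```json
{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "message": {
    "role": "assistant",
    "content": "The sky is blue because of Rayleigh scattering..."
  },
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}
```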