Local (on-device)

What you can do with Local (on-device) through the harness, by feature family: the capability rows you can rely on (linked to vendor docs) and the parameters you can set for each. Configure parameters through the workflow surfaces described in the Workflow Schema — model/turn/budget fields on agent, provider knobs under harness_config.sdk_settings.local, and tool/sandbox policy at top level.

Core generation

Capabilities (4/4 rows usable):

local.llamacpp.openai_compat (llm.complete) — vendor docs
local.llamacpp.server (llm.complete) — vendor docs
local.ollama.generate (llm.complete) — vendor docs
local.ollama.openai_compat (llm.complete) — vendor docs

Parameters (7):

Parameter	Type	Default	Allowed	Risk	Notes
`base_url`	string	`"http://127.0.0.1:8080/v1"`	—	medium	docs
`model`	string	`"gemma-4-e2b-it"`	—	low	docs
`base_url`	string	`"http://127.0.0.1:8080"`	—	low	docs
`base_url`	string	`"http://127.0.0.1:11434"`	—	medium	docs
`model`	string	—	—	low	Discovered via local.ollama.tags.
`options.num_ctx`	number	`2048`	—	low	Context length window — model-dependent maximum.
`base_url`	string	`"http://127.0.0.1:11434/v1"`	—	medium	docs

llama-cpp-python

Capabilities (13/13 rows usable):

local.llamacpp.python.create_chat_completion (llm.chat) — vendor docs
local.llamacpp.python.create_completion (llm.complete) — vendor docs
local.llamacpp.python.create_embedding (llm.embed) — vendor docs
local.llamacpp.python.eval_sample_generate (llm.streaming) — vendor docs
local.llamacpp.python.from_pretrained (provider.models) — vendor docs
local.llamacpp.python.llama_cache_state (provider.slots) — vendor docs
local.llamacpp.python.llama_class (provider.models) — vendor docs
local.llamacpp.python.llama_grammar (llm.structured_output) — vendor docs
local.llamacpp.python.logits_processor (llm.sampling) — vendor docs
local.llamacpp.python.save_load_state (provider.slots) — vendor docs
local.llamacpp.python.server_module (provider.lifecycle) — vendor docs
local.llamacpp.python.stopping_criteria (llm.sampling) — vendor docs
local.llamacpp.python.tokenize_detokenize (llm.tokenize) — vendor docs

Parameters (13):

Parameter	Type	Default	Allowed	Risk	Notes
`create_chat_completion.messages`	array	—	—	low	docs
`create_completion.prompt`	string	—	—	low	docs
`create_embedding.input`	string	—	—	low	Requires an embedding GGUF (embedding=True).
`eval.tokens`	array	—	—	low	docs
`from_pretrained.repo_id`	string	—	—	medium	Live-probed with ggml-org/models tinyllamas/stories260K.gguf.
`set_cache`	object	—	—	low	docs
`Llama.model_path`	string	—	—	low	docs
`create_completion.grammar`	object	—	—	low	GBNF; live-probed constraining output to yes\|no.
`create_completion.logits_processor`	array	—	—	medium	docs
`save_state`	object	—	—	low	docs
`args.port`	number	—	—	medium	python -m llama_cpp.server; OpenAI-compatible completions live-probed.
`create_completion.stopping_criteria`	array	—	—	low	docs
`tokenize.text`	string	—	—	low	docs

llama.cpp CLI

Capabilities (4/4 rows usable):

local.llamacpp.bench (eval.benchmark) — vendor docs
local.llamacpp.cli (provider.cli) — vendor docs
local.llamacpp.model_acquisition_cache (provider.models) — vendor docs
local.llamacpp.perplexity (eval.perplexity) — vendor docs

Parameters (4):

Parameter	Type	Default	Allowed	Risk	Notes
`args.output_format`	string	`"json"`	—	low	docs
`args.prompt`	string	—	—	low	docs
`args.hf_repo`	string	—	—	medium	-hf fetch + —cache-list; live-probed with ggml-org/SmolVLM-256M-Instruct-GGUF.
`args.file`	string	—	—	low	docs

llama.cpp CLI Tools

Capabilities (15/20 rows usable):

local.llamacpp.cli.batched (eval.benchmark) — vendor docs
local.llamacpp.cli.batched_bench (eval.benchmark) — vendor docs
local.llamacpp.cli.embedding (llm.embed) — vendor docs
local.llamacpp.cli.gguf_inspect (provider.diagnostics) — vendor docs
local.llamacpp.cli.gguf_split (file.split_merge) — vendor docs
local.llamacpp.cli.imatrix (provider.quantization) — vendor docs
local.llamacpp.cli.lookahead (llm.speculative_decoding) — vendor docs
local.llamacpp.cli.mtmd_cli (media.multimodal) — vendor docs
local.llamacpp.cli.parallel (eval.benchmark) — vendor docs
local.llamacpp.cli.passkey (eval.benchmark) — vendor docs
local.llamacpp.cli.quantize (provider.quantization) — vendor docs
local.llamacpp.cli.retrieval (llm.rag) — vendor docs
local.llamacpp.cli.simple (llm.complete) — vendor docs
local.llamacpp.cli.simple_chat (llm.chat) — vendor docs
local.llamacpp.cli.tokenize (llm.tokenize) — vendor docs

Parameters (15):

Parameter	Type	Default	Allowed	Risk	Notes
`args.npl`	string	—	—	low	docs
`args.n_parallel`	number	—	—	low	docs
`args.embd_output_format`	string	`"json"`	—	low	docs
`args.mode`	string	`"r"`	—	low	docs
`args.split_max_size`	string	`"50M"`	—	low	Split + merge round trip live-probed.
`args.train_file`	string	—	—	low	docs
`args.n_predict`	number	—	—	low	docs
`args.image`	string	—	—	low	Vision CLI with -hf SmolVLM; live image description.
`args.n_sequences`	number	—	—	low	docs
`args.junk`	number	—	—	low	docs
`args.ftype`	string	—	—	medium	docs
`args.top_k`	number	—	—	low	docs
`args.ctx_size`	number	—	—	low	docs
`args.n_predict`	number	—	—	low	docs
`args.prompt`	string	—	—	low	docs

llama.cpp Evaluation

Capabilities (1/1 rows usable):

local.llamacpp.eval.choice_logit_tasks (eval.benchmark) · model-dependent — vendor docs

llama.cpp Quant Types

Capabilities (7/7 rows usable):

local.llamacpp.quant.bf16_f16_f32 (provider.quantization) — vendor docs
local.llamacpp.quant.iq_extreme (provider.quantization) — vendor docs
local.llamacpp.quant.iq2 (provider.quantization) — vendor docs
local.llamacpp.quant.iq3 (provider.quantization) — vendor docs
local.llamacpp.quant.iq4 (provider.quantization) — vendor docs
local.llamacpp.quant.k_quants (provider.quantization) — vendor docs
local.llamacpp.quant.legacy_q (provider.quantization) — vendor docs

Parameters (7):

Parameter	Type	Default	Allowed	Risk	Notes
`args.ftype`	string	`"f32"`	—	low	docs
`args.ftype`	string	`"iq1_s"`	—	medium	Requires —imatrix; live-probed with a probe-built imatrix.
`args.ftype`	string	`"iq2_xs"`	—	medium	Requires —imatrix; live-probed with a probe-built imatrix.
`args.ftype`	string	`"iq3_xxs"`	—	medium	Requires —imatrix; live-probed with a probe-built imatrix.
`args.ftype`	string	`"iq4_xs"`	—	low	docs
`args.ftype`	string	`"q4_K_M"`	—	low	docs
`args.ftype`	string	`"q4_0"`	—	low	docs

llama.cpp Runtime

Capabilities (4/7 rows usable):

local.llamacpp.runtime.cpu_memory (provider.runtime) — vendor docs
local.llamacpp.runtime.kv_cache_context (provider.context) — vendor docs
local.llamacpp.runtime.threads_batch (provider.runtime) — vendor docs
local.llamacpp.sampling_controls (llm.sampling) — vendor docs

Parameters (4):

Parameter	Type	Default	Allowed	Risk	Notes
`args.threads`	number	—	—	low	-t/—no-mmap/—mlock live boot + completion.
`args.ctx_size`	number	`2048`	—	low	-c with -ctk/-ctv q8_0; props echoes n_ctx.
`args.threads_batch`	number	—	—	low	system_info echoes n_threads_batch.
`sampling`	object	—	—	low	top_k/top_p/min_p/temperature/penalties echoed in generation_settings.

llama.cpp Server Anthropic Format

Capabilities (2/2 rows usable):

local.llamacpp.server.anthropic_count_tokens (anthropic.messages.count_tokens) — vendor docs
local.llamacpp.server.anthropic_messages (anthropic.messages) — vendor docs

Parameters (2):

Parameter	Type	Default	Allowed	Risk	Notes
`messages`	array	—	—	low	docs
`max_tokens`	number	`1024`	—	low	docs

llama.cpp Server Native

Capabilities (21/26 rows usable):

local.llamacpp.server.apply_template (llm.chat_template) — vendor docs
local.llamacpp.server.auth_tls (provider.auth) — vendor docs
local.llamacpp.server.completion_native (llm.completion) — vendor docs
local.llamacpp.server.detokenize (llm.tokenize) — vendor docs
local.llamacpp.server.embeddings_native (llm.embedding) — vendor docs
local.llamacpp.server.gpu_backend (provider.gpu_offload) — vendor docs
local.llamacpp.server.grammar (llm.structured_output) — vendor docs
local.llamacpp.server.health (provider.health) — vendor docs
local.llamacpp.server.infill (llm.fim) · model-dependent — vendor docs
local.llamacpp.server.lora_adapters (tuning.lora_runtime) · model-dependent — vendor docs
local.llamacpp.server.metrics (provider.metrics) — vendor docs
local.llamacpp.server.parallel_batching (provider.parallel_decoding) — vendor docs
local.llamacpp.server.props (provider.runtime_config) — vendor docs
local.llamacpp.server.props_post (provider.runtime_config) — vendor docs
local.llamacpp.server.reasoning (llm.reasoning) · model-dependent — vendor docs
local.llamacpp.server.reranking (llm.rerank) — vendor docs
local.llamacpp.server.slots (provider.slots) — vendor docs
local.llamacpp.server.slots_save_restore (provider.slots) — vendor docs
local.llamacpp.server.speculative (llm.speculative_decoding) — vendor docs
local.llamacpp.server.tokenize (llm.tokenize) — vendor docs
local.llamacpp.server.webui (docs.webui) — vendor docs

Parameters (20):

Parameter	Type	Default	Allowed	Risk	Notes
`messages`	array	—	—	low	docs
`args.api_key`	string	—	—	high	401 keyless / 200 bearer; HTTPS via —ssl-key-file/—ssl-cert-file live-probed.
`base_url`	string	`"http://127.0.0.1:8080"`	—	medium	docs
`n_predict`	number	`128`	—	low	docs
`tokens`	array	—	—	low	docs
`content`	string	—	—	low	Requires llama-server —embeddings.
`args.n_gpu_layers`	number	—	—	low	Metal backend on Apple Silicon; -ngl 99 live boot + completion.
`json_schema`	object	—	—	low	GBNF —grammar twin; schema-constrained decoding live-probed.
`base_url`	string	`"http://127.0.0.1:8080"`	—	low	docs
`base_url`	string	`"http://127.0.0.1:8080"`	—	low	Requires llama-server —metrics.
`messages.image_url`	object	—	—	low	Vision via mmproj; needs explicit —mmproj wiring on this build.
`args.n_parallel`	number	`1`	—	low	props total_slots reflects -np; concurrent completions live-probed.
`body`	object	—	—	medium	Mutates global server properties; requires llama-server —props.
`base_url`	string	`"http://127.0.0.1:8080"`	—	low	docs
`query`	string	—	—	low	Requires llama-server —reranking with an embedding/rerank model.
`filename`	string	—	—	medium	Requires llama-server —slot-save-path; writes slot KV cache to disk.
`base_url`	string	`"http://127.0.0.1:8080"`	—	low	Requires llama-server —slots.
`args.model_draft`	string	—	—	medium	timings.draft_n present in completions.
`content`	string	—	—	low	docs
`base_url`	string	`"http://127.0.0.1:8080"`	—	low	docs

llama.cpp Server OpenAI Format

Capabilities (2/2 rows usable):

local.llamacpp.server.responses (llm.responses) — vendor docs
local.llamacpp.server.tools (tool.function_calling) · model-dependent — vendor docs

Parameters (1):

Parameter	Type	Default	Allowed	Risk	Notes
`input`	string	—	—	low	docs

LocalAI Anthropic Format

Capabilities (1/1 rows usable):

local.localai.anthropic_messages (anthropic.messages) — vendor docs

LocalAI Backends

Capabilities (1/23 rows usable):

local.localai.backend.llamacpp (provider.backend) — vendor docs

LocalAI Galleries

Capabilities (2/6 rows usable):

local.localai.gallery_available (provider.gallery) — vendor docs
local.localai.gallery_jobs (provider.gallery) — vendor docs

LocalAI Ollama Format

Capabilities (1/1 rows usable):

local.localai.ollama_compat (llm.complete) — vendor docs

LocalAI OpenAI Format

Capabilities (3/16 rows usable):

local.localai.openai_chat (llm.chat) — vendor docs
local.localai.openai_completions (llm.completion) — vendor docs
local.localai.openai_models (provider.models) — vendor docs

MLX Apple Platform

Capabilities (2/2 rows usable):

local.mlx.platform.lazy_eval (provider.runtime) — vendor docs
local.mlx.platform.macos_unified_memory (provider.gpu_offload) — vendor docs

MLX Distributed

Capabilities (2/2 rows usable):

local.mlx.distributed.launch (provider.distributed_inference) — vendor docs
local.mlx.distributed.primitives (provider.distributed_inference) — vendor docs

MLX Examples

Capabilities (2/12 rows usable):

local.mlx.ex.bert (llm.embed) · model-dependent — vendor docs
local.mlx.ex.t5 (llm.complete) · model-dependent — vendor docs

MLX Fast Kernels

Capabilities (5/5 rows usable):

local.mlx.fast.metal_kernel (provider.runtime) — vendor docs
local.mlx.fast.quantized_matmul (provider.quantization) — vendor docs
local.mlx.fast.rms_norm (provider.runtime) — vendor docs
local.mlx.fast.rope (provider.runtime) — vendor docs
local.mlx.fast.scaled_dot_product_attention (provider.attention_backend) — vendor docs

MLX Optimizers

Capabilities (1/1 rows usable):

local.mlx.optimizers (tuning.training_reference) — vendor docs

MLX Profiling

Capabilities (3/3 rows usable):

local.mlx.metal.cache_limit (provider.gpu_offload) — vendor docs
local.mlx.metal.capture (provider.observability) — vendor docs
local.mlx.utils.tree (provider.runtime) — vendor docs

MLX-LM CLI

Capabilities (10/17 rows usable):

local.mlx.lm.cli_benchmark (eval.benchmark) — vendor docs
local.mlx.lm.cli_cache_prompt (provider.slots) — vendor docs
local.mlx.lm.cli_chat (llm.chat) — vendor docs
local.mlx.lm.cli_convert (provider.models) — vendor docs
local.mlx.lm.cli_fuse (tuning.lora_runtime) — vendor docs
local.mlx.lm.cli_generate (llm.complete) — vendor docs
local.mlx.lm.cli_lora (tuning.lora_runtime) — vendor docs
local.mlx.lm.cli_manage (provider.models) — vendor docs
local.mlx.lm.cli_perplexity (eval.perplexity) · model-dependent — vendor docs
local.mlx.lm.cli_server (llm.chat) — vendor docs

MLX-LM Python SDK

Capabilities (7/9 rows usable):

local.mlx.lm.kv_cache_quantized (provider.kv_cache) — vendor docs
local.mlx.lm.kv_cache_rotating (provider.kv_cache) — vendor docs
local.mlx.lm.py_generate (llm.complete) — vendor docs
local.mlx.lm.py_load (provider.models) — vendor docs
local.mlx.lm.py_prompt_cache (provider.slots) — vendor docs
local.mlx.lm.py_sample_utils (llm.sampling) — vendor docs
local.mlx.lm.py_stream_generate (llm.streaming) — vendor docs

MLX-LM Server

Capabilities (1/1 rows usable):

local.mlx.lm.speculative (llm.speculative_decoding) — vendor docs

Ollama Anthropic Format

Capabilities (1/3 rows usable):

local.ollama.anthropic_messages (anthropic.messages) — vendor docs

Parameters (1):

Parameter	Type	Default	Allowed	Risk	Notes
`max_tokens`	number	`1024`	—	low	docs

Ollama Blobs

Capabilities (2/2 rows usable):

local.ollama.api_blobs_head (file.exists) — vendor docs
local.ollama.api_blobs_post (file.upload) — vendor docs

Parameters (2):

Parameter	Type	Default	Allowed	Risk	Notes
`digest`	string	—	—	low	docs
`digest`	string	—	—	medium	Uploads content-addressed blobs (GGUF/adapter) to the local daemon.

Ollama CLI

Capabilities (6/7 rows usable):

local.ollama.cli_cp (provider.models) — vendor docs
local.ollama.cli_pull_rm_ls (provider.models) — vendor docs
local.ollama.cli_run (llm.chat) — vendor docs
local.ollama.cli_serve (provider.lifecycle) — vendor docs
local.ollama.cli_show (provider.admin.read) — vendor docs
local.ollama.cli_stop (provider.lifecycle) — vendor docs

Parameters (6):

Parameter	Type	Default	Allowed	Risk	Notes
`args.source_dest`	string	—	—	low	docs
`args.model`	string	—	—	medium	pull/rm/ls/create/cp management set.
`args.prompt`	string	—	—	low	docs
`env.OLLAMA_HOST`	string	`"127.0.0.1:11434"`	—	medium	docs
`args.inspect_flag`	string	—	—	low	—modelfile/—parameters/—template/—system/—license.
`args.model`	string	—	—	low	docs

Ollama Environment

Capabilities (13/15 rows usable):

local.ollama.env.context_length (provider.context) — vendor docs
local.ollama.env.debug (provider.observability) — vendor docs
local.ollama.env.flash_attention (provider.attention_backend) — vendor docs
local.ollama.env.gpu_overhead (provider.gpu_offload) — vendor docs
local.ollama.env.host (provider.connectivity) — vendor docs
local.ollama.env.keep_alive (provider.lifecycle) — vendor docs
local.ollama.env.kv_cache_type (provider.kv_cache) — vendor docs
local.ollama.env.max_loaded (provider.lifecycle) — vendor docs
local.ollama.env.max_queue (provider.parallel_decoding) — vendor docs
local.ollama.env.models_dir (file.manage) — vendor docs
local.ollama.env.num_parallel (provider.parallel_decoding) — vendor docs
local.ollama.env.origins (provider.connectivity) — vendor docs
local.ollama.env.sched_spread (provider.gpu_offload) — vendor docs

Parameters (13):

Parameter	Type	Default	Allowed	Risk	Notes
`env.OLLAMA_CONTEXT_LENGTH`	number	—	—	low	Live-probed: /api/ps reports the env-set context_length.
`env.OLLAMA_DEBUG`	boolean	—	—	low	Live-probed: DEBUG-level log lines under OLLAMA_DEBUG=1.
`env.OLLAMA_FLASH_ATTENTION`	boolean	—	—	low	Config echo + live generation under the flag.
`env.OLLAMA_GPU_OVERHEAD`	number	—	—	medium	Applied into scheduler config (startup config echo).
`env.OLLAMA_HOST`	string	`"127.0.0.1:11434"`	—	medium	docs
`env.OLLAMA_KEEP_ALIVE`	string	`"5m"`	—	low	Live-probed: model evicted from /api/ps after expiry.
`env.OLLAMA_KV_CACHE_TYPE`	string	—	—	medium	Config echo + live generation under q8_0.
`env.OLLAMA_MAX_LOADED_MODELS`	number	—	—	low	Applied into scheduler config (startup config echo).
`env.OLLAMA_MAX_QUEUE`	number	—	—	low	Applied into scheduler config (startup config echo).
`env.OLLAMA_MODELS`	string	—	—	medium	docs
`env.OLLAMA_NUM_PARALLEL`	number	—	—	low	Applied into scheduler config (startup config echo).
`env.OLLAMA_ORIGINS`	string	—	—	medium	CORS allowlist; live-probed 403 deny / 200 allow.
`env.OLLAMA_SCHED_SPREAD`	boolean	—	—	low	Applied into scheduler config (startup config echo).

Ollama Generate

Capabilities (6/6 rows usable):

local.ollama.chat (llm.chat) — vendor docs
local.ollama.generate_context (llm.state.continue) — vendor docs
local.ollama.generate_keep_alive (provider.lifecycle) — vendor docs
local.ollama.generate_options_full (llm.sampling) — vendor docs
local.ollama.generate_raw (llm.complete) — vendor docs
local.ollama.generate_suffix (llm.fim) · model-dependent — vendor docs

Parameters (5):

Parameter	Type	Default	Allowed	Risk	Notes
`messages`	array	—	—	low	docs
`context`	array	—	—	low	docs
`keep_alive`	string	`"5m"`	—	low	docs
`options`	object	—	—	low	Full sampler dict: temperature/top_p/top_k/repeat_penalty/seed/num_ctx/num_predict (live-probed).
`raw`	boolean	`false`	—	medium	Bypasses the model prompt template.

Ollama Model Management

Capabilities (11/11 rows usable):

local.ollama.api_create_adapters (tuning.lora_runtime) · model-dependent — vendor docs
local.ollama.api_create_quantize (provider.quantization) — vendor docs
local.ollama.api_create_safetensors (provider.models) · model-dependent — vendor docs
local.ollama.api_show (provider.admin.read) — vendor docs
local.ollama.copy (provider.models) — vendor docs
local.ollama.create (provider.models) — vendor docs
local.ollama.delete (provider.models) — vendor docs
local.ollama.ps (provider.lifecycle) — vendor docs
local.ollama.pull (provider.models) — vendor docs
local.ollama.push (provider.models) · model-dependent — vendor docs
local.ollama.tags (provider.models) — vendor docs

Parameters (8):

Parameter	Type	Default	Allowed	Risk	Notes
`quantize`	string	`"q4_K_M"`	—	medium	Requires an F16/F32 source tag; live-probed from smollm:135m-instruct-v0.2-fp16.
`model`	string	—	—	low	docs
`destination`	string	—	—	medium	docs
`from`	string	—	—	medium	Medium risk: writes a new model manifest to the local store.
`model`	string	—	—	high	High risk: destructive — removes a model from the local store.
`base_url`	string	`"http://127.0.0.1:11434"`	—	low	docs
`model`	string	—	—	medium	Medium risk: downloads model layers to local disk.
`base_url`	string	`"http://127.0.0.1:11434"`	—	low	docs

Ollama Modelfile

Capabilities (2/2 rows usable):

local.ollama.context_length (provider.context) — vendor docs
local.ollama.modelfile (provider.modelfile) — vendor docs

Parameters (2):

Parameter	Type	Default	Allowed	Risk	Notes
`options.num_ctx`	number	`4096`	—	low	docs
`modelfile`	object	—	—	medium	from/system/parameters/template create fields; live-probed via /api/create + /api/show.

Ollama OpenAI Format

Capabilities (6/6 rows usable):

local.ollama.openai_chat (llm.chat) — vendor docs
local.ollama.openai_completions (llm.completions) — vendor docs
local.ollama.openai_embeddings (llm.embeddings) — vendor docs
local.ollama.openai_images (media.image_generation) · model-dependent — vendor docs
local.ollama.openai_models (provider.models) — vendor docs
local.ollama.openai_responses (llm.responses) — vendor docs

Parameters (5):

Parameter	Type	Default	Allowed	Risk	Notes
`messages`	array	—	—	low	docs
`prompt`	string	—	—	low	docs
`input`	string	—	—	low	Requires an embedding model (e.g. nomic-embed-text).
`base_url`	string	`"http://127.0.0.1:11434/v1"`	—	low	docs
`input`	string	—	—	low	docs

Ollama Server

Capabilities (1/1 rows usable):

local.ollama.api_version (provider.health) — vendor docs

Parameters (1):

Parameter	Type	Default	Allowed	Risk	Notes
`base_url`	string	`"http://127.0.0.1:11434"`	—	low	docs

Ollama Streaming

Capabilities (1/1 rows usable):

local.ollama.streaming (llm.streaming) — vendor docs

Parameters (1):

Parameter	Type	Default	Allowed	Risk	Notes
`stream`	boolean	`true`	—	low	docs

Ollama Structured Output

Capabilities (1/1 rows usable):

local.ollama.structured_outputs (llm.structured_output) — vendor docs

Parameters (1):

Parameter	Type	Default	Allowed	Risk	Notes
`format`	object	—	—	low	JSON-schema constrained decoding; live-probed with a required-name object schema.

Ollama Thinking

Capabilities (1/1 rows usable):

local.ollama.thinking (llm.thinking) · model-dependent — vendor docs

Ollama Vision

Capabilities (2/2 rows usable):

local.ollama.generate_image_input (media.image_input) · model-dependent — vendor docs
local.ollama.vision (media.image_input) · model-dependent — vendor docs

ollama-js SDK

Capabilities (3/5 rows usable):

local.ollama.js_abort_method (llm.cancel) — vendor docs
local.ollama.js_async_iterator (llm.streaming) — vendor docs
local.ollama.js_client_class (provider.admin.read) — vendor docs

Parameters (3):

Parameter	Type	Default	Allowed	Risk	Notes
`abort`	object	—	—	low	AbortError interrupts in-flight streamed generation (live-probed).
`chat.stream`	boolean	`true`	—	low	docs
`Ollama.host`	string	—	—	low	docs

ollama-python SDK

Capabilities (10/11 rows usable):

local.ollama.python_async_client (provider.admin.read) — vendor docs
local.ollama.python_chat_method (llm.chat) — vendor docs
local.ollama.python_client_class (provider.admin.read) — vendor docs
local.ollama.python_copy_delete (provider.models) — vendor docs
local.ollama.python_create_modelfile (provider.models) — vendor docs
local.ollama.python_embed_method (llm.embed) — vendor docs
local.ollama.python_generate_method (llm.complete) — vendor docs
local.ollama.python_list_method (provider.models) — vendor docs
local.ollama.python_ps_method (provider.lifecycle) — vendor docs
local.ollama.python_show_method (provider.admin.read) — vendor docs

Parameters (10):

Parameter	Type	Default	Allowed	Risk	Notes
`AsyncClient.host`	string	—	—	low	docs
`chat.messages`	array	—	—	low	docs
`Client.host`	string	—	—	low	docs
`copy.source_dest`	string	—	—	medium	docs
`create.from_`	string	—	—	medium	docs
`embed.input`	string	—	—	low	docs
`generate.prompt`	string	—	—	low	docs
`list`	object	—	—	low	docs
`ps`	object	—	—	low	docs
`show.model`	string	—	—	low	docs

Retrieval/files/embeddings

Capabilities (4/4 rows usable):

local.ollama.embed_dimensions (llm.embed) — vendor docs
local.ollama.embed_truncate (llm.embed) — vendor docs
local.ollama.embeddings (llm.embed) — vendor docs
local.ollama.embeddings_legacy (llm.embed) — vendor docs

Parameters (4):

Parameter	Type	Default	Allowed	Risk	Notes
`dimensions`	number	—	—	low	Matryoshka truncation; honored by nomic-embed-text (live-probed at 64).
`truncate`	boolean	`true`	—	low	docs
`prompt`	string	—	—	low	docs
`input`	array	—	—	low	Requires an embedding model (e.g. nomic-embed-text); generation runners refuse embedding requests.

Tools

Capabilities (1/1 rows usable):

local.ollama.tools (tool.call) · model-dependent — vendor docs

​Core generation

​llama-cpp-python

​llama.cpp CLI

​llama.cpp CLI Tools

​llama.cpp Evaluation

​llama.cpp Quant Types

​llama.cpp Runtime

​llama.cpp Server Anthropic Format

​llama.cpp Server Native

​llama.cpp Server OpenAI Format

​LocalAI Anthropic Format

​LocalAI Backends

​LocalAI Galleries

​LocalAI Ollama Format

​LocalAI OpenAI Format

​MLX Apple Platform

​MLX Distributed

​MLX Examples

​MLX Fast Kernels

​MLX Optimizers

​MLX Profiling

​MLX-LM CLI

​MLX-LM Python SDK

​MLX-LM Server

​Ollama Anthropic Format

​Ollama Blobs

​Ollama CLI

​Ollama Environment

​Ollama Generate

​Ollama Model Management

​Ollama Modelfile

​Ollama OpenAI Format

​Ollama Server

​Ollama Streaming

​Ollama Structured Output

​Ollama Thinking

​Ollama Vision

​ollama-js SDK

​ollama-python SDK

​Retrieval/files/embeddings

​Tools

Core generation

llama-cpp-python

llama.cpp CLI

llama.cpp CLI Tools

llama.cpp Evaluation

llama.cpp Quant Types

llama.cpp Runtime

llama.cpp Server Anthropic Format

llama.cpp Server Native

llama.cpp Server OpenAI Format

LocalAI Anthropic Format

LocalAI Backends

LocalAI Galleries

LocalAI Ollama Format

LocalAI OpenAI Format

MLX Apple Platform

MLX Distributed

MLX Examples

MLX Fast Kernels

MLX Optimizers

MLX Profiling

MLX-LM CLI

MLX-LM Python SDK

MLX-LM Server

Ollama Anthropic Format

Ollama Blobs

Ollama CLI

Ollama Environment

Ollama Generate

Ollama Model Management

Ollama Modelfile

Ollama OpenAI Format

Ollama Server

Ollama Streaming

Ollama Structured Output

Ollama Thinking

Ollama Vision

ollama-js SDK

ollama-python SDK

Retrieval/files/embeddings

Tools