Configuration¶
Sieve reads its configuration from a single YAML file. After sieve init, that file lives at ~/.sieve/sieve.yaml. You can override the path with sieve start --config <path> or by setting SIEVE_CONFIG.
The shipping example — with commentary on every option — is sieve.example.yaml. This page documents the same options with more context.
File lookup order¶
sieve start (and sieve status, etc.) resolves the config in this order:
1. --config flag on the command line
2. SIEVE_CONFIG environment variable
3. ./sieve.yaml in the current working directory
4. ~/.sieve/sieve.yaml
The first match wins; nothing is merged.
Editing via the CLI¶
You rarely need to open sieve.yaml in a text editor. The sieve config commands cover the common cases:
```shell
sieve config show                                    # current values (non-default highlighted)
sieve config set listen.port 11436                   # coerced + validated before write
sieve config set provider.base_url http://host:PORT
sieve config edit                                    # open in $EDITOR, rolls back invalid YAML
sieve config reset                                   # ship defaults; preserves provider URL + store path
```
sieve config set whitelists the keys that can be set this way — see the CLI reference for the full list and validation rules.
Reference¶
Every top-level section below maps to a block in sieve.yaml. Defaults shown are what sieve init writes.
listen¶
The proxy's HTTP listener.
| Key | Default | Notes |
|---|---|---|
| host | 127.0.0.1 | Loopback only by default. Set to 0.0.0.0 to accept from the LAN — do this only if you have another layer enforcing authentication. |
| port | 11435 | Deliberately adjacent to Ollama's 11434 to make intent obvious. |
Override either at startup without editing the file: sieve start --host 0.0.0.0 --port 11500.
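The same settings in the file itself (values shown are the shipped defaults; change host only with authentication in place):

```yaml
listen:
  host: 127.0.0.1   # default; 0.0.0.0 exposes the listener to the LAN
  port: 11435
```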
provider¶
The upstream LLM endpoint.
```yaml
provider:
  type: auto
  base_url: http://127.0.0.1:11434
  default_model: qwen3.5:9b
  options:
    think: false
```
| Key | Default | Notes |
|---|---|---|
| type | auto | Leave as auto — Sieve detects the wire protocol (Ollama vs OpenAI-compatible) from the endpoint's responses. |
| base_url | http://127.0.0.1:11434 | Where to forward requests. Any OpenAI-compatible endpoint or Ollama server. |
| default_model | qwen3.5:9b | Used when the inbound request does not pin a model, and for internal prompts (classification, writer, etc.) when writer.model is auto. |
| options.think | false | Sent to model families that support a "thinking" mode (Qwen, DeepSeek). Leave off for Gemma, Mistral, and other families that do not understand the flag. |
embeddings¶
The embedding backend used for vector retrieval.
| Key | Default | Notes |
|---|---|---|
| provider | fastembed | In-process ONNX Runtime using BAAI/bge-small-en-v1.5 (384-dim, ~50 MB). Auto-downloaded and cached by FastEmbed. |
| ollama_url | — | Only consulted when provider: ollama. Base URL of the Ollama server to call for embeddings. |
| ollama_model | — | Only consulted when provider: ollama. Embedding model name, e.g. nomic-embed-text-v2-moe. |
To switch to an Ollama-hosted embedding pipeline:
```yaml
embeddings:
  provider: ollama
  ollama_url: http://127.0.0.1:11434
  ollama_model: nomic-embed-text-v2-moe
```
The FastEmbed default is the recommended path. Use Ollama only if you already operate an embedding pipeline there and want to consolidate.
store¶
The encrypted memory store.
| Key | Default | Notes |
|---|---|---|
| path | ~/.sieve/memory.db | SQLCipher-encrypted SQLite database. The keyfile is written next to it on first init. Back both up together. |
| embedding_model | — | Only used when embeddings.provider: ollama. Records which model produced the stored vectors so incompatible swaps are rejected. |
| embedding_dimensions | — | Only used when embeddings.provider: ollama. Dimensionality of the stored vectors. Must match the model. |
Under FastEmbed the dimensions and model name are fixed; the two Ollama-only keys are ignored.
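An illustrative store block for an Ollama-backed embedding setup (the model name and dimension count here are example values; check your embedding model's actual output dimensionality):

```yaml
store:
  path: ~/.sieve/memory.db
  # Only meaningful with embeddings.provider: ollama
  embedding_model: nomic-embed-text-v2-moe
  embedding_dimensions: 768
```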
pipeline¶
Retrieval pipeline shape.
| Key | Default | Notes |
|---|---|---|
| conversation_turns | 3 | Recent turns preserved verbatim in the lean payload. Anything older is compressed or dropped. |
| max_rounds | 5 | Upper bound on multi-hop retrieval rounds per request. |
| core_facts_size | 30 | Number of always-on "core" facts included in every lean payload. |
| context_format | auto | How retrieved context is formatted for the upstream model. Leave as auto. |
profile_owner¶
The canonical identity for the conversation. Pinned into fact extraction and used by the ghost-fact validator to reject fabrications.
| Key | Default | Notes |
|---|---|---|
| name | "Jamie Rivera" | Canonical display name. Change this to the actual user's name for a real deployment. |
| aliases | list of common pronouns | Tokens the extractor should resolve back to name when it sees them in first-person text. |
For a single-user personal setup, set name to the user's name; the defaults for aliases are usually fine.
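A single-user block might look like this. The name is hypothetical, and the alias list is an assumption for illustration only; the shipped defaults in sieve.example.yaml are usually fine:

```yaml
profile_owner:
  name: "Avery Chen"          # hypothetical user
  aliases: ["I", "me", "my"]  # illustrative; see sieve.example.yaml for the shipped list
```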
writer¶
The Stage-2 fact extractor. Runs after a turn completes to distil durable facts into the store.
| Key | Default | Notes |
|---|---|---|
| model | auto | auto routes extraction to provider.default_model — no second model to load. Override with an explicit model name to pin a dedicated writer. |
| fallback_model | auto | Used when the primary writer call fails. auto means the same as model. |
| num_ctx | 4096 | Context window allocated to the writer call. |
| ghost_validator_enabled | true | Post-extraction validator that rejects facts unsupported by the turn text. Keep on. |
retrieval¶
Retrieval-side knobs.
| Key | Default | Notes |
|---|---|---|
| temporal_dedup | true | Collapses near-duplicate facts that disagree only on timestamp, keeping the most recent. Keep on unless you are debugging retrieval. |
Note
The shipping sieve.example.yaml uses the key temporal_dedup_enabled, which the loader does not read — it falls back to the default. Use temporal_dedup (no suffix) to actually override the value. This will be unified in a future release.
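To disable dedup while debugging retrieval, use the suffix-free key:

```yaml
retrieval:
  temporal_dedup: false   # note: temporal_dedup, not temporal_dedup_enabled
```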
tools¶
Tool-schema compression — the largest single win on agent payloads.
```yaml
tools:
  enabled: true
  compression: moderate
  l1_threshold: 0.5
  fallback_include_all: true
  max_tools_injected: 10
```
| Key | Default | Notes |
|---|---|---|
| enabled | true | Master switch. Turn off only to measure the uncompressed baseline. |
| compression | moderate | Shape of the compression — moderate strips schemas and keeps signatures; other levels are reserved for future use. |
| l1_threshold | 0.5 | Lexical-match threshold for the first-pass tool selector. Lower means more tools considered. |
| fallback_include_all | true | If selection fails, forward all tools rather than dropping the request. |
| max_tools_injected | 10 | Hard cap on tools surfaced back to the model per turn. |
learning¶
Adaptive tuning loop.
| Key | Default | Notes |
|---|---|---|
| tune_interval | 50 | Turns between re-tuning passes. |
| relevance_threshold | 0.7 | Minimum relevance for a retrieved fact to stay in the lean payload. |
| core_facts_size | 30 | Size of the core-facts pool maintained by the learning loop. Mirrors pipeline.core_facts_size. |
security¶
Proxy-level access control.
| Key | Default | Notes |
|---|---|---|
| auth_token | null | When set, requests to management endpoints must present it as a token (see "Exposing Sieve to another machine" below for the exact header). Leave null for single-user local setups. Evaluation runs require null. |
| allowed_origins | ["127.0.0.1"] | CORS allow-list. |
If you expose Sieve beyond loopback, set auth_token to a random secret and change listen.host to 0.0.0.0 — in that order.
ablation¶
Per-subsystem on/off switches. Exposed so you can reproduce ablation measurements and diagnose regressions. The shipping defaults are what was evaluated — only change these if you are actively running an experiment.
```yaml
ablation:
  fingerprinting: true
  classifier: true
  pre_populate: true
  graph_traversal: true
  temporal_versioning: true
  learning_loop: true
  coherence_integrity: true
  stage2_writer: true
  recall_tool: true
  absence_signal: true
  closed_world: false
  response_verification: false
  schema_v2: false
  tier2_classifier: false
  extreme_summary: true
```
The most consequential flags:
- absence_signal (on). Refuses to fabricate when a recall query targets a fact not in the store. Responsible for the hallucination-reduction numbers.
- stage2_writer (on). Runs the fact extractor after each turn. Without this the store never grows.
- extreme_summary (on). Narrative summariser for long conversations.
- closed_world (off). An earlier, stricter absence posture. Permanently off — superseded by absence_signal.
- response_verification (off). Pattern-based output check. Disabled pending pattern coverage.
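To reproduce a single ablation measurement, flip only the flag under test and leave everything else at the shipped defaults, e.g.:

```yaml
ablation:
  absence_signal: false   # measure the no-absence-signal baseline
```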
Common setups¶
Local Ollama (default)¶
The sieve init defaults cover this case. No edits required.
Any OpenAI-compatible endpoint¶
Change provider.base_url to the endpoint and provider.default_model to the model name:
```yaml
provider:
  type: auto
  base_url: https://your-openai-compatible-host/v1
  default_model: your-model-name
```
The specific URL, model name, and authentication method depend on your endpoint. Authentication headers (Authorization: Bearer ..., etc.) are set on the agent-side client and forwarded by Sieve; Sieve itself has no provider credentials.
Exposing Sieve to another machine¶
```yaml
listen:
  host: 0.0.0.0
  port: 11435
security:
  auth_token: "<long-random-string>"
  allowed_origins: ["10.0.0.0/8"]
```
Management endpoints under /sieve/* now require an X-Sieve-Token: <your-token> header. Proxy pass-through endpoints (/api/*, /v1/*) remain unauthenticated so your agent does not need Sieve-specific credentials — rely on network-level restrictions (loopback, LAN segmentation, a reverse proxy with TLS) for those. Do not expose Sieve directly to the public internet.
Applying changes¶
sieve start reads the config once at startup; there is no live reload. To apply a change, edit the file (or use sieve config set) and restart the proxy. For interactive tuning, watch sieve status and the per-response X-Sieve-* headers after each restart.