Gent API Documentation

Activity Timeline

A per-contact CRM feed — a chronological view of messages, delivery status events, ticket workflow changes, spam complaints, and calendar events associated with one contact. Query it with a stable contact_id when available, or an email address when the contact is not linked yet. This endpoint does not return inbox-wide activity. Requires context:read.

Scope: context:read required for token auth.
GET /v1/activity/?inbox_id=&contact_id= Per-contact activity feed  scope: context:read
{ "events": [ { "id": "evt_01HX9K2NPM", "inbox_id": "<inbox-id>", "contact_address": "alice@example.com", "contact_id": "cnt_01HX...", "type": "message.sent", "resource_id": "msg_01HX9K2NPMQWERTY", "participants": ["alice@example.com"], "summary": "Re: Project Alpha timeline", "occurred_at": "2026-05-06T14:23:00Z", "metadata": { "thread_id": "thread-xyz", "has_attachments": false, "preview": "Looks good — let's sync Friday.", "is_read": true } }, { "id": "evt_01HX9K3QRS", "inbox_id": "<inbox-id>", "contact_address": "alice@example.com", "contact_id": "cnt_01HX...", "type": "message.bounced", "resource_id": "event_01HX9K3QRS", "participants": null, "summary": "Delivery failed: 550 5.1.1 No such user", "occurred_at": "2026-05-06T14:24:10Z", "metadata": { "smtp_reply": "550 5.1.1 No such user" } }, { "id": "evt_01HX9K4TUV", "inbox_id": "<inbox-id>", "contact_address": "alice@example.com", "contact_id": "cnt_01HX...", "type": "calendar_event.created", "resource_id": "cal_01HX9K4TUV", "participants": null, "summary": "Q2 Planning Session", "occurred_at": "2026-05-07T09:00:00Z", "metadata": { "start": "2026-05-07T09:00:00Z", "duration": "PT1H", "time_zone": "America/New_York", "location": "Zoom" } } ], "cursor": "eyJwb3MiOjUwfQ..." }
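Field | Type | Required | Description
events | array | optional | Activity entries for the contact, newest first.
cursor | string | optional | Opaque cursor for the next page; null on the last page.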
Activity timeline query params
Contact filter required: Pass either contact_id or address. Prefer contact_id because it is stable across email address changes and future channel identities. Use address only for address-only history before a contact is linked.
Pagination: Results are returned newest-first. Pass the cursor value from the previous response as the cursor query param to fetch the next page. When cursor is null the last page has been reached. The since filter is honoured on every page — passing both cursor and since is safe and will not skip entries.
Entry types and the metadata keys each one carries (the metadata field is always present, as {} when not applicable):
  • message.sent · message.received: thread_id, has_attachments, preview, is_read
  • message.delivered · message.bounced: smtp_reply, bounce_category
  • message.spam_complaint: feedback_type
  • calendar_event.created: start, duration, time_zone, location
Deduplication & retention: Each entry has a deterministic ID derived from inbox_id + contact_address + type + resource_id — re-processing the same event is always idempotent. Entries are automatically pruned after 2 years (TTL on the occurred_at timestamp).
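The following minimal Python sketch walks the whole feed under the pagination rules above. It follows cursor until it is null and drops any entry whose id was already seen, which is safe because IDs are deterministic. The transport is left to the caller: fetch_json is a hypothetical function that prepends your API base URL, adds authentication, and returns the decoded JSON. Requiring exactly one of contact_id and address is this sketch's own choice. The since value is passed through as an opaque string because its format is not specified here.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Hypothetical caller-supplied transport: takes an API path such as
# "/v1/activity/?inbox_id=...", adds base URL and auth, returns decoded JSON.
FetchJson = Callable[[str], dict]


def activity_path(inbox_id: str, *, contact_id: Optional[str] = None,
                  address: Optional[str] = None, since: Optional[str] = None,
                  cursor: Optional[str] = None) -> str:
    """Build the activity query path; this sketch requires exactly one contact filter."""
    if (contact_id is None) == (address is None):
        raise ValueError("pass exactly one of contact_id or address")
    params = {"inbox_id": inbox_id}
    if contact_id is not None:
        params["contact_id"] = contact_id
    else:
        params["address"] = address
    if since is not None:
        params["since"] = since  # honoured on every page, so safe with cursor
    if cursor is not None:
        params["cursor"] = cursor
    return "/v1/activity/?" + urlencode(params)


def iter_activity(fetch_json: FetchJson, inbox_id: str, **filters) -> Iterator[dict]:
    """Yield every entry newest-first, following cursors until cursor is null."""
    seen = set()
    cursor = None
    while True:
        page = fetch_json(activity_path(inbox_id, cursor=cursor, **filters))
        for event in page.get("events", []):
            # Entry IDs are deterministic, so skipping repeats is safe.
            if event["id"] not in seen:
                seen.add(event["id"])
                yield event
        cursor = page.get("cursor")
        if cursor is None:
            return
```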
Relationships

A graph-based view of communication strength between contacts. Nodes are contacts; edge weight is derived from interaction frequency and recency. The full graph shows all contacts for an inbox; the subgraph endpoint centres on one contact. Read-only. Requires context:read.

Scope: context:read required for token auth.
GET /v1/relationships/?inbox_id= Full relationship graph  scope: context:read
{ "nodes": [ { "id": "cnt_01", "email": "alice@example.com", "contact_id": "cnt_01", "inbox_id": "<inbox-id>", "email_sent_count": 14, "email_received_count": 8, "email_first_at": "2025-11-01T09:00:00Z", "email_last_at": "2026-05-20T14:00:00Z", "last_sent_at": "2026-05-20T14:00:00Z", "last_received_at": "2026-05-19T10:30:00Z", "last_contact_at": "2026-05-20T14:00:00Z", "intelligence": { "behavioral": { "weight": 0.82, "outbound_strength": 0.71, "inbound_strength": 0.93, "communication_pattern": "initiator", "avg_response_time_hours": 4.2, "preferred_contact_hours": [9, 10, 14], "reply_rate": 0.85, "days_since_contact": 3, "last_subject": "Re: Q2 proposal", "thread_count": 12, "bounce_detected": false, "relationship_health": "strong" }, "llm": null, "labels": ["vip", "client"], "dimensions": {} } } ], "edges": [ { "source": "alice@example.com", "target": "bob@example.com", "types": ["shared_thread", "same_domain", "same_company"], "shared_thread_count": 7, "shared_calendar_count": 0, "weight": 0.8341, "last_seen_at": "2026-05-20T14:00:00+00:00" } ] }
Field | Type | Required | Description
outbound_strength | number | optional | How actively the inbox owner emails this contact.
inbound_strength | number | optional | How actively this contact emails the inbox owner.
communication_pattern | string | optional | One of: initiator, responder, balanced.
avg_response_time_hours | number | optional | Running mean (Welford's algorithm); null if no data.
preferred_contact_hours | array | optional | Top 3 UTC hours by frequency.
reply_rate | number | optional | Replies divided by received messages; null if no received messages.
days_since_contact | number | optional | Whole days since last contact; null if no contact yet.
relationship_health | string | optional | One of: new, strong, at_risk, dormant.
llm | object or null | optional | Populated when LLM enrichment is enabled.
labels | array of strings | optional | Contact labels from context groups.
dimensions | object | optional | External enrichment (e.g. LinkedIn).
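The table describes avg_response_time_hours as a Welford running average. The running-mean half of Welford's algorithm updates the mean from each new sample without storing earlier samples. The minimal sketch below shows only the update rule. How the service samples response times is not documented here, and RunningMean is a hypothetical name.

```python
from typing import Optional


class RunningMean:
    """Running mean using Welford's update; no past samples are stored."""

    def __init__(self) -> None:
        self.count = 0
        self.mean: Optional[float] = None  # None until the first sample ("null if no data")

    def add(self, value: float) -> None:
        self.count += 1
        if self.mean is None:
            self.mean = float(value)
        else:
            # mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
            self.mean += (value - self.mean) / self.count
```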
GET /v1/relationships/{contact_id_or_address}/?inbox_id= Subgraph for one contact  scope: context:read
{ "nodes": [ { "id": "cnt_01", "email": "alice@example.com", "contact_id": "cnt_01", "inbox_id": "<inbox-id>", "email_sent_count": 14, "email_received_count": 8, "email_first_at": "2025-11-01T09:00:00Z", "email_last_at": "2026-05-20T14:00:00Z", "last_sent_at": "2026-05-20T14:00:00Z", "last_received_at": "2026-05-19T10:30:00Z", "last_contact_at": "2026-05-20T14:00:00Z", "intelligence": { "behavioral": { }, "llm": null, "labels": [], "dimensions": {} } } ], "edges": [ { "source": "alice@example.com", "target": "carol@example.com", "types": ["shared_calendar_event"], "shared_thread_count": 0, "shared_calendar_count": 2, "weight": 0.2103, "last_seen_at": "2026-05-07T09:00:00+00:00" } ] }
The subgraph endpoint accepts either a contact_id or a bare email address as the path segment. The intelligence block is always present. behavioral is derived from interaction history and requires no LLM features. llm is null when the inbox has no LLM plan or the contact has not yet been enriched — enable via the LLM settings and call ?enrich=true to trigger on-demand. labels lists contact-level context group labels.
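When the path segment is an email address, percent-encoding it prevents characters such as @, + or / from being misread as part of the route. The small sketch below assumes the server decodes percent-encoded path segments as usual. subgraph_path is a hypothetical helper that returns a path for your own HTTP client.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def subgraph_path(inbox_id: str, contact: str, enrich: Optional[str] = None) -> str:
    """Path for GET /v1/relationships/{contact_id_or_address}/.

    `contact` may be a contact_id or a bare email address; safe="" encodes
    every reserved character, including "/", "@" and "+".
    """
    params = {"inbox_id": inbox_id}
    if enrich is not None:
        params["enrich"] = enrich
    return f"/v1/relationships/{quote(contact, safe='')}/?{urlencode(params)}"
```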
Edges connect pairs of contacts. The types array lists one or more reasons for the connection:
  • shared_thread: co-appeared in the same message thread.
  • shared_calendar_event: shared a calendar invite.
  • same_domain: same email domain, computed on read.
  • same_company: identical organisation names on their contact cards.
Each edge also carries:
  • weight: 0–1 score combining frequency (log(count+1)/log(50)) and recency decay (half-life 90 days). Use it for edge thickness. It is 0.0 for domain/company-only edges with no co-occurrence.
  • last_seen_at: most recent co-occurrence event; null for domain/company-only edges.
Edge scope: the full graph returns only edges whose endpoints are both on the current page; the subgraph returns all edges for the focal contact.
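The weight description names two ingredients but does not say how they are combined. This sketch therefore computes each term on its own rather than reproducing weight. It assumes count is the edge's co-occurrence count and that the frequency term is capped at 1.0 to stay within the 0–1 range. Both are assumptions, and the helper names are hypothetical.

```python
import math
from datetime import datetime, timezone
from typing import Optional

HALF_LIFE_DAYS = 90.0


def frequency_term(count: int) -> float:
    """log(count + 1) / log(50); the cap at 1.0 is an assumption."""
    return min(1.0, math.log(count + 1) / math.log(50))


def recency_decay(last_seen_at: str, now: Optional[datetime] = None) -> float:
    """Exponential decay with a 90-day half-life: 0.5 ** (age_days / 90)."""
    seen = datetime.fromisoformat(last_seen_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    age_days = max(0.0, (now - seen).total_seconds() / 86400)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)
```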
Plan gate: Relationship endpoints require the intelligence feature tier to be lite or higher. Tenants on the free plan receive 403 Forbidden with {"detail": "intelligence lite or above required"}.
is_gent_user: true when an inbound email from this contact carried the X-Gent: 1 header, indicating they are also a gent.mx user. Populated automatically on inbound email processing — no action required. null when not yet detected.
collaboration block: When the tenant has collaboration_enabled: true (team governance plan, admin-set) and the requesting inbox has collaboration_opt_in: true, each relationship node includes a collaboration field with an overlap list showing other opted-in team members who also know this contact. Each overlap entry includes inbox_id, inbox_address, strength, health, and last_contact_at. This powers both "who else on the team knows this contact?" (overlap detection) and "who knows this contact best?" (expertise discovery) — the same query, sorted by strength. Empty list when collaboration is disabled or the inbox has not opted in.
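Expertise discovery is the overlap list sorted by strength. The minimal sketch below assumes the collaboration field is an object with an overlap array and that strength is numeric. team_experts is a hypothetical helper.

```python
from typing import List


def team_experts(node: dict) -> List[dict]:
    """Return a node's collaboration overlap sorted strongest-first.

    An empty list means collaboration is disabled, the inbox has not opted in,
    or no other opted-in member knows this contact.
    """
    collaboration = node.get("collaboration") or {}
    overlap = collaboration.get("overlap") or []
    # "Who knows this contact best?" is the same overlap list, sorted by strength.
    return sorted(overlap, key=lambda entry: entry.get("strength") or 0.0, reverse=True)
```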
intelligence.llm — when populated
"llm": { "next_step": { "action": "Schedule follow-up call", "rationale": "Last email went unanswered for 5 days", "urgency": "medium" }, "contact_digest": "Alice is a senior PM at Acme Corp. Key projects: Q2 redesign, vendor contracts.", "enriched_at": "2026-05-20T08:00:00Z", "needs_refresh": false }
Context Groups

Label-based cross-entity views. One context group per label — carries entity counts and lists, aggregate communication stats, a merged activity feed across all members, member suggestions based on co-occurrence, and a weekly AI-generated digest. All endpoints require intelligence: full plan tier.

GET /v1/context/?inbox_id= List context groups  scope: context:read
{ "groups": [{ "id": "lbl_01", "label": "Project Alpha", "color": "#4a90d9", "contacts": 4, "events": 3, "messages": 12, "files": 0, "summary": { "preview": "Kick-off planning, design review", "enriched_at": "2026-05-28T08:00:00Z" }, "digest": null, "digest_generated_at": null }], "cursor": null }
GET /v1/context/{label_id}/?inbox_id= Group detail with full entity lists  scope: context:read
{ "id": "lbl_01", "label": "Project Alpha", "color": "#4a90d9", "contacts": 4, "events": 3, "messages": 12, "files": 0, "summary": { "preview": "Kick-off planning, design review", "enriched_at": "2026-05-28T08:00:00Z" }, "digest": "This week: 3 new emails from Alice and Bob. Carol is at risk — no contact in 45 days. Recommend reaching out to Carol before Friday.", "digest_generated_at": "2026-05-26T00:00:00Z", "entities": { "contacts": [{ "id": "cnt_01", "full_name": "Alice" }], "events": [{ "id": "<event-id>", "title": "Kick-off", "start": "2026-05-10T10:00:00" }], "files": [], "messages": [] } }
Pass ?enrich=summary or ?enrich=next_step to trigger on-demand AI enrichment. Add &fresh=1 when you need a newly generated result instead of the recent cached result.
GET /v1/context/{label_id}/activity/?inbox_id= Merged activity feed across all group contacts  scope: context:read
{ "events": [{ "entry_id": "...", "contact_address": "alice@example.com", "type": "message.received", "summary": "Email from alice re: Q2 proposal", "occurred_at": "2026-05-28T14:00:00Z" }], "cursor": null }
Fans out across all group contacts and merges the results, sorted newest-first. Entries have the same shape as in the Activity Timeline. ?limit= accepts at most 200.
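Conceptually, the merged feed is a k-way merge of per-contact feeds, each of which is already newest-first. The client-side illustration below is not the service's implementation. It uses Python's heapq.merge with reverse=True, which requires every input to be sorted newest-first, and caps the output at the documented maximum of 200.

```python
import heapq
from datetime import datetime
from itertools import islice
from typing import Iterable, List


def _occurred(entry: dict) -> datetime:
    # Replace a trailing "Z" for Python versions whose fromisoformat rejects it.
    return datetime.fromisoformat(entry["occurred_at"].replace("Z", "+00:00"))


def merge_feeds(feeds: Iterable[List[dict]], limit: int = 200) -> List[dict]:
    """Merge per-contact feeds (each newest-first) into one newest-first feed."""
    merged = heapq.merge(*feeds, key=_occurred, reverse=True)
    return list(islice(merged, limit))
```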
GET /v1/context/{label_id}/stats/?inbox_id= Aggregate communication stats & health distribution  scope: context:read
{ "member_count": 4, "total_sent": 38, "total_received": 52, "avg_response_time_hours": 3.7, "last_contact_at": "2026-05-28T14:00:00Z", "health_distribution": { "strong": 2, "at_risk": 1, "dormant": 1, "new": 0 } }
Aggregated from Relationship records — no extra storage. health_distribution counts each member's relationship health category (see Relationships for health definitions). Useful for identifying which contacts in the group need attention.
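The stats endpoint returns counts only. A client that already holds the members' relationship nodes (for example from the Relationships endpoints) could recompute the distribution and list the at_risk and dormant contacts. The sketch below works under that assumption, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

HEALTH_STATES = ("strong", "at_risk", "dormant", "new")


def _health(node: dict) -> Optional[str]:
    # Subgraph nodes may carry an empty behavioral block, so every level is optional.
    behavioral = (node.get("intelligence") or {}).get("behavioral") or {}
    return behavioral.get("relationship_health")


def health_distribution(nodes: List[dict]) -> Dict[str, int]:
    """Count members per health state, including states with zero members."""
    counts = Counter(_health(n) for n in nodes)
    return {state: counts.get(state, 0) for state in HEALTH_STATES}


def needs_attention(nodes: List[dict]) -> List[str]:
    """Email addresses of members whose health is at_risk or dormant."""
    return [n["email"] for n in nodes if _health(n) in ("at_risk", "dormant")]
```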
GET /v1/context/{label_id}/suggestions/?inbox_id= Contacts to add, ranked by co-occurrence  scope: context:read
[{ "address": "dave@example.com", "contact_id": "cnt_04", "co_occurrence_count": 9 }]
Field | Type | Required | Description
co_occurrence_count | number | optional | Combined shared-thread and shared-calendar count with group members.
Surfaces contacts that frequently appear in threads or calendar events alongside current members but aren't yet labeled. Add them via POST /v1/context/{label_id}/entities/. ?limit= accepts at most 20.
POST /v1/context/{label_id}/entities/ Apply label to entity  scope: context:write
Request
{ "entity_type": "contact", "entity_id": "cnt_01" }
Field | Type | Required | Description
entity_type | string | required | contact · calendar_event.
entity_id | string | required | ID of the entity to label (e.g. a contact_id).
201 Created
DELETE /v1/context/{label_id}/entities/{entity_type}/{entity_id}/ Remove label from entity  scope: context:write
# 204 No Content
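The sketch below turns suggestions into label requests. min_co_occurrence is a hypothetical client-side threshold, not an API parameter. Suggestions without a contact_id are skipped, on the assumption that entity_id must be a contact ID. The function returns (path, body) pairs for your own HTTP client to POST.

```python
from typing import List, Tuple


def suggestion_requests(label_id: str, suggestions: List[dict],
                        min_co_occurrence: int = 5) -> List[Tuple[str, dict]]:
    """Build (path, body) pairs for POST /v1/context/{label_id}/entities/."""
    path = f"/v1/context/{label_id}/entities/"
    requests = []
    for s in suggestions:
        # Hypothetical cut-off; unlinked addresses have no entity_id to send.
        if s.get("contact_id") and s.get("co_occurrence_count", 0) >= min_co_occurrence:
            requests.append((path, {"entity_type": "contact", "entity_id": s["contact_id"]}))
    return requests
```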
Context group field reference
Context groups are one-per-label; the label_id is the same ID used in the Labels API. Messages are included automatically when the label keyword is applied to a message via the Messages API.
digest: A weekly narrative (3–5 sentences) combining health distribution, recent activity, and a recommended action. Generated by the llm.sweep_group_digests scheduled task (runs weekly per inbox). null until the first run. Requires LLM config and tenant LLM consent.
Workflow — sender_in_group condition: Workflows can match emails from any member of a group using {"field": "sender_label_ids", "op": "includes", "value": "<label_id>"} as a condition. Example: fire a create_task action whenever anyone from your VIP Clients group emails you.
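To test a rule locally, you can build and check the condition as shown here. condition_matches is a hypothetical evaluator that reads includes as list membership. The message dict carrying sender_label_ids is an assumed shape, not a documented payload.

```python
from typing import Any, Dict


def sender_in_group(label_id: str) -> Dict[str, Any]:
    """The documented sender_in_group condition for one context group."""
    return {"field": "sender_label_ids", "op": "includes", "value": label_id}


def condition_matches(condition: Dict[str, Any], message: Dict[str, Any]) -> bool:
    """Hypothetical local check: 'includes' means the list field contains the value."""
    if condition.get("op") != "includes":
        raise ValueError(f"unsupported op: {condition.get('op')}")
    return condition["value"] in (message.get(condition["field"]) or [])
```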
Enrichment

Optional AI-powered enrichment on top of the intelligence layer. Inference runs in your tenant's region and is billed at provider token cost plus 5%. No API keys required. Each feature can be enabled individually per inbox, or triggered on-demand via ?enrich= on any supported endpoint.

GET /v1/enrichment/reference/ List enrichment models and features  no auth required
200 OK
{ "version": "2026-06-06", "resource": "enrichment", "payload": { "get_config": { "method": "GET", "path": "/v1/enrichment/" }, "update_config": { "method": "PUT", "path": "/v1/enrichment/" } }, "fields": [...], "pricing_model": { "unit": "per_million_tokens" }, "models": [{ "id": "claude-haiku-4-5", "provider": "anthropic", "input_mtok": 1.00, "output_mtok": 5.00, "context_window": 200000, "max_output": 64000, "available": true }], "features": [{ "feature": "phishing_detection", "group": "Security", "description": "Classify suspicious inbound messages.", "requires_consent": true }] }
Returns only models available for the tenant's region. Use a model's id as the model value in PUT /v1/enrichment/, and a feature's feature value as an entry in enabled_features.
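The reference lists pricing per million tokens, and inference is billed at provider token cost plus 5%, so a rough per-call estimate follows directly. The token counts you plug in are hypothetical; real usage depends on the feature and the message size.

```python
from typing import Optional

MARKUP = 1.05  # provider token cost plus 5%


def estimate_cost_usd(model: dict, input_tokens: int, output_tokens: int) -> float:
    """Billed cost of one call, using input_mtok/output_mtok (USD per million tokens)."""
    provider_cost = (input_tokens * model["input_mtok"]
                     + output_tokens * model["output_mtok"]) / 1_000_000
    return provider_cost * MARKUP


def calls_within_budget(budget_limit: Optional[float], cost_per_call: float) -> Optional[int]:
    """Roughly how many such calls fit in budget_limit; None means unlimited."""
    if cost_per_call <= 0:
        raise ValueError("cost_per_call must be positive")
    if budget_limit is None:
        return None
    return int(budget_limit // cost_per_call)
```

For example, 100,000 input and 10,000 output tokens on a model priced at 1.00/5.00 per million tokens cost (0.10 + 0.05) × 1.05 = 0.1575 USD. A 20.00 USD budget_limit covers about 126 such calls.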
GET /v1/enrichment/models/ List region-available model options  no auth required
{ "resource": "enrichment_models", "models": [...], "counts": {"models": 6} }
GET /v1/enrichment/features/ List enrichment feature options  no auth required
{ "resource": "enrichment_features", "features": [...], "counts": {"features": 9} }
PUT /v1/enrichment/ Create or replace LLM config  scope: llm:write
Request
{ "inbox_id": "you@example.com", "model": "claude-haiku-4-5", "enabled_features": ["phishing_detection", "next_step"], "budget_limit": 20.00 }
Field | Type | Required | Description
inbox_id | string | required | Inbox the config applies to.
model | string | required | Region-dependent canonical key. Fetch available options from GET /v1/enrichment/reference/.
enabled_features | array | required | Feature keys from GET /v1/enrichment/reference/.
budget_limit | float | optional | Monthly spend cap in USD; null means unlimited.
200 OK
{ "inbox_id": "...", "model": "claude-haiku-4-5", "enabled_features": ["phishing_detection", "next_step"], "budget_limit": 20.00, "updated_at": "2026-05-22T09:00:00Z" }
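The sketch below validates a PUT body against the reference payload before sending it, so a typo in a model id or feature key fails locally. build_enrichment_config is a hypothetical helper. It treats a model without an available flag as available, because the reference lists only region-available models.

```python
from typing import Iterable, Optional


def build_enrichment_config(reference: dict, inbox_id: str, model_id: str,
                            features: Iterable[str],
                            budget_limit: Optional[float] = None) -> dict:
    """Check choices against GET /v1/enrichment/reference/ and build the PUT body."""
    models = {m["id"]: m for m in reference.get("models", [])}
    if model_id not in models or models[model_id].get("available") is False:
        raise ValueError(f"model {model_id!r} is not available in this region")
    known = {f["feature"] for f in reference.get("features", [])}
    chosen = list(features)
    unknown = [f for f in chosen if f not in known]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    return {
        "inbox_id": inbox_id,
        "model": model_id,
        "enabled_features": chosen,
        "budget_limit": budget_limit,  # None serialises to JSON null (unlimited)
    }
```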
GET /v1/enrichment/?inbox_id= Retrieve LLM config  scope: llm:read
{ "inbox_id": "...", "model": "claude-haiku-4-5", "enabled_features": ["phishing_detection", "next_step"], "budget_limit": 20.00, "updated_at": "2026-05-22T09:00:00Z" }
DELETE /v1/enrichment/?inbox_id= Remove LLM config (disables all features)  scope: llm:write
# 204 No Content — empty body
Enrichment config field reference
On-demand enrichment
# Append ?enrich= to a supported endpoint. Returns the cached value if available;
# computes synchronously if not. Returns null + "enriching": true if a
# background task is in flight. Add &fresh=1 to force a recompute (5s cooldown applies).
GET /v1/contacts/{id}/?enrich=next_step,contact_digest
GET /v1/context/{label_id}/?enrich=next_step,summary
GET /v1/messages/threads/{id}/?enrich=thread_summary
# Extra flags in the response:
#   "enriching": true        background task in flight; poll again shortly
#   "on_cooldown": true      &fresh=1 requested within the 5s cooldown window
#   "budget_exceeded": true  monthly budget_limit reached; enrichment skipped
Always-on features run as background tasks triggered by inbox events: phishing_detection fires on every inbound email, and next_step and contact_digest re-run whenever the contact's activity timeline is updated. Results are cached on the resource until the next triggering event. budget_limit caps total monthly LLM spend (USD) across all features for the inbox. Once it is reached, background enrichment is skipped without an error, and on-demand requests return "budget_exceeded": true. For team tenants, LLM config is managed by admins and applies to all team inboxes.
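The sketch below polls an on-demand request while the enriching flag is set. fetch_json is a hypothetical caller-supplied function that returns decoded JSON. The sketch polls without &fresh=1, because repeating it inside the 5s window only yields on_cooldown. It raises on budget_exceeded so callers cannot mistake a skipped enrichment for an empty one. attempts and delay_s are arbitrary defaults.

```python
import time
from typing import Callable

FetchJson = Callable[[str], dict]


def fetch_enriched(fetch_json: FetchJson, path: str, *, attempts: int = 5,
                   delay_s: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """GET an endpoint with ?enrich=..., polling while a background task runs."""
    for attempt in range(attempts):
        body = fetch_json(path)
        if body.get("budget_exceeded"):
            raise RuntimeError("monthly budget_limit reached; enrichment skipped")
        if not body.get("enriching"):
            return body  # cached or freshly computed value
        if attempt < attempts - 1:
            sleep(delay_s)  # background task in flight; poll again shortly
    raise TimeoutError(f"enrichment still in flight after {attempts} attempts")
```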
label_suggest — when enabled, each inbound email is scored against previously labelled emails in the inbox using natural language processing techniques. Labels whose best-match confidence score meets the inbox's label_inference_threshold (default 0.65, configurable via Inbox Settings) are surfaced as suggestions. Suggestions appear in the label_suggestions field of every email response and fire a label.suggested event notification. Each suggestion includes label_id and label_name — confirm with POST /v1/messages/{id}/suggested-labels/{label_id}/ or dismiss with DELETE. This feature does not use AI credits.
calendar_suggest — when enabled, inbound emails are first screened using natural language processing techniques (keyword and date/time pattern matching). Only emails that contain both a scheduling-intent signal (meet, call, invite, standup, etc.) and a time or date reference are passed to AI for structured extraction. This pre-filter means newsletters, invoices, and general correspondence are discarded at near-zero cost. When a schedulable event is detected, a suggestion is stored and an event.suggested alert fires. The suggestion appears as event_suggestion on the email response (includes title, start, duration, location, description, and participants). Confirm with POST /v1/messages/{id}/suggested-events/ (supply calendar_id to create the calendar event) or dismiss with DELETE. Requires tenant LLM consent.
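The following hypothetical regex-based check illustrates the pre-filter idea: both an intent keyword and a time reference must be present. The keyword and date patterns are illustrative guesses built from the examples above, not Gent's actual lists.

```python
import re

# Hypothetical intent keywords; the docs name meet, call, invite and standup.
INTENT = re.compile(r"\b(meet|meeting|call|invite|standup|sync|catch up)\b", re.IGNORECASE)
# Rough date/time signals: clock times, weekdays, relative days, ISO dates.
WHEN = re.compile(
    r"\b(\d{1,2}(:\d{2})?\s?(am|pm)|\d{1,2}:\d{2}|"
    r"monday|tuesday|wednesday|thursday|friday|saturday|sunday|"
    r"today|tomorrow|next week|\d{4}-\d{2}-\d{2})\b",
    re.IGNORECASE,
)


def worth_extracting(text: str) -> bool:
    """Cheap screen: only text with intent AND a time reference goes to the model."""
    return bool(INTENT.search(text)) and bool(WHEN.search(text))
```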
Retrieval

Reusable retrieval source buckets let agents and clients ground email replies, ticket prep, automations, and summaries in approved sources. Sources can include email history, Files folders, public company pages, contacts, calendar context, and configured intelligence records. Answer delivery still happens through email or the calling workflow; there is no separate chat surface.

Plan gate: Retrieval is available from the Startup plan. Tokens need retrieval:read to list buckets/runs, retrieval:write to manage source buckets, and retrieval:run to run retrieval tasks.
Cost controls: Refresh, retention, and public crawls default to manual behavior. Tenant admins configure chunking, extraction file-type allowlists, model selection, and cost controls. Vector embedding usage is billed at provider cost plus 5%; generated answers use normal AI token usage. OCR settings expose tesseract and textract as initial options. Tesseract is packaged in the API runtime; Textract requires regional AWS availability and billable provider calls. Use the extraction test endpoint before indexing customer documents.
Sensitive sources: Broad or derived private sources such as inbox-wide email, label history, contact history, intelligence records, and broad calendar sources are marked in bucket policy. By default, indexing requires an explicit sensitive_source_approved acknowledgement.
GET /v1/retrieval/reference/ Retrieval builder contract  no auth required
{ "resource": "retrieval", "source_types": ["email", "files", "public_web", "contacts", "calendar", "intelligence"], "bucket_types": [...], "tasks": [...], "output_types": ["answer", "brief", "evidence"], "config_defaults": {"refresh": {"default_mode": "manual"}}, "sensitive_source_policy": {"policy_key": "policy.sensitive_source_approved"}, "ocr_engines": [{"engine": "none", "connected": true}, {"engine": "tesseract", "connected": true}, {"engine": "textract", "connected": true}], "vector_storage": {"default_backend": "managed_vectors"}, "refresh_policy": {"cadences": ["manual", "daily", "weekly"]}, "usage_tracking": {"categories": ["retrieval_runs", "retrieval_index_chunks", "llm_tokens"]} }
GET /v1/retrieval/models/ List embedding model options  scope: retrieval:read
{ "resource": "retrieval_models", "pricing_policy": {"billing_multiplier": 1.05}, "models": [{ "model_id": "amazon.titan-embed-text-v2:0", "dimensions": [256, 512, 1024] }] }
GET /v1/retrieval/config/ Get tenant retrieval configuration  tenant admin, scope: retrieval:read
200 OK
{ "tenant_id": "tenant_123", "config": { "embedding": {"model_id": "amazon.titan-embed-text-v2:0", "dimension": 1024}, "chunking": {"strategy": "recursive_text", "max_chars": 3000, "overlap_chars": 300} } }
PATCH /v1/retrieval/config/ Update tenant retrieval configuration  tenant admin, scope: retrieval:write
Request
{ "chunking": {"strategy": "recursive_text", "max_chars": 3000, "overlap_chars": 300}, "extraction": {"file_types": ["text", "html", "pdf", "docx", "xlsx"], "ocr_engine": "none"}, "public_web": {"default_mode": "manual", "max_pages": 25}, "sensitive_sources": {"require_approval": true} }
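The plain sliding-window chunker below shows how max_chars and overlap_chars interact. It is deliberately simpler than recursive_text, which presumably prefers natural text boundaries, so treat its chunk counts as rough. The estimate endpoint gives the service's own numbers.

```python
from typing import List


def sliding_chunks(text: str, max_chars: int, overlap_chars: int) -> List[str]:
    """Fixed-size windows of max_chars, each overlapping the previous by overlap_chars."""
    if max_chars <= 0 or not 0 <= overlap_chars < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap_chars < max_chars")
    step = max_chars - overlap_chars
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks
```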
POST /v1/retrieval/extraction-tests/ Test document extraction before indexing  scope: retrieval:read
Request — uses the tenant config plus optional temporary overrides
{ "file_id": "file_123", "chunking": {"max_chars": 1200, "overlap_chars": 120}, "extraction": {"ocr_engine": "none", "file_types": ["pdf", "docx", "xlsx"]}, "include_sample": true, "sample_chars": 800 }
200 OK
{ "file_id": "file_123", "status": "ok", "extraction_type": "pdf", "extractor": "pdf", "chunk_count": 3, "total_chars": 6412, "ocr_required": false, "truncated": false, "page_count": 4, "row_count": null, "rows_extracted": null, "source_warnings": [], "warnings": [] }
Use this before indexing a Files source to show whether a sample document extracts cleanly. If a scanned PDF or image needs OCR but OCR is disabled for that extraction test or tenant config, the response returns status: "ocr_required". If OCR is selected but the provider fails, the response returns status: "ocr_failed".
POST /v1/retrieval/source-buckets/ Create source bucket  scope: retrieval:write
Request
{ "name": "Billing support knowledge", "bucket_type": "support_class", "owner_type": "inbox", "owner_id": "inbox_123", "sources": [ {"type": "email", "scope": "label", "label_id": "ticket/class/billing"}, {"type": "files", "folder_id": "folder_123", "include_children": true}, {"type": "public_web", "allowed_domains": ["example.com"], "refresh": "weekly"} ] }
GET /v1/retrieval/source-buckets/ List source buckets  scope: retrieval:read
[{ "bucket_id": "rb_123", "name": "Billing support knowledge", "bucket_type": "support_class", "vector_backend": "managed_vectors", "index_status": "ready", "last_index_summary": { "chunks": 42, "vectors": 42, "extraction_metadata": {"extractors": {"pdf": 8, "ocr:tesseract": 4}, "page_count": 18} }, "last_index_error": null, "last_index_error_code": null, "refresh": {"cadence": "weekly", "next_refresh_at": "2026-06-22T12:00:00Z"}, "policy": { "refresh": {"cadence": "weekly", "next_refresh_at": "2026-06-22T12:00:00Z"} } }]
Read responses expose index summary and refresh state at top level for clients. The same data may also be persisted in bucket policy for server bookkeeping, but clients should prefer the top-level fields.
GET /v1/retrieval/source-buckets/{bucket_id}/ Get source bucket  scope: retrieval:read
200 OK — source bucket record
PATCH /v1/retrieval/source-buckets/{bucket_id}/ Update source bucket  scope: retrieval:write
Request — all fields optional; at least one required
{ "name": "Billing support knowledge v2", "index_status": "stale", "policy": {"max_age_days": 90} }
DELETE /v1/retrieval/source-buckets/{bucket_id}/ Delete source bucket and indexed chunks  scope: retrieval:write
Deletes the bucket, chunk metadata, and configured vector records. If configured vector cleanup fails, deletion aborts so it can be retried.
204 No Content
POST /v1/retrieval/source-buckets/{bucket_id}/estimate/ Estimate source bucket indexing  scope: retrieval:read
Request — same optional explicit chunks accepted by the index endpoint
{ "sensitive_source_approved": true, "chunks": [ {"source_type": "files", "source_id": "file_123", "text": "How refunds are handled..."} ] }
Pass sensitive_source_approved: true only after showing the user which sensitive source categories will be indexed. Buckets without sensitive sources do not need this field.
200 OK
{ "bucket_id": "rb_123", "estimate_id": "7f2d9c4a8b12e3f0a1b2c3d4", "estimated_chunks": 42, "estimated_input_tokens": 31500, "estimated_embedding_requests": 42, "source_estimates": [ { "source_type": "files", "method": "extraction", "status": "extracted", "chunk_count": 12, "extraction_metadata": {"extractors": {"pdf": 8, "ocr:tesseract": 4}, "page_count": 18}, "warnings": [] } ], "extraction_metadata": {"extractors": {"pdf": 8, "ocr:tesseract": 4}, "truncated_sources": 0, "page_count": 18}, "skipped_sources": [], "extraction_warnings": [], "billable_warnings": ["Indexing creates billable embedding requests when vector embedding is invoked."], "requires_confirmation": true, "warnings": [] }
Call this before indexing to show users likely chunk and embedding volume. Estimates use extraction previews for safe local sources such as Files and email, and conservative configured limits for sources that should not be crawled during preflight, such as public websites.
POST /v1/retrieval/source-buckets/{bucket_id}/index/ Queue source bucket indexing  scope: retrieval:write
Request — optional explicit chunks for manual or supplemental indexing
{ "chunks": [ { "source_type": "files", "source_id": "file_123", "text": "How refunds are handled...", "visibility": "bucket" } ] }
202 Accepted
{ "bucket_id": "rb_123", "index_status": "indexing", "queued": true, "task": "retrieval.index_bucket", "trigger": "manual" }
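The sketch below implements the estimate-then-index flow. It preflights the bucket, lets the user confirm when requires_confirmation is true, and then queues indexing. post_json and confirm are hypothetical caller-supplied functions. Sending sensitive_source_approved to the index endpoint as well is an assumption based on the Sensitive sources note, since the index example does not show it.

```python
from typing import Callable, Optional

PostJson = Callable[[str, dict], dict]


def estimate_then_index(post_json: PostJson, bucket_id: str,
                        confirm: Callable[[dict], bool],
                        sensitive_source_approved: bool = False) -> Optional[dict]:
    """Preflight a bucket, ask the user to confirm, then queue indexing."""
    base = f"/v1/retrieval/source-buckets/{bucket_id}"
    body = {"sensitive_source_approved": True} if sensitive_source_approved else {}
    estimate = post_json(f"{base}/estimate/", body)
    # Show estimated chunks and billable_warnings before spending anything.
    if estimate.get("requires_confirmation") and not confirm(estimate):
        return None
    return post_json(f"{base}/index/", body)
```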
GET /v1/retrieval/runs/ List retrieval runs  scope: retrieval:read
[{ "run_id": "rr_123", "bucket_id": "rb_123", "task": "email_answer", "status": "completed", "output_type": "answer" }]
POST /v1/retrieval/runs/ Run retrieval  scope: retrieval:run
Request
{ "bucket_id": "rb_123", "task": "email_answer", "query": "What should we tell this customer?", "output_type": "answer", "context": {"thread_id": "thread_123"} }
201 Created
{ "run_id": "rr_123", "status": "completed", "answer": { "text": "...", "reason_code": "no_evidence", "action_required": "Try a broader query or reindex the source bucket if recently changed.", "bucket": {"bucket_id": "rb_123", "index_status": "ready", "last_index_summary": {"chunks": 42}} }, "citations": [...], "usage": {"retrieval_units": 0} }
GET /v1/retrieval/runs/{run_id}/ Get retrieval run  scope: retrieval:read
200 OK — retrieval run record with answer, citations, usage, and context
The current implementation stores the bucket/run contract, chunk metadata, and managed-index configuration. Files sources can index configured file types across text-like files, cleaned HTML, DOCX, XLSX, basic PDF text streams, spreadsheet rows, and OCR-enabled image or scanned-PDF content when an OCR engine is selected. Email sources can index thread, inbox, label, contact, and contact-history scopes; public-web sources can index text pages on allowed domains with page, content-type, timeout, and crawler-policy guardrails.
Indexed buckets can return retrieval evidence and generate answers when citation excerpts and an inbox AI model are available. Source-level refresh supports manual, daily, and weekly cadences; successful indexing stores the next scheduled refresh and exposes it as refresh.next_refresh_at.
Retrieval runs and indexed chunks are tracked in usage at zero direct retrieval unit cost; answer generation records normal AI token usage. Index summaries are exposed as last_index_summary; failed or incomplete runs include answer.reason_code, answer.action_required, and bucket index context.
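The example run reports status completed while answer.reason_code is no_evidence, so status alone does not show whether an answer is usable. The minimal sketch below surfaces action_required in that case. answer_or_action is a hypothetical helper.

```python
def answer_or_action(run: dict) -> str:
    """Return the answer text, or the documented follow-up when a reason_code is set."""
    answer = run.get("answer") or {}
    if answer.get("reason_code"):
        action = answer.get("action_required") or "No action suggested."
        return f"[{answer['reason_code']}] {action}"
    return answer.get("text") or ""
```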