Intelligence APIs.

Intelligence

Activity Timeline

A per-contact CRM feed — a chronological view of messages, delivery status events, ticket workflow changes, spam complaints, and calendar events associated with one contact. Query it with a stable contact_id when available, or an email address when the contact is not linked yet. This endpoint does not return inbox-wide activity. Requires context:read.

Scope: context:read required for token auth.

GET /v1/activity/?inbox_id=&contact_id= Per-contact activity feed scope: context:read

{
  "events": [
    {
      "id":              "evt_01HX9K2NPM",
      "inbox_id":        "<inbox-id>",
      "contact_address": "alice@example.com",
      "contact_id":      "cnt_01HX...",
      "type":            "message.sent",
      "resource_id":     "msg_01HX9K2NPMQWERTY",
      "participants":    ["alice@example.com"],
      "summary":         "Re: Project Alpha timeline",
      "occurred_at":     "2026-05-06T14:23:00Z",
      "metadata": {
        "thread_id":       "thread-xyz",
        "has_attachments": false,
        "preview":         "Looks good — let's sync Friday.",
        "is_read":         true
      }
    },
    {
      "id":              "evt_01HX9K3QRS",
      "inbox_id":        "<inbox-id>",
      "contact_address": "alice@example.com",
      "contact_id":      "cnt_01HX...",
      "type":            "message.bounced",
      "resource_id":     "event_01HX9K3QRS",
      "participants":    null,
      "summary":         "Delivery failed: 550 5.1.1 No such user",
      "occurred_at":     "2026-05-06T14:24:10Z",
      "metadata": {
        "smtp_reply":      "550 5.1.1 No such user"
      }
    },
    {
      "id":              "evt_01HX9K4TUV",
      "inbox_id":        "<inbox-id>",
      "contact_address": "alice@example.com",
      "contact_id":      "cnt_01HX...",
      "type":            "calendar_event.created",
      "resource_id":     "cal_01HX9K4TUV",
      "participants":    null,
      "summary":         "Q2 Planning Session",
      "occurred_at":     "2026-05-07T09:00:00Z",
      "metadata": {
        "start":           "2026-05-07T09:00:00Z",
        "duration":        "PT1H",
        "time_zone":       "America/New_York",
        "location":        "Zoom"
      }
    }
  ],
  "cursor": "eyJwb3MiOjUwfQ..."
}

Field	Type	Required	Description
events	array	optional	Activity entries for the contact, newest first.

→ Activity timeline query params

Contact filter required: Pass either contact_id or address. Prefer contact_id because it is stable across email address changes and future channel identities. Use address only for address-only history before a contact is linked.

Pagination: Results are returned newest-first. Pass the cursor value from the previous response as the cursor query param to fetch the next page. When cursor is null the last page has been reached. The since filter is honoured on every page — passing both cursor and since is safe and will not skip entries.
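The cursor loop described above can be sketched as follows. This is a minimal sketch rather than an official client: `fetch_page` is a hypothetical callable standing in for whatever HTTP layer issues GET /v1/activity/, and it is assumed to return the decoded JSON body with the `events` and `cursor` keys shown in the sample response.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_activity(
    fetch_page: Callable[[Dict[str, str]], dict],  # hypothetical HTTP helper
    inbox_id: str,
    contact_id: str,
    since: Optional[str] = None,
) -> Iterator[dict]:
    """Yield activity entries newest-first, following cursors until null."""
    params = {"inbox_id": inbox_id, "contact_id": contact_id}
    if since is not None:
        # since is honoured on every page, so it is re-sent with each cursor.
        params["since"] = since
    while True:
        page = fetch_page(dict(params))
        yield from page.get("events", [])
        cursor = page.get("cursor")
        if cursor is None:  # a null cursor marks the last page
            return
        params["cursor"] = cursor
```

Injecting the fetch function keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned pages.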

Entry types — the metadata field is always present ({} if not applicable):

message.sent · message.received — thread_id, has_attachments, preview, is_read
message.delivered · message.bounced — smtp_reply, bounce_category
message.spam_complaint — feedback_type
calendar_event.created — start, duration, time_zone, location
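
A consumer can encode the list above as a lookup table so that metadata is read uniformly per entry type. The sketch below is illustrative only: the table is copied from the list, while `read_metadata` and its fill-with-None behaviour are hypothetical choices, not part of the API.

```python
# Metadata keys per entry type, copied from the list above.
MESSAGE_KEYS = ("thread_id", "has_attachments", "preview", "is_read")
DELIVERY_KEYS = ("smtp_reply", "bounce_category")

METADATA_KEYS = {
    "message.sent": MESSAGE_KEYS,
    "message.received": MESSAGE_KEYS,
    "message.delivered": DELIVERY_KEYS,
    "message.bounced": DELIVERY_KEYS,
    "message.spam_complaint": ("feedback_type",),
    "calendar_event.created": ("start", "duration", "time_zone", "location"),
}


def read_metadata(entry: dict) -> dict:
    """Return the documented metadata keys for the entry's type.

    Keys missing from the entry become None; unknown types yield {}.
    """
    keys = METADATA_KEYS.get(entry.get("type"), ())
    metadata = entry.get("metadata") or {}
    return {key: metadata.get(key) for key in keys}
```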

Deduplication & retention: Each entry has a deterministic ID derived from inbox_id + contact_address + type + resource_id — re-processing the same event is always idempotent. Entries are automatically pruned after 2 years (TTL on the occurred_at timestamp).

Relationships

A graph-based view of communication strength between contacts. Nodes are contacts; edge weight is derived from interaction frequency and recency. The full graph shows all contacts for an inbox; the subgraph endpoint centres on one contact. Read-only. Requires context:read.

Scope: context:read required for token auth.

GET /v1/relationships/?inbox_id= Full relationship graph scope: context:read

{
  "nodes": [
    {
      "id":                   "cnt_01",
      "email":                "alice@example.com",
      "contact_id":           "cnt_01",
      "inbox_id":             "<inbox-id>",
      "email_sent_count":     14,
      "email_received_count": 8,
      "email_first_at":       "2025-11-01T09:00:00Z",
      "email_last_at":        "2026-05-20T14:00:00Z",
      "last_sent_at":         "2026-05-20T14:00:00Z",
      "last_received_at":     "2026-05-19T10:30:00Z",
      "last_contact_at":      "2026-05-20T14:00:00Z",
      "intelligence": {
        "behavioral": {
          "weight":                   0.82,
          "outbound_strength":        0.71,
          "inbound_strength":         0.93,
          "communication_pattern":    "initiator",
          "avg_response_time_hours":  4.2,
          "preferred_contact_hours":  [9, 10, 14],
          "reply_rate":               0.85,
          "days_since_contact":       3,
          "last_subject":             "Re: Q2 proposal",
          "thread_count":             12,
          "bounce_detected":          false,
          "relationship_health":      "strong"
        },
        "llm": null,
        "labels": ["vip", "client"],
        "dimensions": {}
      }
    }
  ],
  "edges": [
    {
      "source":                "alice@example.com",
      "target":                "bob@example.com",
      "types":                 ["shared_thread", "same_domain", "same_company"],
      "shared_thread_count":   7,
      "shared_calendar_count": 0,
      "weight":                0.8341,
      "last_seen_at":          "2026-05-20T14:00:00+00:00"
    }
  ]
}

Field	Type	Required	Description
outbound_strength	number	optional	How actively inbox owner emails them.
inbound_strength	number	optional	How actively they email inbox owner.
communication_pattern	string	optional	One of: `initiator`, `responder`, `balanced`.
avg_response_time_hours	number	optional	Running average (Welford's algorithm); null if no data.
preferred_contact_hours	array	optional	Top-3 UTC hours by frequency.
reply_rate	number	optional	Reply rate — replies divided by received messages. Null if no received messages.
days_since_contact	number	optional	Integer days; null if no contact yet.
relationship_health	string	optional	One of: `new`, `strong`, `at_risk`, `dormant`.
llm	object	optional	Populated when LLM enrichment is enabled; otherwise null.
labels	array	optional	Contact labels from context groups.
dimensions	object	optional	External enrichment (e.g. LinkedIn).

GET /v1/relationships/{contact_id_or_address}/?inbox_id= Subgraph for one contact scope: context:read

{
  "nodes": [
    {
      "id":                   "cnt_01",
      "email":                "alice@example.com",
      "contact_id":           "cnt_01",
      "inbox_id":             "<inbox-id>",
      "email_sent_count":     14,
      "email_received_count": 8,
      "email_first_at":       "2025-11-01T09:00:00Z",
      "email_last_at":        "2026-05-20T14:00:00Z",
      "last_sent_at":         "2026-05-20T14:00:00Z",
      "last_received_at":     "2026-05-19T10:30:00Z",
      "last_contact_at":      "2026-05-20T14:00:00Z",
      "intelligence": {
        "behavioral": { },
        "llm": null,
        "labels": [],
        "dimensions": {}
      }
    }
  ],
  "edges": [
    {
      "source":                "alice@example.com",
      "target":                "carol@example.com",
      "types":                 ["shared_calendar_event"],
      "shared_thread_count":   0,
      "shared_calendar_count": 2,
      "weight":                0.2103,
      "last_seen_at":          "2026-05-07T09:00:00+00:00"
    }
  ]
}

The subgraph endpoint accepts either a contact_id or a bare email address as the path segment. The intelligence block is always present. behavioral is derived from interaction history and requires no LLM features. llm is null when the inbox has no LLM plan or the contact has not yet been enriched; enable enrichment in the LLM settings and pass ?enrich=true to trigger it on demand. labels lists contact-level context group labels.

Edges connect pairs of contacts with one or more typed reasons:

shared_thread — co-appeared in the same message thread.
shared_calendar_event — shared a calendar invite.
same_domain — same email domain, computed on read.
same_company — identical organisation names on their contact cards.

weight — 0–1 score combining frequency (log(count+1)/log(50)) and recency decay (half-life 90 days). Use for edge thickness. 0.0 for domain/company-only edges with no co-occurrence.
last_seen_at — most recent co-occurrence event; null for domain/company-only edges.
Full graph: only edges where both endpoints are on the current page. Subgraph: all edges for the focal contact.
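
The two ingredients of weight can be sketched as below. The text gives the frequency term and the 90-day half-life but not how they are combined, so multiplying them and clamping the frequency term at 1.0 are assumptions made for illustration. The sketch is not expected to reproduce the sample weights exactly.

```python
import math

HALF_LIFE_DAYS = 90.0  # recency half-life stated above


def frequency_score(count: int) -> float:
    """log(count + 1) / log(50); the clamp to 1.0 is an assumption."""
    return min(1.0, math.log(count + 1) / math.log(50))


def recency_decay(days_since_last_seen: float) -> float:
    """Exponential decay that halves every 90 days."""
    return 0.5 ** (days_since_last_seen / HALF_LIFE_DAYS)


def edge_weight(count: int, days_since_last_seen: float) -> float:
    """Assumed combination: frequency multiplied by recency."""
    if count <= 0:
        return 0.0  # domain/company-only edges have no co-occurrence
    return frequency_score(count) * recency_decay(days_since_last_seen)
```

Under these assumptions, 49 co-occurrences saturate the frequency term, and an edge last seen 90 days ago carries half the weight of an otherwise identical edge seen today.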

Plan gate: Relationship endpoints require the intelligence feature tier to be lite or higher. Tenants on the free plan receive 403 Forbidden with {"detail": "intelligence lite or above required"}.

is_gent_user: true when an inbound email from this contact carried the X-Gent: 1 header, indicating they are also a gent.mx user. Populated automatically on inbound email processing — no action required. null when not yet detected.

collaboration block: When the tenant has collaboration_enabled: true (team governance plan, admin-set) and the requesting inbox has collaboration_opt_in: true, each relationship node includes a collaboration field with an overlap list showing other opted-in team members who also know this contact. Each overlap entry includes inbox_id, inbox_address, strength, health, and last_contact_at. This powers both "who else on the team knows this contact?" (overlap detection) and "who knows this contact best?" (expertise discovery) — the same query, sorted by strength. Empty list when collaboration is disabled or the inbox has not opted in.
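
Both questions reduce to sorting the overlap list, which can be sketched as follows. The sketch assumes strength is numeric (like weight) and that overlap sits directly under the node's collaboration field; `rank_overlap` is a hypothetical helper name.

```python
def rank_overlap(node: dict) -> list:
    """Return overlap entries strongest-first.

    Returns an empty list when collaboration is disabled or the inbox
    has not opted in (overlap is empty or the field is absent).
    """
    overlap = (node.get("collaboration") or {}).get("overlap") or []
    # Assumption: strength is a number where larger means a stronger tie.
    return sorted(overlap, key=lambda entry: entry.get("strength", 0.0), reverse=True)
```

The first element answers "who knows this contact best?"; the full list answers "who else on the team knows this contact?".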

intelligence.llm — when populated

"llm": {
  "next_step":     {
    "action":    "Schedule follow-up call",
    "rationale": "Last email went unanswered for 5 days",
    "urgency":   "medium"
  },
  "contact_digest": "Alice is a senior PM at Acme Corp. Key projects: Q2 redesign, vendor contracts.",
  "enriched_at":    "2026-05-20T08:00:00Z",
  "needs_refresh":  false
}

Context Groups

Label-based cross-entity views. There is one context group per label. Each group carries entity counts and lists, aggregate communication stats, a merged activity feed across all members, member suggestions based on co-occurrence, and a weekly AI-generated digest. All endpoints require the intelligence feature tier to be full.

GET /v1/context/?inbox_id= List context groups scope: context:read

{
  "groups": [{
    "id":                  "lbl_01",
    "label":               "Project Alpha",
    "color":               "#4a90d9",
    "contacts":            4,
    "events":              3,
    "messages":            12,
    "files":               0,
    "summary":             { "preview": "Kick-off planning, design review", "enriched_at": "2026-05-28T08:00:00Z" },
    "digest":              null,
    "digest_generated_at": null
  }],
  "cursor": null
}

GET /v1/context/{label_id}/?inbox_id= Group detail with full entity lists scope: context:read

{
  "id":       "lbl_01",
  "label":    "Project Alpha",
  "color":    "#4a90d9",
  "contacts": 4, "events": 3, "messages": 12, "files": 0,
  "summary":             { "preview": "Kick-off planning, design review", "enriched_at": "2026-05-28T08:00:00Z" },
  "digest":              "This week: 3 new emails from Alice and Bob. Carol is at risk — no contact in 45 days. Recommend reaching out to Carol before Friday.",
  "digest_generated_at": "2026-05-26T00:00:00Z",
  "entities": {
    "contacts": [{ "id": "cnt_01", "full_name": "Alice" }],
    "events":   [{ "id": "<event-id>", "title": "Kick-off", "start": "2026-05-10T10:00:00" }],
    "files":    [],
    "messages": []
  }
}

Pass ?enrich=summary or ?enrich=next_step to trigger on-demand AI enrichment. Add &fresh=1 when you need a newly generated result instead of the recent cached result.

GET /v1/context/{label_id}/activity/?inbox_id= Merged activity feed across all group contacts scope: context:read

{
  "events": [{
    "entry_id":        "...",
    "contact_address": "alice@example.com",
    "type":            "message.received",
    "summary":         "Email from alice re: Q2 proposal",
    "occurred_at":     "2026-05-28T14:00:00Z"
  }],
  "cursor": null
}

The feed fans out across all group contacts, then merges the results and sorts them newest-first. Entries have the same shape as Activity Timeline entries. ?limit= accepts a maximum of 200.

GET /v1/context/{label_id}/stats/?inbox_id= Aggregate communication stats & health distribution scope: context:read

{
  "member_count":            4,
  "total_sent":              38,
  "total_received":          52,
  "avg_response_time_hours":  3.7,
  "last_contact_at":         "2026-05-28T14:00:00Z",
  "health_distribution": {
    "strong":  2,
    "at_risk": 1,
    "dormant": 1,
    "new":     0
  }
}

Aggregated from Relationship records — no extra storage. health_distribution counts each member's relationship health category (see Relationships for health definitions). Useful for identifying which contacts in the group need attention.
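
One way to turn health_distribution into a single signal is the share of members that need attention. Treating at_risk and dormant as the categories that need attention is an assumption of this sketch, and `attention_share` is a hypothetical helper.

```python
def attention_share(stats: dict) -> float:
    """Fraction of group members whose health is at_risk or dormant."""
    dist = stats.get("health_distribution", {})
    total = sum(dist.values())
    if total == 0:
        return 0.0  # empty group: nothing to flag
    return (dist.get("at_risk", 0) + dist.get("dormant", 0)) / total
```

For the sample response above (2 strong, 1 at_risk, 1 dormant, 0 new) this yields 0.5.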

GET /v1/context/{label_id}/suggestions/?inbox_id= Contacts to add, ranked by co-occurrence scope: context:read

[{
  "address":            "dave@example.com",
  "contact_id":         "cnt_04",
  "co_occurrence_count": 9
}]

Field	Type	Required	Description
co_occurrence_count	number	optional	Combined shared-thread + calendar count with group members.

Surfaces contacts that frequently appear in threads or calendar events alongside current members but aren't yet labeled. Add them via POST /v1/context/{label_id}/entities/. ?limit= accepts a maximum of 20.
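
Turning suggestions into label requests can be sketched as follows. The request body mirrors the documented entities request; the `min_count` cut-off and the helper name are hypothetical, chosen only to show a filtering step before labels are applied.

```python
def entity_requests(label_id: str, suggestions: list, min_count: int = 5) -> list:
    """Build (path, body) pairs for suggestions at or above min_count.

    Suggestions without a contact_id are skipped, since the request
    body needs an entity_id.
    """
    path = f"/v1/context/{label_id}/entities/"
    return [
        (path, {"entity_type": "contact", "entity_id": s["contact_id"]})
        for s in suggestions
        if s.get("contact_id") and s.get("co_occurrence_count", 0) >= min_count
    ]
```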

POST /v1/context/{label_id}/entities/ Apply label to entity scope: context:write

Request

{
  "entity_type": "contact",
  "entity_id":   "cnt_01"
}

Field	Type	Required	Description
entity_type	string	optional	`contact` · `calendar_event`.

201 Created

DELETE /v1/context/{label_id}/entities/{entity_type}/{entity_id}/ Remove label from entity scope: context:write

204 No Content

→ Context group field reference

Context groups are one-per-label; the label_id is the same ID used in the Labels API. Messages are included automatically when the label keyword is applied to a message via the Messages API.

digest: A weekly narrative (3–5 sentences) combining health distribution, recent activity, and a recommended action. Generated by the llm.sweep_group_digests scheduled task (runs weekly per inbox). null until the first run. Requires LLM config and tenant LLM consent.

Workflow — sender_in_group condition: Workflows can match emails from any member of a group using {"field": "sender_label_ids", "op": "includes", "value": "<label_id>"} as a condition. Example: fire a create_task action whenever anyone from your VIP Clients group emails you.

Enrichment

Optional AI-powered enrichment on top of the intelligence layer. Inference runs in your tenant's region and is billed at provider token cost plus 5%. No API keys required. Each feature can be enabled individually per inbox, or triggered on-demand via ?enrich= on any supported endpoint.

GET /v1/enrichment/reference/ List enrichment models and features no auth required

200 OK

{
  "version": "2026-06-06",
  "resource": "enrichment",
  "payload": {
    "get_config": { "method": "GET", "path": "/v1/enrichment/" },
    "update_config": { "method": "PUT", "path": "/v1/enrichment/" }
  },
  "fields": [...],
  "pricing_model": { "unit": "per_million_tokens" },
  "models": [{
    "id":             "claude-haiku-4-5",
    "provider":       "anthropic",
    "input_mtok":     1.00,
    "output_mtok":    5.00,
    "context_window": 200000,
    "max_output":     64000,
    "available":      true
  }],
  "features": [{
    "feature":          "phishing_detection",
    "group":            "Security",
    "description":      "Classify suspicious inbound messages.",
    "requires_consent": true
  }]
}

Returns only models available for the tenant's region. Use a model's id as the model value in PUT /v1/enrichment/, and each feature's feature value in enabled_features.
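
The billing rule (provider token cost plus 5%) combined with the per-million-token prices in models can be sketched as a cost estimate. This assumes input_mtok and output_mtok are USD prices per million input and output tokens; `estimate_cost_usd` is a hypothetical helper, not an API call.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_mtok: float,
    output_mtok: float,
    markup: float = 0.05,  # "provider token cost plus 5%"
) -> float:
    """Estimate the billed cost of one call from token counts."""
    provider_cost = (
        input_tokens / 1_000_000 * input_mtok
        + output_tokens / 1_000_000 * output_mtok
    )
    return provider_cost * (1 + markup)
```

Comparing such an estimate against budget_limit gives a rough idea of how many calls a monthly cap allows.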

GET /v1/enrichment/models/ List region-available model options no auth required

{
  "resource": "enrichment_models",
  "models": [...],
  "counts": {"models": 6}
}

GET /v1/enrichment/features/ List enrichment feature options no auth required

{
  "resource": "enrichment_features",
  "features": [...],
  "counts": {"features": 9}
}

PUT /v1/enrichment/ Create or replace LLM config scope: llm:write

Request

{
  "inbox_id":         "you@example.com",
  "model":            "claude-haiku-4-5",
  "enabled_features": ["phishing_detection", "next_step"],
  "budget_limit":     20.00
}

Field	Type	Required	Description
inbox_id	string	required	—
model	string	required	Region-dependent canonical key. Fetch available options from `GET /v1/enrichment/reference/`.
enabled_features	array	required	Feature keys from `GET /v1/enrichment/reference/`.
budget_limit	float	optional	Monthly spend cap in USD; null means unlimited.

200 OK

{
  "inbox_id":          "...",
  "model":             "claude-haiku-4-5",
  "enabled_features":  ["phishing_detection", "next_step"],
  "budget_limit":      20.00,
  "updated_at":        "2026-05-22T09:00:00Z"
}

GET /v1/enrichment/?inbox_id= Retrieve LLM config scope: llm:read

{
  "inbox_id":          "...",
  "model":             "claude-haiku-4-5",
  "enabled_features":  ["phishing_detection", "next_step"],
  "budget_limit":      20.00,
  "updated_at":        "2026-05-22T09:00:00Z"
}

DELETE /v1/enrichment/?inbox_id= Remove LLM config (disables all features) scope: llm:write

204 No Content (empty body)

→ Enrichment config field reference

On-demand enrichment

# Append ?enrich= to a supported endpoint. Returns cached value if available;
# computes synchronously if not. Returns null + "enriching": true if a
# background task is in-flight. Add &fresh=1 to force recompute (5s cooldown applies).

GET /v1/contacts/{id}/?enrich=next_step,contact_digest
GET /v1/context/{label_id}/?enrich=next_step,summary
GET /v1/messages/threads/{id}/?enrich=thread_summary

# Extra flags in the response:
# "enriching": true      — background task in-flight; poll again shortly
# "on_cooldown": true    — &fresh=1 requested within the 5s cooldown window
# "budget_exceeded": true — monthly budget_limit reached; enrichment skipped

Always-on features run as background tasks triggered by inbox events: phishing_detection fires on every inbound email; next_step and contact_digest re-run whenever the contact's activity timeline is updated. Results are cached on the resource until the next triggering event. budget_limit caps total monthly LLM spend (USD) across all features for the inbox. Once it is reached, enrichment is skipped without raising an error, and on-demand responses carry "budget_exceeded": true. For team tenants, LLM config is managed by admins and applies to all team inboxes.
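
The on-demand response flags suggest a small polling loop, sketched below. `fetch` is a hypothetical callable that performs the GET with ?enrich= and returns the decoded body. Reading the enriched value from a top-level key named by `field` is an assumption; on some resources the value may be nested elsewhere.

```python
import time
from typing import Callable, Optional


def wait_for_enrichment(
    fetch: Callable[[], dict],  # hypothetical: issues the GET with ?enrich=
    field: str,
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[object]:
    """Poll until the enriched field is populated or enrichment is skipped."""
    for _ in range(attempts):
        body = fetch()
        if body.get("budget_exceeded"):
            return None  # monthly budget_limit reached; enrichment skipped
        if body.get(field) is not None and not body.get("enriching"):
            return body[field]
        sleep(delay_seconds)  # background task in flight; poll again shortly
    return None
```

The loop never sets &fresh=1, so it only reads cached or in-flight results and cannot trip the 5s cooldown.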

label_suggest — when enabled, each inbound email is scored against previously labelled emails in the inbox using natural language processing techniques. Labels whose best-match confidence score meets the inbox's label_inference_threshold (default 0.65, configurable via Inbox Settings) are surfaced as suggestions. Suggestions appear in the label_suggestions field of every email response and fire a label.suggested event notification. Each suggestion includes label_id and label_name — confirm with POST /v1/messages/{id}/suggested-labels/{label_id}/ or dismiss with DELETE. This feature does not use AI credits.
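
The threshold step can be sketched as a simple filter. How the confidence scores are computed is not described here, so the sketch takes them as given: `best_scores` is a hypothetical mapping from label_id to that label's best-match score.

```python
DEFAULT_THRESHOLD = 0.65  # default label_inference_threshold


def surfaced_labels(best_scores: dict, threshold: float = DEFAULT_THRESHOLD) -> list:
    """Return label IDs whose best-match score meets the threshold, best first."""
    kept = [
        (score, label_id)
        for label_id, score in best_scores.items()
        if score >= threshold  # "meets" is read as greater than or equal
    ]
    return [label_id for score, label_id in sorted(kept, reverse=True)]
```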

calendar_suggest — when enabled, inbound emails are first screened using natural language processing techniques (keyword and date/time pattern matching). Only emails that contain both a scheduling-intent signal (meet, call, invite, standup, etc.) and a time or date reference are passed to AI for structured extraction. This pre-filter means newsletters, invoices, and general correspondence are discarded at near-zero cost. When a schedulable event is detected, a suggestion is stored and an event.suggested alert fires. The suggestion appears as event_suggestion on the email response (includes title, start, duration, location, description, and participants). Confirm with POST /v1/messages/{id}/suggested-events/ (supply calendar_id to create the calendar event) or dismiss with DELETE. Requires tenant LLM consent.
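
The pre-filter idea (require both a scheduling-intent signal and a time or date reference before involving AI) can be sketched with two regular expressions. The intent keywords are the four named above; the date/time patterns are hypothetical and far simpler than a production screen would be.

```python
import re

# Intent keywords named in the text; the real list is longer ("etc.").
INTENT = re.compile(r"\b(?:meet|call|invite|standup)\w*", re.IGNORECASE)

# Hypothetical time/date patterns: clock times, ISO dates, weekdays, "tomorrow".
WHEN = re.compile(
    r"\b(?:\d{1,2}(?::\d{2})?\s?(?:am|pm)|\d{1,2}:\d{2}|\d{4}-\d{2}-\d{2}"
    r"|(?:mon|tues|wednes|thurs|fri|satur|sun)day|tomorrow)\b",
    re.IGNORECASE,
)


def should_extract(text: str) -> bool:
    """True only when both an intent keyword and a time reference appear."""
    return bool(INTENT.search(text)) and bool(WHEN.search(text))
```

Requiring both signals is what keeps invoices and newsletters, which often contain dates but no scheduling intent, away from the paid extraction step.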

Retrieval

Reusable retrieval source buckets let agents and clients ground email replies, ticket prep, automations, and summaries in approved sources. Sources can include email history, Files folders, public company pages, contacts, calendar context, and configured intelligence records. Answer delivery still happens through email or the calling workflow; there is no separate chat surface.

Plan gate: Retrieval is available on the Startup plan and above. Tokens need retrieval:read to list buckets/runs, retrieval:write to manage source buckets, and retrieval:run to run retrieval tasks.

Cost controls: Refresh, retention, and public crawls default to manual behavior. Tenant admins configure chunking, extraction file-type allowlists, model selection, and cost controls. Vector embedding usage is billed at provider cost plus 5%; generated answers use normal AI token usage. OCR settings expose tesseract and textract as initial options. Tesseract is packaged in the API runtime; Textract requires regional AWS availability and billable provider calls. Use the extraction test endpoint before indexing customer documents.

Sensitive sources: Broad or derived private sources such as inbox-wide email, label history, contact history, intelligence records, and broad calendar sources are marked in bucket policy. By default, indexing requires an explicit sensitive_source_approved acknowledgement.

GET /v1/retrieval/reference/ Retrieval builder contract no auth required

{
  "resource": "retrieval",
  "source_types": ["email", "files", "public_web", "contacts", "calendar", "intelligence"],
  "bucket_types": [...],
  "tasks": [...],
  "output_types": ["answer", "brief", "evidence"],
  "config_defaults": {"refresh": {"default_mode": "manual"}},
  "sensitive_source_policy": {"policy_key": "policy.sensitive_source_approved"},
  "ocr_engines": [{"engine": "none", "connected": true}, {"engine": "tesseract", "connected": true}, {"engine": "textract", "connected": true}],
  "vector_storage": {"default_backend": "managed_vectors"},
  "refresh_policy": {"cadences": ["manual", "daily", "weekly"]},
  "usage_tracking": {"categories": ["retrieval_runs", "retrieval_index_chunks", "llm_tokens"]}
}

GET /v1/retrieval/models/ List embedding model options scope: retrieval:read

{
  "resource": "retrieval_models",
  "pricing_policy": {"billing_multiplier": 1.05},
  "models": [{
    "model_id": "amazon.titan-embed-text-v2:0",
    "dimensions": [256, 512, 1024]
  }]
}

GET /v1/retrieval/config/ Get tenant retrieval configuration tenant admin, scope: retrieval:read

200 OK

{
  "tenant_id": "tenant_123",
  "config": {
    "embedding": {"model_id": "amazon.titan-embed-text-v2:0", "dimension": 1024},
    "chunking": {"strategy": "recursive_text", "max_chars": 3000, "overlap_chars": 300}
  }
}

PATCH /v1/retrieval/config/ Update tenant retrieval configuration tenant admin, scope: retrieval:write

Request

{
  "chunking": {"strategy": "recursive_text", "max_chars": 3000, "overlap_chars": 300},
  "extraction": {"file_types": ["text", "html", "pdf", "docx", "xlsx"], "ocr_engine": "none"},
  "public_web": {"default_mode": "manual", "max_pages": 25},
  "sensitive_sources": {"require_approval": true}
}
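
To show how max_chars and overlap_chars interact, here is a plain sliding-window chunker. It is deliberately not the recursive_text strategy, whose splitting rules are not described here; it only illustrates the size and overlap arithmetic.

```python
def sliding_chunks(text: str, max_chars: int = 3000, overlap_chars: int = 300) -> list:
    """Split text into windows of max_chars that overlap by overlap_chars."""
    if not 0 <= overlap_chars < max_chars:
        raise ValueError("overlap_chars must be in [0, max_chars)")
    step = max_chars - overlap_chars  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the values in the request above, each chunk after the first repeats the last 300 characters of the previous one, so larger overlaps raise chunk counts and embedding volume.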

POST /v1/retrieval/extraction-tests/ Test document extraction before indexing scope: retrieval:read

Request — uses the tenant config plus optional temporary overrides

{
  "file_id": "file_123",
  "chunking": {"max_chars": 1200, "overlap_chars": 120},
  "extraction": {"ocr_engine": "none", "file_types": ["pdf", "docx", "xlsx"]},
  "include_sample": true,
  "sample_chars": 800
}

200 OK

{
  "file_id": "file_123",
  "status": "ok",
  "extraction_type": "pdf",
  "extractor": "pdf",
  "chunk_count": 3,
  "total_chars": 6412,
  "ocr_required": false,
  "truncated": false,
  "page_count": 4,
  "row_count": null,
  "rows_extracted": null,
  "source_warnings": [],
  "warnings": []
}

Use this before indexing a Files source to show whether a sample document extracts cleanly. If a scanned PDF or image needs OCR but OCR is disabled for that extraction test or tenant config, the response returns status: "ocr_required". If OCR is selected but the provider fails, the response returns status: "ocr_failed".
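
A client might map the three documented status values to a next step, as in this sketch. The returned action names are hypothetical labels for illustration; only the status strings come from the API.

```python
def extraction_next_step(result: dict) -> str:
    """Map an extraction-test response to a suggested next action."""
    status = result.get("status")
    if status == "ok":
        return "index"  # document extracts cleanly
    if status == "ocr_required":
        return "enable_ocr"  # scanned content, OCR currently disabled
    if status == "ocr_failed":
        return "retry_or_switch_engine"  # OCR selected but provider failed
    return "inspect_warnings"  # any other status: review warnings manually
```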

POST /v1/retrieval/source-buckets/ Create source bucket scope: retrieval:write

Request

{
  "name":        "Billing support knowledge",
  "bucket_type": "support_class",
  "owner_type":  "inbox",
  "owner_id":    "inbox_123",
  "sources": [
    {"type": "email", "scope": "label", "label_id": "ticket/class/billing"},
    {"type": "files", "folder_id": "folder_123", "include_children": true},
    {"type": "public_web", "allowed_domains": ["example.com"], "refresh": "weekly"}
  ]
}

GET /v1/retrieval/source-buckets/ List source buckets scope: retrieval:read

[{
  "bucket_id":    "rb_123",
  "name":         "Billing support knowledge",
  "bucket_type":  "support_class",
  "vector_backend": "managed_vectors",
  "index_status": "ready",
  "last_index_summary": {
    "chunks": 42,
    "vectors": 42,
    "extraction_metadata": {"extractors": {"pdf": 8, "ocr:tesseract": 4}, "page_count": 18}
  },
  "last_index_error": null,
  "last_index_error_code": null,
  "refresh": {"cadence": "weekly", "next_refresh_at": "2026-06-22T12:00:00Z"},
  "policy": {
    "refresh": {"cadence": "weekly", "next_refresh_at": "2026-06-22T12:00:00Z"}
  }
}]

Read responses expose index summary and refresh state at top level for clients. The same data may also be persisted in bucket policy for server bookkeeping, but clients should prefer the top-level fields.

GET /v1/retrieval/source-buckets/{bucket_id}/ Get source bucket scope: retrieval:read

200 OK — source bucket record

PATCH /v1/retrieval/source-buckets/{bucket_id}/ Update source bucket scope: retrieval:write

Request — all fields optional; at least one required

{
  "name":         "Billing support knowledge v2",
  "index_status": "stale",
  "policy":       {"max_age_days": 90}
}

DELETE /v1/retrieval/source-buckets/{bucket_id}/ Delete source bucket and indexed chunks scope: retrieval:write

Deletes the bucket, chunk metadata, and configured vector records. If configured vector cleanup fails, deletion aborts so it can be retried.

204 No Content

POST /v1/retrieval/source-buckets/{bucket_id}/estimate/ Estimate source bucket indexing scope: retrieval:read

Request — same optional explicit chunks accepted by the index endpoint

{
  "sensitive_source_approved": true,
  "chunks": [
    {"source_type": "files", "source_id": "file_123", "text": "How refunds are handled..."}
  ]
}

Pass sensitive_source_approved: true only after showing the user which sensitive source categories will be indexed. Buckets without sensitive sources do not need this field.

200 OK

{
  "bucket_id":                    "rb_123",
  "estimate_id":                  "7f2d9c4a8b12e3f0a1b2c3d4",
  "estimated_chunks":             42,
  "estimated_input_tokens":       31500,
  "estimated_embedding_requests": 42,
  "source_estimates":              [
    {
      "source_type": "files",
      "method":      "extraction",
      "status":      "extracted",
      "chunk_count": 12,
      "extraction_metadata": {"extractors": {"pdf": 8, "ocr:tesseract": 4}, "page_count": 18},
      "warnings":    []
    }
  ],
  "extraction_metadata":           {"extractors": {"pdf": 8, "ocr:tesseract": 4}, "truncated_sources": 0, "page_count": 18},
  "skipped_sources":                [],
  "extraction_warnings":            [],
  "billable_warnings":              ["Indexing creates billable embedding requests when vector embedding is invoked."],
  "requires_confirmation":        true,
  "warnings":                     []
}

Call this before indexing to show users likely chunk and embedding volume. Estimates use extraction previews for safe local sources such as Files and email, and conservative configured limits for sources that should not be crawled during preflight, such as public websites.
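
The estimate-then-index flow can be sketched as a small gate in front of the index call. Sending sensitive_source_approved in the index request body is an assumption based on the acknowledgement requirement described earlier; `index_request_body` is a hypothetical helper.

```python
from typing import Optional


def index_request_body(
    estimate: dict,
    user_confirmed: bool,
    sensitive_approved: bool = False,
) -> Optional[dict]:
    """Return the body for the index call, or None if confirmation is pending."""
    if estimate.get("requires_confirmation") and not user_confirmed:
        return None  # show the estimate to the user before indexing
    body = {}
    if sensitive_approved:
        # Assumption: the index endpoint accepts the same acknowledgement field.
        body["sensitive_source_approved"] = True
    return body
```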

POST /v1/retrieval/source-buckets/{bucket_id}/index/ Queue source bucket indexing scope: retrieval:write

Request — optional explicit chunks for manual or supplemental indexing

{
  "chunks": [
    {
      "source_type": "files",
      "source_id":   "file_123",
      "text":        "How refunds are handled...",
      "visibility":  "bucket"
    }
  ]
}

202 Accepted

{
  "bucket_id":    "rb_123",
  "index_status": "indexing",
  "queued":       true,
  "task":         "retrieval.index_bucket",
  "trigger":      "manual"
}

GET /v1/retrieval/runs/ List retrieval runs scope: retrieval:read

[{
  "run_id":      "rr_123",
  "bucket_id":   "rb_123",
  "task":        "email_answer",
  "status":      "completed",
  "output_type": "answer"
}]

POST /v1/retrieval/runs/ Run retrieval scope: retrieval:run

Request

{
  "bucket_id":   "rb_123",
  "task":        "email_answer",
  "query":       "What should we tell this customer?",
  "output_type": "answer",
  "context":     {"thread_id": "thread_123"}
}

201 Created

{
  "run_id":   "rr_123",
  "status":   "completed",
  "answer":   {
    "text": "...",
    "reason_code": "no_evidence",
    "action_required": "Try a broader query or reindex the source bucket if recently changed.",
    "bucket": {"bucket_id": "rb_123", "index_status": "ready", "last_index_summary": {"chunks": 42}}
  },
  "citations": [...],
  "usage":    {"retrieval_units": 0}
}

GET /v1/retrieval/runs/{run_id}/ Get retrieval run scope: retrieval:read

200 OK — retrieval run record with answer, citations, usage, and context

The current implementation stores the bucket/run contract, chunk metadata, and managed-index configuration.

Files sources can index configured file types across text-like files, cleaned HTML, DOCX, XLSX, basic PDF text streams, spreadsheet rows, and OCR-enabled image or scanned-PDF content when an OCR engine is selected. Email sources can index thread, inbox, label, contact, and contact-history scopes; public-web sources can index text pages on allowed domains with page, content-type, timeout, and crawler-policy guardrails.

Indexed buckets can return retrieval evidence and generate answers when citation excerpts and an inbox AI model are available. Source-level refresh supports manual, daily, and weekly; successful indexing stores the next scheduled refresh and exposes it as refresh.next_refresh_at.

Retrieval runs and indexed chunks are tracked in usage at zero direct retrieval unit cost; answer generation records normal AI token usage. Index summaries are exposed as last_index_summary; failed or incomplete runs include answer.reason_code, answer.action_required, and bucket index context.
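
A caller can branch on answer.reason_code before using a run's answer, as in this sketch. The field names come from the sample run response; `summarize_run` and its output format are hypothetical.

```python
def summarize_run(run: dict) -> str:
    """Return the answer text, or the reason and suggested action if present."""
    answer = run.get("answer") or {}
    reason = answer.get("reason_code")
    if reason:
        action = answer.get("action_required", "")
        # Join reason and action; drop the separator when no action is given.
        return f"{reason}: {action}" if action else reason
    return answer.get("text", "")
```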

Gent API Documentation