Add cost tracking for model providers #6286

Open
AntoineToussaint wants to merge 18 commits into tensorzero:main from AntoineToussaint:cost-tracking-backend

Conversation

@AntoineToussaint AntoineToussaint commented Feb 11, 2026

Summary

Adds cost tracking infrastructure for model and embedding providers. Cost is computed at inference time from raw provider responses using user-configured JSON Pointer mappings, then stored in both ClickHouse and Postgres.

New module: cost.rs

  • Cost type alias for rust_decimal::Decimal (stored as Decimal(18, 9) / NUMERIC(18, 9))
  • Uninitialized → Normalized config pattern:
    • UninitializedCostConfigEntry / UninitializedCostRate — user-facing types with cost_per_million and cost_per_unit, deserialized from TOML/JSON
    • CostConfigEntry / CostRate — normalized runtime types (always per-unit), converted at config load time via TryFrom / From impls
  • CostPointerConfig (unified or split streaming/non-streaming pointers) — stored as String in user-facing config, validated and parsed into json_pointer::JsonPointer at config load time
  • ResponseMode enum (NonStreaming / Streaming) instead of boolean flag
  • compute_cost_from_response() / compute_cost_from_json() — extracts values via JSON Pointers, applies rates, sums contributions
  • CostConfigEntry::compute_contribution() — per-entry cost computation, easily extensible for future expression-based cost models
  • Custom decimal_from_value serde visitor — formats f64 to shortest decimal string (Ryu algorithm) then parses to Decimal, avoiding binary float precision loss (e.g. 0.1 in TOML → exactly Decimal(0.1))
  • Negative rates are allowed (e.g., caching discounts); negative totals log a warning and return None
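The computation described in the list above can be sketched roughly as follows. This is an illustration only, not the code in cost.rs: it uses integer nano-dollars (`NanoDollars`) as a std-only stand-in for `rust_decimal::Decimal`, takes already-extracted values instead of running JSON Pointers against a response, and the `Rate`, `Entry`, `contribution`, and `total_cost` names are hypothetical. Skipping a missing optional field is an assumption; returning None for a missing required field or a negative total follows the description.

```rust
/// Cost in nano-dollars (1e-9 USD); a std-only stand-in for Decimal(18, 9).
type NanoDollars = i128;

/// Hypothetical mirror of the two user-facing rate kinds.
#[derive(Clone, Copy)]
enum Rate {
    /// Price per one million units, in nano-dollars.
    PerMillion(NanoDollars),
    /// Price per single unit, in nano-dollars.
    PerUnit(NanoDollars),
}

/// One cost entry: a rate plus whether its value must be present.
struct Entry {
    rate: Rate,
    required: bool,
}

/// Cost contributed by one entry for `units` extracted units.
/// Multiplies before dividing so the integer division loses as little as possible.
fn contribution(rate: Rate, units: u64) -> NanoDollars {
    match rate {
        Rate::PerMillion(price) => price * units as i128 / 1_000_000,
        Rate::PerUnit(price) => price * units as i128,
    }
}

/// Sums all contributions. `values[i]` is the value extracted for `entries[i]`,
/// or None when the pointer did not resolve.
fn total_cost(entries: &[Entry], values: &[Option<u64>]) -> Option<NanoDollars> {
    let mut total: NanoDollars = 0;
    for (entry, value) in entries.iter().zip(values) {
        match value {
            Some(units) => total += contribution(entry.rate, *units),
            // A missing required value makes the whole cost unknown.
            None if entry.required => return None,
            // Assumption: a missing optional value is skipped.
            None => {}
        }
    }
    // A negative total (discounts exceeding the base cost) is reported as unknown.
    if total < 0 { None } else { Some(total) }
}
```

With rates of 0.15 and 0.60 USD per million tokens, 1000 prompt tokens and 500 completion tokens give 150,000 + 300,000 = 450,000 nano-dollars, i.e. 0.00045 USD.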

Configuration

  • New optional cost: Option<CostConfig> field on model providers and embedding providers
  • Pointers validated at config load time using json_pointer crate
  • Example:
    [models."gpt-4o-mini-2024-07-18".providers.openai]
    type = "openai"
    model_name = "gpt-4o-mini-2024-07-18"
    cost = [
      { pointer = "/usage/prompt_tokens", cost_per_million = 0.15, required = true },
      { pointer = "/usage/completion_tokens", cost_per_million = 0.60, required = true },
    ]
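For readers unfamiliar with JSON Pointer syntax, the kind of check done at config load time can be approximated with the std-only sketch below. The PR itself relies on the json_pointer crate; `is_valid_json_pointer` and `tokens` are hypothetical helpers that follow the RFC 6901 grammar. RFC 6901 accepts the empty string as a pointer to the whole document, and whether the PR accepts it is not shown here.

```rust
/// Checks RFC 6901 syntax: either the empty string, or '/'-prefixed tokens
/// in which every '~' is followed by '0' or '1'.
fn is_valid_json_pointer(pointer: &str) -> bool {
    if pointer.is_empty() {
        return true; // refers to the whole document
    }
    if !pointer.starts_with('/') {
        return false; // e.g. "usage/prompt_tokens" is missing the leading slash
    }
    let mut chars = pointer.chars();
    while let Some(c) = chars.next() {
        // '~' is the escape character; only "~0" and "~1" are legal.
        if c == '~' && !matches!(chars.next(), Some('0') | Some('1')) {
            return false;
        }
    }
    true
}

/// Splits an already validated pointer into unescaped reference tokens.
/// "~1" must be decoded before "~0", as RFC 6901 requires.
fn tokens(pointer: &str) -> Vec<String> {
    pointer
        .split('/')
        .skip(1) // drop the empty piece before the leading '/'
        .map(|token| token.replace("~1", "/").replace("~0", "~"))
        .collect()
}
```

Under these rules "/usage/prompt_tokens" from the example is valid and splits into `usage` and `prompt_tokens`, while "usage/prompt_tokens" is rejected.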

Database

  • ClickHouse migration 0047: ALTER TABLE ModelInference ADD COLUMN cost Nullable(Decimal(18, 9))
  • Postgres migration: ALTER TABLE tensorzero.model_inferences ADD COLUMN cost NUMERIC(18, 9)
  • All SELECT/INSERT queries for model_inferences updated to include cost

Inference pipeline

  • Non-streaming: cost computed immediately after provider response from raw_response
  • Streaming: cost computed per-chunk inside stream processing loop, enriching chunks before yielding to downstream consumers
  • Embeddings: cost computed after embedding response (always non-streaming)
  • Usage struct gains cost: Option<Cost> field (derives Copy)
  • Streaming aggregation: uses streaming_max for cost (same strategy as tokens)
  • Cross-model aggregation: uses sum_or_poison — sums cost when all inferences have cost; any None → total None (poison semantics)
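The two aggregation strategies named above behave roughly like the sketch below. The names `streaming_max` and `sum_or_poison` come from this description, but the signatures are assumptions, and `u64` nano-dollars stand in for the real `Cost` type.

```rust
/// Streaming aggregation: providers may report cumulative values on every chunk
/// or a single value on the final chunk, so keep the largest value seen.
/// Returns None when no chunk carried a value.
fn streaming_max(chunks: &[Option<u64>]) -> Option<u64> {
    chunks.iter().copied().flatten().max()
}

/// Cross-model aggregation: sum when every inference has a cost.
/// A single None "poisons" the total; an empty input sums to Some(0).
fn sum_or_poison(costs: &[Option<u64>]) -> Option<u64> {
    // `Option<T>` implements `Sum`, short-circuiting to None on the first None.
    costs.iter().copied().sum()
}
```

Poisoning rather than skipping means a reported total is never a silent undercount: `Some(0)` is a real, known cost, while `None` means at least one component was unknown.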

API / wire types

  • ModelInference (internal API): new cost: Option<Cost> (omitted when None)
  • StoredModelInference: new cost: Option<Cost>
  • OpenAICompatibleUsage: new tensorzero_cost: Option<Decimal> extension field

Provider changes

All 16 providers updated to initialize cost: None in Usage construction (cost is computed later from raw_response): Anthropic, AWS Bedrock, Dummy, Fireworks, GCP Vertex Anthropic, GCP Vertex Gemini, Google AI Studio Gemini, Groq, Mistral, OpenAI, OpenAI Responses, OpenRouter, TGI, Together, XAI, relay.

Variant changes

cost field threaded through best_of_n_sampling, chat_completion, mixture_of_n, and variant mod.rs.

Optimizer changes

All 4 optimizers updated with cost: None: DICL, Fireworks SFT, OpenAI, Together SFT.

Dependencies

  • rust_decimal = { version = "1.37", features = ["serde-float"] } (new)
  • json_pointer (new, for pointer validation)
  • sqlx feature "rust_decimal" added

Tests

81 cost-related tests passing (unit + E2E):

Unit tests in cost.rs (45 tests)

  • Rate conversion: per-million → per-unit normalization, per-unit passthrough
  • Cost computation: basic token pricing, per-unit pricing (web searches), required/optional missing fields, negative totals, empty config, split pointers (streaming vs non-streaming), unparseable values, string-encoded numbers, booleans, zero tokens, float values, mixed positive/negative (caching discount), discount exceeding base cost
  • Response parsing: valid JSON, invalid JSON
  • TOML deserialization: unified pointers, per-unit rates, split pointers, invalid pointers, invalid split pointers, missing rates, missing pointers, invalid cost values, exact decimal precision (0.1, 0.3), string-quoted decimals, negative costs, both rates present, empty pointer string
  • Config normalization: per-million → per-unit, per-unit passthrough, load_cost_config with valid config
  • JSON Pointer validation: valid, invalid (missing leading slash)
  • Full pipeline (TOML → normalize → provider JSON → cost): OpenAI-style, Anthropic-style with cache, required field missing, per-unit web search, invalid JSON, zero tokens, large token counts (GPT-4), Gemini-style nested usage, direct cost passthrough
  • Warning assertion: test_compute_cost_required_missing_logs_warning uses capture_logs() to verify warning is logged when required pointer is missing
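The "exact decimal precision" cases relate to the float-to-string-to-decimal approach of the decimal_from_value visitor. A rough std-only analogue is sketched below under stated assumptions: Rust's `Display` for `f64` already prints the shortest round-tripping decimal string, which stands in for the Ryu step, and an `i128` count of nano-units stands in for `Decimal`. `nano_from_f64` and `nano_from_str` are hypothetical names, and rejecting values with more than 9 fractional digits is a choice made for this sketch.

```rust
/// Converts an f64 to nano-units (9 decimal places) through its shortest
/// round-tripping decimal string instead of scaling the binary float.
fn nano_from_f64(value: f64) -> Option<i128> {
    if !value.is_finite() {
        return None;
    }
    // `Display` prints the shortest string that parses back to the same f64,
    // e.g. 0.1 prints as "0.1".
    nano_from_str(&format!("{value}"))
}

/// Parses a plain decimal string such as "-0.15" into nano-units.
/// Returns None for malformed input or more than 9 fractional digits.
fn nano_from_str(text: &str) -> Option<i128> {
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    if int_part.is_empty() || frac_part.len() > 9 {
        return None;
    }
    if !int_part.bytes().chain(frac_part.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let int: i128 = int_part.parse().ok()?;
    // Right-pad the fraction with zeros to exactly 9 digits: "15" becomes "150000000".
    let frac: i128 = format!("{frac_part:0<9}").parse().ok()?;
    let magnitude = int.checked_mul(1_000_000_000)?.checked_add(frac)?;
    Some(if negative { -magnitude } else { magnitude })
}
```

Here a configured `0.1` maps to exactly 100,000,000 nano-units, whereas the float expression `0.1 + 0.2` prints with 17 fractional digits and is rejected by this sketch instead of being silently rounded.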

Unit tests in usage.rs (20 tests)

  • Streaming aggregation: empty, single chunk, cumulative (Anthropic-style), final-only (OpenAI-style), all None, mixed None/Some, non-cumulative detection (debug_assert), Anthropic real-world pattern
  • Cross-model aggregation: empty, single, multiple all-Some, None propagation (poison semantics), both None propagate, all None
  • Cost aggregation: both present (sum), one missing poisons total, missing first poisons total, all None, zero is present (not missing), empty returns zero, three-way mix poisons (mixture-of-N scenario)

E2E tests (16 tests)

  • Simple cost (non-streaming + streaming): cost present in API response, cost stored correctly in ClickHouse
  • DICL poison semantics (non-streaming + streaming): model has cost config, embedding model doesn't → total cost is None
  • Best-of-N poison semantics (non-streaming + streaming): candidates have cost, evaluator doesn't → total cost is None
  • Best-of-N all-cost (non-streaming + streaming): all providers (candidates + evaluator) have cost config → total cost is sum
  • Mixture-of-N poison semantics (non-streaming + streaming): one candidate model lacks cost → total cost is None
  • Mixture-of-N all-cost (non-streaming + streaming): all providers (candidates + fuser) have cost config → total cost is sum

Test infrastructure additions

  • best_of_n_0_with_usage dummy provider: returns evaluator content with standard usage fields for cost tracking tests
  • dummy_dicl variant: uses dummy-embedding-model (no cost config) for DICL poison semantics tests
  • cost_test_best_of_n, cost_test_best_of_n_all_cost, cost_test_mixture_of_n, cost_test_mixture_of_n_all_cost functions in E2E config

Test plan

  • cargo check --all-targets --all-features
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo fmt --check
  • Unit tests: 65 tests passing (cost.rs + usage.rs)
  • E2E tests: 69 tests run, 69 passed (including 16 cost-specific E2E tests)
  • All existing tests continue to pass (6387 skipped = non-cost tests filtered out)

Closes #6260, closes #6261

Add the ability to configure per-provider cost tracking using JSON Pointer
mappings from raw provider responses. Cost is computed at inference time
and stored in both ClickHouse and Postgres as Decimal(18, 9).

Key changes:
- New `cost` module with config types, computation, and validation
- Support for `cost_per_million` and `cost_per_unit` rate types
- Split pointer config for streaming vs non-streaming responses
- Negative rates allowed (caching discounts); negative totals → None
- DB migrations for ClickHouse (migration 0046) and Postgres
- Cost field threaded through all providers, variants, and inference types
- 22 unit tests covering computation, edge cases, and TOML deserialization

Closes tensorzero#6260, closes tensorzero#6261

Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor

github-actions bot commented Feb 11, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Member

@GabrielBianconi GabrielBianconi left a comment

I'd like to see tests (possibly E2E tests) that cover:

  • Gateway relay
  • Advanced variant types (best/mixture of N, DICL)

^ non-streaming and streaming versions of all of the above

Member

@GabrielBianconi GabrielBianconi left a comment

Following up on my comment above: in addition to success cases, we'll want to test full error cases (missing required cost info) and mixed cases (e.g. DICL, where the embedding model has no proper cost config and yields None while the model inference has a cost; the combined cost should downgrade to missing).

Member

@virajmehta virajmehta left a comment

see inline comments, nice start!

AntoineToussaint and others added 2 commits February 12, 2026 10:26
…raw responses

- Introduce UninitializedCostConfig/CostConfig pattern for per_million -> per_unit normalization
- Add JsonPointer newtype with validation
- Compute cost per-chunk in streaming, aggregate via max
- Add cost column to Postgres model_inferences
- Add E2E tests for cost in streaming/non-streaming responses and ClickHouse persistence
- Replace hardcoded bind parameter numbers in tests with generated values

Co-authored-by: Cursor <cursoragent@cursor.com>
…l for v12

Co-authored-by: Cursor <cursoragent@cursor.com>
virajmehta previously approved these changes Feb 13, 2026
Verify that cost is absent when cost config exists on the relay gateway
but not on the downstream, since the downstream is the one that actually
computes cost. Covers both non-streaming and streaming.

Co-authored-by: Cursor <cursoragent@cursor.com>
AntoineToussaint and others added 2 commits February 13, 2026 16:06
Verify that when both relay and downstream have cost config with
different rates, the downstream's computed cost is preserved — the
relay doesn't recompute or override. Covers streaming and non-streaming.

Co-authored-by: Cursor <cursoragent@cursor.com>
Cover advanced variant types through relay:
- Best-of-N with cost (non-streaming + streaming)
- Best-of-N poison semantics (evaluator lacks cost config)
- Mixture-of-N with cost (non-streaming + streaming)
- Mixture-of-N poison semantics (fuser lacks cost config)

Co-authored-by: Cursor <cursoragent@cursor.com>
@AntoineToussaint
Author

Addressing @GabrielBianconi's review requests for relay + advanced variant + error/mixed case tests:

Gateway relay cost tests (gateway/tests/relay/cost.rs — 18 tests):

Basic cost propagation (8 tests):

  • Cost present when downstream has cost config (non-streaming + streaming)
  • Cost absent when no cost config (non-streaming + streaming)
  • Cost through function/variant (non-streaming + streaming)
  • Different cost rates, token counts alongside cost

Cost config location (4 tests):

  • Cost on relay only → absent (downstream is what matters) — non-streaming + streaming
  • Both sides have cost config → downstream's value wins — non-streaming + streaming

Advanced variants (6 tests):

  • Best-of-N with all cost configs — non-streaming + streaming
  • Mixture-of-N with all cost configs — non-streaming + streaming
  • Best-of-N poison semantics (evaluator lacks cost config → None) — non-streaming
  • Mixture-of-N poison semantics (fuser lacks cost config → None) — non-streaming

Error/mixed cases already covered across unit + E2E tests:

  • test_aggregate_cost_one_missing_poisons_total — DICL scenario: model has cost, embedding doesn't → None
  • test_aggregate_cost_three_way_mix_poisons — mixture-of-N: one candidate without cost → None
  • test_aggregate_cost_zero_is_present: Some(0) is distinct from None
  • E2E: test_cost_dicl_poison_* — embedding model lacks cost, chat model has cost → total None
  • E2E: test_cost_best_of_n_poison_* — evaluator lacks cost → total None
  • E2E: test_cost_mixture_of_n_poison_* — one candidate lacks cost → total None
  • Unit: test_compute_cost_required_missing_logs_warning — missing required field logs warning, returns None

DICL is not tested through relay because it requires embedding model lookups which add significant setup complexity; the E2E tests already cover DICL cost scenarios end-to-end.

@virajmehta
Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efd75bc279

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

virajmehta previously approved these changes Feb 16, 2026
@GabrielBianconi GabrielBianconi dismissed stale reviews from virajmehta and themself via 8ecccfe February 16, 2026 23:40
@GabrielBianconi
Member

/merge-queue

@GabrielBianconi
Member

/check-fork

@github-actions
Contributor

🚀 General Checks workflow triggered on new branch external-contributor/pr-6286!

View the run: https://github.com/tensorzero/tensorzero/actions/runs/22080794691

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ecccfe03c

Comment on lines +274 to +278
    fn relay_cost_config(&self) -> Option<&CostConfig> {
        self.routing
            .iter()
            .filter_map(|name| self.providers.get(name))
            .find_map(|p| p.cost.as_ref())

P1 Badge Use relay cost config for the provider that actually answered

relay_cost_config() always picks the first routed provider’s pricing config, and that single config is then used for relay cost computation in both streaming and non-streaming paths. For models with fallback routing where providers have different cost pointers/rates, a successful response from a later provider will be priced with the wrong config (or fail pointer extraction), producing incorrect/missing cost values in production relay setups.
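To make the concern concrete, here is a minimal sketch contrasting the flagged lookup with one keyed on the provider that actually answered. The `Model`, `Provider`, and `CostConfig` types are simplified, hypothetical stand-ins for the real ones, and `first_cost_config` / `cost_config_for` are illustrative names; how the answering provider's name would reach the relay code path is not shown in this thread and is assumed here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real cost config; `label` only identifies it.
struct CostConfig {
    label: &'static str,
}

struct Provider {
    cost: Option<CostConfig>,
}

struct Model {
    /// Provider names in fallback order.
    routing: Vec<String>,
    providers: HashMap<String, Provider>,
}

impl Model {
    /// Behaviour flagged in the review: the first routed provider that has a
    /// cost config, regardless of which provider produced the response.
    fn first_cost_config(&self) -> Option<&CostConfig> {
        self.routing
            .iter()
            .filter_map(|name| self.providers.get(name))
            .find_map(|p| p.cost.as_ref())
    }

    /// Possible alternative: look up the config of the provider that answered.
    fn cost_config_for(&self, answered_by: &str) -> Option<&CostConfig> {
        self.providers
            .get(answered_by)
            .and_then(|p| p.cost.as_ref())
    }
}
```

With routing `["primary", "fallback"]` and a response served by `fallback`, the first lookup still returns the `primary` config, while the second returns the `fallback` one.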


/// Cost of this model inference in dollars.
/// `None` means cost tracking was not configured for this provider or the provider did not send the necessary information.
#[serde(skip_serializing_if = "Option::is_none")]
#[cfg_attr(feature = "ts-bindings", ts(type = "number"))]

P2 Badge Keep ModelInference.cost optional in generated TS bindings

This field is serialized with skip_serializing_if = "Option::is_none", so responses legitimately omit cost when tracking is unavailable, but the added ts(type = "number") makes the generated binding non-optional (cost: number). That creates a wire-type mismatch for frontend consumers, who will treat cost as always present even though the API can omit it.


Development

Successfully merging this pull request may close these issues.

  • Store cost data in database
  • Allow users to configure cost for model providers and embedding model providers

3 participants