Add cost tracking for model providers by AntoineToussaint · Pull Request #6286 · tensorzero/tensorzero

AntoineToussaint · 2026-02-11T22:15:11Z

Summary

Adds cost tracking infrastructure for model and embedding providers. Cost is computed at inference time from raw provider responses using user-configured JSON Pointer mappings, then stored in both ClickHouse and Postgres.

New module: `cost.rs`

- Cost type alias for rust_decimal::Decimal (stored as Decimal(18, 9) / NUMERIC(18, 9))
- Uninitialized → Normalized config pattern:
  - UninitializedCostConfigEntry / UninitializedCostRate: user-facing types with cost_per_million and cost_per_unit, deserialized from TOML/JSON
  - CostConfigEntry / CostRate: normalized runtime types (always per-unit), converted at config load time via TryFrom / From impls
- CostPointerConfig (unified or split streaming/non-streaming pointers): stored as String in user-facing config, validated and parsed into json_pointer::JsonPointer at config load time
- ResponseMode enum (NonStreaming / Streaming) instead of boolean flag
- compute_cost_from_response() / compute_cost_from_json(): extracts values via JSON Pointers, applies rates, sums contributions
- CostConfigEntry::compute_contribution(): per-entry cost computation, easily extensible for future expression-based cost models
- Custom decimal_from_value serde visitor: formats f64 to shortest decimal string (Ryu algorithm) then parses to Decimal, avoiding binary float precision loss (e.g. 0.1 in TOML → exactly Decimal(0.1))
- Negative rates are allowed (e.g., caching discounts); negative totals log a warning and return None
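
For readers who want the normalization and summation rules in one place, here is a minimal std-only sketch. It is not the code in `cost.rs`: the names `RawRate`, `Entry`, `normalize`, and `compute_cost` are hypothetical, integer pico-dollars stand in for `rust_decimal::Decimal`, and the values are assumed to have already been extracted by the JSON Pointers. Warning logs are omitted.

```rust
/// Pico-dollars (1e-12 USD) as an integer. Assumption: a std-only stand-in
/// for the `Decimal`-based `Cost` type described above.
type Pico = i128;

/// User-facing rate, mirroring `cost_per_million` / `cost_per_unit`.
enum RawRate {
    PerMillion(Pico),
    PerUnit(Pico),
}

/// Normalized runtime entry: the rate is always per unit.
struct Entry {
    rate_per_unit: Pico,
    required: bool,
}

/// Convert a user-facing rate into a per-unit rate at "config load time".
/// Integer division truncates; the real code avoids this by using Decimal.
fn normalize(raw: RawRate, required: bool) -> Entry {
    let rate_per_unit = match raw {
        RawRate::PerMillion(r) => r / 1_000_000,
        RawRate::PerUnit(r) => r,
    };
    Entry { rate_per_unit, required }
}

/// `values[i]` is what the pointer of `entries[i]` found (`None` = missing).
/// A missing required value or a negative total yields `None`.
fn compute_cost(entries: &[Entry], values: &[Option<i128>]) -> Option<Pico> {
    let mut total: Pico = 0;
    for (entry, value) in entries.iter().zip(values) {
        match value {
            Some(v) => total += entry.rate_per_unit * v,
            None if entry.required => return None,
            None => {} // optional field absent: contributes nothing
        }
    }
    if total < 0 { None } else { Some(total) }
}
```

With the rates from the configuration example ($0.15 and $0.60 per million tokens), 1000 prompt tokens and 500 completion tokens come to 450,000,000 pico-dollars, i.e. $0.00045.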

Configuration

New optional cost: Option<CostConfig> field on model providers and embedding providers
Pointers validated at config load time using json_pointer crate

Example:

[models."gpt-4o-mini-2024-07-18".providers.openai]
type = "openai"
model_name = "gpt-4o-mini-2024-07-18"
cost = [
  { pointer = "/usage/prompt_tokens", cost_per_million = 0.15, required = true },
  { pointer = "/usage/completion_tokens", cost_per_million = 0.60, required = true },
]
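
The `pointer` strings above are JSON Pointers (RFC 6901). The PR validates them with the `json_pointer` crate; the sketch below is only a std-only illustration of what that validation amounts to, with hypothetical function names. It accepts the empty pointer as RFC 6901 does, which may differ from how the PR treats an empty pointer string.

```rust
/// Split an RFC 6901 JSON Pointer into its unescaped reference tokens.
/// Hypothetical helper; the PR itself relies on the `json_pointer` crate.
fn parse_pointer(pointer: &str) -> Result<Vec<String>, String> {
    if pointer.is_empty() {
        // RFC 6901: the empty pointer refers to the whole document.
        return Ok(Vec::new());
    }
    if !pointer.starts_with('/') {
        return Err(format!("JSON Pointer must start with '/': {pointer}"));
    }
    pointer[1..].split('/').map(unescape).collect()
}

/// Decode the two RFC 6901 escapes: `~0` is `~` and `~1` is `/`.
fn unescape(token: &str) -> Result<String, String> {
    let mut out = String::with_capacity(token.len());
    let mut chars = token.chars();
    while let Some(c) = chars.next() {
        if c != '~' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('0') => out.push('~'),
            Some('1') => out.push('/'),
            _ => return Err(format!("invalid escape in token: {token}")),
        }
    }
    Ok(out)
}
```

Under this sketch, `/usage/prompt_tokens` parses to the tokens `usage` and `prompt_tokens`, while `usage/prompt_tokens` (missing leading slash) is rejected.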

Database

ClickHouse migration 0047: ALTER TABLE ModelInference ADD COLUMN cost Nullable(Decimal(18, 9))
Postgres migration: ALTER TABLE tensorzero.model_inferences ADD COLUMN cost NUMERIC(18, 9)
All SELECT/INSERT queries for model_inferences updated to include cost

Inference pipeline

Non-streaming: cost computed immediately after provider response from raw_response
Streaming: cost computed per-chunk inside stream processing loop, enriching chunks before yielding to downstream consumers
Embeddings: cost computed after embedding response (always non-streaming)
Usage struct gains cost: Option<Cost> field (derives Copy)
Streaming aggregation: uses streaming_max for cost (same strategy as tokens)
Cross-model aggregation: uses sum_or_poison — sums cost when all inferences have cost; any None → total None (poison semantics)
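
The two aggregation strategies can be sketched as follows. The names `streaming_max` and `sum_or_poison` come from the description above, but the signatures here are assumptions, and `i64` nano-dollars stand in for `Cost`.

```rust
/// Within one stream: keep the largest cost seen across chunks, ignoring
/// chunks that carry no cost. Returns `None` only if no chunk has a cost.
fn streaming_max(chunks: &[Option<i64>]) -> Option<i64> {
    chunks.iter().copied().flatten().max()
}

/// Across model inferences: sum the costs, but a single `None` "poisons"
/// the total. An empty slice sums to `Some(0)`.
fn sum_or_poison(costs: &[Option<i64>]) -> Option<i64> {
    costs.iter().try_fold(0i64, |acc, c| c.map(|v| acc + v))
}
```

Note that `Some(0)` is a real value and does not poison the sum; only `None` does.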

API / wire types

ModelInference (internal API): new cost: Option<Cost> (omitted when None)
StoredModelInference: new cost: Option<Cost>
OpenAICompatibleUsage: new tensorzero_cost: Option<Decimal> extension field

Provider changes

All 16 providers updated to initialize cost: None in Usage construction (cost is computed later from raw_response): Anthropic, AWS Bedrock, Dummy, Fireworks, GCP Vertex Anthropic, GCP Vertex Gemini, Google AI Studio Gemini, Groq, Mistral, OpenAI, OpenAI Responses, OpenRouter, TGI, Together, XAI, relay.

Variant changes

cost field threaded through best_of_n_sampling, chat_completion, mixture_of_n, and variant mod.rs.

Optimizer changes

All 4 optimizers updated with cost: None: DICL, Fireworks SFT, OpenAI, Together SFT.

Dependencies

rust_decimal = { version = "1.37", features = ["serde-float"] } (new)
json_pointer (new, for pointer validation)
sqlx feature "rust_decimal" added

Tests

69 cost-related tests passing (unit + E2E):

Unit tests in `cost.rs` (45 tests)

Rate conversion: per-million → per-unit normalization, per-unit passthrough
Cost computation: basic token pricing, per-unit pricing (web searches), required/optional missing fields, negative totals, empty config, split pointers (streaming vs non-streaming), unparseable values, string-encoded numbers, booleans, zero tokens, float values, mixed positive/negative (caching discount), discount exceeding base cost
Response parsing: valid JSON, invalid JSON
TOML deserialization: unified pointers, per-unit rates, split pointers, invalid pointers, invalid split pointers, missing rates, missing pointers, invalid cost values, exact decimal precision (0.1, 0.3), string-quoted decimals, negative costs, both rates present, empty pointer string
Config normalization: per-million → per-unit, per-unit passthrough, load_cost_config with valid config
JSON Pointer validation: valid, invalid (missing leading slash)
Full pipeline (TOML → normalize → provider JSON → cost): OpenAI-style, Anthropic-style with cache, required field missing, per-unit web search, invalid JSON, zero tokens, large token counts (GPT-4), Gemini-style nested usage, direct cost passthrough
Warning assertion: test_compute_cost_required_missing_logs_warning uses capture_logs() to verify warning is logged when required pointer is missing
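
The exact-precision cases (0.1, 0.3) listed above depend on going from f64 to a decimal through the shortest round-trip string. A rough std-only illustration follows; it assumes Rust's `Display` for `f64` as the shortest-string step (the PR mentions Ryu) and uses `i128` nano-units instead of `Decimal`. The function name and the rejection of inputs with more than 9 fractional digits are choices of this sketch, not behavior confirmed by the PR.

```rust
const SCALE_DIGITS: usize = 9;

/// Convert an f64 to integer nano-units via its shortest decimal string.
/// Returns `None` for NaN/infinity, overflow, or more than 9 fractional digits.
fn nanos_from_f64(x: f64) -> Option<i128> {
    // `to_string` prints the shortest decimal that round-trips, e.g. "0.1".
    let text = x.to_string();
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text.as_str()),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    if frac_part.len() > SCALE_DIGITS {
        return None;
    }
    // "NaN" and "inf" fail to parse here and fall out as `None`.
    let int: i128 = int_part.parse().ok()?;
    let frac: i128 = if frac_part.is_empty() {
        0
    } else {
        // Right-pad the fraction with zeros to exactly 9 digits.
        format!("{frac_part:0<9}").parse().ok()?
    };
    let magnitude = int.checked_mul(1_000_000_000)?.checked_add(frac)?;
    Some(if negative { -magnitude } else { magnitude })
}
```

Here `0.1` maps to exactly 100,000,000 nano-units, whereas the float sum `0.1 + 0.2` is rejected because its shortest string has more than 9 fractional digits.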

Unit tests in `usage.rs` (20 tests)

Streaming aggregation: empty, single chunk, cumulative (Anthropic-style), final-only (OpenAI-style), all None, mixed None/Some, non-cumulative detection (debug_assert), Anthropic real-world pattern
Cross-model aggregation: empty, single, multiple all-Some, None propagation (poison semantics), both None propagate, all None
Cost aggregation: both present (sum), one missing poisons total, missing first poisons total, all None, zero is present (not missing), empty returns zero, three-way mix poisons (mixture-of-N scenario)

E2E tests (16 tests)

Simple cost (non-streaming + streaming): cost present in API response, cost stored correctly in ClickHouse
DICL poison semantics (non-streaming + streaming): model has cost config, embedding model doesn't → total cost is None
Best-of-N poison semantics (non-streaming + streaming): candidates have cost, evaluator doesn't → total cost is None
Best-of-N all-cost (non-streaming + streaming): all providers (candidates + evaluator) have cost config → total cost is sum
Mixture-of-N poison semantics (non-streaming + streaming): one candidate model lacks cost → total cost is None
Mixture-of-N all-cost (non-streaming + streaming): all providers (candidates + fuser) have cost config → total cost is sum

Test infrastructure additions

best_of_n_0_with_usage dummy provider: returns evaluator content with standard usage fields for cost tracking tests
dummy_dicl variant: uses dummy-embedding-model (no cost config) for DICL poison semantics tests
cost_test_best_of_n, cost_test_best_of_n_all_cost, cost_test_mixture_of_n, cost_test_mixture_of_n_all_cost functions in E2E config

Test plan

cargo check --all-targets --all-features
cargo clippy --all-targets --all-features -- -D warnings
cargo fmt --check
Unit tests: 65 tests passing (cost.rs + usage.rs)
E2E tests: 69 tests run, 69 passed (including 16 cost-specific E2E tests)
All existing tests continue to pass (6387 skipped = non-cost tests filtered out)

Closes #6260, closes #6261

Add the ability to configure per-provider cost tracking using JSON Pointer mappings from raw provider responses. Cost is computed at inference time and stored in both ClickHouse and Postgres as Decimal(18, 9).

Key changes:
- New `cost` module with config types, computation, and validation
- Support for `cost_per_million` and `cost_per_unit` rate types
- Split pointer config for streaming vs non-streaming responses
- Negative rates allowed (caching discounts); negative totals → None
- DB migrations for ClickHouse (migration 0046) and Postgres
- Cost field threaded through all providers, variants, and inference types
- 22 unit tests covering computation, edge cases, and TOML deserialization

Closes tensorzero#6260, closes tensorzero#6261

Co-authored-by: Cursor <cursoragent@cursor.com>


tensorzero-core/src/db/clickhouse/migration_manager/migrations/migration_0046.rs

tensorzero-core/src/db/postgres/model_inferences.rs

tensorzero-core/src/endpoints/openai_compatible/types/usage.rs

tensorzero-core/src/inference/types/usage.rs

tensorzero-core/src/cost.rs

GabrielBianconi

I'd like to see tests (possibly E2E tests) that cover:

Gateway relay
Advanced variant types (best/mixture of N, DICL)

^ non-streaming and streaming versions of all of the above

tensorzero-core/src/embeddings.rs

tensorzero-core/src/cost.rs

GabrielBianconi

Following up on my comment above: in addition to success cases, we'll want to test full error cases (missing required cost info) and mixed cases (e.g. DICL, where the embedding model lacks proper cost info and yields None while the model inference has a cost; the combined cost should downgrade to missing).

tensorzero-core/src/cost.rs

tensorzero-core/src/embeddings.rs

tensorzero-core/src/cost.rs

tensorzero-core/src/model.rs

virajmehta

see inline comments, nice start!

…raw responses
- Introduce UninitializedCostConfig/CostConfig pattern for per_million -> per_unit normalization
- Add JsonPointer newtype with validation
- Compute cost per-chunk in streaming, aggregate via max
- Add cost column to Postgres model_inferences
- Add E2E tests for cost in streaming/non-streaming responses and ClickHouse persistence
- Replace hardcoded bind parameter numbers in tests with generated values

Co-authored-by: Cursor <cursoragent@cursor.com>

…l for v12 Co-authored-by: Cursor <cursoragent@cursor.com>

gateway/tests/relay/cost.rs

Verify that cost is absent when cost config exists on the relay gateway but not on the downstream, since the downstream is the one that actually computes cost. Covers both non-streaming and streaming. Co-authored-by: Cursor <cursoragent@cursor.com>

Verify that when both relay and downstream have cost config with different rates, the downstream's computed cost is preserved — the relay doesn't recompute or override. Covers streaming and non-streaming. Co-authored-by: Cursor <cursoragent@cursor.com>

Cover advanced variant types through relay:
- Best-of-N with cost (non-streaming + streaming)
- Best-of-N poison semantics (evaluator lacks cost config)
- Mixture-of-N with cost (non-streaming + streaming)
- Mixture-of-N poison semantics (fuser lacks cost config)

Co-authored-by: Cursor <cursoragent@cursor.com>

AntoineToussaint · 2026-02-13T21:15:17Z

Addressing @GabrielBianconi's review requests for relay + advanced variant + error/mixed case tests:

Gateway relay cost tests (gateway/tests/relay/cost.rs — 18 tests):

Basic cost propagation (8 tests):

Cost present when downstream has cost config (non-streaming + streaming)
Cost absent when no cost config (non-streaming + streaming)
Cost through function/variant (non-streaming + streaming)
Different cost rates, token counts alongside cost

Cost config location (4 tests):

Cost on relay only → absent (downstream is what matters) — non-streaming + streaming
Both sides have cost config → downstream's value wins — non-streaming + streaming

Advanced variants (6 tests):

Best-of-N with all cost configs — non-streaming + streaming
Mixture-of-N with all cost configs — non-streaming + streaming
Best-of-N poison semantics (evaluator lacks cost config → None) — non-streaming
Mixture-of-N poison semantics (fuser lacks cost config → None) — non-streaming

Error/mixed cases already covered across unit + E2E tests:

test_aggregate_cost_one_missing_poisons_total — DICL scenario: model has cost, embedding doesn't → None
test_aggregate_cost_three_way_mix_poisons — mixture-of-N: one candidate without cost → None
test_aggregate_cost_zero_is_present — Some(0) is distinct from None
E2E: test_cost_dicl_poison_* — embedding model lacks cost, chat model has cost → total None
E2E: test_cost_best_of_n_poison_* — evaluator lacks cost → total None
E2E: test_cost_mixture_of_n_poison_* — one candidate lacks cost → total None
Unit: test_compute_cost_required_missing_logs_warning — missing required field logs warning, returns None

DICL is not tested through relay because it requires embedding model lookups which add significant setup complexity; the E2E tests already cover DICL cost scenarios end-to-end.

…cost-tracking-backend

virajmehta · 2026-02-15T15:40:38Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efd75bc279

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

tensorzero-core/src/inference/types/usage.rs

…cost-tracking-backend

GabrielBianconi · 2026-02-16T23:41:25Z

/merge-queue

GabrielBianconi · 2026-02-16T23:41:28Z

/check-fork

github-actions · 2026-02-16T23:42:19Z

🚀 General Checks workflow triggered on new branch external-contributor/pr-6286!

View the run: https://github.com/tensorzero/tensorzero/actions/runs/22080794691

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ecccfe03c

chatgpt-codex-connector · 2026-02-16T23:47:14Z

tensorzero-core/src/model.rs

+    fn relay_cost_config(&self) -> Option<&CostConfig> {
+        self.routing
+            .iter()
+            .filter_map(|name| self.providers.get(name))
+            .find_map(|p| p.cost.as_ref())


Use relay cost config for the provider that actually answered

relay_cost_config() always picks the first routed provider’s pricing config, and that single config is then used for relay cost computation in both streaming and non-streaming paths. For models with fallback routing where providers have different cost pointers/rates, a successful response from a later provider will be priced with the wrong config (or fail pointer extraction), producing incorrect/missing cost values in production relay setups.
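
One possible reading of this suggestion is sketched below with simplified stand-in types. `first_cost_config` mirrors the lookup quoted in the diff; `cost_config_for` and its `answered_by` parameter are hypothetical and assume the relay path knows which provider produced the response, which this PR does not confirm.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real types in model.rs.
struct CostConfig {
    rate_per_unit: i64,
}

struct Provider {
    cost: Option<CostConfig>,
}

struct Model {
    routing: Vec<String>,
    providers: HashMap<String, Provider>,
}

impl Model {
    /// Lookup as quoted in the diff: the first routed provider that has a
    /// cost config, regardless of which provider actually answered.
    fn first_cost_config(&self) -> Option<&CostConfig> {
        self.routing
            .iter()
            .filter_map(|name| self.providers.get(name))
            .find_map(|p| p.cost.as_ref())
    }

    /// Hypothetical alternative: use the config of the provider that
    /// produced the response, so fallback providers are priced correctly.
    fn cost_config_for(&self, answered_by: &str) -> Option<&CostConfig> {
        self.providers.get(answered_by)?.cost.as_ref()
    }
}
```

With a `primary` and a `fallback` provider carrying different rates, the first lookup always returns the `primary` config, while the second returns whichever config belongs to the provider named in `answered_by`.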


chatgpt-codex-connector · 2026-02-16T23:47:14Z

tensorzero-core/src/endpoints/internal/model_inferences.rs

+    /// Cost of this model inference in dollars.
+    /// `None` means cost tracking was not configured for this provider or the provider did not send the necessary information.
+    #[serde(skip_serializing_if = "Option::is_none")]
+    #[cfg_attr(feature = "ts-bindings", ts(type = "number"))]


Keep ModelInference.cost optional in generated TS bindings

This field is serialized with skip_serializing_if = "Option::is_none", so responses legitimately omit cost when tracking is unavailable, but the added ts(type = "number") makes the generated binding non-optional (cost: number). That creates a wire-type mismatch for frontend consumers, who will treat cost as always present even though the API can omit it.


AntoineToussaint requested review from Aaron1011 and virajmehta as code owners February 11, 2026 22:15

AntoineToussaint mentioned this pull request Feb 11, 2026

Display cost data for individual inferences in the UI #6288


github-actions bot added the has-merge-conflicts label Feb 11, 2026