Add cost tracking for model providers #6286
AntoineToussaint wants to merge 18 commits into tensorzero:main
Add the ability to configure per-provider cost tracking using JSON Pointer mappings from raw provider responses. Cost is computed at inference time and stored in both ClickHouse and Postgres as Decimal(18, 9).

Key changes:
- New `cost` module with config types, computation, and validation
- Support for `cost_per_million` and `cost_per_unit` rate types
- Split pointer config for streaming vs non-streaming responses
- Negative rates allowed (caching discounts); negative totals → None
- DB migrations for ClickHouse (migration 0046) and Postgres
- Cost field threaded through all providers, variants, and inference types
- 22 unit tests covering computation, edge cases, and TOML deserialization

Closes tensorzero#6260, closes tensorzero#6261

Co-authored-by: Cursor <cursoragent@cursor.com>
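As a rough illustration of the rate handling described in this commit message, the sketch below normalizes `cost_per_million` and `cost_per_unit` rates to a per-unit value, sums the contributions, and reports a negative total as `None`. The names `RawRate` and `compute_cost` are hypothetical, the counts are assumed to be already extracted from the provider response, and `f64` stands in for the `Decimal` type the PR actually uses.

```rust
/// Hypothetical stand-in for a rate as written in config:
/// dollars per million units, or dollars per single unit.
#[derive(Clone, Copy)]
enum RawRate {
    PerMillion(f64),
    PerUnit(f64),
}

impl RawRate {
    /// Normalize to dollars per unit (the PR does this once, at config load time).
    fn per_unit(self) -> f64 {
        match self {
            RawRate::PerMillion(rate) => rate / 1_000_000.0,
            RawRate::PerUnit(rate) => rate,
        }
    }
}

/// Sum `count * rate` over all entries.
/// Individual rates may be negative (e.g. caching discounts),
/// but a negative total is reported as `None`.
fn compute_cost(entries: &[(u64, RawRate)]) -> Option<f64> {
    let total: f64 = entries
        .iter()
        .map(|(count, rate)| *count as f64 * rate.per_unit())
        .sum();
    if total < 0.0 {
        None
    } else {
        Some(total)
    }
}
```

For example, 1,000 input tokens at $3 per million plus 500 output tokens at $15 per million come to roughly $0.0105 in this sketch.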
All contributors have signed the CLA ✍️ ✅
tensorzero-core/src/db/clickhouse/migration_manager/migrations/migration_0046.rs
GabrielBianconi left a comment
I'd like to see tests (possibly E2E tests) that cover:
- Gateway relay
- Advanced variant types (best/mixture of N, DICL)
^ non-streaming and streaming versions of all of the above
GabrielBianconi left a comment
Following up on my comment above: in addition to success cases, we'll want to test full error cases (missing required cost info) and mixed cases (e.g. DICL: the embedding model has no cost config, so its cost is None; the model inference does have one; the combined cost should downgrade to missing).
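The "downgrade to missing" behaviour requested here can be sketched as a sum over optional costs in which any missing value makes the total missing. This is only an illustration: costs are represented as integer nano-dollars (`i64`) rather than the PR's `Decimal`, and returning `Some(0)` for an empty slice is an assumption of this sketch, not something the PR states.

```rust
/// Sum per-inference costs (in nano-dollars for this sketch).
/// If any inference has no cost, the total is `None` ("poison" semantics).
/// Assumption: an empty slice sums to `Some(0)`.
fn sum_or_poison(costs: &[Option<i64>]) -> Option<i64> {
    // `Option<i64>` implements `Sum<Option<i64>>`: the first `None`
    // short-circuits the whole sum to `None`.
    costs.iter().copied().sum()
}
```

In the DICL case, the embedding call would contribute `None` and the model inference `Some(..)`, so the combined cost comes out as `None`.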
virajmehta left a comment
see inline comments, nice start!
…raw responses
- Introduce UninitializedCostConfig/CostConfig pattern for per_million -> per_unit normalization
- Add JsonPointer newtype with validation
- Compute cost per-chunk in streaming, aggregate via max
- Add cost column to Postgres model_inferences
- Add E2E tests for cost in streaming/non-streaming responses and ClickHouse persistence
- Replace hardcoded bind parameter numbers in tests with generated values

Co-authored-by: Cursor <cursoragent@cursor.com>
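The "compute cost per-chunk, aggregate via max" step might look like the sketch below. It assumes that chunks without a computable cost are simply skipped and that each chunk's cost is expressed in integer nano-dollars; neither detail is confirmed by the commit message, and the PR's own `streaming_max` may treat `None` differently.

```rust
/// Aggregate per-chunk costs of a streamed response by taking the maximum.
/// Assumption of this sketch: chunks with no cost (`None`) are ignored,
/// and the result is `None` only if no chunk carried a cost.
fn streaming_max(chunk_costs: &[Option<i64>]) -> Option<i64> {
    chunk_costs.iter().copied().flatten().max()
}
```

Taking the maximum rather than the sum fits providers that report cumulative usage in later chunks, since summing would then double-count.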
…l for v12

Co-authored-by: Cursor <cursoragent@cursor.com>

Verify that cost is absent when cost config exists on the relay gateway but not on the downstream, since the downstream is the one that actually computes cost. Covers both non-streaming and streaming.

Co-authored-by: Cursor <cursoragent@cursor.com>

Verify that when both relay and downstream have cost config with different rates, the downstream's computed cost is preserved; the relay doesn't recompute or override. Covers streaming and non-streaming.

Co-authored-by: Cursor <cursoragent@cursor.com>

Cover advanced variant types through relay:
- Best-of-N with cost (non-streaming + streaming)
- Best-of-N poison semantics (evaluator lacks cost config)
- Mixture-of-N with cost (non-streaming + streaming)
- Mixture-of-N poison semantics (fuser lacks cost config)

Co-authored-by: Cursor <cursoragent@cursor.com>
Addressing @GabrielBianconi's review requests for relay + advanced variant + error/mixed case tests:

Gateway relay cost tests (…)
Basic cost propagation (8 tests):
Cost config location (4 tests):
Advanced variants (6 tests):
Error/mixed cases already covered across unit + E2E tests:

DICL is not tested through relay because it requires embedding model lookups which add significant setup complexity; the E2E tests already cover DICL cost scenarios end-to-end.
…cost-tracking-backend
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efd75bc279
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…cost-tracking-backend
8ecccfe
/merge-queue
/check-fork
🚀 General Checks workflow triggered on new branch external-contributor/pr-6286! View the run: https://github.com/tensorzero/tensorzero/actions/runs/22080794691
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8ecccfe03c
    fn relay_cost_config(&self) -> Option<&CostConfig> {
        self.routing
            .iter()
            .filter_map(|name| self.providers.get(name))
            .find_map(|p| p.cost.as_ref())
    }
Use relay cost config for the provider that actually answered
relay_cost_config() always picks the first routed provider’s pricing config, and that single config is then used for relay cost computation in both streaming and non-streaming paths. For models with fallback routing where providers have different cost pointers/rates, a successful response from a later provider will be priced with the wrong config (or fail pointer extraction), producing incorrect/missing cost values in production relay setups.
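One way to act on this suggestion is to look up the cost config by the name of the provider that actually produced the response, instead of scanning the routing list. The sketch below is hypothetical: `Provider`, `CostConfig`, `cost_per_unit_nanos`, and `cost_config_for` are simplified stand-ins, and it assumes the relay path knows which provider answered.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for a provider's cost configuration.
struct CostConfig {
    cost_per_unit_nanos: i64,
}

/// Simplified, hypothetical stand-in for a configured model provider.
struct Provider {
    cost: Option<CostConfig>,
}

/// Return the cost config of the provider that answered the request.
/// `None` if the provider is unknown or has no cost config.
fn cost_config_for<'a>(
    providers: &'a HashMap<String, Provider>,
    answered_by: &str,
) -> Option<&'a CostConfig> {
    providers.get(answered_by).and_then(|p| p.cost.as_ref())
}
```

With fallback routing, a response served by the second provider is then priced with that provider's pointers and rates rather than the first configured one.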
    /// Cost of this model inference in dollars.
    /// `None` means cost tracking was not configured for this provider or the provider did not send the necessary information.
    #[serde(skip_serializing_if = "Option::is_none")]
    #[cfg_attr(feature = "ts-bindings", ts(type = "number"))]
Keep ModelInference.cost optional in generated TS bindings
This field is serialized with skip_serializing_if = "Option::is_none", so responses legitimately omit cost when tracking is unavailable, but the added ts(type = "number") makes the generated binding non-optional (cost: number). That creates a wire-type mismatch for frontend consumers, who will treat cost as always present even though the API can omit it.
Summary
Adds cost tracking infrastructure for model and embedding providers. Cost is computed at inference time from raw provider responses using user-configured JSON Pointer mappings, then stored in both ClickHouse and Postgres.
New module: `cost.rs`
- `Cost` type alias for `rust_decimal::Decimal` (stored as `Decimal(18, 9)` / `NUMERIC(18, 9)`)
- `UninitializedCostConfigEntry` / `UninitializedCostRate`: user-facing types with `cost_per_million` and `cost_per_unit`, deserialized from TOML/JSON
- `CostConfigEntry` / `CostRate`: normalized runtime types (always per-unit), converted at config load time via `TryFrom` / `From` impls
- `CostPointerConfig` (unified or split streaming/non-streaming pointers): stored as `String` in user-facing config, validated and parsed into `json_pointer::JsonPointer` at config load time
- `ResponseMode` enum (`NonStreaming` / `Streaming`) instead of boolean flag
- `compute_cost_from_response()` / `compute_cost_from_json()`: extracts values via JSON Pointers, applies rates, sums contributions
- `CostConfigEntry::compute_contribution()`: per-entry cost computation, easily extensible for future expression-based cost models
- `decimal_from_value` serde visitor: formats f64 to shortest decimal string (Ryu algorithm) then parses to `Decimal`, avoiding binary float precision loss (e.g. `0.1` in TOML → exactly `Decimal(0.1)`)
- … `None`

Configuration
- `cost: Option<CostConfig>` field on model providers and embedding providers
- … `json_pointer` crate

Database
- `ALTER TABLE ModelInference ADD COLUMN cost Nullable(Decimal(18, 9))`
- `ALTER TABLE tensorzero.model_inferences ADD COLUMN cost NUMERIC(18, 9)`
- … `model_inferences` updated to include `cost`

Inference pipeline
- … `raw_response`
- `Usage` struct gains `cost: Option<Cost>` field (derives `Copy`)
- … `streaming_max` for cost (same strategy as tokens)
- … `sum_or_poison`: sums cost when all inferences have cost; any `None` → total `None` (poison semantics)

API / wire types
- `ModelInference` (internal API): new `cost: Option<Cost>` (omitted when `None`)
- `StoredModelInference`: new `cost: Option<Cost>`
- `OpenAICompatibleUsage`: new `tensorzero_cost: Option<Decimal>` extension field

Provider changes
All 16 providers updated to initialize `cost: None` in `Usage` construction (cost is computed later from `raw_response`): Anthropic, AWS Bedrock, Dummy, Fireworks, GCP Vertex Anthropic, GCP Vertex Gemini, Google AI Studio Gemini, Groq, Mistral, OpenAI, OpenAI Responses, OpenRouter, TGI, Together, XAI, relay.

Variant changes
`cost` field threaded through `best_of_n_sampling`, `chat_completion`, `mixture_of_n`, and variant `mod.rs`.

Optimizer changes
All 4 optimizers updated with `cost: None`: DICL, Fireworks SFT, OpenAI, Together SFT.

Dependencies
- `rust_decimal = { version = "1.37", features = ["serde-float"] }` (new)
- `json_pointer` (new, for pointer validation)
- `sqlx` feature `"rust_decimal"` added

Tests
69 cost-related tests passing (unit + E2E):

Unit tests in `cost.rs` (45 tests)
- … `load_cost_config` with valid config
- `test_compute_cost_required_missing_logs_warning` uses `capture_logs()` to verify warning is logged when required pointer is missing

Unit tests in `usage.rs` (20 tests)

E2E tests (16 tests)
- … `None`
- … `None`
- … `None`

Test infrastructure additions
- `best_of_n_0_with_usage` dummy provider: returns evaluator content with standard usage fields for cost tracking tests
- `dummy_dicl` variant: uses `dummy-embedding-model` (no cost config) for DICL poison semantics tests
- `cost_test_best_of_n`, `cost_test_best_of_n_all_cost`, `cost_test_mixture_of_n`, `cost_test_mixture_of_n_all_cost` functions in E2E config

Test plan
- `cargo check --all-targets --all-features`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo fmt --check`
- … (`cost.rs` + `usage.rs`)

Closes #6260, closes #6261
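The float-to-decimal idea behind `decimal_from_value` (format the `f64` as its shortest round-trip decimal string, then parse that string exactly) can be illustrated without `rust_decimal`. The sketch below is not the PR's implementation: `nanos_from_f64` is a hypothetical helper that parses into integer nano-dollars, mirroring the 9 fractional digits of `Decimal(18, 9)`, and rejects values that need more than 9 fractional digits.

```rust
/// Convert an f64 to nano-dollars (9 fractional digits) via its decimal string.
/// Hypothetical helper for illustration; returns `None` for non-finite values,
/// values with more than 9 fractional digits, or values that overflow i64.
fn nanos_from_f64(x: f64) -> Option<i64> {
    if !x.is_finite() {
        return None;
    }
    // `{}` prints the shortest decimal string that round-trips, without an exponent.
    let s = format!("{}", x);
    let (negative, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.as_str()),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    if frac_part.len() > 9 {
        return None;
    }
    let int: i64 = int_part.parse().ok()?;
    // Right-pad the fraction with zeros to exactly 9 digits, e.g. "1" -> "100000000".
    let frac: i64 = format!("{:0<9}", frac_part).parse().ok()?;
    let magnitude = int.checked_mul(1_000_000_000)?.checked_add(frac)?;
    Some(if negative { -magnitude } else { magnitude })
}
```

Going through the string means `0.1` becomes exactly 100,000,000 nano-dollars, while a value such as `0.1 + 0.2`, whose shortest representation has 17 fractional digits, is rejected by this sketch rather than silently rounded.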