Incremental latency quantiles computation #6310

Merged
shuyangli merged 3 commits into main from sl/incremental-latency-quantile
Feb 13, 2026
Conversation

@shuyangli (Member) commented Feb 12, 2026

Instead of calculating and storing exact percentiles in a materialized view, this assigns each request latency to bucket floor(log2(latency) * 64) (so the total number of buckets is practically bounded) to give us reasonable estimates of p99.9 latency. We can use the midpoint of the bucket as the estimated latency; if the tail latency is 1 minute, our worst-case error from estimation is ~300ms.
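As a rough illustration (function names here are made up, not the actual implementation), the bucketing and midpoint estimate look like this in Rust:

```rust
// Sketch of the log2 bucketing: bucket_id = floor(log2(latency_ms) * 64), so
// adjacent bucket bounds differ by a factor of 2^(1/64) ≈ 1.011. Buckets are
// ~1.1% wide, which is where the ~300ms worst-case error at a 1-minute tail
// latency comes from.

fn bucket_id(latency_ms: f64) -> i64 {
    (latency_ms.log2() * 64.0).floor() as i64
}

/// Lower/upper bounds of a bucket, in milliseconds.
fn bucket_bounds(id: i64) -> (f64, f64) {
    let lower = 2f64.powf(id as f64 / 64.0);
    let upper = 2f64.powf((id + 1) as f64 / 64.0);
    (lower, upper)
}

/// Midpoint estimate for any latency that fell into this bucket.
fn bucket_midpoint(id: i64) -> f64 {
    let (lower, upper) = bucket_bounds(id);
    (lower + upper) / 2.0
}

fn main() {
    let id = bucket_id(60_000.0); // 1 minute
    let (lower, upper) = bucket_bounds(id);
    println!(
        "bucket {id}: [{lower:.0}, {upper:.0}] ms, midpoint {:.0} ms, max error ≈ {:.0} ms",
        bucket_midpoint(id),
        (upper - lower) / 2.0
    );
}
```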

At query time, quantiles are computed from histogram CDFs (a sketch follows the steps below):

  1. Aggregate bucket counts over the requested window per (model, metric, bucket_id).
  2. Build cumulative counts ordered by bucket_id.
  3. For each target quantile, compute rank target = 1 + quantile * (total_count - 1).
  4. Pick the first bucket where cumulative count >= rank target.
  5. Interpolate within that bucket in log-space between bucket bounds to estimate the quantile value.
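
A minimal Rust sketch of steps 2-5, assuming the bucket counts from step 1 have already been aggregated (names are illustrative, not the actual implementation):

```rust
/// Approximate a quantile from aggregated (bucket_id, count) pairs: build the
/// cumulative counts, find the rank target, then interpolate in log-space
/// within the first bucket whose cumulative count reaches that rank.
fn approx_quantile(mut buckets: Vec<(i64, u64)>, quantile: f64) -> Option<f64> {
    buckets.sort_by_key(|&(id, _)| id);
    let total: u64 = buckets.iter().map(|&(_, count)| count).sum();
    if total == 0 {
        return None;
    }
    // Step 3: rank target = 1 + quantile * (total_count - 1).
    let target = 1.0 + quantile * (total as f64 - 1.0);

    let mut cumulative = 0u64;
    for &(id, count) in &buckets {
        let prev = cumulative;
        cumulative += count;
        // Step 4: first bucket where the cumulative count reaches the rank target.
        if cumulative as f64 >= target {
            // Step 5: interpolate between the bucket's log2 bounds (id/64 and (id+1)/64).
            let lower_log2 = id as f64 / 64.0;
            let upper_log2 = (id + 1) as f64 / 64.0;
            let fraction = (target - prev as f64) / count as f64;
            return Some(2f64.powf(lower_log2 + fraction * (upper_log2 - lower_log2)));
        }
    }
    None
}

// e.g. p99.9 over the aggregated window:
// let p999 = approx_quantile(bucket_counts, 0.999);
```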

We should also consider not returning p0.1 and p99.9 when the count is small, because the error will be high.

A step towards #5691.


Note

Medium Risk
Touches core Postgres schema and background refresh scheduling for dashboard latency metrics; mistakes could lead to stale/incorrect quantiles or increased DB load during refresh windows.

Overview
Replaces the model_latency_quantiles* materialized views with incrementally maintained sparse latency histograms (minute + hour rollups) and regular views that compute approximate quantiles from those histograms.

Adds new Postgres tables/functions to bucket latencies (log2 buckets) and refresh rollups incrementally with persisted watermarks, and updates pg_cron setup/validation + e2e tests + fixture scripts to run the new tensorzero_refresh_model_latency_histograms_incremental job instead of REFRESH MATERIALIZED VIEW.
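
The incremental refresh itself is a Postgres function (tensorzero_refresh_model_latency_histograms_incremental) driven by pg_cron, but the persisted-watermark pattern it follows can be illustrated with a small in-memory sketch; every type and field name below is hypothetical:

```rust
use std::collections::HashMap;

// Hypothetical shapes, for illustration only.
struct RawLatencyRow {
    timestamp_s: u64, // event time, seconds since epoch
    model: String,
    metric: String, // e.g. total response time vs. TTFT
    bucket_id: i64, // log2 bucket, as described in the PR
}

#[derive(Default)]
struct MinuteHistogramRollup {
    // (minute_start_s, model, metric, bucket_id) -> count
    counts: HashMap<(u64, String, String, i64), u64>,
    // Persisted watermark: rows before this have already been rolled up.
    watermark_s: u64,
}

impl MinuteHistogramRollup {
    /// Roll up only the rows that arrived since the last watermark, then
    /// advance it, so repeated refreshes never rescan already-processed data.
    fn refresh(&mut self, raw: &[RawLatencyRow], now_s: u64) {
        for row in raw
            .iter()
            .filter(|r| r.timestamp_s >= self.watermark_s && r.timestamp_s < now_s)
        {
            let minute_start = row.timestamp_s - row.timestamp_s % 60;
            let key = (minute_start, row.model.clone(), row.metric.clone(), row.bucket_id);
            *self.counts.entry(key).or_insert(0) += 1;
        }
        self.watermark_s = now_s;
    }
}

fn main() {
    let mut rollup = MinuteHistogramRollup::default();
    let rows = vec![RawLatencyRow {
        timestamp_s: 120,
        model: "example-model".into(),
        metric: "response_time_ms".into(),
        bucket_id: 1015,
    }];
    rollup.refresh(&rows, 180);
    assert_eq!(rollup.counts.len(), 1);
}
```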

Written by Cursor Bugbot for commit ebf6027. This will update automatically on new commits.

@shuyangli force-pushed the sl/incremental-latency-quantile branch 4 times, most recently from 6492b8a to ebf6027 on February 12, 2026 20:21
@shuyangli marked this pull request as ready for review February 12, 2026 20:31
@shuyangli (Member Author)

@BugBot review

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

@shuyangli (Member Author)

@amishler - do you have any other suggestions on how we could incrementally compute latency quantiles, so queries don't have to scan the whole table? We can't take on tdigest extensions, and ideally we can implement this fully in SQL.

@amishler (Member) commented Feb 12, 2026

@amishler - do you have any other suggestions on how we could incrementally compute latency quantiles, so queries don't have to scan the whole table? We can't take on tdigest extensions, and ideally we can implement this fully in SQL.

For quantiles you need to maintain the full distribution, so the two options in principle to avoid computing over the whole table are (1) lossy or (2) lossless compression of the latency distribution. Lossy is what this approach does. Lossless would essentially mean maintaining a frequency table with counts per millisecond value - similar to bucketing in that you collapse rows into counts, but you don't bucket over the x-axis. That obviously only saves space if you have a lot of duplicate ms values, and I don't know if it's feasible space-wise for rollups over longer time scales like an hour. You could consider other lossless compression schemes like Huffman encoding but I have no idea if they can be implemented easily in sql.
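
For concreteness, the lossless option described above amounts to something like this (a toy in-memory sketch, not a proposal for the actual schema):

```rust
use std::collections::HashMap;

/// Collapse rows into exact per-millisecond counts instead of log2 buckets.
/// Quantiles computed from this table are exact, but the number of distinct
/// keys is unbounded, so it only saves space when many latencies share the
/// same millisecond value.
fn frequency_table(latencies_ms: &[u64]) -> HashMap<u64, u64> {
    let mut counts = HashMap::new();
    for &ms in latencies_ms {
        *counts.entry(ms).or_insert(0) += 1;
    }
    counts
}
```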

Basically any lossy approach involves bucketing. The log-scaled buckets seem sensible if the latency distribution is heavily right-skewed (for example, approximately log-normal), which usually seems to be the case in practice. You could potentially come up with more principled buckets for specific distributions + specific quantiles of interest, but you'd have to know the distribution in advance. Without that, I'd favor this approach.

@amishler (Member) left a comment


From a stats perspective, this approach makes sense. The relative error compared to the actual empirical quantiles is ~1% across all buckets, which is nice. See my comment inline also.

Agree that small/large quantiles shouldn't be reported when counts are small. This is more a statistical issue than a data compression issue: empirical quantiles in the tails generally have higher variances than in the middle of the distribution, so you need more data to estimate the true quantiles well.

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ebf602739c


@shuyangli force-pushed the sl/incremental-latency-quantile branch from ebf6027 to e45f5cc on February 13, 2026 01:03
@shuyangli force-pushed the sl/incremental-model-provider-stats branch from 9aced72 to 108658f on February 13, 2026 01:03
Base automatically changed from sl/incremental-model-provider-stats to main February 13, 2026 03:41
@shuyangli force-pushed the sl/incremental-latency-quantile branch from e45f5cc to e3996ef on February 13, 2026 04:01
@virajmehta (Member) left a comment


Generally good, just some comments about maintainability. I wish we could test this code properly; not sure if that's possible.

@virajmehta assigned shuyangli and unassigned virajmehta Feb 13, 2026
@shuyangli force-pushed the sl/incremental-latency-quantile branch 6 times, most recently from 7c3f334 to 1f83cb2 on February 13, 2026 17:53
@shuyangli requested a review from virajmehta February 13, 2026 17:55
@shuyangli assigned virajmehta and unassigned shuyangli Feb 13, 2026
@shuyangli (Member Author)

Shifted the bucketing and quantiles logic into Rust; now the migration is only for rolling up raw data into the minute/hour tables.

virajmehta previously approved these changes Feb 13, 2026
@virajmehta added this pull request to the merge queue Feb 13, 2026
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 13, 2026
@shuyangli added this pull request to the merge queue Feb 13, 2026
Merged via the queue into main with commit 12b204d Feb 13, 2026
63 checks passed
@shuyangli deleted the sl/incremental-latency-quantile branch February 13, 2026 23:18