Incremental latency quantiles computation #6310
Conversation
@BugBot review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
@amishler - do you have any other suggestions on how we could incrementally compute latency quantiles, so queries don't have to scan the whole table? We can't take on tdigest extensions, and ideally we'd implement this fully in SQL.
For quantiles you need to maintain the full distribution, so the two options in principle for avoiding a scan over the whole table are (1) lossy or (2) lossless compression of the latency distribution. Lossy is what this approach does. Lossless would essentially mean maintaining a frequency table with counts per millisecond value: similar to bucketing in that you collapse rows into counts, but you don't bucket over the x-axis. That obviously only saves space if you have a lot of duplicate ms values, and I don't know if it's feasible space-wise for rollups over longer time scales like an hour. You could consider other lossless compression schemes like Huffman encoding, but I have no idea whether they can be implemented easily in SQL.

Basically any lossy approach involves bucketing. The log-scaled buckets seem sensible if the latency distribution is heavily right-skewed (for example, approximately log-normal), which usually seems to be the case in practice. You could potentially come up with more principled buckets for specific distributions and specific quantiles of interest, but you'd have to know the distribution in advance. Without that, I'd favor this approach.
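For concreteness, here is a minimal Rust sketch of the two rollup shapes being contrasted; the names and types are illustrative, not from this PR:

```rust
use std::collections::HashMap;

// Lossless rollup: one counter per exact millisecond value. Only saves space
// if many requests share the same ms latency.
fn rollup_exact_ms(latencies_ms: &[u64]) -> HashMap<u64, u64> {
    let mut freq = HashMap::new();
    for &ms in latencies_ms {
        *freq.entry(ms).or_insert(0) += 1;
    }
    freq
}

// Lossy rollup: collapse values into log-scaled buckets, so the number of
// counters stays bounded no matter how spread out the latencies are.
fn rollup_log_buckets(latencies_ms: &[f64], buckets_per_octave: f64) -> HashMap<i64, u64> {
    let mut hist = HashMap::new();
    for &ms in latencies_ms {
        let bucket = (ms.max(1.0).log2() * buckets_per_octave).floor() as i64;
        *hist.entry(bucket).or_insert(0) += 1;
    }
    hist
}
```

Either shape merges across time windows by summing counts key-by-key, which is what makes incremental rollups cheap.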
amishler left a comment:
From a stats perspective, this approach makes sense. The relative error compared to the actual empirical quantiles is ~1% across all buckets, which is nice. See my inline comment as well.
Agree that small/large quantiles shouldn't be reported when counts are small. This is more a statistical issue than a data compression issue: empirical quantiles in the tails generally have higher variance than in the middle of the distribution, so you need more data to estimate the true quantiles well.
virajmehta left a comment:
Generally good, just some comments about maintainability. I wish we could test this code properly; not sure if that is possible.
Shifted the bucketing and quantiles logic into Rust; now the migration is only responsible for rolling up raw data into the minute/hour tables.
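A minimal sketch of what the Rust-side bucketing could look like under the `floor(log2(latency) * 64)` scheme from the PR description; the constant and function names are illustrative, not the actual implementation:

```rust
/// 64 buckets per doubling of latency, so each bucket spans a ~1.1%
/// relative range (2^(1/64) ≈ 1.011). Illustrative constant, not from the PR.
const BUCKETS_PER_OCTAVE: f64 = 64.0;

/// Map a latency in milliseconds to its histogram bucket index.
fn bucket_index(latency_ms: f64) -> i64 {
    (latency_ms.max(1.0).log2() * BUCKETS_PER_OCTAVE).floor() as i64
}

/// Estimate a representative latency for a bucket as the midpoint of its
/// lower and upper bounds; near a 60s tail this is off by at most ~300ms.
fn bucket_midpoint_ms(bucket: i64) -> f64 {
    let lower = (bucket as f64 / BUCKETS_PER_OCTAVE).exp2();
    let upper = ((bucket + 1) as f64 / BUCKETS_PER_OCTAVE).exp2();
    (lower + upper) / 2.0
}
```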
Instead of calculating and storing exact percentiles in a materialized view, this buckets request latencies into `floor(log2(latency) * 64)` buckets (so the total number of buckets is practically bounded), which gives us reasonable estimates of p99.9 latency. We can use the midpoint of the bucket as the estimated latency; if the tail latency is 1 minute, our worst-case estimation error is ~300ms. At query time, quantiles are computed from histogram CDFs.
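As a rough illustration of that query-time step (a sketch, not the PR's actual code), estimating a quantile from a sparse histogram stored as (bucket_index, count) rows might look like the following, reusing the hypothetical `bucket_midpoint_ms` from the sketch above:

```rust
/// Estimate a quantile (e.g. q = 0.999 for p99.9) from a sparse histogram
/// given as (bucket_index, count) pairs.
fn estimate_quantile_ms(mut histogram: Vec<(i64, u64)>, q: f64) -> Option<f64> {
    let total: u64 = histogram.iter().map(|(_, count)| count).sum();
    if total == 0 {
        return None;
    }
    histogram.sort_by_key(|&(bucket, _)| bucket);

    // Walk the CDF until the cumulative count reaches the target rank,
    // then report that bucket's midpoint as the estimated latency.
    let target = (q * total as f64).ceil().max(1.0) as u64;
    let mut cumulative = 0u64;
    for (bucket, count) in histogram {
        cumulative += count;
        if cumulative >= target {
            return Some(bucket_midpoint_ms(bucket));
        }
    }
    None
}
```

For example, `estimate_quantile_ms(rows, 0.5)` would return the estimated median latency for the window.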
Also, we should consider not returning p0.1 and p99.9 when the count is small, because the error will be high.
A step towards #5691.
Note: Medium Risk
Touches core Postgres schema and background refresh scheduling for dashboard latency metrics; mistakes could lead to stale/incorrect quantiles or increased DB load during refresh windows.
Overview
Replaces the `model_latency_quantiles*` materialized views with incrementally maintained sparse latency histograms (minute + hour rollups) and regular views that compute approximate quantiles from those histograms. Adds new Postgres tables/functions to bucket latencies (log2 buckets) and refresh the rollups incrementally with persisted watermarks, and updates the pg_cron setup/validation, e2e tests, and fixture scripts to run the new `tensorzero_refresh_model_latency_histograms_incremental` job instead of `REFRESH MATERIALIZED VIEW`.

Written by Cursor Bugbot for commit ebf6027.