Reduce e2e test flakiness by AntoineToussaint · Pull Request #6285 · tensorzero/tensorzero

AntoineToussaint · 2026-02-11T21:58:05Z

Systematic fix for the most common sources of merge-queue flakes (#5609). Replaces fixed sleeps before database reads with polling helpers that retry with exponential backoff, and removes the sleep_for_writes_to_be_visible helper entirely in favor of flush_pending_writes (which calls SYSTEM FLUSH ASYNC INSERT QUEUE).

Also adds Docker health checks for Memcached/Tempo/OTel Collector and fixes two smaller races: concurrent inserts in test_count_feedback_by_target_id and log capture timing in test_warn_ignored_thought_block.

Made with Cursor

github-actions · 2026-02-11T21:58:15Z

All contributors have signed the CLA ✍️ ✅
_{Posted by the CLA Assistant Lite bot.}

AntoineToussaint · 2026-02-11T22:02:04Z

I have read the Contributor License Agreement (CLA) and hereby sign the CLA.

AntoineToussaint · 2026-02-11T22:09:47Z

recheck

virajmehta · 2026-02-11T22:32:50Z

@BugBot review

cursor · 2026-02-11T22:32:53Z

Bugbot couldn't run

Bugbot is not enabled for your user on this team.

Ask your team administrator to increase your team's hard limit for Bugbot seats or add you to the allowlist in the Cursor dashboard.

virajmehta · 2026-02-12T14:51:19Z

/check-fork

github-actions · 2026-02-12T14:52:24Z

🚀 General Checks workflow triggered on new branch external-contributor/pr-6285!

View the run: https://github.com/tensorzero/tensorzero/actions/runs/21951514181

virajmehta · 2026-02-12T14:52:57Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9aa3900297

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-12T14:59:33Z

tensorzero-core/tests/e2e/best_of_n.rs

+    let results: Vec<Value> = poll_result_until_some(async || {
+        select_model_inferences_clickhouse(&clickhouse, inference_id).await
+    })


Wait for full model inference fan-out before asserting count

poll_result_until_some exits as soon as the closure returns Some, but select_model_inferences_clickhouse returns Some for any non-empty result set (it only returns None when zero rows are found). In this test, that means polling can stop after the first ModelInference row is visible and then immediately assert results.len() == 4, which reintroduces a race when the remaining rows are still being inserted. Please poll on the expected cardinality (for example, only return Some when rows.len() == 4) before asserting.

Useful? React with 👍 / 👎.

Addressed in 882349e. The polling now only returns Some when the expected cardinality is reached (e.g. rows.len() == 4), so we no longer exit early on the first non-empty result. The redundant assert_eq!(results.len(), ...) after polling is also removed since the cardinality is guaranteed by the polling condition itself.

virajmehta · 2026-02-12T15:13:20Z

/check-fork

github-actions · 2026-02-12T15:14:19Z

🚀 General Checks workflow triggered on new branch external-contributor/pr-6285!

View the run: https://github.com/tensorzero/tensorzero/actions/runs/21952327910

Replace fixed sleeps before ClickHouse/Postgres reads with polling helpers that retry with exponential backoff. This addresses the most common source of merge-queue flakes (issue tensorzero#5609). Key changes: - Add `poll_result_until_some` helper and `poll_clickhouse_for_result!` macro - Replace ~60 `sleep_for_writes_to_be_visible` calls with `flush_pending_writes` - Remove the `sleep_for_writes_to_be_visible` trait method entirely - Convert all sleep-then-read patterns in e2e tests to use polling - Add Docker health checks for Memcached, Tempo, and OTel Collector - Fix race in `test_count_feedback_by_target_id` (concurrent inserts) - Fix race in `test_warn_ignored_thought_block` (log capture timing) - Replace fixed sleeps in cache tests with polling for cache consistency - Replace 25s sleep in otel_export with Tempo trace polling Co-authored-by: Cursor <cursoragent@cursor.com>

- providers/openai/mod.rs: poll_clickhouse_for_result for all selects, poll for embedding cache - providers/openai/custom_tools.rs: replace sleeps with poll_clickhouse_for_result - providers/anthropic.rs: replace sleeps with poll_clickhouse_for_result - openai_compatible.rs: replace sleeps with poll_clickhouse_for_result - dicl.rs: flush_pending_writes instead of sleep, poll_result_until_some for model inference - inference/mod.rs: remove sleep in dryrun test - cache.rs: poll for log capture, poll_clickhouse_for_result for model inference Co-authored-by: Cursor <cursoragent@cursor.com>

- cache.rs: Add check_test_streaming_cache_cached, use poll_result_until_some, remove sleeps - rate_limiting.rs: Replace 5s sleep before logs_contain with poll_result_until_some - timeouts.rs: Replace immediate log check with poll for timeout log - prometheus.rs: Replace all 1s sleeps before metrics checks with poll_result_until_some Co-authored-by: Cursor <cursoragent@cursor.com>

Avoid race where poll_result_until_some exits on first non-empty result before all model inference rows are inserted. Only return Some when rows.len() == expected_count. Co-authored-by: Cursor <cursoragent@cursor.com>

GabrielBianconi · 2026-02-16T16:39:09Z

Let's split this PR into smaller incremental changes. This one is too large, which makes it challenging to debug and review. It'll be tough to get all tests to pass with this. But agree the methodology is better!

AntoineToussaint force-pushed the fix/reduce-test-flakiness branch from 3d0abc4 to 9aa3900 Compare February 11, 2026 22:09

github-actions bot added a commit that referenced this pull request Feb 11, 2026

@AntoineToussaint has signed the CLA in #6285

1689159

virajmehta self-assigned this Feb 11, 2026

chatgpt-codex-connector bot reviewed Feb 12, 2026

View reviewed changes

github-actions bot added the has-merge-conflicts label Feb 12, 2026

AntoineToussaint and others added 4 commits February 12, 2026 13:24

AntoineToussaint force-pushed the fix/reduce-test-flakiness branch from 2e5a556 to 882349e Compare February 12, 2026 18:34

github-actions bot added has-merge-conflicts and removed has-merge-conflicts labels Feb 12, 2026

GabrielBianconi closed this Feb 16, 2026

Comments

Conversation

AntoineToussaint commented Feb 11, 2026

Uh oh!

github-actions bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AntoineToussaint commented Feb 11, 2026

Uh oh!

AntoineToussaint commented Feb 11, 2026

Uh oh!

virajmehta commented Feb 11, 2026

Uh oh!

cursor bot commented Feb 11, 2026

Bugbot couldn't run

Uh oh!

virajmehta commented Feb 12, 2026

Uh oh!

github-actions bot commented Feb 12, 2026

Uh oh!

virajmehta commented Feb 12, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

AntoineToussaint Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

virajmehta commented Feb 12, 2026

Uh oh!

github-actions bot commented Feb 12, 2026

Uh oh!

GabrielBianconi commented Feb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Feb 11, 2026 •

edited

Loading