Skip to content

Comments

Reduce e2e test flakiness#6285

Closed
AntoineToussaint wants to merge 4 commits intotensorzero:mainfrom
AntoineToussaint:fix/reduce-test-flakiness
Closed

Reduce e2e test flakiness#6285
AntoineToussaint wants to merge 4 commits intotensorzero:mainfrom
AntoineToussaint:fix/reduce-test-flakiness

Conversation

@AntoineToussaint
Copy link
Contributor

Systematic fix for the most common sources of merge-queue flakes (#5609). Replaces fixed sleeps before database reads with polling helpers that retry with exponential backoff, and removes the sleep_for_writes_to_be_visible helper entirely in favor of flush_pending_writes (which calls SYSTEM FLUSH ASYNC INSERT QUEUE).

Also adds Docker health checks for Memcached/Tempo/OTel Collector and fixes two smaller races: concurrent inserts in test_count_feedback_by_target_id and log capture timing in test_warn_ignored_thought_block.

Made with Cursor

@github-actions
Copy link
Contributor

github-actions bot commented Feb 11, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@AntoineToussaint
Copy link
Contributor Author

I have read the Contributor License Agreement (CLA) and hereby sign the CLA.

@AntoineToussaint AntoineToussaint force-pushed the fix/reduce-test-flakiness branch from 3d0abc4 to 9aa3900 Compare February 11, 2026 22:09
@AntoineToussaint
Copy link
Contributor Author

recheck

github-actions bot added a commit that referenced this pull request Feb 11, 2026
@virajmehta virajmehta self-assigned this Feb 11, 2026
@virajmehta
Copy link
Member

@BugBot review

@cursor
Copy link

cursor bot commented Feb 11, 2026

Bugbot couldn't run

Bugbot is not enabled for your user on this team.

Ask your team administrator to increase your team's hard limit for Bugbot seats or add you to the allowlist in the Cursor dashboard.

@virajmehta
Copy link
Member

/check-fork

@github-actions
Copy link
Contributor

🚀 General Checks workflow triggered on new branch external-contributor/pr-6285!

View the run: https://github.com/tensorzero/tensorzero/actions/runs/21951514181

@virajmehta
Copy link
Member

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9aa3900297

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 124 to 126
let results: Vec<Value> = poll_result_until_some(async || {
select_model_inferences_clickhouse(&clickhouse, inference_id).await
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Wait for full model inference fan-out before asserting count

poll_result_until_some exits as soon as the closure returns Some, but select_model_inferences_clickhouse returns Some for any non-empty result set (it only returns None when zero rows are found). In this test, that means polling can stop after the first ModelInference row is visible and then immediately assert results.len() == 4, which reintroduces a race when the remaining rows are still being inserted. Please poll on the expected cardinality (for example, only return Some when rows.len() == 4) before asserting.

Useful? React with 👍 / 👎.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 882349e. The polling now only returns Some when the expected cardinality is reached (e.g. rows.len() == 4), so we no longer exit early on the first non-empty result. The redundant assert_eq!(results.len(), ...) after polling is also removed since the cardinality is guaranteed by the polling condition itself.

@virajmehta
Copy link
Member

/check-fork

@github-actions
Copy link
Contributor

🚀 General Checks workflow triggered on new branch external-contributor/pr-6285!

View the run: https://github.com/tensorzero/tensorzero/actions/runs/21952327910

AntoineToussaint and others added 4 commits February 12, 2026 13:24
Replace fixed sleeps before ClickHouse/Postgres reads with polling helpers
that retry with exponential backoff. This addresses the most common source
of merge-queue flakes (issue tensorzero#5609).

Key changes:
- Add `poll_result_until_some` helper and `poll_clickhouse_for_result!` macro
- Replace ~60 `sleep_for_writes_to_be_visible` calls with `flush_pending_writes`
- Remove the `sleep_for_writes_to_be_visible` trait method entirely
- Convert all sleep-then-read patterns in e2e tests to use polling
- Add Docker health checks for Memcached, Tempo, and OTel Collector
- Fix race in `test_count_feedback_by_target_id` (concurrent inserts)
- Fix race in `test_warn_ignored_thought_block` (log capture timing)
- Replace fixed sleeps in cache tests with polling for cache consistency
- Replace 25s sleep in otel_export with Tempo trace polling

Co-authored-by: Cursor <cursoragent@cursor.com>
- providers/openai/mod.rs: poll_clickhouse_for_result for all selects, poll for embedding cache
- providers/openai/custom_tools.rs: replace sleeps with poll_clickhouse_for_result
- providers/anthropic.rs: replace sleeps with poll_clickhouse_for_result
- openai_compatible.rs: replace sleeps with poll_clickhouse_for_result
- dicl.rs: flush_pending_writes instead of sleep, poll_result_until_some for model inference
- inference/mod.rs: remove sleep in dryrun test
- cache.rs: poll for log capture, poll_clickhouse_for_result for model inference

Co-authored-by: Cursor <cursoragent@cursor.com>
- cache.rs: Add check_test_streaming_cache_cached, use poll_result_until_some, remove sleeps
- rate_limiting.rs: Replace 5s sleep before logs_contain with poll_result_until_some
- timeouts.rs: Replace immediate log check with poll for timeout log
- prometheus.rs: Replace all 1s sleeps before metrics checks with poll_result_until_some

Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid race where poll_result_until_some exits on first non-empty result
before all model inference rows are inserted. Only return Some when
rows.len() == expected_count.

Co-authored-by: Cursor <cursoragent@cursor.com>
@GabrielBianconi
Copy link
Member

Let's split this PR into smaller incremental changes. This one is too large, which makes it challenging to debug and review. It'll be tough to get all tests to pass with this. But agree the methodology is better!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants