Reduce e2e test flakiness#6285
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the Contributor License Agreement (CLA) and hereby sign the CLA. |
3d0abc4 to
9aa3900
Compare
|
recheck |
|
@BugBot review |
Bugbot couldn't runBugbot is not enabled for your user on this team. Ask your team administrator to increase your team's hard limit for Bugbot seats or add you to the allowlist in the Cursor dashboard. |
|
/check-fork |
|
🚀 General Checks workflow triggered on new branch external-contributor/pr-6285! View the run: https://github.com/tensorzero/tensorzero/actions/runs/21951514181 |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9aa3900297
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let results: Vec<Value> = poll_result_until_some(async || { | ||
| select_model_inferences_clickhouse(&clickhouse, inference_id).await | ||
| }) |
There was a problem hiding this comment.
Wait for full model inference fan-out before asserting count
poll_result_until_some exits as soon as the closure returns Some, but select_model_inferences_clickhouse returns Some for any non-empty result set (it only returns None when zero rows are found). In this test, that means polling can stop after the first ModelInference row is visible and then immediately assert results.len() == 4, which reintroduces a race when the remaining rows are still being inserted. Please poll on the expected cardinality (for example, only return Some when rows.len() == 4) before asserting.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 882349e. The polling now only returns Some when the expected cardinality is reached (e.g. rows.len() == 4), so we no longer exit early on the first non-empty result. The redundant assert_eq!(results.len(), ...) after polling is also removed since the cardinality is guaranteed by the polling condition itself.
|
/check-fork |
|
🚀 General Checks workflow triggered on new branch external-contributor/pr-6285! View the run: https://github.com/tensorzero/tensorzero/actions/runs/21952327910 |
Replace fixed sleeps before ClickHouse/Postgres reads with polling helpers that retry with exponential backoff. This addresses the most common source of merge-queue flakes (issue tensorzero#5609). Key changes: - Add `poll_result_until_some` helper and `poll_clickhouse_for_result!` macro - Replace ~60 `sleep_for_writes_to_be_visible` calls with `flush_pending_writes` - Remove the `sleep_for_writes_to_be_visible` trait method entirely - Convert all sleep-then-read patterns in e2e tests to use polling - Add Docker health checks for Memcached, Tempo, and OTel Collector - Fix race in `test_count_feedback_by_target_id` (concurrent inserts) - Fix race in `test_warn_ignored_thought_block` (log capture timing) - Replace fixed sleeps in cache tests with polling for cache consistency - Replace 25s sleep in otel_export with Tempo trace polling Co-authored-by: Cursor <cursoragent@cursor.com>
- providers/openai/mod.rs: poll_clickhouse_for_result for all selects, poll for embedding cache - providers/openai/custom_tools.rs: replace sleeps with poll_clickhouse_for_result - providers/anthropic.rs: replace sleeps with poll_clickhouse_for_result - openai_compatible.rs: replace sleeps with poll_clickhouse_for_result - dicl.rs: flush_pending_writes instead of sleep, poll_result_until_some for model inference - inference/mod.rs: remove sleep in dryrun test - cache.rs: poll for log capture, poll_clickhouse_for_result for model inference Co-authored-by: Cursor <cursoragent@cursor.com>
- cache.rs: Add check_test_streaming_cache_cached, use poll_result_until_some, remove sleeps - rate_limiting.rs: Replace 5s sleep before logs_contain with poll_result_until_some - timeouts.rs: Replace immediate log check with poll for timeout log - prometheus.rs: Replace all 1s sleeps before metrics checks with poll_result_until_some Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid race where poll_result_until_some exits on first non-empty result before all model inference rows are inserted. Only return Some when rows.len() == expected_count. Co-authored-by: Cursor <cursoragent@cursor.com>
2e5a556 to
882349e
Compare
|
Let's split this PR into smaller incremental changes. This one is too large, which makes it challenging to debug and review. It'll be tough to get all tests to pass with this. But agree the methodology is better! |
Systematic fix for the most common sources of merge-queue flakes (#5609). Replaces fixed sleeps before database reads with polling helpers that retry with exponential backoff, and removes the
sleep_for_writes_to_be_visiblehelper entirely in favor offlush_pending_writes(which callsSYSTEM FLUSH ASYNC INSERT QUEUE).Also adds Docker health checks for Memcached/Tempo/OTel Collector and fixes two smaller races: concurrent inserts in
test_count_feedback_by_target_idand log capture timing intest_warn_ignored_thought_block.Made with Cursor