feat(sdk): add Benchmark and AsyncBenchmark classes by sid-rl · Pull Request #714 · runloopai/api-client-python

sid-rl · 2025-12-18T01:26:36Z

Add SDK classes for managing benchmarks:

- Benchmark: Synchronous class for benchmark operations
  - get_info(): Retrieve benchmark details
  - update(): Update benchmark metadata
  - run(): Start a benchmark run, returns BenchmarkRun
  - add_scenarios(): Add scenarios to the benchmark
  - remove_scenarios(): Remove scenarios from the benchmark
  - list_runs(): List benchmark runs with filtering
- AsyncBenchmark: Async version with the same interface
- New TypedDicts for SDK params:
  - SDKBenchmarkUpdateParams
  - SDKBenchmarkStartRunParams
  - SDKBenchmarkListRunsParams
- Unit tests for both sync and async classes
- E2E smoketests

src/runloop_api_client/sdk/__init__.py

james-rl · 2025-12-18T21:53:26Z

Some minor feedback to make the tests more useful / less mysterious in case of failures, but this generally looks great

tests/smoketests/sdk/test_benchmark.py

jrvb-rl · 2025-12-18T19:32:43Z

src/runloop_api_client/sdk/async_benchmark.py

+
+    Provides async methods for retrieving benchmark details, updating the benchmark,
+    managing scenarios, and starting benchmark runs. Obtain instances via
+    ``runloop.benchmark.from_id()`` or ``runloop.benchmark.list()``.


Is there a way to create a link to the BenchmarkOps definitions here? That would make the resulting docs really easy to navigate. E.g., maybe something like this?

You obtain a benchmark with the [runloop.benchmark](some useful link) operations, such as runloop.benchmark.create() and runloop.benchmark.list()

Even better if we can link to the specific methods, but that is less critical IMO. (Just as long as we can get people close...)

will do once i add the BenchmarkOps classes! the plan is to add them in a separate pr once this one is merged

src/runloop_api_client/sdk/async_benchmark.py

jrvb-rl · 2025-12-18T20:12:33Z

src/runloop_api_client/sdk/async_benchmark.py

+from .async_benchmark_run import AsyncBenchmarkRun
+
+
+class AsyncBenchmark:


Let's highlight that this is a handle to benchmark management operations, but that to understand what is in the benchmark, you need a BenchmarkView. This is somewhat stated here, but I think it would be helpful to be more explicit. What do you think of this?

A handle for managing a Runloop Benchmark.

This provides async methods for retrieving benchmark details....

... The [BenchmarkView](some link) object contains details about the contents of the benchmark. The info() call and various update methods all return the most recent benchmark state.

Or something like that?

this is true of all the classes we have so far: to understand what is actually in the object X, we have to call get_info() and look at the XView. since BenchmarkView is listed as the return type of get_info() and update(), and is documented in the type reference, i think it's fine to leave as is
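The handle/view split discussed in this thread can be sketched with stand-in classes. This is only an illustration, not the SDK's implementation: `FakeBenchmarkAPI`, the `BenchmarkView` fields, and the constructor signature are assumptions made for the example; only the names `Benchmark`, `BenchmarkView`, `get_info()`, and `update()` come from the PR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BenchmarkView:
    """Stand-in snapshot of benchmark state (fields are hypothetical)."""
    id: str
    name: str
    scenario_ids: tuple = ()


class FakeBenchmarkAPI:
    """Hypothetical in-memory backend used in place of the real HTTP client."""

    def __init__(self):
        self._store = {}

    def create(self, benchmark_id, name):
        self._store[benchmark_id] = BenchmarkView(id=benchmark_id, name=name)

    def retrieve(self, benchmark_id):
        return self._store[benchmark_id]

    def update(self, benchmark_id, **changes):
        # Replace only the provided fields and return the new snapshot.
        self._store[benchmark_id] = replace(self._store[benchmark_id], **changes)
        return self._store[benchmark_id]


class Benchmark:
    """Handle: holds only an ID and a client, never the benchmark contents."""

    def __init__(self, api, benchmark_id):
        self._api = api
        self.id = benchmark_id

    def get_info(self):
        # Contents are always fetched as a fresh BenchmarkView.
        return self._api.retrieve(self.id)

    def update(self, **changes):
        # Update calls return the most recent state as a BenchmarkView.
        return self._api.update(self.id, **changes)


api = FakeBenchmarkAPI()
api.create("bmk_123", "nightly")
benchmark = Benchmark(api, "bmk_123")
view = benchmark.update(name="nightly-v2")
```

The handle itself carries no `name` or `scenario_ids`; a reader has to go through `get_info()` (or the return value of `update()`) to see what the benchmark contains, which is the point both commenters are making.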

src/runloop_api_client/types/benchmark_start_run_params.py

jrvb-rl · 2025-12-18T23:29:05Z

tests/smoketests/sdk/test_async_benchmark.py

+        if benchmark_data.scenario_ids:
+            scenario = async_sdk_client.scenario.from_id(benchmark_data.scenario_ids[0])
+            scenario_runs.append(
+                await scenario.run(benchmark_run_id=run.id, run_name="sdk-smoketest-async-benchmark-run-scenario")


Presumably this bit starts the devbox.... We should set a small-ish lifetime for these in case some crash prevents us from cleaning up nicely.

can't set devbox keep_alive_time from scenario.run, but moved this to within the try block so that our cleanup is handled in case of an error
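The fix described in the reply, starting the scenario run inside the `try` block so cleanup still happens on failure, might look roughly like the following. This is a sketch with stand-ins: `FakeScenarioRun`, its `cancel()` method, and `start_scenario_run()` are hypothetical placeholders for the real SDK calls, which are not shown in this thread.

```python
import asyncio


class FakeScenarioRun:
    """Hypothetical stand-in for a scenario run that owns a devbox."""

    def __init__(self, run_id):
        self.id = run_id
        self.cancelled = False

    async def cancel(self):
        # Placeholder for whatever call releases the underlying devbox.
        self.cancelled = True


async def start_scenario_run(run_id):
    """Hypothetical stand-in for `await scenario.run(...)`."""
    await asyncio.sleep(0)
    return FakeScenarioRun(run_id)


async def smoketest(scenario_runs, fail_after_start):
    try:
        # Start the run inside the try block so a later failure still reaches cleanup.
        scenario_runs.append(await start_scenario_run("run_1"))
        if fail_after_start:
            raise RuntimeError("simulated test failure")
    finally:
        # Cleanup executes whether or not the body raised.
        for run in scenario_runs:
            await run.cancel()


runs = []
try:
    asyncio.run(smoketest(runs, fail_after_start=True))
except RuntimeError:
    pass
```

This does not cover a hard crash of the test process, which is the case the reviewer's suggested lifetime limit would have handled.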

tests/smoketests/sdk/test_async_benchmark.py

jrvb-rl · 2025-12-18T23:33:55Z

tests/smoketests/sdk/test_benchmark.py

+        benchmark_data = benchmarks[0]
+
+        # Create Benchmark wrapper
+        benchmark = Benchmark(


Same comment on Benchmark obj creation -- let's use the SDK for these to keep things idiomatic.

jrvb-rl · 2025-12-18T23:35:20Z

tests/smoketests/sdk/test_benchmark.py

+
+        # If the benchmark has scenarios, run one
+        scenario_runs: list[ScenarioRun] = []
+        if benchmark_data.scenario_ids:


as w/ the async test, let's do 2 scenario runs

jrvb-rl · 2025-12-18T23:36:06Z

tests/smoketests/sdk/test_benchmark.py

+            assert len(bench_scenario_runs) == len(scenario_runs)
+            for bench_scenario_run in bench_scenario_runs:
+                assert isinstance(bench_scenario_run, ScenarioRun)
+                assert bench_scenario_run.id == scenario_runs[0].id


same issue w/ the loop here as w/ the async case
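The loop in the hunk above compares every returned run against `scenario_runs[0]`, which presumably only holds when there is a single run. One way to make the check work for two or more runs is to compare the sets of IDs. A minimal sketch, using a stand-in `ScenarioRun` dataclass rather than the SDK class and a hypothetical helper name:

```python
from dataclasses import dataclass


@dataclass
class ScenarioRun:
    """Stand-in for the SDK's ScenarioRun; only the `id` attribute is modeled."""
    id: str


def assert_same_runs(bench_scenario_runs, scenario_runs):
    """Check that the listed runs are exactly the runs that were started."""
    assert len(bench_scenario_runs) == len(scenario_runs)
    for run in bench_scenario_runs:
        assert isinstance(run, ScenarioRun)
    # Compare as sets of IDs so the listing order does not matter.
    assert {r.id for r in bench_scenario_runs} == {r.id for r in scenario_runs}


started = [ScenarioRun("scr_a"), ScenarioRun("scr_b")]
listed = [ScenarioRun("scr_b"), ScenarioRun("scr_a")]
assert_same_runs(listed, started)
```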

jrvb-rl

I'm loving the new smoketests... these are really clean. Thanks for the changes!!

* fix(types): allow pyright to infer TypedDict types within SequenceNotStr
* chore: add missing docstrings
* feat(devbox): added stdin streaming endpoint
* chore(internal): add missing files argument to base client
* feat(benchmarks): add `update_scenarios` method to benchmarks resource
* fix(benchmarks): `update()` for benchmarks and scenarios replaces all provided fields and does not modify unspecified fields (#6702)
* feat(sdk): add BenchmarkRun and AsyncBenchmarkRun classes (#712)
* update requirements-dev
* pyproject formatting nit
* feat(sdk): add BenchmarkRun and AsyncBenchmarkRun classes
* fixed smoketests
* `list_scenario_runs()` now returns a list of ScenarioRun/AsyncScenarioRun objects
* cleanup(agents): unified version parameter across agent sources (#713)
* cleanup(agents): unified version parameter across agent sources
* increase snapshot test timeout
* reinsert version parameter into example code
* fix: use async_to_httpx_files in patch method
* codegen metadata
* feat(sdk): add Benchmark and AsyncBenchmark classes (#714)
* feat(sdk): add Benchmark and AsyncBenchmark classes (with some import and test id cleanup)
* raise exceptions instead of skipping, more defensively run scenario
* rename benchmark `run()` to `start_run()`
* more helpful example docstrings
* comments about params type splitting for developer clarity
* remove low value unit tests
* add smoketest TODOs
* skip list_runs() smoketest when no available benchmark runs
* create/update custom benchmark and scenarios for smoketest, remove benchmark retrieval smoketest
* feat(sdk): add BenchmarkOps and AsyncBenchmarkOps to SDK (#716)
* chore(internal): add `--fix` argument to lint script
* chore(internal): codegen related update
* feat(client): add support for binary request streaming
* feat(devbox): remove this one
* feat(network-policy): add network policies to api
* chore(internal): update `actions/checkout` version
* feat(blueprint): Set cilium network policy on blueprint build (#7006)
* chore(devbox): Remove network policy from devbox view; use launch params instead (#7025)
* refactor(benchmark): Deprecate /benchmark/{id}/runs in favor of /benchmark_runs (#7019)
* release: 1.3.0-alpha
* cp dines

---------

Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com>
Co-authored-by: sid-rl <siddarth@runloop.ai>
Co-authored-by: Alexander Dines <alex@runloop.ai>

sid-rl changed the title ~~feat(sdk): add Benchmark and AsyncBenchmark classes~~ WIP feat(sdk): add Benchmark and AsyncBenchmark classes Dec 18, 2025

sid-rl force-pushed the siddarth/benchmark-sdk branch from 4aab259 to 14962a1 Compare December 18, 2025 18:44

sid-rl changed the title ~~WIP feat(sdk): add Benchmark and AsyncBenchmark classes~~ feat(sdk): add Benchmark and AsyncBenchmark classes Dec 18, 2025

sid-rl requested review from james-rl and jrvb-rl December 18, 2025 18:45

jrvb-rl reviewed Dec 18, 2025

stainless-app bot force-pushed the next branch from 6f4d954 to 88f8fb9 Compare December 18, 2025 23:45

sid-rl added 2 commits December 18, 2025 16:49

feat(sdk): add Benchmark and AsyncBenchmark classes (with some import and test id cleanup)

f79e46c

raise exceptions instead of skipping, more defensively run scenario

1e71c1d

sid-rl force-pushed the siddarth/benchmark-sdk branch from 4d967d8 to 1e71c1d Compare December 19, 2025 00:49

sid-rl added 7 commits December 18, 2025 17:05

rename benchmark run() to start_run()

6d95fc0

more helpful example docstrings

b67f5f7

comments about params type splitting for developer clarity

591aa7a

remove low value unit tests

4959d83

add smoketest TODOs

0f8c60f

skip list_runs() smoketest when no available benchmark runs

bf54da8

create/update custom benchmark and scenarios for smoketest, remove benchmark retrieval smoketest

1cd6d70

sid-rl requested a review from jrvb-rl December 20, 2025 00:43

jrvb-rl approved these changes Dec 20, 2025

sid-rl merged commit 8909d8a into next Dec 20, 2025
6 of 7 checks passed

sid-rl deleted the siddarth/benchmark-sdk branch December 20, 2025 01:18

stainless-app bot mentioned this pull request Dec 20, 2025

release: 1.3.0-alpha #708

Merged
