tarasko/websocket-benchmark
Method

This benchmark measures the latency of various Python asyncio-based WebSocket client libraries.

Each client connects to a WebSocket server over the loopback interface. The client sends a message of a specified size and waits for the response. Once the response is received, the client immediately sends the next message.

This request–response loop runs for a fixed period of time, after which the average requests per second (RPS) is calculated.
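The measurement loop described above can be sketched in plain asyncio. This is an illustrative toy, not code from this repository: it uses a raw TCP echo over loopback instead of a real WebSocket stack, and the names (`echo`, `run_client`) are invented for the example.

```python
import asyncio
import time

# Toy loopback echo server standing in for the C++ WebSocket server.
async def echo(reader, writer):
    while data := await reader.read(65536):
        writer.write(data)
        await writer.drain()
    writer.close()

# The request-response loop: send a message, wait for the full echo,
# immediately send the next one; after the fixed duration, report RPS.
async def run_client(port, msg_size, duration):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    msg = b"x" * msg_size
    requests, deadline = 0, time.monotonic() + duration
    while time.monotonic() < deadline:
        writer.write(msg)
        await writer.drain()
        received = 0
        while received < msg_size:        # wait for the complete response
            received += len(await reader.read(65536))
        requests += 1
    writer.close()
    return requests / duration            # average requests per second

async def main():
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    rps = await run_client(port, msg_size=256, duration=1.0)
    server.close()
    await server.wait_closed()
    return rps

if __name__ == "__main__":
    print(f"{asyncio.run(main()):.0f} RPS")
```

The real benchmark performs the same loop through each library's WebSocket API, which is exactly where the per-library overhead shows up.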

All clients are tested in the same environment and compared against the same high-performance C++ server.

Tested libraries: picows, aiohttp, websockets, ws4py, tornado, and C++ Boost.Beast for reference.

Results (higher is better)

results/benchmark-256.png

results/benchmark-8192.png

results/benchmark-100000.png

Tornado

Not really a tornado when it comes to websockets; I can't say anything about its HTTP performance, though. This framework came consistently last among the async libraries. I briefly checked its source: it parses and builds websocket frames in pure Python, but apart from that I didn't notice anything unusual.

Ws4py

First of all, this project no longer seems to be well maintained; the author said so himself in Lawouach/WebSocket-for-Python#297. Still, I added it because it is quite popular and has >1k stars on GitHub.

Second, I used ws4py's synchronous client, so it is not entirely fair to compare it against async libraries that use event loops. Async libraries almost always make an extra system call (select, epoll_wait) before reading data, and this extra system call introduces significant latency.

Third, NEVER EVER USE ws4py WITHOUT wsaccel. ws4py has no C speedups for masking the frame payload, and it becomes 100 times slower than any other library when sending websocket frames of medium size (8192 bytes).
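To see why masking dominates, here is a sketch of WebSocket payload masking (RFC 6455): every payload byte is XORed with one of four mask-key bytes. A byte-by-byte Python loop like the first function is what unaccelerated ws4py effectively does; wsaccel replaces it with C. The second function is only a rough stdlib approximation of a bulk approach, included to check the loop's correctness, not ws4py or wsaccel code.

```python
# Per-byte masking loop (slow in pure Python): XOR each payload byte
# with the repeating 4-byte mask key, as defined by RFC 6455.
def mask_pure_python(payload: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

# Bulk approximation: XOR the whole payload against the repeated key
# as one big integer (illustrative; C extensions do this even faster).
def mask_bulk(payload: bytes, key: bytes) -> bytes:
    n = len(payload)
    repeated = (key * (n // 4 + 1))[:n]
    masked = int.from_bytes(payload, "big") ^ int.from_bytes(repeated, "big")
    return masked.to_bytes(n, "big")

payload = bytes(range(256)) * 32      # 8192 bytes, the "medium" size above
key = b"\x12\x34\x56\x78"
assert mask_bulk(payload, key) == mask_pure_python(payload, key)
# Masking is an involution: applying it twice restores the payload.
assert mask_pure_python(mask_pure_python(payload, key), key) == payload
```

For an 8 KiB frame the per-byte loop executes 8192 Python-level XORs per message, on every send, which is where the 100x slowdown comes from.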

Websockets

A very popular websocket client library with a modern async interface, yet for some reason significantly slower than its main competitor aiohttp, which offers similar features and a similar async interface but is simply faster. Quickly going through the code, I noticed that the library builds and parses frames in pure Python, which may explain some of the performance gap. Users of websockets either need some specific feature that aiohttp doesn't have, or they don't care about performance (which is absolutely fine, it is Python after all), or maybe they are not aware of the performance penalty.

Aiohttp

The most famous asynchronous HTTP client/server library with websocket support. It does quite well across all message sizes while providing an async read/write interface. The library implements frame parsing in Cython but loses some performance to high-level features: the async interface, message assembly, and the corresponding copying of frame data. I'd say it is the default choice if you just need a simple async websocket client.

Picows

A lightweight websocket client and server library for asyncio with a focus on performance. Frame building and parsing are implemented in C. The library is very efficient with memory and tries to minimize copying and Python object creation. Connection establishment is async, while the data interface is not: it uses plain callbacks via method overriding. This was a deliberate design choice because:

  • an async interface introduces an extra hop through the event loop. Data is not delivered to the user immediately: first an asyncio.Future is created and set, and only on the next iteration does the event loop wake up the user coroutine.
  • when data can be delivered immediately, it doesn't have to be copied. User handlers can efficiently process messages directly from the read buffer through a memoryview.
  • Cython definitions are available. You can completely eliminate the Python vectorcall protocol when calling library methods or providing callbacks on the most critical path.
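The first bullet, the "extra hop", can be demonstrated with a toy asyncio example. This is not picows code; it just contrasts a plain callback, which fires inline, with a Future-based delivery, which resumes the awaiting coroutine only on a later event-loop iteration.

```python
import asyncio

events = []

# Callback path: data is handled the moment it "arrives", inline.
def on_data_callback(data):
    events.append(("callback", data))

async def main():
    fut = asyncio.get_running_loop().create_future()

    async def consumer():
        # Future path: resumed by the event loop, not by set_result itself.
        events.append(("coroutine", await fut))

    task = asyncio.create_task(consumer())
    await asyncio.sleep(0)                 # let consumer start awaiting

    # Simulate data arrival on both paths.
    on_data_callback(b"msg")
    fut.set_result(b"msg")
    events.append(("after set_result",))   # consumer has NOT run yet
    await task                             # one more loop iteration resumes it

asyncio.run(main())
# The callback entry and the "after set_result" marker both precede the
# coroutine entry, which is the extra event-loop hop.
assert events.index(("after set_result",)) < events.index(("coroutine", b"msg"))
```

With callbacks the data can also still live in the library's read buffer at the moment the handler runs, which is what makes the zero-copy memoryview access in the second bullet possible.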

If performance is your concern, you should definitely give it a try.

Boost.Beast

I added a C++ client to see how well the Python libraries perform compared with actual high-performance C++ code. Surprisingly, when the message size is >2K, picows + uvloop can be even faster than Beast. After analyzing strace output, I realized that Beast made a dubious design decision: it always performs the first read with a maximum size of 1536 bytes. This is hardcoded, and AFAIK there is no option to change it. So if you transmit frames larger than roughly 1536 bytes, Beast will always make 2 system calls just to read your data. picows/uvloop has a bigger read buffer, so it almost always needs a single system call.

Uvloop

Unfortunately, uvloop is not very well maintained anymore. It takes years to get PRs merged, but it is still a little faster than vanilla asyncio from Python 3.13.

Build C++ Boost.Beast websocket echo server and client

  1. Create or reuse a Python virtual environment. I use conda environments, but it could be anything.
$ conda create -n wsbench
$ conda activate wsbench
  2. Install conan. Conan is a C++ package manager (similar to pip).
$ pip install conan
  3. Initialize the conan default profile.
$ conan profile detect
  4. Install C++ dependencies and create a build project using the default cmake generator.
$ conan install . --output-folder=build --build=missing
$ cd build
$ cmake .. -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
# Go back to the root folder
$ cd ..
  5. Build the server and client.
$ cmake --build ./build --parallel
  6. Run the websocket echo server. It must be run from the project root folder, otherwise it will complain about a missing certificate. The server listens on 2 ports: 9001 (plain websocket) and 9002 (SSL websocket).
$ ./build/src/ws_echo_server 127.0.0.1 9001 9002
  7. Test the websocket echo client. It must be run from the project root folder. After a successful run, the client will dump RPS to stdout.
Usage: ws_echo_client <is_async{1|0} is_secure{1|0}> <host> <port> <msg_size> <duration_sec>
$ ./build/src/ws_echo_client 1 0 127.0.0.1 9001 256 10

Build python benchmark

  1. Install dependencies.
$ pip install -r requirements.txt
  2. Compile the Cython extensions.
$ python setup.py build_ext --inplace
  3. Run the benchmark with a 256-byte message size and a 10-second duration per client. Run it from the project root folder; ws_echo_server must be started manually before running the benchmark.
$ python -m websocket_benchmark.benchmark --msg-size 256 --duration 10

Contribute

Feel free to add other libraries to this benchmark. PRs are welcome!
