Code • Operating Systems (Computer Science)

Iterative DUMB server#

We’ll work with a TCP based service exposing a few text based commands:

request PING -> response PONG
request WAIT <s> -> blocks for s seconds, response DONE
request BURN <it> -> runs <it> iterations of CPU intensive, dumb calculations, response RESULT <x>
request STREAM <n> -> streams back <n> characters to client, then response DONE.

All delimited with \n character.

make dumb_iterative && ./dumb_iterative

dumb_iterative.c

Note how this simple iterative implementation makes the server not responsive while it is processing some request.

Long-lasting requests cause other clients to wait:

echo "PING" | nc -vN 127.0.0.1 8090

echo "WAIT 5000" | nc -vN 127.0.0.1 8090 &
PID_WAIT=$!

time echo "PING" | nc -vN 127.0.0.1 8090

wait $PID_WAIT

Also, slow clients (slowly receiving data) delay handling of the others:

echo "STREAM 100000000" | nc -vN 127.0.0.1 8090 | pv -q -L 10 > /dev/null &
PID_SLOW=$!

sleep 1.0

echo "PING" | timeout 5 nc -vN 127.0.0.1 8090
if [ $? -eq 124 ]; then
    echo "[!] TIMEOUT: Client 2 gave up. The server is completely paralyzed."
fi

sleep 1.0

kill $PID_SLOW 2>/dev/null
wait $PID_SLOW 2>/dev/null

Same for a client who slowly sends requests:

(
    echo -n "P"; sleep 2;
    echo -n "I"; sleep 2;
    echo -n "N"; sleep 2;
    echo "G"
) | nc -vN 127.0.0.1 8090 &
PID_SLOWLORIS=$!

sleep 1.0

echo "PING" | timeout 5 nc -vN 127.0.0.1 8090
if [ $? -eq 124 ]; then
    echo "[!] TIMEOUT: Client 2 blocked by a slow sender!"
fi

wait $PID_SLOWLORIS 2>/dev/null

The server could easily handle others while waiting for I/O.

Event multiplexers#

Build and run the simple TCP chat server based on select():

make select_chat && ./select_chat

select_chat.c

Connect a few clients and send some messages:

nc localhost 8090

Now try epoll() version:

make epoll_chat && ./epoll_chat

epoll_chat.c

Thread-Per-Connection#

Let’s try a different approach to solve the above problems. Server is now going to spawn a dedicated thread handling each connection.

make dumb_multithreaded && taskset -c 0 ./dumb_multithreaded

dumb_multithreaded.c

Note how:

PING request is handled in an instant while WAIT 5000 is still running
PING is simply handled while the other client slowly receives bytes from STREAM command
The same in case the other client is slowly sending P, I, N, G\n, request bytes.

Another problem appears. Spin up a monitoring dashboard:

docker compose up -d

And view the dashboard at http://localhost:3000/.

Then run more and more clients issuing small BURN requests. Each python instance runs on separate core and simulates 1k clients using the service.

taskset -c 1 python c10k_client.py

taskset -c 2 python c10k_client.py

taskset -c 3 python c10k_client.py

taskset -c 4 python c10k_client.py

taskset -c 5 python c10k_client.py

taskset -c 6 python c10k_client.py

taskset -c 7 python c10k_client.py

taskset -c 8 python c10k_client.py

taskset -c 9 python c10k_client.py

taskset -c 10 python c10k_client.py

Key observations:

Number of threads = number of clients
At some threshold (usually ~7k) the server becomes virtually unusable:
- latency grows high
- clients are not able to connect
- requests do timeout
CPU is utilized 100% at this time mostly by %sys (syscalls and context switches)
Server consumes HUGE amount of memory (proportional to the number of clients)
Huge number of context switches, including non-voluntary appears

Experiment:

change client request type (i.e. PONG/STREAM)
try to limit server memory

Solution with epoll#

Now do the same against epoll implementation:

make dumb_epoll && taskset -c 0 ./dumb_epoll

dumb_epoll.c

Observe the load on the dashboards. Such a server can easily handle 10k clients, even with a single core!

Follow-ups: Try to implement WAIT <n> command handling.

Scalable epoll#

make dumb_epoll_workers && taskset -c 0,1,2,3 ./dumb_epoll_workers

dumb_epoll_workers.c

Observe the load distribution on the dashboard at http://localhost:3000/dashboards: DUMB epoll workers

taskset -c 4 python c10k_client.py 2000

taskset -c 5 python c10k_client.py 2000

taskset -c 6 python c10k_client.py 2000

Reactor pattern#

make dumb_reactor && taskset -c 0,1,2,3 ./dumb_reactor

dumb_reactor.c

Coroutines#

make dumb_coro && taskset -c 0,1,2,3 ./dumb_coro

dumb_coro.c