Physicists say that a measurement result given without an error estimate is worthless. This applies
to benchmarking as well. We want to know not only how performant a program or system is,
but also whether we can trust the performance numbers. This article explains how to compute
uncertainty intervals and how to avoid some traps caused by applying commonly known
statistical methods without validating their assumptions first.
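As a taste of the kind of computation involved, here is a minimal sketch (not code from the article) that derives the mean and its standard error from a handful of hypothetical throughput samples, then prints a rough interval. The samples and the "mean ± 2·SE" rule are illustrative assumptions; the rule itself relies on independent, roughly normal samples, which is exactly the sort of assumption that should be validated first.

```rust
/// Mean and standard error of the mean for a sample of measurements.
fn mean_and_se(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction (divide by n - 1):
    let var = samples
        .iter()
        .map(|x| (x - mean).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    (mean, (var / n).sqrt())
}

fn main() {
    // Hypothetical throughput samples, in requests per second:
    let samples = [101_200.0, 98_400.0, 99_800.0, 100_600.0, 99_000.0];
    let (mean, se) = mean_and_se(&samples);
    // A rough ~95% interval under the normality assumption:
    println!("{:.0} ± {:.0} req/s", mean, 2.0 * se);
}
```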
In the previous post I showed how to use asynchronous
Rust to measure throughput and response times of a Cassandra cluster.
That approach works pretty well on a developer’s laptop, but it turned out not to scale to bigger machines.
I hit a hard limit at around 150k requests per
second, and it wouldn’t go faster regardless of the performance of the server.
In this post I share a different approach that doesn’t have these scalability problems.
With it, I was able to saturate a 24-core single-node Cassandra server
at 800k read queries per second from a single client machine.
Performance of a database system depends on many factors: hardware, configuration,
database schema, amount of data, workload type, network latency, and many others.
Therefore, one typically can’t tell the actual performance of such a system without
first measuring it. In this blog post I’m describing how to build a benchmarking tool
for Apache Cassandra from scratch in Rust and how to avoid many pitfalls.
The techniques I show are applicable to any system with an async API.
Recently I came across a blog post
whose author claims that, from the perspective of good coding practices, polymorphism is strictly superior to branching.
The post makes general statements about how branching statements lead to unreadable, unmaintainable, inflexible code and
how they are a sign of immaturity. However, in my opinion the topic is much deeper, and in this post
I try to objectively discuss the reasons for and against branching.
In the previous post, I showed how processing
file data in parallel can either boost or hurt performance
depending on the workload and device capabilities. Therefore, in complex programs that mix tasks
of different types using different physical resources, e.g. CPU, storage (HDD/SSD),
or network I/O, a need may arise to configure parallelism levels differently for each task type.
This is typically solved by scheduling tasks of different types on dedicated thread pools.
In this post I show how to implement such a solution in Rust with Rayon.
One of the well-known ways of speeding up a data processing task is partitioning the data into smaller
chunks and processing the chunks in parallel. Let’s assume we can partition the task easily, or the input data is already
partitioned into separate files which all reside on a single storage device. Let’s also assume the algorithm we run on that
data is simple enough that computation time is not a bottleneck. How much performance can we gain by reading the files in parallel?
Can we lose any?