Case 1: B2B SaaS
The client in our case study is an enterprise B2B SaaS company that collects large volumes of data, processes and enriches it, and then provides the enriched data to its customers through APIs and web interfaces.
Their cloud footprint was dominated by roughly equal-sized Cassandra and ElasticSearch clusters, followed by their three biggest proprietary services. A number of other services exist, but together they accounted for only 4% of overall cost, so we focused on the big-ticket items.
Breakdown of cloud spending and per-service savings
How did we achieve these savings? Depending on the service, our optimizations ranged from configuration changes to improved caching to vectorization of third-party library code. In detail, our gains were due to the following improvements:
- Service A
Our optimizations reduced the runtime to about 10% of the original (a 10x speedup!):
- We reached better overall CPU utilization by changing the size of the work packages that this service receives.
- We found that an operating-system-provided regular expression package was not JIT-compiling a number of regular expressions properly. We ensured that the client's base images ship with a JIT-enabled version of the package.
- A third-party library spent 10% of CPU time in a single loop. We ensured that this loop was properly vectorized, making the loop 8 times faster.
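To put the vectorization item in perspective, Amdahl's law gives the effect of speeding up only part of a program. The sketch below is a worked calculation, not a measurement: it assumes the 10% share was measured on the service before any other change, and the function name `amdahl_speedup` is ours for illustration.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime gets `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    # The untouched part keeps its cost; the accelerated part shrinks.
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Numbers from the text: a loop with 10% of CPU time, made 8 times faster.
overall = amdahl_speedup(fraction=0.10, local_speedup=8.0)
remaining = 1.0 / overall

print(f"overall speedup:   {overall:.3f}x")    # overall speedup:   1.096x
print(f"remaining runtime: {remaining:.2%}")   # remaining runtime: 91.25%
```

Under that assumption the loop fix alone saves a little under 9% of CPU time, so the bulk of the 10x gain has to come from the other changes. A fixed-size hotspot also grows in relative weight as the rest of the service gets faster.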
- Service B
Our optimizations reduced the cost on a representative work package to about 33% of the original (a 3x speedup!):
- We made a frequently used data structure more efficient.
- We changed the base image and improved configuration settings to ensure that the service always runs in JIT-compiled form.
- Service C
Our optimizations reduced the runtime on a representative work package to about 65% of the original (more than a 1.5x speedup!):
- We added a caching mechanism that de-duplicates data across the many threads on the same host, each of which previously kept its own copy.
- We reduced duplicated initialization work by implementing a forkserver from which fully initialized processes are forked (the previous solution repeated much of the initialization work in each new process).
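The thread-level de-duplication described for Service C can be illustrated with a minimal Python sketch. The text does not describe the client's language or cache design, so everything here is an assumption made for illustration: the `SharedCache` class, the `load_table` loader, and the `"enrichment-table"` key are hypothetical. The idea is that all threads in a process go through one lock-protected cache, so each item is loaded and held in memory once instead of once per thread.

```python
import threading
from typing import Any, Callable, Dict, Hashable, List


class SharedCache:
    """Process-wide cache: all threads share a single copy per key."""

    def __init__(self, loader: Callable[[Hashable], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._values: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Any:
        # Holding the lock while loading keeps the sketch simple and
        # guarantees that each key is loaded at most once.
        with self._lock:
            if key not in self._values:
                self._values[key] = self._loader(key)
            return self._values[key]


load_calls: List[str] = []


def load_table(name: str) -> tuple:
    """Hypothetical expensive loader; records every invocation."""
    load_calls.append(name)
    return tuple(range(1000))


cache = SharedCache(load_table)
results: List[tuple] = []


def worker() -> None:
    # Every thread asks for the same data and receives the same object.
    results.append(cache.get("enrichment-table"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(load_calls), all(r is results[0] for r in results))
```

A single lock serializes loads of different keys as well. A per-key lock would avoid that at the cost of more code, which is a trade-off to weigh against the actual access pattern.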
We identified and implemented a data compression scheme better suited to the client’s access pattern.
We identified, benchmarked, and enabled configuration settings that reduced storage consumption.
Our team provided pull requests to various infrastructure-as-code components and to the code of Services A, B, and C, as well as infrastructure to rebuild and modify the upstream packages for the A.2 and A.3 optimizations (the regular-expression JIT and loop vectorization items under Service A). These were reviewed, approved, and deployed by the client to their great satisfaction.