
Continuous Integration Update

By George Joseph

Back in December, in my blog post The Continuing Saga of Continuous Integration, I wrote about how we reduced the Testsuite’s “27” layers of file system access down to 3 by moving the Docker container’s /tmp filesystem to be memory backed.  That reduced the number of individual test failures by quite a bit, but still only about 20% of the Gerrit reviews submitted to Jenkins for testing were passing and getting automatically merged.  After quite a bit of head scratching, Joshua Colp determined that we were still seeing storage I/O latency on the order of seconds, especially when the Testsuite was starting Asterisk.  After even more head scratching, we decided to try changing the underlying VM disk image storage path.

My earlier post showed each VM host using XFS filesystems to store the Gluster bricks (Gluster Overview).  What it didn’t show was that the XFS filesystems were sitting on top of LVM Logical Volumes, Volume Groups, and Physical Volumes before actually getting to the SSDs.  This is, in fact, the recommended architecture for an oVirt Hyperconverged cluster, but it just didn’t seem optimal.  So what were the alternatives?  The most straightforward one was to replace the XFS/LVM architecture with Btrfs directly on the SSDs.  Why?  The first reason is that Btrfs has built-in optimizations for SSDs which XFS doesn’t.  The second is that Btrfs’s “chunk” size of 1G fits better with the Gluster “shard” size of 512MB.  Finally, although LVM’s performance penalty is minuscule, Btrfs does its own multi-volume management, so we don’t need the added configuration complexity of LVM.

The results:

Using Gluster’s profiling tools, we took before and after samples of WRITE operations across the 9 Gluster bricks in the cluster.

XFS over LVM over SSD

Btrfs over SSD

That’s a significant improvement!
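For anyone who wants to make the same kind of before-and-after comparison, here is a minimal Python sketch of how the WRITE rows could be pulled out of saved `gluster volume profile <VOLNAME> info` output and summarized across bricks.  It is not the tooling we used.  The row layout (%-latency, Avg-latency, Min-Latency, Max-Latency, No. of calls, Fop, with latencies in "us") is an assumption about the profile output and may differ between Gluster versions.  The function names are hypothetical, and the SAMPLE text is synthetic, made up only to exercise the parser; it is not our measurement data.

```python
import re

# Assumed row layout of `gluster volume profile <VOLNAME> info` output:
#   %-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
# with latencies printed in microseconds ("us"). Adjust if your version differs.
ROW = re.compile(
    r"^\s*(?P<pct>[\d.]+)\s+"
    r"(?P<avg>[\d.]+)\s+us\s+"
    r"(?P<min>[\d.]+)\s+us\s+"
    r"(?P<max>[\d.]+)\s+us\s+"
    r"(?P<calls>\d+)\s+"
    r"(?P<fop>[A-Z_]+)\s*$"
)

# Synthetic sample, NOT real measurements: two WRITE rows as if from two bricks.
SAMPLE = """\
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     10.00      50.00 us      10.00 us     200.00 us              5      LOOKUP
     30.00     100.00 us      20.00 us     500.00 us             10       WRITE
     60.00     300.00 us      40.00 us     900.00 us             30       WRITE
"""


def write_latencies(profile_text):
    """Return one (avg_us, max_us, calls) tuple per WRITE row in the text."""
    rows = []
    for line in profile_text.splitlines():
        m = ROW.match(line)
        if m and m.group("fop") == "WRITE":
            rows.append(
                (float(m.group("avg")), float(m.group("max")), int(m.group("calls")))
            )
    return rows


def summarize(rows):
    """Return (call-weighted mean latency, worst max latency), both in us."""
    total_calls = sum(calls for _, _, calls in rows)
    if total_calls == 0:
        raise ValueError("no WRITE calls found in the profile output")
    weighted_avg = sum(avg * calls for avg, _, calls in rows) / total_calls
    worst = max(mx for _, mx, _ in rows)
    return weighted_avg, worst


if __name__ == "__main__":
    avg_us, worst_us = summarize(write_latencies(SAMPLE))
    print(f"WRITE: weighted avg {avg_us:.2f} us, worst max {worst_us:.2f} us")
```

Running the same summary over a "before" capture and an "after" capture gives two pairs of numbers that can be compared directly.  Weighting the average by call count keeps a lightly used brick from skewing the result.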

So what was the ultimate result from a Gerrit review auto-merge perspective?

The Gerrit auto-merge rate went from 20% to 90%!

Of course, it’s not just the auto-merge rate we’re happy about.  Since the tests themselves are more reliable, the results are also more meaningful.  When we see failures, they’re more likely to be real issues with the code rather than artifacts of the testing infrastructure.
