
How to (not) Lose Messages on an Apache Pulsar Cluster


In this post we’ll put the protocols we covered in the Understanding How Apache Pulsar Works post to the test. As in my previous tests, How to Lose Messages on a RabbitMQ Cluster and How to Lose Messages on an Apache Kafka Cluster, I’ll be using Blockade to kill off nodes, slow down the network and lose packets. Unlike those previous tests, these tests are automated and go further, checking not only for data loss but also for correct ordering and duplication.
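Checking a run for loss, duplication and ordering comes down to comparing what the producer had acknowledged with what the consumer actually read. Below is a minimal Python sketch, assuming the producer sends monotonically increasing integer sequence numbers as payloads and records each one the broker acknowledged; `analyse` and `Verdict` are hypothetical names, not the actual test harness.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Verdict:
    lost: list[int]                      # acknowledged by the broker but never consumed
    duplicated: list[int]                # consumed more than once
    out_of_order: list[tuple[int, int]]  # (previous, current) where current < previous


def analyse(acked: set[int], consumed: list[int]) -> Verdict:
    """Hypothetical checker: compare acknowledged sends with consumed messages."""
    counts = Counter(consumed)
    lost = sorted(set(acked) - set(counts))
    duplicated = sorted(seq for seq, n in counts.items() if n > 1)

    # Keep only the first delivery of each message so that a redelivered
    # duplicate is not also reported as an ordering violation.
    seen = set()
    first_deliveries = []
    for seq in consumed:
        if seq not in seen:
            seen.add(seq)
            first_deliveries.append(seq)

    # Flag every adjacent pair of first deliveries that goes backwards.
    out_of_order = [
        (prev, cur)
        for prev, cur in zip(first_deliveries, first_deliveries[1:])
        if cur < prev
    ]
    return Verdict(lost, duplicated, out_of_order)


if __name__ == "__main__":
    print(analyse(acked={1, 2, 3, 4, 5}, consumed=[1, 2, 4, 3, 4, 5]))
    # Verdict(lost=[], duplicated=[4], out_of_order=[(4, 3)])
```

Messages that were consumed but never acknowledged are ignored here: a write can succeed even though the acknowledgement was lost, so only acknowledged messages count towards data loss.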

In each scenario we’ll stand up a new Blockade cluster with a specific configuration of:

  • Apache Pulsar broker count

  • Apache BookKeeper node (Bookie) count

  • Ensemble size (E)

  • Write quorum size (Qw)

  • Ack quorum size (Qa)
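These settings constrain one another: BookKeeper requires E >= Qw >= Qa, and there must be at least E bookies to form an ensemble. The Python sketch below captures one scenario as a hypothetical `ScenarioConfig` class that rejects invalid combinations. It also shows, as an assumption based on BookKeeper's default round-robin striping, which bookies each entry lands on: entry i is written to Qw consecutive ensemble members starting at position i mod E.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioConfig:
    """Hypothetical description of one Blockade test cluster."""
    brokers: int
    bookies: int
    ensemble_size: int  # E: bookies a ledger's entries are striped across
    write_quorum: int   # Qw: bookies each entry is written to
    ack_quorum: int     # Qa: bookie acks needed before an entry is confirmed

    def __post_init__(self):
        if self.brokers < 1:
            raise ValueError("need at least one Pulsar broker")
        if self.bookies < self.ensemble_size:
            raise ValueError("not enough bookies to form an ensemble")
        if not (self.ensemble_size >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("BookKeeper requires E >= Qw >= Qa >= 1")

    def bookies_for_entry(self, entry_id: int, ensemble: list[str]) -> list[str]:
        """Assumed round-robin striping: Qw consecutive members from entry_id mod E."""
        if len(ensemble) != self.ensemble_size:
            raise ValueError("ensemble must contain exactly E bookies")
        start = entry_id % self.ensemble_size
        return [ensemble[(start + k) % self.ensemble_size]
                for k in range(self.write_quorum)]

    @property
    def tolerated_bookie_losses(self) -> int:
        # A confirmed entry is stored on at least Qa bookies, so losing any
        # Qa - 1 bookies still leaves at least one copy.
        return self.ack_quorum - 1


if __name__ == "__main__":
    cfg = ScenarioConfig(brokers=3, bookies=3, ensemble_size=3,
                         write_quorum=2, ack_quorum=2)
    ensemble = ["bookie1", "bookie2", "bookie3"]
    for entry in range(3):
        print(entry, cfg.bookies_for_entry(entry, ensemble))
    print(cfg.tolerated_bookie_losses)
```

With the illustrative values E=3, Qw=2, Qa=2, entries rotate across the three bookies and each confirmed entry exists on at least two of them, so any single bookie can be lost without losing a confirmed entry.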

How to Lose Messages on a Kafka Cluster - Part 1

In my previous post I used Blockade, Python and some Bash scripts to test a RabbitMQ cluster under various failure conditions such as failed nodes, network partitions, packet loss and a slow network. The aim was to find out how and when a RabbitMQ cluster loses messages. In this post we’ll do exactly the same but with a Kafka cluster. We’ll use our knowledge of the inner workings of Kafka and ZooKeeper to induce failure modes that result in message loss. Please read my post on Kafka fault tolerance first, as this post assumes you understand the basics of the acknowledgement and replication protocols.

How to Lose Messages on a RabbitMQ Cluster

In my RabbitMQ vs Kafka series Part 5 post I covered the theory of RabbitMQ clustering and some of the gotchas. In this post we'll demonstrate the message loss scenarios described in that post using Docker and Blockade. I recommend you read that post first, as this post assumes an understanding of the topics it covers.

Blockade is a really easy way to test how distributed systems cope with network partitions, flaky networks and slow networks. It was inspired by the Jepsen series. In this post we'll be killing off nodes, partitioning the cluster, introducing packet loss or slowing down the network. Using Blockade plus some Bash and Python scripts, we’ll test out a number of failure scenarios.
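Each of those failures maps onto a Blockade subcommand. Here is a minimal Python sketch of wrapping the Blockade CLI from a test script; the subcommands (`kill`, `partition`, `slow`, `flaky`, `join`, `fast`) come from Blockade's command line, but the exact flags are worth checking with `blockade --help` for your version. The container names such as `pulsar1` and `bookie1` are hypothetical, and `dry_run` defaults to True so the functions only build the command without running it.

```python
import subprocess


def blockade(*args: str, dry_run: bool = True) -> list[str]:
    """Build a Blockade CLI command and, unless dry_run, execute it."""
    cmd = ["blockade", *args]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


def kill_node(node: str, dry_run: bool = True) -> list[str]:
    return blockade("kill", node, dry_run=dry_run)


def partition(*groups: list[str], dry_run: bool = True) -> list[str]:
    # Blockade expects each partition as a comma-separated list of containers.
    return blockade("partition", *(",".join(g) for g in groups), dry_run=dry_run)


def slow_network(dry_run: bool = True) -> list[str]:
    return blockade("slow", "--all", dry_run=dry_run)


def flaky_network(dry_run: bool = True) -> list[str]:
    return blockade("flaky", "--all", dry_run=dry_run)


def heal(dry_run: bool = True) -> list[list[str]]:
    # Remove partitions, then restore normal network conditions.
    return [blockade("join", dry_run=dry_run),
            blockade("fast", "--all", dry_run=dry_run)]


if __name__ == "__main__":
    print(" ".join(partition(["pulsar1", "bookie1"], ["bookie2", "bookie3"])))
    # blockade partition pulsar1,bookie1 bookie2,bookie3
```

Keeping the command construction separate from execution makes it easy to log exactly which failure was injected at which point, which helps when correlating a lost or duplicated message with a specific event.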