C++ library for Group Membership and Failure Detection using a gossip-style protocol.
library: a collection of types, functions, classes, etc. implementing a set of facilities (abstractions) meant to be potentially used as part of more than one program. From the C++ Core Guidelines glossary.
The protocol is based on the "SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol" paper and on some ideas implemented in HashiCorp's Memberlist Go library, although the implementations are independent and not compatible.
In distributed systems, participants may not be pre-determined. It is not sufficient to have a static list of nodes forming a cluster. This property is critical, as individual nodes may be taken out for service or replaced. It also enables scalability of the cluster, as new nodes can be added when the task can benefit from them. Given that nodes can join and leave the group unpredictably, each node is required to keep track of its peers.
On the other hand, it may not be practical to keep track of all the members of an extremely large cluster. If nodes join and leave the group at a steady rate and each such event requires a message exchange, the number of such messages grows with the size of the cluster, consuming bandwidth that would otherwise be available for computation.
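As a rough back-of-the-envelope illustration of this scaling concern (a simplified cost model, not a measurement of this library): if every node must learn about every join or leave event, each event needs at least N - 1 deliveries. If only the nodes tracking the affected peer must hear about it, and connections are symmetric, that number is bounded by the peer capacity. The function names and the model itself are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model, not taken from the library.
// Deliveries per second needed if every other node must learn of every event.
inline std::uint64_t fullViewDeliveries(std::uint64_t clusterSize,
                                        std::uint64_t eventsPerSecond) {
    assert(clusterSize > 0 && "a cluster has at least one node");
    return (clusterSize - 1) * eventsPerSecond;
}

// Deliveries per second if an event only has to reach the nodes tracking the
// affected peer. Assumes symmetric connections, so at most `peerCapacity`
// nodes track any given node.
inline std::uint64_t boundedViewDeliveries(std::uint64_t clusterSize,
                                           std::uint64_t peerCapacity,
                                           std::uint64_t eventsPerSecond) {
    assert(clusterSize > 0 && "a cluster has at least one node");
    std::uint64_t const others = clusterSize - 1;
    std::uint64_t const trackers = others < peerCapacity ? others : peerCapacity;
    return trackers * eventsPerSecond;
}
```

Under this model, a cluster of 10,000 nodes with 5 membership events per second needs 49,995 deliveries per second for a full view, versus 80 with a peer capacity of 16.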
It may also be beneficial to use a membership protocol on resource-constrained systems such as IoT devices.
That is why this implementation of SWIM includes a notion of peer capacity: the maximum number of peers a given node can track. Note that this setting is not required to be the same for each peer. This trade-off makes the implementation weakly consistent and probabilistic, as it is possible for a node that is tracked by only one other peer to be considered part of the group.
The key observation here is that for a node to be part of a group it must be in the peer list of at least one other node. It is not enough for a node to know other peers. Those peers must also know about that node, as 'connections' are symmetrical.
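The capacity limit and the membership rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's data model: `NodeId`, `BoundedPeerList` and `isGroupMember` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical illustration types; none of these names come from the library.
using NodeId = std::string;

class BoundedPeerList {
public:
    explicit BoundedPeerList(std::size_t capacity) : capacity_{capacity} {
        assert(capacity > 0 && "a node must be able to track at least one peer");
    }

    // Try to start tracking a peer. Returns false when the list is full.
    bool track(NodeId const& peer) {
        if (peers_.count(peer) != 0) return true;      // already tracked
        if (peers_.size() >= capacity_) return false;  // capacity reached
        peers_.insert(peer);
        return true;
    }

    bool tracks(NodeId const& peer) const { return peers_.count(peer) != 0; }
    std::size_t size() const { return peers_.size(); }

private:
    std::size_t capacity_;
    std::unordered_set<NodeId> peers_;
};

// A node counts as a group member only if some *other* node tracks it.
inline bool isGroupMember(NodeId const& node,
                          std::unordered_map<NodeId, BoundedPeerList> const& cluster) {
    for (auto const& [owner, peers] : cluster) {
        if (owner != node && peers.tracks(node)) return true;
    }
    return false;
}
```

In this sketch a node that tracks others but appears in nobody else's list is not a member, which mirrors the observation that knowing other peers is not enough.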
See docs for more information about project motivation and protocol design.
The library is designed to provide components for building distributed applications. It does not provide networking, leaving it up to the library user to choose an IO method. Some users may choose to use sockets directly, while others already have a networking layer such as libuv or boost.asio.
In order to use the library, a user must get a message buffer from an IO subsystem,
for example when a message is received on a UDP socket or read from a file.
This message should then be passed to the message handler, which may update the estimated state of the cluster.
The update function in turn produces messages that the library user must dispatch to the network.
#include <tribe/protocol.hpp>    // Convenience header for message builders and parser
#include <tribe/membership.hpp>  // Group membership model

// A datagram handler that the library user must implement
tribe::PeersModel handleDatagram(IO& io, tribe::PeersModel model, Solace::MemoryView msgBuffer) {
    // parseAndUpdate produces a new model if a valid message is parsed from the buffer
    auto [newModel, outMessages] = tribe::parseAndUpdate(model, msgBuffer);

    // It is the user's responsibility to serialise and send the messages out to the network,
    // or to enqueue them to be sent later.
    for (auto const& msg : outMessages) {
        io << msg;
    }

    // The new model should be treated as the new view of the cluster
    return newModel;
}
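The handler above only requires that `io << msg` either sends a message or queues it. One possible way to cover the "enqueue them to send later" option is a small outbox, sketched below. `Outbox`, `Datagram` and `flush` are hypothetical names for illustration and are not part of the library; the sketch assumes messages have already been serialised into bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an already serialised outgoing message.
using Datagram = std::vector<std::uint8_t>;

class Outbox {
public:
    // Queue a datagram instead of sending it immediately.
    Outbox& operator<<(Datagram msg) {
        pending_.push_back(std::move(msg));
        return *this;
    }

    std::size_t pending() const { return pending_.size(); }

    // Hand queued datagrams to `send` in FIFO order. Stops at the first
    // failure so that unsent datagrams stay queued. Returns the number sent.
    std::size_t flush(std::function<bool(Datagram const&)> const& send) {
        assert(send && "flush requires a callable sender");
        std::size_t sent = 0;
        while (!pending_.empty() && send(pending_.front())) {
            pending_.pop_front();
            ++sent;
        }
        return sent;
    }

private:
    std::deque<Datagram> pending_;
};
```

The `send` callback is where a socket write, a libuv request or a boost.asio call would go. Keeping it as a parameter leaves the choice of IO method with the application.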
parseAndUpdate is a convenience function for parsing raw messages and updating the model.
If more fine-grained control is required, the process can be separated into two steps:
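//... inside your IO service that handles datagrams
Solace::Result<tribe::PeersModelUpdate, Error>
parseDatagram(tribe::PeersModel model, Solace::MemoryView msgBuffer) {
    // Step 1: parse the raw buffer into a gossip message.
    // Assumption: the parser reads from a Solace::ByteReader constructed over the buffer.
    Solace::ByteReader reader{msgBuffer};
    auto maybeMessage = tribe::Gossip::MessageParser{}
        .parse(reader);
    if (!maybeMessage) {  // Data in the buffer does not constitute a valid gossip message.
        return maybeMessage.getError();
    }

    // Step 2: apply the parsed message to the model.
    return tribe::update(model, *maybeMessage);
}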
There is a Conan package for this library.
If your project is using Conan for dependency management, you can add libtribe
to your conanfile.txt:
[requires]
libtribe/0.0.1@abbyssoul/stable
Until the library is available in the conan-central repository, you need to add a remote:
conan remote add <REMOTE> https://api.bintray.com/conan/abbyssoul/public-conan
Please check the latest available binary version.
The project build is managed by CMake, with some makefiles used only to automate basic operations and drive CMake.
The library depends on libsolace for low-level data manipulation primitives such as ByteReader/ByteWriter and the Result<> type.
Note that the test framework used is gtest, which is managed via git submodules.
Don't forget to run git submodule update --init --recursive
on a new checkout to pull submodule dependencies.
In order to build this project, the following tools must be present on the system:
- git (to check out the project and its external modules, see dependencies section)
- doxygen (for documentation)
- cppcheck (static code analysis, latest version from git is used as part of the 'codecheck' step)
- cpplint (for static code analysis in addition to cppcheck)
- valgrind (for runtime code quality verification)
This project uses C++17 features extensively. The minimum tested and required version of gcc is gcc-7. CI uses clang-6 and gcc-7. To install the build tools on a Debian-based Linux distribution:
sudo apt-get update -qq
sudo apt-get install git doxygen python-pip valgrind ggcov
sudo pip install cpplint
The library has one external dependency, libsolace, which is managed via Conan. Please make sure Conan is installed on your system if you want to build this project.
# In the project check-out directory:
# To build debug version with sanitizer enabled (recommended for development)
./configure --enable-debug --enable-sanitizer
# To build the library itself
make
# To build and run unit tests:
make test
# To run valgrind on the test suite:
# Note: `valgrind` doesn't work with the ./configure --enable-sanitizer option
make verify
# To build API documentation using doxygen:
make doc
To install locally for testing:
make --prefix=/user/home/<username>/test/lib install
To install system wide (as root):
make install
To run the code quality checks before submitting a patch:
# Verify code quality before submission
make codecheck
The library also has some basic usage examples, which can be found in the 'examples' subdirectory.
To build all examples, run from the base directory:
# Build all examples
make examples
This framework is a work in progress and contributions are very welcome.
Please see CONTRIBUTING.md
for details on how to contribute to this project.
Note that in order to maintain code quality, a set of static code analysis tools is used as part of the build process. All contributions must therefore be verified by this suite of tools before a PR can be accepted.
The library is available under the Apache License 2.0.
Please see LICENSE
for details.
Please see AUTHORS
file for the list of contributors.