Model Checking: Use Stateright to Formally Verify Raft Lite

2024-02-24
8 min read

You can find the source code of this post here.

Background of model checking

Distributed system is rather complex due to the fact that the message can be lost, delayed or duplicated. It is hard to reason about the correctness of a distributed system.

Hand-written proof is a common way to verify the correctness of a distributed system. For example, the Raft paper provides a proof of safety property of Raft in Section 5.4. Many proofs seem plausible but actually could be wrong (I’m not saying the Raft paper’s proof is wrong). A lot of peer-reviewed papers have been found to be wrong regarding their proofs (e.g. SOCT2, SDT, etc.).

Model checking is a formal verification technique that can be used to verify the correctness of distributed system. It is a systematic way to check whether a model satisfies a given specification. Model checking explores all possible states of a model. If a property is violated in any state, the model checker will report it and the developer can fix the bug accordingly.

However, model checking is not a silver bullet. It is not feasible to check all possible states of a model. The state space of a model could be very large. In practice, we usually constrain the state space to a reasonable size. For example, we can specify the depth of the search to 20 steps or we can specify the maximum number of messages to 100. Even with these constraints, model checking is still a powerful tool to find bugs in a distributed system.

Many companies use TLA+ to verify their core algorithms (e.g. AWS, Azure, etc.). However, TLA+ is usually used to verify the algorithm at a high level. Even if the algorithm itself is verified, the implementation of the algorithm could still be wrong, since the programmer might make mistakes.

Stateright is a model checking library for Rust. Instead of using a high-level language like TLA+, stateright uses Rust to model the system. This means that we can verify the low-level implementation directly without the need to translate the high-level algorithm to the low-level implementation.

Introduction to stateright

You can find the details of stateright in its Github repo and there is also a good tutorial here. In case you don’t want to read them, I will give a brief introduction to stateright here. Feel free to skip this section if you are already familiar with stateright.

We usually model a distributed system as multiple nodes communicating through network. Stateright provides an abstraction for this called ActorModel.

Specify node logic

To define the logic of a node, you need to implement the trait Actor, namely the on_start, on_msg and on_timeout methods.

pub trait Actor: Sized {
    ...

    fn on_start(&self, id: Id, o: &mut Out<Self>) -> Self::State;

    fn on_msg(&self, id: Id, state: &mut Cow<'_, Self::State>, src: Id, msg: Self::Msg, o: &mut Out<Self>);

    fn on_timeout(&self, id: Id, state: &mut Cow<'_, Self::State>, _timer: &Self::Timer, o: &mut Out<Self>);

}

Specify system model

Assumptions about a distributed system can be categorized into the following types:

  1. Network behaviour (e.g. message loss)
  2. Timing behaviour (e.g. latency)
  3. Node behaviour (e.g. crashes)

For 1 and 2, stateright provides a ready-to-use network implementation:

pub enum Network<Msg>
where
    Msg: Eq + Hash,
{
    UnorderedDuplicating(HashableHashSet<Envelope<Msg>>),
    UnorderedNonDuplicating(HashableHashMap<Envelope<Msg>, usize>),
    Ordered(BTreeMap<(Id, Id), VecDeque<Msg>>),
}

As long as invariants do not check the network state, losing a message is indistinguishable from an unlimited delay.

For 3, stateright provides a way to specify the maximum crash-stop failures in the system by calling the method max_crashes in the ActorModel struct.

pub fn max_crashes(mut self, max_crashes: usize) -> Self;

As a side note, I didn’t find a way to specify the crash-recovery failures in stateright. It might be my fault. If you know how to do it, please let me know :)

Specify properties to check

Finally, you need to specify the properties you want to check using struct Property.

The condition is a closure that returns a boolean value. If the condition is true, the property is satisfied. You need to provide the closure to specify the property you want to check.

pub struct Property<M: Model> {
    pub expectation: Expectation,
    pub name: &'static str,
    pub condition: fn(&M, &M::State) -> bool,
}

Model checking Raft Lite

Specify node logic

Raft Lite is designed that the algorithm is runtime agnostic. Any runtime environment that implements the Runner trait can be used to run the Raft Lite algorithm.

Raft Lite algorithm uses the Runner to interact with other nodes (for example, sending a vote request). I implemented a RealRunner previously to run the Raft Lite algorithm in a real environment. Utilising RealRunner, I used Raft Lite to build StorgataDB, a distributed RESP-compatible KV database system.

To verify the algorithm in stateright, I implemented a CheckerRunner, which wraps the Raft Lite algorithm into the model checker.

pub trait Runner {
    fn init_state(&mut self) -> NodeState;
    fn vote_request(&mut self, peer_id: u64, vote_request_args: VoteRequestArgs);
    fn log_request(&mut self, peer_id: u64, log_request_args: LogRequestArgs);
    fn forward_broadcast(&mut self, broadcast_args: BroadcastArgs);
    ...
}

The key part of the implementation is the step function that processes any event, and return the actions that the node should take. This makes the algorithm behave like a state machine.

impl RaftProtocol<CheckerRunner> {
    pub(crate) fn step(&mut self, event: Event) -> Vec<StepOutput> {
        self.dispatch_event(event); // dispatch_event is the core logic of the Raft Lite algorithm
        self.runner.collect_output()
    }
}

The implementation of the Actor trait is just calling the step function every time when a message is received or a timeout is triggered.

impl Actor for CheckerActor {
    ...

    fn on_msg(
        &self,
        state: &mut Cow<Self::State>,
        msg: Self::Msg,
        o: &mut Out<Self>,
    ) {
        // translate the message to the event input of the step function
        let event = match msg {
            CheckerMessage::VoteRequest(vote_request_args) => Event::VoteRequest(vote_request_args),
            CheckerMessage::VoteResponse(vote_response_args) => {
                Event::VoteResponse(vote_response_args)
            }
            CheckerMessage::LogRequest(log_request_args) => Event::LogRequest(log_request_args),
            CheckerMessage::LogResponse(log_response_args) => Event::LogResponse(log_response_args),
            CheckerMessage::Broadcast(payload) => Event::Broadcast(payload),
        };
        process_event(state, event, o);
    }

    fn on_timeout(
        &self,
        _id: Id,
        state: &mut Cow<Self::State>,
        timer: &Self::Timer,
        o: &mut Out<Self>,
    ) {
        // translate the timer to the event input of the step function
        let event = match timer {
            CheckerTimer::ElectionTimeout => Event::ElectionTimeout,
            CheckerTimer::ReplicationTimeout => Event::ReplicationTimeout,
        };
        process_event(state, event, o);
    }
}

fn process_event(
    state: &mut Cow<CheckerState>,
    event: Event,
    o: &mut Out<CheckerActor>,
) {
    let state = state.to_mut();
    // call the step function to process the event, and get the actions that the node should take
    let step_output = state.raft_protocol.step(event);
    // take the actions
    for output in step_output {
        match output {
            StepOutput::DeliverMessage(payload) => {
                state.delivered_messages.push(payload);
            }
            StepOutput::VoteRequest(peer_id, vote_request_args) => {
                o.send(
                    Id::from(peer_id as usize),
                    CheckerMessage::VoteRequest(vote_request_args),
                );
            }
            ...
            StepOutput::ElectionTimerReset => {
                o.set_timer(CheckerTimer::ElectionTimeout, model_timeout());
            }
            ...
        }
    }
}

Specify system model

  1. The network is non-duplicating and unordered by default, but can be changed to other types by passing argument to the program.
  2. The node is crash-stop failure and can tolerate at most less than majority of nodes’ crash-stop failures.

Specify properties to check

I specified two properties to check the correctness: Election Safety and State Machine Safety:

  1. Election Safety: at most one leader can be elected in a given term
  2. State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.

Actually State Machine Safety is the only property that we need to care about (The goal of Raft is to implement a replicated state machine). However, to give a simpler example for readers, I also checked Election Safety:

.property(Expectation::Always, "Election Safety", |_, state| {
    // at most one leader can be elected in a given term
    let mut leaders_term = HashSet::new(); // `term` set if we found a leader in the term already
    for s in &state.actor_states {
        if s.raft_protocol.state.current_role == crate::raft_protocol::Role::Leader
            && !leaders_term.insert(s.raft_protocol.state.current_term)
        {
            return false;
        }
    }
    true
})

You can see that the property is really simple to express in Rust language. All you need to do is just to write a closure that returns a boolean value.

Strictly speaking, the Election Safety property should be checked in the context of all states together rather than each state separately. For simplicity, I just checked it in each state separately.

Running the model checker

We can run the model checker by specifying the maximum number of steps to limit the state space.

It will print all counterexamples if the properties are violated. Additionally, it will print examples that satisfy some properties. For instance, here I specify two liveness properties, and the model checker prints the examples that satisfy the properties.

No counterexamples are found in the Raft Lite algorithm, which means that the algorithm is correct in the context of the specified properties.

$ cargo run -- -m check --depth 10
Checking. states=4, unique=4, depth=1
Checking. states=475674, unique=133824, depth=10
Checking. states=917040, unique=256771, depth=10
Done. states=924710, unique=259150, depth=10, sec=3
Discovered "Election Liveness" example Path[3]:
- Timeout(Id(0), ElectionTimeout)
- Deliver { src: Id(0), dst: Id(1), msg: VoteRequest(VoteRequestArgs { cid: 0, cterm: 1, clog_length: 0, clog_term: 0 }) }
- Deliver { src: Id(1), dst: Id(0), msg: VoteResponse(VoteResponseArgs { voter_id: 1, term: 1, granted: true }) }
Fingerprint path: 13280538127433316798/18417327358524522001/10876327409151634344/11261648250825353397
Discovered "Log Liveness" example Path[6]:
- Timeout(Id(2), ElectionTimeout)
- Deliver { src: Id(2), dst: Id(0), msg: VoteRequest(VoteRequestArgs { cid: 2, cterm: 1, clog_length: 0, clog_term: 0 }) }
- Deliver { src: Id(0), dst: Id(2), msg: VoteResponse(VoteResponseArgs { voter_id: 0, term: 1, granted: true }) }
- Deliver { src: Id(2), dst: Id(2), msg: Broadcast([50]) }
- Deliver { src: Id(2), dst: Id(0), msg: LogRequest(LogRequestArgs { leader_id: 2, term: 1, prefix_len: 0, prefix_term: 0, leader_commit: 0, suffix: [LogEntry { term: 1, payload: [50] }] }) }
- Deliver { src: Id(0), dst: Id(2), msg: LogResponse(LogResponseArgs { follower: 0, term: 1, ack: 1, success: true }) }
Fingerprint path: 13280538127433316798/5012304960666992246/2656658050571602193/12788966706765998312/12610557528799436519/6208176474896103011/7212898540444505159

Alternatively, we can also run the model checker in a Web UI by:

$ cargo run -- -m explore
Exploring state space for Raft on http://localhost:3000

You can click to choose the next action and see the state transition after the action is taken. Also, you can find the sequence diagram of the system.

The following is a screenshot of the Web UI, in which I simulated the election process where two nodes are elected as leaders in different terms.