interrupts is threads

2023-10-04

[Image: a simple tattoo that says "shrimps is bugs" (credit: /u/Lewbular on /r/tattoodesigns)]

I love the tattoo "shrimps is bugs" because it is both absurd and mostly true. Technically, shrimps are not bugs, but they share a lot of qualities.

In the same way, I'd like to assert interrupts is threads.

interrupts is threads

In embedded, unless you are using a real time operating system, you often don't have "threads" like you would on your desktop. Threads have two main interesting qualities in this context:

- They are pre-emptively scheduled
- They have separate stacks

We model interrupts "as if they were threads" in embedded Rust, because as far as Rust's safety guarantees are concerned, they ARE exactly the same.

Pre-emptive scheduling

As threads are pre-emptively scheduled, one chunk of code might be stopped at ANY time, and another chunk of code may start running.

Similar to a thread context switch: interrupts can happen at any time! The only difference is that the hardware is doing context switching for us, rather than some operating system.

For threads OR interrupts, this means that anything shared across them must either be read-only, or be synchronized in some way to avoid race conditions or other data corruption.

In Rust, we have two ways of making sure these safety rules are followed, both enforced by the compiler:

- The Sync trait, which marks data as safe to be shared between threads
- The Send trait, which marks data as safe to be sent from one thread to another

Separate stacks

By having separate stacks, threads can keep their own local context across thread switching. This means if you have something on the stack, like local variables, switching to another thread doesn't affect that data in any way.

Interrupts don't typically have their own stack; instead they typically share the main stack, or some privileged-mode stack common to all interrupts. However, they typically can't (at least not safely) "see" the stack/local data of the main thread or other interrupts, and when they are done they are responsible for removing any data they put on the stack.

This means conceptually we can treat interrupts as if they have separate, ephemeral stacks, that just happen to be on top of someone else's stack for as long as they are running.

The only difference is that threads can be resumed with their previous stack/context data at any time, while interrupts must "start with nothing" and "end with nothing".

How do interrupts work?

In modern embedded devices, interrupts act sort of like a callback. When some event occurs, the CPU stops whatever code is currently running and "calls" the interrupt handler function. This function takes no arguments, and returns no values.

In Rust terms, this means that the function looks like:

fn handler() { /* ... */ }

In C terms, this means that the function looks like:

void handler(void) { /* ... */ }

These functions run to completion; there's generally no way to pause execution and pick back up later. Once the function is complete, the CPU picks up whatever it was doing prior to the interrupt.

In modern devices, interrupts can also often be pre-empted by other interrupts, if the second interrupt is a higher "priority". These are typically called "nested" interrupts.

Because these functions take and return no values, the main way of sharing something with them is to place the item in a static variable.

Static variables in Rust

Let's say we have a simple program in Rust:

// This is the "main" thread. It is always running
fn main() {
    loop {
        red_led_on();
        green_led_on();
        sleep_ms(250);

        red_led_off();
        green_led_off();
        sleep_ms(750);
    }
}

// This is an interrupt handler. Let's pretend it is called
// once every time a button is pressed, specifically in an
// "edge triggered" way.
#[interrupt]
fn button_press_handler() {
    // ?
}

This is a wonderful little program, which will blink two LEDs. But let's say we only want to blink one at a time, and switch which LED is blinking every time we press the button.

How do we get the "message" from the interrupt to the main thread that the button has been pressed? We might start by storing some boolean as a static that both can see:

// A static variable
static BLINK_RED: bool = false;

// This is the "main" thread. It is always running
fn main() {
    loop {
        if BLINK_RED {
            red_led_on();
        } else {
            green_led_on();
        }
        sleep_ms(250);

        // We can turn both off, it's fine.
        red_led_off();
        green_led_off();
        sleep_ms(750);
    }
}

// This is an interrupt handler. Let's pretend it is called
// once every time a button is pressed, specifically in an
// "edge triggered" way.
#[interrupt]
fn button_press_handler() {
    // ERROR: BLINK_RED is not mutable!
    BLINK_RED = !BLINK_RED;
}

We solved one problem: we now have a variable that both the main thread and the interrupt handler can see! However, Rust rightly complains that BLINK_RED is not mutable, which means we can't actually modify it.

You might say, okay, let's make it mutable:

// A static variable
static mut BLINK_RED: bool = false;

// This is the "main" thread. It is always running
fn main() {
    loop {
        // ERROR: Requires unsafe block!
        if BLINK_RED {
            red_led_on();
        } else {
            green_led_on();
        }
        sleep_ms(250);

        // We can turn both off, it's fine.
        red_led_off();
        green_led_off();
        sleep_ms(750);
    }
}

// This is an interrupt handler. Let's pretend it is called
// once every time a button is pressed, specifically in an
// "edge triggered" way.
#[interrupt]
fn button_press_handler() {
    // ERROR: Requires unsafe block!
    BLINK_RED = !BLINK_RED;
}

But now Rust will complain for a different reason: static mut variables are unsafe to access, because they are unsynchronized, meaning there is no protection from multiple threads accessing the data at the same time.

You COULD just add unsafe blocks, and it would compile, but that's undefined behavior!

Even this simple line:

BLINK_RED = !BLINK_RED;

is theoretically problematic, because at an assembly level (on load-store architectures like Arm and RISC-V), it actually looks something like this:

let mut temp = BLINK_RED;   // load value into register
temp = !temp;               // invert boolean
BLINK_RED = temp;           // store value back to static

If we had multiple interrupts accessing this code, it could end up causing problems, because we could have something like this happen:

let mut temp0 = BLINK_RED;  // first interrupt loads value
temp0 = !temp0;             // first interrupt inverts
// A second interrupt occurs!
    let mut temp1 = BLINK_RED;  // second interrupt loads value
    temp1 = !temp1;             // second interrupt inverts
    BLINK_RED = temp1;          // second interrupt stores
// Second interrupt ends, first resumes
BLINK_RED = temp0;          // first interrupt stores

This is problematic, because even though two events happened, it looks like only one happened. If we started with BLINK_RED = false, it would be true after this sequence occurs, even though the first event should have gone false -> true and the second should have gone true -> false.
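The lost update above can be replayed deterministically on the host. Here's a small sketch (the function and variable names are mine, for illustration) that simulates each interrupt's register copy with a local variable:

```rust
// Replays the bad interleaving: two "interrupts" each try to
// toggle the flag, but the first is pre-empted between its
// load and its store.
fn racy_double_toggle() -> bool {
    let mut blink_red = false;

    let temp0 = blink_red; // first interrupt loads...
                           // ...and is pre-empted here!

    let temp1 = blink_red; // second interrupt loads,
    blink_red = !temp1;    // inverts, and stores: now true

    blink_red = !temp0;    // first interrupt resumes and stores
                           // its stale result: true again!
    blink_red
}

fn main() {
    // Two toggles should cancel out (false -> true -> false),
    // but one update was lost:
    assert_eq!(racy_double_toggle(), true);
}
```

On real hardware the interleaving is nondeterministic, which is exactly what makes these bugs so painful to reproduce.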

For larger or more complex variables, we can also run into "read tearing" and "write tearing", where we can be interrupted halfway through, leaving the data in an inconsistent or "corrupted" state.

This is the core reason why mutable static variables are unsafe in Rust: they can very easily cause undefined behavior!

How do we solve this?

We solve this the same way as you do in desktop Rust! Whenever we want to share data between two "threads", we need to use some kind of data that is synchronized.

Shared Data - Sync

Since global variables are shared, this means we need some kind of Sync data!

This usually means using one of two main kinds of data:

- Atomic types, like AtomicBool
- Mutexes, which protect the inner data with a lock

These both work in a similar manner: they use something called "Interior Mutability", meaning the outer container is not mutable, but allows for mutation inside of it, using some kind of specific, safe operation (like "Compare and Swap" operations, or some kind of runtime-checked Mutex). Internally, these structures use carefully written unsafe code to ensure that access is safe.

The atomic code would look a little something like this:

// Note: NOT mutable!
static BLINK_RED: AtomicBool = AtomicBool::new(false);

fn main() {
    loop {
        if BLINK_RED.load(Ordering::Relaxed) {
            red_led_on();
        } else {
            green_led_on();
        }
        sleep_ms(250);

        // We can turn both off, it's fine.
        red_led_off();
        green_led_off();
        sleep_ms(750);
    }
}

#[interrupt]
fn button_press_handler() {
    // a value XORd with true inverts it. There is a
    // fetch_not function that does this directly, but it
    // isn't stable, so this works.
    BLINK_RED.fetch_xor(true, Ordering::Relaxed);
}

I explain atomics and mutexes in more detail in the appendix section below, but for now, it's okay to take away:

- Atomics and mutexes allow shared data to be mutated safely
- Both are Sync, so the compiler will accept them in a static

This means that at compile time, we can guarantee that any shared data is "interrupt safe", the same way we can guarantee the data is "thread safe".
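A small sketch of what "enforced by the compiler" looks like: `assert_sync` is a helper function I'm inventing for illustration, which only compiles for types that are Sync:

```rust
use std::sync::atomic::AtomicBool;

// A do-nothing function that only compiles when T is Sync.
fn assert_sync<T: Sync>() {}

fn main() {
    // AtomicBool synchronizes its interior mutability, so it
    // is Sync, and this compiles:
    assert_sync::<AtomicBool>();

    // std::cell::Cell<bool> has UNsynchronized interior
    // mutability, so it is not Sync. Uncommenting this line
    // is a compile error, not a runtime error:
    // assert_sync::<std::cell::Cell<bool>>();
}
```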

Shared Data - Send

Since interrupts don't have any context that is "theirs" (remember, they take no arguments, and have no 'context' field), Send is typically a little less directly relevant for interrupts.

However, most Sync structures, like Mutexes or Channels, might require that whatever you put in them is Send. This means, conceptually, that the relevant data is safe to be "given to" the mutex, or "taken from" the mutex.

Most commonly, this prevents you from putting "borrowed" data inside of the mutex. For example, you can't put a reference to data on the stack inside of a static mutex, because at some point that stack frame could be released, and the static would still be holding a now-invalid reference!
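As a sketch of that rule (using a std Mutex on the host, with names I've made up): owned data can move into a static Mutex, but a borrow of a local cannot:

```rust
use std::sync::Mutex;

// Owned data like String is Send (and 'static), so it can
// safely be "given to" a static Mutex.
static MESSAGE: Mutex<Option<String>> = Mutex::new(None);

fn main() {
    let owned = String::from("button pressed");

    // Moving the owned value into the mutex is fine:
    *MESSAGE.lock().unwrap() = Some(owned);

    // But storing a `&local` in a static mutex would not
    // compile: the local is dropped at the end of this scope,
    // while the static lives for the whole program.

    assert_eq!(MESSAGE.lock().unwrap().as_deref(), Some("button pressed"));
}
```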

What did we have to do to make this work in Rust?

That's the great part: no changes to Rust were required to model this, and Rust as a language has NO IDEA what interrupts are.

This model fell entirely out of existing capabilities and rules of the Rust language:

- static variables must be Sync to be safely accessed
- mutating shared data requires synchronized types with interior mutability, like atomics or mutexes
- data placed into those shared types must be Send

Therefore, as far as the Rust programming language is concerned:

interrupts is threads.


Appendices

Here's a little more context if you don't quite want to take me at my word, or want to know how these things work under the hood.

Appendix A: Atomics in more detail

With atomics, the example code would look a little like this:

// Note: NOT mutable!
static BLINK_RED: AtomicBool = AtomicBool::new(false);

fn main() {
    loop {
        if BLINK_RED.load(Ordering::Relaxed) {
            red_led_on();
        } else {
            green_led_on();
        }
        sleep_ms(250);

        // We can turn both off, it's fine.
        red_led_off();
        green_led_off();
        sleep_ms(750);
    }
}

#[interrupt]
fn button_press_handler() {
    // a value XORd with true inverts it. There is a
    // fetch_not function that does this directly, but it
    // isn't stable, so this works.
    BLINK_RED.fetch_xor(true, Ordering::Relaxed);
}

If your CPU supports "Compare And Swap" instructions, then you can use methods like fetch_xor or compare_exchange, which guarantee that the Load, Modify, and Store operations happen "atomically".

On Cortex-M devices, this uses the "LDREX" - "Load Exclusive", and "STREX" - "Store Exclusive" instructions, instead of "LDR" - Load and "STR" - Store instructions. At a hardware level, fetch_xor works a little like this:

loop {
    // This is the "LDREX"
    let mut temp = BLINK_RED.load();
    temp = !temp;

    // This is the "STREX"
    //
    // This fails if anyone else has touched "BLINK_RED" since
    // we loaded it. This is managed at a hardware level. If it
    // fails, we just re-load the value and try again.
    if BLINK_RED.store(temp).is_ok() {
        break;
    }
}
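On stable Rust, you can also build the missing fetch_not yourself with compare_exchange_weak, which maps closely onto the LDREX/STREX retry loop above. A sketch:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static BLINK_RED: AtomicBool = AtomicBool::new(false);

// A stable stand-in for the unstable `fetch_not`.
fn toggle_blink_red() {
    let mut current = BLINK_RED.load(Ordering::Relaxed);
    loop {
        // Like STREX, compare_exchange_weak is allowed to fail
        // spuriously; on failure we get the fresh value back
        // and simply try again.
        match BLINK_RED.compare_exchange_weak(
            current,
            !current,
            Ordering::Relaxed,
            Ordering::Relaxed,
        ) {
            Ok(_) => break,
            Err(actual) => current = actual,
        }
    }
}

fn main() {
    toggle_blink_red(); // false -> true
    assert!(BLINK_RED.load(Ordering::Relaxed));
    toggle_blink_red(); // true -> false
    assert!(!BLINK_RED.load(Ordering::Relaxed));
}
```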

This works totally fine, but there are two potential problems:

First: not all hardware supports "Compare and Swap" operations. Notably, the very popular Cortex-M0/Cortex-M0+ CPUs, used in the Raspberry Pi RP2040 and hundreds of other popular low-cost/low-power devices, have NO support for LDREX/STREX or Compare and Swap operations.

Second: Atomics only work for data up to a certain size, typically the size of a register on your CPU (e.g. 32-bits on Cortex-M), though sometimes more or less (x86_64 and aarch64 do have support for 128-bit atomics despite being 64-bit processors). If your data is larger than this, you can't use atomics in this way.

In these cases, you need to use a Mutex instead.

A volatile aside

NOTE: Notice that I didn't mention volatile at all in this section!

Traditionally, volatile access was used in C and C++ for sharing data with interrupts; however, this was only because atomics did not exist in the language standards until C11 and C++11.

Volatile is not intended to be used for synchronization, and depending on whether you have multiple cores, or data cache on your CPU, volatile may not be sufficient to perform the kind of data synchronization we've discussed here.

In general, volatile should ONLY be used for interacting with memory-mapped IO, such as hardware peripherals.

This is documented well in the Linux Kernel Docs.
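For reference, volatile access in Rust looks like this. Note this sketch uses a plain local variable as a hypothetical stand-in; real memory-mapped IO would use the peripheral's documented register address from the datasheet:

```rust
use core::ptr::{read_volatile, write_volatile};

fn main() {
    // Hypothetical stand-in for a hardware register; a real
    // driver would use the address from the datasheet.
    let mut fake_register: u32 = 0;

    unsafe {
        // Volatile accesses may not be elided, combined, or
        // reordered with respect to each other by the
        // compiler, which is what hardware registers require.
        write_volatile(&mut fake_register, 0x0000_0001);
        let status = read_volatile(&fake_register);
        assert_eq!(status, 0x0000_0001);
    }
}
```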

Appendix B: Mutex

On a desktop, a Mutex is typically something provided by your operating system. If one thread locks a mutex, then other threads will be stopped by the operating system if they attempt to access the same data, and will only be allowed to continue once the mutex has been released.

With a Mutex, our example code would look like this:

// Note: NOT mutable!
static BLINK_RED: Mutex<bool> = Mutex::new(false);

fn main() {
    loop {
        // Lock the mutex, read the value, then drop
        // the mutex guard to allow others to access it
        let blink_red = {
            let guard = BLINK_RED.lock();
            *guard
        };

        if blink_red {
            red_led_on();
        } else {
            green_led_on();
        }
        sleep_ms(250);

        // We can turn both off, it's fine.
        red_led_off();
        green_led_off();
        sleep_ms(750);
    }
}

#[interrupt]
fn button_press_handler() {
    // Lock the mutex
    let mut guard = BLINK_RED.lock();
    *guard = !*guard;
    // drop the mutex when we return
}

But this is a problem on embedded! I said before that interrupts can't be paused, they run to completion. What happens if the button press occurs RIGHT as the main thread has locked the data?

The interrupt would attempt to lock the mutex, but it would fail as it is already locked. If we used a desktop-style mutex, then our program would permanently deadlock. This is not ideal!

The simplest way to deal with this on embedded is to use a Critical Section, which is a fancy way of saying "disable all interrupts". If we can't be pre-empted while holding the mutex, then there's no problem! This only works on systems without "real" threads, unless the critical section also prevents the system from switching threads while it is active.

If we disable interrupts inside of an interrupt, that's fine too! It just means that no NEW interrupts will start running until we enable them again, and our current interrupt will keep running until it is done. We don't need to worry about the "main" code running until all interrupts are done.

This works really well if you only need very short critical sections. Disabling and enabling interrupts is very quick, and the time taken to load or store data is usually not significant.

There are some potential downsides to critical sections:

The "naive" disable-all-interrupts approach is a little overpowered: we prevent ALL interrupts from running, even ones that don't share this data. Frameworks like RTIC have clever ways of handling this more precisely. This is less important if your critical sections are very short.

If you can't keep your critical sections short, AND you have a time-sensitive or hard- or soft-realtime system, a critical section could cause you to miss a deadline.

"Which kind of mutex should I use" is outside the scope of this article, and is, in my opinion, a broader system/data design question.