Building Plain Old Data from Scratch

2024-06-06

I made an interesting discovery in Rust last weekend while messing with postcard:

enums are the only "plain" data type (I'm aware of) in Rust that you can't build "from scratch". When I say "from scratch", I mean using unsafe methods to manually initialize each field, say in a MaybeUninit, and then calling assume_init().

There are other "complex" data types, like trait objects, that you can't necessarily build from scratch, but these are rarely relevant when serializing and deserializing. Even items on the heap can generally be built "in place" using various unsafe methods.


Building structs from scratch

Using the offset_of! macro, you can procedurally build each field of a struct, or each element of an array, even recursively!

For example, if you had a struct like this:

#[derive(Debug)]
struct Example {
    a: u32,
    b: u8,
    c: i64,
}

You could build it from scratch like this:

use core::mem::{MaybeUninit, offset_of};

let mut out = MaybeUninit::<Example>::uninit();

let val = unsafe {
    let ptr: *mut Example = out.as_mut_ptr();
    ptr.byte_add(offset_of!(Example, a)).cast::<u32>().write(5u32);
    ptr.byte_add(offset_of!(Example, b)).cast::<u8>().write(6u8);
    ptr.byte_add(offset_of!(Example, c)).cast::<i64>().write(7i64);
    out.assume_init()
};

println!("{val:#?}");

Which will print:

Example {
    a: 5,
    b: 6,
    c: 7,
}

Check it out on the playground. Miri is totally fine with this! This even works with nested fields:

use core::mem::{MaybeUninit, offset_of};

#[derive(Debug)]
struct Example {
    a: u32,
    b: u8,
    c: i64,
    d: (bool, char),
}

fn main() {
    let mut out = MaybeUninit::<Example>::uninit();

    let val = unsafe {
        let ptr: *mut Example = out.as_mut_ptr();
        ptr.byte_add(offset_of!(Example, a)).cast::<u32>().write(5u32);
        ptr.byte_add(offset_of!(Example, b)).cast::<u8>().write(6u8);
        ptr.byte_add(offset_of!(Example, c)).cast::<i64>().write(7i64);

        let d_base = ptr.byte_add(offset_of!(Example, d)).cast::<(bool, char)>();
        d_base.byte_add(offset_of!((bool, char), 0)).cast::<bool>().write(true);
        d_base.byte_add(offset_of!((bool, char), 1)).cast::<char>().write('r');

        out.assume_init()
    };

    println!("{val:#?}");
}

This is useful for some things, like incremental deserialization: you can carefully build each field down to the primitives, then finally call assume_init and expect it to work.

NOTE: although I use offset_of! here, you can also use addr_of_mut! to do the same kind of "from scratch" initialization as I do here. I'll explain this more in a note at the end.

Building enums from scratch

However, there are two issues with enums:

First, using offset_of! for enum fields is unstable today. However, that is being worked on, so you could potentially initialize the fields in place with nightly features.

Second, and more problematically, there is no way to set the discriminant, even in nightly, in a sound way.

Rust enums are sort of like "tagged unions": they act as if they had a discrete discriminant field (some integer), plus a union of the different possible variant bodies.
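
The "tagged union" view can be made concrete with a primitive repr: for #[repr(u8)] enums the layout is specified (RFC 2195), and reading the discriminant through a pointer cast is documented as sound. Note this is an illustration only (Tagged is a made-up example); the enums discussed below use the default repr, which makes no such layout guarantee.

```rust
// For enums with an explicit primitive repr, the discriminant is laid
// out like a leading tag field, and reading it via pointer cast is sound.
#[allow(dead_code)]
#[repr(u8)]
enum Tagged {
    One(u32),
    Two(u8),
}

fn main() {
    let t = Tagged::Two(7);
    // The leading byte is the discriminant: 0 for One, 1 for Two.
    let tag = unsafe { *(&t as *const Tagged).cast::<u8>() };
    assert_eq!(tag, 1);
    println!("tag = {tag}");
}
```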

However, setting the discriminant is... complicated. Particularly with things like "niche packing"!

For example, Option<NonNull<u8>> can actually be the same size as a pointer: because we know the pointer is never null, we can treat the null value as "special", so if the stored pointer is null, we can treat the value as if it were None!
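
You can verify the niche optimization directly with size_of:

```rust
use core::mem::size_of;
use core::ptr::NonNull;

fn main() {
    // Option<NonNull<u8>> needs no separate discriminant: the all-zero
    // (null) bit pattern is repurposed to encode None.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    // Without a niche, Option has to add a tag (plus padding):
    assert!(size_of::<Option<u64>>() > size_of::<u64>());
    println!("niche packing confirmed");
}
```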

For this reason, and other complexities, we have a problem. I discussed this in the T-Lang Zulip, but it didn't seem that the capability of manually setting the discriminant had been explored or discussed in the past.

This means that the only way to build a MaybeUninit<MyEnum> from scratch is to gather all of the parts together on the stack, create a MyEnum on the stack, and then write the whole thing to the destination, meaning that there are likely two full copies of the enum on the stack at once.

If I had an enum like:

enum MyEnum {
    One(u32),
    Fancy(Example),
}

The only way to actually initialize an instance is to do something like:

let mut out_enum = MaybeUninit::<MyEnum>::uninit();
let mut out_member = MaybeUninit::<Example>::uninit();

let member = unsafe {
    let ptr: *mut Example = out_member.as_mut_ptr();
    ptr.byte_add(offset_of!(Example, a)).cast::<u32>().write(5u32);
    ptr.byte_add(offset_of!(Example, b)).cast::<u8>().write(6u8);
    ptr.byte_add(offset_of!(Example, c)).cast::<i64>().write(7i64);

    let d_base = ptr.byte_add(offset_of!(Example, d)).cast::<(bool, char)>();
    d_base.byte_add(offset_of!((bool, char), 0)).cast::<bool>().write(true);
    d_base.byte_add(offset_of!((bool, char), 1)).cast::<char>().write('r');

    out_member.assume_init()
};

out_enum.write(MyEnum::Fancy(member));
let enm = unsafe { out_enum.assume_init() };

Is this a reasonable thing to want to do? I don't know! Ideally, I'd be able to do something like this:

// pseudo code

enum Example {
    A,
    B(u32),
}

// let's say we have this data ready off the wire
let wire_disc = 1;
let data = 123u32;

let (disc, offset): (Discriminant<Example>, usize) = match wire_disc {
    // First new syntax: getting the discriminant without a value
    // Also needs nightly offset_of for Enums
    0 => (discriminant_of!(Example, A), offset_of!(Example, A)),
    1 => (discriminant_of!(Example, B), offset_of!(Example, B)),
    _ => panic!(),
};

let mut out = MaybeUninit::<Example>::uninit();
// write data
let val_ptr = out.as_mut_ptr().cast::<u32>().wrapping_byte_add(offset);
val_ptr.write(data);
// Second new syntax: unsafe function on `&mut MaybeUninit<T>` where T is an enum?
// Set discriminant appropriately
out.set_discriminant(disc);

// this is now fine (if we did everything right)
let out: Example = out.assume_init();

But why?

Why does this matter to me? I was experimenting to see if I could reduce the amount of proc-macro-generated code for serializing and deserializing, using a technique similar to how the Forth language builds functions: create a "list" of steps, and treat that as the plan for deserialization.

My hope is that this could generate less code at compile time, potentially ending up with faster compile times (due to less code, which matters when you have a LOT of #[derive(Serialize, Deserialize)] instances in your project), and potentially smaller code for embedded systems.

For example: I have a customer project where 50kloc of code becomes 200kloc after macro expansion, the majority of which is serde-generated code, which means that one crate takes 35-40 seconds to compile. This still results in very fast ser/de code, and is much safer than what I'm about to explain, but it was a little disappointing.

Additionally, by constructing everything "in place", more or less hand-writing the "placement new" optimization, we can hopefully use less stack memory when deserializing large values on systems without a heap, where passing "by value" can use a surprising amount of memory.
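
As a minimal sketch of what "in place" buys you (Big and init_big_in_place are hypothetical names for illustration): the caller provides the storage, and the large value is never passed or returned by value, so only one copy ever exists.

```rust
use core::mem::{offset_of, MaybeUninit};

struct Big {
    len: u32,
    payload: [u8; 256],
}

// Initialize the value through the caller's storage; Big is never moved
// by value, so there is no second 260-byte copy on the stack.
fn init_big_in_place(src: &[u8], out: &mut MaybeUninit<Big>) {
    let n = src.len().min(256);
    unsafe {
        let ptr = out.as_mut_ptr();
        ptr.byte_add(offset_of!(Big, len)).cast::<u32>().write(n as u32);
        let dst = ptr.byte_add(offset_of!(Big, payload)).cast::<u8>();
        core::ptr::write_bytes(dst, 0, 256); // zero-fill the whole array
        core::ptr::copy_nonoverlapping(src.as_ptr(), dst, n);
    }
}

fn main() {
    let mut slot = MaybeUninit::<Big>::uninit();
    init_big_in_place(b"hello", &mut slot);
    let big = unsafe { slot.assume_init_ref() };
    assert_eq!(big.len, 5);
    assert_eq!(&big.payload[..5], b"hello");
    println!("in-place init ok");
}
```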

postcard-forth: a preview

The approach I took in my experiment is to start with a type like this:

struct Alpha {
    a: u8,
    b: u16,
    c: u32,
    d: i8,
    e: i16,
    f: i32,
}

And generate a "list" like this:

// macro generated
unsafe impl Deserialize for Alpha {
    const FIELDS: &'static [DeserField] = &[
        DeserField { offset: offset_of!(Alpha, a), func: deser_fields::<u8> },
        DeserField { offset: offset_of!(Alpha, b), func: deser_fields::<u16> },
        DeserField { offset: offset_of!(Alpha, c), func: deser_fields::<u32> },
        DeserField { offset: offset_of!(Alpha, d), func: deser_fields::<i8> },
        DeserField { offset: offset_of!(Alpha, e), func: deser_fields::<i16> },
        DeserField { offset: offset_of!(Alpha, f), func: deser_fields::<i32> },
    ];
}
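
For reference, the supporting types assumed by these listings might look roughly like this; this is a sketch, and the real definitions live in the postcard-forth repo:

```rust
use core::ptr::NonNull;

// A stream of bytes being consumed during deserialization.
pub struct DeserStream<'a> {
    pub data: &'a [u8],
}

// One step of the "plan": where to write, and which function writes it.
pub struct DeserField {
    pub offset: usize,
    pub func: unsafe fn(&mut DeserStream, NonNull<()>) -> Result<(), ()>,
}

// Types describe themselves as a flat list of field-deserialization steps.
pub unsafe trait Deserialize {
    const FIELDS: &'static [DeserField];
}

fn main() {
    let stream = DeserStream { data: &[1, 2, 3] };
    assert_eq!(stream.data.len(), 3);
    println!("types compile");
}
```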

With this function for doing the deserialization:

unsafe fn deser_fields<D: Deserialize>(
    stream: &mut DeserStream,
    base: NonNull<()>,
) -> Result<(), ()> {
    let fields = D::FIELDS;
    for field in fields {
        let fptr = base.as_ptr().wrapping_byte_add(field.offset);
        let fbase = NonNull::new_unchecked(fptr);
        (field.func)(stream, fbase)?;
    }
    Ok(())
}

This works recursively, until we get to primitives, which have implementations like this:

// hand written unsafe impl
unsafe impl Deserialize for u32 {
    const FIELDS: &'static [DeserField] = &[
        DeserField { offset: 0, func: deser_u32 },
    ];
}

// hand written primitive deserialization function
pub unsafe fn deser_u32(stream: &mut DeserStream, base: NonNull<()>) -> Result<(), ()> {
    if let Ok(val) = try_take_varint_u32(stream) {
        base.cast::<u32>().as_ptr().write(val);
        Ok(())
    } else {
        Err(())
    }
}
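
The try_take_varint_u32 helper is assumed above; postcard encodes integers as LEB128 varints, so a sketch of it might look like this (operating on a plain byte slice instead of a DeserStream, for illustration):

```rust
// Decode a LEB128 varint: 7 data bits per byte, high bit = "more follows".
fn try_take_varint_u32(data: &mut &[u8]) -> Result<u32, ()> {
    let mut out = 0u32;
    for i in 0..5 {
        let byte = *data.first().ok_or(())?;
        *data = &data[1..];
        // the fifth byte may only carry the top 4 bits of a u32
        if i == 4 && byte > 0x0F {
            return Err(());
        }
        out |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Ok(out);
        }
    }
    Err(()) // unterminated varint
}

fn main() {
    let mut buf: &[u8] = &[0x96, 0x01]; // LEB128 encoding of 150
    assert_eq!(try_take_varint_u32(&mut buf), Ok(150));
    assert!(buf.is_empty()); // the stream advanced past the varint
    println!("decoded 150");
}
```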

Unfortunately, there's no such ability for enums, which means I have to generate a full deserialization function (instead of just a simple const list), like serde does:

enum Dolsot {
    Bib(Alpha),
    Bim(Beta),
    Bap(u32),
    Bowl,
    Sticks { left: u32, right: u8 },
}

// macro generated
pub unsafe fn deser_Dolsot(
    stream: &mut ::postcard_forth::DeserStream,
    base: core::ptr::NonNull<()>,
) -> Result<(), ()> {
    let mut variant = core::mem::MaybeUninit::<u32>::uninit();
    ::postcard_forth::impls::deser_u32(
        stream,
        core::ptr::NonNull::from(&mut variant).cast(),
    )?;
    let variant = variant.assume_init();
    match variant {
        0u32 => {
            let mut a = core::mem::MaybeUninit::<Alpha>::uninit();
            ::postcard_forth::deser_fields_ref(stream, &mut a)?;

            base.cast::<Dolsot>().as_ptr().write(Dolsot::Bib(a.assume_init()));
        }
        1u32 => {
            let mut a = core::mem::MaybeUninit::<Beta>::uninit();
            ::postcard_forth::deser_fields_ref(stream, &mut a)?;

            base.cast::<Dolsot>().as_ptr().write(Dolsot::Bim(a.assume_init()));
        }
        2u32 => {
            let mut a = core::mem::MaybeUninit::<u32>::uninit();
            ::postcard_forth::deser_fields_ref(stream, &mut a)?;

            base.cast::<Dolsot>().as_ptr().write(Dolsot::Bap(a.assume_init()));
        }
        3u32 => {
            base.cast::<Dolsot>().as_ptr().write(Dolsot::Bowl);
        }
        4u32 => {
            let mut left = core::mem::MaybeUninit::<u32>::uninit();
            ::postcard_forth::deser_fields_ref(stream, &mut left)?;
            let mut right = core::mem::MaybeUninit::<u8>::uninit();
            ::postcard_forth::deser_fields_ref(stream, &mut right)?;

            base.cast::<Dolsot>()
                .as_ptr()
                .write(Dolsot::Sticks {
                    left: left.assume_init(),
                    right: right.assume_init(),
                });
        }
        _ => return Err(()),
    }
    Ok(())
}

// macro generated
unsafe impl ::postcard_forth::Deserialize for Dolsot {
    const FIELDS: &'static [::postcard_forth::DeserField] = &[
        ::postcard_forth::DeserField { offset: 0, func: deser_Dolsot },
    ];
}

Regardless, this still generally works. You can check out the postcard-forth repo here: https://github.com/jamesmunns/postcard-forth. It's just an experiment, but it works well enough to run the excellent rust_serialization_benchmark. Don't put too much faith in the benchmark results, but the initial numbers are promising.

I still need to come up with an A/B test to compare the volume of generated code, the compile time, and the code size in a bare-metal embedded program, where it is easier to measure the generated binary size in a reliable way. I hope to follow up on this soon, as this was the primary objective; ser/de speed was only a check to ensure that I wasn't introducing a regression in performance.

I have some other ideas I'd like to explore as optional alternatives to serde for postcard in the future, but it was a fun bit of learning to write some fairly unsafe code.

Footnote: offset_of! vs addr_of_mut!

As I mentioned above, you can also use addr_of_mut! for "building from scratch":

use core::mem::{MaybeUninit, addr_of_mut};

let mut out = MaybeUninit::<Example>::uninit();

let val = unsafe {
    let ptr: *mut Example = out.as_mut_ptr();
    addr_of_mut!((*ptr).a).write(5u32);
    addr_of_mut!((*ptr).b).write(6u8);
    addr_of_mut!((*ptr).c).write(7i64);
    out.assume_init()
};

Note that this involves quite a bit less casting, and no manual pointer arithmetic with byte_add! If you are JUST initializing a single MaybeUninit type, it's probably preferable to use addr_of_mut! instead of offset_of! like I did.

However, it has the same limitations I raised above regarding enums - there's still no way to set the discriminant of the resulting enum.

Furthermore, for the purposes of postcard-forth, I want to calculate these offsets at compile time, when I don't actually have an instance of the type to work with. It isn't valid to use addr_of_mut! without a valid "inbounds" pointer, so I use offset_of! instead, which doesn't have that restriction. Additionally, I'm doing some intentional "type erasure" here, so having offsets be plain integers unassociated with a type is also a feature in this specific case.

Thanks to Ralf Jung for pointing this detail out in the T-Lang Zulip!