OneVariable : Thoughts on SPI flash and filesystems

I've been idly thinking about writing a simple (Q)SPI flash oriented filesystem lately. I think this is an interesting niche, the two file systems I am aware of in this space are:

To be clear, a general purpose file system is not anything I've designed before! I've designed simpler "slot based" or "log ring" filesystems, usually treating the whole flash as a ring buffer (for wear leveling), and overwriting the oldest data. This is much more similar to something like the sequential-storage crate by Dion Dokter from TweedeGolf.

Some facts about SPI flash

The typical storage amount is 1MiB-32MiB, maybe with 128MiB-256MiB as the reasonable upper range
Most devices have pretty high potential bandwidth for reading, 8MHz single lane spi at the low end, maybe 80MHz quad spi at the reasonable high end, with some devices exceeding this (in either frequency or things like octal-spi)
These devices typically can write any single byte (though some are limited to 4 or 8 bytes at a time, particularly if they have ECC), but a write can only be done to move bits from high to low, e.g. 0xFF -> 0xAA
MOST, but not all, devices can write the same byte multiple times, e.g. 0xFF -> 0xAA then 0xAA -> 0x0A
These devices typically can only erase a block or sector at a time, with typical sizes of 4KiB-64KiB for a block or sector
In terms of speed, reading is extremely fast, basically "line rate", writing is slower, and erasing is much slower (sometimes in the order of magnitude of milliseconds for a single block/sector), though a real "write" is actually an erase + write in most cases, if you don't pre-erase blocks!
These devices typically have a low "seek" cost, there is some cost to send a new address to "jump" to a new location, but it is relatively low, and there is no difference in speed based on the distance (in bytes) or direction (forwards and backwards) like there would be for spinning disks
These devices have typical endurance of as low as 10k erases, but more commonly 100k to 1m erases, but this means we should put some effort into wear leveling, and potentially some effort into bad block detection
It is generally important to ensure that anything we do is "power-off" safe, e.g. if we are interrupted due to a loss of power or crash, it must be possible to recover from
There is NOT generally a complex set of behaviors the chip itself does, like would be common for eMMC or SD card flash - we can generally trust that the devices do what we tell them (if they say "done", they really are done, and don't have pending flushes, etc.)

Back of an envelope speed numbers

An interesting detail of the speed:size ratio of an external flash is that it becomes somewhat reasonable to scan the entire device occasionally, either for recovery or normal operation.

1MHz, single SPI: 122KiB/s
8MHz, single SPI: 977KiB/s
32MHz, single SPI: 3.81MiB/s
80MHz, single SPI: 9.54MiB/s
8MHz, quad SPI: 3.81MiB/s
32MHz, quad SPI: 15.3MiB/s
80MHz, quad SPI: 38.1MiB/s

Speed	Size	SPI time (s)	QSPI time (s)
1MHz	1MiB	8.388608	2.097152
1MHz	8MiB	67.108864	16.777216
1MHz	16MiB	134.217728	33.554432
1MHz	32MiB	268.435456	67.108864
1MHz	128MiB	1073.741824	268.435456
8MHz	1MiB	1.048576	0.262144
8MHz	8MiB	8.388608	2.097152
8MHz	16MiB	16.777216	4.194304
8MHz	32MiB	33.554432	8.388608
8MHz	128MiB	134.217728	33.554432
32MHz	1MiB	0.262144	0.065536
32MHz	8MiB	2.097152	0.524288
32MHz	16MiB	4.194304	1.048576
32MHz	32MiB	8.388608	2.097152
32MHz	128MiB	33.554432	8.388608
80MHz	1MiB	0.1048576	0.0262144
80MHz	8MiB	0.8388608	0.2097152
80MHz	16MiB	1.6777216	0.4194304
80MHz	32MiB	3.3554432	0.8388608
80MHz	128MiB	13.4217728	3.3554432

Therefore if we think about how much reading we can do in roughly one second of wall clock time (okay for "sometimes, not always" operations, like at boot up for initialization):

Single SPI:
- 1MHz - N/A
- 8MHz - <= 1MiB
- 32MHz - <= 4MiB
- 80MHz - <= 8MiB
Quad SPI:
- 1MHz - N/A
- 8MHz - <= 4MiB
- 32MHz - <= 16MiB
- 80MHz - <= 32MiB

And if we think about how much reading we can do in roughly sixteen seconds of wall clock time (okay for very rare operations, like in the case where a serious recovery is necessary):

Single SPI:
- 1MHz - <= 2MiB
- 8MHz - <= 16MiB
- 32MHz - <= 32MiB
- 80MHz - <= 128MiB
Quad SPI:
- 1MHz - <= 8MiB
- 8MHz - <= 64MiB
- 32MHz - > 128MiB
- 80MHz - > 128MiB

Hmm, this is actually a little less feasible than I thought before. In many cases, the storage flash device is on a secondary interface, many devices that have QSPI are using it for primary code flash, leaving only single SPI for "bulk storage". These secondary interfaces are also typically limited in speed, as well.

I think a reasonable lower target would be single spi, 8MHz. This means that we can expect 1MiB of read in 1s, and 16MiB of read in 16s. Hmm.

Many developers often want to use a single flash for both code and bulk storage. This seems desirable for a couple of reasons:

It's cheaper in cost and size to have one big flash, than two
If your one flash is internal instead of external, it means you don't need additional parts, and to use additional pins + peripherals on your MCU
If your one flash is external, it's likely you can use a higher speed interface for both, for example QSPI vs SPI, or a higher frequency

But there's one pretty huge downside of using the same flash device for code and bulk storage: you usually can't read from flash while you are erasing or writing, which means in most cases, your CPU will stall and block until the erase is complete. This may take >100ms in some cases!

If you are trying to write logs to the filesystem while you are doing something else like spinning a motor, this can have negative side effects!

There are SOME workarounds and tradeoffs possible to mitigate this:

You can load your code into RAM, so you aren't running from flash
- But RAM is much more limited in size vs flash, and this eats into your working RAM size
Some flash parts have multiple banks, so you can be erasing one while still being able to read from the other
- This is not common at all in external flash parts
- This is also somewhat rare even for internal flash parts
- If you DO have banks, and your FS or code aren't strictly split between the two banks, you're back to square one

So in most cases, I'd suggest developers use separate flashes for code and bulk storage. This can be either:

Use internal flash for code, and an external flash chip for storage
Use one external flash chip for code, and a second external flash chip for storage

Otherwise, your use of the filesystem will be limited to cases where you write extremely rarely, and/or you are okay with the fact that the entire system may stall whenever you need to write and erase

What do we need for a filesystem?

There are a couple of pieces that I think are necessary for a filesystem:

Blocks

We usually want to divide the flash into sectors/blocks, as these are the smallest unit that we can erase at one time. Sectors/blocks are most commonly 4KiB-64KiB.

Size	Block	# of blocks
1MiB	4KiB	256
1MiB	16KiB	64
1MiB	64KiB	16
8MiB	4KiB	2048
8MiB	16KiB	512
8MiB	64KiB	128
16MiB	4KiB	4096
16MiB	16KiB	1024
16MiB	64KiB	256
32MiB	4KiB	8192
32MiB	16KiB	2048
32MiB	64KiB	512
128MiB	4KiB	32768
128MiB	16KiB	8192
128MiB	64KiB	2048

The total number of blocks is important to keep in mind, because we will need to keep track of the state of each block, remembering which blocks are occupied, which ones are unused but need erase before writing, and which ones are erased and ready to be written to.

We'll need to be able to store or determine these states relatively quickly. This is because we'll want some kind of "block allocator", when we want to create a new file or write new data, we'll want to be able to determine where to put it, and whether we need to do an erase before writing.

If we have 32768 blocks, and need to store just two bits of information for each block (is used and is erased), that would require 8KiB of storage to hold that information.

Metadata

We need to store some kind of metadata for each file we store (more on files later). On desktop filesystems, this might include things like permissions, timestamps of creation, modification, or access. It also includes things like the name of the file, and potentially the length of the file

We also may want to use metadata to describe "scattered" files, if the storage required for a file spans multiple blocks. We also may want to use metadata to allow for "overlays", if we want to replace or insert data into a file, without fully copying the file.

Files

Most people want a filesystem so they can store files. Files are of arbitrary size, and have a couple of common access patterns:

Creating
- This could be the creation of a new file, either with an initially zero size, or with some fixed initial reservation
Deleting
- This is the removal of a file from the filesystem
Reading
- This could be reading linearly, from start to finish
- This could be skipping some amount, then reading linearly
- This could be jumping around arbitrarily forwards and backwards
Writing
- This could be writing a new file, and writing linearly
- This could be appending to the end of an existing file
- This could be overwriting an existing part of a file
- This could be inserting something in the middle of a file
Renaming
- This could be changing the name of a file
- This could also overwrite an old file, meaning it is both a rename and a deletion (of the old file)

Wrapping up

Writing this all down has helped solidify some of the thoughts I had about writing a FS, and also realize that some of the ideas I had are less viable or reasonable than I thought.

Maybe I'll pick up writing on this more later :)

Thoughts on SPI flash and filesystems