I've been idly thinking about writing a simple (Q)SPI flash oriented filesystem lately. I think this is an interesting niche, the two file systems I am aware of in this space are:
To be clear, a general purpose file system is not anything I've designed before! I've designed simpler "slot based" or "log ring" filesystems, usually treating the whole flash as a ring buffer (for wear leveling), and overwriting the oldest data. This is much more similar to something like the sequential-storage crate by Dion Dokter from TweedeGolf.
Some facts about SPI flash
- The typical storage amount is 1MiB-32MiB, maybe with 128MiB-256MiB as the reasonable upper range
- Most devices have pretty high potential bandwidth for reading, 8MHz single lane spi at the low end, maybe 80MHz quad spi at the reasonable high end, with some devices exceeding this (in either frequency or things like octal-spi)
- These devices typically can write any single byte (though some are limited
to 4 or 8 bytes at a time, particularly if they have ECC), but a write can
only be done to move bits from high to low, e.g.
0xFF
->0xAA
- MOST, but not all, devices can write the same byte multiple times, e.g.
0xFF
->0xAA
then0xAA
->0x0A
- These devices typically can only erase a block or sector at a time, with typical sizes of 4KiB-64KiB for a block or sector
- In terms of speed, reading is extremely fast, basically "line rate", writing is slower, and erasing is much slower (sometimes in the order of magnitude of milliseconds for a single block/sector), though a real "write" is actually an erase + write in most cases, if you don't pre-erase blocks!
- These devices typically have a low "seek" cost, there is some cost to send a new address to "jump" to a new location, but it is relatively low, and there is no difference in speed based on the distance (in bytes) or direction (forwards and backwards) like there would be for spinning disks
- These devices have typical endurance of as low as 10k erases, but more commonly 100k to 1m erases, but this means we should put some effort into wear leveling, and potentially some effort into bad block detection
- It is generally important to ensure that anything we do is "power-off" safe, e.g. if we are interrupted due to a loss of power or crash, it must be possible to recover from
- There is NOT generally a complex set of behaviors the chip itself does, like would be common for eMMC or SD card flash - we can generally trust that the devices do what we tell them (if they say "done", they really are done, and don't have pending flushes, etc.)
Back of an envelope speed numbers
An interesting detail of the speed:size ratio of an external flash is that it becomes somewhat reasonable to scan the entire device occasionally, either for recovery or normal operation.
- 1MHz, single SPI: 122KiB/s
- 8MHz, single SPI: 977KiB/s
- 32MHz, single SPI: 3.81MiB/s
- 80MHz, single SPI: 9.54MiB/s
- 8MHz, quad SPI: 3.81MiB/s
- 32MHz, quad SPI: 15.3MiB/s
- 80MHz, quad SPI: 38.1MiB/s
Speed | Size | SPI time (s) | QSPI time (s) |
---|---|---|---|
1MHz | 1MiB | 8.388608 | 2.097152 |
1MHz | 8MiB | 67.108864 | 16.777216 |
1MHz | 16MiB | 134.217728 | 33.554432 |
1MHz | 32MiB | 268.435456 | 67.108864 |
1MHz | 128MiB | 1073.741824 | 268.435456 |
8MHz | 1MiB | 1.048576 | 0.262144 |
8MHz | 8MiB | 8.388608 | 2.097152 |
8MHz | 16MiB | 16.777216 | 4.194304 |
8MHz | 32MiB | 33.554432 | 8.388608 |
8MHz | 128MiB | 134.217728 | 33.554432 |
32MHz | 1MiB | 0.262144 | 0.065536 |
32MHz | 8MiB | 2.097152 | 0.524288 |
32MHz | 16MiB | 4.194304 | 1.048576 |
32MHz | 32MiB | 8.388608 | 2.097152 |
32MHz | 128MiB | 33.554432 | 8.388608 |
80MHz | 1MiB | 0.1048576 | 0.0262144 |
80MHz | 8MiB | 0.8388608 | 0.2097152 |
80MHz | 16MiB | 1.6777216 | 0.4194304 |
80MHz | 32MiB | 3.3554432 | 0.8388608 |
80MHz | 128MiB | 13.4217728 | 3.3554432 |
Therefore if we think about how much reading we can do in roughly one second of wall clock time (okay for "sometimes, not always" operations, like at boot up for initialization):
- Single SPI:
- 1MHz - N/A
- 8MHz - <= 1MiB
- 32MHz - <= 4MiB
- 80MHz - <= 8MiB
- Quad SPI:
- 1MHz - N/A
- 8MHz - <= 4MiB
- 32MHz - <= 16MiB
- 80MHz - <= 32MiB
And if we think about how much reading we can do in roughly sixteen seconds of wall clock time (okay for very rare operations, like in the case where a serious recovery is necessary):
- Single SPI:
- 1MHz - <= 2MiB
- 8MHz - <= 16MiB
- 32MHz - <= 32MiB
- 80MHz - <= 128MiB
- Quad SPI:
- 1MHz - <= 8MiB
- 8MHz - <= 64MiB
- 32MHz - > 128MiB
- 80MHz - > 128MiB
Hmm, this is actually a little less feasible than I thought before. In many cases, the storage flash device is on a secondary interface, many devices that have QSPI are using it for primary code flash, leaving only single SPI for "bulk storage". These secondary interfaces are also typically limited in speed, as well.
I think a reasonable lower target would be single spi, 8MHz. This means that we can expect 1MiB of read in 1s, and 16MiB of read in 16s. Hmm.
Sharing flash for code (XIP) and bulk storage (FS)
Many developers often want to use a single flash for both code and bulk storage. This seems desirable for a couple of reasons:
- It's cheaper in cost and size to have one big flash, than two
- If your one flash is internal instead of external, it means you don't need additional parts, and to use additional pins + peripherals on your MCU
- If your one flash is external, it's likely you can use a higher speed interface for both, for example QSPI vs SPI, or a higher frequency
But there's one pretty huge downside of using the same flash device for code and bulk storage: you usually can't read from flash while you are erasing or writing, which means in most cases, your CPU will stall and block until the erase is complete. This may take >100ms in some cases!
If you are trying to write logs to the filesystem while you are doing something else like spinning a motor, this can have negative side effects!
There are SOME workarounds and tradeoffs possible to mitigate this:
- You can load your code into RAM, so you aren't running from flash
- But RAM is much more limited in size vs flash, and this eats into your working RAM size
- Some flash parts have multiple banks, so you can be erasing one while
still being able to read from the other
- This is not common at all in external flash parts
- This is also somewhat rare even for internal flash parts
- If you DO have banks, and your FS or code aren't strictly split between the two banks, you're back to square one
So in most cases, I'd suggest developers use separate flashes for code and bulk storage. This can be either:
- Use internal flash for code, and an external flash chip for storage
- Use one external flash chip for code, and a second external flash chip for storage
Otherwise, your use of the filesystem will be limited to cases where you write extremely rarely, and/or you are okay with the fact that the entire system may stall whenever you need to write and erase
What do we need for a filesystem?
There are a couple of pieces that I think are necessary for a filesystem:
Blocks
We usually want to divide the flash into sectors/blocks, as these are the smallest unit that we can erase at one time. Sectors/blocks are most commonly 4KiB-64KiB.
Size | Block | # of blocks |
---|---|---|
1MiB | 4KiB | 256 |
1MiB | 16KiB | 64 |
1MiB | 64KiB | 16 |
8MiB | 4KiB | 2048 |
8MiB | 16KiB | 512 |
8MiB | 64KiB | 128 |
16MiB | 4KiB | 4096 |
16MiB | 16KiB | 1024 |
16MiB | 64KiB | 256 |
32MiB | 4KiB | 8192 |
32MiB | 16KiB | 2048 |
32MiB | 64KiB | 512 |
128MiB | 4KiB | 32768 |
128MiB | 16KiB | 8192 |
128MiB | 64KiB | 2048 |
The total number of blocks is important to keep in mind, because we will need to keep track of the state of each block, remembering which blocks are occupied, which ones are unused but need erase before writing, and which ones are erased and ready to be written to.
We'll need to be able to store or determine these states relatively quickly. This is because we'll want some kind of "block allocator", when we want to create a new file or write new data, we'll want to be able to determine where to put it, and whether we need to do an erase before writing.
If we have 32768 blocks, and need to store just two bits of information for each block (is used and is erased), that would require 8KiB of storage to hold that information.
Metadata
We need to store some kind of metadata for each file we store (more on files later). On desktop filesystems, this might include things like permissions, timestamps of creation, modification, or access. It also includes things like the name of the file, and potentially the length of the file
We also may want to use metadata to describe "scattered" files, if the storage required for a file spans multiple blocks. We also may want to use metadata to allow for "overlays", if we want to replace or insert data into a file, without fully copying the file.
Files
Most people want a filesystem so they can store files. Files are of arbitrary size, and have a couple of common access patterns:
- Creating
- This could be the creation of a new file, either with an initially zero size, or with some fixed initial reservation
- Deleting
- This is the removal of a file from the filesystem
- Reading
- This could be reading linearly, from start to finish
- This could be skipping some amount, then reading linearly
- This could be jumping around arbitrarily forwards and backwards
- Writing
- This could be writing a new file, and writing linearly
- This could be appending to the end of an existing file
- This could be overwriting an existing part of a file
- This could be inserting something in the middle of a file
- Renaming
- This could be changing the name of a file
- This could also overwrite an old file, meaning it is both a rename and a deletion (of the old file)
Wrapping up
Writing this all down has helped solidify some of the thoughts I had about writing a FS, and also realize that some of the ideas I had are less viable or reasonable than I thought.
Maybe I'll pick up writing on this more later :)