Crash Consistency Testing for Persistent Memory File Systems
A correct file system implementation should be crash-consistent. In the event of a crash (e.g., due to a power failure), the file system’s data structures should remain consistent to avoid corrupted or lost files.
Crash consistency is challenging to achieve in file systems for persistent memory (PM). PM is integrated into the CPU’s memory hierarchy and accessed with regular load/store instructions. The memory write path offers an atomic write size of only 8 bytes. Additionally, PM software must manage volatile state in the write path by issuing cache flush and memory fence instructions. Using these so-called PM primitives correctly is difficult because they have no visible effect on application data at runtime but are critical for consistency after a crash.
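To illustrate why a missing flush or fence goes unnoticed at runtime, the following is a minimal sketch of a toy PM model in Python. It is not part of Suvi; the class `ToyPM` and its methods are hypothetical names, and the model is deliberately simplified: it assumes 64-byte cache lines and shows only the worst case in which nothing unflushed reaches PM, whereas real hardware may also write back cache lines on its own at any time.

```python
CACHE_LINE = 64  # bytes per cache line on typical x86 CPUs


class ToyPM:
    """Toy model (hypothetical): a volatile view in front of persistent contents."""

    def __init__(self, size):
        self.volatile = bytearray(size)    # what loads observe at runtime
        self.persistent = bytearray(size)  # what survives a crash
        self.dirty = set()                 # lines modified but not yet flushed
        self.in_flight = {}                # line -> snapshot, flushed but not fenced

    def store(self, addr, data):
        """Regular store: only the volatile view changes."""
        assert addr + len(data) <= len(self.volatile)
        self.volatile[addr:addr + len(data)] = data
        first = addr // CACHE_LINE
        last = (addr + len(data) - 1) // CACHE_LINE
        self.dirty.update(range(first, last + 1))

    def flush(self, addr):
        """Cache flush: start writing back the line that contains addr."""
        line = addr // CACHE_LINE
        if line in self.dirty:
            start = line * CACHE_LINE
            self.in_flight[line] = bytes(self.volatile[start:start + CACHE_LINE])
            self.dirty.discard(line)

    def fence(self):
        """Memory fence: previously flushed lines are now treated as persistent."""
        for line, snapshot in self.in_flight.items():
            start = line * CACHE_LINE
            self.persistent[start:start + len(snapshot)] = snapshot
        self.in_flight.clear()

    def crash_image(self):
        """Worst-case PM contents after a crash: only fenced data is kept."""
        return bytes(self.persistent)


pm = ToyPM(128)
pm.store(0, b"inode v2")
runtime_view = bytes(pm.volatile[:8])  # the program already sees the new data
lost = pm.crash_image()[:8]            # but a crash now would lose it

pm.flush(0)
pm.fence()
kept = pm.crash_image()[:8]            # after flush + fence the data survives
```

In this sketch, `runtime_view` already contains the new bytes while `lost` is still all zeros. The program behaves identically with or without the flush and fence, which is exactly why such bugs are hard to spot without crash testing.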
We developed Suvi, an approach for black-box crash consistency testing of PM file systems. Suvi uses a record-and-replay approach. Its testing pipeline finds concrete witnesses for crash consistency bugs. The Tracer records PM interactions of a PM file system running in a virtual machine (VM). The Crash Image Generator replays the trace, simulates crashes, and yields crash images that represent possible PM contents in the event of a crash. Finally, the Tester automatically determines the crash atomicity of file system operations by analyzing the semantic state contained in the crash images.
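The replay and test steps can be sketched in a few lines of Python. This is an illustrative simplification, not Suvi's implementation: the trace format, the function names `crash_images` and `is_crash_atomic`, and the `extract_state` callback are all assumptions made for this example. The sketch also assumes that any subset of the stores issued since the last fence may have reached PM, which ignores the finer x86 ordering rules that a precise simulation has to model.

```python
from itertools import combinations


def apply_stores(image, stores):
    """Return a copy of image with the given (addr, data) stores applied in order."""
    out = bytearray(image)
    for addr, data in stores:
        out[addr:addr + len(data)] = data
    return bytes(out)


def crash_images(initial, trace):
    """Replay a trace and collect possible PM contents at a crash.

    Trace entries (hypothetical format): ("store", addr, data) or ("fence",).
    Assumption: stores since the last fence persist in any subset;
    a fence persists all of them.
    """
    persistent = bytes(initial)
    pending = []
    images = {persistent}
    for op in trace:
        if op[0] == "store":
            pending.append((op[1], op[2]))
            # A crash right after this store may keep any subset of pending stores.
            for k in range(len(pending) + 1):
                for subset in combinations(pending, k):
                    images.add(apply_stores(persistent, subset))
        elif op[0] == "fence":
            persistent = apply_stores(persistent, pending)
            pending = []
            images.add(persistent)
    return images


def is_crash_atomic(images, before, after, extract_state):
    """An operation is crash-atomic if every crash image shows the old or the new state."""
    return all(extract_state(img) in (before, after) for img in images)


old = b"A" * 8 + b"B" * 8  # two 8-byte metadata fields
new = b"C" * 8 + b"D" * 8

# Updating both fields with two separate stores exposes a mixed state.
buggy_trace = [("store", 0, b"C" * 8), ("store", 8, b"D" * 8), ("fence",)]
buggy_images = crash_images(old, buggy_trace)
buggy_ok = is_crash_atomic(buggy_images, old, new, lambda img: img)

# A single 8-byte store is either fully persisted or not at all.
single_trace = [("store", 0, b"C" * 8), ("fence",)]
single_images = crash_images(old, single_trace)
single_ok = is_crash_atomic(single_images, old, b"C" * 8 + b"B" * 8, lambda img: img)
```

Here `buggy_ok` is `False`, because two of the four crash images mix old and new fields, while `single_ok` is `True`. Note that enumerating every subset grows exponentially with the number of pending stores, so exhaustive enumeration like this is only practical for short traces.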
Suvi innovates on previous approaches to crash consistency testing in multiple ways:
- Suvi offers full-system tracing of PM and NVMe accesses using virtual machines with binary translation, allowing analysis of cross-media file systems that use these storage technologies.
- Suvi includes an advanced PM simulation that models the ordering of x86 store instructions more precisely than other crash consistency testing approaches and supports both volatile and persistent caches.
- Two heuristics ensure efficient generation of crash images by avoiding a combinatorial explosion when there is a large number of PM stores.
- Suvi makes the analysis of large PM images feasible by using file system copy-on-write and a memoized hashing scheme.
- Suvi’s analysis tools allow the automatic detection of crash consistency bugs and help developers identify the causes of such bugs.
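One way to picture the memoized hashing mentioned in the list above is a chunk-wise digest that is computed once for a base image and only partially recomputed for each crash image. The following Python sketch shows that idea under stated assumptions; it is not Suvi's actual scheme, and `ChunkedHasher`, `digest_with`, and the 4 KiB chunk size are hypothetical choices for illustration.

```python
import hashlib

CHUNK = 4096  # assumed chunk size in bytes


def full_digest(image):
    """Reference digest: hash every chunk, then hash the list of chunk digests."""
    parts = [hashlib.sha256(image[i:i + CHUNK]).digest()
             for i in range(0, len(image), CHUNK)]
    return hashlib.sha256(b"".join(parts)).hexdigest()


class ChunkedHasher:
    """Memoizes per-chunk digests of a base image (hypothetical helper)."""

    def __init__(self, base):
        self.base = bytes(base)
        # Computed once; reused for every crash image derived from the base.
        self.base_digests = [hashlib.sha256(self.base[i:i + CHUNK]).digest()
                             for i in range(0, len(self.base), CHUNK)]
        self.rehashed = 0  # counts chunks that had to be hashed again

    def digest_with(self, stores):
        """Digest of the base image with the given (addr, data) stores applied."""
        touched = {}
        for addr, data in stores:
            for off, byte in enumerate(data):
                idx = (addr + off) // CHUNK
                if idx not in touched:
                    touched[idx] = bytearray(self.base[idx * CHUNK:(idx + 1) * CHUNK])
                touched[idx][(addr + off) % CHUNK] = byte
        digests = list(self.base_digests)
        for idx, chunk in touched.items():
            digests[idx] = hashlib.sha256(chunk).digest()  # only touched chunks
            self.rehashed += 1
        return hashlib.sha256(b"".join(digests)).hexdigest()


base = bytes(1 << 20)  # 1 MiB image of zeros, i.e. 256 chunks
hasher = ChunkedHasher(base)
unchanged = hasher.digest_with([])
modified = hasher.digest_with([(5000, b"x")])  # touches only chunk 1
```

In this sketch, hashing the modified image rehashes a single chunk instead of all 256, and images with identical contents still produce identical digests, so duplicate crash images can be detected cheaply.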
| Title | Author | Source |
|---|---|---|
| Vinter: Automatic Non-Volatile Memory Crash Consistency Testing for Full Systems | Samuel Kalbfleisch, Lukas Werling, Frank Bellosa | 2022 USENIX Annual Technical Conference. July 11–13, 2022 |
| Efficient and Correct Persistent Memory File Systems | Lukas Werling | Dissertation - Karlsruher Institut für Technologie (KIT) |
| Improvements in Crash Consistency Testing for Persistent Memory File Systems | Lukas Werling, Thomas-Christian Oder, Lucas Wäldele, Daniel Ritz, Frank Bellosa | Tagungsband des FG-BS Frühjahrstreffens 2024, Bochum, Germany, March 14–15, 2024 |