Power and configuration fault tolerant Linux systems

A long time ago I was tasked with creating a quick-and-dirty Linux-based embedded system for WiFi routing equipment. The target hardware included the Soekris net4801, the PCEngines WRAP.1(x) and WRAP.2(x), and some custom 1U servers with Via EPIA mainboards. To reduce the setup time required for installing, compiling and maintaining new packages, I opted to use Debian 'stable' as the base for the project. Each mainboard and component set had at least 64 MB of RAM, a Compact Flash socket or the ability to use a hard disk drive, Mini PCI or PCI slots, and a processor faster than 200 MHz. Those base specs helped narrow the project's scope and produced a short list of goals addressing a few root problems with embedded Linux distributions:

  • Reduce filesystem corruption during power outages
  • Boot from the most recent filesystem snapshot
  • Make a full Debian-based distribution available for small personal and professional computing products

These goals were met by running Debian on UnionFS, with the root filesystem stored as a SquashFS image containing the majority of the packages a custom product would require. At boot, the UnionFS mounting script reformats a spare partition and repopulates it for use as the live, writable root filesystem overlay. The data used to populate the overlay comes from another partition, which stores the most recently saved overlay. A shell script uses rsync to synchronize changes made on the overlay partition to the storage partition. The system can therefore lose power at any time except while data is being synchronized to the storage partition. Unless instructed otherwise via boot parameters, the overlay partition is always reformatted and repopulated with the last saved changes.
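
The boot and save sequence described above can be sketched as an ordered list of commands. This is only an illustration, not the original scripts (which were shell scripts and are not reproduced here). The device names, mount points, the ext2 overlay format and the `keep_overlay` switch are all assumptions made for the example. The functions only build the command lists and do not execute anything.

```python
from typing import List

# Hypothetical partition layout and mount points (assumptions for illustration).
BASE_DEV = "/dev/hda1"     # read-only SquashFS base system
STORAGE_DEV = "/dev/hda2"  # most recently saved overlay
OVERLAY_DEV = "/dev/hda3"  # scratch overlay, rebuilt on every boot


def boot_plan(keep_overlay: bool = False) -> List[List[str]]:
    """Return the commands, in order, that assemble the union root at boot."""
    plan: List[List[str]] = []
    if not keep_overlay:
        # Reformat so that damage from an earlier power cut is discarded.
        plan.append(["mke2fs", "-q", OVERLAY_DEV])
    plan += [
        ["mount", "-t", "squashfs", "-o", "ro", BASE_DEV, "/mnt/base"],
        ["mount", "-o", "ro", STORAGE_DEV, "/mnt/storage"],
        ["mount", OVERLAY_DEV, "/mnt/overlay"],
    ]
    if not keep_overlay:
        # Repopulate the fresh overlay from the last saved snapshot.
        plan.append(["rsync", "-a", "/mnt/storage/", "/mnt/overlay/"])
    # Stack the writable overlay on top of the read-only base.
    plan.append(["mount", "-t", "unionfs", "-o",
                 "dirs=/mnt/overlay=rw:/mnt/base=ro", "unionfs", "/mnt/root"])
    return plan


def save_plan() -> List[List[str]]:
    """Return the commands that persist the live overlay to storage."""
    return [
        ["mount", "-o", "remount,rw", "/mnt/storage"],
        ["rsync", "-a", "--delete", "/mnt/overlay/", "/mnt/storage/"],
        ["sync"],
        ["mount", "-o", "remount,ro", "/mnt/storage"],
    ]
```

Keeping the storage partition read-only except during `save_plan()` is a design choice of this sketch: it keeps the window in which a power failure can hurt limited to the rsync run itself.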

This sounds like a lot of fluff, but when you have a 1 GB Compact Flash card to use on a project like this, instead of a 64 MB Disk-On-Chip device, you can spare a few bytes here and there to maintain a pristine filesystem snapshot.

Other work that spun off from this project:
  • Using GPSD to set the system and hardware clocks on mainboards with a faulty or missing RTC battery. This was important for systems that are often offline and unable to use NTP time sources
  • Integrating NTP with the NMEA output from GPSD
  • AMD Geode SCX200 GPIO fiddling in the kernel, which allowed a visual boot status on PCEngines WRAP boards
  • WRAP and Soekris related patches to the kernel
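
As a rough illustration of the clock-setting idea, the sketch below pulls a UTC timestamp out of a raw NMEA `$GPRMC` sentence. It does not use GPSD and is not the code from this project. It only shows, under that assumption, what information the GPS stream provides. The function names are made up for the example.

```python
from datetime import datetime, timezone
from functools import reduce


def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two hex digits."""
    return "%02X" % reduce(lambda acc, ch: acc ^ ord(ch), body, 0)


def rmc_to_utc(sentence: str) -> datetime:
    """Return the UTC time carried by a $GPRMC sentence, or raise ValueError."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, _, checksum = sentence[1:].partition("*")
    if nmea_checksum(body) != checksum.upper():
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if fields[0] != "GPRMC" or len(fields) < 10:
        raise ValueError("not a GPRMC sentence")
    if fields[2] != "A":
        # 'V' means the receiver has no valid fix yet; do not trust the time.
        raise ValueError("receiver reports no valid fix")
    hhmmss = fields[1].split(".")[0]  # drop fractional seconds if present
    ddmmyy = fields[9]
    stamp = datetime.strptime(ddmmyy + hhmmss, "%d%m%y%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)
```

On a board without a working RTC, a timestamp obtained this way could then be handed to `date -u -s` and written to the hardware clock with `hwclock --systohc`.
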
I will try to maintain a list of similar projects. One that comes to mind immediately is Flashybrid. More recent versions of Flashybrid contain hooks to store staging data on disk. It has a neat setup and may be more suitable for most users looking for "save now, erase changes on boot" functionality.