Bootloader: Stage Two! // the crosseroads

The last time we talked, I mentioned the next few steps that needed to be done in order to get to stage two.

For those who don’t remember, those steps looked something like this:

Find and read the disk’s partition table;
Parse each partition entry looking for an “active” one;
Ensure that only one entry is marked as active;
Discover where that particular partition begins, and load the first sector of it into memory at 0x7C00;
JMP to 0x7C00 and transfer control to (what is hopefully) the second stage bootloader.

These steps have (finally) been completed!

Holy crap, it booted—on real hardware, no less

So, the Master Boot Record format has been around in the same form since about 1983, I think (don’t quote me on that, though). The MBR developed for use by MS-DOS and the FAT filesystem way-back-when pretty much laid down the minimum criteria for “things that an MBR should do”, as well as the layout of the MBR itself. For instance, the partition table for the drive is embedded inside the MBR, starting at offset 0x01BE. (Incidentally, this means that instead of having a “full” 512 bytes of space for code like I thought, you get 446 bytes at most, and only 440 if you leave the disk signature area at 0x01B8 alone.) Each partition table entry is 16 bytes long, and there are four entries, followed by the two-byte 0xAA55 boot signature. If any of the entries are not used (for instance, on a disk with only one partition), then the unused entries are usually filled with all zeroes.

The good thing about this is that in order to find a bootable partition, all you have to do is examine the byte at 0x01be, see if it’s set to 0x80, increment 16 bytes, and do the same thing three more times. The rules are thus:

Partitions can be marked as active (0x80) or not active (0x00);
If no partitions are marked as active, then die.
If more than one entry is marked as active, the partition table is invalid–die.
If any entry’s “active” byte is set to anything other than 0x80 or 0x00, the partition table is invalid–die.

Pretty simple, once you get all the kinks worked out.
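
The rules above can be sketched in a few lines. This is Python rather than real-mode assembly, so treat it as an illustration of the logic and not as the MBR code itself; the names `find_active_partition` and `make_mbr` are made up for this example.

```python
PARTITION_TABLE_OFFSET = 0x1BE  # first partition entry inside the MBR
ENTRY_SIZE = 16                 # bytes per partition entry
ENTRY_COUNT = 4                 # the table always holds four entries


def find_active_partition(mbr: bytes) -> int:
    """Return the index (0-3) of the single active entry, or raise ValueError."""
    if len(mbr) < 512:
        raise ValueError("an MBR sector is 512 bytes")
    active = []
    for index in range(ENTRY_COUNT):
        # The status byte is the first byte of each 16-byte entry.
        status = mbr[PARTITION_TABLE_OFFSET + index * ENTRY_SIZE]
        if status == 0x80:
            active.append(index)
        elif status != 0x00:
            raise ValueError(f"entry {index}: invalid status byte {status:#04x}")
    if not active:
        raise ValueError("no active partition")
    if len(active) > 1:
        raise ValueError("more than one active partition")
    return active[0]


def make_mbr(status_bytes):
    """Build an otherwise empty 512-byte sector with the given status bytes."""
    sector = bytearray(512)
    for index, status in enumerate(status_bytes):
        sector[PARTITION_TABLE_OFFSET + index * ENTRY_SIZE] = status
    sector[510:512] = b"\x55\xaa"  # boot signature, stored little-endian
    return bytes(sector)
```

Calling `find_active_partition(make_mbr([0x00, 0x80, 0x00, 0x00]))` picks the second entry; every "die" case in the list raises `ValueError` instead.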

Once you find the active partition, the next step is to read the partition entry’s data to find out where the partition starts, and then load the partition’s first sector from disk into memory at address 0000:7C00. (Some resources say reading just one sector is a waste of a good interrupt, and that multiple sectors should be read–it doesn’t harm anything.) This, incidentally, is why the first thing an MBR has to do is relocate itself somewhere else, because it too is initially loaded at 0x7C00.
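
To make "read the partition entry's data" a bit more concrete, here is a rough Python sketch of how the 16 bytes of one entry break down. The field layout is the classic MBR one; the names (`PartitionEntry`, `parse_entry`, `unpack_chs`) are invented for this example.

```python
import struct
from typing import NamedTuple, Tuple


class PartitionEntry(NamedTuple):
    status: int                      # 0x80 = active, 0x00 = inactive
    start_chs: Tuple[int, int, int]  # (cylinder, head, sector)
    partition_type: int
    end_chs: Tuple[int, int, int]
    start_lba: int                   # first sector of the partition
    sector_count: int


def unpack_chs(raw: bytes) -> Tuple[int, int, int]:
    """Decode the packed 3-byte CHS field used in partition entries."""
    head, sector_byte, cylinder_low = raw
    # The top two bits of the sector byte are cylinder bits 8 and 9.
    cylinder = ((sector_byte & 0xC0) << 2) | cylinder_low
    sector = sector_byte & 0x3F
    return (cylinder, head, sector)


def parse_entry(raw: bytes) -> PartitionEntry:
    """Split one 16-byte partition table entry into its fields."""
    if len(raw) != 16:
        raise ValueError("a partition entry is exactly 16 bytes")
    status, chs_start, ptype, chs_end, lba, count = struct.unpack("<B3sB3sII", raw)
    return PartitionEntry(status, unpack_chs(chs_start), ptype,
                          unpack_chs(chs_end), lba, count)


# A made-up entry: active, type 0x0C, starting at LBA 2048.
example = bytes([0x80, 0x20, 0x21, 0x00, 0x0C, 0xFE, 0xFF, 0xFF]) + struct.pack("<II", 2048, 1000000)
entry = parse_entry(example)
```

The `start_lba` field is the one that matters for an LBA read; the CHS fields only come into play for the CHS fallback.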

This is where it gets a little tricky. Anyone who has been around Linux for a while can remember back to when best practice was to always, always, always make sure your kernel was located somewhere below the first 8GiB of the disk (I hate using the “proper” terms for binary powers of bytes–it just seems weird, but I guess it is also correct. If I flip-flop between the two, it’s probably not intentional). The reason for this restriction was that there are two ways to read from a disk: using CHS (Cylinder / Head / Sector) or using LBA (Logical Block Addressing), and not all BIOSes supported using LBA. When using the CHS method, the BIOS is limited to reading just about 8GiB of the disk (well, using “logical” CHS, not “physical” CHS, which was limited to ~504MB). At some point, Western Digital and Phoenix Technologies came up with the “BIOS Enhanced Disk Drive Services” (EDD) standard, which added some extensions to allow the BIOS to read a disk using LBA instead of CHS. This allows for quite a bit larger drives to be read (128GiB using 28-bit addressing; 128PiB for 48-bit; and 8ZiB for 64-bit LBA, assuming 512-byte sectors).
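
The size limits above fall straight out of the arithmetic. Here is a small Python sketch, assuming 512-byte sectors, a 1024/256/63 geometry for the INT 13h CHS interface, and 1024/16/63 for the older combined limit; the function names are made up for illustration.

```python
SECTOR_SIZE = 512  # bytes; assumed throughout


def chs_capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Bytes addressable with the given CHS geometry limits."""
    return cylinders * heads * sectors_per_track * SECTOR_SIZE


def lba_capacity(address_bits: int) -> int:
    """Bytes addressable with an LBA of the given width."""
    return (1 << address_bits) * SECTOR_SIZE


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Standard CHS to LBA conversion; CHS sectors are numbered from 1."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


GIB = 2 ** 30
logical_chs_limit = chs_capacity(1024, 256, 63)   # 7.875 GiB, "just about 8GiB"
physical_chs_limit = chs_capacity(1024, 16, 63)   # 504 MiB
lba28_limit = lba_capacity(28)                    # 128 GiB
lba48_limit = lba_capacity(48)                    # 128 PiB
lba64_limit = lba_capacity(64)                    # 8 ZiB
```

With an assumed geometry of 255 heads and 63 sectors per track, `chs_to_lba(0, 32, 33, 255, 63)` comes out to sector 2048.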

The EDD standard has been around since the mid-90s (I believe), so in theory it’s a very rare occurrence nowadays to find a BIOS that doesn’t support reading using LBA. However, good practice is to test for its availability and fall back to using CHS if the extensions aren’t supported. So far my MBR practices half of that–the “checking” part. If the extensions aren’t present, it just bails instead of falling back to CHS. (Note: the CHS calls will have to work their way in at some point, because you can’t use LBA extensions to boot from a floppy. Yes, it’s going to support floppies.) By the way, the interrupt used to read data from a disk is INT 13h, in case you have the need to read a disk in real mode anytime soon.
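
For reference, the extended read (INT 13h with AH=42h) does not take its arguments in registers like the CHS read does. It takes a pointer in DS:SI to a 16-byte "disk address packet", and the availability check is INT 13h with AH=41h and BX=55AAh. Here is a Python sketch that packs such a packet; it only shows the byte layout, and `build_disk_address_packet` and its defaults are assumptions for this example, not something lifted from my MBR.

```python
import struct


def build_disk_address_packet(lba: int, sector_count: int = 1,
                              segment: int = 0x0000, offset: int = 0x7C00) -> bytes:
    """Pack the 16-byte disk address packet used by INT 13h, AH=42h."""
    if not 0 <= lba < (1 << 64):
        raise ValueError("LBA must fit in 64 bits")
    if not 0 < sector_count <= 0xFFFF:
        raise ValueError("sector count must fit in 16 bits")
    return struct.pack(
        "<BBHHHQ",
        0x10,          # size of this packet in bytes
        0x00,          # reserved, must be zero
        sector_count,  # number of sectors to transfer
        offset,        # destination buffer: offset part
        segment,       # destination buffer: segment part
        lba,           # absolute starting sector
    )


packet = build_disk_address_packet(2048)
```

The defaults describe the case from above: one sector, loaded to 0000:7C00.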

So, at this point my bootloader does all of this. There are still a few things I want to build in, but it’s essentially done: I can read the partition table, find the active partition, read that partition’s first sector into memory, and pass control to (what is hopefully) the partition’s volume boot record. Things that I still want to do in the MBR code:

Validate that the second stage bootloader has the correct signature (0xAA55) at the end, and bail if it doesn’t;
Take into account BIOSes that don’t support reading via LBA for fixed disks, and booting off of floppies, by falling back to CHS INT 13h calls;
Reduce the bloat (never thought I’d say that about code with a total length of 415 BYTES) by streamlining boot and error messages, etc.
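
The first item on that list is a cheap check. Here is a Python sketch of it, assuming the loaded sector is 512 bytes and the signature sits in its last two bytes; `has_boot_signature` is a made-up name.

```python
BOOT_SIGNATURE = 0xAA55  # stored on disk as the bytes 0x55, 0xAA


def has_boot_signature(sector: bytes) -> bool:
    """True if bytes 510-511 of the sector hold the 0xAA55 signature."""
    if len(sector) < 512:
        return False
    return int.from_bytes(sector[510:512], "little") == BOOT_SIGNATURE


good_sector = bytes(510) + b"\x55\xaa"
swapped_sector = bytes(510) + b"\xaa\x55"
```

The byte order is the easy part to get wrong: read as a little-endian word the value is 0xAA55, but the bytes on disk are 0x55 followed by 0xAA.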

I have a stub of a volume boot record in my tree at the moment that just prints out “Hello World!” just to make sure that I can make the jump from MBR to VBR. The VBR also contains the BIOS Parameter Block (BPB) and the “Extended BPB” data structures. These structures are supposed to “[describe] the physical layout of a data storage volume” (from the Wikipedia article for the BPB) and are mostly used for the FAT and NTFS file systems. However, that structure is also present, e.g., in OpenBSD’s biosboot.S boot record with the following comment:

/*
 * BIOS Parameter Block.  Read by many disk utilities.
 *
 * We would have liked biosboot to go from the superblock to
 * the root directory to the inode for /boot, thence to read
 * its blocks into memory.
 *
 * As code and data space is quite tight in the 512-byte
 * partition boot sector, we instead get installboot to pass
 * us some pre-processed fields.
 *
 * We would have liked to put these in the BIOS parameter block,
 * as that seems to be the right place to put them (it's really
 * the equivalent of the superblock for FAT filesystems), but
 * caution prevents us.
 *
 * For now, these fields are either directly in the code (when they
 * are used once only) or at the end of this sector.
 */

If the OpenBSD guys are hesitant to touch it, then it’s probably a good idea to leave it be. Also, since rev. 1 of this blessed operating system will probably run on some flavor of FAT (since I am led to believe that it’s probably the easiest filesystem still in active use to implement), and since FAT sort of relies on the data in that structure, it’s probably a good idea to keep it in there.
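
For a sense of what lives in that structure, here is a Python sketch that reads the basic (DOS 3.31 style) BPB fields, which start at offset 0x0B of the boot sector. The Extended BPB that follows differs between FAT variants and is left out; the `BiosParameterBlock` and `parse_bpb` names are invented for this example.

```python
import struct
from typing import NamedTuple

BPB_OFFSET = 0x0B               # right after the 3-byte jump and 8-byte OEM name
BPB_FORMAT = "<HBHBHHBHHHII"    # 25 bytes, little-endian, no padding


class BiosParameterBlock(NamedTuple):
    bytes_per_sector: int
    sectors_per_cluster: int
    reserved_sectors: int
    fat_count: int
    root_entries: int
    total_sectors_16: int
    media_descriptor: int
    sectors_per_fat: int
    sectors_per_track: int
    head_count: int
    hidden_sectors: int
    total_sectors_32: int


def parse_bpb(boot_sector: bytes) -> BiosParameterBlock:
    """Read the basic BPB fields out of a volume boot record."""
    if len(boot_sector) < 512:
        raise ValueError("a boot sector is 512 bytes")
    return BiosParameterBlock(*struct.unpack_from(BPB_FORMAT, boot_sector, BPB_OFFSET))


# A made-up boot sector, just to exercise the parser.
sample = bytearray(512)
struct.pack_into(BPB_FORMAT, sample, BPB_OFFSET,
                 512, 4, 1, 2, 512, 0, 0xF8, 200, 63, 255, 2048, 204800)
bpb = parse_bpb(bytes(sample))
```

Fields like `reserved_sectors`, `fat_count`, and `sectors_per_fat` are what a FAT driver needs to locate the root directory, which is why the structure has to stay put.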

So that’s where I’m at now, and it pretty much takes care of everything in the “What’s Next” section of the previous post. Aside from the small list of things I want to implement and/or fix in the MBR, I’m well on my way to learning how to do fun things like write a filesystem driver in assembly in order to load a kernel file from disk. Actually, I haven’t quite figured out how I’m going to work this. I kind of like how OpenBSD (and I think MS-DOS) handle kernel bootstrapping. In OpenBSD, I think the bootloader loads a file (by getting passed the location of the file on-disk via the installboot utility) that does the heavy lifting. In MS-DOS, the bootloader loads IO.SYS (the on-disk location of which I believe is hard-coded into the boot sector code by the SYS.COM utility), which then bootstraps things. So, I’ll either have to parse a file system in assembly to find the kernel, or have the values handed to me. No wonder everyone seems to choose the latter.