INF [34]151 FAQ

Project 1 (P1)

What is a boot image?

A boot image file contains the raw (binary) machine code for the boot sector, followed by the raw machine code for whatever the boot sector will load. The BIOS reads the boot sector into memory and executes it; the boot sector then loads the rest. In other words, the file contains an "image" of the code/data that will be loaded into memory.
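To make the layout concrete, here is a minimal sketch of how such an image could be put together. The function name build_image and the choice to pad both parts with zero bytes up to a whole number of 512-byte sectors are assumptions made for illustration; the P1 tools may lay out the image differently.

```python
SECTOR_SIZE = 512  # bytes per sector


def pad_to_sector(data):
    """Pad data with zero bytes up to a whole number of sectors."""
    remainder = len(data) % SECTOR_SIZE
    if remainder:
        data += bytes(SECTOR_SIZE - remainder)
    return data


def build_image(bootsector, kernel):
    """Hypothetical helper: boot sector first, then the code it will load."""
    if len(bootsector) > SECTOR_SIZE:
        raise ValueError("boot sector code must fit in one sector")
    # The boot sector always occupies exactly the first sector of the image.
    first_sector = bootsector.ljust(SECTOR_SIZE, b"\x00")
    return first_sector + pad_to_sector(kernel)


image = build_image(b"\xeb\xfe", b"K" * 700)
print(len(image) // SECTOR_SIZE, "sectors")
```

The number of sectors after the first one is what the boot sector code would have to read from the disk.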

What is an IA32 code/data/stack segment?

This is a part of memory reserved for code, data or stack values. In order for any program to run correctly, one needs to set aside at least one segment for its code (the instructions), one segment for the data it reads/writes and one segment for stack usage (push/pop and function call/return). In real mode, these segments are limited to 64KB in size (an annoying constraint), while in protected mode these segments can have arbitrary sizes. You can read more about segments in "IA32 Software Developer's Manual, Vol. 1, Chapter 3: Basic Execution Environment". Please note that the nasm documentation refers to sections as "segments"; these have nothing to do with IA32 segments!

How much assembly should I learn?

It is recommended that you learn as much as you can, since understanding how the machine works at this level is necessary in order to understand some of the issues operating systems have to deal with. Also, assembly language is the only way to program some parts of the hardware directly (as operating systems must, from time to time). The documentation for nasm and as may be helpful. It can be read online with the GNU info system, e.g. "info as". To learn about GNU info, try "info info". There should be info documentation available for "nasm" too.

What are sectors and tracks?

Sectors, tracks, blocks and cylinders are all terms that describe the way data is physically organized on floppy disks (the same terminology applies to hard disks as well). A floppy disk consists of a magnetic disk. On each side of the disk, the data is organized into separate "rings", called tracks. Each track is further divided into sectors. Each sector holds 512 bytes of data. The floppy disk drive has two heads that are used to read/write data on each side (see the figure). The position of the heads relative to each other is fixed. Thus two vertically adjacent tracks form an imaginary cylinder. For floppy drives, blocks and sectors are synonyms.

To read data, one needs to instruct the BIOS to read a specific number of sectors from a specific head, track and sector position.
More information can be found in "The Undocumented PC".

[Figure: Floppy disk layout]
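The head, track and sector position can be computed from a linear sector number. The sketch below assumes the geometry of a standard 1.44MB floppy (80 cylinders, 2 heads, 18 sectors per track) and the BIOS convention that sector numbers start at 1 while cylinders and heads start at 0. The function name is made up for illustration.

```python
HEADS = 2                # one head per side of the disk
SECTORS_PER_TRACK = 18   # assumption: 1.44MB floppy
CYLINDERS = 80           # assumption: 1.44MB floppy


def linear_to_chs(linear_sector):
    """Map a 0-based linear sector number to (cylinder, head, sector)."""
    total = CYLINDERS * HEADS * SECTORS_PER_TRACK
    if not 0 <= linear_sector < total:
        raise ValueError("sector number outside the disk")
    # All sectors of one cylinder (both heads) come before the next cylinder.
    cylinder, rest = divmod(linear_sector, HEADS * SECTORS_PER_TRACK)
    head, sector_index = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1  # sectors are numbered from 1


# The boot sector is the very first sector on the disk.
print(linear_to_chs(0))
```

With these assumptions, the sector right after the boot sector is cylinder 0, head 0, sector 2.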

What is the BIOS?

The BIOS (Basic Input Output System) is stored in nonvolatile memory (typically flash memory on modern PCs) and holds the boot-up instruction sequence. This instruction sequence performs a number of tasks at boot-up. In addition, the BIOS on PCs can be used to configure the various features of modern motherboards. On a PC, the BIOS also provides various functions that can be called directly in real mode. These functions are invoked via the int instruction (for example, int 0x13 for disk services). Unfortunately, this mapping is a mistake.

How can I access a USB flash memory?

A USB flash memory can be accessed the same way as a floppy disk, using the BIOS read/write sectors function. Although flash memory does not have any physical sectors, cylinders, or tracks, the BIOS uses the same sector addressing scheme to access it. To obtain the number of sectors, cylinders, and tracks of the USB memory, use the BIOS read device parameters function.

How is the first 1MB of memory on a PC organized?

Some areas of the first megabyte of memory are reserved for use by hardware devices, such as the BIOS and the graphics adapter. In general, the area 0x00500-0x80000 is available. The other areas must not be written to (unless you know what you are doing, that is). Note that you can write directly to memory in the area starting at 0xa0000 in P1. This area is used by the graphics adapter (on standard PC hardware, the colour text-mode buffer within it starts at 0xb8000). If you write the right values to the right place in this area, you should be able to make characters appear on the screen (see the P1 description for a brief description). Note that all addresses given here are physical addresses!
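As an illustration of what writing to the right place means, the sketch below computes the physical address of a character cell on the screen. It assumes the standard 80x25 colour text mode, where the buffer starts at 0xb8000 and each cell takes two bytes (character code, then attribute). The P1 description is the authority on what the project actually expects; the function names here are made up.

```python
TEXT_BUFFER = 0xB8000  # assumption: standard colour text-mode buffer
COLUMNS = 80           # assumption: 80x25 text mode
ROWS = 25


def cell_address(row, column):
    """Physical address of the character cell at (row, column)."""
    if not (0 <= row < ROWS and 0 <= column < COLUMNS):
        raise ValueError("position is off screen")
    return TEXT_BUFFER + 2 * (row * COLUMNS + column)


def cell_bytes(character, attribute=0x07):
    """The two bytes stored in a cell; 0x07 is light grey on black."""
    return bytes([ord(character), attribute])


print(hex(cell_address(0, 0)), cell_bytes("A").hex())
```

In real mode, such a physical address still has to be expressed as a segment and an offset before it can be used in an instruction.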

How did you choose the stack position and size?

The stack position was chosen arbitrarily from the available memory. The size of the stack was chosen so that it would be sufficient for the code using it (no stack overflow should occur).

What is real mode?

Real mode (also known as 8086 emulation mode) is the mode an IA32 processor is in when the machine has read the boot sector into memory. Thus the boot sector code executes in real mode. This processor mode puts the CPU into an environment that resembles the old Intel 8086. The 8086 mode has some annoying limitations that one needs to know about, such as segmented addressing with fixed-size segments. All modern IA32 CPUs support a technically superior mode of operation, called protected mode. However, since the machine starts up in real mode by default, we still need to deal with real mode (also, understanding real mode makes it easier to grasp protected mode). Another big shortcoming of real mode is that it has no way of protecting one process from other processes or the kernel. This is the main reason why no modern operating system runs in real mode.

How does addressing work in real mode?

The Intel 8086 can address up to 1MB of memory (using 20 address lines). However, since all data on an 8086, including addresses, is 16 bits wide, it is clear that such an address cannot be the final 20-bit address. To work around this, Intel came up with a rather odd segmented memory model. The 16-bit address is in fact an offset relative to a particular address. Even if the Intel manuals speak of addresses and effective address calculation, all addresses are really offsets (even in protected mode!). In real mode, the addressing works as follows:

One register addresses memory in 16 byte increments (every 16 bytes of memory is called a paragraph). Another register acts as a positive offset from the start of a paragraph address. The first part is often referred to as the "segment" part of an address, while the other part is simply called offset. The way a physical address is formed in this scheme can be described by the following equation:

segment * 16 + offset = physical address

The address formed by a combination of segment and offset is called a logical address, and is often written as segment:offset. Note that since the offset is 16 bits, the largest memory area that can be covered when holding the segment part fixed is 64KB. Thus it is common to talk about the "64KB segment limit" of real mode.

The really strange thing about this kind of addressing is that many different logical addresses can specify the same physical address! Consider the examples below (all addresses in hex):

0100:0000 = 1000 ((100<<4) + 0 = 1000)
0000:1000 = 1000 ((0<<4) + 1000 = 1000)
00ff:0010 = 1000 ((ff<<4) + 10 = ff0 + 10 = 1000)
(I'm sure you can come up with more examples yourself)
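The examples above can be checked with a few lines of code. This is only a sketch of the equation segment * 16 + offset; the function names are made up, and the wrap-around of results above 1MB on a real 8086 is not modelled.

```python
def physical_address(segment, offset):
    """Form a physical address from a real-mode segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return (segment << 4) + offset  # a left shift by 4 multiplies by 16


def logical_aliases(physical):
    """List every segment:offset pair that forms the given address."""
    aliases = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            aliases.append((segment, offset))
    return aliases


# The three logical addresses from the text all name the same byte.
for segment, offset in [(0x0100, 0x0000), (0x0000, 0x1000), (0x00FF, 0x0010)]:
    print(f"{segment:04x}:{offset:04x} = {physical_address(segment, offset):x}")
```

Calling logical_aliases(0x1000) lists all the remaining examples.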

In order to address a byte (or word) in memory, the programmer will typically load a segment register with the paragraph address of a 64KB segment in which the byte resides, and then specify the offset to the byte from the start of that segment, either directly (encoding the offset as a constant in the instruction), or by using any of the other ways to specify an address (an offset, really) in the instruction. As seen in the examples above, where these segments begin is a matter of choice (paragraph addresses 100, 0 and ff all worked for the byte 4KB from the start of memory).

Note that the segment part addresses a paragraph (16 bytes), and thus needs to be multiplied by 16 (a left shift by 4), before the offset is added to it. Thus a segment register should always hold a paragraph address, never a byte address!
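Going the other way, from a physical byte address to a value suitable for a segment register, means dividing by 16 (a right shift by 4). The sketch below picks the pair with the smallest possible offset, which is just one of the many valid choices; the function names are made up for illustration.

```python
def to_segment_offset(physical):
    """Split a physical address into (paragraph address, offset 0..15)."""
    if not 0 <= physical <= 0xFFFFF:
        raise ValueError("address is outside the first megabyte")
    segment = physical >> 4   # paragraph address, goes in a segment register
    offset = physical & 0xF   # remaining bytes within that paragraph
    return segment, offset


def format_logical(segment, offset):
    """Write a logical address in the usual segment:offset notation."""
    return f"{segment:04x}:{offset:04x}"


# Byte address 0x1000 becomes paragraph address 0x100, not 0x1000.
print(format_logical(*to_segment_offset(0x1000)))
```

Loading the byte address 0x1000 itself into a segment register would instead select physical address 0x10000.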

[Figure: Addressing in real mode]

How do I jump to code in another code segment?

AT&T syntax: ljmp $SEGMENT,$OFFSET

nasm syntax: jmp WORD SEGMENT:OFFSET

Wouldn't a big kernel image overwrite the bootsector code during loading?

Yes, it would. It is an optional (but recommended) exercise to solve this problem in P1.
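Whether a particular kernel image runs into this problem can be checked with simple interval arithmetic. The sketch below assumes the usual PC convention that the BIOS places the 512-byte boot sector at physical address 0x7c00; the load addresses and sizes in the example are made up and are not the values used in P1.

```python
BOOTSECTOR_START = 0x7C00  # assumption: usual PC BIOS load address
BOOTSECTOR_SIZE = 512


def overwrites_bootsector(load_address, image_size):
    """True if loading image_size bytes at load_address hits the boot sector."""
    load_end = load_address + image_size            # first byte past the image
    boot_end = BOOTSECTOR_START + BOOTSECTOR_SIZE   # first byte past the sector
    # Two half-open ranges overlap when each starts before the other ends.
    return load_address < boot_end and BOOTSECTOR_START < load_end


# Hypothetical example: a small and a large kernel loaded at 0x1000.
print(overwrites_bootsector(0x1000, 0x1000))
print(overwrites_bootsector(0x1000, 0x8000))
```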

How do I test my implementation on hardware?

To test your implementation, you must first copy it to your USB drive.
  • To determine the device node of your USB drive, insert it in your computer, let it idle, then run dmesg. The last lines should show you the path of the device, for example /dev/sdh.
  • Use the dd program to transfer your image file to the USB drive. For example: dd if=image of=/dev/sdh.
  • You might have to flush the filesystem buffers. To do so, run sync from the command line.
  • Insert your USB drive in the lab machines at Modula (room 2443).