INF [34]151 FAQ
Project 1 (P1)
What is a boot image?
A boot image file contains the raw (binary) code for the
bootsector and the raw machine code for whatever the bootsector
loads after the BIOS has read the bootsector into memory and executed it. In other words,
it contains an "image" of code/data that will be loaded into memory.
What is an IA32 code/data/stack segment?
This is a part of memory reserved for code, data or stack values. In
order for any program to run correctly, one needs to set aside at
least one segment for its code (the instructions), one segment for the
data it reads/writes and one segment for stack usage (push/pop and
function call/return). In real mode, these segments are limited to
64KB in size (an annoying constraint), while in protected mode a
segment can be any size up to 4GB. You can read more about segments in
"IA32 Software developers manual Vol 1., Chapter 3:
Basic execution environment". Please note that the nasm documentation
refers to sections as "segments" (these have nothing to do with IA32 segments)!
How much assembly should I learn?
It is recommended that you learn as much as you can, since
understanding how the machine works at this level is necessary in
order to understand some of the issues operating systems have to deal
with. Also, assembly language is the only way to program some parts of
the hardware directly (as operating systems must, from time to time).
For this course, relevant information can be found in:
- IA32 Software developers manual Volume 1:
  - Chapter 3: Basic execution environment
  - Chapter 4: Data types, sections 4.1-4.5
  - Chapter 5: Instruction set summary, section 5.1
  - Chapter 6: Procedure calls, Interrupts and exceptions
  - Chapter 7: Programming with the General-purpose instructions
- IA32 Software developers manual Volume 2:
  - Chapter 3: Instruction set reference, section 3.2
- IA32 Software developers manual Volume 3:
  - Chapter 4: Protected-Mode memory management
  - Chapter 5: Interrupt and exception handling
  - Chapter 16: 8086 Emulation, section 16.1
- PC Assembly language (by Paul A. Carter), chapters 1-5
In addition, the documentation for nasm and the GNU assembler (as) may be helpful. This
documentation can be read online with the GNU info system, e.g. "info
as". To learn about GNU info, try "info info". There should be info
documentation available for "nasm" too.
For P1, it is a good idea to read:
- PC Assembly language (by Paul A. Carter), chapters 1-5
- IA32 Vol 1. Chapter 3: Basic execution environment
- IA32 Vol 1. Chapter 4: Data types, sections 4.1-4.5
- IA32 Vol 1. Chapter 5: Instruction set summary, section 5.1
- IA32 Vol 3. Chapter 16: 8086 Emulation, section 16.1
What are sectors and tracks?
Sectors, tracks, blocks and cylinders are all terms that describe the
way data is physically organized on floppy disks (the same
terminology applies to hard disks as well). A floppy disk consists of
a magnetic disk. On each side of the disk, the data is organized into
separate "rings", called tracks. Each track is further divided into
sectors. Each sector holds 512 bytes of data. The floppy disk
drive has two heads, one to read/write the data on each side (see
the figure). The position of the heads relative to each other is
fixed. Thus the two tracks directly above each other (one on each side) form an imaginary cylinder.
For floppy drives, blocks and sectors are synonyms.
To read data, one needs to instruct the BIOS to read a specific number
of sectors from a specific head, track and sector position.
More information can be found in "The Undocumented PC".
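It is often easier to think of the disk as a flat array of sectors numbered from 0, while the BIOS wants a head, track (cylinder) and sector, with sectors counted from 1. Below is a minimal C sketch of the usual conversion, assuming the geometry of a 1.44MB floppy (2 heads, 18 sectors per track) and the conventional ordering (all sectors of a track, then the same track on the other head, then the next cylinder). The names `struct chs` and `lba_to_chs` are hypothetical and not part of any course code.

```c
#include <stdint.h>

/* Assumed geometry of a standard 1.44MB floppy. Other devices differ,
   so query the BIOS for the real values. */
#define HEADS             2u
#define SECTORS_PER_TRACK 18u

struct chs {
    uint16_t cylinder; /* track number on each side, counted from 0 */
    uint8_t  head;     /* side of the disk, counted from 0 */
    uint8_t  sector;   /* sector within the track, counted from 1 */
};

/* Convert a linear sector number (0 = the bootsector) into the
   cylinder/head/sector triple used by the BIOS read function. */
struct chs lba_to_chs(uint32_t lba)
{
    struct chs r;
    r.cylinder = (uint16_t)(lba / (HEADS * SECTORS_PER_TRACK));
    r.head     = (uint8_t)((lba / SECTORS_PER_TRACK) % HEADS);
    r.sector   = (uint8_t)(lba % SECTORS_PER_TRACK + 1u);
    return r;
}
```

With this geometry, sector 18 is the first sector on the second head of cylinder 0, and the last sector of the disk (2879) maps to cylinder 79, head 1, sector 18.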
What is the BIOS?
The BIOS (Basic Input Output System) is nonvolatile memory (typically
flash memory on modern PCs) that holds the boot-up instruction
sequence. This instruction sequence performs (among other things) the
following tasks:
- Initialize IO ports
- Set up the interrupt table
- Load boot sector from disk storage (Harddisk, CD-ROM, floppy, etc.)
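In addition, the BIOS on PCs can be used to configure the various
features of modern motherboards. On a PC, the BIOS also provides various
functions that can be called directly in real mode. These functions
are invoked with the int instruction (for example, int 0x13 provides
the disk services), so each group of functions is mapped to an
interrupt vector. Unfortunately, this mapping was a mistake: the BIOS
uses interrupt vectors that Intel had reserved for processor exceptions.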
How can I access a USB flash memory?
A USB flash memory can be accessed the same way as a floppy disk, using
the BIOS read/write sectors function. Although the flash memory has no
physical sectors, cylinders or tracks, the BIOS uses the same sector addressing
scheme to access it. To obtain the number of sectors per track,
cylinders and heads of the USB memory, use the BIOS read device parameters
function (int 0x13 with AH=0x08).
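The read device parameters function (int 0x13, AH=0x08) returns the geometry packed into registers: CH holds the low 8 bits of the highest cylinder number, CL holds the highest sector number in bits 0-5 and the top 2 bits of the highest cylinder number in bits 6-7, and DH holds the highest head number. The call itself has to be made from real-mode assembly; the C sketch below only shows how the returned register values could be decoded. The names `struct geometry`, `decode_drive_params` and `total_sectors` are hypothetical.

```c
#include <stdint.h>

struct geometry {
    uint16_t cylinders;
    uint16_t heads;
    uint8_t  sectors_per_track;
};

/* Decode the registers returned by int 0x13, AH=0x08.
   The BIOS reports the highest cylinder and head numbers (counted
   from 0) and the highest sector number (counted from 1). */
struct geometry decode_drive_params(uint8_t ch, uint8_t cl, uint8_t dh)
{
    struct geometry g;
    /* bits 6-7 of CL are bits 8-9 of the highest cylinder number */
    uint16_t max_cylinder = (uint16_t)(((cl & 0xC0u) << 2) | ch);

    g.cylinders         = (uint16_t)(max_cylinder + 1u);
    g.heads             = (uint16_t)(dh + 1u);
    g.sectors_per_track = (uint8_t)(cl & 0x3Fu);
    return g;
}

/* Total number of addressable sectors for a decoded geometry. */
uint32_t total_sectors(struct geometry g)
{
    return (uint32_t)g.cylinders * g.heads * g.sectors_per_track;
}
```

For a 1.44MB floppy the BIOS would report CH=0x4F, CL=0x12 and DH=0x01, which decodes to 80 cylinders, 2 heads and 18 sectors per track (2880 sectors in total).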
How is the first 1MB of memory on a PC organized?
Some areas of the first megabyte of memory are reserved for use by
hardware devices, such as the BIOS and the graphics adapter. In
general, the area 0x00500-0x80000 is available. The other areas must
not be written to (unless you know what you are doing, that is). Note
that in P1 you can write directly to the video memory area starting at
0xa0000, which is used by the graphics adapter (in color text mode, the
character buffer starts at 0xb8000). If you write the right values to
this area, characters appear on the screen (see the P1 description for
a brief explanation). Note that all addresses given here are physical addresses!
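To make the screen layout concrete, here is a small C sketch that computes where a character for a given screen position goes. It assumes 80x25 color text mode with the character buffer at physical address 0xb8000 and two bytes per cell (the character code followed by its color attribute); these values are assumptions about the standard VGA text mode, not taken from the P1 description. The function name `text_cell_address` is hypothetical.

```c
#include <stdint.h>

/* Assumptions: 80x25 color text mode, character buffer at physical
   address 0xb8000, two bytes per screen cell. */
#define TEXT_BUFFER 0xb8000u
#define COLUMNS     80u

/* Physical address of the character byte for screen cell (row, col),
   both counted from 0. The attribute (color) byte is at the next
   address. */
uint32_t text_cell_address(uint32_t row, uint32_t col)
{
    return TEXT_BUFFER + (row * COLUMNS + col) * 2u;
}
```

The bottom-right cell (row 24, column 79) is at 0xb8f9e. In real mode, the same cell can be reached with segment 0xb800 and offset (row * 80 + col) * 2.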
How did you choose the stack position and size?
The stack position was chosen arbitrarily from the available
memory. The size of this stack was chosen so that it would be sufficient
for the code using it (no stack overflow should occur).
What is real mode?
Real mode (also known as 8086 emulation mode) is the mode an IA32
processor is in when the machine has read the bootsector into
memory. Thus the bootsector code executes in real mode. This processor
mode puts the CPU into an environment that resembles the old Intel
8086. The 8086 mode has some annoying limitations that one needs to
know about, such as segmented addressing with fixed-size
segments. All modern IA32 CPUs support a technically superior mode of
operation, called protected mode. However, since the machine
starts up in real mode by default, we still need to deal with real
mode (also, understanding real mode makes it easier to grasp protected
mode). Another big shortcoming of real mode is that it has no way of
protecting one process from other processes or the kernel. This is the
main reason why no modern operating system runs in real mode.
How does addressing work in real mode?
The Intel 8086 can address up to 1MB of memory (using 20 address
lines). However, since all registers on the 8086, including those used
for addresses, are 16 bits wide, a 16-bit address clearly cannot
be the final 20-bit address. To work around this, Intel came up with a
rather odd segmented memory model. The 16-bit address is in fact
an offset relative to a particular address. Even if the Intel
manuals speak of addresses and effective address calculation, all
addresses are really offsets (even in protected mode!). In real
mode, the addressing works as follows:
One register (a segment register) selects memory in 16-byte increments
(each aligned block of 16 bytes is called a paragraph). Another register,
or a constant, supplies an unsigned offset from the start of that paragraph. The first part is often
referred to as the "segment" part of an address, while the other part
is simply called offset. The way a physical address is formed in this
scheme can be described by the following equation:
segment * 16 + offset = physical address
The address formed by a combination of segment and offset is called a
logical address, and is often written as segment:offset. Note
that since the offset is 16 bits, the largest memory area that can be
covered when holding the segment part fixed is 64KB. Thus it is common
to talk about the "64KB segment limit" of real mode.
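The address equation above translates directly into code. This C sketch (the name `phys_addr` is hypothetical) computes segment * 16 + offset; shifting left by 4 is the same as multiplying by 16. Note that the largest result, 0xffff:0xffff, lies slightly above 1MB; on an original 8086, with only 20 address lines, such an address wraps around to the bottom of memory, and the sketch does not model that.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
   The result is widened to 32 bits because it can exceed 16 bits
   (and even 20 bits for segments near 0xffff). */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Holding the segment fixed, the offset can only move the address across 0x0000-0xffff, so phys_addr(s, 0xffff) - phys_addr(s, 0) is always 0xffff: this is the 64KB segment limit.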
The really strange thing about this kind of addressing is that many
logical addresses can all specify the same physical address!
Consider the examples below (all addresses in hex):
0100:0000 = 1000 ((100<<4) + 0 = 1000)
0000:1000 = 1000 ((0<<4) + 1000 = 1000)
00ff:0010 = 1000 ((ff<<4) + 10 = ff0 + 10 = 1000)
(I'm sure you can come up with more examples yourself)
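One way to come up with more examples is to let the computer enumerate them. The C sketch below (the names `struct logical` and `logical_aliases` are hypothetical) tries every segment value and keeps those whose 64KB window contains the given physical address. It returns the total number of aliases and stores up to `max` of them in `out`.

```c
#include <stddef.h>
#include <stdint.h>

struct logical {
    uint16_t segment;
    uint16_t offset;
};

/* Find every segment:offset pair that maps to the physical address
   `phys`. Stores at most `max` pairs in out[] and returns the total
   count (which may be larger than max). */
size_t logical_aliases(uint32_t phys, struct logical *out, size_t max)
{
    size_t n = 0;

    for (uint32_t seg = 0; seg <= 0xffffu; seg++) {
        uint32_t base = seg << 4;      /* start of this segment */
        if (base > phys)
            break;                     /* this and all later segments start too high */
        if (phys - base > 0xffffu)
            continue;                  /* target lies beyond this segment's 64KB */
        if (n < max) {
            out[n].segment = (uint16_t)seg;
            out[n].offset  = (uint16_t)(phys - base);
        }
        n++;
    }
    return n;
}
```

For the physical address 0x1000 used in the examples, this finds 257 pairs, from 0000:1000 up to 0100:0000. Any physical address from 0xfff0 up to 1MB has 4096 aliases.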
In order to address a byte (or word) in memory, the programmer will
typically load a segment register with the paragraph address of a 64KB segment
in which the byte resides, and then specify the offset to the byte
from that segment, either directly (encoding the offset as a constant
in the instruction), or by using any of the other ways to specify an
address (an offset, really) in the instruction. As seen in the
examples above, where these segments begin is a matter of choice
(paragraph addresses 100, 0 and ff all worked for the byte 4KB from
the start of memory).
Note that the segment part addresses a paragraph (16 bytes),
and thus needs to be multiplied by 16 (a left shift by 4),
before the offset is added to it. Thus a segment register
should always hold a paragraph address, never a byte address!
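A simple way to turn a byte address into a valid segment:offset pair is to use the byte address divided by 16 as the segment and the remainder as the offset (a so-called normalized address, with an offset between 0 and 15). This is only one of many valid choices, as the examples above show. A minimal C sketch; the names `struct logical` and `to_normalized` are hypothetical.

```c
#include <stdint.h>

struct logical {
    uint16_t segment;
    uint16_t offset;
};

/* Split a physical byte address (below 1MB) into a segment holding
   a paragraph number and an offset of 0..15 within that paragraph. */
struct logical to_normalized(uint32_t phys)
{
    struct logical r;
    r.segment = (uint16_t)(phys >> 4);    /* paragraph number, not the byte address */
    r.offset  = (uint16_t)(phys & 0xfu);  /* remaining bytes within the paragraph */
    return r;
}
```

For example, the byte address 0x7c00 becomes 07c0:0000. Loading 0x7c00 itself into a segment register would instead refer to physical address 0x7c000, a common mistake.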
How do I jump to code in another code segment?
Use a far jump, which loads both CS and IP:
AT&T syntax: ljmp $SEGMENT,$OFFSET
nasm syntax: jmp WORD SEGMENT:OFFSET
Wouldn't a big kernel image overwrite the bootsector code during loading?
Yes, it would. It is an optional (but recommended) exercise to solve
this problem in P1.
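Whether the problem occurs can be checked with a little arithmetic. The BIOS loads the bootsector at physical address 0x7c00, so it occupies 0x7c00-0x7dff while it runs. The C sketch below tests whether a kernel of a given number of 512-byte sectors, loaded at a given address, overlaps that range. The load address used in the example (0x1000) and the function name `kernel_overwrites_bootsector` are hypothetical; use whatever your own bootsector does. The sketch only detects the problem, it does not solve the exercise.

```c
#include <stdbool.h>
#include <stdint.h>

/* The BIOS loads the bootsector at physical address 0x7c00. */
#define BOOTSECTOR_START 0x7c00u
#define BOOTSECTOR_END   0x7e00u   /* one past the last byte */
#define SECTOR_SIZE      512u

/* True if `sectors` sectors loaded at physical address `load_addr`
   would overwrite the bootsector that is currently running. */
bool kernel_overwrites_bootsector(uint32_t load_addr, uint32_t sectors)
{
    uint32_t end = load_addr + sectors * SECTOR_SIZE;  /* one past the last byte */
    return load_addr < BOOTSECTOR_END && end > BOOTSECTOR_START;
}
```

With a load address of 0x1000, a kernel of up to 54 sectors ends exactly at 0x7c00 and is safe, while 55 sectors would overwrite the bootsector.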
How do I test my implementation on hardware?
To test your implementation, you must first copy it to your USB drive.
To determine the device node of your USB pen, insert it into your computer, let it idle, then run dmesg. The last lines should show the name of the device, for example /dev/sdh.
Use the dd program to transfer your image file to the USB drive, for example: dd if=image of=/dev/sdh. Double-check the device name first, since dd overwrites the target device without asking.
You might have to flush the filesystem buffers; run sync from the command line to do so.
Then insert your USB drive into one of the lab machines at Modula (room 2443).