INF [34]151 FAQ

Project 2 (P2)

Why don't the standard C library functions work?

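C programs (as defined by ISO) can be written to run in two different environments: a hosted environment, where the full standard library is available, and a freestanding environment, where only a handful of headers is guaranteed (<float.h>, <limits.h>, <stdarg.h> and <stddef.h> in C89; C99 adds <iso646.h>, <stdbool.h> and <stdint.h>). While this distinction may appear strange at first, it is reasonable. Since most of the functions in the C library (e.g. fopen and printf) rely on abstractions provided by the operating system, they cannot work in a freestanding environment. Since we are writing an OS, we must write C for a freestanding environment. Of course, we are free to write our own (perhaps simplified) version of printf if we wish.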

The list of header files above deserves some explanation. There are two standards that you should know about: the old one (C89) and the new one (C99). C89 is supported by most modern compilers (including GCC), whereas C99 is not yet supported by many compilers. Support for C99 is also incomplete in current versions of GCC (3.3 at the time of writing), so it is safest to stick to C89. This is generally not a problem, since C99 for the most part only adds new features to C89. For the full story, see the Standards node in the GCC info documentation.
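As an illustration of writing our own simplified printf, here is a minimal sketch that relies only on <stdarg.h> and <stddef.h>, both of which are usable in a freestanding environment. All names (putc_fn, buf_sink, buf_put, mini_vprintf, mini_printf) are hypothetical and not part of the project code. Output goes through a callback, so a kernel could supply its own routine that writes to the screen; the buffer sink here exists only to make the example easy to try.

```c
#include <stdarg.h>  /* one of the headers usable without an OS */
#include <stddef.h>

/* Hypothetical output hook: a kernel would write to the screen here. */
typedef void (*putc_fn)(char c, void *ctx);

/* Sink used for illustration: collects characters in a fixed buffer. */
struct buf_sink {
    char data[64];
    size_t len;
};

void buf_put(char c, void *ctx)
{
    struct buf_sink *s = ctx;
    if (s->len + 1 < sizeof s->data) {
        s->data[s->len++] = c;
        s->data[s->len] = '\0';   /* keep the buffer a valid C string */
    }
}

/* Print v in the given base (2..16), most significant digit first. */
static void put_unsigned(putc_fn put, void *ctx, unsigned long v, unsigned base)
{
    char digits[sizeof(unsigned long) * 8];
    size_t n = 0;

    do {
        digits[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0)
        put(digits[--n], ctx);
}

/* Supports only %d, %x, %s, %c and %%; no widths or flags. */
void mini_vprintf(putc_fn put, void *ctx, const char *fmt, va_list ap)
{
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put(*fmt, ctx);
            continue;
        }
        fmt++;
        switch (*fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            unsigned long u = (unsigned long)v;
            if (v < 0) {
                put('-', ctx);
                u = 0UL - u;      /* magnitude without signed overflow */
            }
            put_unsigned(put, ctx, u, 10);
            break;
        }
        case 'x':
            put_unsigned(put, ctx, va_arg(ap, unsigned int), 16);
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put(*s++, ctx);
            break;
        }
        case 'c':
            put((char)va_arg(ap, int), ctx);
            break;
        case '%':
            put('%', ctx);
            break;
        case '\0':
            return;               /* format string ended right after '%' */
        default:                  /* unknown conversion: echo it unchanged */
            put('%', ctx);
            put(*fmt, ctx);
            break;
        }
    }
}

void mini_printf(putc_fn put, void *ctx, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    mini_vprintf(put, ctx, fmt, ap);
    va_end(ap);
}
```

A call such as mini_printf(buf_put, &sink, "pid=%d", 7) would leave the formatted text in sink.data. Passing the output routine as a parameter keeps the formatter independent of how characters actually reach the display.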

How do I use pointers in C?

http://pw1.netcom.com/~tjensen/ptr/pointers.htm

Should/can I use extended assembly (also known as "inline assembly")?

In general, you may use any mechanism that you find useful to achieve what you want. However, extended assembly is an extension to the C programming language, which means that compilers that offer this (common) extension can (and will) do things their own way. Different compilers may require different syntax and may emit different instruction sequences. Also, the way these assembly instructions interact with their surrounding C code is non-trivial. You should definitely read the compiler's documentation before using it. For GCC, the following tips may be helpful:

A much easier way to use assembly in your kernel is to write the assembly code as a function in its own file (*.s for as and *.asm for nasm), and then call this function from C. This means that you must follow the C calling conventions.

Why don't we use interrupts when making a system call?

Because we haven't implemented hardware-enforced separation between user and kernel mode in P2. This will be done in later projects.

How does one guarantee atomicity in a lock implementation?

One needs hardware support. See the "IA32 Software Developer's Manual", Vol. 3, Chapter 7.
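To make the role of hardware support concrete, here is a minimal sketch of a spinlock built on an atomic test-and-set. It assumes a compiler that provides the C11 header <stdatomic.h>, which is newer than the C89 baseline recommended elsewhere in this FAQ; the names spinlock_t, spin_try_acquire, spin_acquire and spin_release are hypothetical. The compiler maps the test-and-set onto an atomic instruction of the target processor, so the read and the write of the flag cannot be separated by another processor or an interrupt.

```c
#include <stdatomic.h>

/* A lock is a single flag: set means "held", clear means "free". */
typedef struct {
    atomic_flag held;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if we got the lock, 0 if someone else holds it. */
int spin_try_acquire(spinlock_t *l)
{
    /* test_and_set returns the previous value and sets the flag atomically */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is ours. */
void spin_acquire(spinlock_t *l)
{
    while (!spin_try_acquire(l))
        ;  /* spin */
}

void spin_release(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Without the atomic operation, two threads could both read the flag as clear before either of them sets it, and both would believe they hold the lock.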

How does addressing work in protected mode?

Please note that the following explanation is a gross simplification and omits a lot of details. For the full story, consult the Intel manuals. The most important changes with regard to addressing from real mode to protected mode are:
  1. The segments can start anywhere within the 32-bit address space
  2. The segments can have any size within this 32-bit address space
  3. Offsets can be 32 bits

In order to describe such segments, special data structures called segment descriptors are needed. These describe (among other things) the start address and the size of each segment. The segment descriptors are collected in a table called a "segment descriptor table". You can have one table per process (a "local" descriptor table), and one that is common to all processes (a "global" descriptor table). In this project, we will only make use of one (global) descriptor table.

The role of the segment registers has changed in protected mode. Instead of holding paragraph addresses that identify the start of a segment, they now hold (among other things) an index into a segment descriptor table.

In order to use a segment, one needs to:

  1. Initialise an entry in a segment descriptor table (i.e. describe the segment)
  2. Load a segment register with the index of that entry

Although segmentation (the translation from logical to physical addresses) cannot be turned off for the IA32 processors, one can make the segment part irrelevant in practice. This is done by setting up one large segment. This segment starts at address 0 and is 4GB in size. Then one simply makes all the segment registers "index" this one segment. The result is something that for all practical purposes is a linear address space. This means that all instructions can use a 32-bit offset as if this was a "linear address". Many operating systems for the IA32 do things this way, and this is what the code and data in "bootblock.s" sets up.

So the bottom line is that from now on we can regard the address space as one linear sequence of bytes that is 4GB in size (32 bits).
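The two steps above (describe the segment, then load a register with its index) can be sketched in C. The functions make_descriptor and make_selector are hypothetical helpers, and <stdint.h> is assumed to be available (it is a C99 header). The bit layout follows the 8-byte segment descriptor format in the Intel manuals; the access and flag values used in the usage note are the common choices for a flat ring-0 segment and are not taken from "bootblock.s".

```c
#include <stdint.h>

/*
 * Pack a segment description into the 8-byte descriptor format.
 * limit is 20 bits wide; flags holds the upper 4 bits of byte 6
 * (granularity, default operand size, ...).
 */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFFu);             /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;      /* base bits 0..23   */
    d |= (uint64_t)access << 40;                  /* type, DPL, present */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;  /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xFu) << 52;          /* G, D/B, ...       */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;  /* base bits 24..31  */
    return d;
}

/*
 * Build the value loaded into a segment register: table index in the
 * upper 13 bits, then one bit selecting the local table, then the
 * requested privilege level in the lowest 2 bits.
 */
uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}
```

For example, make_descriptor(0, 0xFFFFF, 0x9A, 0xC) describes a code segment that starts at address 0 and, because the granularity flag scales the limit by 4KB, covers the full 4GB. Note that the selector is not the bare index: entry 1 of the global table is selected by the value 0x08.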

How does one access C variables and functions from assembler?

This is really quite simple. A global variable is simply accessed as any other assembly language symbol (label). If a function is to be called, one needs to follow the C calling conventions (described in "PC assembly language", chapter 4). So arguments to the function are pushed on the stack, some register values are saved (if needed) and then the function is called. The argument to the call instruction is simply the name of the function as it appears in C. For the GNU assembler (as), this is all that is needed (as it will take all undefined symbols as externally defined). For the Netwide assembler (nasm), one also needs to say explicitly that the symbols are external with the "extern" directive.
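As a small sketch of the C side of such an interaction, the file below defines one global variable and one function with external linkage. The names tick_count and add_ticks are hypothetical. The comment shows roughly what a caller written for the GNU assembler could look like, assuming the usual 32-bit C calling convention and an object format (such as ELF) that does not prefix C symbols with an underscore.

```c
/* Global variable: visible to assembly code as the symbol tick_count. */
int tick_count = 0;

/* External linkage (no "static"), so the linker can resolve calls to it. */
int add_ticks(int a, int b)
{
    tick_count += a + b;
    return tick_count;
}

/*
 * Possible caller in GNU as syntax (illustration only, not assembled here):
 *
 *     pushl $2              # arguments are pushed right to left
 *     pushl $1
 *     call  add_ticks       # return value arrives in %eax
 *     addl  $8, %esp        # the caller removes the arguments again
 *     movl  tick_count, %ebx
 *
 * With nasm, "extern add_ticks" and "extern tick_count" would be needed first.
 */
```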

Accessing members of a C "struct" is slightly more complicated, and is discussed in a separate question in this FAQ.

What should I read for P2?

How does one access the members of a struct in assembly?

In general, there is no problem accessing most of the things that C creates from assembly. Functions and global variables are exported (unless they are declared to have internal linkage) and the linker will resolve the use of symbols for us (in the NASM assembler, one would have to say "extern symbol" to use such a symbol; the GNU assembler assumes that all undefined symbols are external so no directive is needed). But struct members are another story. The C compiler will implement access to the members as a constant offset in the code. For example, given the declaration:
struct example {
     int member_a;
     int member_b;
     int member_c;
     long member_d;
};
The following C code fragment:
     struct example ex; 

     ...

     ex.member_a = 3;
     ex.member_c = 4;
would be translated into something along the lines of (assuming that the start address of the struct is in eax):
     movl $3,(%eax)
     movl $4,8(%eax)

This means that "member_a" has offset 0 and "member_c" has offset 8. You may now think that you can compute an offset by knowing the sizes of "int", "long", etc. and simply adding up the sizes of the members that precede the one you want to access (here int evidently has size 4). This might work, but it might not. The reason is that the C compiler may pad your struct as it sees fit in order to align certain members (try inserting a member of type "char" and observe the changes to the offsets). Standard C (both C99 and C89) only guarantees that the first member has the same address as the struct itself (offset 0), and that the members are laid out in the order in which they are declared. It also allows the compiler to insert padding between members and at the end of the struct (this is why one should always use sizeof(struct X) to find the size of a struct with tag "X").
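The experiment suggested above (inserting a "char" member) can be sketched as follows. The struct padded_example and the three helper functions are hypothetical names used only for this illustration. The code compares the offset one would guess by adding sizes with the offset the compiler actually uses; the concrete numbers depend on the compiler and target, so none are assumed here.

```c
#include <stddef.h>

/* The example struct with a char member inserted at the front. */
struct padded_example {
    char tag;
    int member_a;
    int member_b;
    long member_d;
};

/* Offset guessed by adding the sizes of the preceding members. */
size_t guessed_offset_of_member_a(void)
{
    return sizeof(char);
}

/* Offset the compiler really uses for member_a. */
size_t actual_offset_of_member_a(void)
{
    return offsetof(struct padded_example, member_a);
}

/* Total number of padding bytes the compiler added to the struct. */
size_t padding_bytes(void)
{
    return sizeof(struct padded_example)
           - (sizeof(char) + 2 * sizeof(int) + sizeof(long));
}
```

On typical 32-bit x86 compilers "int" must be 4-byte aligned, so the guessed offset would be 1 while the actual offset would be 4; other targets may give different results, which is exactly why the guess cannot be trusted.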

Since the compiler may pad the struct, hard-coding constants in the assembly code is a bad solution: it relies on guessing the layout (as already mentioned), and any change to the struct requires manually recomputing and updating the constants in the assembly code (something that is easy to forget or to get wrong).

The best solution (I know of) is to ask the C compiler to reveal the constants in a way that:

  1. Is as automatic as possible
  2. Works with any (standard C) compiler
The solution involves using the C macro offsetof (found in the <stddef.h> header file) with the struct members we are interested in. This will reveal their offsets in a way that works with any compiler (since it is standard C).

We now have a way of getting the offset of any struct member we want to access, like this:

 offsetof(struct example, member_c) 
All we now need to do is to make a small C program that:
  1. Includes the definitions of the structs we need to access from assembly
  2. Outputs the constants in a way that is convenient for assembly language programs.

So, if the header file that defined "struct example" is called "example.h" we could do something like this:

#include <stddef.h>
#include <stdio.h>
#include "example.h"
And then do something like this:
     printf("#define EXAMPLE_MEMBER_C %ld\n",
            (long) offsetof(struct example, member_c));
     printf("#define EXAMPLE_MEMBER_A %ld\n",
            (long) offsetof(struct example, member_a));
     ...
If we now redirect this output into a header file, e.g. "example_defs.h", and (ab)use the C preprocessor to preprocess assembly code, we could write the following in assembly:
#include "example_defs.h"
and then use the constants like this in our assembly code:
     movl $3,EXAMPLE_MEMBER_A(%eax)
     movl $4,EXAMPLE_MEMBER_C(%eax)

The only thing left to take care of is to make sure that the header file "example_defs.h" is remade whenever the structs we access change, and that the assembly code is reassembled whenever this file is (re)created. This is a task that GNU "make" can help us with. We can ask GNU "make" to do this by telling it:

Now we should have no trouble accessing struct members from assembly. The only two things we need to do to access a struct member are to make sure that there is a print statement corresponding to the member we want to get at in "asmdefs.c", and to include the correct header file in our assembly language code.
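Putting the pieces together, a generator file such as "asmdefs.c" could look roughly like the sketch below. The helper names format_define and emit_example_defs are hypothetical, the struct is repeated inline only to keep the sketch self-contained (the real file would include "example.h" instead), and snprintf assumes a C99 library on the machine where the generator is built and run. The sketch also assumes the generator is compiled with the same compiler and options as the kernel, since otherwise the offsets might differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for: #include "example.h" */
struct example {
    int member_a;
    int member_b;
    int member_c;
    long member_d;
};

/* Write one "#define NAME offset" line into buf; returns snprintf's result. */
int format_define(char *buf, size_t size, const char *name, size_t offset)
{
    /* offsetof yields a size_t, so convert explicitly to match %lu */
    return snprintf(buf, size, "#define %s %lu\n", name, (unsigned long)offset);
}

/* Emit one line for every struct member the assembly code needs. */
void emit_example_defs(FILE *out)
{
    char line[128];

    format_define(line, sizeof line, "EXAMPLE_MEMBER_A",
                  offsetof(struct example, member_a));
    fputs(line, out);
    format_define(line, sizeof line, "EXAMPLE_MEMBER_C",
                  offsetof(struct example, member_c));
    fputs(line, out);
}
```

A main function that simply calls emit_example_defs(stdout) would turn this into the small program described above, and its output could then be redirected into "example_defs.h".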