NASM - The Netwide Assembler

Chapter 9: Writing 32-bit Code (Unix, Win32, DJGPP)

This chapter attempts to cover some of the common issues involved when writing 32-bit code, to run under Win32 or Unix, or to be linked with C code generated by a Unix-style C compiler such as DJGPP. It covers how to write assembly code to interface with 32-bit C routines, and how to write position-independent code for shared libraries.

Almost all 32-bit code, and in particular all code running under Win32, DJGPP or any of the PC Unix variants, runs in flat memory model. This means that the segment registers and paging have already been set up to give you the same 32-bit 4Gb address space no matter what segment you work relative to, and that you should ignore all segment registers completely. When writing flat-model application code, you never need to use a segment override or modify any segment register, and the code-section addresses you pass to CALL and JMP live in the same address space as the data-section addresses you access your variables by and the stack-section addresses you access local variables and procedure parameters by. Every address is 32 bits long and contains only an offset part.

9.1 Interfacing to 32-bit C Programs

A lot of the discussion in section 8.4, about interfacing to 16-bit C programs, still applies when working in 32 bits. The absence of memory models or segmentation worries simplifies things a lot.

9.1.1 External Symbol Names

Most 32-bit C compilers share the convention used by 16-bit compilers, that the names of all global symbols (functions or data) they define are formed by prefixing an underscore to the name as it appears in the C program. However, not all of them do: the ELF specification states that C symbols do not have a leading underscore on their assembly-language names.

The older Linux a.out C compiler, all Win32 compilers, DJGPP, and NetBSD and FreeBSD, all use the leading underscore; for these compilers, the macros cextern and cglobal, as given in section 8.4.1, will still work. For ELF, though, the leading underscore should not be used.

9.1.2 Function Definitions and Function Calls

The C calling convention in 32-bit programs is as follows. In the following description, the words caller and callee are used to denote the function doing the calling and the function which gets called.

The caller pushes the function's parameters on the stack, one after another, in reverse order (right to left, so that the first argument specified to the function is pushed last).
The caller then executes a near CALL instruction to pass control to the callee.
The callee receives control, and typically (although this is not actually necessary, in functions which do not need to access their parameters) starts by saving the value of ESP in EBP so as to be able to use EBP as a base pointer to find its parameters on the stack. However, the caller was probably doing this too, so part of the calling convention states that EBP must be preserved by any C function. Hence the callee, if it is going to set up EBP as a frame pointer, must push the previous value first.
The callee may then access its parameters relative to EBP. The doubleword at [EBP] holds the previous value of EBP as it was pushed; the next doubleword, at [EBP+4], holds the return address, pushed implicitly by CALL. The parameters start after that, at [EBP+8]. The leftmost parameter of the function, since it was pushed last, is accessible at this offset from EBP; the others follow, at successively greater offsets. Thus, in a function such as printf which takes a variable number of parameters, the pushing of the parameters in reverse order means that the function knows where to find its first parameter, which tells it the number and type of the remaining ones.
The callee may also wish to decrease ESP further, so as to allocate space on the stack for local variables, which will then be accessible at negative offsets from EBP.
The callee, if it wishes to return a value to the caller, should leave the value in AL, AX or EAX depending on the size of the value. Floating-point results are typically returned in ST0.
Once the callee has finished processing, it restores ESP from EBP if it had allocated local stack space, then pops the previous value of EBP, and returns via RET (equivalently, RETN).
When the caller regains control from the callee, the function parameters are still on the stack, so it typically adds an immediate constant to ESP to remove them (instead of executing a number of slow POP instructions). Thus, if a function is accidentally called with the wrong number of parameters due to a prototype mismatch, the stack will still be returned to a sensible state since the caller, which knows how many parameters it pushed, does the removing.

There is an alternative calling convention used by Win32 programs for Windows API calls, and also for functions called by the Windows API such as window procedures: they follow what Microsoft calls the __stdcall convention. This is slightly closer to the Pascal convention, in that the callee clears the stack by passing a parameter to the RET instruction. However, the parameters are still pushed in right-to-left order.

Thus, you would define a function in C style in the following way:

global  _myfunc 

_myfunc: 
        push    ebp 
        mov     ebp,esp 
        sub     esp,0x40        ; 64 bytes of local stack space 
        mov     ebx,[ebp+8]     ; first parameter to function 

        ; some more code 

        leave                   ; mov esp,ebp / pop ebp 
        ret

At the other end of the process, to call a C function from your assembly code, you would do something like this:

extern  _printf 

        ; and then, further down... 

        push    dword [myint]   ; one of my integer variables 
        push    dword mystring  ; pointer into my data segment 
        call    _printf 
        add     esp,byte 8      ; `byte' saves space 

        ; then those data items... 

segment _DATA 

myint       dd   1234 
mystring    db   'This number -> %d <- should be 1234',10,0

This piece of code is the assembly equivalent of the C code

    int myint = 1234; 
    printf("This number -> %d <- should be 1234\n", myint);

9.1.3 Accessing Data Items

To get at the contents of C variables, or to declare variables which C can access, you need only declare the names as GLOBAL or EXTERN. (Again, the names require leading underscores, as stated in section 9.1.1.) Thus, a C variable declared as int i can be accessed from assembler as

          extern _i 
          mov eax,[_i]

And to declare your own integer variable which C programs can access as extern int j, you do this (making sure you are assembling in the _DATA segment, if necessary):

          global _j 
_j        dd 0

To access a C array, you need to know the size of the components of the array. For example, int variables are four bytes long, so if a C program declares an array as int a[10], you can access a[3] by coding mov ax,[_a+12]. (The byte offset 12 is obtained by multiplying the desired array index, 3, by the size of the array element, 4.) The sizes of the C base types in 32-bit compilers are: 1 for char, 2 for short, 4 for int, long and float, and 8 for double. Pointers, being 32-bit addresses, are also 4 bytes long.

To access a C data structure, you need to know the offset from the base of the structure to the field you are interested in. You can either do this by converting the C structure definition into a NASM structure definition (using STRUC), or by calculating the one offset and using just that.

To do either of these, you should read your C compiler's manual to find out how it organizes data structures. NASM gives no special alignment to structure members in its own STRUC macro, so you have to specify alignment yourself if the C compiler generates it. Typically, you might find that a structure like

struct { 
    char c; 
    int i; 
} foo;

might be eight bytes long rather than five, since the int field would be aligned to a four-byte boundary. However, this sort of feature is sometimes a configurable option in the C compiler, either using command-line options or #pragma lines, so you have to find out how your own compiler does it.

9.1.4 `c32.mac`: Helper Macros for the 32-bit C Interface

Included in the NASM archives, in the misc directory, is a file c32.mac of macros. It defines three macros: proc, arg and endproc. These are intended to be used for C-style procedure definitions, and they automate a lot of the work involved in keeping track of the calling convention.

An example of an assembly function using the macro set is given here:

proc    _proc32 

%$i     arg 
%$j     arg 
        mov     eax,[ebp + %$i] 
        mov     ebx,[ebp + %$j] 
        add     eax,[ebx] 

endproc

This defines _proc32 to be a procedure taking two arguments, the first (i) an integer and the second (j) a pointer to an integer. It returns i + *j.

Note that the arg macro has an EQU as the first line of its expansion, and since the label before the macro call gets prepended to the first line of the expanded macro, the EQU works, defining %$i to be an offset from BP. A context-local variable is used, local to the context pushed by the proc macro and popped by the endproc macro, so that the same argument name can be used in later procedures. Of course, you don't have to do that.

arg can take an optional parameter, giving the size of the argument. If no size is given, 4 is assumed, since it is likely that many function parameters will be of type int or pointers.

9.2 Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries

ELF replaced the older a.out object file format under Linux because it contains support for position-independent code (PIC), which makes writing shared libraries much easier. NASM supports the ELF position-independent code features, so you can write Linux ELF shared libraries in NASM.

NetBSD, and its close cousins FreeBSD and OpenBSD, take a different approach by hacking PIC support into the a.out format. NASM supports this as the aoutb output format, so you can write BSD shared libraries in NASM too.

The operating system loads a PIC shared library by memory-mapping the library file at an arbitrarily chosen point in the address space of the running process. The contents of the library's code section must therefore not depend on where it is loaded in memory.

Therefore, you cannot get at your variables by writing code like this:

        mov     eax,[myvar]             ; WRONG

Instead, the linker provides an area of memory called the global offset table, or GOT; the GOT is situated at a constant distance from your library's code, so if you can find out where your library is loaded (which is typically done using a CALL and POP combination), you can obtain the address of the GOT, and you can then load the addresses of your variables out of linker-generated entries in the GOT.

The data section of a PIC shared library does not have these restrictions: since the data section is writable, it has to be copied into memory anyway rather than just paged in from the library file, so as long as it's being copied it can be relocated too. So you can put ordinary types of relocation in the data section without too much worry (but see section 9.2.4 for a caveat).

9.2.1 Obtaining the Address of the GOT

Each code module in your shared library should define the GOT as an external symbol:

extern  _GLOBAL_OFFSET_TABLE_   ; in ELF 
extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out

At the beginning of any function in your shared library which plans to access your data or BSS sections, you must first calculate the address of the GOT. This is typically done by writing the function in this form:

func:   push    ebp 
        mov     ebp,esp 
        push    ebx 
        call    .get_GOT 
.get_GOT: 
        pop     ebx 
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc 

        ; the function body comes here 

        mov     ebx,[ebp-4] 
        mov     esp,ebp 
        pop     ebp 
        ret

(For BSD, again, the symbol _GLOBAL_OFFSET_TABLE requires a second leading underscore.)

The first two lines of this function are simply the standard C prologue to set up a stack frame, and the last three lines are standard C function epilogue. The third line, and the fourth to last line, save and restore the EBX register, because PIC shared libraries use this register to store the address of the GOT.

The interesting bit is the CALL instruction and the following two lines. The CALL and POP combination obtains the address of the label .get_GOT, without having to know in advance where the program was loaded (since the CALL instruction is encoded relative to the current position). The ADD instruction makes use of one of the special PIC relocation types: GOTPC relocation. With the WRT ..gotpc qualifier specified, the symbol referenced (here _GLOBAL_OFFSET_TABLE_, the special symbol assigned to the GOT) is given as an offset from the beginning of the section. (Actually, ELF encodes it as the offset from the operand field of the ADD instruction, but NASM simplifies this deliberately, so you do things the same way for both ELF and BSD.) So the instruction then adds the beginning of the section, to get the real address of the GOT, and subtracts the value of .get_GOT which it knows is in EBX. Therefore, by the time that instruction has finished, EBX contains the address of the GOT.

If you didn't follow that, don't worry: it's never necessary to obtain the address of the GOT by any other means, so you can put those three instructions into a macro and safely ignore them:

%macro  get_GOT 0 

        call    %%getgot 
  %%getgot: 
        pop     ebx 
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc 

%endmacro

9.2.2 Finding Your Local Data Items

Having got the GOT, you can then use it to obtain the addresses of your data items. Most variables will reside in the sections you have declared; they can be accessed using the ..gotoff special WRT type. The way this works is like this:

        lea     eax,[ebx+myvar wrt ..gotoff]

The expression myvar wrt ..gotoff is calculated, when the shared library is linked, to be the offset to the local variable myvar from the beginning of the GOT. Therefore, adding it to EBX as above will place the real address of myvar in EAX.

If you declare variables as GLOBAL without specifying a size for them, they are shared between code modules in the library, but do not get exported from the library to the program that loaded it. They will still be in your ordinary data and BSS sections, so you can access them in the same way as local variables, using the above ..gotoff mechanism.

Note that due to a peculiarity of the way BSD a.out format handles this relocation type, there must be at least one non-local symbol in the same section as the address you're trying to access.

9.2.3 Finding External and Common Data Items

If your library needs to get at an external variable (external to the library, not just to one of the modules within it), you must use the ..got type to get at it. The ..got type, instead of giving you the offset from the GOT base to the variable, gives you the offset from the GOT base to a GOT entry containing the address of the variable. The linker will set up this GOT entry when it builds the library, and the dynamic linker will place the correct address in it at load time. So to obtain the address of an external variable extvar in EAX, you would code

        mov     eax,[ebx+extvar wrt ..got]

This loads the address of extvar out of an entry in the GOT. The linker, when it builds the shared library, collects together every relocation of type ..got, and builds the GOT so as to ensure it has every necessary entry present.

Common variables must also be accessed in this way.

9.2.4 Exporting Symbols to the Library User

If you want to export symbols to the user of the library, you have to declare whether they are functions or data, and if they are data, you have to give the size of the data item. This is because the dynamic linker has to build procedure linkage table entries for any exported functions, and also moves exported data items away from the library's data section in which they were declared.

So to export a function to users of the library, you must use

global  func:function           ; declare it as a function 

func:   push    ebp 

        ; etc.

And to export a data item such as an array, you would have to code

global  array:data array.end-array      ; give the size too 

array:  resd    128 
.end:

Be careful: If you export a variable to the library user, by declaring it as GLOBAL and supplying a size, the variable will end up living in the data section of the main program, rather than in your library's data section, where you declared it. So you will have to access your own global variable with the ..got mechanism rather than ..gotoff, as if it were external (which, effectively, it has become).

Equally, if you need to store the address of an exported global in one of your data sections, you can't do it by means of the standard sort of code:

dataptr:        dd      global_data_item        ; WRONG

NASM will interpret this code as an ordinary relocation, in which global_data_item is merely an offset from the beginning of the .data section (or whatever); so this reference will end up pointing at your data section instead of at the exported global which resides elsewhere.

Instead of the above code, then, you must write

dataptr:        dd      global_data_item wrt ..sym

which makes use of the special WRT type ..sym to instruct NASM to search the symbol table for a particular symbol at that address, rather than just relocating by section base.

Either method will work for functions: referring to one of your functions by means of

funcptr:        dd      my_function

will give the user the address of the code you wrote, whereas

funcptr:        dd      my_function wrt ..sym

will give the address of the procedure linkage table for the function, which is where the calling program will believe the function lives. Either address is a valid way to call the function.

9.2.5 Calling Procedures Outside the Library

Calling procedures outside your shared library has to be done by means of a procedure linkage table, or PLT. The PLT is placed at a known offset from where the library is loaded, so the library code can make calls to the PLT in a position-independent way. Within the PLT there is code to jump to offsets contained in the GOT, so function calls to other shared libraries or to routines in the main program can be transparently passed off to their real destinations.

To call an external routine, you must use another special PIC relocation type, WRT ..plt. This is much easier than the GOT-based ones: you simply replace calls such as CALL printf with the PLT-relative version CALL printf WRT ..plt.

9.2.6 Generating the Library File

Having written some code modules and assembled them to .o files, you then generate your shared library with a command such as

ld -shared -o library.so module1.o module2.o       # for ELF 
ld -Bshareable -o library.so module1.o module2.o   # for BSD

For ELF, if your shared library is going to reside in system directories such as /usr/lib or /lib, it is usually worth using the -soname flag to the linker, to store the final library file name, with a version number, into the library:

ld -shared -soname library.so.1 -o library.so.1.2 *.o

You would then copy library.so.1.2 into the library directory, and create library.so.1 as a symbolic link to it.