Understanding these details is important as we may be presented with assembly language that is written in one of the available syntaxes depending on the type of operating system and tools we use. See the first article in the series: What is x86 assembly?
What does x86 assembly look like?
Programmers with experience in high-level languages like Java may find writing programs in assembly language a completely different experience. Assembly programs consist of instructions, written as mnemonics, which look as follows.

    section .data
    message db "Hello, world!", 0x0a    ; the string to print, ending with a newline
    len     equ $ - message             ; length of the string in bytes

    section .text
    global _start                       ; make the entry point visible to the linker
    _start:
        mov eax, 4          ; system call number 4 (sys_write on 32-bit Linux)
        mov ebx, 1          ; file descriptor 1 (stdout)
        mov ecx, message    ; address of the string
        mov edx, len        ; number of bytes to write
        int 0x80            ; invoke the kernel
        mov eax, 1          ; system call number 1 (sys_exit)
        mov ebx, 0          ; exit status 0
        int 0x80            ; invoke the kernel

The preceding excerpt shows a simple hello world program in x86 assembly language. This program prints the string "Hello, world!" and exits gracefully. As we can see, it consists of several mov instructions, each group followed by int 0x80. We will discuss the technical details of this program later; for now, the idea is to give a picture of what assembly programs look like.
Examples of x86 assembly programming language
Low-level programs such as drivers and bootloaders may be written in assembly. A bootloader is a small piece of software that is executed when a system boots. Once the bootloader is loaded, it loads an operating system and prepares it for execution. Depending on the computer design, this process may vary slightly, as there can be one or more additional stages of boot loading. The following example shows a bootloader written in assembly language.

             org 7C00h       ;The BIOS loads the boot sector at address 7C00h

             jmp short Start ;Jump over the data

    Msg:     db "Hello World! "
    EndMsg:

    Start:   mov bx, 000Fh   ;Page 0, colour attribute 15 (white) for the int 10 calls below
             mov cx, 1       ;We will want to write 1 character
             xor dx, dx      ;Start at top left corner
             mov ds, dx      ;Ensure ds = 0 (to let us load the message)
             cld             ;Ensure direction flag is cleared (for LODSB)

    Print:   mov si, Msg     ;Loads the address of the first byte of the message, 7C02h in this case

                             ;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
                             ;AH = 2
    Char:    mov ah, 2       ;BH = page, DH = row, DL = column
             int 10h
             lodsb           ;Load a byte of the message into AL.
                             ;Remember that DS is 0 and SI holds the
                             ;offset of one of the bytes of the message.

                             ;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
                             ;AH = 9
             mov ah, 9       ;BH = page, AL = character, BL = attribute, CX = character count
             int 10h

             inc dl          ;Advance cursor

             cmp dl, 80      ;Wrap around edge of screen if necessary
             jne Skip
             xor dl, dl
             inc dh

             cmp dh, 25      ;Wrap around bottom of screen if necessary
             jne Skip
             xor dh, dh

    Skip:    cmp si, EndMsg  ;If we're not at end of message,
             jne Char        ;continue loading characters
             jmp Print       ;otherwise restart from the beginning of the message

    times 0200h - 2 - ($ - $$) db 0    ;Zerofill up to 510 bytes

             dw 0AA55h       ;Boot Sector signature

    ;OPTIONAL:
    ;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
    ;times 1474560 - ($ - $$) db 0

The preceding excerpt is taken from https://en.wikibooks.org/wiki/X86_Assembly/Bootloaders and it provides a good example of what real-world software written in assembly may look like. The same link has additional examples and a detailed explanation of the program shown here.
Types of syntax used to write x86 assembly
x86 assembly language comes in two syntax flavors: Intel and AT&T. Intel syntax is predominantly used in the Windows family, while AT&T is commonly seen in the UNIX family. We will stick to Intel syntax throughout our assembly language journey in this series of articles. However, let us dive into the details of these two syntaxes, beginning with the following two examples.

Sample Code 1:

    _start:
        mov $0x2,%eax
        add $0x8,%eax
        add %eax,%eax
        sub $0x5,%eax
        inc %eax
        inc %eax
        dec %eax
        dec %eax

Sample Code 2:

    _start:
        mov eax,0x2
        add eax,0x8
        add eax,eax
        sub eax,0x5
        inc eax
        inc eax
        dec eax
        dec eax

If you closely observe the two examples shown above, they achieve the same outcome but look different. The first program is written using AT&T syntax and the second is written using Intel syntax. Let us go through some of the notable differences between these two syntaxes.
In AT&T syntax, the first operand of an instruction is the source and the second operand is the destination. In Intel syntax the order is reversed: the first operand is the destination and the second operand is the source. To move the value 2 into the register EAX, the instruction looks as follows in AT&T syntax: mov $0x2,%eax. The same instruction written using Intel syntax looks as follows: mov eax,0x2.

In AT&T syntax, register names carry the prefix % and immediate operands carry the prefix $. Intel syntax uses no prefix for either. The same example shows both differences: %eax and $0x2 in the AT&T instruction mov $0x2,%eax become eax and 0x2 in the Intel instruction mov eax,0x2.

In AT&T syntax, the size of the operands is specified by a suffix on the mnemonic (b for 8 bits, w for 16 bits, l for 32 bits). The suffix may be omitted when a register operand already makes the size unambiguous, which is why Sample Code 1 can use plain mov, add and sub. For example, moving an 8-bit value from the register bl to al needs the following instruction in Intel syntax: mov al, bl. The same operation in AT&T syntax, with the size given as a suffix, looks as follows: movb %bl, %al. Notice the mnemonic movb.
It should be noted that we have only scratched the surface, keeping beginner-level readers in mind; there are more differences between these two syntaxes. If a program written in one of these syntaxes is confusing to read, it is easy to convert it to the other. For example, let us assume that a program is written in AT&T syntax. Running objdump on this program shows the assembly instructions as follows.

    Disassembly of section .text:

    08049000 <_start>:
     8049000:   b8 02 00 00 00    mov    $0x2,%eax
     8049005:   83 c0 08          add    $0x8,%eax
     8049008:   01 c0             add    %eax,%eax
     804900a:   83 e8 05          sub    $0x5,%eax
     804900d:   40                inc    %eax
     804900e:   40                inc    %eax
     804900f:   48                dec    %eax
     8049010:   48                dec    %eax

The program is written in AT&T syntax, and objdump, which disassembles to AT&T syntax by default, shows it the same way. We can also ask objdump to display the instructions in Intel syntax (its -M intel option), which gives the following.

    Disassembly of section .text:

    08049000 <_start>:
     8049000:   b8 02 00 00 00    mov    eax,0x2
     8049005:   83 c0 08          add    eax,0x8
     8049008:   01 c0             add    eax,eax
     804900a:   83 e8 05          sub    eax,0x5
     804900d:   40                inc    eax
     804900e:   40                inc    eax
     804900f:   48                dec    eax
     8049010:   48                dec    eax

As we can see, the objdump output now shows the instructions in Intel syntax even though the program is written in AT&T syntax.
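The two listings above can be reproduced with commands along these lines. This is a minimal sketch, assuming GNU binutils (as and objdump) on an x86 Linux host; the file names sample.s and sample.o are hypothetical. The object file is disassembled directly, without linking, so the addresses start at 0 rather than at 08049000.

```shell
# Hypothetical file name: sample.s holds the AT&T-syntax program (Sample Code 1).
cat > sample.s <<'EOF'
.globl _start
.text
_start:
    mov $0x2,%eax
    add $0x8,%eax
    add %eax,%eax
    sub $0x5,%eax
    inc %eax
    inc %eax
    dec %eax
    dec %eax
EOF

# Assemble as 32-bit code (--32 is needed when the host is 64-bit).
as --32 -o sample.o sample.s

# Disassemble in AT&T syntax, which is objdump's default on x86.
objdump -d sample.o

# Disassemble the same bytes in Intel syntax.
objdump -d -M intel sample.o
```

The machine code bytes in the middle column are identical in both listings; only the notation of the instructions changes.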
Assembling and linking
A program written in AT&T syntax can be assembled with the GAS assembler (as) and linked with the ld linker. Similarly, a program written in Intel syntax can be assembled with NASM and linked with ld. On a 64-bit system, the commands typically ask both the assembler and the linker explicitly for 32-bit output, since the examples in this article are 32-bit programs. See the next article in the series, x86 basics: Data representation, memory and information storage.
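The assemble-and-link steps described in this section can be sketched as follows. This is only an illustration, assuming GNU binutils and NASM are installed on a 64-bit x86 Linux host. All file names are hypothetical, and the tiny programs (which simply exit with status 0) are stand-ins for your own sources.

```shell
# AT&T syntax source, assembled with GAS (as) and linked with ld.
cat > sample_att.s <<'EOF'
.globl _start
.text
_start:
    mov $1,%eax      # system call number 1 (sys_exit on 32-bit Linux)
    mov $0,%ebx      # exit status 0
    int $0x80
EOF
as --32 -o sample_att.o sample_att.s         # --32: emit 32-bit code on a 64-bit host
ld -m elf_i386 -o sample_att sample_att.o    # -m elf_i386: produce a 32-bit ELF executable

# Intel syntax source, assembled with NASM and linked with ld.
cat > sample_intel.asm <<'EOF'
global _start
section .text
_start:
    mov eax, 1       ; system call number 1 (sys_exit on 32-bit Linux)
    mov ebx, 0       ; exit status 0
    int 0x80
EOF
nasm -f elf32 -o sample_intel.o sample_intel.asm    # -f elf32: 32-bit ELF object file
ld -m elf_i386 -o sample_intel sample_intel.o
```

Running the resulting executables requires a kernel with 32-bit (IA-32) support enabled, which is an assumption about the host rather than something these commands check.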
Sources:
X86 Assembly Bootloaders, Wikibooks
Assembly Language for x86 Processors, Kip Irvine
Modern X86 Assembly Language Programming, Daniel Kusswurm
Linux Assembly Language Programming, Bob Neveln