Understanding these details is important because the assembly code we encounter may be written in any of the available syntaxes, depending on the operating system and tools we use. See the first article in the series: What is x86 assembly?

What does x86 assembly look like?

Programmers with experience in high-level languages like Java may find writing programs in assembly language a very different experience. Assembly programs typically consist of instructions, written as mnemonics, which look as follows.

    section .data
        message db "Hello, world!", 0x0a   ; the string to print, followed by a newline
        len equ $ - message                ; length of the string in bytes

    section .text
    global _start
    _start:
        mov eax, 4          ; system call number 4: sys_write
        mov ebx, 1          ; file descriptor 1: standard output
        mov ecx, message    ; address of the string
        mov edx, len        ; number of bytes to write
        int 0x80            ; invoke the kernel

        mov eax, 1          ; system call number 1: sys_exit
        mov ebx, 0          ; exit status 0
        int 0x80            ; invoke the kernel

The preceding excerpt shows a simple hello world program in x86 assembly language for 32-bit Linux. This program prints the string “Hello, world!” and exits gracefully. As we can see, the program consists of several mov instructions, with each group followed by int 0x80. We will discuss more technical details about this program later; for now, the idea is to give a picture of what assembly programs look like.

Examples of x86 assembly programming language

Low-level programs such as drivers and bootloaders may be written in assembly. A bootloader is a small piece of software that runs when a system boots. Once the bootloader is loaded, it loads an operating system and prepares it for execution. Depending on the computer design, this process may vary slightly, as boot loading can involve one or more additional stages. The following example shows a bootloader written in assembly language.

            org 7C00h

            jmp short Start ;Jump over the data (the 'short' keyword makes the jmp instruction smaller)

    Msg:    db "Hello World! "
    EndMsg:

    Start:  mov bx, 000Fh   ;Page 0, colour attribute 15 (white) for the int 10 calls below
            mov cx, 1       ;We will want to write 1 character
            xor dx, dx      ;Start at top left corner
            mov ds, dx      ;Ensure ds = 0 (to let us load the message)
            cld             ;Ensure direction flag is cleared (for LODSB)

    Print:  mov si, Msg     ;Loads the address of the first byte of the message, 7C02h in this case

                            ;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
                            ;AH = 2
    Char:   mov ah, 2       ;BH = page, DH = row, DL = column
            int 10h
            lodsb           ;Load a byte of the message into AL.
                            ;Remember that DS is 0 and SI holds the
                            ;offset of one of the bytes of the message.

                            ;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
                            ;AH = 9
            mov ah, 9       ;BH = page, AL = character, BL = attribute, CX = character count
            int 10h

            inc dl          ;Advance cursor

            cmp dl, 80      ;Wrap around edge of screen if necessary
            jne Skip
            xor dl, dl
            inc dh

            cmp dh, 25      ;Wrap around bottom of screen if necessary
            jne Skip
            xor dh, dh

    Skip:   cmp si, EndMsg  ;If we're not at end of message,
            jne Char        ;continue loading characters
            jmp Print       ;otherwise restart from the beginning of the message

    times 0200h - 2 - ($ - $$)  db 0    ;Zerofill up to 510 bytes

            dw 0AA55h       ;Boot Sector signature

    ;OPTIONAL:
    ;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
    ;times 1474560 - ($ - $$) db 0

The preceding excerpt is taken from https://en.wikibooks.org/wiki/X86_Assembly/Bootloaders and it provides a good example of what real-world software written in assembly may look like. The same link has additional examples and a detailed explanation of the program shown here.

Types of syntax used to write x86 assembly

x86 assembly language comes in two syntax flavors: Intel and AT&T. Intel syntax is predominantly used in the Windows family, while AT&T syntax is commonly seen in the UNIX family. We will stick to Intel syntax throughout our assembly language journey in this series of articles. However, let us dive into the details of these two syntaxes. Let us begin by going through the following two examples.

Sample Code 1:

    _start:
      mov    $0x2,%eax
      add    $0x8,%eax
      add    %eax,%eax
      sub    $0x5,%eax
      inc    %eax
      inc    %eax
      dec    %eax
      dec    %eax

Sample Code 2:

    _start:
      mov    eax,0x2
      add    eax,0x8
      add    eax,eax
      sub    eax,0x5
      inc    eax
      inc    eax
      dec    eax
      dec    eax

If you closely observe the two examples shown above, they achieve the same outcome but look different. The first program is written using AT&T syntax and the second is written using Intel syntax. Let us go through some of the notable differences between these two syntaxes.

When writing programs in AT&T syntax, the first operand of an instruction is the source operand and the second operand is the destination operand. In Intel syntax the order is reversed: the first operand is the destination and the second is the source. To move the value 2 into the register EAX, the instruction looks as follows in AT&T syntax: mov $0x2,%eax. The same instruction written using Intel syntax looks as follows: mov eax,0x2.

In AT&T syntax, register names carry the prefix %, while Intel syntax uses no prefix for registers. Similarly, AT&T syntax prefixes immediate operands with $, while Intel syntax writes them without any prefix. The same example illustrates both differences: %eax and $0x2 in AT&T syntax correspond to eax and 0x2 in Intel syntax.

In AT&T syntax, instruction mnemonics take a suffix that specifies the operand size (b for 8 bits, w for 16 bits, l for 32 bits). For example, moving an 8-bit value from the register bl to al needs the following instruction in Intel syntax: mov al, bl. The same operation in AT&T syntax is written with the size as a suffix to the mnemonic: movb %bl, %al. Notice the mnemonic movb. The GNU assembler accepts the suffix being left out when a register operand already determines the size, which is why the sample code above writes mov rather than movl.

It should be noted that we have only scratched the surface, keeping beginner-level readers in mind, and there are more differences between these two syntaxes. If it is confusing to read an assembly program written in one of these syntaxes, it is easy to convert it into the other. For example, let us assume that a program is written in AT&T syntax; using objdump on this program shows the assembly instructions as follows.

    Disassembly of section .text:

    08049000 <_start>:
     8049000: b8 02 00 00 00       mov    $0x2,%eax
     8049005: 83 c0 08             add    $0x8,%eax
     8049008: 01 c0                add    %eax,%eax
     804900a: 83 e8 05             sub    $0x5,%eax
     804900d: 40                   inc    %eax
     804900e: 40                   inc    %eax
     804900f: 48                   dec    %eax
     8049010: 48                   dec    %eax

Clearly, the program is written in AT&T syntax and objdump shows the same. We can display the instructions in Intel syntax by passing the -M intel option to objdump, as shown below.

    Disassembly of section .text:

    08049000 <_start>:
     8049000: b8 02 00 00 00       mov    eax,0x2
     8049005: 83 c0 08             add    eax,0x8
     8049008: 01 c0                add    eax,eax
     804900a: 83 e8 05             sub    eax,0x5
     804900d: 40                   inc    eax
     804900e: 40                   inc    eax
     804900f: 48                   dec    eax
     8049010: 48                   dec    eax

As we can notice, the objdump output shows the instructions in Intel syntax even though the program is written in AT&T syntax.

Assembling and linking

A program written in AT&T syntax can be assembled with the GNU assembler (GAS) and linked with the ld linker. Similarly, programs written in Intel syntax can be assembled with NASM and linked with ld. On a 64-bit system, the assembler and the linker must be told to produce 32-bit output.

See the next article in the series, x86 basics: Data representation, memory and information storage.

Sources:

X86 Assembly Bootloaders, Wikibooks
Assembly Language for x86 Processors, Kip Irvine
Modern X86 Assembly Language Programming, Daniel Kusswurm
Linux Assembly Language Programming, Bob Neveln