Contents

Introduction to the hardware

Overview of the 80x86 Family

The 80x86 family began in 1978 with the 8086, and the newest member is the Pentium, which was released fifteen years later in 1993. Each chip is backwards compatible with its predecessors, but each new generation has added features and more speed than the previous chip. Today very few computers in use have 8088 or 8086 chips in them, as they are very outdated and slow. The number of 286 and 386 based computers is declining as today's software becomes more and more demanding. Even 486s are being replaced by Pentiums. With the Pentium Pro and the MMX based CPUs, Intel keeps increasing performance and features.

Representation of numbers in binary

Before we begin to program in assembly it is best to understand how numbers are represented in computers. Numbers are stored in binary, base two. There are several terms used to describe numbers of different sizes, and I will describe what these mean.

1 BIT: 0 

One bit is the simplest piece of data that exists. It's either a one or a zero.

1 NIBBLE: 0000 
4 BITS 

The nibble is four bits, or half a byte. Note that it has a maximum value of 15 (1111 = 15). This is the basis for the hexadecimal (base 16) number system, which is used because it is far easier to read than binary: each hex digit stands for exactly one nibble. Hexadecimal digits go from 0 to F, and hex numbers are followed by an h to state that they are in hex, e.g. Fh = 15 decimal. Hexadecimal numbers that begin with a letter are prefixed with a 0 (zero), so Fh is written 0Fh.
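To make the hex notation concrete, here is a small sketch in Python (not assembly) that writes a number the way described above: hex digits, a trailing h, and a leading 0 when the first digit is a letter. The function name asm_hex is my own invention for illustration; it is not part of any assembler.

```python
def asm_hex(value):
    """Format a non-negative integer as an assembler-style hex literal."""
    digits = format(value, "X")   # upper-case hex digits, no prefix
    if digits[0] in "ABCDEF":     # a leading letter needs a 0 in front
        digits = "0" + digits
    return digits + "h"

print(asm_hex(0b1111))  # the largest value a nibble can hold
print(asm_hex(61))      # a two-digit example
```

The leading zero matters because an assembler would otherwise have trouble telling a hex number that starts with a letter apart from a name.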

1 BYTE 00000000 
2 NIBBLES 
8 BITS 

A byte is 8 bits or 2 nibbles. A byte has a maximum value of FFh (255 decimal). Because a byte is 2 nibbles, its hexadecimal representation is two hex digits in a row, e.g. 3Dh. The byte is also the size of the 8-bit registers, which we will be covering later.

1 WORD 0000000000000000 
2 BYTES 
4 NIBBLES 
16 BITS 

A word is two bytes that are stuck together. A word has a maximum value of FFFFh (65,535 decimal). Since a word is four nibbles, it is represented by four hex digits. This is the size of the 16-bit registers.
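The maximum values for each size all follow the same rule: with n bits, the largest unsigned value is 2 to the power n, minus one (there are 2^n different values, but one of them is zero). A quick Python sketch to check this; the names SIZES and max_unsigned are just ones I picked for the example.

```python
# Bit widths of the sizes described above.
SIZES = {"nibble": 4, "byte": 8, "word": 16}

def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1

for name, bits in SIZES.items():
    top = max_unsigned(bits)
    # Show the maximum in decimal and in hex.
    print(f"{name:6} {bits:2} bits  max = {top} decimal = {top:X} hex")
```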

Registers

Registers are places in the CPU where a number can be stored and manipulated. There are three sizes of registers: 8-bit, 16-bit and, on the 386 and above, 32-bit. There are four different types of registers: general purpose registers, segment registers, index registers and stack registers. First, here are descriptions of the main registers. Stack registers and segment registers will be covered later.

General Purpose Registers

These are 16-bit registers. There are four general purpose registers: AX, BX, CX and DX. Each is split up into two 8-bit registers. AX, for example, is split up into AH, which contains the high byte, and AL, which contains the low byte. On the 386 and above there are also 32-bit registers, which have the same names as the 16-bit registers but with an 'E' in front, e.g. EAX. You can name AL, AH, AX and EAX separately in instructions, but remember that they overlap: AL and AH are the two halves of AX, and AX is the low half of EAX. AL and AH can be treated as independent registers only as long as you do not also need the value in AX.

If AX contained 24689 decimal:

AH AL
01100000 01110001

AH would be 96 and AL would be 113. If you added one to AL it would be 114 and AH would be unchanged. SI, DI, SP and BP can also be used as general purpose registers but have more specific uses. They are not split into two halves.
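If you want to check the AH/AL arithmetic for yourself, the following Python sketch models AX as a 16-bit number and pulls the two halves out with shifts and masks. This is only a model of the behaviour, not assembly, and the helper names split_word and join_bytes are made up for the example.

```python
def split_word(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit value."""
    ah = (ax >> 8) & 0xFF   # top 8 bits
    al = ax & 0xFF          # bottom 8 bits
    return ah, al

def join_bytes(ah, al):
    """Rebuild the 16-bit value from its two halves."""
    return (ah << 8) | al

ax = 24689
ah, al = split_word(ax)
print(f"AH = {ah:08b} ({ah})  AL = {al:08b} ({al})")

al = (al + 1) & 0xFF        # add one to AL only; an 8-bit register wraps at 256
ax = join_bytes(ah, al)
print(f"AH = {ah}  AL = {al}  AX = {ax}")
```

Notice that changing AL changes AX as well, because AL is simply the low byte of AX.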

Index Registers

These are sometimes called pointer registers and they are 16-bit registers. They are mainly used for string instructions. There are three index registers: SI (source index), DI (destination index) and IP (instruction pointer). On the 386 and above there are also 32-bit index registers: EDI and ESI. You can also use BX to index strings.

IP is an index register but it can't be manipulated directly, as it stores the address of the next instruction.

Stack registers

BP and SP are stack registers and are used when dealing with the stack. They will be covered when we talk about the stack later on.

Segments and offsets

The original designers of the 8088 decided that nobody would ever need to use more than one megabyte of memory, so they built the chip so that it couldn't access anything above that. The problem is that 20 bits are needed to address a whole megabyte. Registers only have 16 bits, and the designers didn't want to use two full registers because that would be 32 bits, which they thought would be too much for anyone. They came up with what they thought was a clever way to solve this problem: segments and offsets. This is a way to do the addressing with two registers but not 32 bits.

ADDRESS = SEGMENT * 16 + OFFSET 
SEGMENT = ADDRESS / 16 (the lower 4 bits are lost and go into OFFSET) 

One register contains the segment and another register contains the offset. If you put the two registers together you get a 20-bit address.

SEGMENT 0010010000010000xxxx
OFFSET xxxx0100100000100010
20-bit Address 00101000100100100010

Here DS stores the segment and SI stores the offset. As they are both 16 bits long and the segment is shifted left by only four bits, the two values overlap, and adding them together gives the 20-bit address. This is how DS:SI is used to make a 20-bit address. The standard notation for a segment/offset pair is: SEGMENT:OFFSET
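The addition is easy to try out. Below is a small Python sketch (again a model, not assembly) that shifts the segment left by four bits, which is the same as multiplying by 16, and adds the offset. The function name physical_address is my own, and the final mask assumes the 8086/8088 behaviour of wrapping around at one megabyte.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask keeps the result to 20 bits (assumed 8086-style wrap at 1 MB).
    return ((segment << 4) + offset) & 0xFFFFF

segment = 0b0010010000010000   # 2410h, the segment from the example
offset = 0b0100100000100010    # 4822h, the offset from the example

address = physical_address(segment, offset)
print(format(address, "020b"), format(address, "05X") + "h")

# A different segment:offset pair can point at the very same byte.
print(physical_address(0x2892, 0x0002) == address)
```

Because the segment and offset overlap, many different SEGMENT:OFFSET pairs refer to the same place in memory, which is worth keeping in mind when comparing addresses.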

Segment registers are: CS, DS, ES, SS. On the 386+ there are also FS and GS.

Offset registers are: BX, DI, SI, BP, SP, IP. In 386+ protected mode, ANY general register (not a segment register) can be used as an Offset register. (Except IP, which you can't manipulate directly).

If you are now thinking that assembly must be really hard and you don't understand segments and offsets at all then don't worry. I didn't understand them at first but I struggled on and found out that they were not so hard to use in practice.

The Stack

As there are only six registers that are used for most operations, you're probably wondering how you get around that. It's easy. There is something called a stack, which is an area of memory where you can save values and restore them later.

The stack works like a stack of plates. The last one you put on is the first one that you take off. This is sometimes referred to as Last On First Off (LOFO) or Last In First Out (LIFO).
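To see the last-in, first-out order in action, here is a rough Python model that uses a list as the stack. It only shows the ordering; it does not model SP or how the stack is laid out in memory. The names push and pop are chosen to resemble the idea of saving and restoring, and the register values are arbitrary.

```python
stack = []  # the top of the stack is the end of the list

def push(value):
    """Save a 16-bit value on top of the stack."""
    stack.append(value & 0xFFFF)

def pop():
    """Take the most recently saved value off the stack."""
    return stack.pop()

ax, bx = 0x1111, 0x2222
push(ax)          # save AX first
push(bx)          # then BX, which now sits on top

ax, bx = 0, 0     # the registers are free to be used for something else

bx = pop()        # BX comes back first because it went on last
ax = pop()        # then AX
print(hex(ax), hex(bx))
```

The important point is that values must be restored in the reverse of the order they were saved, otherwise they end up in the wrong registers.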