CHAPTER 1

Language Components of MASM

Programming with MASM requires that you understand the MASM concepts of reserved words, identifiers, predefined symbols, constants, expressions, operators, data types, registers, and statements. This section defines important terms and provides lists that summarize these topics. For detailed information, see Help or the Reference.

Reserved Words

A reserved word has a special meaning fixed by the language. You can use it only under certain conditions. Reserved words in MASM include:

Instructions, which correspond to operations the processor can execute.

Directives, which give commands to the assembler.

Attributes, which provide a value for a field, such as segment alignment.

Operators, which are used in expressions.

Predefined symbols, which return information to your program.

MASM reserved words are not case sensitive except for predefined symbols (see "Predefined Symbols," later in this chapter).

The assembler generates an error if you use a reserved word as a variable, code label, or other identifier within your source code. However, if you need to use a reserved word for another purpose, the OPTION NOKEYWORD directive can selectively disable a word’s status as a reserved word.

For example, to remove the STR instruction, the MASK operator, and the NAME directive from the set of words MASM recognizes as reserved, use this statement in the code segment of your program before the first reference to STR, MASK, or NAME:

OPTION NOKEYWORD:<STR MASK NAME>

The section "Using the OPTION Directive," later in this chapter, discusses the OPTION directive. Appendix D provides a complete list of MASM reserved words.

With the /Zm command-line option or OPTION M510 in effect, MASM does not reserve any operators or instructions that do not apply to the current CPU mode. For example, you can use the symbol ENTER as an identifier when assembling under the default CPU mode but not under .286 mode, since the 80186 through 80486 processors recognize ENTER as an instruction. The USE32, FLAT, FAR32, and NEAR32 segment types and the 80386/486 register names are not keywords with processors other than the 80386/486.

Identifiers

An identifier is a name that you invent and attach to a definition. Identifiers can be symbols representing variables, constants, procedure names, code labels, segment names, and user-defined data types such as structures, unions, records, and types defined with TYPEDEF. Identifiers longer than 247 characters generate an error.

Certain restrictions limit the names you can use for identifiers. Follow these rules to define a name for an identifier:

The first character of the identifier can be an alphabetic character (A-Z) or any of these four characters: @ _ $ ?

The other characters in the identifier can be any of the characters listed above or a decimal digit (0-9).

Avoid starting an identifier with the at sign (@), because MASM 6.1 predefines some special symbols starting with @ (see "Predefined Symbols," following). Beginning an identifier with @ may also cause conflicts with future versions of the Macro Assembler.

The symbol - and thus the identifier - is visible as long as it remains within scope. (For more information about visibility and scope, see "Sharing Symbols with Include Files" in Chapter 8.)
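As an illustration of the naming rules above, the following sketch declares a few variables. The names are hypothetical and chosen only to exercise each rule; the last two declarations are commented out because the assembler would reject them.

```masm
        .MODEL  small
        .DATA
count   WORD    0       ; letters only
_total  WORD    0       ; begins with an underscore
$temp   BYTE    0       ; begins with a dollar sign
?flag   BYTE    0       ; begins with a question mark
item2   WORD    0       ; digits are allowed after the first character
; 2item WORD    0       ; rejected: first character is a digit
; mov   WORD    0       ; rejected: MOV is a reserved word
        END
```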

Predefined Symbols

The assembler includes a number of predefined symbols (also called predefined equates). You can use these symbol names at any point in your code to represent the equate value. For example, the predefined equate @FileName represents the base name of the current file. If the current source file is TASK.ASM, the value of @FileName is TASK. The MASM predefined symbols are listed according to the kinds of information they provide. Case is important only if the /Cp option is used. (For additional details, see Help on ML command-line options.)

The predefined symbols for segment information include:
Symbol	Description

@code	Returns the name of the code segment.
@CodeSize	Returns an integer representing the default code distance.
@CurSeg	Returns the name of the current segment.
@data	Expands to DGROUP.
@DataSize	Returns an integer representing the default data distance.
@fardata	Returns the name of the segment defined by the .FARDATA directive.
@fardata?	Returns the name of the segment defined by the .FARDATA? directive.
@Model	Returns the selected memory model.
@stack	Expands to DGROUP for near stacks or STACK for far stacks. (See "Creating a Stack" in Chapter 2.)
@WordSize	Provides the size attribute of the current segment.

The predefined symbols for environment information include:
Symbol	Description

@Cpu	Contains a bit mask specifying the processor mode.
@Environ	Returns values of environment variables during assembly.
@Interface	Contains information about the language parameters.
@Version	Represents the text equivalent of the MASM version number. In MASM 6.1, this expands to 610.

The predefined symbols for date and time information include:
Symbol	Description

@Date	Supplies the current system date during assembly.
@Time	Supplies the current system time during assembly.

The predefined symbols for file information include:
Symbol	Description

@FileCur	Names the current file (base and suffix).
@FileName	Names the base name of the main file being assembled as it appears on the command line.
@Line	Gives the source line number in the current file.

The predefined symbols for macro string manipulation include:
Symbol	Description

@CatStr	Returns concatenation of two strings.
@InStr	Returns the starting position of a string within another string.
@SizeStr	Returns the length of a given string.
@SubStr	Returns substring from a given string.
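
A minimal sketch of two predefined symbols in use, assuming a small-model program that runs under MS-DOS. The label start, the variable msg, and the ECHO text are hypothetical. @data supplies the group name needed to initialize DS, and @Version is tested at assembly time.

```masm
        .MODEL  small
        .STACK
        .DATA
msg     BYTE    "ready", 0      ; hypothetical data item
        .CODE
start:  mov     ax, @data       ; @data expands to DGROUP
        mov     ds, ax          ; DS now addresses the program's near data
        IF      @Version GE 610 ; @Version expands to 610 under MASM 6.1
        ECHO    Assembling with MASM 6.1 or later
        ENDIF
        mov     ax, 4C00h       ; DOS terminate-process function (assumes MS-DOS)
        int     21h
        END     start
```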

Integer Constants and Constant Expressions

An integer constant is a series of one or more numerals followed by an optional radix specifier. For example, in these statements

mov ax, 25
mov bx, 0B3h

the numbers 25 and 0B3h are integer constants. The h appended to 0B3 is a radix specifier. The specifiers are:

y for binary (or b if the default radix is not hexadecimal)

o or q for octal

t for decimal (or d if the default radix is not hexadecimal)

h for hexadecimal

Radix specifiers can be either uppercase or lowercase letters; sample code in this book is in lowercase. If you do not specify a radix, the assembler interprets the integer according to the current radix. The default radix is decimal, but you can change the default with the .RADIX directive.

Hexadecimal numbers must always start with a decimal digit (0-9). If necessary, add a leading zero to distinguish between symbols and hexadecimal numbers that start with a letter. For example, MASM interprets ABCh as an identifier. The hexadecimal digits A through F can be either uppercase or lowercase letters. Sample code in this book is in uppercase letters.
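
The radix rules can be sketched as follows, assuming the default radix (decimal) is in effect. Each of the first six instructions loads the same value, 25, into AX; the choice of registers is arbitrary.

```masm
        .MODEL  small
        .CODE
        mov     ax, 25          ; decimal under the default radix
        mov     ax, 25t         ; decimal, stated explicitly
        mov     ax, 11001y      ; binary
        mov     ax, 31o         ; octal
        mov     ax, 31q         ; octal, alternate specifier
        mov     ax, 19h         ; hexadecimal
        mov     bx, 0ABCh       ; leading zero marks this as a number
        ; mov   bx, ABCh        ; would be read as the identifier ABCh
        END
```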

Constant expressions contain integer constants and (optionally) operators such as shift, logical, and arithmetic operators. The assembler evaluates constant expressions at assembly time. (In addition to constants, expressions can contain labels, types, registers, and their attributes.) Constant expressions do not change value during program execution.

Symbolic Integer Constants

You can define symbolic integer constants with either of the data assignment directives, EQU or the equal sign (=). These directives assign values to symbols during assembly, not during program execution. Symbolic constants are used to assign names to constant values. You can use a symbol with an assigned value in place of an immediate operand. For example, instead of referring in your code to keyboard scan codes with numbers such as 30 or 48, you can create more recognizable symbols:

SCAN_A EQU 30
SCAN_B EQU 48

then use the appropriate symbol in your program rather than the number. Using symbolic constants instead of undescriptive numbers makes your code more readable and easier to maintain. The assembler does not allocate data storage when you use either EQU or =. It simply replaces each occurrence of the symbol with the value of the expression.

The directives EQU and = have slightly different purposes. Integers defined with the = directive can be redefined with another value in your source code, but those defined with EQU cannot. Once you’ve defined a symbolic constant with the EQU directive, attempting to redefine it generates an error. The syntax is:

symbol EQU expression

The symbol is a unique name of your choice, except for words reserved by MASM. The expression can be an integer, a constant expression, a one- or two-character string constant (four-character on the 80386/486), or an expression that evaluates to an address. Symbolic constants let you change a constant value used throughout your source code by merely altering expression in the definition. This removes the potential for error and saves you the inconvenience of having to find and replace each occurrence of the constant in your program.

The following example shows the correct use of EQU to define symbolic integers.

column EQU 80 ; Constant - 80
row EQU 25 ; Constant - 25
screen EQU column * row ; Constant - 2000
line EQU row ; Constant - 25

.DATA

.CODE
.
.
.
mov cx, column
mov bx, line

The value of a symbol defined with the = directive can be different at different places in the source code. However, a constant value is assigned during assembly for each use, and that value does not change at run time.

The syntax for the = directive is:

symbol = expression
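
The difference between the two directives can be sketched like this. The symbol names limit and count are hypothetical, and the comments show the value the assembler substitutes at each point.

```masm
        .MODEL  small
limit   EQU     10              ; fixed for the whole assembly
; limit EQU     20              ; error: an EQU integer cannot take a new value
count   =       0
        .CODE
        mov     ax, count       ; assembles as mov ax, 0
count   =       count + 1       ; legal: = symbols can be redefined
        mov     bx, count       ; assembles as mov bx, 1
        mov     cx, limit       ; assembles as mov cx, 10
        END
```

Each redefinition of count affects only the statements that follow it in the source file; nothing changes at run time.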

Size of Constants

The default word size for MASM 6.1 expressions is 32 bits. This behavior can be modified using OPTION EXPR16 or OPTION M510. Both of these options set the expression word size to 16 bits, but OPTION M510 affects other assembler behavior as well (see Appendix A).

It is illegal to change the expression word size once it has been set with OPTION M510, OPTION EXPR16, or OPTION EXPR32. However, you can repeat the same directive in your source code as often as you wish. You can place the same directive in every include file, for example.

Operators

Operators are used in expressions. The value of the expression is determined at assembly time and does not change when the program runs.

Operators should not be confused with processor instructions. The reserved word ADD is an instruction; the plus sign (+) is an operator. For example, Amount+2 illustrates a valid use of the plus operator (+). It tells the assembler to add 2 to the constant value Amount, which might be a value or an address. Contrast this operation, which occurs at assembly time, with the processor’s ADD instruction. ADD tells the processor at run time to add two numbers and store the result.

The assembler evaluates expressions that contain more than one operator according to the following rules:

Operations in parentheses are performed before adjacent operations.

Binary operations of highest precedence are performed first.

Operations of equal precedence are performed from left to right.

Unary operations of equal precedence are performed right to left.

Table 1.3 lists the order of precedence for all operators. Operators on the same line have equal precedence.

Table 1.3 Operator Precedence

Precedence	Operators

1	( ), [ ]
2	LENGTH, SIZE, WIDTH, MASK, LENGTHOF, SIZEOF
3	. (structure-field-name operator)
4	: (segment-override operator), PTR
5	LROFFSET, OFFSET, SEG, THIS, TYPE
6	HIGH, HIGHWORD, LOW, LOWWORD
7	+, - (unary)
8	*, /, MOD, SHL, SHR
9	+, - (binary)
10	EQ, NE, LT, LE, GT, GE
11	NOT
12	AND
13	OR, XOR
14	OPATTR, SHORT, .TYPE
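
A few constant expressions, offered as a sketch of how the evaluation rules play out. The assembler computes each value at assembly time and encodes it as an immediate operand; the results in the comments follow from the precedence levels in the table.

```masm
        .MODEL  small
        .CODE
        mov     ax, 2 + 3 * 4       ; * before binary +: 14
        mov     ax, (2 + 3) * 4     ; parentheses first: 20
        mov     ax, 8 / 4 * 2       ; equal precedence, left to right: 4
        mov     ax, 1 SHL 2 + 1     ; SHL before binary +: 5
        mov     ax, NOT 0 AND 0Fh   ; NOT before AND: 0Fh
        END
```

Note that SHL and SHR share a level with multiplication, so they bind more tightly than binary plus and minus. When in doubt, parentheses make the intent explicit.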

Data Types

A "data type" describes a set of values. A variable of a given type can have any of a set of values within the range specified for that type.

The intrinsic types for MASM 6.1 are BYTE, SBYTE, WORD, SWORD, DWORD, SDWORD, FWORD, QWORD, and TBYTE. These types define integers and binary coded decimals (BCDs), as discussed in Chapter 6. The signed data types SBYTE, SWORD, and SDWORD work in conjunction with directives such as INVOKE (for calling procedures) and .IF (introduced in Chapter 7). The REAL4, REAL8, and REAL10 directives define floating-point types. (See Chapter 6.)

Versions of MASM prior to 6.0 had separate directives for types and initializers. For example, BYTE is a type and DB is the corresponding initializer. The distinction does not apply in MASM 6.1. You can use any type (intrinsic or user-defined) as an initializer.

MASM does not have specific types for arrays and strings. However, you can treat a sequence of data units as arrays, and character or byte sequences as strings. (See "Arrays and Strings" in Chapter 5.)

Types can also have attributes such as langtype and distance (NEAR and FAR). For information on these attributes, see "Declaring Parameters with the PROC Directive" in Chapter 7.

You can also define your own types with STRUCT, UNION, and RECORD. The types have fields that contain string or numeric data, or records that contain bits. These data types are similar to the user-defined data types in high-level languages such as C, Pascal, and FORTRAN. (See Chapter 5, "Defining and Using Complex Data Types.")

You can define new types, including pointer types, with the TYPEDEF directive. TYPEDEF assigns a qualifiedtype (explained in the following) to a typename of your choice. This lets you build new types with descriptive names of your choosing, making your programs more readable. For example, the following statement makes the symbol CHAR a synonym for the intrinsic type BYTE:

CHAR TYPEDEF BYTE

The qualifiedtype is any type or pointer to a type of the form:

[[distance]] PTR [[qualifiedtype]]

where distance is NEAR, FAR, or any distance modifier. (For more information on distance, see "Declaring Parameters with the PROC Directive" in Chapter 7.)

The qualifiedtype can also be any type previously defined with TYPEDEF. For example, if you use TYPEDEF to create an alias for BYTE - say, CHAR as in the preceding example - you can use CHAR as a qualifiedtype when defining the pointer type PCHAR, like this:

CHAR TYPEDEF BYTE
PCHAR TYPEDEF PTR CHAR

The typename CHAR in the first line becomes a qualifiedtype in the second line. Use of the TYPEDEF directive to define pointers is explained in "Accessing Data with Pointers and Addresses" in Chapter 3.
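
As a sketch of how these two types might then be used, the following fragment declares a string and a pointer to it. The variable names msg and pMsg are hypothetical, and the small memory model is assumed, so PCHAR is a near pointer.

```masm
        .MODEL  small
CHAR    TYPEDEF BYTE
PCHAR   TYPEDEF PTR CHAR        ; near pointer under the small model
        .DATA
msg     CHAR    "Hello", 0      ; CHAR used as an initializer, like BYTE
pMsg    PCHAR   msg             ; pointer variable initialized with the address of msg
        END
```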

Since distance and qualifiedtype are optional syntax elements, you can use variables of type PTR or FAR PTR. You can also define procedure prototypes with qualifiedtype. For more information about procedure prototypes, see "Declaring Procedure Prototypes" in Chapter 7.

These rules govern the use of qualifiedtype:

The only component of a qualifiedtype definition that can be forward-referenced is a structure or union type identifier.

If you do not specify distance, the assembler assumes a distance that corresponds to the memory model. The assumed distance is NEAR for tiny, small, and medium models, and FAR for other models.

If you do not specify a memory model with .MODEL, the assembler assumes SMALL model (and therefore NEAR pointers).

You can use a qualifiedtype in seven places:

Use	Example

In procedure arguments	proc1 PROC pMsg:PTR BYTE
In prototype arguments	proc2 PROTO pMsg:FAR PTR WORD
With local variables declared inside procedures	LOCAL pMsg:PTR
With the LABEL directive	TempMsg LABEL PTR WORD
With the EXTERN and EXTERNDEF directives	EXTERN pMsg:FAR PTR BYTE
	EXTERNDEF MyProc:PROTO
With the COMM directive	COMM var1:WORD:3
With the TYPEDEF directive	PBYTE TYPEDEF PTR BYTE
	PFUNC TYPEDEF PROTO MyProc

"Defining Pointer Types with TYPEDEF" in Chapter 3 shows ways to write a TYPEDEF type for a qualifiedtype. Attributes such as NEAR and FAR can also apply to a qualifiedtype.

You can determine an accurate definition for TYPEDEF and qualifiedtype from the BNF grammar definitions given in Appendix B. The BNF grammar defines each component of the syntax for any directive, showing the recursive properties of components such as qualifiedtype.

Registers

The processors in the 8086 family share the same base set of 16-bit registers. Each processor can treat certain registers as two separate 8-bit registers. The 80386/486 processors have extended 32-bit registers. To maintain compatibility with their predecessors, 80386/486 processors can access their registers as 16-bit or, where appropriate, as 8-bit values.

Figure 1.3 shows the registers common to all the 8086-based processors. Each register has its own special uses and limitations.

Figure 1.3 Registers for 8088 - 80286 Processors

80386/486 Only

The 80386/486 processors use the same 8-bit and 16-bit registers used by the rest of the 8086 family. All of these registers can be further extended to 32 bits, except segment registers, which always occupy 16 bits. The extended register names begin with the letter "E." For example, the 32-bit extension of AX is EAX. The 80386/486 processors have two additional segment registers, FS and GS. Figure 1.4 shows the extended registers of the 80386/486.

Figure 1.4 Extended Registers for the 80386/486 Processors

Segment Registers

At run time, all addresses are relative to one of four segment registers: CS, DS, SS, or ES. (The 80386/486 processors add two more: FS and GS.) These registers, their segments, and their purposes include:

Register and Segment	Purpose

CS (Code Segment)	Contains processor instructions and their immediate operands.
DS (Data Segment)	Normally contains data allocated by the program.
SS (Stack Segment)	Contains the program stack for use by PUSH, POP, CALL, and RET.
ES (Extra Segment)	References secondary data segment. Used by string instructions.
FS, GS	Provides extra segments on the 80386/486.

General-Purpose Registers

The AX, DX, CX, BX, BP, DI, and SI registers are 16-bit general-purpose registers, used for temporary data storage. Since the processor accesses registers more quickly than it accesses memory, you can make your programs run faster by keeping the most-frequently used data in registers.

The 8086-based processors do not perform memory-to-memory operations. For example, the processor cannot directly copy a variable from one location in memory to another. You must first copy from memory to a register, then from the register to the new memory location. Similarly, to add two variables in memory, you must first copy one variable to a register, then add the contents of the register to the other variable in memory.
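
The two cases just described can be sketched as follows. The variables var1, var2, and var3 and their initial values are hypothetical; the choice of AX as the intermediate register is arbitrary.

```masm
        .MODEL  small
        .STACK
        .DATA
var1    WORD    5
var2    WORD    7
var3    WORD    ?
        .CODE
        .STARTUP
        ; mov   var3, var1      ; rejected: both operands are memory
        mov     ax, var1        ; memory to register
        mov     var3, ax        ; register to memory; var3 = 5
        mov     ax, var1        ; load one addend
        add     var2, ax        ; var2 = 7 + 5 = 12
        .EXIT
        END
```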

The processor can access four of the general registers - AX, DX, CX, and BX - either as two 8-bit registers or as a single 16-bit register. The AH, DH, CH, and BH registers represent the high-order 8 bits of the corresponding registers. Similarly, AL, DL, CL, and BL represent the low-order 8 bits of the registers.
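
A short sketch of the relationship between AX and its two halves; the values are arbitrary. Writing to one half leaves the other half untouched.

```masm
        .MODEL  small
        .CODE
        mov     ax, 1234h       ; AH = 12h, AL = 34h
        mov     al, 0FFh        ; AX = 12FFh; AH is unchanged
        mov     ah, 0           ; AX = 00FFh; AL is unchanged
        END
```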

The 80386/486 processors can extend all the general registers to 32 bits, though as Figure 1.4 shows, you cannot treat the upper 16 bits as a separate register as you can the lower 16 bits. To use EAX as an example, you can directly reference the low byte as AL, the next lowest byte as AH, and the low word as AX. To access the high word of EAX, however, you must first shift the upper 16 bits into the lower 16 bits.
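
One way to reach the high word is sketched here, assuming a 16-bit program with 80386 instructions enabled. The value loaded into EAX is arbitrary.

```masm
        .MODEL  small
        .386                    ; placed after .MODEL, so segments stay 16-bit
        .CODE
        mov     eax, 12345678h  ; AX = 5678h, AH = 56h, AL = 78h
        shr     eax, 16         ; EAX = 00001234h, so AX = 1234h
        END
```

SHR discards the original low word. If both halves are still needed, ROL EAX, 16 swaps them instead, and a second ROL restores the original order.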

Special-Purpose Registers

The 8086 family of processors has two additional registers, SP and IP, whose values are changed automatically by the processor.

SP (Stack Pointer)

The SP register points to the current location within the stack segment. Pushing a value onto the stack decreases the value of SP by two; popping from the stack increases the value of SP by two. Thirty-two-bit operands on 80386/486 processors increase or decrease SP by four instead of two. The CALL and INT instructions store the return address on the stack and reduce SP accordingly. Return instructions retrieve the stored address from the stack and reset SP to its value before the call. SP can also be adjusted with instructions such as ADD. The program stack is described in detail in Chapter 3.
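
The effect of common instructions on SP can be sketched as follows for 16-bit operands. The registers pushed and the 4-byte reservation are arbitrary choices for illustration.

```masm
        .MODEL  small
        .STACK
        .CODE
        .STARTUP
        push    ax              ; SP = SP - 2
        push    bx              ; SP = SP - 2
        pop     bx              ; SP = SP + 2; restore in reverse order
        pop     ax              ; SP = SP + 2; SP is back where it started
        sub     sp, 4           ; reserve 4 bytes of stack space
        add     sp, 4           ; release them
        .EXIT
        END
```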

IP (Instruction Pointer)

The IP register always contains the address of the next instruction to be executed. You cannot directly access or change the instruction pointer. However, instructions that control program flow (such as calls, jumps, loops, and interrupts) automatically change the instruction pointer.

Flags Register

The 16 bits in the flags register control the execution of certain instructions and reflect the current status of the processor. In 80386/486 processors, the flags register is extended to 32 bits. Some bits are undefined, so there are actually 9 flags for real mode, 11 flags (including a 2-bit flag) for 80286 protected mode, 13 for the 80386, and 14 for the 80486. The extended flags register of the 80386/486 is sometimes called "Eflags."

Figure 1.5 shows the bits of the 32-bit flags register for the 80386/486. Earlier 8086-family processors use only the lower word. The unmarked bits are reserved for processor use, and should not be modified.

Figure 1.5 Flags for 8088-80486 Processors

In the following descriptions and throughout this book, "set" means a bit value of 1, and "cleared" means the bit value is 0. The nine flags common to all 8086-family processors, starting with the low-order flags, include:

Flag	Description

Carry	Set if an operation generates a carry to or a borrow from a destination operand.
Parity	Set if the low-order 8 bits of the result of an operation contain an even number of set bits.
Auxiliary Carry	Set if an operation generates a carry to or a borrow from the low-order 4 bits of an operand. This flag is used for binary coded decimal (BCD) arithmetic.
Zero	Set if the result of an operation is 0.
Sign	Equal to the high-order bit of the result of an operation (0 is positive, 1 is negative).
Trap	If set, the processor generates a single-step interrupt after each instruction. A debugging program can use this feature to execute a program one instruction at a time.
Interrupt Enable	If set, interrupts are recognized and acted on as they are received. The bit can be cleared to turn off interrupt processing temporarily.
Direction	If set, string operations process down from high addresses to low addresses. If cleared, string operations process up from low addresses to high addresses.
Overflow	Set if the signed result of an operation is too large or too small to fit in the destination operand.

Although all flags serve a purpose, most programs require only the carry, zero, sign, and direction flags.
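
A brief sketch of the four commonly used flags in action. The label carried and the operand values are hypothetical; the flag states in the comments follow from the flag descriptions above.

```masm
        .MODEL  small
        .STACK
        .CODE
        .STARTUP
        mov     al, 0FFh
        add     al, 1           ; AL = 0: carry and zero flags are set
        jc      carried         ; taken, because the carry flag is set
        mov     bl, 0           ; skipped
carried:
        mov     al, 1
        sub     al, 2           ; AL = 0FFh: sign and carry (borrow) flags are set
        cld                     ; clear direction flag: strings process upward
        std                     ; set direction flag: strings process downward
        cld                     ; restore the usual upward direction
        .EXIT
        END
```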

Statements

Statements are the line-by-line components of source files. Each MASM statement specifies an instruction or directive for the assembler. Statements have up to four fields, as shown here:

[[name:]] [[operation]] [[operands]] [[;comment]]

The following list explains each field:

Field	Purpose

name	Labels the statement, so that instructions elsewhere in the program can refer to the statement by name. The name field can label a variable, type, segment, or code location.
operation	Defines the action of the statement. This field contains either an instruction or an assembler directive.
operands	Lists one or more items on which the instruction or directive operates.
comment	Provides a comment for the programmer. Comments are for documentation only; they are ignored by the assembler.

The following line contains all four fields:

mainlp: mov ax, 7 ; Load AX with the value 7

Here, mainlp is the label, mov is the operation, and ax and 7 are the operands, separated by a comma. The comment follows the semicolon.

All fields are optional, although certain directives and instructions require an entry in the name or operand field. Some instructions and directives place restrictions on the choice of operands. By default, MASM is not case sensitive.

Each field (except the comment field) must be separated from other fields by white-space characters (spaces or tabs). MASM also requires code labels to be followed by a colon, operands to be separated by commas, and comments to be preceded by a semicolon.

A logical line can contain up to 512 characters and occupy one or more physical lines. To extend a logical line into two or more physical lines, put the backslash character (\) as the last non-whitespace character before the comment or end of the line. You can place a comment after the backslash as shown in this example:

.IF (x > 0) \ ; X must be positive
&& (ax > x) \ ; Result from function must be > x
&& (cx == 0) ; Check loop counter, too
mov dx, 20h
.ENDIF

Multiline comments can also be specified with the COMMENT directive. The assembler ignores all text and code between the delimiters or on the same line as the delimiters. This example illustrates the use of COMMENT.

COMMENT ^ The assembler
ignores this text
^ mov ax, 1 and this code
