A common type of UNIX® application is a filter—a program that reads data from the stdin, processes it somehow, then writes the result to stdout.
In this chapter, we shall develop a simple filter, and learn how to read from stdin and write to stdout. This filter will convert each byte of its input into a hexadecimal number followed by a blank space.
%include 'system.inc' section .data hex db '0123456789ABCDEF' buffer db 0, 0, ' ' section .text global _start _start: ; read a byte from stdin push dword 1 push dword buffer push dword stdin sys.read add esp, byte 12 or eax, eax je .done ; convert it to hex movzx eax, byte [buffer] mov edx, eax shr dl, 4 mov dl, [hex+edx] mov [buffer], dl and al, 0Fh mov al, [hex+eax] mov [buffer+1], al ; print it push dword 3 push dword buffer push dword stdout sys.write add esp, byte 12 jmp short _start .done: push dword 0 sys.exit
In the data section we create an array called hex
.
It contains the 16 hexadecimal digits in ascending order. The array is followed by a
buffer which we will use for both input and output. The first two bytes of the buffer are
initially set to 0
. This is where we will write the two
hexadecimal digits (the first byte also is where we will read the input). The third byte
is a space.
The code section consists of four parts: Reading the byte, converting it to a hexadecimal number, writing the result, and eventually exiting the program.
To read the byte, we ask the system to read one byte from stdin, and store it in the first byte of the buffer
. The system returns the number of bytes read in EAX
. This will be 1
while data is
coming, or 0
, when no more input data is available.
Therefore, we check the value of EAX
. If it is 0
, we jump to .done
, otherwise we
continue.
Note: For simplicity sake, we are ignoring the possibility of an error condition at this time.
The hexadecimal conversion reads the byte from the buffer
into EAX
, or actually just
AL
, while clearing the remaining bits of EAX
to zeros. We also copy the byte to EDX
because we need to convert the upper four bits (nibble)
separately from the lower four bits. We store the result in the first two bytes of the
buffer.
Next, we ask the system to write the three bytes of the buffer, i.e., the two hexadecimal digits and the blank space, to stdout. We then jump back to the beginning of the program and process the next byte.
Once there is no more input left, we ask the system to exit our program, returning a zero, which is the traditional value meaning the program was successful.
Go ahead, and save the code in a file named hex.asm, then type the following (the ^D means press the control key and type D while holding the control key down):
% nasm -f elf hex.asm % ld -s -o hex hex.o % ./hex Hello, World! 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A Here I come! 48 65 72 65 20 49 20 63 6F 6D 65 21 0A ^D %
Note: If you are migrating to UNIX from MS-DOS®, you may be wondering why each line ends with
0A
instead of0D 0A
. This is because UNIX does not use the cr/lf convention, but a "new line" convention, which is0A
in hexadecimal.
Can we improve this? Well, for one, it is a bit confusing because once we have
converted a line of text, our input no longer starts at the beginning of the line. We can
modify it to print a new line instead of a space after each 0A
:
%include 'system.inc' section .data hex db '0123456789ABCDEF' buffer db 0, 0, ' ' section .text global _start _start: mov cl, ' ' .loop: ; read a byte from stdin push dword 1 push dword buffer push dword stdin sys.read add esp, byte 12 or eax, eax je .done ; convert it to hex movzx eax, byte [buffer] mov [buffer+2], cl cmp al, 0Ah jne .hex mov [buffer+2], al .hex: mov edx, eax shr dl, 4 mov dl, [hex+edx] mov [buffer], dl and al, 0Fh mov al, [hex+eax] mov [buffer+1], al ; print it push dword 3 push dword buffer push dword stdout sys.write add esp, byte 12 jmp short .loop .done: push dword 0 sys.exit
We have stored the space in the CL
register. We can
do this safely because, unlike Microsoft® Windows®, UNIX system calls
do not modify the value of any register they do not use to return a value in.
That means we only need to set CL
once. We have,
therefore, added a new label .loop
and jump to it for the
next byte instead of jumping at _start
. We have also added
the .hex
label so we can either have a blank space or a new
line as the third byte of the buffer
.
Once you have changed hex.asm to reflect these changes, type:
% nasm -f elf hex.asm % ld -s -o hex hex.o % ./hex Hello, World! 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A Here I come! 48 65 72 65 20 49 20 63 6F 6D 65 21 0A ^D %
That looks better. But this code is quite inefficient! We are making a system call for every single byte twice (once to read it, another time to write the output).