
9 May 2004

Hex dump revisited

If anyone is reading this website regularly, I apologize for the long dry spell. I'll try to post more regularly in the future. Also, I noticed that the "index" link on each screenshot page was broken; I've now fixed that. (So that must have been why the "404" page was one of the most frequently visited pages on this site. Oops.)

I have a new code project and a new screenshot. The code doesn't do anything except prove that my new hex-dump routine works; after it runs, the screen looks like the new screenshot.

Project structure

The latest project is the most complex so far. Most of my projects have been a single boot sector, the source code for which came in a single assembly-language file. This project consists of no fewer than eleven files:

  • make.bat. Running this batch file assembles all the source-code files into one image file and writes the image file onto the floppy disk in Drive A.
  • karig.asm. This includes all the other source-code files:
    • macros.asm. I defined macros that let me use Forth words in my assembly language code.
    • memmap.asm. My definitions for fixed regions of memory go into this file.
    • boot.asm. The boot sector. This includes one or the other of:
      • realexp.asm. I'll keep this file around in case I need to try something in real mode — for example, to find out what information a BIOS routine can provide.
      • protmode.asm. This file sets up both protected mode and the graphic screen.
    • loader.asm. Code that sets up the system but that won't fit into the boot sector goes into this file.
    • screen.asm. This file contains routines needed to print characters to the graphic screen. This includes:
      • font.bin. This is the raw bitmap data for the system font.
    • dump.asm. The code to dump sixteen bytes of memory to the screen is stored here.

As you can see, I've added a lot more code than I normally do between postings. I need to take some time to explain all of it, because the new files introduce a number of special features in Karig.

Macros and Korth words

I have written some NASM macros that simulate some of the words I will define for my Korth compiler (Korth being Karig's own variation on ColorForth). Before listing these, though, I need to explain the data-stack concept.

Most programming languages let you define subroutines (or procedures, or functions, or whatever a given language calls them), so that if a program needs to perform a certain task more than once, the program needs only one copy of the code needed to perform that task, but it can call the code many times — once for each time the task is to be performed. Most programming languages also provide a way to pass data to and get data back from a subroutine — for example, the number of a row of characters on the screen, to a routine that prints a string of text to any row.

Different programming languages pass data to and from subroutines in different ways. Many assembly-language programmers use the general-purpose registers for this. C, C++, and many other languages commonly pass arguments on the return stack (the processor's call stack, the same one that CALL and RET use). This means that return addresses and data are mixed on the same stack, and some juggling of the items on the stack is required whenever a subroutine needs to get at the data or store data to be returned, and whenever the stack has to be purged of data that is no longer needed. Forth, and languages based on Forth, define a second stack, the data stack, separate from the return stack. Return addresses go on the return stack, and data to be shared between subroutines go on the data stack, so there is no need to juggle return addresses to get at the data, or vice versa.

A Forth subroutine (or word) thus expects to find its input on the data stack. Because a word may require two or more discrete pieces of data, the word may still need to juggle the items on the stack a little bit, so Forth and its variants define words that the programmer can use to do this. My macros.asm includes eight macros to simulate eight Korth words, each of which manipulates the data stack (and, for _push and _pop, the return stack) in a particular way:

  • _dup ( n -- n n ) duplicates the top item on the stack. You'd do this when you are about to perform an operation that will alter the top value but you know you'll need the unaltered value later.
  • _drop ( n -- ) drops (discards) the top item from the stack, making the item underneath the new top item. You'd do this to purge the stack of data you no longer need.
  • _lit ( -- n ) takes a value and pushes that value onto the stack. For example, the statement _lit 0x100 pushes the value 0x100 (256) onto the stack.
  • _over ( m n -- m n m ) duplicates the second item on the stack and pushes the duplicate item onto the stack.
  • _nip ( m n -- n ) drops (discards) the second item from the stack. The item that was on top of the stack before remains at the top.
  • _swap ( m n -- n m ) swaps the top and second items on the stack. The second item ends up on top, and the top item ends up underneath. You'd use this when you still need the top item, but you need to use the second item first.
  • _push ( n -- ) removes the top item from the data stack and pushes it onto the return stack. You can sometimes use this instead of _dup and _swap to set a value aside, but use it with care: the return stack is where return addresses are stored, and if the computer reaches a return instruction before you remove the item from the return stack, the computer will use the item as a return address and run whatever code happens to be at that address.
  • _pop ( -- n ) removes the top item from the return stack and pushes it onto the data stack. Use this to retrieve an item you moved onto the return stack with _push.

Note the stack diagrams above, after each macro name, for example "( n -- n n )". The letters before the hyphens represent values expected to be on the stack before the macro is called; the letters after, values expected to be on the stack after the macro finishes. The diagram "( n -- )" means that a value "n" is expected to be on the stack but is then dropped; "( -- n )" means that no value is expected on the stack but that one is then added. Two instances of a given letter indicate two instances of the same value, so "( n -- n n )" indicates that the macro not only leaves the expected value "n" on the stack but also adds a copy of it. Different letters indicate different values; the diagram "( n -- m )" would indicate that the value "n" is replaced by the value "m". In the macro comments below, a second diagram in square brackets, such as "[ -- n ]", shows the effect on the return stack.

I should also mention that I implement the data stack as Chuck Moore does in ColorForth: The top item on the data stack is in EAX, and ESI points to the second item on the data stack. The data stack grows downward in memory, so that when something is pushed onto the stack, I subtract four from the address in ESI, and when an item is removed, I add four. Unlike Chuck Moore, I also use EBX as a spare register to hold data temporarily.

%macro	_dup 0
		; ( n -- n n )
		; Duplicate first item on data stack.
		lea	esi, [esi-4]
		mov	[esi], eax
		%endmacro

%macro	_drop 0
		; ( n -- )
		; Drop first item from data stack.
		; lodsd loads [ESI] into EAX and adds 4 to ESI
		; (assumes the direction flag is clear).
		lodsd
		%endmacro

%macro	_lit 1
		; ( -- n )
		; Push literal onto data stack.
		_dup
		mov	eax, %1
		%endmacro

%macro	_over 0
		; ( m n -- m n m )
		; Duplicate second item on data stack.
		_dup
		mov	eax, [esi+4]
		%endmacro

%macro	_nip 0
		; ( m n -- n )
		; Drop second item from data stack.
		lea	esi, [esi+4]
		%endmacro

%macro	_swap 0
		; ( m n -- n m )
		; Swap first and second items on data stack.
		mov	ebx, eax
		mov	eax, [esi]
		mov	[esi], ebx
		%endmacro

%macro	_push 0
		; ( n -- ) [ -- n ]
		; Move item from data stack to return stack.
		push	eax
		_drop
		%endmacro

%macro	_pop 0
		; ( -- n ) [ n -- ]
		; Move item from return stack to data stack.
		_dup
		pop	eax
		%endmacro
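To trace these macros without a debugger, it can help to model the same register conventions in a hosted language. The C sketch below is an illustration, not Karig code: the field tos stands in for EAX, the pointer sp stands in for ESI (always pointing at the second item, with the stack growing downward), and rstack/rsp are a hypothetical stand-in for the return stack. Each function mirrors the macro of the same name, with the corresponding instructions in its comment.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 64

/* Model of the Korth data-stack convention (illustrative only):
   tos ~ EAX (top item), sp ~ ESI (address of the second item),
   rstack/rsp ~ the return stack used by _push and _pop. */
typedef struct {
    uint32_t tos;
    uint32_t *sp;
    uint32_t mem[DEPTH];
    uint32_t rstack[DEPTH];
    int rsp;
} Korth;

static void k_init(Korth *k) {
    k->tos = 0;              /* like EAX at startup: an arbitrary value */
    k->sp = k->mem + DEPTH;  /* empty: the first _dup writes mem[DEPTH - 1] */
    k->rsp = 0;
}

/* _dup ( n -- n n ): lea esi,[esi-4] / mov [esi],eax */
static void k_dup(Korth *k) {
    assert(k->sp > k->mem);
    *--k->sp = k->tos;
}

/* _drop ( n -- ): lodsd, i.e. eax = [esi], esi += 4 */
static void k_drop(Korth *k) {
    assert(k->sp < k->mem + DEPTH);
    k->tos = *k->sp++;
}

/* _lit ( -- n ): _dup, then mov eax, n */
static void k_lit(Korth *k, uint32_t n) { k_dup(k); k->tos = n; }

/* _over ( m n -- m n m ): _dup, then mov eax,[esi+4] */
static void k_over(Korth *k) { k_dup(k); k->tos = k->sp[1]; }

/* _nip ( m n -- n ): lea esi,[esi+4] */
static void k_nip(Korth *k) {
    assert(k->sp < k->mem + DEPTH);
    k->sp++;
}

/* _swap ( m n -- n m ): exchange EAX with [ESI], EBX as scratch */
static void k_swap(Korth *k) {
    uint32_t ebx = k->tos;
    k->tos = *k->sp;
    *k->sp = ebx;
}

/* _push ( n -- ) [ -- n ]: push eax, then _drop */
static void k_push(Korth *k) { k->rstack[k->rsp++] = k->tos; k_drop(k); }

/* _pop ( -- n ) [ n -- ]: _dup, then pop eax */
static void k_pop(Korth *k) { k_dup(k); k->tos = k->rstack[--k->rsp]; }
```

For example, _lit 1, _lit 2, _over leaves ( 1 2 1 ) with the final 1 in tos and the 2 at sp[0], which is exactly what the diagram for _over promises.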

There are also some "flow control" macros and a lot of "conditional instruction" macros in macros.asm, which I thought I might use. So far I haven't needed any of them, so I won't go over them here; if they stay unused, I might remove them from macros.asm later.

Memory map

I moved equates for memory addresses into a separate file, memmap.asm. This file defines the locations of the stacks, of the video buffer into which text characters are drawn, and of the video RAM where the video buffer must be copied to update the screen. (This last is hardcoded for now; therefore this code will not work on all systems. I will eventually need to write setup code to retrieve the actual physical address of the video RAM.)

Boot sector

The boot-sector code now begins with two equates. Both of these govern how the boot sector is assembled.

%define BOOTING_FROM_FLOPPY     1
%define STARTING_PROTECTED_MODE 1

If BOOTING_FROM_FLOPPY is nonzero, then the code to load the rest of the system from the floppy disk is assembled. (If it is zero, then no code is assembled to load anything because I haven't written code to load the system from the hard disk yet.)

The boot sector includes either of two files — realexp.asm or protmode.asm. If STARTING_PROTECTED_MODE is zero, then realexp.asm is assembled in, and the boot sector never sets up protected mode or displays the graphic screen. If STARTING_PROTECTED_MODE is nonzero, then protmode.asm is assembled in, and the boot sector starts protected mode and displays a white screen in graphics mode.

The boot sector does its work in two stages. Boot_stage_1 copies the boot sector from 0x7C00, where the BIOS loads it, down to 0x0800 and continues execution from there.

[ORG 0x0800]
; (Boot sector is loaded at 0x7C00 but moves itself.)

[BITS 16]

boot_stage_1: ; CODE EXECUTED FROM ADDRESS 0x7C00

; ------ Set up real-mode segment registers.
		xor	ax, ax
		mov	ds, ax
		mov	es, ax
		mov	fs, ax
		mov	gs, ax

; ------ Set up real-mode call stack.
		cli
		mov	ss, ax
		mov	sp, 0x0800
		mov	di, sp
		sti

; ------ Move this boot sector lower in memory.
		cld
		mov	cx, 256
		mov	si, 0x7C00
		rep	movsw

; ------ Jump to new location.
		jmp	0:0x0800 + (boot_stage_2 - boot_stage_1)

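Boot_stage_2 loads the rest of the system (actually just 17 sectors, the rest of the first track, so that the boot sector and the system together come to exactly 9KB) and may or may not start protected mode (depending on the values of the two equates above).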

boot_stage_2: ; CODE EXECUTED FROM ADDRESS 0x0800

; System loader is fixed at 9KB (18 sectors), the size of a single track on
; a floppy disk.

%if BOOTING_FROM_FLOPPY

		mov	ax, 0x0200 + 17  ; function 2 -- read 17 sectors
		mov	bx, 0x0800 + 512 ; buffer follows boot sector
		mov	cx, 2            ; cylinder 0 (CH), sector 2 (CL)
		xor	dx, dx           ; head 0 (DH), drive 0 (DL)
		int	0x13
		; TODO: check for errors (the BIOS sets the carry flag on failure).

%else

		; Get partition data first!

%endif

%if STARTING_PROTECTED_MODE
		%include "protmode.asm"
%else
		%include "realexp.asm"
%endif

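; ------ Pad to 512 bytes and end with the boot signature 0x55, 0xAA
;        (required to make this a boot sector). The two-byte short jump
;        skips over the signature to the code that follows the sector.
		times	508 - ($-$$) db 0
		jmp	short $+4
		db	0x55, 0xAA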
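The single 17-sector read in boot_stage_2 works because logical sectors 1 through 17 all lie on cylinder 0, head 0 of a 1.44 MB floppy. The C sketch below is not part of Karig; it assumes the standard 1.44 MB geometry (18 sectors per track, two heads, 512-byte sectors), does the usual LBA-to-CHS conversion that INT 0x13 function 2 needs, and computes where the loaded data ends in memory. The end address, 0x2C00, stays well below the first 64 KB boundary, which matters because floppy DMA transfers cannot cross one.

```c
#include <assert.h>
#include <stdint.h>

/* 1.44 MB 3.5-inch floppy geometry (assumed). */
enum { SECTORS_PER_TRACK = 18, HEADS = 2, SECTOR_SIZE = 512 };

typedef struct { unsigned cylinder, head, sector; } Chs;

/* Convert a logical block address (0 = boot sector) to CHS.
   BIOS sector numbers start at 1; cylinders and heads start at 0. */
static Chs lba_to_chs(unsigned lba) {
    Chs c;
    c.sector   = lba % SECTORS_PER_TRACK + 1;
    c.head     = (lba / SECTORS_PER_TRACK) % HEADS;
    c.cylinder = lba / (SECTORS_PER_TRACK * HEADS);
    return c;
}

/* Linear address just past the 17 sectors read to 0x0800 + 512. */
static uint32_t loader_end(void) {
    return 0x0800u + SECTOR_SIZE + 17u * SECTOR_SIZE;
}
```

LBA 1 maps to cylinder 0, head 0, sector 2 (the CX and DX values in the listing), and LBA 17 maps to sector 18, the last sector on that track.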

Experiments in real mode

If I want to try something in real mode (usually involving the BIOS), I set STARTING_PROTECTED_MODE to zero and put my test code into realexp.asm. This file contains the code I've already written to dump memory to the screen in real mode (see the original hex-dump entries, Part 1 and Part 2). I'll need this file to work out how to write startup code that gets the BIOS to report more about the hardware. (Of course, the file cannot get too large, because the code assembled from it has to fit within the boot sector.)

Starting protected mode

The file protmode.asm contains boot-sector code to set up protected mode and the graphic display. First, it sits and waits for three seconds before continuing, to give the floppy disk controller time to finish before turning off interrupts.

; ------ Delay for three seconds.
;        (Gives floppy controller time to finish BEFORE
;        we clear interrupts and enter protected mode.)
		xor	cx, cx
		xor	dx, dx
		mov	ah, 1
		int	0x1A     ; set system timer to zero

	.zz:	xor	ah, ah
		int	0x1A     ; read tick count into CX:DX
		cmp	dx, 18*3 ; Timer ticks about 18.2 times a second.
		jb	.zz      ; unsigned compare on the tick count
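The constant 18*3 rounds the tick rate down. The BIOS tick is derived from the 8253/8254 timer's 1193182 Hz input clock divided by 65536, so 54 ticks is slightly under three seconds, and 55 ticks would be needed to reach a full three. A quick C check of that arithmetic (an illustration only; the function names are hypothetical):

```c
#include <assert.h>

/* The timer runs from a 1193182 Hz clock; the BIOS counts one tick
   every 65536 of those cycles (about 18.2065 ticks per second). */
#define PIT_INPUT_HZ     1193182.0
#define TICKS_PER_SECOND (PIT_INPUT_HZ / 65536.0)

/* Seconds that pass while the BIOS tick count advances by `ticks`. */
static double ticks_to_seconds(unsigned ticks) {
    return ticks / TICKS_PER_SECOND;
}

/* Smallest tick count that covers at least `seconds` seconds. */
static unsigned ticks_for(double seconds) {
    unsigned t = (unsigned)(seconds * TICKS_PER_SECOND);
    return (ticks_to_seconds(t) < seconds) ? t + 1 : t;
}
```

For a startup delay the difference hardly matters, but it shows why the comment says "about" 18.2.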

Then I enable the graphic display. Note that I just activate VESA mode 0x111 (640x480, 16-bit color) here. I assume that the machine offers mode 0x111, that it offers a linear framebuffer (a single contiguous stretch of video memory for the entire screen), and that I already know the physical address of the linear framebuffer (0xE0000000). Proper setup code would check this more carefully (for example, by calling VBE function 0x4F01 to get the mode's details, including the framebuffer address), or at least be prepared to print an error message if I can't use the video mode I want.

; ------ Enable graphic screen: 640x480, 64K colors
		mov	bx, 0x4111 ; mode 0x111, linear, clear memory
		mov	ax, 0x4F02
		int	0x10
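The mode number in BX combines the VESA mode with flag bits: bit 14 requests the linear framebuffer, and bit 15, when set, tells the BIOS not to clear video memory. So 0x4111 means mode 0x111 with a linear framebuffer and cleared memory. In a 16-bit mode such as 0x111, each pixel holds 5 bits of red, 6 of green, and 5 of blue, so white is 0xFFFF. The C sketch below illustrates both points; it assumes each scan line is exactly 640 * 2 bytes, whereas real setup code should use the bytes-per-scan-line value the BIOS reports for the mode.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in the VBE function 0x4F02 mode number (BX). */
#define VBE_LINEAR_FB  (1u << 14)  /* use the linear framebuffer */
#define VBE_DONT_CLEAR (1u << 15)  /* keep video memory contents */

/* Geometry assumed here for mode 0x111 (no scan-line padding). */
#define SCREEN_W 640u
#define SCREEN_H 480u
#define BYTES_PP 2u
#define PITCH    (SCREEN_W * BYTES_PP)

/* Pack 8-bit red/green/blue into a 5:6:5 pixel. */
static uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Byte offset of pixel (x, y) from the start of the framebuffer. */
static uint32_t pixel_offset(uint32_t x, uint32_t y) {
    return y * PITCH + x * BYTES_PP;
}
```

A pixel's physical address would then be 0xE0000000 plus pixel_offset(x, y), under the same assumptions the code above already makes.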

Now I enable the A20 line (so that I can place my screen buffer in high memory — at address 0x200000). Note that I disable interrupts and do not re-enable them. I have no interrupt handlers set up, and I have no further need for the BIOS routines from this point on.

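; ------ Enable A20 line. (Method used in ColorForth)
		cli
		in	al, 0x70 ; disable NMI (bit 7 of CMOS index port)
		or	al, 0x80
		out	0x70, al

		mov	al, 0xD1 ; keyboard controller: write output port
		out	0x64, al
	.20:	in	al, 0x64 ; wait until input buffer is empty
		and	al, 2
		jnz	.20
		mov	al, 0x4B ; new output-port value (A20 bit set)
		out	0x60, al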
		; No "sti" here.

Now I can enter protected mode.

; ------ Load GDT and enter protected mode.
		lgdt	[gdt]
		mov	eax, cr0
		or	al, 1
		mov	cr0, eax
		jmp	dword 8:pmstart
[BITS 32]
pmstart:
		mov	eax, dseg-gdt
		mov	ds, eax
		mov	es, eax
		mov	fs, eax
		mov	gs, eax
		mov	ss, eax
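The far jump uses selector 8, and the data-segment registers are loaded with dseg-gdt. A protected-mode selector packs a descriptor index into bits 15 through 3, a table indicator into bit 2 (0 for the GDT), and the requested privilege level into bits 1 and 0. Because each GDT descriptor is eight bytes long, a GDT selector with privilege level 0 is simply the descriptor's byte offset into the table, which is why a byte difference like dseg-gdt works as a selector (assuming gdt labels the start of the descriptor table). Selector 8 is therefore descriptor 1, the first entry after the null descriptor. A small C sketch of the encoding (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Selector layout: index in bits 15..3, table indicator in bit 2
   (0 = GDT, 1 = LDT), requested privilege level in bits 1..0. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl) {
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

/* Each GDT descriptor is 8 bytes, so this is the byte offset of
   descriptor `index` from the start of the table. */
static unsigned descriptor_offset(unsigned index) {
    return index * 8u;
}
```

With ti and rpl both zero, make_selector(i, 0, 0) and descriptor_offset(i) are always equal.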

Now that I am running 32-bit code, I can set up my 32-bit Korth registers and jump to the next stage in setting up the system. Note that I am setting up two stacks here: ESP points to gods, the return stack, and ESI points to godd, the data stack. (The "god" prefix refers to the "graphic output display" task. Karig will take after ColorForth and run two tasks, the "main" task and the "god" or display task, and the first task to run will be the "god" task.)

; ------ Set up Korth machine for the "god" task: return stack and data stack.
		mov	esp, gods
		mov	esi, godd

; ------ Finish setting up the system.
		jmp	loader ; defined in memmap.asm

Loader: experiments in protected mode

Any experimental code to try things out and display the results on the graphic display will go into the file loader.asm for now. Code in this file will be assembled outside the boot sector, so I am free to write a relatively large amount of code here.

This file contains the code needed to test the new memory-dump code I wrote to work with the graphic display. I'll get back to this test code later.

Screen font

I'm still using the same font I introduced in the entry "Making a font." The font data is stored in the file font.bin, which screen.asm imports.

I'm also still using the same font-printing routines I introduced in "Testing the font."

screen.asm

The screen.asm file represents my first attempt at defining an API (application programming interface) for Karig, that is, a set of routines that a programmer can call on when writing his own code to run on Karig. This implies a distinction between routines written with that programmer in mind and routines that the programmer should not touch because they are part of the inner workings of Karig itself. In other words, an API implies a distinction between "public" routines and "private" ones.

I wanted a way to distinguish "public" or API routines in a file from "private" routines that should never be called except by other routines in the same file. NASM does not provide "public" and "private" namespaces like, say, C++. So I can't have the assembler help me to ensure that a label is never referenced from another source-code file. However, NASM does offer global and local labels, and it allows the dollar sign ("$") to be part of a label. I decided to mark "private" labels with a dollar sign.

A source-code file with both "public" (API) routines and "private" routines would be arranged with the API routines near the top of the file and the other routines near the bottom. Between the two groups would be a global label derived from the file's name and ending with a dollar sign, for example "scr$" in screen.asm or "dump$" in dump.asm. Private routines would all be placed below this global label, and each would be marked with a local label. For example, the routine to print a single character is marked with the local label ".glyph", so a public routine, which sits above scr$, has to call it by its full name: call scr$.glyph. Thus the very name of the routine is a reminder that it should not be called except from within screen.asm, the file in which it is defined. This should help me keep the code from getting entangled as I continue to add code and features to Karig.

The first two API routines in screen.asm are:

  • cls ( -- ) clears the screen (fills the video buffer with white pixels).
  • refresh ( -- ) refreshes the screen (copies the video buffer into the video RAM).

T — the text register

This is new. I reserve eighty bytes of memory for what I call the text register, or just "T" for short. It has room for eighty characters, because it stands in for a row of characters on the screen. To print to the screen, you clear out the contents of T, append characters to whatever is in T already, and then call tprint to print the text to a particular row on the screen. I have no provision for printing to a specific column on the screen, nor do I plan to add such a provision; I wanted the API here to be as simple as possible.

I will probably add a tcopy routine later so that the contents of T can be copied to a buffer in memory somewhere. This would allow me to use T as the functional equivalent of "standard output" on POSIX-compatible systems such as Unix — the contents of T would have somewhere to go other than straight to the screen.

The basic routines to clear T, add text to T, and print the contents of T to the screen are in screen.asm.

  • tsize ( -- n ) Returns the capacity of T in bytes. This is currently fixed at 80 — the number of characters that can be printed on one row on the screen.
  • tcount ( -- n ) Returns the number of characters in T.
  • tclear ( -- ) Sets the number of characters in T to zero and fills T with space characters.
  • tapps ( p s -- ) Copies a string (at address p, of size s) into T. If there is not enough room left in T for "s" characters, tapps returns an error (the carry is set) and no characters are copied into T.
  • tapp1 ( c -- ) Copies a single character (in AL, i.e., the first byte in the top item on the stack) into T. If T is full, tapp1 returns an error (the carry is set) and the character is not copied into T.
  • tapp2 ( c -- ) Copies two characters (in AX, i.e., the first two bytes in the top item on the stack) into T. If there is not enough room left in T for two more characters, tapp2 returns an error (the carry is set) and the content of T is unchanged.
  • tapp4 ( c -- ) Copies four characters (in EAX, i.e., the top item on the stack) into T. If there is not enough room left in T for four more characters, tapp4 returns an error (the carry is set) and the content of T is unchanged.
  • tprint ( r -- ) Prints the contents of T to the video buffer at the given row. If the row number is not in the range 0 to 29, then the video buffer is unchanged, and tprint exits with an error (the carry is set).
  • tfit ( s -- ) Simply returns with an error (sets the carry flag) if there is not enough room in T for "s" more characters.
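To make the contract of these routines concrete, here is a hypothetical C model of T. It is not the assembly in screen.asm: the names mirror the API above, each function returns true where the real routine sets the carry flag, and screen is a text-only stand-in for the video buffer. It assumes, as the descriptions above say, that a failed append leaves T unchanged, and that tapp1, tapp2, and tapp4 take characters from the low-order bytes of the top stack item first, as x86's little-endian byte order suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define T_SIZE 80   /* tsize */
#define ROWS   30   /* valid rows for tprint: 0..29 */

static char t_buf[T_SIZE];          /* the text register T */
static unsigned t_count;            /* characters currently in T */
static char screen[ROWS][T_SIZE];   /* stand-in for the video buffer */

static unsigned tsize(void)  { return T_SIZE; }
static unsigned tcount(void) { return t_count; }

/* tclear ( -- ): empty T and fill it with spaces. */
static void tclear(void) {
    memset(t_buf, ' ', T_SIZE);
    t_count = 0;
}

/* tfit ( s -- ): error if fewer than s slots remain. */
static bool tfit(unsigned s) { return s > T_SIZE - t_count; }

/* tapps ( p s -- ): append all s bytes from p, or none of them. */
static bool tapps(const char *p, unsigned s) {
    if (tfit(s))
        return true;
    memcpy(t_buf + t_count, p, s);
    t_count += s;
    return false;
}

/* Append the n low-order bytes of c, lowest byte first. */
static bool tappn(uint32_t c, unsigned n) {
    char bytes[4];
    for (unsigned i = 0; i < n; i++)
        bytes[i] = (char)((c >> (8 * i)) & 0xFF);
    return tapps(bytes, n);
}

static bool tapp1(uint32_t c) { return tappn(c, 1); }  /* AL  */
static bool tapp2(uint32_t c) { return tappn(c, 2); }  /* AX  */
static bool tapp4(uint32_t c) { return tappn(c, 4); }  /* EAX */

/* tprint ( r -- ): copy T to row r; error and no change if r > 29. */
static bool tprint(unsigned r) {
    if (r >= ROWS)
        return true;
    memcpy(screen[r], t_buf, T_SIZE);
    return false;
}
```

Typical use follows the pattern described above: tclear, a few appends, then tprint with a row number.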

The new dump routine

I won't go over all the code in dump.asm here. I'll just mention the one API routine in the file:

  • dump16 ( r p -- r+1 p+16 ) Clears T, then dumps the contents of sixteen bytes into T (starting at address p), then prints the result to row r. It adds one to the row number and sixteen to the address, so that dump16 can be called repeatedly (up to 30 times) without the caller having to adjust the stack between calls.
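This entry doesn't spell out the exact layout dump16 uses for each row, so the following C sketch assumes a common one: an eight-digit hex address, sixteen hex bytes, and the same bytes as printable ASCII, which fits in 75 columns, within T's 80. The function mirrors dump16's stack effect by updating the row and address through pointers; mem is a simulated address space and rows is a hypothetical stand-in for the 30 screen rows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROWS      30
#define ROW_CHARS 80

static char rows[ROWS][ROW_CHARS + 1];   /* stand-in for screen rows */

/* Hypothetical layout: "AAAAAAAA  hh hh ... hh  cccccccccccccccc" */
static void format_row(char *line, uint32_t addr, const uint8_t *p) {
    int n = sprintf(line, "%08X  ", (unsigned)addr);
    for (int i = 0; i < 16; i++)
        n += sprintf(line + n, "%02X ", (unsigned)p[i]);
    line[n++] = ' ';                      /* second space before ASCII */
    for (int i = 0; i < 16; i++)          /* non-printables shown as '.' */
        line[n++] = (p[i] >= 0x20 && p[i] < 0x7F) ? (char)p[i] : '.';
    line[n] = '\0';                       /* 75 characters in all */
}

/* Mirrors dump16 ( r p -- r+1 p+16 ); *addr is an offset into mem. */
static void dump16(unsigned *r, uint32_t *addr, const uint8_t *mem) {
    assert(*r < ROWS);   /* the real dump16 does not check this yet */
    format_row(rows[*r], *addr, mem + *addr);
    *r += 1;
    *addr += 16;
}
```

Because the row and address advance on every call, dumping 128 bytes is just eight calls in a row, exactly as in the test code that follows.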

Test code

The code that verifies that my new dump code works is in loader.asm. It simply clears the screen, pushes row 0 and the address of some sample text, calls the new dump routine eight times to print eight rows to the video buffer, and refreshes the screen:

		call	cls
		_lit	0
		_lit	sampletext
		call	dump16
		call	dump16
		call	dump16
		call	dump16
		call	dump16
		call	dump16
		call	dump16
		call	dump16
		call	refresh

; ------ Halt computer.
		jmp	short $

sampletext:
		db	"This is a sample of text. "
		db	"The quick brown fox jumps over the lazy dog. "
		db	"PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS. "

		dd	0x01234567, 0x89ABCDEF
		db	0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF

The result is the screenshot I mentioned at the top of this page.
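The sample data is exactly 128 bytes (26 + 45 + 41 bytes of text, 8 bytes of dd values, and 8 bytes of db values), so the eight dump16 calls cover it exactly. The final row also shows x86 byte order: NASM stores each dd value least significant byte first, so 0x01234567 should appear as 67 45 23 01, while the db bytes appear in the order written. The C sketch below rebuilds the same bytes to check this arithmetic; it is an illustration, not part of Karig.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the test data the way NASM lays it out in memory. */
static size_t build_sample(uint8_t *out) {
    static const char *const texts[] = {
        "This is a sample of text. ",
        "The quick brown fox jumps over the lazy dog. ",
        "PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS. ",
    };
    static const uint32_t dwords[] = { 0x01234567u, 0x89ABCDEFu };
    static const uint8_t bytes[] = {
        0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF
    };
    size_t n = 0;
    for (size_t i = 0; i < 3; i++) {
        size_t len = strlen(texts[i]);
        memcpy(out + n, texts[i], len);
        n += len;
    }
    for (size_t i = 0; i < 2; i++)      /* dd: least significant byte first */
        for (int b = 0; b < 4; b++)
            out[n++] = (uint8_t)(dwords[i] >> (8 * b));
    memcpy(out + n, bytes, sizeof bytes);
    return n + sizeof bytes;
}
```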

To do

The code in this project makes a lot of assumptions about the display. The code often assumes that the screen has 640x480 resolution, that each pixel on the screen needs two bytes (16 bits), that each character is always eight pixels wide and sixteen pixels high, that the screen always contains thirty rows and eighty columns, and that the physical address for the video RAM (or linear framebuffer) is 0xE0000000. Therefore the code in this project won't work on some computers. I need to go over this code again at some point and rewrite it so that it can work in VESA video modes other than 0x111 (640x480, 16-bit color).
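One way to remove these assumptions would be to derive the text grid and buffer size from the mode's actual geometry instead of hard-coding them. The C sketch below is hypothetical: the Mode struct and its fields are illustrative stand-ins for values that setup code would read from the VESA BIOS. With the 8x16 font, mode 0x111 gives the familiar 80x30 grid, while an 800x600 16-bit mode such as 0x114 would give 100 columns and 37 rows.

```c
#include <assert.h>

/* Hypothetical mode description, as setup code might fill it in. */
typedef struct {
    unsigned width, height;   /* pixels */
    unsigned bits_per_pixel;
    unsigned pitch;           /* bytes per scan line; may include padding */
} Mode;

enum { GLYPH_W = 8, GLYPH_H = 16 };   /* the current system font */

static unsigned text_cols(const Mode *m) { return m->width / GLYPH_W; }
static unsigned text_rows(const Mode *m) { return m->height / GLYPH_H; }

/* Size of an off-screen buffer laid out like the framebuffer. */
static unsigned long buffer_bytes(const Mode *m) {
    return (unsigned long)m->pitch * m->height;
}
```

Code such as tsize and tprint's row check would then read these values rather than the constants 80 and 29.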

I need to revamp my setup code so that it does a little more exploring of the hardware. It doesn't even check to ensure that there is more than one megabyte of RAM; it just assumes that the RAM is there.

There are also a number of vulnerabilities in the code. For example, dump16 doesn't yet verify that the row number you pass to it is valid. Ideally, it should check the row number first and, if the row is out of range, leave both T and the video buffer unchanged.
