31 January 2004

A tiny compiler, part 5

I'm still working on my prototype compiler. I've created a keyboard loop, a routine to add bytes to the code space, and a framework for passing words to the compiler routines. Now I'm writing and testing another routine the compiler will need.

The "compiler routines" will be compile, define, execute, and hex. (I've already added these as do-nothing routines; I'll flesh them out later.) The hex routine expects the (magenta) word that the user has just typed to be a two-digit hexadecimal number, so hex needs code to convert those two characters into a byte value. This is what hex_to_byte does.

Code

As usual, I've prepared boot-sector source code that you can try out.

hex_to_byte

The hex_to_byte routine has to take the two characters in newword and convert them into a single byte in AL.

The routine caches the two characters in BX and starts to work on the first (left) character. (Remember that newword stores the new word backwards, so the first character is in the high byte — in BH.)

hex_to_byte:
		mov	bx, [newword]
		mov	al, bh

The routine passes AL to a private subroutine, to convert the hexadecimal numeral into a binary value between 0 and 15. The character being converted was the first of the two, representing the high four bits of the byte value, so the resulting binary value is shifted into the high four bits of AL, and the result is stashed in AH for use later.

		call	.1
		shl	al, 4
		mov	ah, al

The routine then retrieves the second of the two characters (representing the low four bits of the byte value), converts the character, and combines the result with the high four bits stashed in AH. The byte value is now in AL, so the job is finished.

		mov	al, bl
		call	.1
		or	al, ah
		ret

Here is the private subroutine that hex_to_byte uses. It expects a hexadecimal character in AL and does no validation. If the character is '9' or below, I treat it as a numeral: I simply subtract from it the value for the numeral zero and return.

	.1:	cmp	al, '9'
		ja	.2
		sub	al, '0'
		ret

If the character is a letter, then I clear bit 5 with the mask 0x5F, which has no effect on an uppercase letter but will convert a lowercase letter into an uppercase one. (The mask also clears bit 7, which is already zero in any ASCII character.)

	.2:	and	al, 0x5F

Then I can convert the letter. 'A' becomes 0x0A, 'B' becomes 0x0B, and so on.

		sub	al, ('A'-10)
		ret
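
To check the logic on a host machine, the whole routine can be modeled in C. This is only a sketch and not part of the boot-sector source: the names hex_digit_value and hex_to_byte_model are made up for illustration, and the 16-bit argument stands in for the value that hex_to_byte loads from newword into BX (first typed character in the high byte).

```c
#include <stdint.h>

/* Hypothetical model of the private subroutine (.1/.2):
   one hexadecimal character in, a value from 0 to 15 out.
   Like the assembly, it does not validate its input. */
uint8_t hex_digit_value(uint8_t c)
{
    if (c <= '9')                       /* cmp al, '9' / ja .2 (unsigned) */
        return (uint8_t)(c - '0');      /* numeral: subtract '0' */
    c &= 0x5F;                          /* clear bit 5: 'a'..'f' -> 'A'..'F' */
    return (uint8_t)(c - ('A' - 10));   /* 'A' -> 10, ..., 'F' -> 15 */
}

/* Hypothetical model of hex_to_byte. `word` plays the role of BX:
   high byte (BH) = first typed character, low byte (BL) = second. */
uint8_t hex_to_byte_model(uint16_t word)
{
    uint8_t first  = (uint8_t)(word >> 8);     /* mov al, bh */
    uint8_t second = (uint8_t)(word & 0xFF);   /* mov al, bl */
    uint8_t high   = (uint8_t)(hex_digit_value(first) << 4);  /* shl al, 4 */
    return (uint8_t)(high | hex_digit_value(second));         /* or al, ah */
}
```

For example, hex_to_byte_model(0x3130) models BH = '1' and BL = '0', and returns 0x10.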

Test code

The test code comes with a list of sixteen newwords, which are copied one at a time (last to first) into newword. The hex_to_byte routine converts each newword into a byte value, which is stored into a sixteen-byte buffer at address 0x8000. After all sixteen bytes are stored, dump_16 displays the contents of the buffer.

		mov	bx, 15
	.1:	push	bx
		add	bx, bx
		mov	ax, [newwords+bx]
		mov	[newword], ax
		call	hex_to_byte
		pop	bx
		mov	[0x8000+bx], al
		dec	bx
		jns	.1

		mov	bx, 0x8000
		call	dump_16

		jmp	short $

Results

The newwords look like this:

newwords:
		db	'01','23','45','67','89','AB','CD','EF'
		db	'65','74','83','92','a1','b0','cf','de'

Each of these newwords is stored with bytes reversed, as they would be if they had been typed in — that is, '01' is what newword would be if you typed in 10, and so on — so the buffer into which the generated byte values are stored looks like this:

0000:8000: 10 32 54 76 98 BA DC FE 56 47 38 29 1A 0B FC ED | >2Tv....VG8)....
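
The expected buffer contents can be cross-checked independently of the assembly. The sketch below is an assumption-laden illustration, not the test code itself: it lays the sixteen newwords out as the db lines do, swaps each pair back into typed order, and lets the standard C function strtoul do the conversion. The function name expected_bytes is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* The sixteen test words, byte for byte as the two db lines store them
   (each pair holds its characters in reverse of typed order). */
static const char newwords[] = "0123456789ABCDEF" "65748392a1b0cfde";

/* Fill out[0..15] with the byte each stored pair represents once its
   two characters are put back into typed order. */
void expected_bytes(uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        char typed[3];
        typed[0] = newwords[2 * i + 1];  /* higher address: first typed char (BH) */
        typed[1] = newwords[2 * i];      /* lower address: second typed char (BL) */
        typed[2] = '\0';
        out[i] = (uint8_t)strtoul(typed, NULL, 16);  /* base 16, either case */
    }
}
```

Under these assumptions the sixteen bytes come out as 10 32 54 76 98 BA DC FE 56 47 38 29 1A 0B FC ED. The second byte, 0x32, is the '2' that appears in the ASCII column of the dump.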

Still to come

I still have to do the following:

  • Create a data stack, so that words can pass data to one another. In addition, the BIOS routines tend to trash AX and SI, critical parts of the data stack, so I must also write code to save and restore these registers.

  • Finish the define, compile, execute, and hex routines.

  • Add a few more words to the dictionary. Each new word offers a compiler service, as I discussed in the entry on threads. (I know, I said that c, [c_comma] would be the only word in the dictionary, but these compiler services are intended to expose parts of the compiler so that new words can take advantage of them. A user could not add these words to the dictionary, so I have to.)

  • Write a new entry (or two) discussing how the user can add words to this compiler's dictionary.

One last thing: I implied in the beginning that I was going to provide a way for this compiler to store and use precode, but I don't think I'll do that. I'll start work on the real compiler soon, the 32-bit version, and it will have a different design.

  • The keyboard loop will simply store text into the current text block, not compile it immediately. The keyboard loop will be part of the text editor that will serve as the user's way of interacting with Karig. (I intend Karig to be a system for organizing text as well as for coding.)

  • The editor will include code to store text either as ASCII text or as precode. The compiler will not handle the creation of precode at all.

  • The compiler itself will be as it is in Chuck Moore's colorForth — separate from the editor, and designed solely to translate precode into machine code as quickly as possible.

It's the end of the month, and I wanted to add one more entry for January 2004.
