BASIC file formats
BASIC programs written for TI BASIC and Extended BASIC are not stored as plain text in memory. This differs from assembler programs, which are edited as text files and then assembled into a Tagged Object Code file.
Plain text would not be appropriate for BASIC. If the program were stored as plain text, the BASIC interpreter would have to parse each line at run time, identifying the commands and their arguments, before it could execute it. This is typical of today's scripting languages, but it would be far too slow here; TI BASIC and Extended BASIC are already quite slow compared with other platforms.
Instead, BASIC lines are tokenized. Each command, special character, or character sequence that has a meaning in BASIC is represented by a one-byte code, the token. Some examples:
Command | Token (hex) |
---|---|
NEW | 00 |
SAVE | 07 |
EDIT | 09 |
PRINT | 9c |
& | b8 |
"..." (quoted string) | c7 |
SEG$ | d8 |
VALIDATE | fe |
You can find a complete table here.
So let us take a simple BASIC line like
PRINT "HELLO"
There will not be a string like "PRINT" in memory, because the parser recognized this word as a command and replaced it with its token. Second, there is a string following the command, which is enclosed in quotes. The contents can be anything, so the parser must copy it into memory as is.
The line is therefore converted to the following byte sequence. The leading length byte counts all the bytes that follow it, including the terminating 00:
09 | 9c | c7 | 05 | 48 | 45 | 4c | 4c | 4f | 00 |
line length | PRINT | "..." | string length | H | E | L | L | O | end |
Sample program
Let's have a look at a real Extended BASIC program. This is output from TIImageTool showing the contents of a PROGRAM file.
000000: 00 3f 37 a7 37 98 37 d7 00 28 37 a9 00 1e 37 ac .?7.7.7..(7...7.
000010: 00 14 37 b2 00 0a 37 ca 02 8b 00 05 96 52 4f 57 ..7...7......ROW
000020: 00 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 .....ROW...1....
000030: 54 45 53 54 b4 52 4f 57 00 0e 8c 52 4f 57 be c8 TEST.ROW...ROW..
000040: 01 31 b1 c8 02 32 30 00 .1...20.
The numbers on the left (xxxxxx:) are the offsets from the beginning of the file. On the right we see the ASCII representation of the bytes, where unprintable characters are shown as dots. The offsets and the ASCII column are not part of the file; they are added for better readability.
No BASIC commands are visible in the ASCII column, which is what we should expect after the explanation above. Only variable names such as ROW and string contents such as TEST remain readable.
First, we strip the offsets and the ASCII column and add some line breaks to reveal the file structure. Where two bytes form a 16-bit word, we write them together.
003f 37a7 3798 37d7
0028 37a9
001e 37ac
0014 37b2
000a 37ca
02 8b 00
05 96 52 4f 57 00
17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00
0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00
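Stripping the offsets and the ASCII column can also be done programmatically. The following Python sketch assumes the dump layout shown above (an offset followed by a colon, up to 16 two-digit hex bytes, then the ASCII column); the function name `dump_to_bytes` is hypothetical and not part of TIImageTool.

```python
import re

DUMP = """\
000000: 00 3f 37 a7 37 98 37 d7 00 28 37 a9 00 1e 37 ac .?7.7.7..(7...7.
000010: 00 14 37 b2 00 0a 37 ca 02 8b 00 05 96 52 4f 57 ..7...7......ROW
000020: 00 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 .....ROW...1....
000030: 54 45 53 54 b4 52 4f 57 00 0e 8c 52 4f 57 be c8 TEST.ROW...ROW..
000040: 01 31 b1 c8 02 32 30 00 .1...20.
"""

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def dump_to_bytes(dump: str) -> bytes:
    """Recover the raw file bytes from a hex dump in the layout above."""
    data = bytearray()
    for row in dump.splitlines():
        _offset, colon, rest = row.partition(":")
        if not colon:
            continue  # not a dump row
        count = 0
        for field in rest.split():
            # Stop at the ASCII column or after 16 bytes. Caveat: a short
            # last row whose ASCII column happens to contain a two-character
            # hex-like field would be misread by this simple check.
            if count == 16 or not HEX_BYTE.fullmatch(field):
                break
            data.append(int(field, 16))
            count += 1
    return bytes(data)


data = dump_to_bytes(DUMP)
# Show the first 24 bytes as 16-bit words, as in the regrouped listing.
print(" ".join(data[i:i + 2].hex() for i in range(0, 24, 2)))
# 003f 37a7 3798 37d7 0028 37a9 001e 37ac 0014 37b2 000a 37ca
```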
The bytes themselves are unchanged; only their grouping differs. We can now analyse the contents of the file.
Meaning | Contents |
---|---|
Header | 003f 37a7 3798 37d7 |
Line Number Table | 0028 37a9 |
Line Number Table | 001e 37ac |
Line Number Table | 0014 37b2 |
Line Number Table | 000a 37ca |
Program lines | 02 8b 00 |
Program lines | 05 96 52 4f 57 00 |
Program lines | 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00 |
Program lines | 0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00 |
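The split shown in the table can be sketched in Python as follows. Several things here are assumptions rather than statements from the text above: that the second and third header words are the addresses of the last and first byte of the line number table (which yields the four entries seen in this sample), that each table entry is a line number followed by a pointer, and that the table entries appear in the same order as the program lines. The function name `split_program` is hypothetical.

```python
import struct

# The sample file, grouped as in the table above.
DATA = bytes.fromhex(
    "003f 37a7 3798 37d7"
    "0028 37a9 001e 37ac 0014 37b2 000a 37ca"
    "02 8b 00"
    "05 96 52 4f 57 00"
    "17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00"
    "0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00"
)


def split_program(data: bytes):
    """Split a PROGRAM file into header, line number table and lines."""
    # Header: four big-endian 16-bit words.
    header = struct.unpack(">4H", data[:8])
    _first, table_end, table_start, _last = header
    # ASSUMPTION: words 2 and 3 delimit the table; each entry is 4 bytes.
    n_entries = (table_end - table_start + 1) // 4
    table = [
        struct.unpack(">2H", data[8 + 4 * i:12 + 4 * i])
        for i in range(n_entries)
    ]
    # Program lines: a length byte, then that many bytes (ending in 00).
    lines = []
    pos = 8 + 4 * n_entries
    while pos < len(data):
        length = data[pos]
        lines.append(data[pos + 1:pos + 1 + length])
        pos += 1 + length
    return header, table, lines


header, table, lines = split_program(DATA)
for (number, pointer), body in zip(table, lines):
    print(number, hex(pointer), body.hex(" "))
```

In this sample the pointers increase by exactly the size of each stored line (length byte included), and the first header word equals the exclusive OR of the second and third (0x37a7 ^ 0x3798 = 0x003f). The latter looks like a checksum, although this particular file alone does not prove it.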
TODO: continue