Benchmarking
Latest revision as of 18:09, 14 January 2012
Using the on-board real-time clock (RTC) we can run a lot of interesting tests on the Geneve, in both its MDOS and GPL modes. There are plenty of things to be found out by benchmarking. For example, you can
- determine the execution times of instructions
- check the memory speed
- find out about the usage of wait states in the system
- find out when a certain bit in the system is set, and when it is reset.
In the last case, for example, I wanted to find out whether the EO bit of the video processor's status register 2 is used in non-interlaced modes or not. The EO bit indicates which of the two screens is currently displayed (when using interlace). It was not clear, however, whether it is locked to a constant value when we do not use interlace or whether it still alternates between 0 and 1.
How can we do this by measuring time? The concept is to find out how long it takes until we have read the status register 10000 times with this bit set to 1. If the bit is locked to 0 this will take forever, so we need an upper bound for our iterations. If the bit is locked to 1 the loop will terminate quickly, so the number of iterations should not be too low.
What I found was that it takes the same time to get 10000 status reads with the bit set to 0 as with the bit set to 1. This means that the bit indeed continuously changes between 0 and 1.
Timer main program
For my benchmark experiments I always use the same frame program, which launches the tests that are included. I have split the program into parts in order to explain them one after another. The complete file can be downloaded as a file in TIFILES format or on a Sector Dump Format image:
- benchmark.tfi: http://www.mizapf.de/ti99/benchmark.tfi
- benchmark.dsk: http://www.mizapf.de/ti99/benchmark.dsk
       DEF  START
START  B    @GO
* F040 = ON-CHIP
CODE   EQU  >6040
REGS   EQU  >F000
SRAM   EQU  >5000
DRAM   EQU  >6000
PAD    EQU  >F030
BOX    EQU  >8080
RESTXT TEXT 'Result for test '
CRLF   BYTE 13,10
VALBUF DATA >3132,>3334,>3536,>3738
VIDXOP DATA 6
The list of tests follows. Each entry consists of two pointers, one to the start and one to the end of a test routine. A null value marks the end of the list.
TESTS  DATA T01,T01E
       DATA 0
Now for the main program. We set the mapper so that some areas are available as SRAM and some as DRAM, and so that the Peripheral Box is accessible as well.
COUNT  BYTE >30,>30,':',>20
H01    BYTE >01
H30    BYTE >30
H3A    BYTE >3A
SAVMAP DATA 0
GO     LWPI >F000
       LIMI 0
       LI   R1,>ED20
       MOV  R1,@>F112        4000=SRAM, 6000=DRAM
       LI   R1,>EEEF
       MOV  R1,@>F116        C000-FFFF = SRAM
       MOV  @>F114,@SAVMAP
       LI   R1,>BA00
       MOVB R1,@>F114        8000=BOX 4000
       LI   R12,>1EE0
*      SBZ  15               // wait state on
       SBO  15               // wait state off
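This is the test loop. It repeats until we read a null value from the list. Each test routine is copied to the target memory area before it is run. The pointer to that area is CODE. The listing above sets CODE to >6040, which lies in the area mapped as DRAM; if we set CODE to >F040 instead (see the comment in the listing), the tests will be run in the on-chip RAM.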
       LI   R15,TESTS
BLOOP  MOV  *R15+,@PARM
       JEQ  STOP
       MOV  *R15+,@PARM+2
       AB   @H01,@COUNT+1
       CB   @H3A,@COUNT+1
       JNE  B1
       AB   @H01,@COUNT
       MOVB @H30,@COUNT+1
B1     BL   @COPY
PARM   DATA 0,0
       BL   @GETTIM
       MOV  R7,R14
       BL   @CODE
       BL   @GETTIM
       MOV  R14,R6
       BL   @PRINT
       LIMI 2
       LIMI 0
       JMP  BLOOP
STOP   NOP
       LI   R12,>1EE0
       SBO  15
       MOV  @SAVMAP,@>F114
       BLWP @>0000
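The test number that is printed with each result is kept directly as two ASCII digits in COUNT, so it never has to be converted for output. A minimal Python model of that increment is shown here as an illustration; the function name `next_label` is made up and is not part of the benchmark program.

```python
def next_label(count):
    """Advance a two-character ASCII counter, e.g. b"07" -> b"08"."""
    tens, units = count[0], count[1]
    units += 1                  # AB @H01,@COUNT+1
    if units == 0x3A:           # CB @H3A,@COUNT+1: one past the character '9'
        tens += 1               # AB @H01,@COUNT
        units = 0x30            # MOVB @H30,@COUNT+1: back to '0'
    return bytes([tens, units])

print(next_label(b"00"))        # b'01'
print(next_label(b"09"))        # b'10'
```

Like the assembly code, this model does not handle more than 99 tests: after "99" the tens position would move past the digit characters.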
This subprogram prints the difference of R6 and R7. We expect both registers to contain time values in tenths of seconds, counted from the start of the current hour. One hour has 36000 tenths of a second (60 minutes times 60 seconds times 10 tenths), so the values range from 0 to 35999. No test should run longer than one hour.
*
* PRINT: Prints the difference of R6 and R7
* If R7<R6 (new hour), add 36000 to their difference
*
PRINT  MOV  R11,R13
       CLR  R0
       C    R6,R7
       JLE  P2
P1     LI   R0,36000
P2     S    R6,R7
       A    R0,R7
       LI   R0,>27
       LI   R1,RESTXT
       LI   R2,16
       XOP  @VIDXOP,0
       LI   R1,COUNT
       LI   R2,4
       XOP  @VIDXOP,0
       LI   R1,VALBUF+7
       MOV  R7,R3
       BL   @ITOA
       LI   R1,VALBUF+7
       S    R2,R1
       INC  R1
       LI   R0,>27
       XOP  @VIDXOP,0
       LI   R0,>27
       LI   R1,CRLF
       LI   R2,2
       XOP  @VIDXOP,0
       MOV  R13,R11
       RT
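The wrap-around rule used by PRINT can be checked with a small model. The following Python sketch is not part of the benchmark program; the function name `elapsed_tenths` is made up for illustration, and it assumes, as the text does, that no test runs longer than one hour.

```python
TENTHS_PER_HOUR = 60 * 60 * 10   # 36000

def elapsed_tenths(start, end):
    """Difference of two RTC readings in tenths of a second.

    Both readings count from the start of the current hour (0..35999).
    If the end reading is smaller than the start reading, the hour
    rolled over during the test, so one full hour is added.
    """
    diff = end - start
    if end < start:
        diff += TENTHS_PER_HOUR
    return diff

print(elapsed_tenths(1200, 1367))    # same hour: 167
print(elapsed_tenths(35990, 110))    # across the hour boundary: 120
```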
Get the time from the clock chip. We do not have high-precision timers here; in fact, we can only measure with a resolution of one tenth of a second. But this is no problem if we use loops in our tests. That way, the actual time per operation can be calculated afterwards. For example, if some command takes 1.2 microseconds, and you have it executed 10 million times, you will get a time period of 12 seconds.
*
* GETTIM: Gets the time as seconds and tenths
* Returns time in R7
*
* Uses R6-R10
*
GETTIM LI   R9,10
       MOVB @>F135,R7        // digit for 10 m
       SLA  R7,4
       SRL  R7,12
       MPY  R9,R7            // R8 contains minutes (tens) * 10
       MOVB @>F134,R6        // minutes (units)
       SLA  R6,4
       SRL  R6,12
       A    R6,R8            // add units
       MOV  R8,R7            // store in R7
       LI   R9,60
       MPY  R9,R7            // R8 now contains minutes since begin. of hour
       MOV  R8,R10           // as seconds. Save in R10.
       LI   R9,10
       MOVB @>F133,R7        // digit for 10 s
       SLA  R7,4
       SRL  R7,12
       MPY  R9,R7            // *10 (-> R7,R8)
       MOVB @>F132,R6        // seconds (units)
       SLA  R6,4
       SRL  R6,12
       A    R6,R8
       A    R10,R8           // add seconds to the above value
       MOV  R8,R7
       MPY  R9,R7            // R8 has seconds *10
       MOVB @>F131,R6        // tenths
       SLA  R6,4
       SRL  R6,12
       A    R6,R8            // add tenths
       MOV  R8,R7            // now in R7: number of 10ths seconds in this hour
       RT
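As a cross-check of the arithmetic in GETTIM, here is the same conversion modelled in Python. It is only a sketch: the names `tenths_in_hour` and `mask_digit` are invented here, and the five arguments stand for the digits read from >F135 down to >F131 in the listing.

```python
def mask_digit(byte_value):
    """Keep only the low nibble of a byte, like the SLA 4 / SRL 12 pair."""
    return byte_value & 0x0F

def tenths_in_hour(min_tens, min_units, sec_tens, sec_units, tenths):
    """Combine the five clock digits into tenths of a second since the full hour."""
    minutes = min_tens * 10 + min_units       # first MPY by 10, then add the units
    seconds = minutes * 60                    # MPY by 60, saved in R10
    seconds += sec_tens * 10 + sec_units      # add the seconds digits
    return seconds * 10 + tenths              # final MPY by 10, then add the tenths

print(tenths_in_hour(5, 9, 5, 9, 9))          # 35999, the largest possible reading
print(tenths_in_hour(0, 1, 2, 3, 4))          # 1 min 23.4 s -> 834
```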
Convert our binary value to a string of ASCII so that we can output it on the screen.
*
* Integer to ASCII
* R1 = Pointer of target buffer
* R3 = 16 bit value
* Returns: R2: length of number
*
ITOA   LI   R8,10
       CLR  R2
       MOV  R3,R5
ITOAL  CLR  R4
       DIV  R8,R4            // R5=number mod 10
       SLA  R5,8
       AI   R5,>3000
       MOVB R5,*R1
       DEC  R1
       INC  R2
       MOV  R4,R5
       JNE  ITOAL
       RT
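ITOA produces the digits from right to left, which is why PRINT passes a pointer to the last byte of VALBUF and afterwards computes the start of the string from the returned length. The Python model below illustrates this; it is a sketch only, and the name `itoa_backwards` is an assumption made for this example.

```python
def itoa_backwards(value, buf, last_index):
    """Write the decimal digits of value into buf, ending at last_index.

    Digits are produced least significant first, so the buffer is
    filled from right to left. Returns the number of digits written.
    """
    length = 0
    pos = last_index
    while True:
        value, digit = divmod(value, 10)   # DIV: quotient and remainder
        buf[pos] = 0x30 + digit            # ASCII '0' plus the digit
        pos -= 1
        length += 1
        if value == 0:
            break
    return length

buf = bytearray(b"12345678")               # same initial content as VALBUF
n = itoa_backwards(167, buf, 7)
first = 7 - n + 1                          # PRINT: VALBUF+7 - length + 1
print(buf[first:first + n].decode())       # 167
```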
This subprogram copies the test routine into the target memory location.
*
* Copy into test area
*
COPY   MOV  *R11+,R0
       MOV  *R11+,R2
       LI   R1,CODE
C1     MOV  *R0+,*R1+
       C    R0,R2
       JLE  C1
       RT
The test routines follow. You can see a sample below. To add a test, just add the pointers to the start and end of the routine to the list above. You can add the routines as text to this file, or you can use COPY directives to let the assembler include the code.
Determining the video interrupt rate
* Wait in a loop until the desired number of
* interrupts have occurred
T01    LIMI 0
       CLR  R12
       SBO  2                enable VDP interrupt propagation through 9901
       LI   R0,>8170         VReg 1 contains a flag to enable vertical sync interrupt
       SWPB R0
       MOVB R0,@>F102
       SWPB R0
       MOVB R0,@>F102
       LI   R0,>8980         VReg 9 contains flags to set 192/212 lines, NTSC/PAL, interlace/non-interlace
       SWPB R0
       MOVB R0,@>F102
       SWPB R0
       MOVB R0,@>F102
       MOV  @>0004,R6        Save INT2 vector to R6/R7
       MOV  @>0006,R7
       LI   R0,>F040         Set our own interrupt routine at INT2
       MOV  R0,@>0004
       LI   R0,INTR
       MOV  R0,@>0006
* We set our counter to 1000 interrupts
       LI   R3,1000
       MOV  R3,@ITER
* Arm the interrupts
       LIMI 2
* ... and wait in a loop until the counter is zero
T012   MOV  @ITER,R0
       JNE  T012
* Block the interrupts again
       LIMI 0
       MOV  R6,@>0004        Restore the vector
       MOV  R7,@>0006
T01E   RT
This is the interrupt routine which we have to install:
ITER   DATA 0                Counter
* Start of the routine
INTR   LIMI 0                Block all interrupts (see below, 1)
* Read the status registers. This will clear the flags. (2)
* One of the flags is in SREG1
       BL   @GETREG
       DATA 1
       BL   @GETREG
       DATA 0
       SLA  R0,1             Is the leftmost flag set (VSYNC)?
       JNC  SKIP             If not, skip the DEC command
       DEC  @ITER            Decrease our counter
SKIP   RTWP
* Routine to read a given status register into R0
* Register number must be in data line (LSB)
GETREG MOV  *R11+,R0
       ORI  R0,>8F00
       SWPB R0
       MOVB R0,@>F102
       SWPB R0
       MOVB R0,@>F102
       CLR  R0
       NOP
       MOVB @>F102,R0
       RT
Comments:
(1) We have to disable the interrupts here. The routine above has set the mask to 0002, which enables interrupts from other sources as well. If we do not block the interrupts, another interrupt request may interrupt this handler, and we will lose the return vector. The RTWP command at the end will restore the interrupt mask.
(2) We must clear the flag which caused the interrupt. Unless cleared, the INT line from the VDP will stay low (active) and will re-trigger the interrupt. So the first thing to do in the interrupt handler is to clear the origin of the interrupt.
Result of the video benchmark
Using the RTC we can determine the time which passed between the first and the 1000th interrupt:
- NTSC set (video register 9): 16.67 s
- PAL set (also in reg 9): 20.0 s
Accordingly, we get 60 Hz for the NTSC and 50 Hz for the PAL setting. The timing is affected neither by the number of display lines (192 or 212) nor by the interlace mode (on or off).
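The frequencies follow directly from the measured times: the rate is the number of counted interrupts divided by the elapsed time. As a quick check in Python (the helper name `interrupt_rate` is made up for this illustration):

```python
def interrupt_rate(interrupts, seconds):
    """Average number of interrupts per second over the measured period."""
    return interrupts / seconds

# Measured times for 1000 interrupts, taken from the results above
for label, seconds in (("NTSC", 16.67), ("PAL", 20.0)):
    print(label, round(interrupt_rate(1000, seconds), 2), "Hz")
```

For the two measurements this gives about 59.99 Hz and 50.0 Hz, which matches the nominal 60 Hz and 50 Hz within the tenth-of-a-second resolution of the clock.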