Benchmarking
Using the on-board real-time clock (RTC), we can run a lot of interesting tests on the Geneve, in both its MDOS and GPL modes. Benchmarking reveals plenty of details about the system. For example, you can
- determine the execution times of instructions
- check the memory speed
- find out about the usage of wait states in the system
- find out when a certain bit in the system is set, and when it is reset.
In the last case, for example, I wanted to find out whether the EO bit of the video processor's status register 2 is used in non-interlaced modes or not. The EO bit indicates which of the two screens is currently displayed (when using interlace). It was not clear, however, whether it is locked to a constant value when we do not use interlace or whether it still alternates between 0 and 1.
How can we do this by measuring time? The idea is to measure how long it takes until 10000 reads of the status register have returned this bit set to 1. If the bit is locked to 0, this would take forever, so we need an upper bound for our iterations. If the bit is locked to 1, the test terminates very quickly, so the number of iterations should not be too low.
What I found was that it takes the same time to get 10000 status reads with the bit set to 0 as with the bit set to 1. This means that the bit does indeed change continuously between 0 and 1.
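The counting scheme can be tried out away from the machine. The following Python sketch is only a model of the idea, not code from the benchmark program: `count_reads_until`, `read_status`, and the mask value `EO_MASK` are names and values assumed for illustration, and the number of reads stands in for the elapsed time that the real test measures with the RTC.

```python
from itertools import cycle


def count_reads_until(read_status, mask, want_set, target=10000, max_reads=1_000_000):
    """Poll read_status() until `target` reads showed the masked bit in the
    wanted state, or until `max_reads` polls were made (the upper bound).

    Returns (hits, reads); `reads` models the time the test takes.
    """
    hits = 0
    reads = 0
    while hits < target and reads < max_reads:
        reads += 1
        if bool(read_status() & mask) == want_set:
            hits += 1
    return hits, reads


EO_MASK = 0x02  # assumed bit position, for illustration only

# A status source whose bit alternates on every read.
alternating = cycle([0x00, EO_MASK])
print(count_reads_until(lambda: next(alternating), EO_MASK, True))

# A status source whose bit is locked to 0: only the upper bound stops the loop.
print(count_reads_until(lambda: 0x00, EO_MASK, True, max_reads=50000))
```

With an alternating source, counting the 1 state and counting the 0 state need practically the same number of reads. A locked bit shows up either as a run that ends after exactly `target` reads or as one that only stops at the upper bound.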
Timer main program
For my benchmark experiments I always use the same frame program, which launches the tests that are included in it. I have split the program into parts in order to explain them one after another. The complete file can be downloaded as a TIFILES file or on a sector dump image.
DEF START
START B @GO
* F040 = ON-CHIP
CODE EQU >6040
REGS EQU >F000
SRAM EQU >5000
DRAM EQU >6000
PAD EQU >F030
BOX EQU >8080
RESTXT TEXT 'Result for test '
CRLF BYTE 13,10
VALBUF DATA >3132,>3334,>3536,>3738
VIDXOP DATA 6
Here is the list of tests. Each entry consists of two pointers, one to the start and one to the end of a test routine. A null value marks the end of the list.
TESTS DATA T01,T01E
DATA 0
Now for the main program. We will set the mapper so that some areas are available as SRAM, some as DRAM, and also the Peripheral Box will be available.
COUNT BYTE >30,>30,':',>20
H01 BYTE >01
H30 BYTE >30
H3A BYTE >3A
SAVMAP DATA 0
GO LWPI >F000
LIMI 0
LI R1,>ED20
MOV R1,@>F112 4000=SRAM, 6000=DRAM
LI R1,>EEEF
MOV R1,@>F116 C000-FFFF = SRAM
MOV @>F114,@SAVMAP
LI R1,>BA00
MOVB R1,@>F114 8000=BOX 4000
LI R12,>1EE0
* SBZ 15 // wait state on
SBO 15 // wait state off
This is the test loop. It repeats until we read a null value from the list. Each test routine is copied to the target memory area before it is run. The pointer to that area is CODE. The listing above sets CODE to >6040, which is in the DRAM area; if we set CODE to >F040 instead, the tests will be run in the on-chip RAM.
LI R15,TESTS
BLOOP MOV *R15+,@PARM
JEQ STOP
MOV *R15+,@PARM+2
AB @H01,@COUNT+1
CB @H3A,@COUNT+1
JNE B1
AB @H01,@COUNT
MOVB @H30,@COUNT+1
B1 BL @COPY
PARM DATA 0,0
BL @GETTIM
MOV R7,R14
BL @CODE
BL @GETTIM
MOV R14,R6
BL @PRINT
LIMI 2
LIMI 0
JMP BLOOP
STOP NOP
LI R12,>1EE0
SBO 15
MOV @SAVMAP,@>F114
BLWP @>0000
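This subprogram prints the difference of R6 and R7. We expect both registers to contain time values in tenths of seconds, counted from the start of the current hour. One hour has 36000 tenths (60 minutes times 60 seconds times 10 tenths), so the values run from 0 to 35999. If the hour changes during a test, the end time is lower than the start time, and we add 36000 to the difference. No test should take longer than one hour.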
*
* PRINT: Prints the difference of R6 and R7
* If R7<R6 (new hour), add 36000 to their difference
*
PRINT MOV R11,R13
CLR R0
C R6,R7
JLE P2
P1 LI R0,36000
P2 S R6,R7
A R0,R7
LI R0,>27
LI R1,RESTXT
LI R2,16
XOP @VIDXOP,0
LI R1,COUNT
LI R2,4
XOP @VIDXOP,0
LI R1,VALBUF+7
MOV R7,R3
BL @ITOA
LI R1,VALBUF+7
S R2,R1
INC R1
LI R0,>27
XOP @VIDXOP,0
LI R0,>27
LI R1,CRLF
LI R2,2
XOP @VIDXOP,0
MOV R13,R11
RT
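The rollover handling in PRINT is easy to check with a few numbers. This is a minimal Python model of the comparison and correction, not part of the program; the function name `elapsed_tenths` is made up for this illustration.

```python
TENTHS_PER_HOUR = 60 * 60 * 10  # 36000 tenths of a second per hour


def elapsed_tenths(start, end):
    """Return end - start in tenths of a second.

    Both values count from the start of the current hour (0..35999).
    If the end value is lower than the start value, the hour has
    changed during the test and one full hour is added.
    """
    diff = end - start
    if end < start:
        diff += TENTHS_PER_HOUR
    return diff


print(elapsed_tenths(100, 267))   # test inside one hour
print(elapsed_tenths(35990, 5))   # test that crosses the hour boundary
```

The model handles a single hour change only, which is why no test should run longer than one hour.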
Get the time from the clock chip. We do not have high precision timers here; in fact, the resolution is only one tenth of a second. This is no problem, however, if we use loops in our tests: the time for a single operation can be calculated afterwards from the total time. For example, if some command takes 1.2 microseconds, and you have it executed 10 million times, you will get a time period of 12 seconds.
*
* GETTIM: Gets the time as seconds and tenths
* Returns time in R7
*
* Uses R6-R10
*
GETTIM LI R9,10
MOVB @>F135,R7 // digit for 10 m
SLA R7,4
SRL R7,12
MPY R9,R7 // R8 contains minutes (tens) * 10
MOVB @>F134,R6 // minutes (units)
SLA R6,4
SRL R6,12
A R6,R8 // add units
MOV R8,R7 // store in R7
LI R9,60
MPY R9,R7 // R8 now contains minutes since begin. of hour
MOV R8,R10 // as seconds. Save in R10.
LI R9,10
MOVB @>F133,R7 // digit for 10 s
SLA R7,4
SRL R7,12 //
MPY R9,R7 // *10 (-> R7,R8)
MOVB @>F132,R6 // seconds (units)
SLA R6,4
SRL R6,12
A R6,R8 //
A R10,R8 // add seconds to the above value
MOV R8,R7
MPY R9,R7 // R8 has seconds *10
MOVB @>F131,R6 // tenths
SLA R6,4
SRL R6,12
A R6,R8 // add tenths
MOV R8,R7 // now in R7: number of 10ths seconds in this hour
RT
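The arithmetic in GETTIM can be followed more easily in a high-level form. The Python sketch below is a model under assumptions: the function name `rtc_to_tenths` and its argument names are invented here, and the five arguments stand for the raw bytes read from >F135 down to >F131. As in the assembly code, only the low four bits of each byte are used as the digit.

```python
def low_nibble(byte):
    """Keep the low four bits, like the SLA 4 / SRL 12 pair does."""
    return byte & 0x0F


def rtc_to_tenths(min10, min1, sec10, sec1, tenths):
    """Combine the five clock digits into tenths of a second since the
    start of the current hour (0..35999)."""
    minutes = low_nibble(min10) * 10 + low_nibble(min1)
    seconds = minutes * 60 + low_nibble(sec10) * 10 + low_nibble(sec1)
    return seconds * 10 + low_nibble(tenths)


# 01:23.4 after the hour; the upper nibbles are deliberately filled with junk
print(rtc_to_tenths(0xF0, 0xF1, 0xF2, 0xF3, 0xF4))
```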
Convert our binary value to an ASCII string so that we can output it on the screen. The digits are written from right to left, starting at the end of the buffer.
*
* Integer to ASCII
* R1 = Pointer of target buffer
* R3 = 16 bit value
* Returns: R2: length of number
*
ITOA LI R8,10
CLR R2
MOV R3,R5
ITOAL CLR R4
DIV R8,R4 // R5=number mod 10
SLA R5,8
AI R5,>3000
MOVB R5,*R1
DEC R1
INC R2
MOV R4,R5
JNE ITOAL
RT
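The right-to-left conversion can be modelled in a few lines of Python. This is a sketch for illustration only; `itoa_right_to_left` is a made-up name, and the byte array plays the role of VALBUF with the pointer starting at its last position.

```python
def itoa_right_to_left(value, buffer, last_index):
    """Write the decimal digits of `value` into `buffer`, ending at
    `last_index` and moving towards lower indices.

    Returns the number of digits written.
    """
    pos = last_index
    length = 0
    while True:
        value, remainder = divmod(value, 10)  # quotient and value mod 10
        buffer[pos] = 0x30 + remainder        # ASCII digit
        pos -= 1
        length += 1
        if value == 0:
            break
    return length


buf = bytearray(b"12345678")         # same initial contents as VALBUF
n = itoa_right_to_left(167, buf, 7)  # start at the last byte
print(buf[7 - n + 1:])               # the text starts at last_index - n + 1
```

The start of the text is found in the same way as in PRINT: take the address of the last byte, subtract the length, and add one.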
This subprogram copies the test routine into the target memory location.
*
* Copy into test area
*
COPY MOV *R11+,R0
MOV *R11+,R2
LI R1,CODE
C1 MOV *R0+,*R1+
C R0,R2
JLE C1
RT
What follows are the test routines; you can see a sample below. To include a test, just add the pointers to the start and end of the routine to the list above. You can add the routines as text to this file, or you can use COPY directives to let the assembler add the code.
Determining the video interrupt rate
* Wait in a loop until the desired number of
* interrupts have occurred
T01 LIMI 0
CLR R12
SBO 2 enable VDP interrupt propagation through 9901
LI R0,>8170 VReg 1 contains a flag to enable vertical sync interrupt
SWPB R0
MOVB R0,@>F102
SWPB R0
MOVB R0,@>F102
LI R0,>8980 VReg 9 contains flags to set 192/212 lines, NTSC/PAL, interlace/non-interlace
SWPB R0
MOVB R0,@>F102
SWPB R0
MOVB R0,@>F102
MOV @>0004,R6 Save the interrupt vector at >0004 to R6/R7
MOV @>0006,R7
LI R0,>F040 Install our own workspace (>F040) and routine in that vector
MOV R0,@>0004
LI R0,INTR
MOV R0,@>0006
* We set our counter to 1000 interrupts
LI R3,1000
MOV R3,@ITER
* Arm the interrupts
LIMI 2
* ... and wait in a loop until the counter is zero
T012 MOV @ITER,R0
JNE T012
* Block the interrupts again
LIMI 0
MOV R6,@>0004 Restore the vector
MOV R7,@>0006
T01E RT
This is the interrupt routine which we have to install:
ITER DATA 0 Counter
* Start of the routine
INTR LIMI 0 Block all interrupts (see below, 1)
* Read the status registers. This will clear the flags. (2)
* One of the flags is in SREG1
BL @GETREG
DATA 1
BL @GETREG
DATA 0
SLA R0,1 Is the leftmost flag set (VSYNC)?
JNC SKIP If not, skip the DEC command
DEC @ITER Decrease our counter
SKIP RTWP
* Routine to read a given status register into R0
* Register number must be in data line (LSB)
GETREG MOV *R11+,R0
ORI R0,>8F00
SWPB R0
MOVB R0,@>F102
SWPB R0
MOVB R0,@>F102
CLR R0
NOP
MOVB @>F102,R0
RT
Comments:
(1) We have to disable the interrupts here. The routine above has set the mask to 0002, which enables interrupts from other sources as well. If we do not block the interrupts, another interrupt request may interrupt this handler, and we will lose the return vector. The RTWP command at the end will restore the interrupt mask.
(2) We must clear the flag which caused the interrupt. Unless cleared, the INT line from the VDP will stay low (active) and will re-trigger the interrupt. So the first thing to do in the interrupt handler is to clear the origin of the interrupt.
Result of the video benchmark
Using the RTC we can determine the time which passed between the first and the 1000th interrupt:
- NTSC set (video register 9): 16.67 s
- PAL set (also in reg 9): 20.0 s
Accordingly, we get 60 Hz for the NTSC and 50 Hz for the PAL setting. The timing is not affected by the number of display lines (192 or 212) and not by interlace mode (on or off).
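The step from a measured time to a rate, or to the time for a single operation, is a simple division. The following Python sketch shows it; the function names are made up for this illustration, and the inputs are the event count and the printed result in tenths of a second.

```python
def events_per_second(count, tenths):
    """Rate in Hz for `count` events measured over `tenths` tenths of a second."""
    return count / (tenths / 10.0)


def seconds_per_event(count, tenths):
    """Average duration of one event (or one loop iteration) in seconds."""
    return (tenths / 10.0) / count


# 1000 interrupts in 20.0 s (PAL setting)
print(events_per_second(1000, 200))

# The example from the timer section: 10 million iterations in 12 s
print(seconds_per_event(10_000_000, 120))
```

Since the clock only resolves tenths of a second, a single reading has an uncertainty of about 0.1 s. A reading of 167 tenths for 1000 interrupts, for instance, yields about 59.9 Hz, which is consistent with 60 Hz within that uncertainty.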