Difference between revisions of "Benchmarking"

From Ninerpedia

Revision as of 16:49, 14 January 2012

Using the on-board real-time clock (RTC), we can run many interesting tests on the Geneve, in both its MDOS and GPL modes. Benchmarking can reveal a lot about the system. For example, you can

  • determine the execution times of instructions
  • check the memory speed
  • find out about the usage of wait states in the system
  • find out when a certain bit in the system is set, and when it is reset.

In the last case, for example, I wanted to find out whether the EO bit of the video processor's status register 2 is used in non-interlaced modes or not. The EO bit indicates which of the two screens is currently displayed (when using interlace). It was not clear, however, whether it is locked to a constant value when we do not use interlace or whether it still alternates between 0 and 1.

How can we do this by measuring time? The idea is to measure how long it takes until we have read the status register 10000 times with this bit set to 1. If the bit is locked to 0, this would take forever, so we need an upper bound on the number of iterations. If the bit is locked to 1, the loop terminates quickly, so the number of iterations should not be too low.

What I found was that it takes the same time to get 10000 status reads with the bit set to 0 as with the bit set to 1. This means that the bit does indeed keep alternating between 0 and 1.
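The measuring idea can be sketched on a modern machine as follows. This is a minimal Python model, not Geneve code: `read_bit` is a hypothetical stand-in for reading the EO bit from status register 2, `max_reads` is the upper bound mentioned above, and all names are my own.

```python
def count_reads(read_bit, wanted, target=10000, max_reads=1_000_000):
    """Read the bit until `wanted` has been seen `target` times.

    Returns the number of reads performed, or None if the upper bound
    `max_reads` is reached first (the bit seems locked to the other value).
    """
    hits = 0
    for reads in range(1, max_reads + 1):
        if read_bit() == wanted:
            hits += 1
            if hits == target:
                return reads
    return None


def make_toggling_bit():
    """Hypothetical bit source that alternates 0, 1, 0, 1, ..."""
    state = {"value": 1}

    def read_bit():
        state["value"] ^= 1
        return state["value"]

    return read_bit
```

In this model a toggling bit needs about the same number of reads for 0 as for 1, and roughly twice as many as a bit that is locked to the wanted value. A bit locked to the other value only ends at the upper bound.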

Timer main program

For my benchmark experiments I always use the same frame program, which launches the tests that are included in it. I have split the program into parts in order to explain them one after another. The complete file can be downloaded as a TIFILES file or on a sector dump image.

       DEF  START

START  B    @GO

* F040 = ON-CHIP
CODE   EQU  >6040
REGS   EQU  >F000
SRAM   EQU  >5000
DRAM   EQU  >6000
PAD    EQU  >F030
BOX    EQU  >8080

RESTXT TEXT 'Result for test '
CRLF   BYTE 13,10
VALBUF DATA >3132,>3334,>3536,>3738
VIDXOP DATA 6

Here is the list of tests. Each entry consists of two pointers, one to the start and one to the end of the test routine. A null value marks the end of the list.

TESTS  DATA T01,T01E
       DATA 0

Now for the main program. We set the mapper so that some areas are available as SRAM and some as DRAM, and so that the Peripheral Box is accessible as well.

COUNT  BYTE >30,>30,':',>20
H01    BYTE >01
H30    BYTE >30
H3A    BYTE >3A
SAVMAP DATA 0

GO     LWPI >F000
       LIMI 0

       LI   R1,>ED20
       MOV  R1,@>F112     4000=SRAM, 6000=DRAM
       LI   R1,>EEEF
       MOV  R1,@>F116     C000-FFFF = SRAM 

       MOV  @>F114,@SAVMAP
       LI   R1,>BA00
       MOVB R1,@>F114     8000=BOX 4000 

       LI   R12,>1EE0
*      SBZ  15          // wait state on
       SBO  15          // wait state off

This is the test loop. It repeats until we read a null value from the list. Each test routine is copied to the target memory area, whose address is given by CODE. With CODE set to >6040, as in the listing above, the tests run in DRAM; if we set CODE to >F040 instead, the tests run in the on-chip RAM.

       LI   R15,TESTS
BLOOP  MOV  *R15+,@PARM
       JEQ  STOP
       MOV  *R15+,@PARM+2
       AB   @H01,@COUNT+1
       CB   @H3A,@COUNT+1
       JNE  B1
       AB   @H01,@COUNT
       MOVB @H30,@COUNT+1 

B1     BL   @COPY
PARM   DATA 0,0
       BL   @GETTIM
       MOV  R7,R14
       BL   @CODE
       BL   @GETTIM
       MOV  R14,R6
       BL   @PRINT
       LIMI 2
       LIMI 0
       JMP  BLOOP
STOP   NOP
       LI   R12,>1EE0
       SBO  15
       MOV  @SAVMAP,@>F114
       BLWP @>0000
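The test number that is printed with each result is kept as two ASCII digits in COUNT and is advanced directly in that form. The following Python sketch models this step on a host machine; it is not part of the Geneve program, and the function name is my own.

```python
def next_count(count):
    """Advance a two-digit ASCII counter, for example b'09' -> b'10'."""
    tens, units = count[0], count[1]
    units += 1
    if units == 0x3A:      # one past ASCII '9'
        tens += 1
        units = 0x30       # back to ASCII '0'
    return bytes([tens, units])
```

Like the listing, this model does not handle an overflow of the tens digit, so it is only meant for fewer than 100 tests.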

This subprogram prints the difference between R7 and R6. We expect both registers to contain time values in tenths of a second, counted from the start of the current hour. The maximum value is therefore 36000 (60 minutes times 60 seconds times 10 tenths), so no test should run longer than one hour.

*
* PRINT: Prints the difference of R6 and R7
*        If R7<R6 (new hour), add 36000 to their difference
*
PRINT  MOV  R11,R13
       CLR  R0
       C    R6,R7
       JLE  P2  

P1     LI   R0,36000
P2     S    R6,R7
       A    R0,R7 

       LI   R0,>27
       LI   R1,RESTXT
       LI   R2,16
       XOP  @VIDXOP,0

       LI   R1,COUNT
       LI   R2,4
       XOP  @VIDXOP,0 

       LI   R1,VALBUF+7
       MOV  R7,R3
       BL   @ITOA
       LI   R1,VALBUF+7
       S    R2,R1
       INC  R1
       LI   R0,>27
       XOP  @VIDXOP,0 

       LI   R0,>27
       LI   R1,CRLF
       LI   R2,2
       XOP  @VIDXOP,0 

       MOV  R13,R11
       RT
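The rollover handling in PRINT can be checked with a small host-side model. This is only a sketch in Python, not part of the Geneve program; the function and constant names are my own.

```python
TENTHS_PER_HOUR = 60 * 60 * 10  # 36000


def elapsed_tenths(start, end):
    """Return end - start in tenths of a second, allowing one hour rollover."""
    diff = end - start
    if end < start:
        # The hour changed during the test, so the clock value wrapped around.
        diff += TENTHS_PER_HOUR
    return diff
```

For example, a test that starts at 35990 and ends at 15 has run for 25 tenths of a second, that is, 2.5 seconds.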

Get the time from the clock chip. We do not have a high-precision timer here; in fact, the resolution is only one tenth of a second. But this is no problem if we use loops in our tests, because the actual time can then be calculated afterwards. For example, if some instruction takes 1.2 microseconds and you execute it 10 million times, you get a period of 12 seconds.

*
* GETTIM: Gets the time in tenths of a second since the start of the hour
*         Returns time in R7
*
*         Uses R6-R10
*
GETTIM LI   R9,10
       MOVB @>F135,R7   // digit for 10 m
       SLA  R7,4
       SRL  R7,12
       MPY  R9,R7       // R8 contains minutes (tens) * 10
       MOVB @>F134,R6   // minutes (units)
       SLA  R6,4
       SRL  R6,12
       A    R6,R8       // add units
       MOV  R8,R7       // store in R7
       LI   R9,60
       MPY  R9,R7       // R8 now contains minutes since begin. of hour
       MOV  R8,R10      // as seconds. Save in R10.
       LI   R9,10
       MOVB @>F133,R7   // digit for 10 s
       SLA  R7,4
       SRL  R7,12       //
       MPY  R9,R7       // *10  (-> R7,R8)
       MOVB @>F132,R6   // seconds (units)
       SLA  R6,4
       SRL  R6,12
       A    R6,R8       //
       A    R10,R8      // add seconds to the above value
       MOV  R8,R7
       MPY  R9,R7       // R8 has seconds *10
       MOVB @>F131,R6   // tenths
       SLA  R6,4
       SRL  R6,12
       A    R6,R8       // add tenths
       MOV  R8,R7       // now in R7: number of 10ths seconds in this hour

       RT
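The arithmetic of GETTIM, and the calculation of a single iteration time from a measured loop, can be modelled on a host machine. This is a Python sketch under the assumption that each clock register delivers one decimal digit in its low nibble, as the listing suggests; the function names are my own.

```python
def tenths_in_hour(min_tens, min_units, sec_tens, sec_units, tenths):
    """Combine the five clock digits into tenths of a second since the hour began.

    Each argument is the raw byte read from one clock register. Only the
    low nibble holds the digit, so the high nibble is masked off.
    """
    digits = [b & 0x0F for b in (min_tens, min_units, sec_tens, sec_units, tenths)]
    minutes = digits[0] * 10 + digits[1]
    seconds = minutes * 60 + digits[2] * 10 + digits[3]
    return seconds * 10 + digits[4]


def time_per_iteration_us(elapsed_tenths, iterations):
    """Average time of one loop iteration in microseconds."""
    return elapsed_tenths * 100_000 / iterations
```

With the example from above, a loop of 10 million iterations that runs for 120 tenths of a second yields 1.2 microseconds per iteration.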

Convert our binary value to an ASCII string so that we can output it on the screen.

*
*  Integer to ASCII
*  R1 = Pointer to the last byte of the target buffer (digits are written right to left)
*  R3 = 16 bit value
*  Returns: R2: length of number
*
ITOA   LI   R8,10
       CLR  R2
       MOV  R3,R5
ITOAL  CLR  R4
       DIV  R8,R4       // R5=number mod 10
       SLA  R5,8
       AI   R5,>3000
       MOVB R5,*R1
       DEC  R1
       INC  R2
       MOV  R4,R5
       JNE  ITOAL
       RT
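The conversion writes the digits from right to left and returns their count, so that the caller can work out where the string starts. Here is a host-side Python sketch of the same idea; it is a model, not Geneve code, and the names are my own.

```python
def itoa(value, buf, pos):
    """Write the decimal digits of `value` into `buf`, ending at index `pos`.

    Digits are produced from least to most significant and stored from
    right to left. Returns the number of digits written.
    """
    length = 0
    while True:
        value, digit = divmod(value, 10)
        buf[pos] = 0x30 + digit   # ASCII '0' plus the digit
        pos -= 1
        length += 1
        if value == 0:
            return length
```

With an 8-byte buffer and `pos` set to 7, the string starts at index 8 minus the returned length, which corresponds to the pointer adjustment that PRINT does after calling ITOA.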

This subprogram copies the test routine into the target memory location.

*
* Copy into test area
*
COPY   MOV  *R11+,R0
       MOV  *R11+,R2
       LI   R1,CODE
C1     MOV  *R0+,*R1+
       C    R0,R2
       JLE  C1
       RT 

What follows are the test routines. You can see a sample below. To add a test, just add the pointers to the start and to the end of the routine to the list above. You can add the routines as text to this file, or you can use COPY directives to let the assembler include the code.

Determining the video interrupt rate

* Wait in a loop until the desired number of
* interrupts have occurred

T01    LIMI 0
       CLR  R12
       SBO  2               enable VDP interrupt propagation through 9901

       LI   R0,>8170        VReg 1 contains a flag to enable vertical sync interrupt
       SWPB R0
       MOVB R0,@>F102
       SWPB R0
       MOVB R0,@>F102

       LI   R0,>8980        VReg 9 contains flags to set 192/212 lines, NTSC/PAL, interlace/non-interlace
       SWPB R0
       MOVB R0,@>F102
       SWPB R0
       MOVB R0,@>F102

       MOV  @>0004,R6       Save INT2 vector to R6/R7
       MOV  @>0006,R7 

       LI   R0,>F040        Set our own interrupt routine at INT2
       MOV  R0,@>0004
       LI   R0,INTR
       MOV  R0,@>0006 

* We set our counter to 1000 interrupts
       LI   R3,1000
       MOV  R3,@ITER

* Arm the interrupts
       LIMI 2

* ... and wait in a loop until the counter is zero
T012   MOV  @ITER,R0
       JNE  T012

* Block the interrupts again
       LIMI 0
       MOV  R6,@>0004         Restore the vector
       MOV  R7,@>0006 

T01E   RT

This is the interrupt routine which we have to install:

ITER   DATA 0      Counter

* Start of the routine
INTR   LIMI 0      Block all interrupts (see below, 1)

* Read the status registers. This will clear the flags. (2)
* One of the flags is in SREG1
       BL   @GETREG
       DATA 1
       BL   @GETREG
       DATA 0
       SLA  R0,1         Is the leftmost flag set (VSYNC)?
       JNC  SKIP         If not, skip the DEC command
       DEC  @ITER        Decrease our counter 
SKIP   RTWP

* Routine to read a given status register into R0
* Register number must be in the data word after the call (LSB)

GETREG MOV  *R11+,R0
       ORI  R0,>8F00
       SWPB R0
       MOVB R0,@>F102
       SWPB R0
       MOVB R0,@>F102
       CLR  R0
       NOP
       MOVB @>F102,R0
       RT

Comments:

(1) We have to disable the interrupts here. The routine above has set the mask to 0002, which enables interrupts from other sources as well. If we do not block the interrupts, another interrupt request may interrupt this handler, and we will lose the return vector. The RTWP command at the end will restore the interrupt mask.

(2) We must clear the flag which caused the interrupt. Unless cleared, the INT line from the VDP will stay low (active) and will re-trigger the interrupt. So the first thing to do in the interrupt handler is to clear the origin of the interrupt.

Result of the video benchmark

Using the RTC we can determine the time which passed between the first and the 1000th interrupt:

  • NTSC set (video register 9): 16.67 s
  • PAL set (also in reg 9): 20.0 s

Accordingly, we get 60 Hz for the NTSC and 50 Hz for the PAL setting. The timing is not affected by the number of display lines (192 or 212) and not by interlace mode (on or off).
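The frequencies follow directly from the measured times. As a quick check, here is the calculation as a small Python sketch; the function name is my own.

```python
def interrupt_rate_hz(interrupts, elapsed_seconds):
    """Average interrupt rate in Hz over the measured period."""
    return interrupts / elapsed_seconds
```

For 1000 interrupts, 16.67 s gives about 60 Hz and 20.0 s gives 50 Hz.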