Difference between revisions of "Geneve wait state generation"

From Ninerpedia

Revision as of 23:40, 19 September 2011

Within the Geneve, wait states can be generated to slow down operation so that timing constraints are met. The TMS 9995 CPU can itself create a wait state on every external memory access if it is initialized accordingly in hardware (READY high while RESET goes from low to high). This feature is not used in the Geneve, because those wait states could not be turned off.

Instead, the Geneve generates wait states externally. The gate array circuit creates wait states in certain situations. To insert a wait state, the READY line of the CPU must be pulled low (cleared). Apart from the permanent wait state described above, the CPU itself does not create any wait states. This matters when only internal accesses are performed: if code runs within the internal CPU RAM, wait states have no effect. They only affect external memory accesses.

One wait state lasts exactly one cycle, which is 333.3 nanoseconds. Three million of them add up to one second.
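The relation between wait states and elapsed time can be sketched as follows. This is only an illustration of the arithmetic above; the names `CYCLES_PER_SECOND`, `CYCLE_NS`, and `wait_states_to_ns` are made up for this example.

```python
# Assumption taken from the text: three million cycles per second,
# so one cycle (and one wait state) lasts about 333.3 ns.
CYCLES_PER_SECOND = 3_000_000
CYCLE_NS = 1e9 / CYCLES_PER_SECOND


def wait_states_to_ns(wait_states: int) -> float:
    """Return the time in nanoseconds added by the given number of wait states."""
    return wait_states * CYCLE_NS


if __name__ == "__main__":
    # Time added by six wait states.
    print(f"{wait_states_to_ns(6):.1f} ns")
```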

DRAM access

For each DRAM access, the gate array creates 1 wait state. Word accesses (as in CLR or MOV) therefore create two wait states. If the instruction and both word operands are in DRAM, we get at least 6 wait states, which can have a significant impact on performance.

SRAM accesses are, by design, zero wait state accesses. Word operations in SRAM are still slower than in the internal CPU RAM: the internal RAM is organized as 128 words of 16 bits, so writing a word takes only one cycle, whereas it takes two cycles in external memory, which is connected via the 8-bit data bus.
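The word access costs described for the three memory types can be tabulated in a small model. This is a sketch under the assumption that every access takes one base cycle plus its wait states; the figure of four cycles for a DRAM word access is derived from that assumption and is not stated explicitly above. All names are hypothetical.

```python
# Accesses needed to transfer one 16-bit word:
# the internal RAM is 16 bits wide, external memory sits on the 8-bit bus.
ACCESSES_PER_WORD = {"onchip": 1, "sram": 2, "dram": 2}

# Wait states the gate array adds per access (extra wait states switched off).
WAIT_STATES_PER_ACCESS = {"onchip": 0, "sram": 0, "dram": 1}


def word_access_wait_states(memory: str) -> int:
    """Wait states created by one word access in the given memory type."""
    return ACCESSES_PER_WORD[memory] * WAIT_STATES_PER_ACCESS[memory]


def word_access_cycles(memory: str) -> int:
    """Cycles for one word access, assuming one base cycle per access."""
    return ACCESSES_PER_WORD[memory] + word_access_wait_states(memory)


if __name__ == "__main__":
    for memory in ("onchip", "sram", "dram"):
        print(memory, word_access_cycles(memory), "cycles per word")
```

An instruction word plus two word operands in DRAM make three word accesses, which gives the 6 wait states mentioned above.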

Memory access

We can use software to turn on additional wait states for memory access using the flag bit at CRU address 1EFE. When active (low), wait states are created on every external memory access, either reading or writing. For CPU RAM accesses (memory locations F000 to F0FB) no wait states are created.

Some things should be considered:

  • For each external memory access, 2 wait state cycles are created. This means that for word operations, 4 wait states are produced.
  • Wait states created by this method are not added to the DRAM wait states. With them enabled, memory operations in SRAM and DRAM run at the same speed.
  • To calculate the total number of cycles spent on one operation, the memory accesses needed to determine the source and destination locations must also be counted. In particular, the address calculation may be quite simple when registers are used, and very complex when the contents of a register must first be read and then added to a value which must be read from the following memory location.

Assuming that the following line and the registers reside in CPU RAM, the instruction

MOV R2,R3

takes three cycles (read MOV (including the values 2 and 3), read value at location of R2, write value to location of R3). This does not change when wait states are active. In contrast,

MOV @SRAMLOC,@SRAMLOC+2

takes 7 cycles without wait states (read MOV, read SRAMLOC, read SRAMLOC+2, read byte from SRAMLOC, read byte from SRAMLOC+1, write byte to SRAMLOC+2, write byte to SRAMLOC+3) and 15 cycles with wait states (add 2 wait states for each of the last four operations).
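The two cycle counts above can be reproduced with a small model. It counts only the memory accesses listed in the text and assumes one base cycle per access; the function and constant names are invented for this sketch, and the model is not a full timing description of the CPU.

```python
def word_access_cycles(memory: str, extra_wait_states: bool) -> int:
    """Cycles for one word access in 'onchip', 'sram', or 'dram' memory."""
    if memory == "onchip":
        return 1  # 16 bits wide, and never slowed down by wait states
    if extra_wait_states:
        wait_states = 2  # replaces the DRAM wait state, is not added to it
    else:
        wait_states = 1 if memory == "dram" else 0
    # Two byte accesses on the 8-bit bus, each with its wait states.
    return 2 * (1 + wait_states)


def instruction_cycles(word_accesses, extra_wait_states: bool) -> int:
    """Sum the cycles of all word accesses of one instruction."""
    return sum(word_access_cycles(m, extra_wait_states) for m in word_accesses)


# MOV R2,R3 with code and registers in CPU RAM:
# read the instruction, read R2, write R3.
MOV_R2_R3 = ["onchip", "onchip", "onchip"]

# MOV @SRAMLOC,@SRAMLOC+2 with code in CPU RAM:
# read the instruction and both address words, then read and write one word in SRAM.
MOV_SRAM = ["onchip", "onchip", "onchip", "sram", "sram"]

if __name__ == "__main__":
    for name, accesses in (("MOV R2,R3", MOV_R2_R3), ("MOV @SRAMLOC,@SRAMLOC+2", MOV_SRAM)):
        print(name, instruction_cycles(accesses, False), instruction_cycles(accesses, True))
```

Listing the accesses per instruction keeps the model close to the enumeration in the text, so other addressing modes can be tried by editing the lists.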

Video operation

As known from the TI-99/4A, accesses to the Video Display Processor (VDP) must be properly timed, since the VDP cannot keep up with the speed of the CPU. When bytes are written too quickly, some of them may be lost; when reading, the value may not reflect the current video RAM contents. Setting the address may also fail when writing too quickly. All this results from a missing synchronization link between VDP and CPU: the CPU cannot find out whether the video processor is ready for the next byte. (Note that for V9938 commands, a ready flag is available to determine whether a command has finished processing.)

The problem has become worse with the higher performance of the Geneve. Programs that worked well on the TI may therefore fail on the Geneve because of VDP overruns. For this reason, wait states may be inserted for video operations.

However, there is one thing to remember: Wait states can only be inserted into memory accesses.

This means that if we do not use memory accesses, the wait states have no effect. Access to the VDP ports does not count as an external memory access. If we write a program that resides completely in CPU RAM and operates on the VDP, we cannot slow it down with wait states. Therefore, video access should not be done from the CPU RAM.