Text-to-Speech

From Ninerpedia
Revision as of 20:23, 14 December 2014 by Stephen Shaw (talk | contribs) (enlarge text to speech article)

Text to Speech was a disk-based program, catalog number PHD5076, that permitted the TI-99/4A to speak any text supplied by an Extended BASIC program. It did not restrict speech to the built-in vocabulary of the speech synthesiser. Both the speech synthesiser peripheral and an Extended BASIC module were required.

Unfortunately, running a program from disk also required the Peripheral Expansion Box, a disk controller, a disk drive and expansion memory, which together made a costly purchase.

The Terminal Emulator II (TE2) module also permitted text to speech, without requiring the expensive peripherals. Like the TE2 module, the Text to Speech disk permitted intonation and pitch to be altered.

Extra CALLs (CALL LINKs)

The disk made a package of additional subroutines available to Extended BASIC programmers, each accessed with CALL LINK:

  • SETUP which is the Text to Speech—English initialization subroutine;
  • XLAT which is the text-to-allophone translator;
  • SPEAK which is the allophone-to-LPC (Linear Predictive Coding) translator.

Briefly, the steps involved in using these subroutines are initializing the system's expansion memory, loading the Text to Speech—English subroutines, then calling and executing the subroutines.

Initialisation

The routines may be loaded into memory for use on their own at the command line, or the initialisation steps can be included in a user-written Extended BASIC program. Because the routines and their data occupy memory, the user will have less memory available to store their own program.

To make the routines available they must be loaded into memory from the disk as follows:

CALL INIT :: CALL LOAD("DSK1.SETUP","DSK1.XLAT","DSK1.SPEAK")

Note that this process may take as long as two minutes.

Now the database needs to be loaded:

CALL LINK("SETUP","DSK1.DATABASE")

CALL LINK("XLAT",T$,A$)

The XLAT routine creates an allophone string from any given text string.

The general format of a call to XLAT is

    CALL LINK("XLAT", text-string, allophone-string)

In this call, the text-string is a string expression giving the text string to be translated; and allophone-string is a string variable (or string array element) that receives the resulting allophone string.

If the text string to be translated is longer than 128 bytes, an error message STRING TRUNCATED is returned. If the resulting allophone string is longer than 255 bytes, an error message SPEECH STRING TOO LONG is returned.
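
Because XLAT rejects text longer than 128 bytes, a long passage has to be divided into shorter pieces before each piece is translated. The following is a minimal sketch of that idea, written in Python rather than Extended BASIC purely so that it can be run and checked on a modern machine. The helper name split_for_xlat is hypothetical and is not part of the Text to Speech package, and one byte per character is assumed.

```python
import textwrap

XLAT_MAX_TEXT = 128  # text-string limit stated above, in bytes


def split_for_xlat(text, limit=XLAT_MAX_TEXT):
    """Split text at spaces into pieces of at most `limit` characters.

    Hypothetical helper: each piece would then be passed to XLAT separately.
    """
    return textwrap.wrap(text, width=limit)


sample = "PRESS Y FOR YES. PRESS N FOR NO. " * 10
chunks = split_for_xlat(sample)
for piece in chunks:
    print(len(piece), piece)
```

Keeping each piece within 128 bytes only avoids the STRING TRUNCATED error; a piece can still translate into more than 255 bytes of allophones, in which case it would need to be shortened further.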

An example of the XLAT routine call is:

  CALL LINK("XLAT", "PRESS Y FOR YES. PRESS N FOR NO", B$)

In addition to English words, the routine recognises special characters. If it finds a character which is neither a letter of the alphabet nor a recognised special character, the word is terminated at that point.

The special characters recognized by the XLAT routine can be divided into four functional groups:

  • numerical characters,
  • pause and break characters,
  • inflection symbols, and
  • special symbols.

Numeric Characters

The numerical character group consists of the standard numerical characters 0 through 9, and the symbols: , . - +

The characters 0 through 9 are always pronounced in the usual way. The other symbols, however, are only recognized if they directly precede a numerical character. Then, and only then, are they pronounced.

Pause Characters

This character group consists of the characters:

   .   ,   !   ?   :    ;

Note that a period or comma must be followed by a space unless it is the last character in the string.

These characters generate a pause code, the length of which depends upon the character. The "," comma symbol causes a short break (2 frames), while all of the others cause a long break (9 frames). The duration of one frame is approximately 25 milliseconds.
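
The frame counts above translate directly into pause lengths: a comma gives about 2 × 25 = 50 milliseconds and the other pause characters about 9 × 25 = 225 milliseconds. As a rough sketch, in Python for checking on a modern machine and not part of the Text to Speech package, the hypothetical function below adds up the pauses a text string would generate. It assumes that a period or comma counts as a pause only when it is followed by a space or ends the string, which is one reading of the note above.

```python
FRAME_MS = 25  # approximate duration of one frame, from the text above

# Pause length in frames for each pause character
PAUSE_FRAMES = {",": 2, ".": 9, "!": 9, "?": 9, ":": 9, ";": 9}


def total_pause_ms(text):
    """Return the approximate total pause time, in milliseconds, for text."""
    frames = 0
    for i, ch in enumerate(text):
        if ch not in PAUSE_FRAMES:
            continue
        at_end = i == len(text) - 1
        # Assumption: "." and "," pause only before a space or at the end.
        if ch in ".," and not at_end and text[i + 1] != " ":
            continue
        frames += PAUSE_FRAMES[ch]
    return frames * FRAME_MS


print(total_pause_ms("PRESS Y FOR YES. PRESS N FOR NO"))
```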

In addition to generating pause codes, these characters also affect the inflection contour of the sentence if a primary stress-point (vocal emphasis) has been indicated. The "," and "?" characters both specify rising contour for the preceding phrase, while all other codes generate the standard falling contour.

Inflection Characters

As noted above, inflection can be influenced by punctuation (the pause characters), but there are also dedicated inflection characters.

This group consists of two main symbols, the ";" semi-colon and the "_" underline, plus one other symbol, the ">".

  • The ";" symbol denotes primary stress.
  • The "_" symbol denotes secondary stress.
  • The ">" symbol is the shift indicator and is used to shift the stress within a word or phrase.

When you input a word or phrase, for example, the word IMPORTANT, without any stress-points, it will have the standard falling contour.

If you want the primary stress to be on the second syllable, then type in ";>IMPORTANT". The stress is shifted to the second syllable.

The primary stress-point triggers a rising or falling pitch that starts at the inflection symbol and continues to the end of the phrase.

The pause and break symbols within the phrase determine whether the pitch rises or falls. The "?" and "," make the pitch rise and all other symbols make the pitch fall.

The following examples illustrate the use of the primary stress-point and the effect that the pause and break characters have on those stress-points.

    THE SUN RISES IN THE ; WEST?
    THE SUN ; SETS IN THE WEST.

If a secondary stress-point is used, it forces the specific syllable to be pronounced at a higher pitch than the other syllables within the word or phrase.

If several secondary stress-points are used, then the first one starts at a higher pitch and each subsequent secondary stress-point steps down to a lower pitch, while still remaining higher than the pitch level of the unstressed syllables.

Normally, if you use secondary stress-points within a phrase, you should use a primary stress-point also. If a primary stress-point is not used, then all syllables after the last secondary stress-point will take on a flat monotone level.

Only one primary stress-point can be used in a phrase. If you use more than one, all primary stress-points after the first are changed to secondary stress-points.

Below are more examples of the usage of the inflection symbols.

    ;PRACTICAL
        Primary stress-point on the first syllable
    ;>OBSCURE
        Primary stress-point on the second syllable
    ;>>GRAVITATION
        Primary stress-point on third syllable
    _ONE _TIME ; ONLY?
        Secondary stress-points and primary stress-point within a phrase with a rising contour.


Special Symbols

This group consists of the symbols:

   @   $   %   &   *   (   )  =    /

These symbols are pronounced as follows:

   Symbol      Pronounced as
   @           at
   $           dollar
   %           percent
   &           and
   *           asterisk
   (           open
   )           close
   =           equals
   /           slash

CALL LINK("SPEAK",A$,P,S)

After an allophone string has been created with the XLAT routine, the selected allophones can be spoken by the SPEAK routine. The general format to call the SPEAK routine is

    CALL LINK("SPEAK", allophone-string, pitch, slope)

In this call, allophone-string is a string expression yielding the required allophone string; pitch is a numeric expression giving the pitch indicator; and slope is a numeric expression giving the slope value.

PITCH

The normal pitch value is 43.

The pitch expression should yield a value in the range of 0 through 63.

SLOPE

The slope expression is limited to a value of 0 through 255, although it actually indicates 32 times the slope increment used by the contouring algorithms. In general, the slope should be selected so that it is approximately 10% of the pitch value. For pitch equal to 40, the slope should be approximately 4 times 32, or 128.

The multiplication factor of 32 has been selected to allow for fractional increments in integer-type variables.

Note that the only other restriction on the slope parameter is that it must not exceed these limits:

    slope < (pitch - 1) * 16
    slope < (63 - pitch) * 16

The reason for this is that if the selected slope is too large, the inflection parameters cause the pitch to move outside the range of 0 to 63, on either the high or the low side. The current allophone stringer automatically corrects the slope parameter to avoid that situation.

If the selected pitch level is too high (greater than 63), it is automatically limited to level 63. That means that any level above 63 is treated as if level 63 had been specified.
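
The 10% guideline and the two slope limits can be worked through numerically. The sketch below is written in Python so that it can be checked on a modern machine; it is not the correction performed by the allophone stringer itself, whose exact behaviour is not described here. The function name suggested_slope is hypothetical. It takes 10% of the pitch, multiplies by 32, and then keeps the result inside 0 through 255 and strictly below both limits.

```python
def suggested_slope(pitch):
    """Suggest a slope value for SPEAK from a pitch value (illustrative)."""
    # Pitch above 63 is treated as 63, as described above.
    pitch = max(0, min(63, pitch))

    # 10% of the pitch, times 32, rounded with integer arithmetic.
    slope = (pitch * 32 + 5) // 10

    # Largest value that is within 0..255 and strictly below both limits.
    upper = min(255, (pitch - 1) * 16 - 1, (63 - pitch) * 16 - 1)

    return max(0, min(slope, upper))


for p in (40, 43, 60):
    print(p, suggested_slope(p))
```

For a pitch of 40 this reproduces the value of 128 given above. For a pitch of 60 the guideline alone would give 192, but the limit (63 - 60) * 16 = 48 applies, so the sketch returns 47.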

Standard allophone codes for TI Text to Speech—English.

Frequency tables for TI Text to Speech—English which indicate the pitch-interval to frequency relationship.

Special inflection codes for TI Text to Speech—English. In general, all codes above 240 are reserved for special function codes.

Speech contouring algorithms for TI Text to Speech—English

SAMPLE PROGRAM

The following program called Phrase is an example of a program that can be created using Text to Speech—English.

   100   CALL INIT
   110   CALL LOAD("DSK1.SPEAK","DSK1.XLAT","DSK1.SETUP")
   120   CALL LINK("SETUP","DSK1.DATABASE")
   130   LINPUT "PHRASE- ":A$
   140   IF A$="" THEN 160
   150   CALL LINK("XLAT",A$,B$)
   160   CALL LINK("SPEAK",B$,43,128)
   170   GOTO 130

The program lets you simply type a word, phrase, or sentence on the computer.

The input string can be up to 128 characters long, but if it translates into an allophone string longer than 255 bytes, the XLAT routine generates the error SPEECH STRING TOO LONG.

If the input string exceeds 128 characters, you will receive an error STRING TRUNCATED.

When you press the ENTER key, the computer speaks the words you entered. This lets you have your computer speak a wide range of English language words. And it allows you to alter the spelling of a word to make it sound the way you want it to sound.

As you listen to the computer speak, you may decide that a word is not spoken the way you want, or you may decide that you want a longer pause between some words. You can change the pronunciation of words by misspelling them. For example, the computer pronounces the word "break" as "breek". Change the spelling of "break" to "brake" and the computer pronounces the word properly.

You can also spell "great" as "grreat", and the computer "stretches out" the r sound. Misspelling the word "silently" as "sssilently" causes the computer to pronounce the word with a longer s sound. You can experiment with other misspellings to create the pronunciation you want. Also, to achieve the pause duration you want, experiment with pause and break characters.