An Exception-Word EPROM Generator for the CTS256AL2

Background

The CTS256 is a companion chip to its better-known sibling, the SP0256 allophone-to-speech chip. The CTS256 takes ASCII text and converts it to allophones for the SP0256: text to speech in two ICs.

This is 1980s tech that we're talking about, so it has its limitations. The CTS256 does a pretty good job at converting words to allophones, but every so often it goes off into the weeds. Simple words like "fine", "purpose" or "minutes" come out as "feen", "purples" or "minuets".

This is where the exception EPROM comes in. If present, it's searched before the CTS256's text to allophone rules are invoked. If the word is found in the EPROM, the associated allophones are used directly.

The problem is in the documentation. If you bought a CTS256 from Radio Shack back in the day, the datasheet that came with it mentioned the existence of the exception EPROM, and invited you to write to them for further (read: any) details. The 1988 Archer Semiconductor Reference Guide was better: it included a copy of a General Instrument application note that detailed, among other things, the use of the exception EPROM.

But the Archer copy of the application note was full of errors (I think it was OCR'd - badly - from the original). And the only copy of the original application note that I could track down on the web was a PDF of a photocopy of a dot matrix printed original. 8s and Bs, and Gs and 6s were very problematic. And since the exception EPROM contained a couple of large chunks of TMS7000 machine code, it really mattered whether that character was an 8 or a B!

I ended up hand-disassembling the machine code to try and answer all the "which character is that?" questions. At the end of that there were still a couple that could have gone either way (addresses of function calls in the CTS256's ROM).

Long story short, I burned a 2716 with my best guess at things and, after moving the address select signal from /CS to /OE to change a 350 ns 2716 into a 120 ns 2716, it actually worked. Mirabile dictu!

The Exception-Word Generator is born

Now that I was fairly sure that I had the kinks out of the original example, it was time to start fixing some of those other words that I'd noticed the CTS256 getting wrong.

I first tried creating an assembler file full of DB statements. I'm sure that would have eventually worked, but it was just too painful.

What I really wanted was a program that would read a text file containing a list of words and their associated allophones, and generate a hex file that I could burn into an EPROM. Working off and on, I had a program that produced a working exception EPROM after just a day. It wasn't pretty, but it worked.

The Exception-Word list file

The format of the input file to the Xception program is as follows:

   BASE n ; where n is a hex digit between 1 and E, representing the
          ; base 4K address of the generated EPROM
   <[WORD]<=[allophone...]
   <[WORD]<=[allophone...]
        .
        .
        .
   <[WORD]<=[allophone...]
   <[SYMBOL or DIGIT]<=[allophone...]
   <[SYMBOL or DIGIT]<=[allophone...]
        .
        .
        .
   <[SYMBOL or DIGIT]<=[allophone...]

The format of the word/allophone definitions is the same as what's shown in the GI application note. The words should be ordered alphabetically (A-Z) but only by the first letter; within each letter group, the ordering of the words isn't important.

Words that are spoken differently depending on their usage, like wind as a verb (wind the clock) or a noun (the wind is blowing) can be differentiated by appending (V) or (N) to the word definition (<[WIND(N)]< for example).
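As a rough illustration of the definition format, here is a minimal C sketch that splits one definition line into its word and its allophone list. It is not taken from the Xception source; the names `copy_bracketed` and `split_definition` are made up for this example, and it assumes the word itself does not contain a `]` character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the text between the first '[' found at or after `from` and the
 * next ']' into dst. Returns a pointer just past that ']', or NULL if the
 * brackets are missing or dst is too small. (Hypothetical helper.) */
static const char *copy_bracketed(const char *from, char *dst, size_t dst_size)
{
    const char *open = strchr(from, '[');
    if (open == NULL)
        return NULL;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return NULL;
    size_t len = (size_t)(close - open - 1);
    if (len + 1 > dst_size)
        return NULL;
    memcpy(dst, open + 1, len);
    dst[len] = '\0';
    return close + 1;
}

/* Splits a line such as "<[WORD]<=[A1 A2 A3]" into "WORD" and "A1 A2 A3".
 * Anything after the closing ']' of the allophone list (for example a
 * ';' comment) is ignored. Returns 1 on success, 0 if the line is not a
 * definition. (Hypothetical helper.) */
int split_definition(const char *line,
                     char *word, size_t word_size,
                     char *allophones, size_t allo_size)
{
    assert(line != NULL && word != NULL && allophones != NULL);
    const char *rest = copy_bracketed(line, word, word_size);
    if (rest == NULL)
        return 0;
    const char *eq = strchr(rest, '=');
    if (eq == NULL)
        return 0;
    return copy_bracketed(eq + 1, allophones, allo_size) != NULL;
}
```

A line like `BASE 5` has no brackets, so `split_definition` rejects it and the caller can handle it separately. The "<" marks and any letters between `]` and `=` are skipped here; a real generator would need to look at them too.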

Exception-Word Encoding Scheme

To store a unique word or symbol and its corresponding allophone address string in an efficient and flexible manner, the following encoding format was derived:

   <[encoded word or symbol]< = [encoded allophone address(es)]

   where < equals 13H
         [ equals 40H
         ] equals 80H

The first and last bytes are 13H. This informs the code-to-speech algorithm that the word or symbol is not a prefix or suffix.

If the word or symbol is an individual letter, then it is represented between the brackets by FFH; this value already includes the left and right brackets. If it is a number or punctuation mark, then it is represented by its value from TABLE-1 plus the values of the left and right brackets.

Otherwise:

The first letter in the word or symbol is always to be ignored; this does not apply to numbers or punctuation.
The next letter in the word is represented by the value of the letter from TABLE-1, plus the value of the left bracket "[" which is 40H.
Each following letter, unless it is the last letter in the word or symbol, is represented solely by its value from TABLE-1.
The last letter in the word or symbol is represented by the value of the letter from TABLE-1, plus the value of the right bracket "]" which is 80H.

The allophone address string is encoded in a similar manner:

If only one allophone is used for the pronunciation, it is represented by its value from TABLE-2, plus the values of the left "[" and right "]" brackets, which are 40H and 80H respectively.

Otherwise:

The first allophone is represented by its value from TABLE-2, plus the value of the left bracket "[" which is 40H.
Each following allophone, unless it is the last allophone in the string, is represented solely by its value from TABLE-2.
The last allophone is represented by its value from TABLE-2 plus the value of the right bracket "]" which is 80H.

Example: To encode "Au" so that it is pronounced as "GOLD"
<[Au]< = [GG2 OW LL DD1]
13, F5, 13, 7D, 35, 2D, 95 <--This line is ready to store in
    ^                         EXCEPTION-WORD EPROM under the
    |                         "A" category. (The encoded string
    |                         is shown in Hex notation.)
    |
    +--Remember, throw away the first letter (in this case an
       "A"), then find the value of the next letter in TABLE-1
       and add 40H plus 80H to it so as to represent the left
       "[" and right "]" brackets.

For words, the leading "<" (which marks the start of a word) is mandatory. The trailing "<" (which marks the end of the word) is optional, and if it's left off it marks the word as a prefix form. This allows constructs such as:

   <[CAP]AB=[KK1 EY PP] ; CAPABILITY, CAPABLE

Without this, "capable" would be pronounced "cap-able", whereas it ought to be "cape-able".

For symbols, both the leading and trailing "<"s are optional. This allows symbols to occur in the middle of a word (e.g. "up&down" would become "up and down" with the example exception list).
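The rules for a complete word (both "<" marks present) can be sketched in C as follows. This is only an illustration of the encoding described above, not the Xception source: the function name `encode_entry` and the enum names are invented here, the allophones are passed in as their TABLE-2 values, and the prefix and symbol forms are left out. It also relies on the TABLE-1 letter values being the upper-case ASCII code minus 20H.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { WORD_MARK = 0x13, LEFT_BRACKET = 0x40, RIGHT_BRACKET = 0x80 };

/* Encodes "<[word]< = [allophones]" for a word made of letters only.
 * allo holds TABLE-2 values (00H..3FH). Returns the number of bytes
 * written to out, or 0 if the input is invalid or out is too small. */
size_t encode_entry(const char *word,
                    const unsigned char *allo, size_t n_allo,
                    unsigned char *out, size_t out_size)
{
    assert(word != NULL && allo != NULL && out != NULL);
    size_t len = strlen(word);
    if (len == 0 || n_allo == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isalpha((unsigned char)word[i]))
            return 0;                     /* letters only in this sketch */

    size_t body = (len == 1) ? 1 : len - 1;   /* first letter is dropped */
    if (2 + body + n_allo > out_size)
        return 0;

    size_t k = 0;
    out[k++] = WORD_MARK;                 /* leading "<" */
    if (len == 1) {
        out[k++] = 0xFF;                  /* single letter: FFH */
    } else {
        for (size_t i = 1; i < len; i++) {
            /* TABLE-1 letter value: upper-case ASCII minus 20H */
            unsigned char v = (unsigned char)(toupper((unsigned char)word[i]) - 0x20);
            if (i == 1)
                v += LEFT_BRACKET;        /* second letter carries "[" */
            if (i == len - 1)
                v += RIGHT_BRACKET;       /* last letter carries "]" */
            out[k++] = v;
        }
    }
    out[k++] = WORD_MARK;                 /* trailing "<" */
    for (size_t i = 0; i < n_allo; i++) {
        unsigned char v = allo[i];
        if (i == 0)
            v += LEFT_BRACKET;            /* first allophone carries "[" */
        if (i == n_allo - 1)
            v += RIGHT_BRACKET;           /* last allophone carries "]" */
        out[k++] = v;
    }
    return k;
}
```

Calling `encode_entry("Au", ...)` with the TABLE-2 values 3DH, 35H, 2DH and 15H (GG2 OW LL DD1) produces the seven bytes from the example: 13, F5, 13, 7D, 35, 2D, 95.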

TABLE-1

LETTER	ENCODED VALUE (hex)	NUMBER	ENCODED VALUE (hex)	SYMBOL	ENCODED VALUE (hex)
A	21	0	10	space	00
B	22	1	11	!	01
C	23	2	12	"	02
D	24	3	13	#	03
E	25	4	14	$	04
F	26	5	15	%	05
G	27	6	16	&	06
H	28	7	17	'	07
I	29	8	18	(	08
J	2A	9	19	)	09
K	2B			*	0A
L	2C			+	0B
M	2D			,	0C
N	2E			-	0D
O	2F			.	0E
P	30			/	0F
Q	31			:	1A
R	32			;	1B
S	33			<	1C
T	34			=	1D
U	35			>	1E
V	36			?	1F
W	37			@	20
X	38			[	3B
Y	39			\	3C
Z	3A			]	3D
				^	3E
				_	3F
				`	40
				{	5B
				|	5C
				}	5D
				~	5E
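The TABLE-1 values appear to be nothing more than the character's ASCII code minus 20H (so 0CH corresponds to the comma, ASCII 2CH). Under that reading, the whole table collapses to a few lines of C. The function name `table1_value` is made up for this sketch, and folding lower-case letters to upper case is my assumption; the table itself only lists upper-case letters.

```c
#include <assert.h>
#include <ctype.h>

/* Returns the TABLE-1 encoded value for a 7-bit ASCII character, or -1
 * if the character has no entry in the table. Lower-case letters are
 * folded to upper case first (an assumption of this sketch). */
int table1_value(int ch)
{
    assert(ch >= 0 && ch <= 0x7F);   /* 7-bit ASCII only */
    if (islower(ch))
        ch = toupper(ch);
    if (ch < 0x20 || ch > 0x7E)
        return -1;                   /* control characters and DEL are not listed */
    return ch - 0x20;
}
```

For example, `table1_value('U')` returns 35H, which is the value used for the "u" in the "Au" example.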

TABLE-2

ENCODED VALUE	ALLOPHONE	SAMPLE WORD	DURATION(ms)
00	PA1	PAUSE	10
01	PA2	PAUSE	30
02	PA3	PAUSE	50
03	PA4	PAUSE	100
04	PA5	PAUSE	200
05	OY	bOY	290
06	AY	skY	170
07	EH	End	50
08	KK3	Comb	80
09	PP	Pow	150
0A	JH	dodGe	400
0B	NN1	thiN	170
0C	IH	sIt	50
0D	TT2	To	100
0E	RR1	Rural	130
0F	AX	sUcceed	50
10	MM	Milk	180
11	TT1	parT	80
12	DH1	THey	140
13	IY	sEE	170
14	EY	bEIge	200
15	DD1	coulD	50
16	UW1	tO	60
17	AO	OUght	70
18	AA	hOt	60
19	YY2	Yes	130
1A	AE	hAt	80
1B	HH1	He	90
1C	BB1	Business	40
1D	TH	Thin	130
1E	UH	bOOk	70
1F	UW2	fOOd	170
20	AW	OUt	250
21	DD2	Do	80
22	GG3	wiG	120
23	VV	Vest	130
24	GG1	Guest	80
25	SH	SHip	120
26	ZH	aZUre	130
27	RR2	bRain	80
28	FF	Food	110
29	KK2	sKy	140
2A	KK1	Can't	120
2B	ZZ	Zoo	150
2C	NG	aNchor	200
2D	LL	Lake	80
2E	WW	Wool	140
2F	XR	repaIR	250
30	WH	WHig	150
31	YY1	Yes	90
32	CH	CHurch	150
33	ER1	fIR	110
34	ER2	fIR	210
35	OW	bEAU	170
36	DH2	THey	180
37	SS	veSt	60
38	NN2	No	140
39	HH2	Hoe	130
3A	OR	stORe	240
3B	AR	alARm	200
3C	YR	cleAR	250
3D	GG2	Got	80
3E	EL	saddLE	140
3F	BB2	Business	60
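Since the TABLE-2 values run from 00H to 3FH with no gaps, a lookup from allophone name to encoded value can be a plain array whose index is the value. A minimal sketch, with the names `ALLOPHONES` and `allophone_value` invented for this example:

```c
#include <assert.h>
#include <string.h>

/* Allophone mnemonics in TABLE-2 order; the array index is the encoded value. */
static const char *const ALLOPHONES[64] = {
    "PA1", "PA2", "PA3", "PA4", "PA5", "OY",  "AY",  "EH",
    "KK3", "PP",  "JH",  "NN1", "IH",  "TT2", "RR1", "AX",
    "MM",  "TT1", "DH1", "IY",  "EY",  "DD1", "UW1", "AO",
    "AA",  "YY2", "AE",  "HH1", "BB1", "TH",  "UH",  "UW2",
    "AW",  "DD2", "GG3", "VV",  "GG1", "SH",  "ZH",  "RR2",
    "FF",  "KK2", "KK1", "ZZ",  "NG",  "LL",  "WW",  "XR",
    "WH",  "YY1", "CH",  "ER1", "ER2", "OW",  "DH2", "SS",
    "NN2", "HH2", "OR",  "AR",  "YR",  "GG2", "EL",  "BB2"
};

/* Returns the TABLE-2 value (00H..3FH) for a mnemonic such as "GG2",
 * or -1 if the name is not in the table. The match is case-sensitive. */
int allophone_value(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < 64; i++)
        if (strcmp(name, ALLOPHONES[i]) == 0)
            return i;
    return -1;
}
```

A linear search over 64 entries is plenty fast for a tool that runs once per EPROM build.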

Using the Exception-Word Generator

The Exception-Word Generator is written in fairly standard C. Any reasonably modern C compiler ought to be able to compile the source. Once built, run it as follows:

   xception <input text file >output hex file

The output hex file is sized for a 4K EPROM, as that's what was specified in the original application note. However, careful reading of the note reveals that the exception code can spread over multiple 4K blocks, up to a maximum of 48K using the memory map in the application note.

A demonstration Arduino sketch is included that shows the operation of the EPROM, using the computer Joshua's lines from the 1983 film WarGames. Several words from Joshua's lines (e.g. file, minutes, island) are not rendered properly by the CTS256. When the exception-word EPROM is included in the circuit, they are spoken correctly (without mangling the spelling of the text).

References

GI CTS256A-AL2 Code to Speech Chipset AN-0505D
Archer 1988 Semiconductor Reference Guide: CTS256AL2 Code-to-Speech chip
Archer CTS256A-AL2 Technical Data
