% !TeX root = forth.tex
% !TeX spellcheck = en_US
% !TeX program = pdflatex
\annex{Rationale} % A. (informative annex)}}}
\label{annex:rationale}
\setwordlist{core}
\ifinline\else
\namespace{rat}
\defersection{}
\fi
\newcommand{\readrationale}[1]{%
\ifinline
\begin{editor}
In the \emph{review} (r) version of the document the
rationale text for each of the words is given with
the main definition of the word. The rationale for
words in the \textbf{#1} word set will appear here in
the final document.
\end{editor}
\else
\defersection{}
\input{r-#1.sub}
\stepsection
\fi
}
\section{Introduction} % A.1
\subsection{Purpose} % A.1.1
\subsection{Scope} % A.1.2
\label{rat:scope}
When judging relative merits of proposed changes to the standard, the
members of the committee were guided by the following goals (listed
in alphabetic order):
\begin{tabular}{lp{0.75\textwidth}}
Consistency &
The standard provides a functionally complete set of words with
minimal functional overlap.
\\[\parskip]
Cost of compliance &
This goal includes such issues as common practice, how much
existing code would be broken by the proposed change, and the
amount of effort required to bring existing applications and
systems into conformity with the standard.
\\[\parskip]
Efficiency &
Execution speed and memory compactness.
\\[\parskip]
Portability &
Words chosen for inclusion should be free of system-dependent
features.
\\[\parskip]
Readability &
Forth definition names should clearly delineate their behavior.
That behavior should have an apparent simplicity which supports
rapid understanding. Forth should be easily taught and support
readily maintained code.
\\[\parskip]
Utility &
Words should be judged to have sufficiently essential functionality
and frequency of use to be deemed suitable for inclusion.
\end{tabular}
\section{Terms and notation} % A.2
\subsection{Definitions of terms} % A.2.1
\begin{description}
\item[aligned]~
Data can only be loaded from and stored to addresses that are aligned
according to the alignment requirements of the accessed type. Field
offsets that are added to structure addresses also need to be aligned.
\item[ambiguous condition] ~
The response of a Standard System to an ambiguous condition is left
to the discretion of the implementor. A Standard System need not
explicitly detect or report the occurrence of ambiguous conditions.
\item[cross compiler] ~
Cross compilers may be used to prepare a program for execution in an
embedded system, or may be used to generate Forth kernels either for
the same or a different run-time environment.
\item[data field] ~
In earlier standards, data fields were known as ``parameter fields''.
On subroutine-threaded Forth systems, everything is object code.
There are no traditional code or data fields. Only a word defined by
\word{CREATE} or by a word that calls \word{CREATE} has a data field.
Only a data field defined via \word{CREATE} can be manipulated portably.
\item[word set] ~
This standard recognizes that some functions, while useful in certain
application areas, are not sufficiently general to justify requiring
them in all Forth systems. Further, it is helpful to group Forth
words according to related functions. These issues are dealt with
using the concept of word sets.
The ``Core'' word set contains the essential body of words in a Forth
system. It is the only ``required'' word set. Other word sets defined
in this standard are optional additions to make it possible to
provide Standard Systems with tailored levels of functionality.
\end{description}
\subsection{Notation} % A.2.2
\addtocounter{subsubsection}{1}
\subsubsection{Stack notation} % A.2.2.2
The use of \emph{-sys}, \emph{orig}, and \emph{dest} data types in
stack effect diagrams conveys two pieces of information. First, it
warns the reader that many implementations use the data stack in
unspecified ways for those purposes, so that items underneath on
either the control-flow or data stacks are unavailable. Second, in
cases where \emph{orig} and \emph{dest} are used, explicit pairing
rules are documented on the assumption that every system implements
the model so that its results are equivalent to those of some stack;
in fact, many implementations do use the data stack for this purpose.
However, nothing in this standard requires
that implementations actually employ the data stack (or any other)
for this purpose so long as the implied behavior of the model is
maintained.
\section{Usage requirements} % A.3
Forth systems are unusually simple to develop, in comparison with
compilers for more conventional languages such as C. In addition to
Forth systems supported by vendors, public-domain implementations and
implementation guides have been widely available for nearly twenty
years, and a large number of individuals have developed their own
Forth systems. As a result, a variety of implementation approaches
have developed, each optimized for a particular platform or target
market.
The committee has endeavored to accommodate this diversity by
constraining implementors as little as possible, consistent with a
goal of defining a standard interface between an underlying Forth
System and an application program being developed on it.
Similarly, we will not undertake in this section to tell you how to
implement a Forth System, but rather will provide some guidance as
to what the minimum requirements are for systems that can properly
claim compliance with this standard.
\subsection{Data types} % A.3.1
\label{rat:types}
Most computers deal with arbitrary bit patterns. There is no way to
determine by inspection whether a cell contains an address or an
unsigned integer. The only meaning a datum possesses is the meaning
assigned by an application.
When data are operated upon, the meaning of the result depends on
the meaning assigned to the input values. Some combinations of input
values produce meaningless results: for instance, what meaning can
be assigned to the arithmetic sum of the ASCII representation of the
character ``A'' and a TRUE flag? The answer may be ``no meaning'';
or alternatively, that operation might be the first step in
producing a checksum. Context is the determiner.
The discipline of circumscribing meaning which a program may assign
to various combinations of bit patterns is sometimes called
\emph{data typing}. Many computer languages impose explicit data
typing and have compilers that prevent ill-defined operations.
Forth rarely explicitly imposes data-type restrictions. Still, data
types implicitly do exist, and discipline is required, particularly
if portability of programs is a goal. In Forth, it is incumbent upon
the programmer (rather than the compiler) to determine that data are
accurately typed.
This section attempts to offer guidance regarding \emph{de facto}
data typing in Forth.
\setcounter{subsubsection}{1}
\subsubsection{Character types} % A.3.1.2
\label{rat:char}
The correct identification and proper manipulation of the character
data type is beyond the purview of Forth's enforcement of data type
by means of stack depth. Characters do not necessarily occupy the
entire width of their single stack entry with meaningful data. While
the distinction between signed and unsigned character is entirely
absent from the formal specification of Forth, the tendency in
practice is to treat characters as short positive integers when
mathematical operations come into play.
\begin{enumerate}
\item \textbf{Standard Character Set}
\begin{enumerate}
\item The storage unit for the character data type
(\word{C@}, \word{C!}, \word{FILL}, etc.)
must be able to contain unsigned numbers from 0 through 255.
\item An implementation is not required to restrict character
storage to that range, but a Standard Program without
environmental dependencies cannot assume the ability to
store numbers outside that range in a ``char'' location.
\item Since a ``\emph{char}'' can store small positive numbers
and since the character data type is a sub-range of the
unsigned integer data type, \word{C!} must store the \param{n}
least-significant bits of a cell ($8 \leq n \leq$ bits/cell).
Given the enumeration of allowed number representations and
their known encodings, ``\word{TRUE} \texttt{xx} \word{C!}
\texttt{xx} \word{C@}'' must leave a stack item with some
number of bits set, which will thus be accepted as
non-zero by \word{IF}.
\item For the purposes of input (\word{KEY}, \word{ACCEPT}, etc.)
and output (\word{EMIT}, \word{TYPE}, etc.), the encoding
between numbers and human-readable symbols is ISO646/IRV
(ASCII) within the range from 32 to 126 (space to \tilde).
Outside that range, it is up to the implementation. The
obvious implementation choice is to use ASCII control
characters for the range from 0 to 31, at least for the
``displayable'' characters in that range (TAB, RETURN, LINEFEED,
FORMFEED). However, this is not as clear-cut as it may seem,
because of the variation between operating systems on the
treatment of those characters. For example, some systems advance
a TAB to 4-character boundaries, others to 8-character boundaries,
and others to preset tab stops. Some systems perform an automatic
linefeed after a carriage return, others perform an automatic
carriage return after a linefeed, and others do neither.
The codes from 128 to 255 may eventually be standardized,
either formally or informally, for use as international
characters, such as the letters with diacritical marks found
in many European languages. One such encoding is the 8-bit
ISO Latin-1 character set. The computer marketplace at large
will eventually decide which encoding set of those characters
prevails. For Forth implementations running under an
operating system (the majority of those running on standard
platforms these days), most Forth implementors will probably
choose to do whatever the system does, without performing any
remapping within the domain of the Forth system itself.
\item A Standard Program can depend on the ability to receive
any character in the range 32 {\ldots} 126 through \word{KEY},
and similarly to display the same set of characters with
\word{EMIT}. If a program must be able to receive or display
any particular character outside that range, it can declare
an environmental dependency on the ability to receive or
display that character.
\item A Standard Program cannot use control characters in
definition names. However, a Standard System is not required
to enforce this prohibition. Thus, existing systems that
currently allow control characters in word names from
\word[block]{BLOCK} source may continue to allow them, and
programs running on those systems will continue to work. In
text file source, the parsing action with space as a
delimiter (e.g., \word{BL} \word{WORD}) treats control
characters the same as spaces. This effectively implies that
you cannot use control characters in definition names from
text-file source, since the text interpreter will treat the
control characters as delimiters. Note that this
``control-character folding'' applies only when space is the
delimiter, thus the phrase ``\word{CHAR} \texttt{)} \word{WORD}''
may collect a string containing control characters.
\end{enumerate}
\item \textbf{Storage and retrieval}
Characters are transferred from the data stack to memory by
\word{C!} and from memory to the data stack by \word{C@}. A
number of lower-significance bits equivalent to the
implementation-dependent width of a \emph{character} are
transferred from a popped data stack entry to an address by the
action of \word{C!} without affecting any bits which may comprise
the higher-significance portion of the cell at the destination
address; however, the action of \word{C@} clears all
higher-significance bits of the data stack entry which it pushes
that are beyond the implementation-dependent width of a character
(which may include implementation-defined display information in
the higher-significance bits). The programmer should keep in mind
that operating upon arbitrary stack entries with words intended
for the character data type may result in truncation of such data.
\item \textbf{Manipulation on the stack}
In addition to \word{C@} and \word{C!}, characters are moved to,
from and upon the data stack by the following words:
\begin{quote}\ttfamily
\word{toR} \word{qDUP} \word{DROP} \word{DUP}
\word{OVER} \word{PICK} \word{Rfrom} \word{R@}
\word{ROLL} \word{ROT} \word{SWAP}
\end{quote}
\item \textbf{Additional operations}
The following mathematical operators are valid for character data:
\begin{quote}\ttfamily
\word{+} \word{-} \word{*} \word{/}
\word{/MOD} \word{MOD}
\end{quote}
The following comparison and bitwise operators may be valid for
characters, keeping in mind that display information cached in
the most significant bits of characters in an implementation-defined
fashion may have to be masked or otherwise dealt with:
\begin{quote}\ttfamily
\word{AND} \word{OR} \word{more} \word{less}
\word{Umore} \word{Uless} \word{=} \word{ne}
\word{0=} \word{0ne} \word{MAX} \word{MIN}
\word{LSHIFT} \word{RSHIFT}
\end{quote}
\end{enumerate}
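The storage rules above can be illustrated with a small model. The following Python sketch is hypothetical and not part of this standard: it assumes 8-bit characters and 32-bit cells stored little-endian in a byte-addressed memory, and its helpers \texttt{store\_cell}, \texttt{fetch\_cell}, \texttt{c\_store} and \texttt{c\_fetch} are illustrative stand-ins for \word{!}, \word{@}, \word{C!} and \word{C@}. It checks the ``\word{TRUE} \texttt{xx} \word{C!} \texttt{xx} \word{C@}'' behavior described above.

```python
# Hypothetical model: 8-bit characters, 32-bit little-endian cells.
CHAR_BITS = 8
CELL_BITS = 32
CELL_BYTES = CELL_BITS // 8
CHAR_MASK = (1 << CHAR_BITS) - 1
CELL_MASK = (1 << CELL_BITS) - 1
TRUE = CELL_MASK                 # a TRUE flag has all bits set

memory = bytearray(16)           # byte-addressed memory for the model


def store_cell(x, addr):
    """Model of ! : every bit of the cell is stored."""
    memory[addr:addr + CELL_BYTES] = (x & CELL_MASK).to_bytes(CELL_BYTES, "little")


def fetch_cell(addr):
    """Model of @ : every bit of the cell is fetched."""
    return int.from_bytes(memory[addr:addr + CELL_BYTES], "little")


def c_store(x, addr):
    """Model of C! : store only the low CHAR_BITS bits; neighboring bytes are untouched."""
    memory[addr] = x & CHAR_MASK


def c_fetch(addr):
    """Model of C@ : the character is zero-extended to a full cell."""
    return memory[addr]


store_cell(0x11223344, 0)
c_store(TRUE, 0)                 # TRUE xx C!
print(hex(c_fetch(0)))           # 0xff: non-zero, so IF treats it as true
print(hex(fetch_cell(0)))        # 0x112233ff: the other bytes are unchanged
```

Which byte of a cell a character store overlays depends on the byte order, which is an assumption of this model only.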
\pagebreak
\subsubsection{Single-cell types} % A.3.1.3
A single-cell stack entry viewed without regard to typing is the
fundamental data type of Forth. All other data types are actually
represented by one or more single-cell stack entries.
\begin{enumerate}
\item \textbf{Storage and retrieval}
Single-cell data are transferred from the stack to memory by
\word{!}; from memory to the stack by \word{@}. All bits are
transferred in both directions and no type checking of any sort
is performed, nor does the Standard System check that a memory
address used by \word{!} or \word{@} is properly aligned or
properly sized to hold the datum thus transferred.
\item \textbf{Manipulation on the stack}
Here is a selection of the most important words which move
single-cell data to, from and upon the data stack:
\begin{quote}\ttfamily
\word{!} \word{@} \word{toR} \word{qDUP}
\word{DROP} \word{DUP} \word{OVER} \word{PICK}
\word{Rfrom} \word{R@} \word{ROLL} \word{ROT}
\word{SWAP}
\end{quote}
\item \textbf{Comparison operators}
The following comparison operators are universally valid for one
or more single cells:
\begin{quote}\ttfamily
\word{=} \word{ne} \word{0=} \word{0ne}
\end{quote}
\end{enumerate}
\paragraph{Flags} ~ % A.3.1.3.1
A \word{FALSE} flag is a single-cell datum with all bits unset, and
a \word{TRUE} flag is a single-cell datum with all bits set. While
Forth words which test flags accept any non-zero bit pattern as true,
there exists the concept of the \emph{well-formed flag}. If an
operation whose result is to be used as a flag may produce any
bit-mask other than \word{TRUE} or \word{FALSE}, the recommended
discipline is to convert the result to a well-formed flag by means
of the Forth word \word{0ne} so that the result of any subsequent
logical operations on the flag will be predictable.
In addition to the words which move, fetch and store single-cell
items, the following words are valid for operations on one or more
flag data residing on the data stack:
\begin{quote}\ttfamily
\word{AND} \word{OR} \word{XOR} \word{INVERT}
\end{quote}
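The following hypothetical Python sketch, assuming 32-bit cells, shows why the conversion matters: two results of bit masking are each non-zero, and so each is accepted as true by \word{IF}, yet their \word{AND} is zero, and \word{INVERT} of either is still non-zero. After conversion with a model of \word{0ne}, \word{AND} and \word{INVERT} behave as logical operations. The function names are illustrative only.

```python
CELL_BITS = 32                        # assumed cell width
CELL_MASK = (1 << CELL_BITS) - 1
TRUE, FALSE = CELL_MASK, 0            # well-formed flags


def zero_not_equal(x):
    """Model of 0<> : convert any bit pattern to a well-formed flag."""
    return TRUE if x & CELL_MASK else FALSE


def forth_and(a, b):
    """Model of AND : bitwise AND of two cells."""
    return a & b & CELL_MASK


def forth_invert(x):
    """Model of INVERT : complement every bit of a cell."""
    return ~x & CELL_MASK


flag_a = 0b0100   # result of testing bit 2: non-zero, so "true" to IF
flag_b = 0b0001   # result of testing bit 0: also "true" to IF

print(forth_and(flag_a, flag_b) == FALSE)   # True: the AND is unexpectedly false
print(forth_invert(flag_a) == FALSE)        # False: INVERT does not yield false

well_a, well_b = zero_not_equal(flag_a), zero_not_equal(flag_b)
print(forth_and(well_a, well_b) == TRUE)    # True
print(forth_invert(well_a) == FALSE)        # True
```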
\paragraph{Integers} ~ % A.3.1.3.2
A single-cell datum may be treated by a Standard Program as a signed
integer. Moving and storing such data is performed as for any
single-cell data. In addition to the universally-applicable operators
for single-cell data specified above, the following mathematical and
comparison operators are valid for single-cell signed integers:
\begin{quote}\ttfamily
\word{*} \word{*/} \word{*/MOD} \word{/MOD}
\word{MOD} \word{+} \word{+!} \word{-}
\word{/} \word{1+} \word{1-} \word{ABS}
\word{MAX} \word{MIN} \word{NEGATE} \word{0less}
\word{0more} \word{less} \word{more}
\end{quote}
Given the same number of bits, the largest unsigned integer is roughly
twice the largest positive signed integer.
A single-cell datum may be treated by a Standard Program as an
unsigned integer. Moving and storing such data is performed as for
any single-cell data. In addition, the following mathematical and
comparison operators are valid for single-cell unsigned integers:
\begin{quote}\ttfamily
\word{UM*} \word{UM/MOD} \word{+} \word{+!}
\word{-} \word{1+} \word{1-} \word{*}
\word{Uless} \word{Umore}
\end{quote}
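Because signed and unsigned integers share the same bit patterns, the choice between \word{less} and \word{Uless} decides how a cell is interpreted. The following hypothetical Python sketch assumes 16-bit two's-complement cells; \texttt{as\_signed}, \texttt{less} and \texttt{u\_less} are illustrative names only.

```python
CELL_BITS = 16                          # assumed cell width, small for readability
CELL_MASK = (1 << CELL_BITS) - 1
SIGN_BIT = 1 << (CELL_BITS - 1)
TRUE, FALSE = CELL_MASK, 0


def as_signed(x):
    """Read a cell's bit pattern as a two's-complement signed number."""
    x &= CELL_MASK
    return x - (1 << CELL_BITS) if x & SIGN_BIT else x


def less(a, b):
    """Model of < : compare the cells as signed numbers."""
    return TRUE if as_signed(a) < as_signed(b) else FALSE


def u_less(a, b):
    """Model of U< : compare the same cells as unsigned numbers."""
    return TRUE if (a & CELL_MASK) < (b & CELL_MASK) else FALSE


minus_one = -1 & CELL_MASK               # the bit pattern 0xFFFF
print(as_signed(minus_one), minus_one)   # -1 65535
print(less(minus_one, 1) == TRUE)        # True: -1 < 1 as signed numbers
print(u_less(minus_one, 1) == TRUE)      # False: 65535 is not below 1
```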
\paragraph{Addresses} ~ % A.3.1.3.3
An address is uniquely represented as a single cell unsigned number
and can be treated as such when being moved to, from, or upon the
stack. Conversely, each unsigned number represents a unique address
(which is not necessarily an address of accessible memory). This
one-to-one relationship between addresses and unsigned numbers forces
an equivalence between address arithmetic and the corresponding
operations on unsigned numbers.
Several operators are provided specifically for address arithmetic:
\begin{quote}\ttfamily
\word{CHAR+} \word{CHARS}
\word{CELL+} \word{CELLS}
\end{quote}
and, if the floating-point word set is present:
\begin{quote}\ttfamily
\word[floating]{FLOAT+} \word[floating]{FLOATS}
\word[floating]{SFLOAT+} \word[floating]{SFLOATS}
\word[floating]{DFLOAT+} \word[floating]{DFLOATS}
\end{quote}
A Standard Program may never assume a particular correspondence
between a Forth address and the physical address to which it is
mapped.
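Portable programs scale offsets with \word{CELLS} and \word{CHARS} rather than with literal sizes. The hypothetical Python sketch below assumes 32-bit cells occupying four address units and characters occupying one; these sizes are implementation-dependent assumptions, and the function names are illustrative. Because addresses behave as unsigned numbers, the model's address arithmetic wraps modulo $2^{32}$; a wrapped result need not be accessible memory.

```python
CELL_BITS = 32                    # assumed cell (and address) width
ADDR_MASK = (1 << CELL_BITS) - 1
CHAR_SIZE = 1                     # assumed address units per character
CELL_SIZE = 4                     # assumed address units per cell


def chars(n):
    """Model of CHARS: size in address units of n characters."""
    return n * CHAR_SIZE


def cells(n):
    """Model of CELLS: size in address units of n cells."""
    return n * CELL_SIZE


def char_plus(addr):
    """Model of CHAR+ : unsigned addition, wrapping like any cell arithmetic."""
    return (addr + chars(1)) & ADDR_MASK


def cell_plus(addr):
    """Model of CELL+ : advance by one cell."""
    return (addr + cells(1)) & ADDR_MASK


def nth_cell(base, i):
    """Address of element i of a cell array, as in the phrase  base i CELLS +"""
    return (base + cells(i)) & ADDR_MASK


print(hex(nth_cell(0x1000, 3)))   # 0x100c
print(hex(char_plus(0x1000)))     # 0x1001
print(cell_plus(0xFFFFFFFC))      # 0: wraps like unsigned addition
```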
\paragraph{Counted strings} ~ % A.3.1.3.4
\label{rat:cstring}
Forth 94 moved toward the consistent use of the ``\param{c-addr u}''
representation of strings on the stack. The use of the alternate
``address of counted string'' stack representation is discouraged.
The traditional Forth words \word{WORD} and \word{FIND} continue
to use the ``address of counted string'' representation for historical
reasons. The new word \word{Cq}, added as a porting aid for existing
programs, also uses the counted string representation.
Counted strings remain useful as a way to store strings in memory.
This use is not discouraged, but when references to such strings
appear on the stack, it is preferable to use the ``\param{c-addr u}''
representation.
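The relationship between the two representations can be sketched as follows. In this hypothetical Python model, memory is byte-addressed with one character per address unit; \texttt{place} and \texttt{read\_chars} are illustrative helpers, while \texttt{count} models the standard word \texttt{COUNT}, which converts the address of a counted string to the ``\param{c-addr u}'' form. One practical advantage of ``\param{c-addr u}'' shows in the last line: it can describe a substring in place, which a counted string cannot.

```python
memory = bytearray(64)   # hypothetical byte-addressed memory, one char per address unit


def place(text, addr):
    """Store text as a counted string: a length character, then the characters."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("an 8-bit count limits this model to 255 characters")
    memory[addr] = len(data)
    memory[addr + 1:addr + 1 + len(data)] = data


def count(c_addr):
    """Model of COUNT: ( c-addr1 -- c-addr2 u )."""
    return c_addr + 1, memory[c_addr]


def read_chars(c_addr, u):
    """Return the u characters starting at c_addr, as TYPE would display them."""
    return memory[c_addr:c_addr + u].decode("ascii")


place("HELLO", 10)
addr, length = count(10)
print(addr, length)                        # 11 5
print(read_chars(addr, length))            # HELLO
print(read_chars(addr + 1, length - 2))    # ELL: a substring, with no copying
```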
\paragraph{Execution tokens} ~ % A.3.1.3.5
The association between an execution token and a definition is static.
Once made, it does not change with changes in the search order or
anything else. However, it need not be unique; e.g., the phrases
\begin{quote}\ttfamily
\word{'} \word{1+} and \\
\word{'} \word{CHAR+}
\end{quote}
might return the same value.
\paragraph{Error results} ~ % A.3.1.3.6
\label{rat:ior}
The term \param{ior} was originally defined to describe the result of
an input/output operation. This was extended to include other
operations.
\subsubsection{Cell-pair types} % A.3.1.4
\begin{enumerate}
\item \textbf{Storage and retrieval}
Two operators are provided to fetch and store cell pairs:
\begin{quote}\ttfamily
\word{2@} \word{2!}
\end{quote}
\item \textbf{Manipulation on the stack}
Additionally, these operators may be used to move cell pairs
from, to and upon the stack:
\begin{quote}\ttfamily
\word{2toR} \word{2DROP} \word{2DUP} \word{2OVER}
\word{2Rfrom} \word{2SWAP} \word[double]{2ROT}
\end{quote}
\item \textbf{Comparison}
The following comparison operations are universally valid for
cell pairs:
\begin{quote}\ttfamily
\word[double]{D=} \word[double]{D0=}
\end{quote}
\end{enumerate}
\paragraph{Double-Cell Integers} ~ % A.3.1.4.1
If a double-cell integer is to be treated as signed, the following
comparison and mathematical operations are valid:
\begin{quote}\ttfamily
\word[double]{D+} \word[double]{D-} \word[double]{Dless}
\word[double]{D0less} \word[double]{DABS} \word[double]{DMAX}
\word[double]{DMIN} \word[double]{DNEGATE}
\word[double]{M*/} \word[double]{M+}
\end{quote}
If a double-cell integer is to be treated as unsigned, the following
comparison and mathematical operations are valid:
\begin{quote}\ttfamily
\word[double]{D+} \word[double]{D-}
\word{UM/MOD} \word[double]{DUless}
\end{quote}
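A double-cell integer occupies two cells, with the most-significant cell on top of the stack. The following hypothetical Python sketch, assuming 16-bit cells, models \word[double]{D+} as an addition of the low cells whose carry propagates into the high cells; as with \word{+}, the same bit-level operation serves both the signed and the unsigned interpretation. The helper names are illustrative.

```python
CELL_BITS = 16                          # assumed cell width
CELL_MASK = (1 << CELL_BITS) - 1
DOUBLE_MASK = (1 << 2 * CELL_BITS) - 1


def to_double(n):
    """Split n into (low, high) cells; in Forth the high cell is on top."""
    n &= DOUBLE_MASK
    return n & CELL_MASK, n >> CELL_BITS


def from_double(lo, hi, signed=True):
    """Reassemble a cell pair, optionally as a two's-complement number."""
    n = (hi << CELL_BITS) | lo
    if signed and hi >> (CELL_BITS - 1):
        n -= 1 << (2 * CELL_BITS)
    return n


def d_plus(lo1, hi1, lo2, hi2):
    """Model of D+ : add the low cells, then add the carry into the high cells."""
    lo = lo1 + lo2
    carry = lo >> CELL_BITS
    return lo & CELL_MASK, (hi1 + hi2 + carry) & CELL_MASK


print(from_double(*d_plus(*to_double(65535), *to_double(1))))   # 65536
print(from_double(*d_plus(*to_double(-1), *to_double(1))))      # 0
print(from_double(*d_plus(*to_double(40000), *to_double(30000)),
                  signed=False))                                # 70000
```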
\paragraph{Character strings} ~ % A.3.1.4.2
See: \xref[A.3.1.3.4 Counted Strings]{rat:cstring}.
\subsection{The Implementation environment} % A.3.2
\subsubsection{Numbers} % A.3.2.1
\label{rat:env:num}
Traditionally, Forth has been implemented on two's-complement machines
where there is a one-to-one mapping of signed numbers to unsigned
numbers --- any single cell item can be viewed either as a signed or
unsigned number. Indeed, the signed representation of any positive
number is identical to the equivalent unsigned representation. Further,
addresses are treated as unsigned numbers: there is no distinct pointer
type. On two's-complement machines, \word{+} and \word{-} work on
both signed and unsigned numbers. This
arithmetic behavior is deeply embedded in common Forth practice.
As a consequence of these behaviors, the range of
signed numbers is $-n-1$ to $n$ and for unsigned numbers is $0$ to
$2n+1$, where $n$ is the largest positive signed number.
Signed numbers in the $0$ to $n$ range are bitwise identical to the
corresponding unsigned number.
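For a concrete check of these ranges, the short Python sketch below (an illustration, not part of this standard) computes them for a two's-complement cell of a given width and shows that a single wrapping addition produces the same bit pattern whether its operands are read as signed or unsigned. With 16-bit cells, for instance, $n = 32767$, so signed numbers run from $-32768$ to $32767$ and unsigned numbers from $0$ to $65535$.

```python
def number_ranges(cell_bits):
    """Return (signed_min, signed_max, unsigned_max) for two's-complement cells."""
    n = (1 << (cell_bits - 1)) - 1     # largest positive signed number
    return -n - 1, n, 2 * n + 1


def cell_add(a, b, cell_bits):
    """Model of + : add and keep only the bits that fit in one cell."""
    return (a + b) & ((1 << cell_bits) - 1)


print(number_ranges(16))              # (-32768, 32767, 65535)
print(number_ranges(32))              # (-2147483648, 2147483647, 4294967295)

# -1 + 3 (signed) and 65535 + 3 (unsigned) both yield the bit pattern 2
print(cell_add(-1 & 0xFFFF, 3, 16))   # 2
```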
\setcounter{paragraph}{1}
\paragraph{Digit conversion} ~ % A.3.2.1.2
For example, an implementation might convert the characters ``a''
through ``z'' identically to the characters ``A'' through ``Z'', or
it might treat the characters ``['' through ``\~{}'' as additional
digits with decimal values 36 through 71, respectively.
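The two alternatives can be sketched as a digit-value function. This hypothetical Python sketch does not describe required behavior: \texttt{fold\_case} models the first option and \texttt{extended} the second. They are alternatives, since with \texttt{extended} alone the lower-case letters fall inside the range ``[''~through~``\~{}'' and convert to 42 through 67. Characters that are not digits, or whose value is not below the base, yield \texttt{None}.

```python
def digit_value(ch, base, fold_case=False, extended=False):
    """Value of ch as a digit in base, or None if it is not a valid digit."""
    c = ord(ch)
    if fold_case and ord("a") <= c <= ord("z"):
        c -= ord("a") - ord("A")          # treat 'a'..'z' like 'A'..'Z'
    if ord("0") <= c <= ord("9"):
        value = c - ord("0")              # '0'..'9' -> 0..9
    elif ord("A") <= c <= ord("Z"):
        value = c - ord("A") + 10         # 'A'..'Z' -> 10..35
    elif extended and ord("[") <= c <= ord("~"):
        value = c - ord("[") + 36         # '['..'~' -> 36..71
    else:
        return None
    return value if value < base else None


print(digit_value("f", 16))                  # None: lower case not accepted
print(digit_value("f", 16, fold_case=True))  # 15
print(digit_value("~", 72, extended=True))   # 71
```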
\subsubsection{Arithmetic} % A.3.2.2
\paragraph{Integer division} ~ % A.3.2.2.1
The Forth-79 Standard specifies that the signed division operators
(\word{/}, \word{/MOD}, \word{MOD}, \word{*/MOD}, and \word{*/})
round non-integer quotients towards zero (symmetric division).
Forth 83 changed the semantics of these operators to round towards
negative infinity (floored division). Some in the Forth community
have declined to convert systems and applications from the Forth-79
to the Forth-83 divide. To resolve this issue, a Forth-\snapshot{} system
is permitted to supply either floored or symmetric operators. In
addition, a standard system must provide a floored division primitive
(\word{FM/MOD}), a symmetric division primitive (\word{SM/REM}), and
a mixed precision multiplication operator (\word{M*}).
This compromise protects the investment made in current Forth
applications; Forth-79 and Forth-83 programs are automatically
compliant with Forth 94 with respect to division. In practice, the
rounding direction rarely matters to applications. However, if a
program requires a specific rounding direction, it can use the
floored division primitive \word{FM/MOD} or the symmetric division
primitive \word{SM/REM} to construct a division operator of the
desired flavor. This simple technique can be used to convert Forth-79
and Forth-83 programs to Forth 94 without any analysis of the
original programs.
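The difference between the two rounding directions is easiest to see with a negative operand. The following Python sketch is illustrative only: it uses Python integers in place of the double-cell dividend that \word{FM/MOD} and \word{SM/REM} actually take, ignores overflow, and returns results in Forth stack order (remainder, then quotient). The floored remainder takes the sign of the divisor; the symmetric remainder takes the sign of the dividend. The last function shows how a division operator of a chosen flavor can be built on one of the primitives.

```python
def fm_mod(n1, n2):
    """Floored division, as FM/MOD: the quotient is rounded toward negative infinity."""
    q = n1 // n2                 # Python's // already floors
    return n1 - q * n2, q        # (remainder, quotient)


def sm_rem(n1, n2):
    """Symmetric division, as SM/REM: the quotient is rounded toward zero."""
    q = abs(n1) // abs(n2)
    if (n1 < 0) != (n2 < 0):
        q = -q
    return n1 - q * n2, q        # (remainder, quotient)


def floored_slash(n1, n2):
    """A floored '/', as the Forth phrase  >R S>D R> FM/MOD NIP  would give."""
    return fm_mod(n1, n2)[1]


print(fm_mod(-7, 2))          # (1, -4)
print(sm_rem(-7, 2))          # (-1, -3)
print(fm_mod(7, -2))          # (-1, -4)
print(sm_rem(7, -2))          # (1, -3)
print(floored_slash(-7, 2))   # -4
```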
\subsubsection{Stacks} % A.3.2.3
The only data type in Forth which has concrete rather than abstract
existence is the stack entry. Forth enforces even this primitive typing
only through the hard reality of stack underflow or overflow. The
programmer must have a clear idea of the number of stack entries to
be consumed by the execution of a word and the number of entries that
will be pushed back to a stack by the execution of a word. The
observation of anomalous occurrences on the data stack is the first
line of defense whereby the programmer may recognize errors in an
application program. It is also worth remembering that multiple stack
errors caused by erroneous application code are frequently of equal
and opposite magnitude, causing complementary (and deceptive) results.
For these and many other reasons, the one unambiguous,
uncontroversial, and indispensable programming discipline observed
since the earliest days of Forth is that of providing a stack diagram
for all additions to the application dictionary with the exception of
static constructs such as \word{VARIABLE}s and \word{CONSTANT}s.
\setcounter{paragraph}{1}
\paragraph{Control-flow stack} % A.3.2.3.2
The simplest use of control-flow words is to implement the basic
control structures shown in figure~\textbf{\ref{fig:basic}}.
\begin{figure}[ht]
\begin{center}
\fbox{\includegraphics[bb=0 0 658 202,width=0.8\textwidth]{basic.png}}
\caption{The basic control-flow patterns}
\label{fig:basic}
\end{center}
\end{figure}
In control flow every branch, or transfer of control, must terminate
at some destination. A natural implementation uses a stack to
remember the origin of forward branches and the destination of
backward branches. At a minimum, only the location of each origin or
destination must be indicated, although other implementation-dependent
information also may be maintained.
An origin is the location of the branch itself. A destination is
where control would continue if the branch were taken. A destination
is needed to resolve the branch address for each origin, and conversely,
if every control-flow path is completed, no unused destinations can
remain.
With the addition of just three words (\word[tools]{AHEAD},
\word[tools]{CS-ROLL} and \word[tools]{CS-PICK}), the basic control-flow
words supply the primitives necessary to compile a variety of transportable
control structures. The abilities required are compilation of forward
and backward conditional and unconditional branches and compile-time
management of branch origins and destinations.
Table~\textbf{\ref{table:control}} shows the desired behavior.
\begin{table}[ht]
\begin{center}
\caption{Compilation behavior of control-flow words}
\label{table:control}
\begin{tabular}{lccl}
\hline\hline
\multicolumn{4}{l}{at compile-time,} \\
word: & supplies: & resolves: & is used to: \\ \hline
\word{IF} & \emph{orig} & & mark origin of forward conditional branch \\
\word{THEN} & & \emph{orig} & resolve \word{IF} or \word[tools]{AHEAD} \\
\word{BEGIN} & \emph{dest} & & mark backward destination \\
\word{AGAIN} & & \emph{dest} & resolve with backward unconditional branch \\
\word{UNTIL} & & \emph{dest} & resolve with backward conditional branch \\
\word[tools]{AHEAD} & \emph{orig} & & mark origin of forward unconditional branch \\
\word[tools]{CS-PICK} & & & copy item on control-flow stack \\
\word[tools]{CS-ROLL} & & & reorder items on control-flow stack \\
\hline\hline
\end{tabular}
\end{center}
\end{table}
The requirement that control-flow words are properly balanced by other
control-flow words makes reasonable the description of a compile-time
implementation-defined \emph{control-flow stack}. There is no
prescription as to how the control-flow stack is implemented, e.g.,
data stack, linked list, special array. Each element of the
control-flow stack mentioned above is the same size.
With these tools, the remaining basic control-structure elements,
shown in figure~\textbf{\ref{fig:additional}}, can be defined. The
stack notation used here for immediate words is ( \emph{compilation
/ execution} ).
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{10em} \= \kill
\+ \word{:} \word{WHILE}~ \word{p} dest -{}- orig dest / flag -{}- ) \\
\word{bs} conditional exit from loops \\
\word{POSTPONE} \word{IF} \> \word{bs} conditional forward branch \\
\- 1 \word[tools]{CS-ROLL} \> \word{bs} keep dest on top \\
\word{;} \word{IMMEDIATE} \\[2\parskip]
\+ \word{:} \word{REPEAT}~ \word{p} orig dest -{}- / -{}- ) \\
\word{bs} resolve a single WHILE and return to BEGIN \\
\word{POSTPONE} \word{AGAIN} \> \word{bs} uncond. backward branch to dest \\
\- \word{POSTPONE} \word{THEN} \> \word{bs} resolve forward branch from orig \\
\word{;} \word{IMMEDIATE} \\[2\parskip]
\+ \word{:} \word{ELSE}~ \word{p} orig1 -{}- orig2 / -{}- ) \\
\word{bs} resolve IF supplying alternate execution \\
\word{POSTPONE} \word[tools]{AHEAD} \> \word{bs} unconditional forward branch orig2 \\
1 \word[tools]{CS-ROLL} \> \word{bs} put orig1 back on top \\
\- \word{POSTPONE} \word{THEN} \> \word{bs} resolve forward branch from orig1 \\
\word{;} \word{IMMEDIATE}
\end{tabbing}
\end{quote}
\begin{figure}[ht]
\begin{center}
\fbox{\includegraphics[bb=0 0 529 262,width=0.8\textwidth]{additional.png}}
\caption{Additional basic control-flow patterns}
\label{fig:additional}
\end{center}
\end{figure}
Forth control flow provides a solution for well-known problems with
strictly structured programming, such as loops that need more than one
exit.
The basic control structures can be supplemented, as shown in the
examples in figure~\textbf{\ref{fig:extended}}, with additional
\word{WHILE}s in \word{BEGIN} {\ldots} \word{UNTIL} and \word{BEGIN}
{\ldots} \word{WHILE} {\ldots} \word{REPEAT} structures. However,
each additional \word{WHILE} must be matched by a \word{THEN} at the
end of the structure. This \word{THEN} completes the syntax of the
\word{WHILE} and marks where execution continues when that \word{WHILE}
exits the loop. More than one additional \word{WHILE} may be used,
although this is uncommon. If the user finds this use of \word{THEN}
undesirable, an alias with a more likable name could be defined.
Additional actions may be performed between the control-flow word that
ends the loop (the \word{REPEAT} or \word{UNTIL}) and the \word{THEN}
that matches the additional \word{WHILE}. Further, if different actions
are desired for normal termination and for early termination, the two
sets of actions may be separated by the ordinary Forth \word{ELSE}. All
termination actions are specified after the body of the loop.
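The following minimal sketch shows one such structure. It searches the
\emph{u} cells starting at \emph{a-addr} for the value \emph{x}; the
name \texttt{CONTAINS?} is hypothetical. The first \word{WHILE} leaves
the loop when no cells remain (normal termination), and the second
leaves it as soon as a cell matches (early termination). The actions
for the early exit follow \word{REPEAT}, and those for normal
termination sit between \word{ELSE} and \word{THEN}:
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
\+ \word{:} CONTAINS?~ \word{p} a-addr u x -{}- flag ) \\
\word{toR} \> \word{bs} keep x on the return stack \\
\word{BEGIN} \\
\tab DUP \word{WHILE} \> \word{bs} exit 1: no cells left \\
\tab \word{OVER} \word{@} R@ <> \word{WHILE} \> \word{bs} exit 2: cell matches x \\
\tab 1- SWAP \word{CELL+} SWAP \> \word{bs} advance to next cell \\
\word{REPEAT} \\
\tab 2DROP TRUE \> \word{bs} reached through exit 2 \\
\word{ELSE} \\
\tab 2DROP FALSE \> \word{bs} reached through exit 1 \\
\- \word{THEN} \word{Rfrom} DROP \\
\word{;}
\end{tabbing}
\end{quote}
With a table created by \texttt{CREATE XS 3 , 5 , 7 ,}, the phrase
\texttt{XS 3 5 CONTAINS?} leaves true and \texttt{XS 3 4 CONTAINS?}
leaves false.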
\begin{figure}[ht]
\begin{center}
\fbox{\includegraphics[bb=0 0 598 462, width=0.8\textwidth]{extended.png}}
\caption{Extended control-flow patterns}
\label{fig:extended}
\end{center}
\end{figure}
Note that \word{REPEAT} creates an anomaly when matching each
\word{WHILE} with an \word{ELSE} or \word{THEN}, most notably when
compared with the \word{BEGIN}{\ldots}\word{UNTIL} case. That is,
there will be one fewer \word{ELSE} or \word{THEN} than there are
\word{WHILE}s, because \word{REPEAT} itself supplies the \word{THEN}
for one \word{WHILE}. As above, if the user finds this count mismatch
undesirable, \word{REPEAT} could be replaced in-line by its own
definition.
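Since the definition of \word{REPEAT} given earlier simply performs
\word{AGAIN} and then \word{THEN}, a loop can be written with every
\word{WHILE} matched by its own \word{THEN}. In the sketch below (the
name \texttt{SUM-TO} is hypothetical; it sums the integers 1 to
\emph{n} for non-negative \emph{n}), \texttt{AGAIN THEN} stands where
\word{REPEAT} would otherwise appear:
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
\+ \word{:} SUM-TO~ \word{p} n -{}- sum ) \\
0 SWAP \word{BEGIN} \> \word{bs} sum n \\
\tab DUP \word{WHILE} \\
\tab SWAP \word{OVER} \word{+} SWAP 1- \> \word{bs} add n, then decrement it \\
\- \word{AGAIN} \word{THEN} DROP \> \word{bs} in place of REPEAT \\
\word{;}
\end{tabbing}
\end{quote}
For example, \texttt{4 SUM-TO} leaves 10.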
Other loop-exit control-flow words, and even other loops, can be
defined. The only requirements are that the control-flow stack is
properly maintained and manipulated.
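The simple implementation of the \word{CASE} structure below is an
example of control-structure extension. Note how the count of
\word{OF}s is moved to the return stack (with \word{toR} and
\word{Rfrom}) while other control-flow words execute, so that it
cannot interfere with the control-flow stack if that stack is
implemented on the data stack.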
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{10em} \= \kill
0 \word{CONSTANT} \word{CASE} \word{IMMEDIATE}~ \word{p} init count of OFs ) \\[2\parskip]
\+ \word{:} \word{OF}~ \word{p} \#of -{}- orig \#of+1 / x -{}- ) \\
\word{1+} \> \word{p} count OFs ) \\
\word{toR} \> \word{p} move off the stack in case the control-flow ) \\
\> \word{p} stack is the data stack. ) \\
\word{POSTPONE} \word{OVER}~ \word{POSTPONE} \word{=}~
\word{p} copy and test case value ) \\
\word{POSTPONE} \word{IF} \> \word{p} add orig to control flow stack ) \\
\word{POSTPONE} \word{DROP} \> \word{p} discards case value if = ) \\
\- \word{Rfrom} \> \word{p} we can bring count back now ) \\
\word{;} \word{IMMEDIATE} \\[2\parskip]
\+ \word{:} \word{ENDOF}~ \word{p} orig1 \#of -{}- orig2 \#of ) \\
\word{toR} \> \word{p} move off the stack in case the control-flow ) \\
\> \word{p} stack is the data stack. ) \\
\word{POSTPONE} \word{ELSE} \\
\- \word{Rfrom} \> \word{p} we can bring count back now ) \\
\word{;} \word{IMMEDIATE} \\[2\parskip]
\+ \word{:} \word{ENDCASE}~ \word{p} orig1..orign \#of -{}- ) \\
\word{POSTPONE} \word{DROP} \> \word{p} discard case value ) \\
0 \word{qDO} \\
\tab \word{POSTPONE} \word{THEN} \\
\- \word{LOOP} \\
\word{;} \word{IMMEDIATE}
\end{tabbing}
\end{quote}
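A usage sketch follows; \texttt{DAYS} is a hypothetical name that
returns the number of days in a month of a non-leap year. With the
definitions above, each \word{OF} compiles \texttt{OVER = IF DROP}, so
the case value is consumed only when a clause matches. In the default
clause the case value is still on the stack, so the default result is
placed beneath it with \texttt{SWAP}, leaving the case value on top for
the \texttt{DROP} compiled by \word{ENDCASE}:
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
\+ \word{:} DAYS~ \word{p} month -{}- n ) \\
\word{CASE} \\
\tab 2 \word{OF} 28 \word{ENDOF} \\
\tab 4 \word{OF} 30 \word{ENDOF} \quad 6 \word{OF} 30 \word{ENDOF} \\
\tab 9 \word{OF} 30 \word{ENDOF} \quad 11 \word{OF} 30 \word{ENDOF} \\
\tab 31 SWAP \> \word{bs} default: keep month on top \\
\- \word{ENDCASE} \\
\word{;}
\end{tabbing}
\end{quote}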
\paragraph{Return stack} ~ % A.3.2.3.3
The restrictions in section \xref[3.2.3.3 Return stack]{usage:returnstack}
are necessary if implementations are to be allowed to place loop
parameters on the return stack.
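For example, of the two definitions sketched below (both hypothetical,
each intended to leave \emph{n} times \emph{x}), the first is not
portable: inside the do-loop it uses \texttt{R@} to read a value that
was placed on the return stack before the loop was entered. On a system
that keeps loop parameters on the return stack, \texttt{R@} could return
one of those parameters instead of \emph{x}. The second definition keeps
\emph{x} on the data stack and is portable:
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
\word{:} TIMES-BAD~ \word{p} x n -{}- n*x ) \word{bs} not portable \\
\tab SWAP \word{toR} 0 SWAP 0 \word{qDO} R@ \word{+} \word{LOOP} \word{Rfrom} DROP \word{;} \\[2\parskip]
\word{:} TIMES~ \word{p} x n -{}- n*x ) \word{bs} portable \\
\tab 0 SWAP 0 \word{qDO} \word{OVER} \word{+} \word{LOOP} NIP \word{;}
\end{tabbing}
\end{quote}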
\addtocounter{subsubsection}{2}
\subsubsection{Environmental queries} % A.3.2.6
The size in address units of various data types may be determined by
phrases such as \texttt{1} \word{CHARS}. Similarly, alignment may be
determined by phrases such as \texttt{1} \word{ALIGNED}.
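A program might record such sizes once as constants, as in this sketch
(the constant names are hypothetical):
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
1 \word{CHARS} \word{CONSTANT} /CHAR \> \word{bs} address units per character \\
1 \word{CELLS} \word{CONSTANT} /CELL \> \word{bs} address units per cell \\
1 \word{ALIGNED} \word{CONSTANT} /ALIGN \> \word{bs} first aligned address >= 1
\end{tabbing}
\end{quote}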
The environmental queries are divided into two groups: those that
always produce the same value and those that might not. The former
group includes entries such as \texttt{MAX-N}. This information is
fixed by the hardware or by the design of the Forth system; a user
is guaranteed that asking the question once is sufficient.
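Because the answer to such a query cannot change, a program can ask
once and keep the result, as in this sketch (the names
\texttt{MAX-N?} and \texttt{LARGEST} are hypothetical):
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
\+ \word{:} MAX-N?~ \word{p} -{}- n ) \\
\- \word{Sq} MAX-N" \word{ENVIRONMENTq} 0= ABORT" MAX-N unknown" \\
\word{;} \\[2\parskip]
MAX-N? \word{CONSTANT} LARGEST \> \word{bs} ask once, keep the answer
\end{tabbing}
\end{quote}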
The other, now obsolescent, group of queries is for things that may
legitimately change over time. For example, an application might test
for the presence of the Double Number word set using an environmental
query. If the word set is missing, the system could invoke a
system-dependent process to load it. The system is permitted to change
\word{ENVIRONMENTq}'s database so that subsequent queries about
the word set indicate that it is present.
Note that a query that returns an ``unknown'' response could produce
a ``known'' result on a subsequent query.
\subsubsection{Obsolescent Environmental Queries} % A.3.2.7
\label{rat:obsolete}
When reviewing the Forth 94 Standard, the committee had to decide how
to adapt the word set queries. Despite the recommendation in Forth 94,
many systems have not supported word set queries in a meaningful way;
consequently, few programmers use them. The committee was unwilling
to exacerbate the problem by introducing additional queries for the
revised word sets. The committee has therefore declared the word set
environmental queries (see table \ref{table:obsolete}) obsolescent,
with the intention of removing them altogether in the next revision.
They are retained in this standard to support existing Forth 94
programs. New programs should not use them.
\subsubsection{Extension queries} % A.3.2.8
\subsection{The Forth dictionary} % A.3.3
A Standard Program may redefine a standard word with a non-standard
definition. The program is still standard (since it can be built on
any Standard System), but the effect is to make the combined entity
(Standard System plus Standard Program) a non-standard system.
\subsubsection{Name space} % A.3.3.1
\setcounter{paragraph}{1}
\paragraph{Definition names} ~ % A.3.3.1.2
The language in this section is there to ensure the portability of
Standard Programs. If a program uses something outside the Standard
that it does not provide itself, there is no guarantee that another
implementation will have what the program needs to run. There is no
intent whatsoever to imply that all Forth programs will be somehow
lacking or inferior because they are not standard; some of the finest
jewels of the programmer's art will be non-standard. At the same time,
the committee is trying to ensure that a program labeled ``Standard''
will meet certain expectations, particularly with regard to portability.
In many system environments the input source is unable to supply
certain non-graphic characters due to external factors, such as the
use of those characters for flow control or editing. In addition,
when interpreting from a text file, the parsing function specifically
treats non-graphic characters like spaces; thus words received by the
text interpreter will not contain embedded non-graphic characters. To
allow implementations in such environments to call themselves standard,
this minor restriction on Standard Programs is necessary.
A Standard System is allowed to permit the creation of definition
names containing non-graphic characters. Historically, such names
were used for keyboard editing functions and ``invisible'' words.
\subsubsection{Code space} % A.3.3.2
\subsubsection{Data space} % A.3.3.3
\label{rat:dataspace}
The words \word{toIN}, \word{BASE}, \word[block]{BLK}, \word[block]{SCR},
\word{SOURCE}, \word{SOURCE-ID} and \word{STATE} provide information
used by the Forth system in its operation that may also be of use to
the application. Any assumption the application makes about data in
the Forth system that it did not store itself, other than the data
just listed, is an environmental dependency.
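For example, an application can display a number in hexadecimal without
disturbing the current radix by saving and restoring \word{BASE}; the
name \texttt{.HEX} in this sketch is hypothetical:
\begin{quote}\ttfamily
\word{:} .HEX \quad \word{p} u -{}- ) \quad
\word{BASE} \word{@} \word{toR} HEX U. \word{Rfrom} \word{BASE} \word{!} \word{;}
\end{quote}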
There is no point in specifying (in the Standard) both what is and
what is not addressable. A Standard Program may NOT address:
\begin{itemize}
\item Directly into the data or return stacks;
\item Into a definition's data field if not stored by the application.
\end{itemize}
The read-only restrictions arise because some Forth systems run from
ROM and some share I/O buffers with other users or systems. Portable
programs cannot know which areas are affected, hence the general
restrictions.
\paragraph{Address alignment} ~ % A.3.3.3.1
\label{rat:aaddr}
Some processors have restrictions on the addresses that can be used
by memory access instructions. For example, some architectures require
16-bit data to be loaded or stored only at even addresses and 32-bit
data only at addresses that are multiples of four.
An implementor can handle these alignment restrictions in one of two
ways. Forth's memory access words (\word{@}, \word{!}, \word{+!},
etc.) could be implemented in terms of smaller-width access instructions,
which have no alignment restrictions. For example, on a system with
16-bit cells, \word{@} could be implemented with two byte-fetch
instructions and a reassembly of the bytes into a 16-bit cell. Although
this conceals hardware restrictions from the programmer, it is inefficient,
and may have unintended side effects in some hardware environments.
An alternative implementation could define each memory-access word
using the native instructions that most closely match the word's function.
The 16-bit cell system could implement \word{@} using the processor's
16-bit fetch instruction; in this case, the responsibility for giving
\word{@} a correctly aligned address falls on the programmer. A
portable program must assume that alignment may be required and
follow the requirements of this section.
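The first approach can be sketched in Forth itself. The hypothetical
definition below assumes that a character is 8 bits wide and occupies
one address unit, and that the processor is little-endian; it fetches a
16-bit value from any address, aligned or not, with two character
fetches:
\begin{quote}\ttfamily
\begin{tabbing}
\tab \= \hspace{14em} \= \kill
\+ \word{:} U16@~ \word{p} c-addr -{}- u ) \\
DUP \word{C@} \> \word{bs} low-order byte \\
SWAP \word{CHAR+} \word{C@} 8 LSHIFT \> \word{bs} high-order byte, shifted \\
\- OR \\
\word{;}
\end{tabbing}
\end{quote}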
\paragraph{Contiguous regions} ~ % A.3.3.3.2
\label{rat:regions}
The data space of a Forth system comes in discontiguous regions. The
location of some regions is provided by the system, some by the
program. Data space is contiguous within regions, allowing address
arithmetic to generate valid addresses only within a single region.
A Standard Program cannot make any assumptions about the relative
placement of multiple regions in memory.
Section \ref{usage:contiguous} does prescribe conditions under which
contiguous regions of data space may be obtained. For example:
\begin{quote}\ttfamily
\word{CREATE} TABLE \quad
1 \word{C,} 2 \word{C,} \word{ALIGN} 1000 \word{,} 2000 \word{,}
\end{quote}
makes a table whose address is returned by \texttt{TABLE}. In
accessing this table,
\begin{quote}
\begin{tabular}{ll}
\texttt{TABLE} \word{C@} & will return 1 \\
\texttt{TABLE} \word{CHAR+} \word{C@} & will return 2 \\
\texttt{TABLE} \texttt{2} \word{CHARS} \word{+}
\word{ALIGNED} \word{@} & will return 1000 \\
\texttt{TABLE} \texttt{2} \word{CHARS} \word{+}
\word{ALIGNED} \word{CELL+} \word{@} & will return 2000. \\
\end{tabular}
\end{quote}
Similarly,
\begin{quote}\ttfamily
\word{CREATE} DATA \quad 1000 \word{ALLOT}
\end{quote}
makes an array 1000 address units in size. A more portable strategy
would define the array in application units, such as:
\begin{quote}\ttfamily
500 \word{CONSTANT} NCELLS \\
\word{CREATE} CELL-DATA NCELLS \word{CELLS} \word{ALLOT}
\end{quote}
This array can be indexed like this:
\begin{quote}\ttfamily
\word{:} LOOK \quad
NCELLS 0 \word{DO}
CELL-DATA \word{I} \word{CELLS} \word{+} \word[tools]{q}
\word{LOOP}
\word{;}
\end{quote}
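The array can be filled with a loop of the same shape; this sketch
(with the hypothetical name \texttt{INIT-DATA}) stores each cell's
index in that cell:
\begin{quote}\ttfamily
\word{:} INIT-DATA \quad
NCELLS 0 \word{DO}
\word{I} CELL-DATA \word{I} \word{CELLS} \word{+} \word{!}
\word{LOOP}
\word{;}
\end{quote}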
\setcounter{paragraph}{3}
\paragraph{Text-literal regions} ~
\label{rat:"literal}
Additional transient buffers are provided for use by \word{Cq}, \word{Sq} and
\word{Seq}. The buffers should be able to store two consecutive strings, thus
allowing the command line:
\begin{quote}
\texttt{\word[core]{Sq} name1" \word[core]{Sq} name2" \word{RENAME-FILE}}
\end{quote}
The buffers may be implemented in a circular arrangement, where a string