Dialects & Language

Benjamin Kowarsch edited this page May 29, 2017 · 20 revisions

Dialect Selection

M2J supports the following Modula-2 dialects:

  • following Wirth's 3rd Edition (PIM3)
  • following Wirth's 4th Edition (PIM4)
  • a PIM subset with language extensions (Ext)

Selecting PIM3 Mode

To select the PIM3 dialect, use ...

m2j --pim3 sourcefile

Selecting PIM4 Mode

To select the PIM4 dialect, use ...

m2j --pim4 sourcefile

Selecting Extended Mode

To select the extended dialect, use ...

m2j --ext sourcefile

Language Features of the PIM Dialects

When a PIM dialect is selected, certain outdated or unsafe language features specified in Wirth's language reports are disabled by default:

  • synonyms
  • octal literals
  • cast syntax
  • coroutines
  • variant records
  • local modules

Any of these features may be individually enabled using compiler options. For full PIM compliance use ...

m2j --pim3 --compliant sourcefile
m2j --pim4 --compliant sourcefile

Synonyms

In place of PIM synonyms ~ and &, reserved words NOT and AND should be used.

foo := NOT bar; (* in place of ~ *)
foo := bar AND baz; (* in place of & *)

In place of PIM synonym <>, symbol # should be used.

foo := bar # baz; (* in place of <> *)

Octal Literals

In place of octal character code literals, the built-in CHR function should be used.

CONST newLine = CHR(10); (* in place of 12C *)

In place of octal number literals, decimal number literals should be used.

CONST foo = 255; (* in place of 377B *)
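The two decimal replacements above can be cross-checked in any language with octal notation. The following is a minimal sketch in C, which writes octal with a leading 0 rather than a B or C suffix. The names NEWLINE_OCTAL, FOO_OCTAL and check_octal_replacements are hypothetical and only serve this illustration.

```c
#include <assert.h>

/* C spells octal with a leading 0; PIM uses a suffix instead. */
enum {
    NEWLINE_OCTAL = 012,  /* PIM character code literal 12C */
    FOO_OCTAL     = 0377  /* PIM number literal 377B        */
};

/* Hypothetical helper: returns 1 when both decimal replacements
   denote the same values as the octal originals. */
int octal_replacements_match(void) {
    return NEWLINE_OCTAL == 10 && FOO_OCTAL == 255;
}

/* Hypothetical self-check: aborts if either conversion were wrong. */
void check_octal_replacements(void) {
    assert(NEWLINE_OCTAL == 10);   /* 1*8 + 2        */
    assert(FOO_OCTAL == 255);      /* 3*64 + 7*8 + 7 */
}
```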

Cast Syntax

In place of PIM cast syntax, the PIM dialect modes provide a CAST function which may be imported from module SYSTEM.

IMPORT SYSTEM;

foo := SYSTEM.CAST(FooType, bar); (* in place of FooType(bar) *)
bar := SYSTEM.CAST(BarType, foo); (* in place of BarType(foo) *)

Module Priority

Module priority syntax is recognised but always ignored.

Legacy Export

When compiler option --legacy-export is used, export directives in definition modules are permitted but ignored, allowing compilation of PIM2 sources in PIM3 or PIM4 dialect mode.

Language Features of the Extended Dialect

Omissions

Outdated and unsafe PIM language features are omitted in the extended dialect:

  • no synonyms
  • no suffix literals
  • no octal literals
  • no unqualified import
  • no implicit cast
  • no variant records
  • no module priority
  • no local modules
  • no WITH statement

Changes

Some changes relative to PIM3 and PIM4 apply:

  • the set difference operator is \ instead of -
  • opaque type definitions are marked by reserved word OPAQUE
  • the | symbol within CASE statements is a branch prefix, not a separator
  • pseudo-module SYSTEM is renamed to UNSAFE
  • INC, DEC and HALT need to be imported from module UNSAFE
  • cardinal types are not subrange types of integer types (same as in PIM3)
  • ARRAY OF CHAR values are always terminated by ASCII NUL (same as in PIM4)
  • global variables are always read-only when imported (recommended by PIM)
  • built-in function TSIZE has return type LONGCARD, its result is given in octets

Extensions

In addition to the basic PIM subset, the extended dialect supports a number of language extensions drawn from Oberon and Modula-2 R10, which are described below.

  • line comments
  • lowline character in identifiers
  • prefix literals replacing suffix literals
  • escaped tab and newline in string literals
  • increment and decrement statements
  • CONST parameters
  • variadic parameters
  • extensible record types
  • variable size record types
  • unified conversion function
  • explicit CAST function
  • additional built-in types and functions
  • foreign function interface pragmas

Line Comments

In addition to block comments, the extended dialect supports Fortran-style line comments.

! this is a line comment, terminating at the end-of-line

Lowline Character in Identifiers

The extended dialect supports the use of the lowline _ in identifiers.

CONST ASCII_TAB = CHR(9);

This feature is disabled by default and may be enabled by compiler option --lowline-identifiers. It is primarily intended for use with foreign function interfaces.

Prefix Literals

In place of PIM-style suffix literals, the extended dialect uses C-style prefix literals.

CONST int = 0xFF00; (* whole number *)
CONST newLine = 0uA; (* character *)

Escaped Tab and Newline

The extended dialect supports C-style \t and \n escaped character codes within string literals.

CONST header = "Foo\tBar\tBaz";
CONST newLine = "This is the end of the line\n";
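Since the escapes are described as C-style, a short C sketch can show what they denote: each escape stands for a single character, not for a backslash followed by a letter. The helper names code_at and check_escapes are hypothetical, and the character codes 9 and 10 assume an ASCII character set. Whether M2J stores strings the same way internally is not stated here.

```c
#include <assert.h>
#include <string.h>

/* The same two literals as in the constants above. */
static const char header[]   = "Foo\tBar\tBaz";
static const char new_line[] = "This is the end of the line\n";

/* Hypothetical helper: character code at position i of string s
   (no range check, this is a sketch only). */
int code_at(const char *s, size_t i) {
    return (unsigned char)s[i];
}

/* Hypothetical self-check; codes 9 and 10 assume ASCII. */
void check_escapes(void) {
    assert(strlen(header) == 11);       /* 9 letters + 2 tab characters */
    assert(code_at(header, 3) == 9);    /* first \t  */
    assert(code_at(header, 7) == 9);    /* second \t */
    assert(code_at(new_line, strlen(new_line) - 1) == 10);  /* trailing \n */
}
```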

Increment and Decrement Statements

The extended dialect supports C-style postfix increment and decrement statements.

index++;
counter--;

Unlike C, the notation is only permitted in statements, not in expressions.
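For comparison, the following C sketch shows both forms. The statement form corresponds to what the extended dialect permits. The expression form, where the old value is used and the variable is incremented as a side effect, is the usage that has no counterpart in the extended dialect. All function names are hypothetical.

```c
#include <assert.h>

/* Statement form: the increment stands alone. */
int statement_form(int index) {
    index++;
    return index;
}

/* Expression form: legal in C only. The old value of *index is
   yielded, then *index is incremented as a side effect. */
int expression_form(int *index) {
    int old = (*index)++;
    return old;
}

/* Hypothetical self-check tying the two forms together. */
void check_increment_forms(void) {
    int i = 4;
    assert(statement_form(i) == 5);    /* i itself is unchanged: passed by value */
    assert(expression_form(&i) == 4);  /* yields the old value */
    assert(i == 5);                    /* side effect has happened */
}
```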

CONST Parameters

The extended dialect supports the CONST attribute in formal types and formal parameters. Parameters marked with the CONST attribute are immutable within the procedure or function.

TYPE P = PROCEDURE ( CONST ARRAY OF CHAR );
PROCEDURE WriteString ( CONST s : ARRAY OF CHAR );

Variadic Parameters

The extended dialect supports the ARGLIST attribute in formal types and formal parameters. Parameters marked with the ARGLIST attribute may be passed a variable number of arguments.

PROCEDURE newVector ( values : ARGLIST OF REAL ) : Vector;

VAR v1, v2, v3 : Vector;

v1 := newVector(1.2, 3.4); (* two arguments *)
v2 := newVector(1.2, 3.4, 5.6); (* three arguments *)
v3 := newVector(1.2, 3.4, 5.6, 7.8); (* four arguments *)

Within the procedure or function, the argument count may be obtained using built-in function COUNT and the arguments are addressable using array subscript notation.

PROCEDURE PrintList ( values : ARGLIST OF REAL );
VAR index : CARDINAL;
BEGIN
  FOR index := 0 TO COUNT(values)-1 DO
    WriteReal(values[index]);
    WriteLn
  END
END PrintList;

Extensible Record Types

In place of variant record types, the extended dialect provides Oberon-style extensible record types.

TYPE Base = RECORD ( NIL )
  (* compiler inserts hidden type tag *)
  foo : Foo
END;

In the above example, NIL is specified as the base type of the new record type. This tells the compiler that this type is intended as a base type to be extended in other declarations. As a result, the compiler inserts a hidden type tag field into its field list. By contrast, a record type declaration without a base type parameter will not receive a hidden type tag field and will not be extensible.

TYPE ExtBar = RECORD ( Base )
  (* inherits foo from Base *)
  bar : Bar
END;

TYPE ExtBaz = RECORD ( ExtBar )
  (* inherits foo and bar from ExtBar *)
  baz : Baz
END;

In the above example, type ExtBar is declared to be an extension of type Base and type ExtBaz is declared to be an extension of type ExtBar. An extension type inherits all the fields from its base type, including the hidden type tag field.

PROCEDURE bam ( x : Base );
BEGIN
  CASE x OF (* type: *)
  | ExtBar : x.bar := BarValue
  | ExtBaz : x.bar := BarValue; x.baz := BazValue
  END
END bam;

In the above example, the CASE statement is used to test the actual type of a record of an extensible type. The hidden type tag field of the record is used to determine its actual type at runtime. Only those fields that are present in the actual type are accessible. Addressing such fields outside of the appropriate case branch in a CASE statement raises a compile time error. This facility obsoletes Oberon type guard syntax.

Variable Size Record Types

The extended dialect supports a type-safe and bounds-checked variant of C99-style variable length array members.

TYPE Buffer = VAR RECORD
  size : CARDINAL (* immutable after allocation *)
  (* other field declarations may appear here *)
IN (* indeterminate field *)
  buffer : ARRAY size OF SomeType (* size set at allocation *)
END;

The example above is equivalent to the following C99 declaration:

struct buffer_t {
  unsigned size;
  sometype_t buffer[];
};

However, in C the compiler does not associate the size field with the buffer array and consequently the array is not bounds checked. It is the responsibility of the C programmer to insert any such checks manually.
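A minimal sketch of what such manual checks might look like in C follows. The accessor names buffer_new, buffer_set and buffer_get are hypothetical, and sometype_t is given a placeholder definition so that the example is self-contained. It illustrates the bookkeeping the C programmer takes on and is not a description of the code M2J generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int sometype_t;   /* placeholder element type for this sketch */

struct buffer_t {
    unsigned size;        /* element count, written once at allocation */
    sometype_t buffer[];  /* C99 flexible array member */
};

/* Hypothetical allocator: reserves room for n elements and records n
   in the size field. Returns NULL if the allocation fails. */
struct buffer_t *buffer_new(unsigned n) {
    struct buffer_t *b = calloc(1, sizeof *b + (size_t)n * sizeof b->buffer[0]);
    if (b != NULL) b->size = n;
    return b;
}

/* Hypothetical bounds-checked store: refuses out-of-range indices. */
bool buffer_set(struct buffer_t *b, unsigned i, sometype_t v) {
    if (b == NULL || i >= b->size) return false;
    b->buffer[i] = v;
    return true;
}

/* Hypothetical bounds-checked load: writes the element to *out on success. */
bool buffer_get(const struct buffer_t *b, unsigned i, sometype_t *out) {
    if (b == NULL || out == NULL || i >= b->size) return false;
    *out = b->buffer[i];
    return true;
}

/* Hypothetical self-check exercising both sides of the bound. */
void check_buffer(void) {
    struct buffer_t *b = buffer_new(100);
    assert(b != NULL);
    assert(buffer_set(b, 99, 42));    /* last valid index */
    assert(!buffer_set(b, 100, 42));  /* rejected: out of range */
    free(b);
}
```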

By contrast, M2J automatically inserts code to calculate the size for the buffer at allocation, write the value into the size field and treat the field as immutable. The buffer array is then automatically bounds checked against the value stored in the size field.

VAR b : Buffer;

NEW(b, 100); (* allocate buffer of size 100 *)

WriteCard(b^.size); (* prints 100 *)

FOR i := 0 TO b^.size - 1 DO
  b^.buffer[i] := someValue
END;

b^.size := 0; (* compile time error: attempt to write to an immutable field *)

Unified Conversion Function

In place of the inconsistent and confusingly named conversion functions in PIM, the extended dialect provides universal conversion function CONV for safe type conversions.

i := CONV(INTEGER, r); (* instead of i := TRUNC(r); *)
r := CONV(REAL, i); (* instead of r := FLOAT(i); *)

Explicit Cast Function

In place of PIM cast syntax, the extended dialect provides a CAST function which may be imported from module UNSAFE.

IMPORT UNSAFE;

foo := UNSAFE.CAST(FooType, bar);
bar := UNSAFE.CAST(BarType, foo);

Additional Built-in Types

The extended dialect supports the following additional built-in types:

OCTET, SHORTINT, SHORTCARD, LONGCARD;

Further, module UNSAFE provides type BYTE and constant BytesPerWord.

Additional Built-in Functions

The extended dialect supports the following additional built-in functions:

COUNT, LENGTH, PRED, SUCC, MAXORD;

Further, module UNSAFE provides the following bit manipulation primitives:

SHL, SHR, BWNOT, BWAND, BWOR, BWXOR;

Foreign Function Interface Pragmas

M2J supports two foreign function interface pragmas. FFI declares a Modula-2 definition module as an interface to a foreign implementation module written in Java, and FFIDENT maps a Modula-2 identifier to a Java identifier.

(* Modula-2 interface for Java package com.jurassic.flintstones *)
DEFINITION MODULE Flintstones <*FFI="JVM"*>
  <*FFIDENT="com.jurassic.flintstones"*>;

PROCEDURE Fred ( foo : INTEGER ) <*FFIDENT="fred"*>;
(* => public void fred (int foo); *)

PROCEDURE wilma ( bar : CARDINAL ) : INTEGER;
(* => public int wilma (uint bar); *)
...
END Flintstones.

Implementation Defined Features Common To All Dialects

Disabling Sections of Source Code

M2J supports special non-nesting tags ?< and >? to temporarily and safely disable arbitrary sections of source code. The tags must always be used at the first column of a line. The compiler emits a warning for each disabled code section.

MODULE Foo;
?<
CONST delimiter = '*)';
>?
BEGIN
...
END Foo.

Please note that these tags are debugging aids. They should be removed before the code is committed to a repository, published or shipped, and they should not be used for commenting. This is why the compiler emits a warning for each use.

Size of Type WORD

The size of built-in type WORD is implementation defined. In M2J it is defined as one octet.

Source File Types

The compiler recognises source files according to their file types.

Source File             File Types
Definition modules      .def or .DEF
Implementation modules  .mod or .MOD
Program modules         .mod or .MOD
