A Haskell implementation of Benjamin Pierce's Lambda Calculus with Subtyping. It has the features of STLC but with generalised records, a Unit type, and subtype polymorphism.
This strongly normalising calculus serves as a foundation for object-oriented languages such as Java, Scala, and Python.
You need Haskell; this compiles with at least GHC 8.2.2 (Stack resolver: lts-11.0).
You can use cabal to build and run this (see this README), or alternatively build with vanilla GHC.
To compile and run:

```
ghc -O2 -o sub Main
./sub
```

Alternatively, to use the GHCi interpreter:

```
ghci Main
```

then type `main`.
In either case you get something like the following:

```
Welcome to the Sub REPL
Type some terms or press Enter to leave.
>
```
Note: When run in GHCi, you don't get the luxury of escaped characters, backspace, delete, etc. Compile it with GHC if you need these.
Where you can then have some fun; try these examples:

- `{x=(), id=\x:A.x}`: a record containing a unit `()` and the identity function for type `A`, assigned to labels `x` and `id` respectively.
- `\r:{a:A, b:B}.r`: the identity function for two-element records containing an `A` and a `B`.
- `(\r:{u:1}.r) {u={u=()}}`: the identity function for a singleton record `{u:1}`, which due to subtyping will accept `{u={u=()}}`.
- `\r:{b:B}.r.b`: a function taking a record `{b:B}` and projecting out the label `b` (submit a PR with a less confusing syntax).
- `(\f:1->1.f) (\g:A->A.g)`: subtyping with arrow types.
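Terms like these parse into a small abstract syntax of terms and types. The following is a hedged sketch of what that syntax could look like; the actual datatype and constructor names in Sub.hs may differ:

```haskell
-- A sketch of the abstract syntax; the real datatypes in Sub.hs
-- may use different names and representations.
data SType
  = Top                      -- the unit type, written 1 or ⊤
  | Base String              -- base types A, B, C, ...
  | Arrow SType SType        -- function types T -> T
  | RecT [(String, SType)]   -- record types {a:A, b:B}
  deriving (Eq, Show)

data STerm
  = Var String               -- variables
  | Lam String SType STerm   -- abstractions \x:T.t
  | App STerm STerm          -- applications
  | Unit                     -- the unit ()
  | Rec [(String, STerm)]    -- records {x=(), id=\x:A.x}
  | Proj STerm String        -- projections r.b
  deriving (Eq, Show)

-- the first example above, {x=(), id=\x:A.x}, as an AST
example :: STerm
example = Rec [("x", Unit), ("id", Lam "x" (Base "A") (Var "x"))]
```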
The parser is smart enough to recognise λ and ⊤, so you can copy and paste from the output:

```
Welcome to the Sub REPL
Type some terms or press Enter to leave.
> \r:{u:1}.r
= λr:{u:⊤}.r
> λr:{u:⊤}.r
= λr:{u:⊤}.r
```
`>` denotes the REPL waiting for input, `=` means no reductions occurred (it's the same term), `~>` denotes one reduction, and `~>*` denotes zero or more reductions (although in practice this is one or more, due to `=`).
There is also a reduction tracer, which prints each reduction step. Prefix any term with `'` to see the reductions:

```
> '(\x:1.\f:1->1.f {id=\a:A.a, u=x}) () (\r:{u:1, g:A->1}.r.u)
~> (λf:⊤->⊤.f {id=λa:A.a, u=()}) (λr:{u:⊤, g:A->⊤}.r.u)
~> (λr:{u:⊤, g:A->⊤}.r.u) {id=λa:A.a, u=()}
~> {id=λa:A.a, u=()}.u
~> ()
```
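The tracer is essentially iterating a one-step reduction until no rule applies. Below is a minimal sketch of such a stepper; the name `reduce1` follows the source files, but this simplified version omits type annotations (reduction ignores them) and assumes all bound names are distinct, per Barendregt's convention, so its naive substitution cannot capture:

```haskell
import Control.Applicative ((<|>))

-- Simplified term language: type annotations omitted, since
-- reduction does not consult them. Constructor names are assumptions.
data STerm
  = Var String
  | Lam String STerm
  | App STerm STerm
  | Unit
  | Rec [(String, STerm)]
  | Proj STerm String
  deriving (Eq, Show)

-- Naive substitution: assumes all bound names are distinct
-- (Barendregt's variable convention), so no capture can occur.
subst :: String -> STerm -> STerm -> STerm
subst x s t = case t of
  Var y    -> if x == y then s else t
  Lam y b  -> if x == y then t else Lam y (subst x s b)
  App f a  -> App (subst x s f) (subst x s a)
  Unit     -> Unit
  Rec fs   -> Rec [(l, subst x s v) | (l, v) <- fs]
  Proj r l -> Proj (subst x s r) l

-- One reduction step, or Nothing if the term is in normal form.
reduce1 :: STerm -> Maybe STerm
reduce1 t = case t of
  App (Lam x b) a -> Just (subst x a b)               -- beta
  App f a         -> (\f' -> App f' a) <$> reduce1 f
                 <|> App f <$> reduce1 a
  Proj (Rec fs) l -> lookup l fs                      -- projection
  Proj r l        -> (\r' -> Proj r' l) <$> reduce1 r
  Rec fs          -> Rec <$> stepFields fs            -- inside records
  _               -> Nothing
  where
    stepFields [] = Nothing
    stepFields ((l, v) : rest) = case reduce1 v of
      Just v' -> Just ((l, v') : rest)
      Nothing -> ((l, v) :) <$> stepFields rest

-- The full trace, as printed after the REPL's ' prefix.
reduceTrace :: STerm -> [STerm]
reduceTrace t = t : maybe [] reduceTrace (reduce1 t)
```

Running `reduceTrace` on the AST of a term like the one traced above yields the same shape of derivation: a handful of `~>` steps ending in `Unit`.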
There is also a typing mechanism, invoked by prefixing a term with `t`, which displays the type or fails as usual:

```
> t\r:{a:A, b:B}.r.b
{a:A, b:B}->B
> t\r:{a:A, b:B}.r.c
Cannot Type Term: \r:{a:A, b:B}.r.c
```
Note: if you provide a non-normalising term, the typechecker will fail and reduction will not occur.
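Under the hood, the `t` command amounts to running a typechecker over the AST. A hedged sketch of such a checker follows; the datatype and function names are assumptions, and the `subtype` check here is deliberately cut down to reflexivity plus the Top rule (the full relation also covers arrows and records), and label uniqueness in records is not checked:

```haskell
import qualified Data.Map as M

data SType = Top | Base String | Arrow SType SType | RecT [(String, SType)]
  deriving (Eq, Show)

data STerm = Var String | Lam String SType STerm | App STerm STerm
           | Unit | Rec [(String, STerm)] | Proj STerm String
  deriving (Eq, Show)

-- Deliberately simplified: reflexivity plus "everything < Top".
subtype :: SType -> SType -> Bool
subtype s t = s == t || t == Top

type Ctx = M.Map String SType

typeOf :: Ctx -> STerm -> Maybe SType
typeOf ctx t = case t of
  Unit       -> Just Top                               -- unit rule
  Var x      -> M.lookup x ctx                         -- variables
  Lam x ty b -> Arrow ty <$> typeOf (M.insert x ty ctx) b
  App f a    -> do
    Arrow dom cod <- typeOf ctx f
    argTy         <- typeOf ctx a
    if subtype argTy dom then Just cod else Nothing    -- subsumption
  Rec fs     -> RecT <$> traverse (\(l, v) -> (,) l <$> typeOf ctx v) fs
  Proj r l   -> do
    RecT fs <- typeOf ctx r
    lookup l fs                                        -- label must exist
```

With this sketch, the AST of `\r:{a:A, b:B}.r.b` types as the record-to-`B` arrow, while projecting the missing label `c` yields `Nothing`, matching the REPL behaviour shown above.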
You can save terms for the life of the program with a `let` expression. Any time a saved variable appears in a term, it will be substituted for the saved term:

```
> let f = (\x:1.\f:1->1.f {id=\a:A.a, u=x})
Saved term: λx:⊤.λf:⊤->⊤.f {id=λa:A.a, u=x}
> let g = (\r:{u:1, g:A->1}.r.u)
Saved term: λr:{u:⊤, g:A->⊤}.r.u
> f () g
~>* ()
```
Note: Consequently `let` and `=` are keywords, and so you cannot name variables with them.
Additionally we have type-level `lett` statements that allow you to define and use types:

```
> lett RECORD = {a:A, b:B, f:A->A, g:B->B}
Saved type: {a:A, b:B, f:A->A, g:B->B}
> \r:RECORD.r
= λr:{a:A, b:B, f:A->A, g:B->B}.r
```

This makes it easier to define both terms and types, but does not allow type-level application (see Omega). `lett` is also a keyword.
We base the language on the BNF for the typed calculus, with the addition of generalised records, record projections (using the record names as projections), the unit, and types for these terms:
However we adopt standard bracketing conventions to eliminate ambiguity in the parser. Concretely, the parser implements the following unambiguous grammar for terms:
and types:
Some notes about the syntax:

- The above syntax only covers the core calculus, and not the REPL extensions (such as the `let` bindings above). The extensions are simply added on in the REPL.
- Variables are strings (excluding numbers), as this is isomorphic to a whiteboard treatment and hence the most familiar.
- Units are largely uninteresting, but can be formed as `()` like in Haskell, and typed like `1` (or `⊤`).
- Records are generalised products formed as a comma-separated sequence of assignments of terms to labels `t=v`, nested inside curly braces: `{u=(), ida=\a:A.a}`. Record types are comma-separated assignments of typings to labels `t:v`, nested inside curly braces (such as `{1:⊤, f:A->A}`).
- Despite using subtyping, the subtyping relation is not part of the syntax or language itself but rather a feature used during typechecking. As such it is not featured in the grammars.
- Types range over upper-case characters `A,B,C...`, nested arrow types `T -> T`, the unit type `()`, and generalised record types `{a:A,f:B->A,...,c:C}`.
- Arrows associate to the right so that `T -> T -> T` is the same as `T -> (T -> T)` but not `((T -> T) -> T)`.
- Nested terms don't need brackets (`\a:A.\b:B. b`) unless enforcing application on the right. Whitespace does not matter (`\r:{a:A, b:B}. r`) unless it is between applications, where you need at least one space.
- To quit, use `Ctrl+C` or whatever your machine uses to interrupt computations.
The semantics implements beta-reduction on terms and alpha-equivalence as the `Eq` instance of `STerm`. The semantics are the same as STLC with the addition of subtype polymorphism, which augments the typing relation with a subtyping relation `<`. We reformulate the semantics as typing judgements:
We have typing rules for the Unit type (otherwise known as Top):
The subtyping relation is a reflexive and transitive relation `<` defined as follows:
These rules denote reflexivity, transitivity, that `()` is the supertype of all types, and the arrow-subtyping rule for functions. The standard typing rules from STLC are then influenced by this, but are structurally the same beyond a type substitution rule:
Variables:
for abstractions:
for application:
the subtyping substitution rule (implicit in typing, not explicitly used):
and the reduction relation (adopted from STLC):
Next, there is a special elimination rule for records:
with sensible structural (internal) reduction rules for terms inside records. Lastly we have rules for record projections:
with width subtyping rules for records (the subtype record can have more fields than its supertype record, as long as the common fields match on label and type):
which we extend with record depth subtyping, in which common fields in records need to match on labels but their types need only form the subtyping relation, rather than the stricter width subtyping above:
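The subtyping rules above can be collected into a single recursive check. The following is a hedged sketch, not the actual Sub.hs definition (names are assumptions); transitivity has no clause of its own here, as it is admissible for these structural rules:

```haskell
data SType
  = Top                     -- the unit type 1 (⊤)
  | Base String             -- base types A, B, C, ...
  | Arrow SType SType       -- function types
  | RecT [(String, SType)]  -- record types
  deriving (Eq, Show)

-- s `subtype` t  corresponds to  s < t  in the rules above
subtype :: SType -> SType -> Bool
subtype s t | s == t = True               -- reflexivity
subtype _ Top = True                      -- everything < ⊤
subtype (Arrow s1 s2) (Arrow t1 t2) =
  subtype t1 s1 && subtype s2 t2          -- contravariant domain,
                                          -- covariant codomain
subtype (RecT fs) (RecT gs) =
  -- width, depth, and permutation: every field the supertype demands
  -- must appear in the subtype, at a subtype of its declared type
  all (\(l, ty) -> maybe False (`subtype` ty) (lookup l fs)) gs
subtype _ _ = False
```

For instance, `subtype (RecT [("u",Top),("g",Arrow (Base "A") Top)]) (RecT [("u",Top)])` holds by width subtyping: the wider record is usable wherever the narrower one is expected.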
- We do structural subtyping here so that the structure of a type determines whether it's a subtype of another type. The alternative is nominal subtyping where terms are subtypes when defined in a certain way (like how classes extend supertypes in Java).
- Field ordering in records does not matter, enabling permutation subtyping.
- The subtyping relation is not explicit in the language but rather applied at typechecking time, and so any terms using `<` are for demonstration and are not parsable in Sub.
- Records are typeable only if all of their subterms are typeable and all of the labels are unique. We leverage the STLC typing rules and the subtyping relation to ensure record fields are typeable.
- Projections are typeable only if applied to a term which is a well-typed record and the projection label (`x` in `{x=()}.x`) explicitly matches a label in that record.
- Units are considered to be equivalent to Object in conventional object-oriented languages, and hence a function taking a `()` will take anything, as everything is a subtype of Object: `(\x:1.x) (\a:A.a)`. We should mention that languages like Java might include additional baggage like hashCode; our treatment is simpler as we don't need this to demonstrate how subtyping works. The subtyping relation therefore states that all types are subtypes of units (`():1`).
- All base types are on the same level, and hence `A < B` for arbitrary `A != B` is false but `A < A` is true (due to the reflexivity of `<`).
- For arrow types `f:S1 -> S2` and `g:T1 -> T2` we require that `T1 < S1` and `S2 < T2` if `f < g`. The first is needed as `g` expects a type `T1` and `f` provides a type `S1` with a domain (input space) that may supersede `T1`. The second is because the codomain (result) of `f` should be a subtype of the codomain of `g` if we are going to use `S2` anywhere `T2` would be expected. This is demonstrated by the term `(\f:1->1.f) (\g:A->A.g)`.
- This implementation follows a small-step operational semantics and Barendregt's variable convention (see `substitution` in Sub.hs). The variable convention is adopted for both types and terms.
- Reductions include the one-step reduction (see `reduce1` in Sub.hs) and the many-step reduction (see `reduce` in Sub.hs).
- Sub.hs contains the Haskell implementation of the calculus, including substitution, subtyping, reduction, and other useful things.
- Parser.hs contains the monadic parser combinators needed to parse input strings into typed-term ASTs for the calculus.
- Repl.hs contains a simple read-eval-print loop which hooks into main, and into the parser.
- Main.hs is needed for GHC to compile without any flags; it also invokes the REPL.
For contributions, see the project to-do list or submit a PR with something you think it needs.