A Haskell implementation of Benjamin Pierce's Lambda Calculus with Subtyping. It has the features of STLC but with generalised records, a Unit type, and subtype polymorphism.
This strongly normalising calculus serves as a foundation for object-oriented languages such as Java, Scala, and Python.
You need Haskell; this compiles with at least GHC 8.2.2 (Stack resolver: lts-11.0).
You can use cabal to build and run this (see this README), or alternatively build with vanilla GHC.
To compile and run:

```
ghc -O2 -o sub Main
./sub
```

Alternatively, to use the GHCi interpreter:

```
ghci Main
```

then type `main`.
In either case you get something like the following:

```
Welcome to the Sub REPL
Type some terms or press Enter to leave.
>
```
Note: When run in GHCi, you don't get the luxury of escaped characters, backspace, delete, etc. Compile it with GHC if you need these.
Where you can then have some fun; try these examples:

- `{x=(), id=\x:A.x}`: a record containing a unit `()` and the identity function for type `A`, assigned to labels `x` and `id` respectively.
- `\r:{a:A, b:B}.r`: the identity function for two-element records containing an `A` and a `B`.
- `(\r:{u:1}.r) {u={u=()}}`: the identity function for a singleton record `{u:1}`, which due to subtyping will accept `{u={u=()}}`.
- `\r:{b:B}.r.b`: a function taking a record `{b:B}` and projecting out the label `b` (submit a PR with a less confusing syntax).
- `(\f:1->1.f) (\g:A->A.g)`: subtyping with arrow types.
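Terms like these parse into a small abstract syntax of terms and types. The following is a hedged sketch of what that syntax could look like; the actual datatype and constructor names in Sub.hs may differ:

```haskell
-- A sketch of the abstract syntax; the real datatypes in Sub.hs
-- may use different names and representations.
data SType
  = Top                      -- the unit type, written 1 or ⊤
  | Base String              -- base types A, B, C, ...
  | Arrow SType SType        -- function types T -> T
  | RecT [(String, SType)]   -- record types {a:A, b:B}
  deriving (Eq, Show)

data STerm
  = Var String               -- variables
  | Lam String SType STerm   -- abstractions \x:T.t
  | App STerm STerm          -- applications
  | Unit                     -- the unit ()
  | Rec [(String, STerm)]    -- records {x=(), id=\x:A.x}
  | Proj STerm String        -- projections r.b
  deriving (Eq, Show)

-- the first example above, {x=(), id=\x:A.x}, as an AST
example :: STerm
example = Rec [("x", Unit), ("id", Lam "x" (Base "A") (Var "x"))]
```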
The parser is smart enough to recognise λ and ⊤, so you can copy and paste from the output:

```
Welcome to the Sub REPL
Type some terms or press Enter to leave.
> \r:{u:1}.r
= λr:{u:⊤}.r
> λr:{u:⊤}.r
= λr:{u:⊤}.r
```
`>` denotes the REPL waiting for input, `=` means no reductions occurred (it's the same term), `~>` denotes one reduction, and `~>*` denotes zero or more reductions (although in practice this is one or more, due to `=`).
There is also a reduction tracer, which prints each reduction step. Prefix any term with `'` to see the reductions:

```
> '(\x:1.\f:1->1.f {id=\a:A.a, u=x}) () (\r:{u:1, g:A->1}.r.u)
~> (λf:⊤->⊤.f {id=λa:A.a, u=()}) (λr:{u:⊤, g:A->⊤}.r.u)
~> (λr:{u:⊤, g:A->⊤}.r.u) {id=λa:A.a, u=()}
~> {id=λa:A.a, u=()}.u
~> ()
```
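The tracer is essentially iterating a one-step reduction until no rule applies. Below is a minimal sketch of such a stepper; the name `reduce1` follows the source files, but this simplified version omits type annotations (reduction ignores them) and assumes all bound names are distinct, per Barendregt's convention, so its naive substitution cannot capture:

```haskell
import Control.Applicative ((<|>))

-- Simplified term language: type annotations omitted, since
-- reduction does not consult them. Constructor names are assumptions.
data STerm
  = Var String
  | Lam String STerm
  | App STerm STerm
  | Unit
  | Rec [(String, STerm)]
  | Proj STerm String
  deriving (Eq, Show)

-- Naive substitution: assumes all bound names are distinct
-- (Barendregt's variable convention), so no capture can occur.
subst :: String -> STerm -> STerm -> STerm
subst x s t = case t of
  Var y    -> if x == y then s else t
  Lam y b  -> if x == y then t else Lam y (subst x s b)
  App f a  -> App (subst x s f) (subst x s a)
  Unit     -> Unit
  Rec fs   -> Rec [(l, subst x s v) | (l, v) <- fs]
  Proj r l -> Proj (subst x s r) l

-- One reduction step, or Nothing if the term is in normal form.
reduce1 :: STerm -> Maybe STerm
reduce1 t = case t of
  App (Lam x b) a -> Just (subst x a b)               -- beta
  App f a         -> (\f' -> App f' a) <$> reduce1 f
                 <|> App f <$> reduce1 a
  Proj (Rec fs) l -> lookup l fs                      -- projection
  Proj r l        -> (\r' -> Proj r' l) <$> reduce1 r
  Rec fs          -> Rec <$> stepFields fs            -- inside records
  _               -> Nothing
  where
    stepFields [] = Nothing
    stepFields ((l, v) : rest) = case reduce1 v of
      Just v' -> Just ((l, v') : rest)
      Nothing -> ((l, v) :) <$> stepFields rest

-- The full trace, as printed after the REPL's ' prefix.
reduceTrace :: STerm -> [STerm]
reduceTrace t = t : maybe [] reduceTrace (reduce1 t)
```

Running `reduceTrace` on the AST of a term like the one traced above yields the same shape of derivation: a handful of `~>` steps ending in `Unit`.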
There is also a typing mechanism, invoked by prefixing a term with `t`, which displays the type or fails as usual:

```
> t\r:{a:A, b:B}.r.b
{a:A, b:B}->B
> t\r:{a:A, b:B}.r.c
Cannot Type Term: \r:{a:A, b:B}.r.c
```
Note: if you provide a non-normalising term, the typechecker will fail and reduction will not occur.
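Under the hood, the `t` command amounts to running a typechecker over the AST. A hedged sketch of such a checker follows; the datatype and function names are assumptions, and the `subtype` check here is deliberately cut down to reflexivity plus the Top rule (the full relation also covers arrows and records), and label uniqueness in records is not checked:

```haskell
import qualified Data.Map as M

data SType = Top | Base String | Arrow SType SType | RecT [(String, SType)]
  deriving (Eq, Show)

data STerm = Var String | Lam String SType STerm | App STerm STerm
           | Unit | Rec [(String, STerm)] | Proj STerm String
  deriving (Eq, Show)

-- Deliberately simplified: reflexivity plus "everything < Top".
subtype :: SType -> SType -> Bool
subtype s t = s == t || t == Top

type Ctx = M.Map String SType

typeOf :: Ctx -> STerm -> Maybe SType
typeOf ctx t = case t of
  Unit       -> Just Top                               -- unit rule
  Var x      -> M.lookup x ctx                         -- variables
  Lam x ty b -> Arrow ty <$> typeOf (M.insert x ty ctx) b
  App f a    -> do
    Arrow dom cod <- typeOf ctx f
    argTy         <- typeOf ctx a
    if subtype argTy dom then Just cod else Nothing    -- subsumption
  Rec fs     -> RecT <$> traverse (\(l, v) -> (,) l <$> typeOf ctx v) fs
  Proj r l   -> do
    RecT fs <- typeOf ctx r
    lookup l fs                                        -- label must exist
```

With this sketch, the AST of `\r:{a:A, b:B}.r.b` types as the record-to-`B` arrow, while projecting the missing label `c` yields `Nothing`, matching the REPL behaviour shown above.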
You can save terms for the life of the program with a `let` expression. Any time a saved variable appears in a term, it will be substituted for the saved term:

```
> let f = (\x:1.\f:1->1.f {id=\a:A.a, u=x})
Saved term: λx:⊤.λf:⊤->⊤.f {id=λa:A.a, u=x}
> let g = (\r:{u:1, g:A->1}.r.u)
Saved term: λr:{u:⊤, g:A->⊤}.r.u
> f () g
~>* ()
```
Note: Consequently `let` and `=` are keywords, and so you cannot name variables with them.
Additionally we have type-level `lett` statements that allow you to define and use types:

```
> lett RECORD = {a:A, b:B, f:A->A, g:B->B}
Saved type: {a:A, b:B, f:A->A, g:B->B}
> \r:RECORD.r
= λr:{a:A, b:B, f:A->A, g:B->B}.r
```

This makes it easier to define both terms and types, but does not allow type-level application (see Omega). `lett` is also a keyword.
We base the language on the BNF for the typed calculus, with the addition of generalised records, record projections (using the record names as projections), the unit, and types for these terms:
However we adopt standard bracketing conventions to eliminate ambiguity in the parser. Concretely, the parser implements the following unambiguous grammar for terms:
and types:
Some notes about the syntax:

- The above syntax only covers the core calculus, and not the REPL extensions (such as the `let` bindings above). The extensions are simply added on in the REPL.
- Variables are strings (excluding numbers), as this is isomorphic to a whiteboard treatment and hence the most familiar.
- Units are largely uninteresting, but can be formed as `()` like in Haskell, and typed like `1` (or `⊤`).
- Records are generalised products formed as a comma-separated sequence of assignments of terms to labels `t=v`, nested inside curly braces: `{u=(), ida=\a:A.a}`. Record types are comma-separated assignments of typings to labels `t:v`, nested inside curly braces (such as `{1:⊤, f:A->A}`).
- Despite using subtyping, the subtyping relation is not part of the syntax or language itself but rather a feature used during typechecking. As such it is not featured in the grammars.
- Types range over upper-case characters `A,B,C...`, nested arrow types `T -> T`, the unit type `()`, and generalised record types `{a:A,f:B->A,...,c:C}`.
- Arrows associate to the right so that `T -> T -> T` is the same as `T -> (T -> T)` but not `((T -> T) -> T)`.
- Nested terms don't need brackets (`\a:A.\b:B. b`) unless enforcing application on the right. Whitespace does not matter (`\r:{a:A, b:B}. r`) unless it is between applications, where you need at least one space.
- To quit, use `Ctrl+C` or whatever your machine uses to interrupt computations.
The semantics implements beta-reduction on terms and alpha-equivalence as the `Eq` instance of `STerm`. The semantics are the same as STLC with the addition of subtype polymorphism, which augments the typing relation with a subtyping relation `<`. We reformulate the semantics as typing judgements:
We have typing rules for the Unit type (otherwise known as Top):
The subtyping relation is a reflexive and transitive relation `<` defined as follows:
These rules denote reflexivity, transitivity, that `()` is the supertype of all types, and the arrow-subtyping rule for functions. The standard typing rules from STLC are then influenced by this, but are structurally the same beyond a type substitution rule:
Variables:
for abstractions:
for application:
the subtyping substitution rule (implicit in typing, not explicitly used):
and the reduction relation (adopted from STLC):
Next, there is a special elimination rule for records:
with sensible structural (internal) reduction rules for terms inside records. Lastly we have rules for record projections:
with width subtyping rules for records (the subtype record can have more fields than its supertype record, as long as the common fields match on label and type):
which we extend with record depth subtyping, in which common fields in records need to match on labels but their types need only form the subtyping relation, rather than the stricter width subtyping above:
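The subtyping rules above can be collected into a single recursive check. The following is a hedged sketch, not the actual Sub.hs definition (names are assumptions); transitivity has no clause of its own here, as it is admissible for these structural rules:

```haskell
data SType
  = Top                     -- the unit type 1 (⊤)
  | Base String             -- base types A, B, C, ...
  | Arrow SType SType       -- function types
  | RecT [(String, SType)]  -- record types
  deriving (Eq, Show)

-- s `subtype` t  corresponds to  s < t  in the rules above
subtype :: SType -> SType -> Bool
subtype s t | s == t = True               -- reflexivity
subtype _ Top = True                      -- everything < ⊤
subtype (Arrow s1 s2) (Arrow t1 t2) =
  subtype t1 s1 && subtype s2 t2          -- contravariant domain,
                                          -- covariant codomain
subtype (RecT fs) (RecT gs) =
  -- width, depth, and permutation: every field the supertype demands
  -- must appear in the subtype, at a subtype of its declared type
  all (\(l, ty) -> maybe False (`subtype` ty) (lookup l fs)) gs
subtype _ _ = False
```

For instance, `subtype (RecT [("u",Top),("g",Arrow (Base "A") Top)]) (RecT [("u",Top)])` holds by width subtyping: the wider record is usable wherever the narrower one is expected.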
- We do structural subtyping here so that the structure of a type determines whether it's a subtype of another type. The alternative is nominal subtyping where terms are subtypes when defined in a certain way (like how classes extend supertypes in Java).
- Field ordering in records does not matter, enabling permutation subtyping.
- The subtyping relation is not explicit in the language but rather applied at typechecking time, and so any terms using `<` are for demonstration and are not parsable in Sub.
- Records are typeable only if all of their subterms are typeable and all of the labels are unique. We leverage the STLC typing rules and the subtyping relation to ensure record fields are typeable.
- Projections are typeable only if applied to a term which is a well-typed record and the projection label (`x` in `{x=()}.x`) explicitly matches a label in that record.
- Units are considered to be equivalent to Object in conventional object-oriented languages, and hence a function taking a `()` will take anything, as everything is a subtype of Object: `(\x:1.x) (\a:A.a)`. We should mention that languages like Java might include additional baggage like hashCode; our treatment is simpler as we don't need this to demonstrate how subtyping works. The subtyping relation therefore states that all types are subtypes of units (`():1`).
- All base types are on the same level, and hence `A < B` for arbitrary `A != B` is false but `A < A` is true (due to the reflexivity of `<`).
- For arrow types `f:S1 -> S2` and `g:T1 -> T2` we require that `T1 < S1` and `S2 < T2` if `f < g`. The first is needed as `g` expects a type `T1` and `f` provides a type `S1` with a domain (input space) that may supersede `T1`. The second is because the codomain (result) of `f` should be a subtype of the codomain of `g` if we are going to use `S2` anywhere `T2` would be expected. This is demonstrated by the term `(\f:1->1.f) (\g:A->A.g)`.
- This implementation follows a small-step operational semantics and Barendregt's variable convention (see `substitution` in Sub.hs). The variable convention is adopted for both types and terms.
- Reductions include the one-step reduction (see `reduce1` in Sub.hs) and the many-step reduction (see `reduce` in Sub.hs).
- Sub.hs contains the Haskell implementation of the calculus, including substitution, subtyping, reduction, and other useful things.
- Parser.hs contains the monadic parser combinators needed to parse input strings into typed-term ASTs for the calculus.
- Repl.hs contains a simple read-eval-print loop which hooks into main, and into the parser.
- Main.hs is needed for GHC to compile without any flags; it also invokes the REPL.
For contributions, see the project to-do list or submit a PR with something you think it needs.