
Type System

Ovid edited this page Sep 14, 2021 · 5 revisions

Please see the main page of the repo for the actual RFC. As it states there:

Anything in the Wiki should be considered "rough drafts."



Corinna needs a type system. However, this will probably be in v2 because a) it turns out to be hard and b) we need types for:

  • Cor
  • Signatures
  • Variable declarations
  • Whatever else I've forgotten

Types for variable declarations haven't had much discussion, and Dave Mitchell's work on types for signatures predates Cor's attempts at declaring types, so the two aren't compatible. I've opened a ticket on the Perl git repo about types. We absolutely can't have a type system for Corinna that isn't integrated into the rest of the language.

But before we discuss this, I want to discuss a bit about type systems.

Type systems are not just about "Data Types"

Unfortunately, because data is "typed", we tend to conflate type systems and data types. People often think that declaring an Int or a Str is what makes a type system. This is not true. For the purposes of this discussion, I'll borrow Benjamin C. Pierce's definition from his excellent book Types and Programming Languages:

A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.

Or to rephrase a bit more simply, if less accurately:

A type system should be designed with the goal of ensuring the software cannot exhibit certain unwanted behaviors.

So if we declare that a variable must contain an instance of a DateTime object, assigning the string "foo" to it should throw an exception. Whether it should do so at compile time or at runtime is a subject of much debate, and I'll largely skip that except to mention that, for Perl, compile-time checks should be preferred where doing so does not make our programming lives harder. I'm also a fan of gradual typing: we get the benefit of quick coding to solve immediate problems, but we can fall back on more rigorous restrictions for large-scale systems as needed.

That being said, let's consider taint checking. Taint checking does more than most developers realize, but its most common behavior is preventing data from outside your program from affecting things inside your program unless you've "scrubbed" that outside data first (obviously, this is a gross oversimplification). Otherwise, you get a failure. This happens at runtime:

$ perl -TE 'my $arg = shift; say "1. $arg"; system($arg); say "2. $arg"' foo
1. foo
Insecure $ENV{PATH} while running with -T switch at -e line 1.

Thus, taint checking is a type system: it prevents certain unwanted behaviors.

Enabling strictures (use strict) enables another type system that, amongst many other things, forces variables to be declared in some way unless they are fully qualified with a package name. This failure happens at compile time (note that "Hello" is never printed):

$ perl -Mstrict -E 'my @foo = (1,2); say "Hello"; say %foo'
Global symbol "%foo" requires explicit package name (did you forget to declare "my %foo"?) at -e line 1.
Execution of -e aborted due to compilation errors.

So Perl developers are quite happy with compile-time "type" failures, even if they don't always recognize that "type" means more than what is often thought.

So for Corinna (and Perl in general), it's worth asking ourselves what kind of type system we need for the future.

Benefits of Type Systems

These are drawn from the introduction of the "Types and Programming Languages" book previously mentioned. I'll use the Corinna type syntax for examples.

Detecting Errors

Here's a common problem in many "dynamic" programming languages:

Python:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def inverse(self):
        return Point(self.y,self.x)

point = Point(7,3.2).inverse()
print(point.x)
point.x = "foo"
print(point.x)

That prints "3.2" and then "foo". In Python, it's considered "unpythonic" to validate the data. When faced with a million-line codebase, you're supposed to just know all of your classes. This is not good.

In core Perl, it's not better (and the code is arguably much worse):

package Point {
    sub new {
        my ( $class, $x, $y ) = @_;
        return bless { x => $x, y => $y } => $class;
    }

    sub x {
        my $self = shift;
        if (@_) {
            $self->{x} = shift;
        }
        return $self->{x};
    }

    sub y {
        my $self = shift;
        if (@_) {
            $self->{y} = shift;
        }
        return $self->{y};
    }

    sub inverse {
        my $self = shift;
        return Point->new( $self->y, $self->x );
    }
}

Fortunately, developers have grown tired of these bugs; code should protect programmers where it can. In Cor, a safer version would look something like this:

class Point {
    has ( $x, $y ) :reader :writer :new :isa(Num);

    method inverse () {
        return Point->new( x => $y, y => $x );
    }
}

If you tried to assign the string "foo" to x or y, you'd get an exception.
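For comparison, roughly the same guarantee can be approximated in today's core Perl by validating inside the constructor and the accessors. This is only a sketch: it assumes "numeric" means whatever Scalar::Util's looks_like_number accepts, it uses named constructor arguments like the Cor version, and the _check_num helper is a hypothetical name, not part of any proposal.

```perl
use strict;
use warnings;
use feature 'say';

package Point {
    use Carp qw(croak);
    use Scalar::Util qw(looks_like_number);

    # Hypothetical helper: die unless the value looks like a number.
    sub _check_num {
        my ( $name, $value ) = @_;
        croak( "$name must be a number, not '" . ( $value // 'undef' ) . "'" )
          unless defined $value && looks_like_number($value);
        return $value;
    }

    sub new {
        my ( $class, %args ) = @_;
        my $self = bless {}, $class;
        $self->{$_} = _check_num( $_, $args{$_} ) for qw(x y);
        return $self;
    }

    # Generate a combined, validating reader/writer for each attribute.
    for my $attr (qw(x y)) {
        no strict 'refs';
        *{"Point::$attr"} = sub {
            my $self = shift;
            $self->{$attr} = _check_num( $attr, shift ) if @_;
            return $self->{$attr};
        };
    }

    sub inverse {
        my $self = shift;
        return Point->new( x => $self->y, y => $self->x );
    }
}

my $point = Point->new( x => 7, y => 3.2 )->inverse;
say $point->x;    # 3.2

# Assigning a string now dies instead of silently succeeding.
my $ok = eval { $point->x('foo'); 1 };
say( $ok ? 'accepted' : "rejected: $@" );
```

The cost is visible: all of this boilerplate buys what a single :isa(Num) annotation would declare.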

However, what about this:

my $x :isa(Int) = -7 / 2;

What should the "safe" behavior be here? Should you get -3.5? That's what standard Perl would do because it tries not to throw away information. Some languages would return -3 or -4, both arguably wrong. For a type system in Perl, I would suggest that we get an exception (or at least a warning) there. I don't want the float silently converted to an integer, nor do I want the type to change.
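The competing answers are easy to see in stock Perl: plain division keeps -3.5, int() truncates toward zero (giving -3, as C does), and POSIX's floor() rounds down (giving -4, like Python's // operator). The as_int function is a hypothetical illustration of the "throw an exception" option, assuming an integer means an optional minus sign followed by digits.

```perl
use strict;
use warnings;
use feature 'say';
use POSIX qw(floor);

my $quotient = -7 / 2;
say $quotient;           # -3.5: Perl keeps the fractional part
say int($quotient);      # -3: int() truncates toward zero
say floor($quotient);    # -4: floor() rounds toward negative infinity

# Hypothetical strict Int check: refuse anything that isn't a whole number
# rather than silently truncating it or changing the type.
sub as_int {
    my $value = shift;
    die "Not an integer: $value\n" unless $value =~ /\A-?[0-9]+\z/;
    return $value;
}

my $ok = eval { as_int($quotient); 1 };
print( $ok ? "accepted\n" : "rejected: $@" );    # rejected: Not an integer: -3.5
```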

And what about this?

my $x :isa(Int) = @args;

Do we coerce that to an integer? Unfortunately, yes, because of context. But if we're talking about the return value of a function, that's more complicated:

use feature 'say';

sub list {
    return ( 2, 4, 6 );
}

sub array {
    my @array = ( 2, 4, 6 );
    return @array;
}

my $list  = list();
my $array = array();
say $list;
say $array;

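The say $list prints 6 and the say $array prints 3. This is due to a weird combination of context with lists and arrays (which are containers for lists, though this container has a nasty habit of handing you a naked list and running away): in scalar context, the comma operator in list() evaluates to its last element, while the array in array() evaluates to its number of elements.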
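A short sketch of how stock Perl lets the caller, or the sub itself, get a consistent answer regardless of how the list is built. The () = idiom and wantarray are core Perl; the consistent sub is a hypothetical example, and none of this is proposed as Corinna behavior.

```perl
use strict;
use warnings;
use feature 'say';

sub list  { return ( 2, 4, 6 ) }
sub array { my @array = ( 2, 4, 6 ); return @array }

# Scalar context: the comma operator yields its last operand,
# while an array yields its element count.
my $last  = list();     # 6
my $count = array();    # 3

# A list assignment in scalar context returns the number of elements on
# its right-hand side, so this counts the results either way.
my $list_count  = () = list();     # 3
my $array_count = () = array();    # 3

# wantarray lets a sub see its calling context and choose one
# consistent scalar-context result.
sub consistent {
    my @values = ( 2, 4, 6 );
    return wantarray ? @values : scalar @values;
}
my $consistent = consistent();     # 3

say join ' ', $last, $count, $list_count, $array_count, $consistent;    # 6 3 3 3 3
```

Either fix requires someone to know about the trap, which is exactly what a type annotation on the return value would be expected to catch.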

How do you design a system for type safety in the face of that without gutting context?

Retrofitting a type system onto a language that wasn't explicitly designed for one is filled with pitfalls.

Documentation

Consider this:

my $x :isa(Int);

method foo ($bar :isa(Num), $baz :isa(Str)) {
	...
}

In the above, we can read our code and get a reasonable idea about the kinds of data we need. Unlike POD or other forms of "documentation", this documentation cannot get out of sync with the program because it is the program. Further, tools such as javadoc can produce huge amounts of genuinely useful documentation in a variety of formats just by reading this information. It would be lovely if a future version of perldoc could take advantage of this.

Safety

Type systems can often guarantee various kinds of safety in programming. Perl's "taint mode" is one such type system. Array bounds checking is another. The C language, for example, expects you to check array boundaries yourself. Other languages can throw an "out of bounds" exception at compile time (harder than it sounds) or at runtime, while Perl simply extends the array on the fly when you write past its end, avoiding overflow errors.
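Perl's side of this can be seen in a few lines. The checked_get helper is hypothetical; it shows roughly what runtime bounds checking looks like, and it deliberately rejects negative indices even though Perl normally treats them as counting from the end.

```perl
use strict;
use warnings;
use feature 'say';

my @items = ( 'a', 'b', 'c' );

# Reading past the end quietly yields undef; no exception is thrown.
my $missing = $items[10];
say defined $missing ? 'defined' : 'undef';    # undef

# Writing past the end silently grows the array, padding with undef.
$items[5] = 'f';
say scalar @items;                             # 6

# Hypothetical bounds-checked accessor, closer to what a language with
# runtime bounds checking would do.
sub checked_get {
    my ( $array, $index ) = @_;
    my $last = $#{$array};
    # Deliberately rejects negative indices, unlike Perl's own arrays.
    die "Index $index out of bounds (0..$last)\n" if $index < 0 || $index > $last;
    return $array->[$index];
}

my $ok = eval { checked_get( \@items, 10 ); 1 };
print( $ok ? "ok\n" : $@ );    # Index 10 out of bounds (0..5)
```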

There are other kinds of safety we could achieve, though. Imagine a Secret[] modifier (this example is not proposed seriously):

my $password :isa(Secret[Str]) :where {...} = readline();

Now, wherever $password is printed, the resulting string would be hunter2. Stack trace? They'd see hunter2. Printed at the command line? hunter2. Interpolated in a string? hunter2.

In Cor, if you declared a class as Secret, that might effectively add a method like the following:

final method to_string () {'hunter2'}

Any attempt to override that in a subclass or with a role would be fatal.
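Stock Perl can approximate the masking part with the core overload pragma, which does roughly what such a to_string method would. This is a sketch under assumptions: the Secret package and its reveal method are hypothetical, and nothing here is final, so a subclass could override the stringification and anyone can read the underlying hash directly.

```perl
use strict;
use warnings;
use feature 'say';

package Secret {
    # Stringification, including interpolation, always yields the mask.
    use overload '""' => sub { 'hunter2' }, fallback => 1;

    sub new {
        my ( $class, $value ) = @_;
        return bless { value => $value }, $class;
    }

    # Hypothetical accessor for code that genuinely needs the real value.
    sub reveal { $_[0]{value} }
}

my $password = Secret->new('correct horse battery staple');
say $password;                # hunter2
say "Password: $password";    # Password: hunter2
```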

Maybe we could go further by preventing that data from being written to swap? Automatically clearing that memory when the variable goes out of scope? There are all sorts of "safe" things you can do with a decent type system.

But how would you get the original information in a secure way? Let's say you need to salt it, run Blowfish over that, and save the results to the database. If you make Secret[] too secure, it's unusable. But if you make it too usable, it's not secure. And knowing the fun tricks I can play with namespaces in Perl, I suspect that most naïve schemes to "protect" the data would fail. Still, this is exactly the sort of safety a decent type system can improve (though not guarantee).
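To make the namespace point concrete, here is a deliberately naive guard (the GuardedSecret package and the trusted Hasher package name are hypothetical) that only releases the value to code compiled in a trusted package. Because any code can declare itself to be in that package, the guard is trivially bypassed; the closure holding the value could also simply be called directly.

```perl
use strict;
use warnings;
use feature 'say';

package GuardedSecret {
    use Carp qw(croak);

    sub new {
        my ( $class, $value ) = @_;
        my $self = bless {}, $class;
        # Keep the value in a closure rather than a plain hash slot.
        $self->{get} = sub { $value };
        return $self;
    }

    # Naive protection: only code compiled in package "Hasher"
    # (a hypothetical name) may reveal the value.
    sub reveal {
        my $self   = shift;
        my $caller = caller;
        croak("$caller may not reveal this secret") unless $caller eq 'Hasher';
        return $self->{get}->();
    }
}

my $secret = GuardedSecret->new('swordfish');

my $denied = eval { $secret->reveal; 1 } ? 'revealed' : 'denied';
say $denied;    # denied

# Any code can simply claim to be in package Hasher.
my $stolen = do { package Hasher; $secret->reveal };
say $stolen;    # swordfish
```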

Efficiency

This one might take more time, but it's a clear goal to strive for. If I declare a type to be an integer, there's a lot of internal stuff that Perl would no longer need to store in the scalar. Or at the very least, it would always return a valid value from the IV slot and never check another slot in the scalar. However, I expect that the first implementations would actually slow down Perl because hey, you gotta check the type! Offering some way of turning off the type checking after exhaustive testing and then shipping into production would be a bad idea, but someone's going to propose it.

Humans versus Computers

In the original version of C (before it was even released), the only types were integers and characters, plus arrays of and pointers to them. Types were not really there to help the human; they were there to help the computer. However, with the advent of OO systems, you could declare something to be an instance of a class. Thus, type systems became a bit more human-friendly because the types matched the programmer's needs, not just the computer's needs, though we had things like this:

Bicycle yourBike = new Bicycle();

That's useful, but the repetition bothers me.

In Raku, you can avoid the repetition with the .= operator:

my Bicycle $your_bike .= new;

Corinna should also be "human-friendly", though the best way to implement that is unclear to me. We could possibly implement a form of type inference:

my $your_bike :isa(*) = Bicycle->new;

That would give $your_bike the type of whatever is assigned to it, but it's unclear to me what that gains us over leaving the type annotation off.
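One possible reading of :isa(*), sketched here with a core Perl tie, is that the first assigned value fixes the variable's type and later assignments of a different type are rejected; that would be one thing it could gain over omitting the annotation. The InferredType, Bicycle and Car packages are hypothetical, and this runtime check says nothing about how a compile-time version would work.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical tie class: the first value stored fixes the variable's type
# (its class, or '' for a non-reference); later stores of a different type die.
package InferredType {
    use Carp qw(croak);

    sub TIESCALAR {
        my ($class) = @_;
        return bless { type => undef, value => undef }, $class;
    }

    sub FETCH { $_[0]{value} }

    sub STORE {
        my ( $self, $value ) = @_;
        my $type = ref($value);
        $self->{type} //= $type;
        croak("Expected '$self->{type}', got '$type'") unless $type eq $self->{type};
        $self->{value} = $value;
    }
}

package Bicycle { sub new { bless {}, shift } }
package Car     { sub new { bless {}, shift } }

tie my $your_bike, 'InferredType';
$your_bike = Bicycle->new;    # type is now fixed as 'Bicycle'
$your_bike = Bicycle->new;    # fine: same type
my $ok = eval { $your_bike = Car->new; 1 };
say( $ok ? 'accepted' : 'rejected' );    # rejected
```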

Alternatively, we could implement a ->= operator:

my $your_bike Bicycle ->= new(%args);

But I expect many Perl developers would be unhappy with that and I'm one of them. I've tried very hard to limit adding new syntax to the language. Suggestions welcome on how we can improve typing without creating an overly verbose language.

However, there are many cases where our types are almost good enough, but not quite. What if I want an integer, but it must be at least a given number? That's where we steal the where clause from Raku:

my $age :isa(Int) :where { $_ >= $minimum_age };

Again, we get a more expressive type system that's geared to what humans need and not what computers need. Sure, we could implement a MinimumAge class, but that's overkill for many situations.
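A runtime approximation of :isa(Int) :where { ... } in current Perl, again using tie, might look like the following. The ConstrainedInt package is hypothetical, it treats an integer as an optional minus sign followed by digits, and it localizes $_ so the where block can refer to the candidate value just as in the Raku-style syntax.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical tie class approximating ":isa(Int) :where { ... }":
# every store must be an integer and satisfy the predicate, which
# sees the candidate value in $_.
package ConstrainedInt {
    use Carp qw(croak);

    sub TIESCALAR {
        my ( $class, $where ) = @_;
        return bless { where => $where, value => undef }, $class;
    }

    sub FETCH { $_[0]{value} }

    sub STORE {
        my ( $self, $value ) = @_;
        croak( "Not an integer: " . ( $value // 'undef' ) )
          unless defined $value && $value =~ /\A-?[0-9]+\z/;
        local $_ = $value;
        croak("Value $value fails the where constraint") unless $self->{where}->();
        $self->{value} = $value;
    }
}

my $minimum_age = 18;
tie my $age, 'ConstrainedInt', sub { $_ >= $minimum_age };

$age = 21;                                  # accepted
my $too_young = eval { $age = 12;   1 };    # rejected by the where block
my $not_int   = eval { $age = 20.5; 1 };    # rejected by the Int check
say join ' ', $age, ( $too_young ? 'ok' : 'rejected' ), ( $not_int ? 'ok' : 'rejected' );
# 21 rejected rejected
```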

But ultimately, that leads us down the road of needing subtypes.

subtype MinimumAge :isa(Int) :where { $_ >= $minimum_age };

my $age :isa(MinimumAge);
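Subtypes like this can be modeled as a parent type plus a predicate, checked from the root type down. The subtype and check_type functions and the %TYPES registry are hypothetical names for a sketch; they are not Moose's or Corinna's API, though the shape loosely resembles Moose's subtype declarations.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical registry: each type has a parent name and a predicate that
# sees the candidate value in $_. A value satisfies a type only if it also
# satisfies every ancestor.
my %TYPES = (
    Int => { parent => undef, where => sub { defined($_) && /\A-?[0-9]+\z/ } },
);

sub subtype {
    my ( $name, %spec ) = @_;
    my $parent = $spec{isa} // '';
    die "Unknown parent type '$parent'\n" unless $TYPES{$parent};
    $TYPES{$name} = { parent => $parent, where => $spec{where} };
}

sub check_type {
    my ( $name, $value ) = @_;
    my @chain;
    for ( my $type = $name; defined $type; $type = $TYPES{$type}{parent} ) {
        die "Unknown type '$type'\n" unless $TYPES{$type};
        unshift @chain, $TYPES{$type};
    }
    # Check from the root down, so Int is verified before the numeric
    # comparison in MinimumAge runs.
    for my $spec (@chain) {
        local $_ = $value;
        return 0 unless $spec->{where}->();
    }
    return 1;
}

my $minimum_age = 18;
subtype( MinimumAge => isa => 'Int', where => sub { $_ >= $minimum_age } );

say check_type( MinimumAge => 30 )  ? 'ok' : 'fail';    # ok
say check_type( MinimumAge => 12 )  ? 'ok' : 'fail';    # fail
say check_type( MinimumAge => 'x' ) ? 'ok' : 'fail';    # fail
```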

Generics

Do we want generics in Cor? At first glance, we can simply omit the types. I've looked at various examples, and so long as the data we're passing to something is typed, omitting the type annotations seems to be enough. However, I'd love to see counter-examples.

Type Hierarchy

And yeah, we need to think about a type hierarchy. I think the one in Moose is reasonable, but this is not my area of expertise.
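For reference, a subset of the built-in hierarchy described in Moose::Manual::Types can be written as a child-to-parent map, which makes questions like "is every Int also a Str?" mechanical. The is_subtype_of function is a hypothetical helper, and parameterized types such as ArrayRef[`a] and Maybe[`a] are omitted from this sketch.

```perl
use strict;
use warnings;
use feature 'say';

# Subset of Moose's built-in type hierarchy, written as child => parent.
my %PARENT = (
    Item      => 'Any',
    Bool      => 'Item',
    Undef     => 'Item',
    Defined   => 'Item',
    Value     => 'Defined',
    Str       => 'Value',
    Num       => 'Str',
    Int       => 'Num',
    ClassName => 'Str',
    Ref       => 'Defined',
    ArrayRef  => 'Ref',
    HashRef   => 'Ref',
    CodeRef   => 'Ref',
    Object    => 'Ref',
);

# True if $type is $ancestor or sits somewhere below it in the hierarchy.
sub is_subtype_of {
    my ( $type, $ancestor ) = @_;
    for ( my $t = $type; defined $t; $t = $PARENT{$t} ) {
        return 1 if $t eq $ancestor;
    }
    return 0;
}

say is_subtype_of( Int => 'Str' )    ? 'yes' : 'no';    # yes
say is_subtype_of( Str => 'Int' )    ? 'yes' : 'no';    # no
say is_subtype_of( Object => 'Ref' ) ? 'yes' : 'no';    # yes
```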