Our current programming languages pay almost no thought to security as part of language design, and are extremely vulnerable as a result.
Let's talk about JavaScript. JavaScript in 2018 has reached an interesting place where we have package managers, bundlers and a wonderful ecosystem of packages to draw on from npm.
Now, if you're interested in security and JavaScript, there's a good chance you've read David Gilbertson's article: "I’m harvesting credit card numbers and passwords from your site. Here’s how.".
The premise of this article is that your project's third party dependencies are a huge attack vector. What's to stop someone from including malicious code in a package and stealing your secrets?
Do you review all of the code in all of your npm dependencies? Do you review all of the code in all of the sub-dependencies, all the way down the tree?
Perhaps you do, but did you do that by looking at the actual code shipped in each version of the modules you're installing?
And of course, even just installing a single package on your system from npm might be enough to compromise you, considering that npm allows you to configure postinstall
hooks that can run any command the author desires.
I mean it's fine, it's running in user space. Hopefully you don't have any secrets in your dotfiles, and if you do, I hope your permissions are set correctly!
So we can't even install a single version of a package safely. Time to start reviewing the tarball
of every package before you install it!
Of course, all of these problems apply roughly equally to Ruby, Python and many other common languages.
At this point, you might be feeling a little queasy, and I don't blame you.
Thankfully, I think we can have a better future. What we need is a system to allow us to manage the permissions that our untrusted code has!
It's not a new idea. Anyone who's used or created an app for Android or iOS will be familiar with the idea of granting permissions to third parties.
I've also heard rumour that Java has a security/permissions system of this sort built in nowadays.
Let's contrive an example: we're building an issue management system, and we want to play a noise when a new ticket comes in. We see that there's an awesome module we can use to play the noise for us, so we don't have to figure out how the scary WebAudio API works.
In JavaScript, there's nothing to stop this package from reading the contents of localstorage and sending it off to a remote server the second we import the module.
Elm improves on JavaScript here. Just importing a module in Elm shouldn't cause any effects, but when we call the function and return the resulting Cmd
out of our update, there's nothing to stop our malicious package from returning a batch of Cmd
s that reads localStorage
, sends the data off over HTTP and then plays a sound.
(Side note: there is actually something to stop that from happening, but it's that Elm doesn't yet support localStorage
. Once that's supported this attack vector should work just fine.)
Elm's Cmd
s are an opaque type, so there's no way for us to easily inspect each Cmd
and see what will actually happen when it is executed. However, even if we could see the full details of a Cmd
, that's still not a great place to be.
The issue I see with Elm's approach is that it doesn't require us to grant permission to third party module to produce effects of certain types. Any Elm module can import an http library and make requests, assuming you're willing to execute Cmd
s from that module.
So what about Forest? Well, to start with, a Forest program cannot perform arbitrary side effects. All effects must be returned from the main
function. The execution of these effects is not within the control of a Forest program.
In Forest, all side effects will be performed by drivers. This is a design choice taken from Cycle.js. Each driver has a distinct responsibility, like HTTP requests, DOM interaction, localStorage/IndexedDB and any other browser/system API we want to talk to.
The only place a driver can be registered is in the main file of your application code. The arguments to your main function are then the sources of your drivers, and the return type of your main function are the sinks for you drivers.
If you want to make an HTTP request, you must register an HTTP driver, take the HTTP source as an argument, and return an effect that was created from the HTTP source.
Third party modules will not be able to register drivers. If you want to use a third party module to make HTTP requests, you would need to pass an HTTP source in and return the effects back out of your main function.
With this in mind, let's return to our earlier example. We pull in a third party module to play a sound. Since this is Forest, the third party module can't just play sounds willy-nilly. Instead, we have to register a WebAudio driver, and then pass the WebAudio source into our third party module.
We can then have confidence that the effects return by our third party sound module will only cause sounds to play. If that module wants to perform HTTP requests or access localStorage, it has to take HTTP/storage sources as an argument.
The action of registering a driver and passing a driver's source into a third party module is a way for you to give permission for that third party module to perform effects of that type.
Now, the immediate objection I can see is that users don't always want to have to explicitly pass around sources and sinks, and this is a common complaint from people new to Cycle.js.
Personally, I value having explicit annotations that tell me what is required to call a certain piece of code. One of the major places this pays off is in your tests. If you didn't have to explicitly register that your app uses audio, your app might randomly start playing sounds during your test suite, or worse, making HTTP requests!
With Forest's approach, when you go to test an app, you would be able to see that it depends on HTTP and WebAudio, and mock out the sources accordingly, or use real drivers when appropriate.
I am sympathetic to the pain of having to pass all of your sources around. I'm currently considering a compromise where your explicitly annotate your main function, but then sources are automatically passed down into functions to save boilerplate. The only place you would need to be explicit is when you're crossing the boundary into third party code, and at that point you would need to grant permissions by passing sources.
Few addendums:
So what about postinstall
? How will Forest
fix this?
You don't want to hear this, and I don't want to say it, but we'll need a package manager isn't okay with running arbitrary effectful code.
You keep saying effect, but what do you actually mean? How is that defined?
Effects mean many things to many people. Since Forest compiles to WebAssembly, and currently WebAssembly can only be used for performing numerical calculations in memory, we'll define an effect as anything that requires calling into JavaScript code ("native code", from Forest's perspective).
TL;DR: in a pure functional language, dependency injection can be used as permissions management for untrusted code.