Skip to content

Latest commit

 

History

History
497 lines (302 loc) · 15.9 KB

STYLEGUIDE.adoc

File metadata and controls

497 lines (302 loc) · 15.9 KB

Plutus Haskell style guide

The purpose of this document is to codify various aspects of how we write Haskell code as we contribute to Plutus.

The goal is twofold:

  1. To provide guidance in cases where it is not obvious what to do, or the choice is arbitrary.

  2. To help resolve disagreements between team members. Ideally these can be settled by referring to a recommendation in this document, or if not, then perhaps the resolution of the disagreement will create a new recommendation.

The contents of this document are “recommendations”, that is:

  • They are more concrete than “principles” (like “write simple code”) because we want them to be clearly action-guiding.

  • They are more vague than “rules” (like “never use function X”) because it’s usually impossible to be that precise, and so judgement is still required in applying them.

Haskell

Basic formatting

Use 4 spaces for indentation. Avoid lines over 100 characters. This is codified in .editorconfig.

Format Haskell source with stylish-haskell (config in .stylish-haskell.yaml).

Reasoning

The indentation is arbitrary. The line length is longer than the “traditional” 80 characters, but anecdotally Haskell code tends toward longer lines, and breaking them up doesn’t help readability much.

stylish-haskell is a fairly modest formatter that doesn’t touch too much. At the moment we mostly use it for formatting imports!

In the past we tried ormolu, but its lack of support for multi-line comments was a deal-breaker.

Write type signatures

Every top-level binding should have a type signature. Most other named bindings (e.g. let and where bindings) should also have type signatures.

Reasoning

Haskell is traditionally lauded for its excellent type inference. People used to make the argument that this brought Haskell closer to dynamically-typed languages in ease of use: you don’t have to write type signatures, the compiler will infer it for you.

However, not writing type signatures has a heavy maintainability cost.

Non-local errors

If types are inferred, then the way a binding is used can affect the type which is inferred for it. That means that a mistake in using a binding can result in an error inside the binding (or elsewhere) due to the inferred type not matching what goes on in the RHS of the binding. More generally, errors can end up appearing in unexpected and counter-intuitive places.

Pinning down the type of a binding means that any errors relating to using that binding will occur at the use site, where they belong!

Documentation

It’s a Haskell truism that the type forms part of the documentation of a binding. But that requires you to be able to see the type. Of course, in this day and age, we should all have an IDE that shows us the type on hover. But sometimes you’re stuck using vim. Or you’re reviewing the code on Github. Or the IDE is broken.

So do your colleagues a favor and just write it down.

Silent changes

Changing the type of a binding often means that something relatively significant has changed. But if the type is inferred, this can happen without you noticing it. This is almost always bad!

Write kind signatures

Every type definition that has parameters which are not all of kind Type should have a kind signature using StandaloneKindSignatures.

Examples

Bad:

data Term tyname name uni fun a

Good:

type Term :: Type -> Type -> (Type -> Type) -> Type -> Type -> Type
data Term tyname name uni fun a

Reasoning

The reasoning is essentially the same as for type signatures. We’ve got used to GHC inferring all this, and in the past we didn’t even have the means to easily state kind signatures. But these days with people using fancier type-level machinery, and with better support for kind signatures, it’s time to just start writing them down.

Write Haddock

Every top-level exported binding should have Haddock. Non-exported bindings should probably have Haddock too.

Put the module’s haddock comment right above the module M where line, and below the PRAGMAs.

Reasoning

Prefer pattern-matching

Prefer to use pattern matching where possible, unless it significantly complicates the code.

Examples

Instead of an equality check

Bad:

data SortOrder = Ascending | Descending
    deriving Eq

sortWithOrder' :: Ord a => SortOrder -> [a] -> [a]
sortWithOrder' order = f . sort
  where
    f = if order == Ascending then id else reverse

Good:

sortWithOrder :: Ord a => SortOrder -> [a] -> [a]
sortWithOrder Ascending  = id . sort
sortWithOrder Descending = reverse . sort
Instead of destructor functions

Bad:

either f g x

Good:

case x of
  Left e -> f e
  Right s -> g s

But this might be okay:

fmap (either f g) eithers

Reasoning

Pattern matching is easy to read, and allows the compiler to give better errors and warnings (e.g. incomplete match warnings).

Avoid punning

Avoid using the same names for things at the term and type level. Except for newtype` constructors.

Examples

Bad:

data Foo = Foo Int

Good:

data Foo = MkFoo Int

Avoid ticks in function names

Generally avoid using ticks to distinguish function names. All this conveys is that it is “another” version of the function. Try expressing the difference in the function name, even if it makes it longer.

Examples

Bad:

runCek
runCek

Good:

runCek
runCekWithLogs

Reasoning

It’s a tempting naming convention, but no one likes reading code with such functions. The function names should convey helpful information when possible.

Avoid multi-line parenthesized expressions

A parenthesized expression should not span multiple lines. Pull it out to a named binding, use $, or otherwise reorganize the code.

Examples

Bad:

foldr (\a acc -> let
    x = ...
  in a + x + acc) x xs

Good:

foldr meaningfulName x xs
  where
    meaningfulName :: ...
    meaningfulName a acc = let x = ... in a + x + acc

Reasoning

A parenthesis forces the user to keep a stack in their head to remember when the current "argument" finishes. Line length limits this to some degree, but if we allow line breaks then the amount of stack can become quite unwieldy.

This also explains why $ is good: since it indicates there will be no closing paren, there is no need for a stack (it’s the "tail call" of bracketing).

Use deriving strategies

Always use deriving strategies.

Reasoning

Not using deriving strategies requires the compiler to guess which strategy you want. This can have consequences, especially when DeriveAnyClass is enabled, since you can accidentally end up using anyclass deriving when you didn’t mean to. Better to be explicit.

Use Text

Use Text instead of String unless you have a good reason not to.

Reasoning

It’s 2022, use a proper, unicode aware string type instead of a linked list.

Derive Show, define Pretty, use Pretty for human-readable output

Always derive Show, do not define it manually.

Always use Pretty for human-readable output, not Show.

Always define Pretty explicitly (when you need it). It’s okay to delegate to the Show instance if you think it’s good enough.

Reasoning

The derived version of Show is always useful as a way of seeing the explicit strucuture of a value as a Haskell value. Defining Show can mean that this is no longer true, and you can’t do a better job than the derived version.

We use Pretty always for human-readable output, because it’s actually friendly to layout and the derived Show instance is not usually human friendly. Therefore if you need to produce output for humans, define a Pretty instance. This will typically need to be hand-written, unless it happens that you can defer to the Show instance, e.g. for simple enums Show can be fine since it just prints the constructor names.

Use the package name as the root of the module name hierarchy

If the package is foo-bar, then the modules should all be FooBar.X.

Reasoning

See “Naming conventions” here. We do it slightly differently (“FooBar” rather than “Foo.Bar”), but the main principle is the same.

Reading direction

Try to keep a single line mostly reading left-to-right or right-to-left.

Examples

Bad:

traverse (\x -> <some long function body) things

Good:

for xs $ \x -> 

Reasoning

Haskell can get quite condensed and hard to read, especially when the reading direction changes frequently. Often there are symmetrical versions of operators like ⇐< and >⇒ or =<< and >>= that you can switch between to make code easier to read.

Avoid orphan instances except where they’re safe

Avoid orphan instances, but don’t worry about it if you can be sure that they’re safe.

Reasoning

See the blog post.

Make data strict by default, use nothunks for long-lived data structures

Use StrictData for new code; make fields strict unless you have a good reason not to.

For data structures that might live for a long time, use nothunks to assert that they don’t contain unexpected thunks.

Reasoning

A painful lesson of Haskell in production is that space leaks really suck, are a huge pain to track down, and can originate in surprising locations. This suggests that it’s worth a bit of proactive paranoia: just make things strict as much as possible, in the hopes of squashing any nascent space leaks.

This may seem like overkill to you…​ until you’ve experienced debugging a space leak!

Extensions

The Good

These are basically all fine and can be put in default-extensions.

  • Anything in GHC2021. Once we have a GHC version that supports the GHC2021 language, we will likely switch to using it.

  • LambdaCase: clear, helpful

  • DerivingStrategies: always

  • GADTs: well established, useful

  • OverloadedStrings: essential when working with Text, which you should

  • NegativeLiterals

  • DerivingVia: great

  • RoleAnnotations: if you need it, you need it

The Situational

The following extensions are generally fine if you find that they’ll make your life much easier, but you probably don’t want to use them all the time.

  • RecordWildCards

  • TypeFamilies: often very useful, but can make things tricky. Think before using.

  • DataKinds

  • FunctionalDependencies

  • ViewPatterns: can be very nice, can be a huge mess

  • OverloadedLists: sometimes a lifesaver, not as indispensable as OverloadedStrings

The Bad

UnicodeSyntax: not worth it

Imports

Avoid complex imports statements

If you find you have:

  1. A long explicit import list

  2. Several hiding declarations

Then either just import the module in its entirety, or qualify it. Usually if you are using hiding you will need to qualify it.

Examples

Bad:

import Control.Lens (first, … , _Right) hiding (ix, lens)

Good:

import Control.Lens qualified as Lens

Reasoning

Complex import statements are difficult to maintain and cause annoying diffs which are also hard to merge. Qualified function usages are quite easy to read, and not that much worse to write.

Prefer importing specific module to the whole umbrella

When working inside a package that exports an “umbrella module”, avoid importing that module directly.

Examples

Bad:

import PlutusCore

Good:

import PlutusCore.Name

Reasoning

Since the umbrella module likely imports everything else, it is easy to accidentally end up with cyclic imports if you import it. Outside the package where it is defined this is usually not a problem.

Libraries

Prefer using the strict versions of data structures

Use the strict versions of most data structures by default unless you have a good reason not to.

Reasoning

Lazy data structures are easy ways to get space leaks, and the performance difference is typically negligible.

Use lenses judiciously

Use lenses where they allow a significant simplification of the code. For simple use cases just use normal record accessors.

Reasoning

Arguably if we’re going to allow lenses in our codebase and force people to know about them, we should commit to them wholesale and use them everywhere. But in practice we just use them for places where they’re hard to beat.

data-default

Don’t use data-default, instead just define defaultX values for your X type.

Reasoning

Default is not terribly bad. It’s truly ad-hoc polymorphism: all you get is name reuse, you can’t (or shouldn’t) write a function that’s polymorphic over Default a. That’s fine, but it also means that the benefit is fairly marginal.

Additionally, just defining specific values is more flexible. If, say, you want multiple default values for different contexts, then that is straightforward with values but not with Default.

Cabal

default-language

Use Haskel2010.

Reasoning

It’s the latest.

default-extensions

Put your commonly-used extensions in default-extensions rather than repeating them constantly.

Reasoning

It’s nice for files to be self-contained, but this is typically a fiction: you need to know about compilation flags from cabal files anyway. It saves a lot of typing to put the really essential stuff in the cabal file.

Haskell “languages” are basically a blessed set of extensions anyway, and people are fine putting those in the cabal file. A lot of what we’re currently doing is manually implementing the GHC2021 language!

Warning flags

Use the following set of warning flags:

-Wall
-Wnoncanonical-monad-instances
-Wincomplete-uni-patterns
-Wincomplete-record-updates
-Wredundant-constraints
-Widentities
-Wunused-packages
-Wmissing-deriving-strategies

Reasoning

GHC’s warnings are generally pretty good. -Wall doesn’t include them all, so we add some additional useful ones.

Warnings as errors

Don’t set -Werror by default, only set it in CI builds.

Reasoning

Working with -Werror enabled is very disruptive, because you can’t e.g. have an unused variable or import even temporarily.

However, it is very useful to keep our code warning-free, so setting -Werror in CI is recommended.

Common stanzas

Use a common stanza (usually called “lang”) to include a) the language (Haskell2010), b) the default-extensions, c) the default set of warnings.

Reasoning

Common stanzas are great and make it easier to keep things in sync.

Multiple public libraries

Use multiple public libraries judiciously. For now, only use them for additional libraries to be used in test code (“testlibs”).

Reasoning

Multiple public libraries are a very useful feature, but they’re not entirely mature yet. One place where they are invaluable is to export a “test library” containing code for testing the main library, without forcing the main library to depend on test libraries.

In due course we may want to use them more widely.

Internal libraries

Use internal libraries where it is useful to enforce a clear separation of a “sub-package”.

Reasoning

Internal libraries are fairly well supported and make it easy to totally segregate a “sub-package” from the main library. This can be useful for, say, a standalone implementation of a data structure, or similar.