Common Lisp doesn't have any pattern-matching facilities in the language. A number have been written: CLiki has a list: Marco Antoniotti's CL-UNIFICATION is my favourite, as I used to be interested in unification grammars. Many of these systems are quite general: they seek to be able to match very general objects to be extensible and to have very good performance. This causes inevitable hair in their implementations, and also means that they often make doing something rather simple much harder than it needs to be.
The simple thing that should be easy is providing a generalised version of destructuring-bind
or, equivalently1, macro argument lists. That's what dsm
does, and that's all it does: if you understand destructuring-bind
and case
you can pretty much stop reading now: destructuring-match
is pretty much case
except that the cases are lambda lists for destructuring-bind
.
dsm
's whole purpose in life is to allow you to pattern match against source code: it does not, for instance, support matching against instances of general classes, because instances of general classes do not occur in source code. It is a tool to make doing what Lisp does best easier: implementing programming languages built on Lisp. And that is all it does. Because this is all it is meant to do,dsm
cares about correctness, but it does not care at all about performance: the performance of macroexpansion never matters2.
- An example
- The interface
- Conditions
- Some simple examples
- Another example: define-destructuring-macro
- A final example: destructuring-bind
- Notes on the implementation
- Extensions
- Other notes
- Lost futures
- Package, module, feature, dependencies
As an example, let's consider a macro where there are a few possible variations on the syntax:
(with-foo x ... use x ...)
(with-foo (x a) ... use x ...)
(with-foo (x a b) ... use x ...)
(with-foo ((x y) a b) ...)
Well, to write a reasonably error-protected version of this you end up with code like, for instance
(defmacro with-foo (binding &body forms)
(typecase binding
(symbol
...)
(cons
(destructuring-bind (thing . more) binding
(typecase thing
(symbol ...)
(cons
(unless (= (length thing) 2)
(error ...))
(destructuring-bind (var opt) thing
...))
(t (error ...)))))
(t (error ...))))
And this is just horrible.
This can be improved with, for instance, a system like my simple pattern matcher, spam
:
(defmacro with-foo (binding &body forms)
(matching binding
((var)
...)
((one-of
(list-matches (var) (any))
(list-matches (var) (any) (any)))
(destructuring-bind (var &optional a1 a2) binding
...))
((one-of
(list-matches (list-matches (var) (any)) (any))
(list-matches (list-matches (var) (any)) (any) (any)))
(destructuring-bind ((var opt) &optional a1 a2) binding
...))
(otherwise
(error ...))))
This is a lot better.
The underlying problem here is that, before you can use destructuring-bind
you need to wrap numerous guards around it, and this is especially true if you want to allow variations in syntax which don't match the same lambda list. What would be nice is something that did what destructuring-bind
does, but which could also try several possibilities. Like this:
(defmacro with-foo (binding &rest forms)
(destructuring-match binding
(var
(:when (symbolp var))
...)
((var &optional a1 a2)
(:when (symbolp var))
...)
(((var opt) &optional a1 a2)
(:when (symbolp var))
...)
(otherwise
(error ...))))
This is what dsm
lets you do: it provides a macro, destructuring-match
, which understands lambda lists similar to destructuring-bind
s although slightly extended, except that it also matches against many possible lambda lists, and that matches can have 'guard clauses' which allow arbitrary additional tests before a match succeeds.
Again,dsm
is not intended as a general-purpose pattern matcher: all it does is allow matching against many possible lambda lists, succeeding on the first match. Guard clauses allow some additional tests before a match succeeds, but that's it. The best way to understand dsm
is that it's a tool for writing macros: it's not anything more general than that. But as a tool for writing macros it can make your life a lot easier. It would be relatively simple to implement, on top of destructuring-match
, a pattern-matching macro language like Scheme's syntax-rules
although without hygiene of course3.
dsm
provides a single macro: destructuring-match
.
destructuring-match
is a macro which combines something like case
with a slightly extendeddestructuring-bind
.
(destructuring-match <expression>
<clause> ...)
<expression>
is any expression.<clause>
is one of(<lambda-list> [<guard>] ...)
;(otherwise ...)
;(t ...)
.
- For
<lambda-list>
see below. <guard>
is a form like(<when/unless> expression ...)
, where<when/unless>
is:when
or:unless
.
The lambda lists understood by destructuring-match
are4 the same as the lambda lists understood by destructuring-bind
, extended in two ways:
- a 'lambda list' which is a symbol binds the whole value of the expression, in the same way that
(lambda x ...)
does in Scheme; - any variable whose name is
_
, regardless of package, is a 'blank', and is turned into an anonymous variable which is ignored, with each occurrence of such a variable being distinct.
The guards specified by a guard clause may be repeated, so (:when ... :unless ... :when ...)
is perfectly legal. Guards are evaluated after variables are bound but before the match is committed. If the guards fail the next clause is tried.
Clauses which beginotherwise
and t
are the same: they're the default case, and bind no variables.
In addition, dsm
exposes three condition classes:
dsm-error
is the condition type of all errors that dsm
knowingly signals.
dsm-error/yours
is the type of errors signalled by dsm
which it considers to be your fault: things like botched lambda lists and so on.
dsm-error/mine
is the type of errors signalled by dsm
which it considers to be its fault. Please report any of these.
Of course, other errors may occur which it has not foreseen: report these also.
(defmacro bind (v/v &body forms)
(destructuring-match v/v
(var
(:when (symbolp var))
`(let ((,var nil)) ,@forms))
((var val)
(:when (symbolp var))
`(let ((,var ,val)) ,@forms))
(otherwise
(error "bad binding ~S" v/v))))
is a trivial version of let
which binds only one variable.
(defmacro bind* (bindings &body forms)
(destructuring-match bindings
(()
`(locally ,@forms))
(((var &optional (val nil)))
(:when (symbolp var))
`((lambda (,var) ,@forms) ,val))
((var)
(:when (symbolp var))
`((lambda (,var) ,@forms) nil))
(((var &optional (val nil)) . more)
(:when (symbolp var))
`((lambda (,var) (bind* ,more ,@forms)) ,val))
((var . more)
(:when (symbolp var))
`((lambda (,var) (bind* ,more ,@forms)) nil))
(otherwise
(error "what even is this?"))))
is a more elaborate version of let*
.
An example of blank variables: this function will extract a list of keyword variable names from the various possible keyword argument specifications allowed by CL:
(defun keyword-variable-names (keyword-argument-specifications)
(mapcar (lambda (spec)
(destructuring-match spec
(v
(:when (symbolp v))
v)
((v _ &optional _)
(:when (symbolp v))
v)
(((_ v) _ &optional _)
(:when (symbolp v))
v)
(otherwise
(error "not a keyword argument specification"))))
keyword-argument-specifications))
Without blank variables, something like this would need to be covered in explicitignore
declarations.
destructuring-match
was designed for writing macros, and it's easy to use it to write this5:
(defmacro define-matching-macro (name &body clauses)
(let ((<whole> (make-symbol "WHOLE"))
(<junk> (make-symbol "JUNK")))
(destructuring-match clauses
((doc . the-clauses)
(:when (stringp doc))
`(defmacro ,name (&whole ,<whole> &rest ,<junk>)
,doc
(destructuring-match ,<whole> ,@the-clauses)))
(the-clauses
`(defmacro ,name (&whole ,<whole> &rest ,<junk>)
(destructuring-match ,<whole> ,@the-clauses))))))
This can then be used, for instance, like this:
(define-matching-macro with-scrot
((_ (s) &body forms)
(:when (symbolp s))
`(call-with-simple-scrot (lambda (,s) ,@forms)))
((_ (s v) &body forms)
(:when (symbolp s))
`(call-with-general-scrot (lambda (,s) ,@forms) (ensure-scrot ,v)))
(badness
(error "~S is a bad bad thing" badness)))
So now it's very easy to write macros which accept several argument patterns and which can also report syntax errors usefully.
My friend Zyni6 pointed out to me that, of course, destructuring-match
can implement destructuring-bind
. This implementation (due to her) may not be completely correct but it gets the point across:
(defmacro destructuring-bind (ll form &body decls/forms)
(multiple-value-bind (decls forms) (parse-simple-body decls/forms)
`(destructuring-match ,form
(,ll
,@decls
(progn ,@forms)))
(otherwise (error "fleașcă"))))
(The reason to parse the declarations out like this is so you don't have to think about bodies which look like ((:when ...) ...)
and worry about whether:when
or :unless
are legal operator names. parse-simple-body
comes from org.tfeb.hax.utilities
.)
dsm
has to implement its own parsing and compilation of lambda lists. It is intended to be compatible with destructuring-bind
with the extension of 'lambda lists' which are a symbol. However there are many corner cases, especially around keyword handling: I think it gets these right, but there may be bugs remaining: please let me know if you find any.
There is a fairly extensive test-suite included with dem
which tests quite a lot of the corner cases, but it is not completely comprehensive, and I may also just not understand a lot of the corner cases of lambda list parsing.
When dsm
binds a variable to an object which is part of the structure of the lambda list, that object will actually be part of that structure. So, for instance
(let ((l '(a b c)))
(destructuring-match l
((&rest r)
(eq l r))))
will be true.
A lambda list like (a &rest b)
just peels off the first element of the thing it's matching for a
and then puts everything else in b
: that may not be a proper list or even a cons. In particular
this | is the same as this |
---|---|
(a b ... &rest c) |
(a b ... . c) |
(&rest r) |
r |
and so on. Checking that &rest
lists are in fact proper lists is expensive and probably not actually useful since it would make cyclic structures impossible to match, so dsm
does not waste time doing so.
Declarations are 'raised' to where they belong by the compiler, so something like
(destructuring-match x
((&key y)
(:when (evenp y))
(declare (type integer y))
y))
Will do the right thing, and the guard clause will be within the scope of the declaration.
The system should now deal with the short form of type declarations properly. This is hard in CL because there is no predicate which tells you whether something is a valid type specifier or not. It now works by using canonicalize-declaration-specifier
which should reliably deal with these.
However, if you want to declare types, just use the long form7.
Other standard declaration types which affect variable bindings, such as ignore
, dynamic-extent
and so on, are also raised. If there are implementation-specific declarations which effect bindings they won't be raised.
dsm
can generate code like this as a particular case (this often happens when using &rest
lists in particular):
(let ((x '()))
(if (not (null x))
(fail ...)
...))
This can cause SBCL at least to mutter about eliminating dead code: I decided that the additional complexity to deal with this special case wasn't worth it: probably any serious compiler will work it out and remove the test.
dsm
contains the seeds of what could be a general-purpose lambda list parser & compiler, which could, in theory, be taught how to parse & compile other sorts of lambda lists, including lambda lists not native to CL. At present these are not well-separated from the code that recognizes and compiles destructuring-bind
-style lambda lists, but they might one day be.
Since dsm
is intended for use in macros I made no real attempt to worry about performance. There is a small set of rudimentary benchmarks which compare its performance with destructuring-bind
for various cases: the results are obviously implementation-dependent, but generally it seems to be between about 1/2 and 1/10 the speed. Given that it's portable code I'm happy with this8.
dsm
contains an 'implementation' layer which has its own package, and which may one day expose more of it with some documented interface. Currently this interface is entirely internal to dsm
and may change at any time.
dsm
has some extensions (currently one) which live in the package org.tfeb.dsm/extensions
.
If you are trying to match against some syntax which contains 'noise words', then you can end up writing code like this
(destructuring-match form
((a -> b)
(:when (eql -> '->))
...)
...)
literals
helps with this. You can write, instead:
(destructuring-match form
((a -> b)
(:when (literals ->))
...)
...)
or, more generally
(destructuring-match form
((a = b -> c)
(:when (literals = ->))
...)
...)
which is why it is literals
not literal
. Each argument to literals
can be one of two things.
- A variable name, which is checked to be bound to its own name.
- A list of the form
(<v> &key as test)
where,<v>
is a variable name,as
is a list which it is matched against andtest
is the test function.as
defaults to a list of the variable's name, andtest
to#'eql
. Bothas
andtest
are evaluated. In this case the variable must be both a symbol and it must match an element of the list undertest
.
The second form allows things like
(destructuring-match form
((a in b)
(:when (literals (in :test #'string=)))
...)
...)
which will test that in
is bound to any symbol whose name is "IN"
.
'Lambda lists' which are symbols happened by mistake (there's what is essentially an error in the recognizer where it's looking for dotted lambda lists), but they are in fact so useful that I decided this was a feature, not a bug.
dsm
contains a lambda list parser and compiler which, in principle, are fairly general. Cleaning up and exposing their interfaces was too exhausting when I was writing dsm
but might happen in future.
What constitutes a blank variable is parameterized internally and could be made user-configurable. On the other hand everybody elsr uses _
, so I am not sure any useful purpose would be served by doing so.
dsm
lives in org.tfeb.dsm
and provides :org.tfeb.dsm
. The extension package is org.tfeb.dsm/extensions
. There is an ASDF system definition for both it and its tests.
dsm
depends on a fair number of other things I have written: if you have a recent Quicklisp distribution then it should know about all of them. At least, by the time dsm
makes it into Quicklisp it should. If not, you need at least version 9 of my CL hax, and at least version 8 of my CL tools.
Destructuring match is copyright 2022-2024 by Tim Bradshaw. See LICENSE
for the license.
Footnotes
-
Almost equivalently: neither
destructuring-bind
notdestructuring-match
support the&environment
lambda list keyword. ↩ -
Once, perhaps, it did, but that was most of a lifetime ago. ↩
-
A toy version of a macro to define pattern-matching macros like this is included as an example. ↩
-
Or should be! ↩
-
Note that this is 12 lines, 6 of which is code to handle docstrings. ↩
-
Not her real name. ↩
-
I think you should always do this, anyway: the whole shorthand for type declarations is just a horrid mistake CL made, probably in the name of compatibility. ↩
-
Apocryphally it also outperforms some of those hairy pattern matchers which obsess about performance, although they obviously do a lot more than
dsm
does. ↩