Parser combinators for Augmented BNF grammars (RFC 4234)
The abnf library provides a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).
The combinator procedures in this library are based on the interface provided by the lexgen library.
(char CHAR) => MATCHER
Procedure char builds a pattern matcher function that matches a single character.
(lit STRING) => MATCHER
lit builds a matcher for the literal STRING. As with ABNF quoted strings, the comparison is case-insensitive.
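The following minimal sketch in plain Scheme shows what char and lit do. It assumes a simplified toy model that is not the abnf or lexgen implementation. In this model, a state is a pair (consumed . remaining) of character lists, and a matcher maps a list of states to the list of states that survive. The helpers char-matcher and run, and the toy- procedures, are hypothetical names used only for illustration.

```scheme
;; Hypothetical toy model (not the abnf/lexgen implementation):
;; a state is (consumed . remaining), both lists of characters, and a
;; matcher maps a list of states to the list of states that survive.
(define (char-matcher pred)
  ;; Advance every state whose next input character satisfies PRED;
  ;; drop the states where it does not.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Apply M to a single initial state and return, as strings,
  ;; the characters consumed by each surviving state.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-char c)
  ;; Match exactly the character C.
  (char-matcher (lambda (x) (char=? x c))))

(define (toy-lit str)
  ;; Match STR case-insensitively by chaining one matcher per character.
  (let ((ms (map (lambda (c) (char-matcher (lambda (x) (char-ci=? x c))))
                 (string->list str))))
    (lambda (states)
      (let loop ((ms ms) (states states))
        (if (null? ms)
            states
            (loop (cdr ms) ((car ms) states)))))))
```

For example, (run (toy-lit "Dec") "dec 2002") yields ("dec"). The comparison ignores case, but the consumed characters keep the case of the input.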
The following primitive parsers match the rules described in RFC 4234, Section 6.1.
(alpha STREAM-LIST) => STREAM-LIST
Matches any character of the alphabet (ALPHA in RFC 4234: A-Z and a-z).
(binary STREAM-LIST) => STREAM-LIST
Matches [0..1].
(decimal STREAM-LIST) => STREAM-LIST
Matches [0..9].
(hexadecimal STREAM-LIST) => STREAM-LIST
Matches [0..9] and [A..F,a..f].
(ascii-char STREAM-LIST) => STREAM-LIST
Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).
(cr STREAM-LIST) => STREAM-LIST
Matches the carriage return character.
(lf STREAM-LIST) => STREAM-LIST
Matches the line feed character.
(crlf STREAM-LIST) => STREAM-LIST
Matches the Internet newline: a carriage return followed by a line feed (CRLF).
(ctl STREAM-LIST) => STREAM-LIST
Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].
(dquote STREAM-LIST) => STREAM-LIST
Matches the double quote character.
(htab STREAM-LIST) => STREAM-LIST
Matches the tab character.
(lwsp STREAM-LIST) => STREAM-LIST
Matches linear white space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.
(sp STREAM-LIST) => STREAM-LIST
Matches the space character.
(vspace STREAM-LIST) => STREAM-LIST
Matches any visible (printing) ASCII character, as in the RFC 4234 VCHAR rule. That is, any character in the decimal range of [33..126].
(wsp STREAM-LIST) => STREAM-LIST
Matches space or tab.
(quoted-pair STREAM-LIST) => STREAM-LIST
Matches a quoted pair: a backslash followed by the quoted character. Any character except CR and LF may be quoted.
(quoted-string STREAM-LIST) => STREAM-LIST
Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string, and CR and LF are not allowed at all.
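Unlike char and lit, which build matchers, these primitives are matchers themselves, as their STREAM-LIST => STREAM-LIST signatures show. The sketch below uses a hypothetical toy model in which a state is a (consumed . remaining) pair of character lists. char-matcher, run and the toy- names are illustrative and not part of abnf. In this model, several of the RFC 4234 core rules reduce to a predicate on a single character:

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (code-between lo hi)
  ;; Predicate: is the character's code point within [LO, HI]?
  (lambda (c) (<= lo (char->integer c) hi)))

(define toy-decimal (char-matcher (code-between 48 57)))      ; 0-9
(define toy-hexadecimal                                      ; 0-9, A-F, a-f
  (char-matcher (lambda (c) (or ((code-between 48 57) c)
                                ((code-between 65 70) c)
                                ((code-between 97 102) c)))))
(define toy-ctl                                              ; 0-31 and 127
  (char-matcher (lambda (c) (or ((code-between 0 31) c)
                                (= (char->integer c) 127)))))
(define toy-vchar (char-matcher (code-between 33 126)))       ; visible chars
(define toy-wsp                                              ; space or tab
  (char-matcher (lambda (c) (or (char=? c #\space) (char=? c #\tab)))))
```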
The following additional procedures are provided for convenience:
(set CHAR-SET) => MATCHER
Matches any character from an SRFI-14 character set.
(set-from-string STRING) => MATCHER
Matches any character that occurs in STRING.
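Here is a sketch of set-from-string in a hypothetical toy model, not the library's code. In the model, a state is a (consumed . remaining) pair and a matcher filters a list of states. The string serves as a simple membership list. For set, SRFI-14's char-set-contains? would take the place of memv.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-set-from-string s)
  ;; Match any single character that occurs in S.
  (let ((members (string->list s)))
    (char-matcher (lambda (c) (memv c members)))))
```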
(concatenation MATCHER-LIST) => MATCHER
concatenation matches an ordered list of rules, one after another (RFC 4234, Section 3.1).
(alternatives MATCHER-LIST) => MATCHER
alternatives matches any one of the given list of rules (RFC 4234, Section 3.2).
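This sketch assumes a toy model in which a matcher maps a list of parse states to the list of surviving states. That model is an assumption here, not the abnf internals. In it, concatenation threads the states through each rule in turn, and alternatives collects the results of every rule. All names below are hypothetical.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-char c) (char-matcher (lambda (x) (char=? x c))))

(define (toy-concatenation . ms)
  ;; Feed the states produced by each matcher into the next one.
  (lambda (states)
    (let loop ((ms ms) (states states))
      (if (null? ms)
          states
          (loop (cdr ms) ((car ms) states))))))

(define (toy-alternatives . ms)
  ;; Try every alternative on the same states and keep all results.
  (lambda (states)
    (apply append (map (lambda (m) (m states)) ms))))
```

Keeping every successful branch, rather than committing to the first one, is a design choice of this sketch. On input "ab", both the "a" and "ab" alternatives survive, and a driver can then pick one, for example the longest match.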
(range C1 C2) => MATCHER
range matches any single character in the range from C1 to C2 (RFC 4234, Section 3.4).
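In a hypothetical toy model, where a matcher filters a list of (consumed . remaining) states, a range is just a single-character predicate. char-matcher, run and toy-range are illustrative names. The sketch assumes inclusive bounds, like ABNF value ranges such as %x30-39.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-range c1 c2)
  ;; Match one character C with C1 <= C <= C2 (both bounds inclusive).
  (char-matcher (lambda (c) (and (char<=? c1 c) (char<=? c c2)))))
```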
(variable-repetition MIN MAX MATCHER) => MATCHER
variable-repetition matches between MIN and MAX consecutive elements that match the given rule (RFC 4234, Section 3.6).
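The following sketch assumes a toy model in which a matcher maps a list of parse states to the surviving states. This is not the library's implementation. The sketch collects the states reached after k applications of the rule, for every k between MIN and MAX. Using #f for an unbounded MAX is a convention of this sketch only, and all names are hypothetical.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-variable-repetition lo hi m)
  ;; Collect the states reached after K applications of M for every K
  ;; with LO <= K <= HI; HI = #f means there is no upper bound.
  (lambda (states)
    (let loop ((k 0) (states states) (out '()))
      (let ((out (if (>= k lo) (append out states) out)))
        (if (or (null? states) (and hi (= k hi)))
            out
            (loop (+ k 1) (m states) out))))))

(define digit (char-matcher char-numeric?))
```

Applied to "19 Dec", the 1-to-2 repetition of digit yields both "1" and "19". A later rule can then pick the one that fits.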
(repetition MATCHER) => MATCHER
repetition matches zero or more consecutive elements that match the given rule.
(repetition1 MATCHER) => MATCHER
repetition1 matches one or more consecutive elements that match the given rule.
(repetition-n N MATCHER) => MATCHER
repetition-n matches exactly N consecutive occurrences of the given rule (RFC 4234, Section 3.7).
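repetition, repetition1 and repetition-n can be viewed as special cases of variable-repetition. The sketch below shows that relationship in a hypothetical toy model. The toy- names, char-matcher and run are illustrative, and #f stands for an unbounded maximum.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-variable-repetition lo hi m)
  ;; States after K applications of M, for LO <= K <= HI (#f = no limit).
  (lambda (states)
    (let loop ((k 0) (states states) (out '()))
      (let ((out (if (>= k lo) (append out states) out)))
        (if (or (null? states) (and hi (= k hi)))
            out
            (loop (+ k 1) (m states) out))))))

(define (toy-repetition m) (toy-variable-repetition 0 #f m))   ; *RULE
(define (toy-repetition1 m) (toy-variable-repetition 1 #f m))  ; 1*RULE
(define (toy-repetition-n n m) (toy-variable-repetition n n m)) ; nRULE

(define digit (char-matcher char-numeric?))
```

An unbounded repetition stops only when the rule fails. This sketch would therefore loop forever on a rule that succeeds without consuming input.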
(optional-sequence MATCHER) => MATCHER
optional-sequence makes the given rule optional: it matches the rule zero or one times (RFC 4234, Section 3.8).
(pass) => MATCHER
This matcher succeeds without consuming any input.
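optional-sequence can be understood as an alternative between the rule and pass. The sketch below expresses that idea in a hypothetical toy model, where states are (consumed . remaining) pairs. toy-pass, toy-optional-sequence and the helpers are illustrative, not the abnf code. Following the documented signature, toy-pass is called with no arguments to obtain a matcher.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-char c) (char-matcher (lambda (x) (char=? x c))))

(define (toy-pass)
  ;; A matcher that succeeds and leaves every state unchanged.
  (lambda (states) states))

(define (toy-optional-sequence m)
  ;; [RULE]: keep both the states that skip M and those where M matched.
  (lambda (states)
    (append ((toy-pass) states) (m states))))
```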
(bind F P) => MATCHER
Given a rule P and a function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result together with the remainder of the input stream.
Note: this combinator will signal failure if the input stream is empty.
(bind* F P) => MATCHER
The same as bind, but signals success if the input stream is empty.
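Here is a sketch of bind in a hypothetical toy model, not the abnf implementation. In the model, each state carries a list of consumed tokens and the remaining characters. F receives only the tokens consumed by P and must return a list of tokens to put in their place. That convention, and the names toy-bind, digit, two-digits and hour, are assumptions of the sketch. The empty-input behavior that distinguishes bind from bind* is not modeled.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining), where
;; consumed is a list of tokens and remaining a list of characters.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (parse m str)
  ;; Start from one state and return the raw surviving states.
  (m (list (cons '() (string->list str)))))

(define (toy-bind f p)
  ;; Run P with a fresh consumed list, transform what P consumed with F,
  ;; and append F's result to the tokens consumed before P.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (map (lambda (r) (cons (append (car st) (f (car r))) (cdr r)))
                       (p (list (cons '() (cdr st))))))
                states))))

(define digit (char-matcher char-numeric?))
(define two-digits (lambda (states) (digit (digit states))))
(define hour ; two digits, turned into a single number token
  (toy-bind (lambda (cs) (list (string->number (list->string cs))))
            two-digits))
```

Here (parse hour "20:35") yields a single state whose consumed tokens are (20) and whose remaining input is the characters of ":35".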
(drop-consumed P) => MATCHER
Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
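Here is a minimal sketch of drop-consumed in a hypothetical toy model, where states are (consumed . remaining) pairs. The sketch assumes that only the tokens consumed by P are discarded and that tokens consumed earlier are kept. That behavior is what allows a rule such as time-of-day to strip its colon separators. toy-drop-consumed, digit, colon and hh-mm are illustrative names.

```scheme
;; Hypothetical toy model: a state is (consumed . remaining) and a
;; matcher maps a list of states to the list of surviving states.
(define (char-matcher pred)
  ;; Advance each state whose next character satisfies PRED.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (let ((rest (cdr st)))
                    (if (and (pair? rest) (pred (car rest)))
                        (list (cons (append (car st) (list (car rest)))
                                    (cdr rest)))
                        '())))
                states))))

(define (run m str)
  ;; Return what each surviving state consumed, as a string.
  (map (lambda (st) (list->string (car st)))
       (m (list (cons '() (string->list str))))))

(define (toy-drop-consumed p)
  ;; Run P, but keep only the tokens that were consumed before P.
  (lambda (states)
    (apply append
           (map (lambda (st)
                  (map (lambda (r) (cons (car st) (cdr r)))
                       (p (list (cons '() (cdr st))))))
                states))))

(define digit (char-matcher char-numeric?))
(define colon (char-matcher (lambda (c) (char=? c #\:))))
(define hh-mm ; two digits, a dropped colon, two more digits
  (lambda (states)
    (digit (digit ((toy-drop-consumed colon) (digit (digit states)))))))
```

With these definitions, (run hh-mm "20:35") yields ("2035").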
abnf supports the following abbreviations for commonly used combinators:
- :: for concatenation
- :? for optional-sequence
- :! for drop-consumed
- :s for lit
- :c for char
- :* for repetition
- :+ for repetition1
The following parser libraries have been implemented with abnf, in order of complexity:
- csv
- internet-timestamp
- json-abnf
- mbox
- smtp
- internet-message
- mime
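(import abnf)

;; Folding white space: an optional line break (possibly preceded by
;; white space, with the break itself dropped from the result),
;; followed by at least one space or tab
(define fws
  (concatenation
   (optional-sequence
    (concatenation
     (repetition wsp)
     (drop-consumed
      (alternatives crlf lf cr))))
   (repetition1 wsp)))

;; Wrap rule P in optional folding white space, discarding the white space
(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws))
   p
   (drop-consumed (optional-sequence fws))))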

;; Date and Time Specification from RFC 5322 (Internet Message Format)
;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (alternatives
   (lit "Mon")
   (lit "Tue")
   (lit "Wed")
   (lit "Thu")
   (lit "Fri")
   (lit "Sat")
   (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))

;; Match a four-digit year, optionally wrapped in folding whitespace
(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names
(define month-name
  (alternatives
   (lit "Jan")
   (lit "Feb")
   (lit "Mar")
   (lit "Apr")
   (lit "May")
   (lit "Jun")
   (lit "Jul")
   (lit "Aug")
   (lit "Sep")
   (lit "Oct")
   (lit "Nov")
   (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))

;; Match a one- or two-digit day number, optionally preceded by
;; folding whitespace and always followed by it (RFC 5322)
(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (variable-repetition 1 2 decimal)
   (drop-consumed fws)))

;; Match a date of the form "dd Mon yyyy", e.g. "19 Dec 2002"
(define date (concatenation day month year))

;; Match a two-digit number
(define hour (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.
(define time-of-day
  (concatenation
   hour (drop-consumed (char #\:))
   minute (optional-sequence
           (concatenation (drop-consumed (char #\:))
                          isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm
(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char #\-) (char #\+))
   hour minute))

;; Match a time-of-day specification followed by a zone.
(define itime (concatenation time-of-day zone))
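
;; Match a full date-time: an optional "weekday," prefix, the date,
;; the time with its zone, and optional trailing folding whitespace
(define date-time
  (concatenation
   (optional-sequence
    (concatenation
     day-of-week
     (drop-consumed (char #\,))))
   date
   itime
   (drop-consumed (optional-sequence fws))))

;; Error handler passed to lex: print the stream on which lexing
;; failed and return an error marker
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))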
- 8.3 Removed unneeded dependency on yasos [thanks to Mario Domenech Goulart]
- 8.0 Ported to CHICKEN 5 and yasos collections interface
- 7.0 Added bind* variant of bind [thanks to Peter Bex]
- 6.0 Using utf8 for char operations
- 5.1 Improvements to the CharLex->CoreABNF constructor
- 5.0 Synchronized with lexgen 5
- 3.2 Removed invalid identifier :|
- 3.0 Implemented typeclass interface
- 2.9 Bug fix in consumed-objects (reported by Peter Bex)
- 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
- 2.6 Bug fixes in consumer procedures
- 2.5 Removed procedure memo
- 2.4 Moved the definition of bind and drop to lexgen
- 2.2 Added pass combinator
- 2.1 Added procedure variable-repetition
- 2.0 Updated to match the interface of lexgen 2.0
- 1.3 Fix in drop
- 1.2 Added procedures bind drop consume collect
- 1.1 Added procedures set and set-from-string
- 1.0 Initial release
Copyright 2009-2021 Ivan Raikov
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A full copy of the GPL license can be found at http://www.gnu.org/licenses/.