vcard4-ts — A vCard 4.0 library with type safety first

vcard4-ts was designed with the following goals:

  • Compliant with RFC 6350 and its extensions

  • TypeScript (and type safety) from the ground up

  • Avoid mistakes, DRY (Don't Repeat Yourself)

    • The data structure definition, created from RFC 6350, contains instructions for the parser
  • The returned data structure is easy to use

    • The decisions to be made by the calling code should be as few and as simple as possible. Everything that can be delegated to the IDE (while writing your code) and TypeScript compile time should be handled there. E.g., no need to check whether there is a single or multiple values: if something can occur multiple times, the item is always in an array.

In addition to RFC6350, the following RFCs are implemented:

  • RFC 6474: BIRTHPLACE, DEATHPLACE, and DEATHDATE properties
  • RFC 6715: EXPERTISE, INTEREST, HOBBY, ORG-DIRECTORY properties and LEVEL, INDEX parameters
  • RFC 8605: CONTACT-URI property and CC (two-letter country code) parameter

vcard4-ts is compatible with the following RFCs, as it does not impose any limitation on string-valued parameters and values:

  • RFC 6473: The KIND:application property
  • RFC 7852: The TYPE=main-number parameter

Installation

Run yarn add vcard4-ts or npm i vcard4-ts. There are no dependencies (except devDependencies), and only about 10 kB (compressed) will end up in your code; the rest is tests, alternatives, debugging information, …

Usage

Simple example

Basic usage is straightforward:

import { parseVCards } from 'vcard4-ts';
import { readFileSync } from 'fs';

const vcf = readFileSync('example.vcf').toString();

const cards = parseVCards(vcf);
if (cards.vCards) {
  for (const card of cards.vCards) {
    console.log('Card found for ' + card.FN[0].value[0]);
  }
} else {
  console.error('No valid vCards in file');
}

We can see two basic principles in action:

  1. The types are always clear, no expensive run-time testing whether there is just a single value or there are multiple values. (This is the prime directive.)
  2. There are no null or undefined (aka nullish) values; and any arrays will always have at least one element. This is the secondary directive.

As a result of these principles, the following rules apply:

  1. Mandatory properties (BEGIN, END, VERSION, and FN) always do exist and are never null or undefined ("nullish").
  2. Optional properties (all the others defined in RFC 6350) only exist if they appear in the file. I.e., if they exist, they also have a value and are never nullish. (However, the strings may still be empty.)
  3. To match the prime directive, any property, whether mandatory or optional, that may appear more than once, is always an array of values.

These rules make software development more predictable and thus faster, less error-prone:

  • TypeScript can verify type correctness.
  • Autocompletion and type inference in IDEs such as VSCode/VSCodium work and are very helpful.

More elaborate example

This example demonstrates access to parsing errors and warnings, structured information, and non-RFC properties. Explanations are in the design and reference sections below.

if (cards.nags) {
  // There were global problems, e.g. because the file did seem to contain invalid vCards.
  // Those cards can be obtained by passing `keepDefective: true` to `parseVCards()`.
  for (const nag of cards.nags) {
    // Global nags carry no `attributes`, only `key`, `description`, and `isError`.
    if (nag.isError) {
      console.error(`Global ${nag.key} (${nag.description})`);
    } else {
      console.warn(`Global ${nag.key} (${nag.description})`);
    }
  }
}
// `vCards` is absent if no valid card was found, so fall back to an empty list.
for (const card of cards.vCards ?? []) {
  // If you would like element 0 to correspond to the most PREFerred item:
  sortByPREF(card);

  // You're guaranteed to have all these (required) properties,
  // no need to check their existence first. Also, the editor will
  // auto-complete and know the type.
  console.log('Found vCard with version ' + card.VERSION.value);
  console.log('Full name: ' + card.FN[0].value[0]);

  // Maybe some optional (any-cardinality) RFC6350 property is present?
  if (card.EMAIL) {
    // There might be multiple EMAIL property lines, but as the EMAIL field
    // is present, we're guaranteed to have at least one value. See
    // https://netfuture.ch/2021/11/array-thickening-more-can-be-less/
    console.log('Emailable at: ' + card.EMAIL[0].value);
    // Is it known whether it is a work or home address?
    if (card.EMAIL[0].parameters?.TYPE) {
      console.log('It is of type: ' + card.EMAIL[0].parameters.TYPE[0]);
    }
  }

  // The same with a structured any-cardinality property
  if (card.ADR) {
    // All elements of the address, including the locality, can have multiple
    // values. And we still could have multiple addresses (e.g., work and
    // home). We'll just print the first.
    console.log('Living in: ' + card.ADR[0].value.locality[0]);
  }

  // Any property not in the standard (and its extension RFCs)?
  // (Their name should be prefixed with `X-`)
  if (card.x) {
    for (const [k, v] of Object.entries(card.x)) {
      console.log('Non-RFC6350 property ' + k + ', with ' + JSON.stringify(v));
    }
  }

  // Any problems found while parsing the vCard?
  if (card.nags) {
    console.log(
      'While parsing this card, the following was noticed ' +
        '(and either the problematic part dropped or ignored)',
    );
    for (const nag of card.nags) {
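      // Local nags also carry `attributes` (property, parameter, line).
      if (nag.isError) {
        console.error(`${nag.key} (${nag.description}): ${JSON.stringify(nag.attributes)}`);
      } else {
        console.warn(`${nag.key} (${nag.description}): ${JSON.stringify(nag.attributes)}`);
      }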
    }

    // Some of these problems might be unparseable lines. They are archived
    // here.
    if (card.unparseable) {
      console.log('The following unparseable lines were encountered:');
      for (const line of card.unparseable) {
        console.log(line);
      }
    }
  }
}

Design

The prime design goal is to avoid mistakes in the code and enable calling code to avoid mistakes as well. Designing for (type) safety is achieved by Don't Repeat Yourself, Parse, don't validate, and Array thickening.

DRY

Don't Repeat Yourself was a basic design principle while developing the module. The description of the data structure is centralized. The goal was to have only a single authoritative source of type information, from which both compile-time type information and runtime parsing instructions would be derived. As TypeScript transpilation output no longer contains the type information, it was necessary to jump through hoops. (Luckily, Colin McDonnell's Zod was a great resource for educating about hoop-jumping.)
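The single-source idea can be illustrated with a deliberately tiny sketch. This is not the library's actual schema machinery: the names schema, Card, Draft, and addValue are hypothetical, property values are reduced to bare strings, and only four properties are modeled. One `as const` object drives both the compile-time type and the runtime behavior:

```typescript
// Hypothetical miniature schema: one object is the single source of truth.
const schema = {
  VERSION: 'one', // exactly once
  FN: 'oneOrMore', // at least once
  UID: 'optional', // at most once
  EMAIL: 'any', // any number of times
} as const;

type Schema = typeof schema;
type NonEmpty<T> = [T, ...T[]];

// Compile-time side: derive the card type from the schema.
type Shape<C> = C extends 'one' | 'optional' ? string : NonEmpty<string>;
type RequiredKeys = {
  [K in keyof Schema]: Schema[K] extends 'one' | 'oneOrMore' ? K : never;
}[keyof Schema];
type Card = { [K in RequiredKeys]: Shape<Schema[K]> } & {
  [K in Exclude<keyof Schema, RequiredKeys>]?: Shape<Schema[K]>;
};

// Runtime side: the same object tells the parser whether to set or append.
type Draft = Partial<Record<keyof Schema, string | string[]>>;
function addValue(draft: Draft, name: keyof Schema, value: string): void {
  const cardinality = schema[name];
  if (cardinality === 'one' || cardinality === 'optional') {
    draft[name] = value;
    return;
  }
  const existing = draft[name];
  if (Array.isArray(existing)) {
    existing.push(value);
  } else {
    draft[name] = [value];
  }
}

// The compiler checks this literal against the type derived from `schema`.
const example: Card = { VERSION: '4.0', FN: ['Ada Lovelace'] };
```

Changing a cardinality in schema changes both the derived Card type and what addValue does at runtime, which is the point of keeping a single definition.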

Parse, don't validate

The idea of parsing instead of validation was introduced by Alexis King, for the Haskell ecosystem. The gist of it: Directly parse the source data into the required (type-safe) format, instead of first parsing it into an (essentially) untyped format and then validating it to be of the right type. This assures that type safety starts earlier and is guaranteed to be consistent throughout the entire codebase.

In vcard4-ts, data structures are created and filled in a type-safe manner from the start. Because properties are added line by line, required properties cannot be guaranteed to exist from the start. Therefore, as an exception to this rule, the existence of required fields is only ensured at the end.
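A minimal sketch of that final step, under simplifying assumptions: the types DraftCard and FinishedCard and the function finish are hypothetical, property values are reduced to bare strings, and only VERSION and FN are handled. The defaults and the VCARD_MISSING_PROP key follow the nag list further below; the description text is made up for illustration.

```typescript
interface Nag {
  key: string;
  description: string;
  isError: boolean;
}

// While lines are still being read, required properties may be absent.
interface DraftCard {
  VERSION?: string;
  FN?: string[];
  nags?: Nag[];
}

// After the final step, the required properties are guaranteed by the type.
interface FinishedCard extends DraftCard {
  VERSION: string;
  FN: string[];
}

function finish(draft: DraftCard): FinishedCard {
  const nags: Nag[] = draft.nags ? draft.nags.slice() : [];
  const noteMissing = (property: string): void => {
    nags.push({
      key: 'VCARD_MISSING_PROP',
      description: `Required property ${property} was missing; a default was added`,
      isError: true,
    });
  };

  let version = draft.VERSION;
  if (version === undefined) {
    version = '4.0';
    noteMissing('VERSION');
  }
  let fullNames = draft.FN;
  if (fullNames === undefined) {
    fullNames = [''];
    noteMissing('FN');
  }

  const finished: FinishedCard = { VERSION: version, FN: fullNames };
  // Keep the "no empty arrays" rule: only attach nags if there are any.
  if (nags.length > 0) {
    finished.nags = nags;
  }
  return finished;
}
```

From here on, callers only ever see FinishedCard, so the required fields never need an existence check.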

Array thickening

The advantage of always having an array IMHO greatly outweighs the disadvantages. Calling code can always assume that the contents are an array. I.e., arrays with just a single value are never flattened (therefore the name). If you are only interested in one value, just use the one at index 0, which will always exist. If you want to deal with multiple values, use array methods such as map() and join(), which you can always use, because it is always an array. Yes, this results in more time and space spent during the creation of the data structure.

More importantly, this relieves calling code from performing case distinctions on every single access. Instead, the existence of the property can be asserted once and every reference to it later already knows how to deal with it. It is even possible to combine assertion and access with optional chaining.
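As a sketch of what this looks like in calling code, the following uses simplified, hypothetical types (Card, EmailLine, and NonEmpty are stand-ins, not the library's exported types) to combine the existence check and the access in one optional-chaining expression:

```typescript
type NonEmpty<T> = [T, ...T[]];

interface EmailLine {
  value: string;
}

// EMAIL is optional, but if it exists it has at least one entry.
interface Card {
  EMAIL?: NonEmpty<EmailLine>;
}

// Single-value use: index 0 exists whenever the property exists.
function primaryEmail(card: Card): string | undefined {
  return card.EMAIL?.[0].value;
}

// Multi-value use: array methods are always available, no case distinction.
function allEmails(card: Card): string {
  return card.EMAIL?.map((line) => line.value).join(', ') ?? '';
}
```

Neither function needs to ask whether EMAIL holds one value or several.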

Array thickening results in less code for the caller. Case distinctions, by contrast, often come with poor code coverage, because the uncommon case goes untested. In other words, array thickening turns the general case (whether common or uncommon) into the only case.

API

  • parseVCards(vcf: string, keepDefective: boolean = false): ParsedVCards: Parse a string into possibly multiple vCards. Details below.
  • sortByPREF<T extends Partial<VCard4>>(vcard: T): Sort properties which exist multiple times by their preference parameter (1…100; the ones without PREF are sorted last).
  • groupVCard<T extends Partial<VCard4>>(vcard: T): GroupedVCard: Group properties with group labels into their named group (all non-lowercase names). Anything without an explicit group label will end up in the group named top. (GroupedVCard is Record<Uppercase<string> | 'top', Partial<VCard4>>).

Sorting and grouping are separate functions, not methods of an object, to ensure that their code will only be included if you call them.

If you need sorting and grouping, use the following sequence:

const cards = parseVCards(vcf);
if (cards.vCards) {
  for (const card of cards.vCards) {
    sortByPREF(card);
    const grouped = groupVCard(card);
    // Process the PREF-sorted groups here
  }
}
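The ordering rule stated for sortByPREF (lower PREF first, entries without PREF last) can be sketched on a single property array as follows. This is not the library's implementation: Line, prefRank, and sortLinesByPref are hypothetical names, and treating a NaN PREF like a missing one is an assumption made here.

```typescript
interface Line {
  value: string;
  parameters?: { PREF?: number };
}

// Entries without a usable PREF get the largest rank, so they sort last.
function prefRank(line: Line): number {
  const pref = line.parameters?.PREF;
  return pref === undefined || isNaN(pref) ? Number.MAX_SAFE_INTEGER : pref;
}

// Sorts in place; Array.prototype.sort is stable (ES2019 and later), so
// entries with equal rank keep their original file order.
function sortLinesByPref(lines: Line[]): void {
  lines.sort((a, b) => prefRank(a) - prefRank(b));
}
```

After sorting, element 0 is the most preferred entry, which is what makes the "just use index 0" idiom useful.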

Reference

Property/parameter names

All vCard properties and parameters in the data structures are uppercase and dashes have been converted to underscores. This makes them clearly visible and easily accessible as JavaScript/TypeScript properties, avoiding the harder-to-type hash/array notation (i.e., card.SORT_AS instead of card['SORT-AS']).
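The naming rule amounts to a one-line transformation. The helper name toJsName below is hypothetical and only illustrates the mapping; it is not part of the documented API.

```typescript
// Map a vCard property or parameter name to its JavaScript property name:
// uppercase everything and turn dashes into underscores.
function toJsName(vcardName: string): string {
  return vcardName.toUpperCase().replace(/-/g, '_');
}
```

For instance, SORT-AS becomes SORT_AS and X-ABUID becomes X_ABUID.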

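Lowercase JavaScript/TypeScript properties (such as value, parameters, x, nags, hasErrors, and unparseable) are maintained by the parser and do not correspond to vCard properties or parameters.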

Property cardinality

  • BEGIN, END, and VERSION exist exactly once (cardinality 1 in RFC6350; required value in TypeScript)
  • FN (full name) exists at least once (1* in RFC6350; required array in TypeScript)
  • PRODID, UID, REV, KIND, N (name), BDAY, BIRTHPLACE, DEATHDATE, DEATHPLACE, ANNIVERSARY, and GENDER are optional (*1 in RFC6350; optional value in TypeScript)
  • All others can occur any number of times (* in RFC6350; optional array in TypeScript)

Property value type

  • N is an object with the following properties: familyNames, givenNames, additionalNames, honorificPrefixes, honorificSuffixes; each a required string[]. Remember that arrays are guaranteed to always have at least one element, i.e., an empty honorificPrefixes property will be encoded as an array consisting of an empty string [''].
  • ADR is similar to N, but with the following string array fields: postOfficeBox, extendedAddress, streetAddress, locality (city), region, postalCode, and countryName.
  • GENDER consists of two strings, a required sex and an optional explanatory text. sex is required by RFC6350 to be one of M, F, O, N, U, or the empty string. However, this is not checked by vcard4-ts.
  • CLIENTPIDMAP consists of pidRef, a number, and a uri, a string.
  • All other properties' values are mapped to a single string, even if they are defined as more structured types, such as dates or URIs.
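To illustrate how the structured N value can be consumed, here is a small sketch. The interface NameValue mirrors the field names listed above but is declared locally; it and the function displayName are hypothetical, not the library's exports.

```typescript
type NonEmpty<T> = [T, ...T[]];

interface NameValue {
  familyNames: NonEmpty<string>;
  givenNames: NonEmpty<string>;
  additionalNames: NonEmpty<string>;
  honorificPrefixes: NonEmpty<string>;
  honorificSuffixes: NonEmpty<string>;
}

// Build a display string; empty components are encoded as [''] and skipped.
function displayName(n: NameValue): string {
  const parts = [
    ...n.honorificPrefixes,
    ...n.givenNames,
    ...n.additionalNames,
    ...n.familyNames,
    ...n.honorificSuffixes,
  ];
  return parts.filter((part) => part !== '').join(' ');
}
```

Because every field is always an array, the function needs no checks for missing or scalar components; only the empty string has to be filtered out.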

Property parameters

Properties can have (mostly optional) parameters:

  • PREF is a number. It is not asserted whether it is in the range [1…100] required by the RFC; non-numeric values are returned as NaN.
  • INDEX is a number. It is not asserted whether it is a strictly positive integer as mandated by RFC6715; non-numeric values are returned as NaN.
  • PID, TYPE, and SORT_AS (SORT-AS in the VCF) are string[]s, again with a guaranteed minimum array length of 1. (Please note that the example in the RFC quotes the enumeration of TYPEs, which seems inconsistent with the TYPE definition, so you may want to apply split(',') to all TYPE values first.)
  • All others are single strings.
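Since range checks and the splitting of quoted TYPE lists are left to the caller, a caller-side sketch might look like this. Both helper names (splitTypes, validPref) are hypothetical and the functions work on plain values rather than the library's types.

```typescript
// Flatten TYPE values such as ['work,voice'] into ['work', 'voice'].
function splitTypes(types: string[]): string[] {
  const result: string[] = [];
  for (const type of types) {
    for (const part of type.split(',')) {
      result.push(part);
    }
  }
  return result;
}

// Return PREF only if it is an integer in the range 1..100 required by the
// RFC; NaN (from non-numeric input) and out-of-range values yield undefined.
function validPref(pref: number | undefined): number | undefined {
  if (pref !== undefined && Number.isInteger(pref) && pref >= 1 && pref <= 100) {
    return pref;
  }
  return undefined;
}
```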

Non-RFC properties and parameters

Any property or parameter whose type is not explicitly given in RFC6350 and the RFCs that extend it, including those prefixed by X-, is not included at the same level as the rest of the properties. One reason is that TypeScript does not really allow default types on object properties and therefore, nested index signatures are recommended for this.

Instead, non-RFC properties and parameters are put into an x object property. The actual value will be a plain, unprocessed string. If it has more structure, you need to extract it yourself, e.g. using

  • scan1DValue(), which unescapes and splits at the specified splitChar (,, as used for PID or TYPE parameters; or ;, as used for the GENDER value); or
  • scan2DValue(), which splits into a string[][] at ; and , (used for ADR and N values).

For example, the string value of an X-ABUID property in card card would be available as card.x.X_ABUID.value.
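To show what "unescape and split" involves for such a raw string, here is an independent sketch. It is not the implementation of scan1DValue(); the function splitEscaped is hypothetical and handles only the backslash escapes defined in RFC 6350 (`\\`, `\,`, `\;`, and `\n` or `\N` for a newline).

```typescript
// Split `raw` at unescaped occurrences of `splitChar` and resolve
// backslash escapes along the way.
function splitEscaped(raw: string, splitChar: ',' | ';'): string[] {
  const parts: string[] = [];
  let current = '';
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (ch === '\\' && i + 1 < raw.length) {
      // Consume the escaped character; \n and \N stand for a newline.
      i++;
      const next = raw.charAt(i);
      current += next === 'n' || next === 'N' ? '\n' : next;
    } else if (ch === splitChar) {
      parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}
```

Note that an empty input yields [''], which is consistent with the rule that arrays always have at least one element.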

Handling errors

Your application can just ignore the errors, if it does not want to bother.

One design goal, so obvious that it was not specifically mentioned above, is that vcard4-ts should be as easy to use as possible. Anyone who has ever had to deal with user-specified input can tell horror stories about what can go wrong. Last but not least, ensuring that user-specified input fulfills certain requirements is also a matter of security.

Therefore, parseVCards() returns the information in a format as consistent as possible, minimizing doubt and variability. In general, any line that cannot be parsed is ignored, and any vCard which does not fulfill minimum criteria is discarded.

This process is documented in the nags property of the returned object(s). The nags property is an array of warnings and errors that occurred during the processing.

Warnings and errors

A warning indicates that even though the input does not fulfill an RFC6350 criterion, the parser believes that it could safely correct the problem and that the data returned is probably exactly what its originator meant it to be.

An error, on the other hand, indicates that some information was dropped, or, alternatively, that some required information was added. The resulting parsed data is not the same as originally provided, but it is the best the parser could do to achieve RFC6350 conformance.

If at least one actual error (not just warnings) is included in the nags, hasErrors is set to true. Depending on the policy of the calling code,

  • data can be accepted as returned by the parser (most lenient),
  • data can be refused if hasErrors is true (it always exists, but hopefully is false), or
  • data can be refused if nags exists (i.e., any errors or warnings occurred; the most strict policy).
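The three policies can be written down as a small decision function. This is a sketch on a locally declared shape: the names Policy, Checked, and isAcceptable are hypothetical, and it assumes hasErrors and nags sit side by side on the object being checked.

```typescript
type Policy = 'lenient' | 'rejectErrors' | 'strict';

interface Checked {
  hasErrors: boolean; // always present
  nags?: unknown[]; // only present if something was noticed
}

function isAcceptable(data: Checked, policy: Policy): boolean {
  if (policy === 'lenient') {
    return true; // accept whatever the parser returned
  }
  if (policy === 'rejectErrors') {
    return !data.hasErrors; // warnings are tolerated
  }
  return data.nags === undefined; // strict: no warnings or errors at all
}
```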

Global, local, and mixed nags

Local nags are specific to a vCard and are stored there, alongside the properties.

Local nags have the following type:

{
  key: string; // A short string to match against in the code
  description: string; // A longer English-language description to display to the user
  isError: boolean; // Error or warning?
  attributes: {
    property: string; // The property it occurred at (or '', if there was a property name parsing problem)
    parameter?: string; // If the problem occurred while parsing a parameter, this is its name
    line?: string; // The first few characters of the line on which this error occurred
  }
}

Global nags are set at the top level of the returned structure, alongside the vCards field, if it exists. They indicate problems not related to a vCard, or related to a vCard which was not included because it was considered too bad to be returned.

Global nags use the same type as local nags above, but without the attributes.

Mixed nags are used to indicate errors affecting an entire vCard (there are no mixed warnings). If parseVCards() detects a major problem with a vCard (VCARD_BAD_TYPE or VCARD_NOT_BEGIN), then—by default—this vCard is dropped and the error—unable to be stored in the vCard itself—is bubbled up to the global level. However, if keepDefective=true is passed as an optional argument, these vCards are not dropped and the error is stored in the vCard itself.

The nags

  • FILE_EMPTY: A global error.
  • FILE_CRLF: A global warning, that lines did not end in carriage return+line feed as specified in RFC6350, but just with line feeds. (This only checks the first line end and is therefore subject to false negatives, if line ends are not consistent.)
  • VCARD_BAD_TYPE: A mixed error resulting in a defective card. The BEGIN or END property does not have the required VCARD value.
  • VCARD_NOT_BEGIN: A mixed error resulting in a defective card. The first property of the vCard is not a BEGIN property.
  • VCARD_MISSING_PROP: A local error. A required property is missing and has been added with a default value. The default for VERSION is 4.0; for FN, the empty string.
  • PROP_NAME_EMPTY: A local error. The property has an empty name.
  • PROP_NAME_EOL: A local error. The property name is terminated by the end of line, i.e., colon and value are missing.
  • PROP_DUPLICATE: A local error. A property which may not appear more than once has been seen a second time.
  • PARAM_UNCLOSED_QUOTE: A local error. A parameter had a quoted value, but the quote was unbalanced.
  • PARAM_MISSING_EQUALS: A local error. A parameter name was not terminated by an equals sign.
  • PARAM_INVALID_NUMBER: A local error. The parameter value should have been a number but wasn't.
  • PARAM_DUPLICATE: A local error. A parameter that can only have a single value was specified more than once.
  • PARAM_UNESCAPED_COMMA: A local warning. A parameter accepting only a single value contained an unescaped comma. This may indicate incomplete character escaping or trying to provide multiple values where they are not allowed.
  • PARAM_BAD_BACKSLASH: A local warning. In a double-quoted parameter value, a backslash was found. Escaping in quoted parameter values should be according to RFC6868, using circumflexes (^). This indicates a possible problem in the input file; the backslash was not treated as a special character.
  • PARAM_BAD_CIRCUMFLEX: A local warning. In a double-quoted parameter value, a circumflex (^) was found, which was not part of an escape sequence. This indicates a possible problem in the input file; that circumflex was not treated as a special character.
  • VALUE_INVALID: A local error. A property with a required value had a different value.
  • VALUE_UNESCAPED_COMMA: A local warning. A property accepting only a single value contained an unescaped comma. This may indicate an old-style (vCard3) value, e.g. for PHOTO, which is considered incomplete character escaping in vCard4.
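The FILE_CRLF entry above notes that only the first line end is inspected. A check with that property could look like the following sketch; the function name firstLineEndIsBareLF is hypothetical and this is not taken from the library's source.

```typescript
// True if the first line end in the input is a bare line feed rather than
// carriage return + line feed. Later line ends are deliberately not examined.
function firstLineEndIsBareLF(vcf: string): boolean {
  const lf = vcf.indexOf('\n');
  if (lf < 0) {
    return false; // no line end at all
  }
  return lf === 0 || vcf.charAt(lf - 1) !== '\r';
}
```

An input that starts with CRLF line ends and switches to bare LF later returns false here, which is the false negative mentioned in the list.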

Unparseable lines

If any lines in the current vCard left the parser speechless, they are stored essentially unmodified in the unparseable array. The only modification is that wrapped lines have been unwrapped, as this happens before parsing. You most likely want to ignore those lines, unless you want to re-export the vCard as faithfully as possible, even if that violates the standard (and might cause errors for other parsers).
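For reference, unwrapping (called unfolding in RFC 6350) removes each line break that is immediately followed by a single space or tab, together with that whitespace character. A minimal sketch, with the hypothetical name unfold and the assumption that bare line feeds are tolerated alongside CRLF:

```typescript
// Join folded continuation lines back into one logical line.
function unfold(vcf: string): string {
  return vcf.replace(/\r?\n[ \t]/g, '');
}
```

Only the first whitespace character after the line break is removed, so a continuation line that starts with two spaces contributes one space to the unfolded text.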

Related work

  • Searching for vcard on NPM results in mostly vCard generators or converters to/from other formats. Notable exceptions:

    • vcard4 is a vCard 4.0 generator which also includes parsing capabilities.
      Trying to create type annotations for vcard4 turned out to be hard. The resulting types for the parser would be so lax as not to help when writing a program processing it further, requiring runtime type verification in the application. Also, their design decision to transform arrays with a single member into a single value requires every access to verify the field's structure. Furthermore, it has some minor issues with its RFC 6350 compliance (lack of proper property group support or incomplete unescaping rules) and the IETF's general Robustness principle (i.e., not accepting bare newlines).
    • vdata-parser is a generic vCard/vCalendar parser, handling multiple cards in a single file.
      Similar to vcard4 above, it does not seem amenable to reasonably tight types and mixes elements and arrays. Furthermore, it is unaware of the expected parameter/property structure and does not handle escaped data.
  • The runtime type introspection required for DRY is modeled after Zod.
    Zod was even used for an early prototype. However, an ultra-lightweight, tailored alternative to Zod was created (clocking in at under 200 bytes minified/gzipped). Zod would have created overhead (additional dependencies, bundle size, but especially the amount of code needed to define and query the schema, while having to touch Zod internals which might change in the future), while providing little benefit. For example, Zod's transform seemed to be impossible to apply to parsing directly. So, Zod would just have been used to duplicate work that had already been performed.