Configuration file types

Oswaldo Baptista Vicente Junior edited this page May 2, 2022 · 12 revisions

Application configuration data can be stored in a wide variety of file types, each with its own features, advantages, and disadvantages.

Some applications use binary formats, typically for speed and size optimization, or simply for the purpose of data obfuscation.

Text-based formats, on the other hand, may provide more usability and interoperability, with the most popular file types being Properties, INI, XML, JSON, YAML, and TOML.

Properties

Properties is the simplest and most popular format among Java applications, because the Java standard library supports it out of the box (java.util.Properties).

Each line represents a single property, a pair of strings written as key=value, key:value, or key value. Keys cannot contain unescaped spaces (a space inside a key must be escaped with a backslash). Single and double quotes are treated as part of the string. Leading whitespace is ignored, but trailing whitespace in the value is preserved. Lines starting with # or ! are interpreted as comments.

# Comment line
database.host=10.110.1.12
database.port=1910

Properties files fall short in more demanding configuration scenarios: all values are plain strings, and the format has no support for arrays or other complex data structures.
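
As a rough illustration of the rules above, the following Python sketch parses a simplified subset of the format. It is not the JVM's java.util.Properties implementation: backslash escapes, line continuations, and Unicode sequences are deliberately left out, and the name parse_properties is made up for this example.

```python
import re

# Key, optional separator (= or :, or just whitespace), then the raw value.
# Simplified on purpose: no backslash escapes, no line continuations.
_LINE = re.compile(r"([^=:\s]+)\s*[=:]?\s*(.*)")


def parse_properties(text: str) -> dict[str, str]:
    """Parse a simplified subset of the .properties format (hypothetical helper)."""
    props: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.lstrip()          # leading whitespace is insignificant
        if not line or line[0] in "#!":   # skip blank lines and comment lines
            continue
        match = _LINE.match(line)
        if match:
            key, value = match.groups()
            props[key] = value            # trailing spaces in the value are kept
    return props


SAMPLE = """\
# Comment line
database.host=10.110.1.12
database.port = 1910
cache.path: "/tmp/cache"
"""

props = parse_properties(SAMPLE)
print(props["database.port"])        # a string, not a number
print(int(props["database.port"]))   # any conversion is up to the application
print(props["cache.path"])           # the quotes stay part of the value
```

Because every value comes back as a string, type conversion and validation have to happen in application code.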

Initialization file (INI)

Although not standardized, the INI format resembles Properties, with the additional feature that key/value pairs can be organized into sections while the format stays simple and human-readable. It also supports comment lines (most commonly starting with ;, and in many implementations also with #) and, depending on the parser, spaces within keys.

# Comment line
[database]
server=10.110.1.12
port=1910

[web]
server=10.110.1.24
port=8080

A disadvantage of the INI format is its lack of standardization: the supported features vary depending on the parser or library used.
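
The sample above can be read with Python's standard configparser module, shown here as one INI dialect among many. Other parsers may accept different comment markers or key syntax, so this is a sketch of one implementation's behavior rather than of the format in general.

```python
import configparser

INI_TEXT = """\
# Comment line
[database]
server=10.110.1.12
port=1910

[web]
server=10.110.1.24
port=8080
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

sections = parser.sections()                  # section names, in file order
db_port = parser.getint("database", "port")   # explicit conversion from string
web_server = parser["web"]["server"]          # raw values are always strings

print(sections, db_port, web_server)
```

Values are stored as strings here too; getint only converts on request.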

Extensible Markup Language (XML)

XML is a structured, standardized markup language designed for simplicity, generality, and interoperability across the World Wide Web. Among other features, it supports schemas for data interpretation and validation, custom type creation, and namespaces to avoid element name collisions. The contents of an XML document are organized hierarchically through elements and attributes, and the language supports comments as well.

<!-- Comment line -->
<application>
  <layers>
    <layer name="database">
      <server>10.110.1.12</server>
      <port>1910</port>
    </layer>
    <layer name="web">
      <server>10.110.1.24</server>
      <port>8080</port>
    </layer>
  </layers>
</application>

The format is still widely used but is often considered complex and verbose, requiring more storage and bandwidth than other formats.
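
To show how the hierarchy of elements and attributes maps to application data, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The dictionary layout is just one possible choice for this example, and no schema validation is performed.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """\
<!-- Comment line -->
<application>
  <layers>
    <layer name="database">
      <server>10.110.1.12</server>
      <port>1910</port>
    </layer>
    <layer name="web">
      <server>10.110.1.24</server>
      <port>8080</port>
    </layer>
  </layers>
</application>
"""

root = ET.fromstring(XML_TEXT)  # the comment is skipped by the default parser

# Build {layer name: settings} from the "name" attribute and child elements.
layers = {
    layer.get("name"): {
        "server": layer.findtext("server"),
        "port": int(layer.findtext("port")),  # element text is a string until converted
    }
    for layer in root.findall("./layers/layer")
}

print(layers["web"]["port"])
```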

JavaScript Object Notation (JSON)

JSON is a popular, lightweight text format based on a subset of JavaScript syntax that is easy for machines to parse and generate. Data in a JSON document is organized as a collection of key/value pairs, where each value can be a string, a number, a boolean, null, an array, or another object. The structure is much simpler and less verbose than XML, and the format became even more popular with the increasing adoption of RESTful Web Services and microservices.

{
  "layers": {
    "database": {
      "server": "10.110.1.12",
      "ports": [1910, 1911]
    },
    "web": {
      "server": "10.110.1.24",
      "ports": [8080, 8081]
    }
  }
}

However, a potential disadvantage to using JSON for configuration is that it does not support comments.
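
The following sketch uses Python's standard json module to show both points: values arrive with their types already resolved, and a document containing a comment is rejected by a strict parser. The commented snippet is invented for this example.

```python
import json

JSON_TEXT = """\
{
  "layers": {
    "database": {
      "server": "10.110.1.12",
      "ports": [1910, 1911]
    },
    "web": {
      "server": "10.110.1.24",
      "ports": [8080, 8081]
    }
  }
}
"""

config = json.loads(JSON_TEXT)
ports = config["layers"]["database"]["ports"]  # a list of integers, no conversion needed

# A JavaScript-style comment makes the document invalid JSON.
COMMENTED = '{\n  // database port\n  "port": 1910\n}'
try:
    json.loads(COMMENTED)
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False

print(ports, comment_accepted)
```

Some tools accept comment-tolerant JSON variants, but those are extensions rather than standard JSON.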

YAML Ain't Markup Language (YAML)

YAML is a data serialization format designed around the native data structures common to modern agile programming languages. It aims for human readability through an intuitive, indentation-based visual structure while still supporting complex data structures for robust applications. The format is considered highly expressive and compact for holding configuration data, and it also accepts comments.

---
# Comment line
layers:
  database:
    server: 10.110.1.12  # in-line comment
    ports:
    - 1910
    - 1911
  web:
    server: 10.110.1.24
    ports:
    - 8080
    - 8081

The major drawback of YAML is its reliance on indentation: tabs are not allowed for indentation, and inconsistent spacing easily leads to syntax and validation errors, especially when large files are written by hand. Also, indentation-based (block-style) YAML documents cannot be minified (minification is the removal of insignificant whitespace, typically applied to XML and JSON to save storage and/or bandwidth).
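
Since the YAML specification does not allow tab characters for indentation, a simple pre-check can catch the most common slip before a file reaches a real parser. The sketch below is only such a check, not a YAML parser, and the function name find_tab_indented_lines is hypothetical.

```python
def find_tab_indented_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


YAML_TEXT = (
    "layers:\n"
    "  database:\n"
    "\tserver: 10.110.1.12\n"   # indented with a tab by mistake
    "    ports:\n"
    "    - 1910\n"
)

print(find_tab_indented_lines(YAML_TEXT))  # → [3]
```

Tabs inside values or after the indentation are left alone; only the leading whitespace of each line is inspected.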

Tom's Obvious Minimal Language (TOML)

TOML is a standardized format that prioritizes human readability. It supports key/value pairs, arrays, tables, numbers, booleans, dates, multi-line strings, and comments.

The syntax is intentionally very simple. It looks a lot like INI files, with the advantage that it is strongly typed and has a proper specification. Unlike YAML, indentation in TOML is purely cosmetic. And, unlike JSON, it supports comments.

[layers]

  [layers.database] # Indentation (tabs and/or spaces) is allowed but not required
  server = "10.110.1.12"
  ports = [ 1910, 1911 ]

  [layers.web]
  server = "10.110.1.24"
  ports = [ 8080, 8081 ]

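TOML borrows heavily from INI syntax but is not backward-compatible with it. For example, TOML requires string values to be quoted, so an INI line such as server=10.110.1.12 is not valid TOML, and comments starting with ; are not recognized. Only INI files that already follow TOML's stricter rules can be parsed as TOML documents.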

General comparison

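| Characteristic | Properties | INI | XML | JSON | YAML | TOML |
|---|---|---|---|---|---|---|
| Human-readable | Yes | Yes | Yes | Yes | Yes | Yes |
| Standardized | -- | -- | Yes | Yes | Yes | Yes |
| Strongly typed | -- | -- | Yes | Yes | Yes | Yes |
| Allows comments | Yes | Yes | Yes | -- | Yes | Yes |
| Allows arrays/sequences | -- | -- | Yes | Yes | Yes | Yes |
| Allows complex objects | -- | -- | Yes | Yes | Yes | -- |
| Allows minification | -- | -- | Yes | Yes | -- | -- |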
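
As a small illustration of minification, the sketch below strips insignificant whitespace from a JSON document with Python's standard json module. The data is unchanged; only the textual size shrinks. The sample document is a shortened version of the JSON example, used here purely for demonstration.

```python
import json

PRETTY = """\
{
  "layers": {
    "database": {
      "server": "10.110.1.12",
      "ports": [1910, 1911]
    }
  }
}
"""

# Compact separators remove the spaces json.dumps would otherwise add.
minified = json.dumps(json.loads(PRETTY), separators=(",", ":"))

print(minified)
print(len(PRETTY), "->", len(minified), "characters")
```

The same round trip is not available for block-style YAML, where the whitespace itself carries the structure.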