
Deserialization Modes

James Courtney edited this page Sep 15, 2021 · 11 revisions

Deserialization Modes are the primary configuration knob that FlatSharp exposes, and the best way to adjust read performance. The choice of mode should depend on data access patterns, the size of your data, and benchmark results. This page aims to demystify deserialization modes.

FlatSharp: How it works

Before we get into a full breakdown of the different performance options, it's useful to have some context on how FlatSharp actually works. FlatSharp treats tables and structs similarly internally. Let's pretend we have this struct:

[FlatBufferStruct]
public class Location
{
  [FlatBufferItem(0)] public virtual float X { get; set; }
  [FlatBufferItem(1)] public virtual float Y { get; set; }
  [FlatBufferItem(2)] public virtual float Z { get; set; }
}

When serializing, FlatSharp will generate some code that looks approximately like this:

public static void WriteLocation<TSpanWriter>(
  TSpanWriter spanWriter, 
  Span<byte> span, 
  Location value, 
  int offset, 
  SerializationContext context) where TSpanWriter : ISpanWriter
{
  spanWriter.WriteFloat(span, value.X, (offset + 0), context);
  spanWriter.WriteFloat(span, value.Y, (offset + 4), context);
  spanWriter.WriteFloat(span, value.Z, (offset + 8), context);
}

Reasonably simple: we're writing each field of the struct at the predefined offset relative to the base offset.
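The fixed-offset layout can be tried out without FlatSharp. The following is a minimal sketch that uses only the .NET base class library; `FixedOffsetLayout` and its methods are hypothetical names chosen for illustration, not FlatSharp APIs. It assumes 4-byte little-endian floats, which is how FlatBuffers stores scalars.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical stand-in for the generated writer; not FlatSharp code.
public static class FixedOffsetLayout
{
    // Writes the three fields at their predefined offsets relative to the base offset.
    public static void WriteLocation(Span<byte> span, float x, float y, float z, int offset)
    {
        WriteFloat(span, x, offset + 0);
        WriteFloat(span, y, offset + 4);
        WriteFloat(span, z, offset + 8);
    }

    // Stores a float as 4 little-endian bytes at the given offset.
    public static void WriteFloat(Span<byte> span, float value, int offset)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        BinaryPrimitives.WriteInt32LittleEndian(span.Slice(offset, sizeof(float)), bits);
    }

    // Reads back a float written by WriteFloat.
    public static float ReadFloat(ReadOnlySpan<byte> span, int offset)
    {
        int bits = BinaryPrimitives.ReadInt32LittleEndian(span.Slice(offset, sizeof(float)));
        return BitConverter.Int32BitsToSingle(bits);
    }
}
```

Writing a location at base offset 4 places X at byte 4, Y at byte 8, and Z at byte 12, and `ReadFloat` at those offsets returns the original values.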

Deserializing is more interesting. When deserializing, FlatSharp will generate a subclass of Location that overrides X, Y, and Z:

public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
  ...

  public LocationReader(TInputBuffer buffer, int offset) { ... }
  
  public override float X
  {
    get => ...
    set => ...
  }
  
  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}

When you parse an object with FlatSharp, you always get back a subclass of the type you requested. How that subclass is implemented depends on the deserialization option that you select.

GreedyMutable Deserialization

GreedyMutable deserialization is the simplest to understand. The full object graph is deserialized at once, and the input buffer is not needed after the fact. Code for a GreedyMutable deserializer looks like this:

public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
  private float index0Value;

  public LocationReader(TInputBuffer buffer, int offset)
  {
    this.index0Value = ReadIndex0Value(buffer, (offset + 0));
  }
  
  public override float X
  {
    get => this.index0Value;
    
    // When using Greedy instead of GreedyMutable, setters throw a NotMutableException.
    set => this.index0Value = value;
  }
  
  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}

Notably, the buffer parameter is not retained after the constructor has finished, which means you are free to reuse it immediately after the deserialization operation has concluded.
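To make the buffer-reuse point concrete, here is a small hand-written sketch of the same idea. `GreedyLocation` and `GreedyDemo` are hypothetical names and a plain `byte[]` stands in for the input buffer; this is not FlatSharp's generated code.

```csharp
using System;

// Hand-written stand-in for a Greedy reader; not FlatSharp's generated code.
public sealed class GreedyLocation
{
    public float X { get; }
    public float Y { get; }
    public float Z { get; }

    public GreedyLocation(byte[] buffer, int offset)
    {
        // Every field is copied out up front; the buffer itself is not stored.
        X = BitConverter.ToSingle(buffer, offset + 0);
        Y = BitConverter.ToSingle(buffer, offset + 4);
        Z = BitConverter.ToSingle(buffer, offset + 8);
    }
}

public static class GreedyDemo
{
    // Parses a location, wipes the buffer, and returns X as seen afterwards.
    public static float ReadXAfterBufferReuse()
    {
        byte[] buffer = new byte[12];
        BitConverter.GetBytes(1.5f).CopyTo(buffer, 0);

        var location = new GreedyLocation(buffer, 0);

        // Simulate handing the buffer back to a pool and overwriting it.
        Array.Clear(buffer, 0, buffer.Length);

        return location.X;
    }
}
```

Because the values were copied in the constructor, `ReadXAfterBufferReuse()` still returns 1.5 even though the buffer has been cleared.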

Lazy Deserialization

Lazy deserialization is the opposite of Greedy. In Greedy mode, everything is preallocated and stored. In Lazy mode, nothing is preallocated or stored:

public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
  private readonly TInputBuffer buffer;
  private readonly int offset;

  public LocationReader(TInputBuffer buffer, int offset)
  {
    this.buffer = buffer;
    this.offset = offset;
  }
  
  public override float X
  {
    get => ReadIndex0Value(this.buffer, this.offset + 0);
    
    // Lazy is always immutable (with the exception of the WriteThrough attribute)
    set => throw new NotMutableException();
  }
  
  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}

As we see here, Lazy is as advertised. Properties will only be read as they are accessed. Repeated accesses of the same property result in repeated trips to the InputBuffer. Crucially, Lazy maintains a reference to the InputBuffer. If your access patterns are sparse, Lazy deserialization can be very effective, since cycles are not wasted reading data that isn't used.
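One consequence of holding a reference to the buffer is that a Lazy object reflects whatever the buffer contains at the moment a property is read. The sketch below illustrates this with a hand-written stand-in; `LazyLocation` and `LazyDemo` are hypothetical names and a plain `byte[]` plays the role of the input buffer.

```csharp
using System;

// Hand-written stand-in for a Lazy reader; not FlatSharp's generated code.
public sealed class LazyLocation
{
    private readonly byte[] buffer;
    private readonly int offset;

    public LazyLocation(byte[] buffer, int offset)
    {
        // Nothing is read here; only the buffer reference and offset are kept.
        this.buffer = buffer;
        this.offset = offset;
    }

    // Each access goes back to the buffer.
    public float X => BitConverter.ToSingle(this.buffer, this.offset + 0);
}

public static class LazyDemo
{
    // Reads X, overwrites the buffer, then reads X again.
    public static (float before, float after) ReadXAroundBufferReuse()
    {
        byte[] buffer = new byte[12];
        BitConverter.GetBytes(1.5f).CopyTo(buffer, 0);

        var location = new LazyLocation(buffer, 0);
        float before = location.X;

        // Simulate the buffer being reused for other data.
        BitConverter.GetBytes(9.0f).CopyTo(buffer, 0);
        float after = location.X;

        return (before, after);
    }
}
```

In this sketch the second read returns 9.0 rather than 1.5, which suggests that a buffer backing Lazy objects should not be recycled while those objects are still in use.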

Progressive Deserialization

Progressive can be thought of as Lazy-with-caching. The difference between Lazy and Progressive mode is that Progressive will memoize the results of the reads from the underlying buffer.

public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
  private readonly TInputBuffer buffer;
  private readonly int offset;
  
  private bool hasIndex0Value;
  private float index0Value;

  public LocationReader(TInputBuffer buffer, int offset)
  {
    this.buffer = buffer;
    this.offset = offset;
  }
  
  public override float X
  {
    get
    {
      if (!this.hasIndex0Value)
      {
        this.index0Value = ReadIndex0Value(this.buffer, this.offset + 0);
        this.hasIndex0Value = true;
      }
      
      return this.index0Value;
    }
    
    // Progressive is always immutable (unless using write through)
    set => throw new NotMutableException();
  }
  
  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}

So we see the primary difference between Progressive and Lazy is the addition of two fields in the generated class, as well as an if statement inside the getter.

Progressive is a great choice when you cannot anticipate the access patterns of your deserialized FlatBuffer. It usually won't be the fastest (though it is possible to contrive a case where it is), but it will never be the slowest. Greedy is not performant when only a small slice of your buffer is accessed, and Lazy deteriorates when elements in the buffer are accessed repeatedly.

For repeated accesses, Progressive is faster than Lazy at the expense of more memory. For situations where fields are accessed at most once, Progressive will be slower than Lazy.
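The trade-off can be illustrated by counting trips to the buffer. This is a simplified sketch rather than a benchmark: `CountingBuffer`, `LazyX`, `ProgressiveX`, and `MemoizationDemo` are hypothetical names, and the read counter is only there for illustration.

```csharp
using System;

// Hypothetical buffer wrapper that counts how many reads it serves.
public sealed class CountingBuffer
{
    private readonly byte[] data;
    public int ReadCount { get; private set; }

    public CountingBuffer(byte[] data) => this.data = data;

    public float ReadFloat(int offset)
    {
        this.ReadCount++;
        return BitConverter.ToSingle(this.data, offset);
    }
}

// Lazy-style: every access reads from the buffer.
public sealed class LazyX
{
    private readonly CountingBuffer buffer;
    public LazyX(CountingBuffer buffer) => this.buffer = buffer;
    public float X => this.buffer.ReadFloat(0);
}

// Progressive-style: the first access reads from the buffer, later ones use the cached value.
public sealed class ProgressiveX
{
    private readonly CountingBuffer buffer;
    private bool hasValue;
    private float value;

    public ProgressiveX(CountingBuffer buffer) => this.buffer = buffer;

    public float X
    {
        get
        {
            if (!this.hasValue)
            {
                this.value = this.buffer.ReadFloat(0);
                this.hasValue = true;
            }

            return this.value;
        }
    }
}

public static class MemoizationDemo
{
    // Accesses X the given number of times in each style and reports the buffer reads.
    public static (int lazyReads, int progressiveReads) CountReads(int accesses)
    {
        byte[] bytes = BitConverter.GetBytes(2.5f);
        var lazyBuffer = new CountingBuffer(bytes);
        var progressiveBuffer = new CountingBuffer(bytes);
        var lazy = new LazyX(lazyBuffer);
        var progressive = new ProgressiveX(progressiveBuffer);

        for (int i = 0; i < accesses; i++)
        {
            _ = lazy.X;
            _ = progressive.X;
        }

        return (lazyBuffer.ReadCount, progressiveBuffer.ReadCount);
    }
}
```

With three accesses the lazy-style reader goes to the buffer three times and the progressive-style reader once. With a single access both read once, but the progressive-style reader also pays for the extra fields and the branch.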

Virtual / Non-Virtual Properties

Beginning in version 4.1.0, FlatSharp supports non-virtual properties:

[FlatBufferStruct]
public class Location2D
{
  [FlatBufferItem(0)] public virtual float X { get; set; }
  [FlatBufferItem(1)] public float Y { get; set; }
}

How does FlatSharp generate code for this scenario? The rules are:

  • Any non-virtual properties are deserialized greedily. There is no way around this since FlatSharp cannot override these properties.
  • Any virtual properties are deserialized according to the deserialization option. In the example below, we're assuming Progressive was used.

public class LocationReader<TInputBuffer> : Location2D where TInputBuffer : IInputBuffer
{
  private readonly TInputBuffer buffer;
  private readonly int offset;
  
  private bool hasIndex0Value;
  private float index0Value;

  public LocationReader(TInputBuffer buffer, int offset)
  {
    this.buffer = buffer;
    this.offset = offset;
    
    // FlatSharp just sets the base value. No extra fields or properties are generated since the Y property is non-virtual.
    base.Y = ReadIndex1Value(buffer, (offset + 4));
  }
  
  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
  
  public override float X
  {
    get
    {
      if (!this.hasIndex0Value)
      {
        this.index0Value = ReadIndex0Value(this.buffer, this.offset + 0);
        this.hasIndex0Value = true;
      }
      
      return this.index0Value;
    }
    
    set => throw new NotMutableException();
  }
  
  [MethodImpl(MethodImplOptions.AggressiveInlining)]
  private static float ReadIndex1Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}

As you can see here, it's entirely possible to mix virtual and non-virtual properties. Besides the performance benefits of non-virtual properties, this lets you combine greedy and non-greedy deserialization within a single type.
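The timing of the reads in a mixed type can be observed with a small stand-alone sketch. Everything here is hypothetical and hand-written: `TrackedBuffer`, `MixedLocation`, `MixedLocationReader`, and `MixedDemo` are illustrative names, and `NotSupportedException` stands in for FlatSharp's `NotMutableException`.

```csharp
using System;

// Hypothetical buffer wrapper that counts reads.
public sealed class TrackedBuffer
{
    private readonly byte[] data;
    public int ReadCount { get; private set; }

    public TrackedBuffer(byte[] data) => this.data = data;

    public float ReadFloat(int offset)
    {
        this.ReadCount++;
        return BitConverter.ToSingle(this.data, offset);
    }
}

// Hypothetical user type: X is virtual, Y is not.
public class MixedLocation
{
    public virtual float X { get; set; }
    public float Y { get; set; }
}

// Stand-in for a generated reader that treats X progressively and Y greedily.
public sealed class MixedLocationReader : MixedLocation
{
    private readonly TrackedBuffer buffer;
    private readonly int offset;
    private bool hasX;
    private float x;

    public MixedLocationReader(TrackedBuffer buffer, int offset)
    {
        this.buffer = buffer;
        this.offset = offset;

        // Y cannot be overridden, so it is read immediately.
        base.Y = buffer.ReadFloat(offset + 4);
    }

    public override float X
    {
        get
        {
            if (!this.hasX)
            {
                this.x = this.buffer.ReadFloat(this.offset + 0);
                this.hasX = true;
            }

            return this.x;
        }

        // Stand-in for NotMutableException.
        set => throw new NotSupportedException();
    }
}

public static class MixedDemo
{
    // Returns the buffer read count right after construction and after repeated property access.
    public static (int afterConstruction, int afterAccess) CountReads()
    {
        byte[] bytes = new byte[8];
        BitConverter.GetBytes(1f).CopyTo(bytes, 0);
        BitConverter.GetBytes(2f).CopyTo(bytes, 4);

        var buffer = new TrackedBuffer(bytes);
        MixedLocation location = new MixedLocationReader(buffer, 0);
        int afterConstruction = buffer.ReadCount;

        _ = location.X;
        _ = location.X;
        _ = location.Y;
        _ = location.Y;

        return (afterConstruction, buffer.ReadCount);
    }
}
```

In this sketch one read (Y) happens during construction, and accessing X twice and Y twice afterwards adds only one more read, for the first access of X.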

Performance Implications

So, we've seen what kind of code FlatSharp will generate for you depending on your configuration. When should you use which options? The best answer is, of course, to benchmark. However, answers to the following should help inform your choices.

Question 1: Are the default settings not fast enough?

FlatSharp is really fast, even with the default Greedy settings, so don't optimize preemptively. Greedy works well because it guarantees that you can recycle your InputBuffer object immediately after parsing. It is the default because it is the most straightforward mode and the one most like other serialization libraries. The trade-off is that greedy deserialization of buffers with lots of data can cause garbage collection spikes, since all of the objects are allocated at once rather than being amortized as you use the buffer.

Question 2: Do you serialize or parse more often?

Some services are read-mostly, and some are write-mostly. If you're doing more serializing than parsing, consider making your properties non-virtual:

public int Foobar { get; set; }

Virtual dispatch has some overhead: when you invoke a virtual method, the CLR needs to look at the vtable of the object to figure out which method to invoke, so there is one level of indirection. When a non-virtual method is invoked, the control flow jumps directly to that method. Non-virtual properties are therefore faster to access than virtual ones, which means that virtual properties are slower on serialization paths and on some deserialization paths.

However, non-virtual properties by their nature can only be deserialized Greedily. Using non-virtual properties is an interesting way to mix lazy and greedy deserialization, so if there is some data that is always read (or read in an inner loop), non-virtual properties can be a good optimization.

Question 3: When should I consider using Lazy deserialization?

Lazy is great when your access patterns are sparse and at-most-once, or your buffers are enormous. If you're touching individual properties more than once, then Lazy will likely be slower than other options. Lazy also means that the deserialized objects carry references to the source buffer.

Question 4: What about Progressive?

With Progressive, each field is read from the buffer at most once and then cached. This is nice when access patterns cannot be anticipated but full Greedy mode is not appropriate. For repeated accesses, Progressive mode approaches the speed of Greedy and is much faster than Lazy. For sparse accesses, it is only a little slower than Lazy and much faster than Greedy. These characteristics make it a great choice for nearly all scenarios.

Question 5: Do you have (very) large FlatBuffers?

When dealing with large FlatBuffers, it can be very helpful to use Lazy. Lazy reads data only as it is accessed, so allocations are spread out over time instead of spiking all at once as they do with Greedy. This lets the GC run at a steady state and collect most objects in Generation 0. Combining Lazy with WriteThrough or by-value structs on struct members can be particularly useful if you need to update large FlatBuffers, as this can be done in place without a Parse -> Update -> Reserialize flow that consumes memory and copies far more data than is necessary. More information can be found in the write through sample.
