Comments Requested: Reducing FlatSharp Allocations #339
Hello James. Yes, GC is a real problem for us, especially for our game's client. We try to keep temporary allocations to an absolute minimum, which in C# is a lot of work. So, maybe just let us do the allocations and change the Parse API to accept a reference to an object and fill that out. That way, there's no need for FlatSharp to get into the pooling business. We and others probably already have a pooling solution.
It is a little more nuanced than that, unfortunately. I'm not sure how familiar you are, but FlatSharp returns subclasses of your data definitions when you deserialize. These are sometimes not public and/or otherwise have mangled names. We could certainly do something along the lines of:

```cs
public interface IObjectPool
{
    bool TryGet<T>(out T? item);
    void Return<T>(T item);
}
```

If the methods were generic, then FlatSharp could supply the concrete type. The other thing I'm not sure about is how configurable the pool needs to be.
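For illustration, a minimal sketch of what a user-supplied implementation of that interface might look like; `SimpleObjectPool` and its internals are illustrative names, not anything FlatSharp ships:

```cs
using System;
using System.Collections.Concurrent;

// Minimal sketch of a user-supplied pool. Assumes the IObjectPool interface
// shown above; SimpleObjectPool is an illustrative name, not a FlatSharp type.
public sealed class SimpleObjectPool : IObjectPool
{
    // One queue per concrete type. FlatSharp would call TryGet/Return with its
    // generated subclass types, which the user never names directly.
    private readonly ConcurrentDictionary<Type, ConcurrentQueue<object>> pools = new();

    public bool TryGet<T>(out T? item)
    {
        if (this.pools.TryGetValue(typeof(T), out var queue) && queue.TryDequeue(out var pooled))
        {
            item = (T)pooled;
            return true;
        }

        item = default;
        return false;
    }

    public void Return<T>(T item)
    {
        var queue = this.pools.GetOrAdd(typeof(T), _ => new ConcurrentQueue<object>());
        queue.Enqueue(item!);
    }
}
```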
Zero allocation would be amazing! I have a use case where I merge (add) objects. For simplicity I'll demonstrate in JSON:

```
{ "who": "cat" }
mergeWith()
{ "where": [{ "address" : "a street" }] }
mergeWith()
{ "where": [{ "phone" : "1234" }] }
```

produces:

```
{
  "who": "cat",
  "where": [{ "phone" : "1234" }, { "address" : "a street" }]
}
```

So I find myself scanning through all the properties to see where they differ and building a new object. Being able to perform that merge operation without allocating for each entry in the object would be great, and it's something valuable enough that I've already contemplated a few times how such an extension to FlatBuffers could be done. I was imagining it could work in a similar way to a ref struct enumerator.

This isn't working, but it's as close as I could get in the time I had. I'm sure I'm misusing ref structs here:

```cs
/*
SharpLab tools in Run mode:
• value.Inspect()
• Inspect.Heap(object)
• Inspect.Stack(value)
• Inspect.MemoryGraph(value1, value2, …)
*/
using System;

readonly ref struct TraverseNode{
    public readonly int type;
    public readonly ReadOnlySpan<byte> value;
    public TraverseNode(int type, ReadOnlySpan<byte> value){
        this.type = type;
        this.value = value;
    }
}
struct Thing {
    byte[] root;
    public static Thing GetRootAsThing(byte[] buffer){
        Thing t;
        t.root = buffer;
        return t;
    }
    public ThingEnumerable GetRefEnumerator() {
        return new ThingEnumerable(root.AsSpan());
    }
    public ref struct ThingEnumerable {
        int currentPos;
        ReadOnlySpan<byte> root;
        TraverseNode traverseNode;
        public ThingEnumerable(ReadOnlySpan<byte> root){
            currentPos = 0;
            this.root = root;
            traverseNode = default; // all fields must be assigned before a struct constructor exits
        }
        public bool MoveNext(){
            // todo: fix
            traverseNode = new TraverseNode(0, root.Slice(0));
            return false;
        }
        public TraverseNode Current() { return this.traverseNode; }
    }
}
class Example{
    static void Main(){
        Merge(new byte[10], new byte[10]);
    }
    static byte[] Merge(byte[] current, byte[] extra){
        var thing1 = Thing.GetRootAsThing(current);
        var thing2 = Thing.GetRootAsThing(extra);
        var itr1 = thing1.GetRefEnumerator();
        var itr2 = thing2.GetRefEnumerator();
        // todo: don't alloc if current.Length can fit all entries we need to merge in;
        // instead shift some bytes around inside current and return that
        var merged = new byte[current.Length];
        while(itr1.MoveNext() && itr2.MoveNext()){
            var entry1 = itr1.Current();
            var entry2 = itr2.Current();
            if(entry1.type != entry2.type || !entry1.value.SequenceEqual(entry2.value)){
                // do fun stuff like copy both
            } else {
                // only copy one
            }
        }
        if(false){
            return current;
        }
        return merged;
    }
}
```
Long time, @Astn! Glad to see that you're still around :)
Let's not go too crazy. I'm thinking of this as substantially-reduced allocations. For example, I can't keep from allocating in some situations like reading a string.
Are you using FlatSharp for this? I sincerely have no idea how it will behave if you try to overwrite the same buffer that's in use by the object. I assume the serialization and the parse would race with each other (unless you're not using a lazy mode?).

Are the allocations you're seeing coming from the traversal of the object, or from the serialization to a new object? I think the latter case is something you can handle today if you pool lists, buffers, and FlatSharp "top level" objects. The allocate-on-parse case is the scenario that I'm really trying to think about, since those allocations happen internally to the library and are often of types you don't know about as the user, since they're generally subclasses of the objects you actually work with.

I think I'm leaning in the direction of pooling rather than traversal APIs at this time, since it seems to gel a bit better with what C#/Unity devs expect and fits in better with what FlatSharp already does. Do you think that a pooling solution here would effectively address your problems?

From prototypes, object pooling does add considerable overhead, often approaching half the speed of non-pooled mode (on .NET 7 at least...), which is really a bummer. However, for Unity scenarios where GC is non-generational, I get the sense that devs would probably rather have predictable performance with minimal GC than absolute highest straight-line speed.
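For the "pool the buffers yourself" case mentioned above, here is a small sketch using only `ArrayPool<byte>` from the BCL; the `maxSize` and `writeTo` parameters are placeholders for whatever serializer calls you actually use:

```cs
using System;
using System.Buffers;

public static class PooledBufferExample
{
    // Rents a buffer, lets the caller serialize into it, and returns it to the
    // pool, so no new byte[] is allocated per serialize call.
    public static void SerializeWithPooledBuffer(int maxSize, Action<byte[]> writeTo)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(maxSize);
        try
        {
            writeTo(buffer);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```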
Could there be a static "Create" function that allocates the internal subclass and returns the parent class?
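As a rough sketch of that idea (illustrative names only, not generated FlatSharp code), the public base type could expose a factory that allocates the hidden subclass:

```cs
// Illustrative only: the public type the user works with...
public class Monster
{
    public virtual string? Name { get; set; }

    // ...exposes a static factory that allocates the internal subclass but
    // returns it typed as the parent class, so callers can pre-allocate and
    // pool instances without ever naming the subclass.
    public static Monster Create() => new DeserializedMonster();
}

internal sealed class DeserializedMonster : Monster
{
    public override string? Name { get; set; }
}
```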
I'm actually cheating a bit right now because I couldn't get it to perform like I wanted with just FlatSharp. Right now I'm taking the objects that need to be merged, and if they both can fit in the byte[] already allocated, I serialize them both to it one after the other. This way I can do the merge lazily during a later read operation instead of incurring that cost during save. It makes the read operation slower, but that's better when writes are more frequent than reads. Then, during a later read, I merge them before returning them to the caller. And if I did a merge, I take that byte[] and store it back into FASTER with an in-place update (a FASTER RMW operation).
The other thing I've observed after some prototyping with pooling is that the cost of the extra calls into the pool is noticeable. This obviously scales with the number of items going in and out of the pool. FlatSharp generally tries really hard not to have any virtual method calls that aren't necessary; frequent virtual dispatches can tank performance.

This is a tricky call for me. Is there a super compelling case for plugging in a custom object pool? What is different about it than, say, …?
@Astn -- that makes some degree of sense. Thanks! I'm trying to parse your responses, but I haven't gotten a clear read on whether you have a preference for object pooling or for some other option. Object pooling is definitely more natural for FlatSharp, since I can just add …
I don't think I would use an object pool in my case. I would look for a completely stack-based solution. If I'm forced to allocate, then I'd prefer to do it only once, to resize the underlying byte[]. I think I would lean towards some kind of visitor approach (for example, ExpressionVisitor).
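For illustration, the kind of visitor shape being suggested might look something like this (hypothetical, not a FlatSharp API):

```cs
using System;

// Hypothetical visitor contract: the library walks the raw buffer and calls
// back with field data, so the caller never needs intermediate objects.
public interface IFieldVisitor
{
    // Called once per present field; the index and raw bytes are enough to
    // implement operations like the merge described earlier in this thread.
    void VisitField(int fieldIndex, ReadOnlySpan<byte> rawValue);
}
```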
The official Google C# implementation is entirely stack-based if you want it to be. While FlatSharp might be able to do some things better, I'm quite skeptical of reinventing that approach here. What benefits over the official library do you see FlatSharp being able to provide if it were to go down that road?
I started with the Google implementation, and eventually switched to FlatSharp because I felt like 90% of my code was dealing with building buffers, and that was getting quite tiring. I really haven't looked at the Google library recently, though. There are some things I think would be useful to add as some kind of extension. I'm not suggesting you change the way your library works; I think it's really great for 98% of what I want to do. I'm thinking of a sidecar type of feature set.
So, building buffers with the official library can be tedious, but actually using them might do what you're hoping. I'd encourage you to give it a shot, because you'll get an API that looks how you're expecting, and the types are all value types. You could do something where you use FlatSharp for serialization and "general purpose" traversal, but fall back to Flatc for cases where you need to zoom to a specific field or whatnot. I'm not trying to drive you away from my library, of course, but I don't want to reinvent what they've done. One thing I could consider doing would be emitting "Flatc" code as well from FlatSharp; since FlatSharp already uses Flatc, this wouldn't be difficult. Additionally, FlatSharp could probably go through and adjust the namespaces for you so there aren't conflicts.

That's something I can definitely think about. Today, validation happens "on demand", mainly through array bounds checks as you actually use the buffer.
@pattymack and @joncham -- I have a prototype of object pooling if you'd like to provide some feedback. The NuGet packages are attached here. There are a few things to keep in mind:
Finally, some benchmarks.

FlatSharp 7 without pooling:

FlatSharp 7 with pooling:

The Allocated column tells the story. Pooling definitely works, and it's also definitely slower (on .NET 7 at least). The allocations you do see are for reading strings (which are not poolable and do still need allocations).
Version 7.0.0 is published with experimental support for object pools. Feedback welcome: https://github.com/jamescourtney/FlatSharp/wiki/Object-Pooling
Over the years, FlatSharp has gotten better at reducing allocations. Initially, everything was a class: Tables, Structs, Unions, and Vectors. Now, structs are optionally value types and unions are always value types. However, Tables and Vectors are still always reference types.
The tension with Tables and Vectors is that FlatSharp uses virtual property overrides to fulfill many of the FlatBuffers contracts. This gives tremendous flexibility, and is really the reason that FlatSharp can provide 4 different deserialization modes. However, it also means that FlatSharp is stuck with classes for Tables (virtual properties) and Vectors (`IList<T>` implementations).
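As a rough illustration of why that is (simplified names, not actual generated code): the table the user declares has virtual properties, and deserialization produces a hidden subclass that overrides them to read from the buffer.

```cs
using System;

// The user-declared table: a class with virtual properties.
public class Person
{
    public virtual string? Name { get; set; }
}

// Conceptually, the deserializer emits a subclass that overrides the virtual
// property to read lazily from the underlying buffer. This is why tables are
// currently stuck being reference types.
internal sealed class LazyPerson : Person
{
    private readonly byte[] buffer;

    public LazyPerson(byte[] buffer) => this.buffer = buffer;

    public override string? Name => this.ReadNameFromBuffer();

    private string? ReadNameFromBuffer() =>
        throw new NotImplementedException("buffer decoding elided for brevity");
}
```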
This post is to gather some feedback about what users would like to see in a potential "allocation-reduction" feature.
Current State & Background
FlatSharp's stated goals are usability and speed, in that order. FlatSharp tries to be idiomatic, which is to say that it tries to look like a "normal" .NET serialization library, which is why the current abstractions exist. For better or worse, the idiomatic thing to do in C# is to create new objects and let the GC deal with the cleanup. `IDisposable` can also be used, but the real intent there is to free up native resources.

However, many users of FlatBuffers have incredibly high-performance cases in mind. In such cases, even small stops for GC can be problematic. So reducing allocations has a double benefit:
Object Pooling
Object pooling is often a solution for avoiding allocations. If FlatSharp had an object pool, the general idea is that these managed objects could be "reset" to point at a new buffer or a new spot in an existing buffer. However, there are some problems with object pooling generally, and some FlatSharp-specific problems:
FlatSharp has some concerns as well, since pooling and object recycling look very different depending upon the deserialization mode:
- For example, returning a `Lazy` object to a pool is no problem at all, since there are no internal dependencies. The next time that object is needed, it can just be reconstituted from the underlying buffer.
- Returning a `Progressive` object to a pool is more difficult. In `Progressive` mode, there are Parent --> Child references, which means that when a Child is returned to a pool, the parent references also need to be invalidated. However, Parents can also be returned to a pool, so now references need to be invalidated in both directions. Over-invalidating is not ideal, but does not break `Progressive` mode, since it still has the capability to consult the underlying buffer to reconstitute the correct nodes.
- `Greedy` is even more challenging, since those objects do not maintain a reference to the underlying buffer, which means that it would be incorrect to return a child to the pool before its parent.

One could imagine a "top-down" approach to object pooling:
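A minimal sketch of what that shape might look like; the `PoolableNode` and `parentLink` names here are illustrative, based on the description below, not actual FlatSharp code:

```cs
using System;

// Illustrative sketch of the "top-down" idea: every deserialized node keeps an
// optional link to its parent, and only the root's Dispose returns the graph.
public abstract class PoolableNode : IDisposable
{
    // Null for root objects (and always null in Lazy mode).
    protected PoolableNode? parentLink;

    public void Dispose()
    {
        // Dispose is a no-op for everything except the root node.
        if (this.parentLink is not null)
        {
            return;
        }

        this.ReturnGraphToPool();
    }

    // Recursively returns this node and all of its children to the pool.
    protected abstract void ReturnGraphToPool();
}
```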
This code would implicitly work for all serialization approaches:

- `Lazy` never has a `parentLink`, so returning to the pool is always immediate.
- `Progressive` and `Greedy` always have `parentLink`s, so return to the pool happens when the root node is disposed, recursively. Dispose would be a no-op for all but the root node, which would return the full graph to the pool.

However, there are some concerns for this:
- What if the user forgets to call `.Dispose()`?
- `IList<T>` does not implement `IDisposable`.

Value Type Traversal Options
Expose some value-type semantics that allow traversing a FlatBuffer using value types. There is a lot to unpack here:
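As one rough illustration of what that could look like (hypothetical names, not a proposed API), a readonly struct could wrap the buffer and decode fields on access:

```cs
using System;

// Hypothetical value-type accessor: nothing is allocated to traverse the
// buffer; each property decodes directly from the underlying bytes.
public readonly struct PersonRef
{
    private readonly ReadOnlyMemory<byte> buffer;

    public PersonRef(ReadOnlyMemory<byte> buffer) => this.buffer = buffer;

    public int Age => DecodeAge(this.buffer.Span);

    private static int DecodeAge(ReadOnlySpan<byte> span) =>
        throw new NotImplementedException("offset math elided for brevity");
}
```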
Questions
If you have an opinion here, please respond to this issue; I'd like to get a good amount of feedback before implementing a solution. Both of these options are interesting, and I'd like feedback on: