Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use intrusive reference counting #128

Merged
merged 19 commits into from
Aug 2, 2024

Conversation

fhackett
Copy link
Contributor

@fhackett fhackett commented Jul 24, 2024

This proposed change relates to the ongoing discussion of #115, regarding how Trieste can reduce its impact on binary size.

One thing we identified was the footprint of shared_ptr destructors, which will be included whenever the ubiquitous Node goes out of scope. We considered switching to a customized intrusive reference counted pointer implementation, which might be a bit more efficient, and would give us more control over code generation of such a common case in code that uses Trieste.

The resulting pointer implementation, and its non-inline refcount decrement, saves 300KB on the yamlc executable, and saved 1.3MB on rego-cpp's main executable, or about 20% of its original size (6.8MB). Keeping the refcount decrement inlineable (but otherwise the same) saves some space still, but less (200KB for yamlc).

Notes:

  • there is no support for multithreading in this changeset so far. We don't really need it for now, but perhaps best to add it as a compile-time configuration option. not any more
  • question: the thread-safe intrusive ptr is disabled right now; what kind of option would make sense for enabling it? answer after discussion: always enable thread-safety. Implemented.
  • this is a breaking change wherever a client was aware of aliases like Node being shared_ptr specifically. One notable example is the shared_from_this() method, which has an equivalent in the new version but is renamed. We could simplify the change by keeping the "shared" name, or we could go with a generic name like ptr_from_this to be robust against this problem in the future. answer after discussion: current name is ok, change behavior of ->parent() in other PR

@fhackett fhackett marked this pull request as draft July 24, 2024 15:05
@fhackett fhackett marked this pull request as ready for review July 29, 2024 16:21
@fhackett fhackett requested a review from mjp41 July 29, 2024 16:21
Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of bits.

Comment on lines 126 to 128
dec_ref();
ptr = other.ptr;
inc_ref();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code might be wrong. The other is effectively borrowed as you are using a reference. So I think it is possible for the call to decref to run a destructor, which in turn deallocates the underlying object that other is borrowed from.

You really should do

Suggested change
dec_ref();
ptr = other.ptr;
inc_ref();
T* tmp = ptr;
other.ptr.intrusive_inc_ref();
ptr = other.ptr;
tmp.intrusive_dec_ref();

This should cover the self-assignment case. You might need null-checks too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... yes, I think I see it.

All you need is (in some weird syntax that mixes up C++, Scala, and probably Go, sorry):

let foo = X
let bar = Cons(foo, Nil)

foo := Nil // bar indirectly has only ref to what foo had
bar := bar.head // destroys Cons(...) -> destroys X -> bar now points to bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually my previous version was basically right then, in that what you're suggesting here is equivalent up to some syntax.
The complaint was that it was doing a redundant inc_ref on self-assign, and I introduced this bug when removing the self-assign handling.

So, I put my existing solution back but with the self-assign "optimization" still there, since it also gets all the null checks right by construction (rather than give me more pointer manipulation to shoot myself in the foot with).

(working on the other comment, then committing)

@@ -533,13 +534,13 @@ namespace trieste
inline detail::Rule
operator>>(detail::Located<const std::string&> s, detail::ParseEffect effect)
{
return std::make_shared<detail::RuleDef>(s, effect);
return intrusive_ptr(new detail::RuleDef(s, effect));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this pattern. Raw new is generally considered bad form. Could we wrap is and expose something like:

template<typename T, typename... Args>
intrusive_ptr<T> intrusive_refcounted<T>::make(Args&&... args) {
   return intrusive_ptr(new T(std::forward<Args>(args)...);
}

Or a make_intrusive function if the template and inheritance don't work out with it being a method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got intrusive_ptr::make working.

I could use it almost everywhere, except where it was never possible due to NodeDef's private constructors.

Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more minor bits.

};

using Source = intrusive_ptr<SourceDef>;
using Node = intrusive_ptr<NodeDef>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Node need to be declared this early? Could this code be in Ast.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good question that I largely avoided for this PR up to this point (didn't want more compile errors because I broke something that wasn't the ptr implementation).

It can be moved one file "down" to token.h, but it stops there because there is a circular relationship between TokenDef and NodeDef.

That is probably cleaner, and it does work. I'll do that.

include/trieste/source.h Outdated Show resolved Hide resolved
struct intrusive_refcounted
{
private:
detail::copyable_refcount intrusive_refcount = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assignment can end up calling a copy constructor, it is better to use the initialisation syntax {foo} rather than = foo.

Suggested change
detail::copyable_refcount intrusive_refcount = 0;
detail::copyable_refcount intrusive_refcount{1};

Also, a more common pattern is to start with 1, and then you don't need an incref on construction. The incref should could get optimised, but if it is using atomics, then it is probably not allowed to optimise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so my existing design comes from the idea that we should support "ingesting" a raw pointer into an intrusive_ptr, whether it's owned or not.

If we start at refcount 1 instead, then we have to know if we're on the first construction (in which case no increment), or if we're taking joint ownership (in which case, do increment) (yes via a raw pointer, yes rare).

This is not a problem for other refcounts where we control the initialization more. For example, shared_ptr completely hides its control block's lifecycle, so it can probably just start at 1 with no problem. Our control block is just the object itself, and we let users allocate one whenever they want.

The trouble is, if we ever write intrusive_ptr{new X}, and X's constructor started its refcount at 1, then we might end up getting a refcount of 2. Conversely, if we assume the raw ptr construction must have a refcount of 1 (though that is assertable I suppose), then it's possible to mess up and not increment in some cases (which admittedly are usually in utility methods like intrusive_ptr_from_this or intrusive_ptr casting, but it is complexity nonetheless).

For context, there are unavoidable intrusive_ptr{new NodeDef} instances that are needed because the NodeDef constructor is private and make can't call it.

So basically I can implement start-at-1, but it will have some code complexity cost. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved via voice call:

  • keep starting value at 0
  • comment why, summarizing this conversation
  • make sure the 0 only shows up once (likely, remove copyable_refcount's default constructor)

// built in dec_ref with null checks below.
tmp.ptr = ptr;
ptr = other.ptr;
inc_ref();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more common to put the inc_ref above the assignment. The implementation does not handle the read-reclaim race, so is not too important. How about?

other.incref();
// Move old value into a temporary, so gets dec_ref after assignment.
intrusive_ptr<T> old{ptr};
ptr = other.ptr;

Note this assumes initial rc = 1 change proposed belove

FYI, I initially didn't like the use of the destructor, but I am coming around to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(leaving out the rc init consideration which I wrote about a bit above)

Interesting, hadn't thought about that dimension. (read-reclaim race... I can imagine what it means but not familiar)

I can do that, just need to sprinkle an extra const on inc_ref (which I think is fine because const semantics).


constexpr explicit intrusive_ptr(T* ptr_) : ptr{ptr_}
{
inc_ref();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inc_ref();

If we start with rc=1, then this can be removed.


constexpr intrusive_ptr<T>& operator=(intrusive_ptr<T>&& other)
{
std::swap(ptr, other.ptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a swap? Should we be putting ptr into other?

Suggested change
std::swap(ptr, other.ptr);
intrusive_ptr<T> old{ptr};
ptr = other.ptr;
other.ptr = nullptr;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this done as a swap somewhere, but nulling other is fine too. Just trying to be "fancy" and I'm not too attached to it.

I can do it the other way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::exchange would probably be the correct "fancy". Please don't try to be fancy.

Copy link
Contributor Author

@fhackett fhackett Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a specific thing I remember from an undergrad C++ course actually, something about re-using constructor/destructor to implement assignment. Yes, specifically with swap... but I either misremembered it, or it's not useful here.

[re-reading my original response, I guess I misspoke... "trying out a potentially good idea that ended up being unnecessarily fancy" would be more what I was trying to say]

@mjp41 mjp41 merged commit d51038b into microsoft:main Aug 2, 2024
21 checks passed
@fhackett fhackett deleted the fhackett-refcount-optimization branch August 2, 2024 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants