
Minor performance fixes and a small bug fix #45

Open
wants to merge 4 commits into
base: master

Conversation

@fdagenais-cinesite (Contributor) commented Feb 8, 2021

First of all, this branch contains some minor performance fixes:

  1. For this one, the credit goes to Alexander, who pointed it out. In most cases, an agent's id will match its index in the agents list. When fetching an agent id from its index, we can save time by first checking whether the index matches the id. This avoids having to loop over all entries in the vector, which does not scale well as the number of agents increases.
  2. Most of the time is generally spent computing the skin deformation. During that operation, roughly 5% of the time was spent on unnecessary normalization of 4D vectors. These extra computations have been removed. In my test with ~750 agents, this saved roughly ~2s, which is not much, but as the number of agents and/or their mesh complexity increases, the savings could grow.
  3. When loading the scene hierarchy of the AtomsVariationReader or AtomsCrowdGenerator nodes in Gaffer, it can take a few seconds to process. Most of the time is spent loading the atoms meshes in AtomsVariationReader. To speed things up, all meshes from one agent are now loaded in parallel, which significantly improves responsiveness. Note that the gain is limited when processing more than the atoms nodes, since other nodes might be processed on other cores, e.g., during a render. I wasn't sure this was really needed, but it does improve things when looking directly at the Atoms nodes, so I can remove it if preferred. Also note that this is a quick fix; a better solution, which would take more time to implement, would be to change the way we load atoms data. Instead of loading everything at once, we could load only the information required to generate the child names, bbox, etc., and then load the meshes only when requested.
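A rough sketch of the fast path described in (1) — with illustrative names, not the actual Atoms API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: in most cases an agent's id equals its index in the
// agents list, so check that first, and only fall back to a linear search
// over all entries when it doesn't.
std::size_t indexForAgentId( const std::vector<int> &agentIds, int id )
{
	// Fast path: the id usually matches its own index.
	if( id >= 0 && static_cast<std::size_t>( id ) < agentIds.size() && agentIds[id] == id )
	{
		return static_cast<std::size_t>( id );
	}
	// Slow path: O(n) linear search, which is what the original code always did.
	const auto it = std::find( agentIds.begin(), agentIds.end(), id );
	return static_cast<std::size_t>( it - agentIds.begin() );
}
```

When the ids are simply 0..n-1 (the common case), the lookup becomes O(1) instead of O(n) per agent.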

Also, this branch contains a small bug fix I found while looking at the skin deformation code in AtomsCrowdGenerator. Instead of using the inverse of the normals matrix to transform the normals back to local space, the code was using the normals matrix (local to world) itself. I believe this went undetected because the agents' transformation matrices only contained translation information, with the inner 3x3 matrix being the identity.
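To illustrate why the bug went unnoticed: normals must be transformed by the inverse transpose of the local-to-world 3x3 matrix, which coincides with the matrix itself when that 3x3 part is the identity (or any rotation). A minimal standalone sketch, using a pure-scale matrix where the two differ — this is not the Atoms code:

```cpp
#include <cassert>
#include <cmath>

struct V3 { double x, y, z; };

V3 normalize( const V3 &v )
{
	const double l = std::sqrt( v.x * v.x + v.y * v.y + v.z * v.z );
	return { v.x / l, v.y / l, v.z / l };
}

// For M = diag(sx, sy, sz), the inverse transpose is diag(1/sx, 1/sy, 1/sz).
// Transforming a normal by M itself (multiplying by the scales) would tilt it
// the wrong way; dividing by the scales keeps it perpendicular to the surface.
V3 transformNormal( const V3 &n, double sx, double sy, double sz )
{
	return normalize( { n.x / sx, n.y / sy, n.z / sz } );
}
```

With identity scales the two formulas agree, which matches the observation above that agents with translation-only matrices masked the bug.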

- Faster agent Id retrieval in most cases
- Removed unnecessary vector normalizations when computing skin
  deformations
Updating the variations scene (child names, sets, etc.) requires
loading all atoms data into memory. This includes all meshes, which
were previously read serially. Meshes from one agent type are now
loaded in parallel to improve speed.
@andrewkaufman (Collaborator) left a comment


FWIW, we don't usually load N in the first place, so the normalization computations never occur. It might be worth looking into whether you need the deformed N from Atoms or can get away without it (e.g. if your meshes are subdivided anyway, or even if they are linear you would fall back to winding-order at rendertime).

```cpp
}
else
{
	std::cout << "Index did not match... Doing a linear search!" << std::endl;
```
Collaborator


We shouldn't leave the debug print in here

Contributor Author


Oh, I missed that one... I will remove it, thanks!

```cpp
if ( !inGeoMap )
// Load atoms meshes in parallel
tbb::blocked_range<size_t> tbb_range( 0, nbGeos, 1 );
tbb::parallel_for( tbb_range, [&]( const tbb::blocked_range<size_t> &range ) {
```
Collaborator


You have to be careful with tbb parallelization inside gaffer nodes because of the way the compute cache policies work. Here's an example:

https://github.com/GafferHQ/gaffer/blob/15f6530800a99028fd60dbfc3cc1541d0bc139c8/src/GafferScene/PrimitiveSampler.cpp#L371

Collaborator


There are two different considerations here :

  1. TBB exception handling. Unless told otherwise, TBB will silently cancel your parallel_for if an exception is thrown in another task inside the same task tree - which as far as you are concerned could be completely unrelated to what you are doing. This is what you need the task_group_context that Andrew linked for - it tells TBB that your parallel_for is a distinct operation that shouldn't be affected by exceptions thrown in any others.
  2. Deadlock due to TBB task stealing. TBB loves to wander off and do whatever TBB tasks it can squeeze in, so the wait for your parallel_for can cause the execution of completely arbitrary TBB tasks elsewhere in Gaffer. Gaffer performs some operations behind mutexes, so if your parallel_for is behind such a mutex, and the arbitrary task TBB chooses to do needs that mutex, you get deadlock. This is what the cache policies Andrew mentioned are for. By declaring a cache policy of either TaskIsolation or TaskCollaboration, you avoid the deadlock by making sure TBB won't do any outer tasks unrelated to your parallel_for. TaskCollaboration has more overhead but is smart enough to allow multiple waiting threads to collaborate on the construction of the engine if they all pull on the engine plug concurrently. There's an example of this in FilterResults.

Yes, there is an unfortunate overloading of the term "isolation" here. This comes from TBB, where task_group_context::isolated means "not cancelled by exceptions elsewhere" and this_task_arena::isolate means "won't wander off and perform unrelated tasks".

Contributor Author


I was not aware of that. Thank you for the clear explanation. :) Just to be sure, does that mean that setting the cache policy on the engine data output to TaskCollaboration will allow the currently running threads (the ones from which the engineDataPlug compute method is called) to wait and pick up the subtasks (atoms mesh loading) while they are waiting for the output to be computed? And does that mean that setting it to TaskIsolation will prevent the subtasks from having access to all cores, or that the subtasks will simply need to create more threads instead of reusing the waiting ones, which causes more overhead?

Collaborator


Just to be sure, that means that setting the cache policy on the engine data output to TaskCollaboration will allow the currently running threads (the ones from which the engineDataPlug compute method are called from) to wait and pick up the subtasks (atoms mesh loading) while they are waiting for the output to be computed?

Yes.

Does that mean that setting it to TaskIsolation will prevent the subtasks to have access to all cores?

Yes, if other tasks/threads also need the result from engineDataPlug(), they would just wait uselessly for it in the TaskIsolation case. TaskIsolation solves the deadlock problem, but does nothing to aid performance. In an ideal world, TaskCollaboration would be the only option, but it has additional overhead so we are keeping TaskIsolation around as an option until we're happy that collaboration is suited for everything - more recent TBBs seem to have a new mechanism that might help us implement it more efficiently.

In your case, I would expect the engine loading to be heavy enough and infrequent enough that TaskCollaboration would be the way to go.

@fdagenais-cinesite (Contributor, Author) commented Feb 23, 2021


Ok, thanks!

- Removed a forgotten cout
- Fixed a few issues related to TBB in EngineData::loadAgentTypeMesh
  (AtomsVariationReader.cpp)
@fdagenais-cinesite (Contributor, Author)

Reviewers' comments have been addressed in the latest commit:

  • The std::cout call that was left over has been removed.
  • The engineDataPlug() cache policy has been changed to TaskCollaboration.
  • The atoms mesh loading tbb loop is now run in its own isolated context group.

Note: I did not add a way to disable normals transformation during skin deformation because I was not sure there was a safe way to guarantee that the normals won't be used. If you want, I can still add a user-facing option to enable/disable normals transformation.

@andrewkaufman (Collaborator)

Sorry, to clarify my comment about skipping N: I mistakenly thought we had an opt-out plug on AtomsVariationReader which would skip loading them here, but checking now I see we only have that for Pref & Nref... If we had an additional toggle for N, then the AtomsCrowdGenerator wouldn't need any changes, because it already returns early if they are missing.

In any case, it's not related to this PR, just a thought for the future.

@johnhaddon (Collaborator)

Latest changes look good to me.

Another thing you might want to consider for the future is adding support for cancellation. This is the mechanism Gaffer uses to abort long-running operations and make the UI responsive again. You do that by calling IECore::Canceller::check( context->canceller() ) periodically in your long compute. If cancellation has been requested, that'll throw an IECore::Cancelled exception which will abort the computation all the way back to the caller. Your only other responsibility is to make sure your code is exception-safe. Here's a super simple example:

https://github.com/GafferHQ/gaffer/blob/c908425b6d2e82105cd92a1398fae8eeb0ca9600/src/GafferImage/Resample.cpp#L552
