Minor performance fixes and a small bug fix #45
Conversation
- Faster agent Id retrieval in most cases
- Removed unnecessary vector normalizations when computing skin deformations
Updating the variations scene (childnames, sets, etc.) requires loading all Atoms data into memory. This includes all meshes, which were previously read serially. Meshes from one agent type are now loaded in parallel to improve speed.
FWIW, we don't usually load N in the first place, so the normalization computations never occur. Might be worth looking into whether you need the deformed N from Atoms or can get away without it (e.g. if your meshes are subdivided anyway, or even if they are linear you could fall back to winding-order at rendertime).
```cpp
}
else
{
    std::cout << "Index did not match... Doing a linear search!" << std::endl;
```
We shouldn't leave the debug print in here
Oh, I missed that one... I will remove it, thanks!
```cpp
if ( !inGeoMap )
// Load atoms meshes in parallel
tbb::blocked_range<size_t> tbb_range(0, nbGeos, 1);
tbb::parallel_for(tbb_range, [&](const tbb::blocked_range<size_t>& range) {
```
You have to be careful with TBB parallelization inside Gaffer nodes because of the way the compute cache policies work. Here's an example:
There are two different considerations here:

- TBB exception handling. Unless told otherwise, TBB will silently cancel your `parallel_for` if an exception is thrown in another task inside the same task tree, which as far as you are concerned could be completely unrelated to what you are doing. This is what you need the `task_group_context` that Andrew linked for: it tells TBB that your `parallel_for` is a distinct operation that shouldn't be affected by exceptions thrown in any others.
- Deadlock due to TBB task stealing. TBB loves to wander off and do whatever TBB tasks it can squeeze in, so the wait for your `parallel_for` can cause the execution of completely arbitrary TBB tasks elsewhere in Gaffer. Gaffer performs some operations behind mutexes, so if your `parallel_for` is behind such a mutex, and the arbitrary task TBB chooses to do needs that mutex, you get deadlock. This is what the cache policies Andrew mentioned are for. By declaring a cache policy of either `TaskIsolation` or `TaskCollaboration`, you avoid the deadlock by making sure TBB won't do any outer tasks unrelated to your `parallel_for`. `TaskCollaboration` has more overhead but is smart enough to allow multiple waiting threads to collaborate on the construction of the engine if they all pull on the engine plug concurrently. There's an example of this in `FilterResults`.
Yes, there is an unfortunate overloading of the term "isolation" here. This comes from TBB, where `task_group_context::isolated` means "not cancelled by exceptions elsewhere" and `this_task_arena::isolate` means "won't wander off and perform unrelated tasks".
I was not aware of that. Thank you for the clear explanation. :) Just to be sure, does that mean that setting the cache policy on the engine data output to TaskCollaboration will allow the currently running threads (the ones from which the engineDataPlug compute method is called) to wait and pick up the subtasks (atoms mesh loading) while they are waiting for the output to be computed? And does setting it to TaskIsolation prevent the subtasks from having access to all cores, or do the subtasks simply need to create more threads instead of reusing the waiting ones, which causes more overhead?
> Just to be sure, does that mean that setting the cache policy on the engine data output to TaskCollaboration will allow the currently running threads (the ones from which the engineDataPlug compute method is called) to wait and pick up the subtasks (atoms mesh loading) while they are waiting for the output to be computed?
Yes.
> Does that mean that setting it to TaskIsolation will prevent the subtasks from having access to all cores?

Yes. If other tasks/threads also need the result from `engineDataPlug()`, they would just wait uselessly for it in the TaskIsolation case. TaskIsolation solves the deadlock problem, but does nothing to aid performance. In an ideal world, TaskCollaboration would be the only option, but it has additional overhead, so we are keeping TaskIsolation around as an option until we're happy that collaboration is suited for everything; more recent TBBs seem to have a new mechanism that might help us implement it more efficiently.
In your case, I would expect the engine loading to be heavy enough and infrequent enough that TaskCollaboration would be the way to go.
Ok, thanks!
Reviewer comments have been handled in the latest commit:
- Removed a forgotten cout
- Fixed a few issues related to TBB in EngineData::loadAgentTypeMesh (AtomsVariationReader.cpp)
Note: I did not add a way to disable normals transformation during skin deformation, because I was not sure there was a safe way to guarantee that it won't be used. If you want, I can still add a user-facing option to enable/disable normals transformation.
Sorry, to clarify about my skipping `N`. In any case, it's not related to this PR, just a thought for the future.
Latest changes look good to me. Another thing you might want to consider for the future is adding support for cancellation. This is the mechanism Gaffer uses to abort long-running operations and make the UI responsive again. You do that by calling
First of all, this branch contains the minor performance fixes listed above.
Also, this branch contains a small bug fix I found while looking at the skin deformation code in AtomsCrowdGenerator: the normals matrix (local to world) was used to transform the normals back to local space, instead of its inverse. I believe this had not been detected before because the agents' transformation matrices only contained translation information, with the inner 3x3 matrix being the identity.