
Llu/ln bwd #207

Merged
merged 4 commits into from
Apr 25, 2023
Conversation

liqiangxl
Collaborator

This PR is a copy of csarofeen/pytorch#2400.
Benchmark results compared with Apex are:
[image: benchmark results vs. Apex]

Collaborator

Let's rename the test file to test_combined_inner_outer_reduction.cpp

// This case is to test the correctness of the combined inner and outer
// scheduler used in layer norm backward. It can also be configured to test the
// performance using different data types.
TEST_F(NVFuserTest, FusionCombinedSchedulerLayerNormBackward_CUDA) {
Collaborator

We are simplifying test names and the Fusion prefix is no longer used.

Collaborator

@naoyam naoyam left a comment

LGTM. I just added comments on test naming.

@liqiangxl
Collaborator Author

!build

4 similar comments
@liqiangxl liqiangxl merged commit 0250132 into main Apr 25, 2023
@liqiangxl liqiangxl deleted the llu/ln_bwd branch April 25, 2023 21:44
Collaborator

@csarofeen csarofeen left a comment

@liqiangxl @naoyam why wasn't this implemented as a separate scheduler instead of being stuffed into the normalization scheduler? You know we don't need to schedule every type of fusion in a single scheduler, right?

@naoyam
Collaborator

naoyam commented Aug 5, 2023

The normalization schedulers are becoming a collection of related but significantly different schedulers: block-parallel inner normalization, grid-parallel inner normalization (we don't have this yet), block-parallel outer normalization, grid-parallel outer normalization, and grid-parallel inner-and-outer normalization (a special case for layer norm backward). Each of these could be an independent scheduler instead of being aggregated into the single normalization scheduler.

I haven't thought through all the implications of the two approaches, but the current aggregated approach seems to make a little more sense for scheduling performance. If they were all individual schedulers, we would need to call canSchedule on each scheduler until a successful one is found, and the canSchedule functions of all the normalization scheduler variants would likely share some common analyses, such as finding persistent buffers and analyzing their sizes, which would then be redundantly executed. In the current design, the common analyses are executed just once, since all the variants branch out from the single normalization scheduler. The downside, of course, is that the aggregated approach makes the scheduler look more unstructured.

I'd say we should keep them as is for now. We will need to rethink the whole scheduler and segmentation design for more flexibility and composability.

@csarofeen
Collaborator

@liqiangxl could you pull some timing info from some benchmarks to understand how much time is spent in the normalization scheduler's canSchedule? I understand the concern, @naoyam, but it seems like really messy code for the sake of maybe some performance at compilation time.

@csarofeen
Collaborator

PS, you can still have "a scheduler" from the registry perspective that calls into multiple heuristic and scheduling functions.

@liqiangxl
Collaborator Author

@liqiangxl could you pull some timing info from some benchmarks to understand how much time is spent in the normalization scheduler's canSchedule? I understand the concern, @naoyam, but it seems like really messy code for the sake of maybe some performance at compilation time.

Sure. The current normalization canSchedule will return true if scheduling can be achieved using one of the inner, outer, or combined heuristics. The system is already aware of the specific heuristic to utilize. However, rather than directly accessing the appropriate heuristic, it employs a broader interface, getPersistentHeuristics. This interface conducts further analysis and then directs to one of the inner, outer, or combined heuristics. This is also the case for schedulePersistentKernel, which serves as a universal interface for the three distinct heuristics. By transitioning from these general interfaces to individual interfaces tailored for each heuristic, we might achieve a cleaner code structure. The change will be like:
[image: proposed interface changes]
