Ifu 2024 01 05 #54

Merged
merged 26 commits into main on Jan 16, 2024

Conversation

@liligwu (Collaborator) commented Jan 16, 2024

No description provided.

Zheng Yan and others added 26 commits December 15, 2023 09:02
Summary:
Pull Request resolved: pytorch#2215

Fix `Could not find any similar ops to fbgemm::new_unified_tensor` (P876042425).
When loading a model that uses IntNBitTableBatchedEmbeddingBagsCodegen, we got `Could not find any similar ops to fbgemm::new_unified_tensor` (P876042425).
The error line https://fburl.com/code/j41vcjg1 shows that the full CPU predictor build lacks the dependency that registers new_unified_tensor. This diff adds that dep.

Reviewed By: jiayisuse

Differential Revision: D52176309

fbshipit-source-id: a8cf6c077d0df20566d9ab877dc32411fe065402
Summary:
- Support installing PyTorch packages from different channels

Pull Request resolved: pytorch#2219

Reviewed By: spcyppt

Differential Revision: D52188887

Pulled By: q10

fbshipit-source-id: ec74a400ead52d76284d04c351d3059435eb25aa
…ghted_cuda (pytorch#2205)

Summary:
Pull Request resolved: pytorch#2205

Title. Also mark the split_embedding_codegen_forward_[un]weighted_cuda ops as PT2 compliant (along with any split_embedding_codegen_lookup_{} functions I may have missed).
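For reference, the PT2-compliant tag is attached at op registration time. A minimal sketch of the pattern (the namespace and schema below are hypothetical; the real registrations live in the generated FBGEMM sources):

```
#include <torch/library.h>

// Hypothetical op and schema, for illustration only.
TORCH_LIBRARY_FRAGMENT(example_ns, m) {
  m.def(
      "example_lookup(Tensor weights, Tensor indices) -> Tensor",
      {at::Tag::pt2_compliant_tag});
}
```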

Reviewed By: zou3519

Differential Revision: D52067413

fbshipit-source-id: bffde107a4ee6b42260b58c5f4530b23e7af34ef
Summary:
- Clean up PIP install scripts

Pull Request resolved: pytorch#2220

Reviewed By: spcyppt

Differential Revision: D52223334

Pulled By: q10

fbshipit-source-id: 2c3021bfb570cd71061e320f2aa784eadf890184
Summary:
Pull Request resolved: pytorch#2225

It passes all tests.

Reviewed By: williamwen42

Differential Revision: D52256116

fbshipit-source-id: 0effe78581a78b439da0e4c59d55081fbdca0c17
Summary:
Pull Request resolved: pytorch#2224

It needed an abstract impl.
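For context, an abstract (meta) impl computes only output metadata (sizes, dtype, device) so tracing and torch.compile can run the op without real data. A minimal sketch of the pattern, with a hypothetical op name:

```
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical op: produce an output with the right metadata only;
// no real data is touched.
at::Tensor example_op_meta(const at::Tensor& input) {
  return at::empty_like(input);
}

TORCH_LIBRARY_IMPL(example_ns, Meta, m) {
  m.impl("example_op", TORCH_FN(example_op_meta));
}
```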

Reviewed By: williamwen42

Differential Revision: D52256098

fbshipit-source-id: 0bd7a37c13b23f42e0695a94307e1cbe90c5fac0
Summary:
Pull Request resolved: pytorch#2223

This macro checks a macro defined in torch/library.h. We need to include torch/library.h first; otherwise we erroneously set the macro to nothing.
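A generic illustration of the pitfall (the macro names here are hypothetical):

```
// Wrong order: torch/library.h has not been included yet, so the
// feature-test macro is undefined and HELPER(x) silently expands to nothing.
#ifdef TORCH_FEATURE_MACRO
#define HELPER(x) x
#else
#define HELPER(x)
#endif
#include <torch/library.h>  // defines TORCH_FEATURE_MACRO -- too late

// Right order: include torch/library.h first, then test the macro.
```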

Reviewed By: williamwen42

Differential Revision: D52256752

fbshipit-source-id: 50a8697509d88a07381a05152aea3516145b99b9
Summary:
Pull Request resolved: pytorch#2228

Default values are not set for the scheduled case, causing errors like https://github.com/pytorch/FBGEMM/actions/runs/723279023.

`github.event.inputs` are available to workflows triggered by the `workflow_dispatch` event only
(https://stackoverflow.com/questions/72539900/schedule-trigger-github-action-workflow-with-input-parameters).

Reviewed By: q10

Differential Revision: D52279882

fbshipit-source-id: 11b4dae8942450e849ab38d5a9045eb333f9b661
Summary:
Pull Request resolved: pytorch#2226

FBGEMM kernel implementation for the CowClip optimizer (https://arxiv.org/pdf/2204.06240.pdf). It builds on counter-SGD to reuse the counter state.

Reviewed By: sryap

Differential Revision: D52268946

fbshipit-source-id: 65378409c02957baccaaf710a319c4885068e39f
…h#2221)

Summary:
Pull Request resolved: pytorch#2221

We need a new buck mode for fbgemm that specifies inference mode, so that dependencies are included based on it and training-related dependencies are excluded.

To enable fbgemm inference-*only* mode, pass this on the buck command line:
   -c fbcode.fbgemm_inference_mode=True

Reviewed By: sryap, jianyuh

Differential Revision: D52231398

fbshipit-source-id: 6bd27718aadf0d8a52320fea85e07755f73da9de
Summary:
- Move general build, installation, and test documentation into Sphinx

Pull Request resolved: pytorch#2227

Reviewed By: spcyppt

Differential Revision: D52323411

Pulled By: q10

fbshipit-source-id: acf3f71af2241d1da7cd5092d1f3520afa14d367
…pliant (pytorch#2231)

Summary:
Pull Request resolved: pytorch#2231

The previous abstract impl was completely bogus. This diff fixes it.

Reviewed By: williamwen42

Differential Revision: D52265254

fbshipit-source-id: 93d630c57c862030d9afa333dfedd4dcd33013d0
Summary:
The post-script on Nova was not updated to match recent changes to the OSS build and test scripts, so tests were not executed on Nova. This diff fixes the post-script so that tests run correctly.

Pull Request resolved: pytorch#2233

Reviewed By: q10

Differential Revision: D52377515

fbshipit-source-id: d38605ccfff8f94f0d02d0a96697e73a45ece39a
Summary:
- Update documentation on adding Python and C++ documentation
- Add extensive documentation for `cumem_utils`

Pull Request resolved: pytorch#2232

Reviewed By: spcyppt

Differential Revision: D52393909

Pulled By: q10

fbshipit-source-id: 8d4561135b79d1e5b791e1e9204d8c8b81d3be4e
Summary:
ROCm builds failed with the following errors on CI
- https://github.com/pytorch/FBGEMM/actions/runs/7329180569
- https://github.com/pytorch/FBGEMM/actions/runs/7308329287

```
/__w/FBGEMM/FBGEMM/fbgemm_gpu/src/topology_utils_hip.cpp:55:15: error: expected ')'
        "%04" PRIu64 ":%02" PRIu64 ":%02" PRIu64 ".%0" PRIu64,
              ^
/__w/FBGEMM/FBGEMM/fbgemm_gpu/src/topology_utils_hip.cpp:53:12: note: to match this '('
    sprintf(
           ^
1 error generated when compiling for gfx908.
CMake Error at fbgemm_gpu_py_generated_topology_utils_hip.cpp.o.cmake:200 (message):
  Error generating file
  /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.8/cmake-build/CMakeFiles/fbgemm_gpu_py.dir/src/./fbgemm_gpu_py_generated_topology_utils_hip.cpp.o
```

This is probably due to a header being removed in the latest torch nightly. This diff explicitly adds the header.
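`PRIu64` and the other fixed-width format macros are defined in the standard `<cinttypes>` header (C's `<inttypes.h>`); if nothing pulls it in transitively, the tokens are undefined and the `sprintf` call fails to parse exactly as shown above. Presumably the fix is along these lines:

```
// PRIu64 comes from <cinttypes>; include it explicitly instead of
// relying on a torch header to pull it in transitively.
#include <cinttypes>
```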

Reviewed By: q10

Differential Revision: D52420862

fbshipit-source-id: 0ac49b3f32536f4f57638b34ab84459d925b039b
Summary:
Pull Request resolved: pytorch#2218

Pull Request resolved: pytorch#2187

Rewrite the kernel to use a cache_hit_rate enum as a template argument. We first check if the cache is empty and pass that value as a template argument. Inside the first kernel, we then determine the cache conflict miss rate, and use this value as a template parameter when invoking the second kernel, which performs the actual lookup work.

We pass in uvm_cache_stats as a run-time argument here instead of passing the cache miss rate as a compile-time argument, because uvm_cache_stats data is only available on the GPU, and invoking a templatized kernel with the cache miss rate as a template argument would require the cache miss information to first be passed back to the host, which is an expensive operation.
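A minimal sketch of this dispatch pattern (all names and signatures are hypothetical; the real kernels take many more arguments):

```
#include <cuda_runtime.h>

enum class CacheMissRate { kAllMiss, kMixed, kZeroMiss };

template <CacheMissRate kMissRate>
__global__ void lookup_kernel(const float* cache, const float* uvm,
                              float* out, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  if (kMissRate == CacheMissRate::kZeroMiss) {
    out[i] = cache[i];  // fast path: every row is cached
  } else {
    out[i] = uvm[i];    // slow path: (also) read rows through UVM
  }
}

// Host side: only "is the cache empty" is known without a device sync,
// so that is the one condition turned into a template argument here.
void run_lookup(bool cache_empty, const float* cache, const float* uvm,
                float* out, int n) {
  const dim3 blocks((n + 255) / 256);
  if (cache_empty) {
    lookup_kernel<CacheMissRate::kAllMiss><<<blocks, 256>>>(cache, uvm, out, n);
  } else {
    lookup_kernel<CacheMissRate::kMixed><<<blocks, 256>>>(cache, uvm, out, n);
  }
}
```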

This is based on the earlier work in stacks D48937380 and D49675672, which were based on very outdated branches of fbcode.

Reviewed By: sryap, spcyppt

Differential Revision: D51865590

fbshipit-source-id: 176b4ff457a392d3f04cfe167f70bd2300cea044
Summary:
Pull Request resolved: pytorch#2235

Unblock fbgemm TBE (inference and training) usage on AMD GPUs.

Reviewed By: zoranzhao, houseroad

Differential Revision: D52425243

fbshipit-source-id: e5cf49222945f091b89e2690ea210b97f1c2e1f5
Summary:
Pull Request resolved: pytorch#2236

- Switch to HIP-related TARGETS (with the _hip suffix) when the AMD GPU build is used.
- Add "supports_python_dlopen = True," to support dlopen on the related deps.
- Add missing deps like `"//deeplearning/fbgemm/fbgemm_gpu:split_table_batched_embeddings_hip",`

Reviewed By: q10, zoranzhao

Differential Revision: D52435932

fbshipit-source-id: 7ad845f294b49c4bf69f120ed26a0e6742b6ce48
Summary:
Pull Request resolved: pytorch#2238

For BF16-related CUDA code, we have the following macro to distinguish between V100 and A100 (pre-A100 CUDA/NVIDIA GPUs don't support BF16):
```
#if !(                                                  \
    ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
     (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
```
For AMD GPUs (ROCm), this condition always evaluates to false. However, the MI250 / MI300 GPUs we have in house do support BF16, so we re-enable BF16 for ROCm-related usages.
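One way to express the re-enable is to treat ROCm builds as BF16-capable alongside sm_80+ CUDA builds; a sketch (the actual FBGEMM change may differ, and the helper macro name is hypothetical):

```
#if defined(USE_ROCM) ||                                  \
    !(((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
       (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
// BF16 paths compiled in (hypothetical marker macro)
#define EXAMPLE_BF16_AVAILABLE 1
#endif
```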

Reviewed By: houseroad, jiawenliu64

Differential Revision: D52438898

fbshipit-source-id: 4f63ca98fbcbe2dbbeb75021d06c74ea54a66375
Summary:
- Add overview documentation for Jagged Tensor Ops
- Add more docstrings for quantize ops

Pull Request resolved: pytorch#2237

Test Plan: https://deploy-preview-2237--pytorch-fbgemm-docs.netlify.app/

Reviewed By: spcyppt

Differential Revision: D52452267

Pulled By: q10

fbshipit-source-id: 3430e09859b2b5e8dcb20ce82aad8596523b41cc
Summary: Pull Request resolved: pytorch#2240

Reviewed By: sryap

Differential Revision: D52469670

fbshipit-source-id: ebad4580a4b653967cbf0fcd15c8ebd4908aa80d
Summary:
- Re-structure the Python documentation

Pull Request resolved: pytorch#2239

Reviewed By: spcyppt

Differential Revision: D52495567

Pulled By: q10

fbshipit-source-id: a46406c8755c61cee0dae6d6e06805f5f31f6afd
Summary:
Pull Request resolved: pytorch#2243

Add `WeightDecayMode.COWCLIP` to activate CowClip from the front end. Other related hyperparameters are also added to the interface.

Reviewed By: sryap

Differential Revision: D52495246

fbshipit-source-id: fee14060ad4f4af5ba28544b7a9173737380c8d0
Summary:
Pull Request resolved: pytorch#2245

Enable VBE for `rowwise_adagrad_with_counter`

Reviewed By: sryap

Differential Revision: D52517415

fbshipit-source-id: 75daf25ec85f9eff96030d9ef4f955ff91b84e9c
Summary:
- Append the FBGEMM CPU documentation to the generated Sphinx docs
- Re-organize the documentation on the front page

Pull Request resolved: pytorch#2244

Reviewed By: spcyppt

Differential Revision: D52528266

Pulled By: q10

fbshipit-source-id: 36ab286795a01d3ce1a83dc7ca5d674069e81132
@liligwu liligwu self-assigned this Jan 16, 2024
@liligwu liligwu merged commit f53b42e into main Jan 16, 2024
24 of 38 checks passed