Ifu 2024 01 05 #54 (Merged)
Conversation
Summary: Pull Request resolved: pytorch#2215 Fix P876042425: when loading a model that uses IntNBitTableBatchedEmbeddingBagsCodegen, we got `Could not find any similar ops to fbgemm::new_unified_tensor.` The error line (https://fburl.com/code/j41vcjg1) means the full CPU predictor build lacks the dependency that provides `new_unified_tensor`. This diff adds the dep. Reviewed By: jiayisuse Differential Revision: D52176309 fbshipit-source-id: a8cf6c077d0df20566d9ab877dc32411fe065402
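For context, a minimal sketch of the registration mechanism involved (the schema below is hypothetical; the real one lives in the FBGEMM sources): an op like `fbgemm::new_unified_tensor` is only visible to the dispatcher if the translation unit containing its registration is actually linked into the binary, which is why a missing build dependency surfaces as a "Could not find any similar ops" lookup failure.

```cpp
// Sketch only. If the object file holding this fragment is not linked into
// the predictor binary (i.e. the build dep is missing), the dispatcher has
// no entry for the op and lookups fail with
// "Could not find any similar ops to fbgemm::new_unified_tensor".
#include <torch/library.h>

TORCH_LIBRARY_FRAGMENT(fbgemm, m) {
  // Hypothetical schema, for illustration only.
  m.def("new_unified_tensor(Tensor self, int[] sizes, bool is_host_mapped) -> Tensor");
}
```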
Summary: - Support installing PyTorch packages from different channels Pull Request resolved: pytorch#2219 Reviewed By: spcyppt Differential Revision: D52188887 Pulled By: q10 fbshipit-source-id: ec74a400ead52d76284d04c351d3059435eb25aa
…ghted_cuda (pytorch#2205) Summary: Pull Request resolved: pytorch#2205 Title. Also mark the split_embedding_codegen_forward_[un]weighted_cuda ops as PT2 compliant (and the split_embedding_codegen_lookup_{} functions I may have missed). Reviewed By: zou3519 Differential Revision: D52067413 fbshipit-source-id: bffde107a4ee6b42260b58c5f4530b23e7af34ef
Summary: - Clean up PIP install scripts Pull Request resolved: pytorch#2220 Reviewed By: spcyppt Differential Revision: D52223334 Pulled By: q10 fbshipit-source-id: 2c3021bfb570cd71061e320f2aa784eadf890184
Summary: Pull Request resolved: pytorch#2225 It passes all tests. Reviewed By: williamwen42 Differential Revision: D52256116 fbshipit-source-id: 0effe78581a78b439da0e4c59d55081fbdca0c17
Summary: Pull Request resolved: pytorch#2224 It needed an abstract impl. Reviewed By: williamwen42 Differential Revision: D52256098 fbshipit-source-id: 0bd7a37c13b23f42e0695a94307e1cbe90c5fac0
Summary: Pull Request resolved: pytorch#2223 This macro checks a macro defined in torch/library.h. We need to include torch/library.h first; otherwise the check runs before that macro exists and we erroneously define our macro to nothing. Reviewed By: williamwen42 Differential Revision: D52256752 fbshipit-source-id: 50a8697509d88a07381a05152aea3516145b99b9
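A minimal sketch of the include-order pitfall being fixed (macro names are illustrative, not the actual FBGEMM ones): probing a feature macro before including the header that defines it silently takes the fallback branch.

```cpp
// Sketch only; names are illustrative.
// BAD: torch/library.h has not been included yet, so FEATURE_MACRO is
// invisible here and we "detect" the feature as absent:
//   #ifdef FEATURE_MACRO
//   #define OUR_TAG FEATURE_MACRO
//   #else
//   #define OUR_TAG  // erroneously expands to nothing
//   #endif
//   #include <torch/library.h>  // too late: the check above already ran

// GOOD: include first, then probe the macro.
#include <torch/library.h>

#ifdef FEATURE_MACRO
#define OUR_TAG FEATURE_MACRO
#else
#define OUR_TAG
#endif
```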
Summary: Pull Request resolved: pytorch#2228 Default values are not set for the scheduled case, causing the error in https://github.com/pytorch/FBGEMM/actions/runs/723279023. `github.event.inputs` is available only to workflows triggered by the `workflow_dispatch` event (https://stackoverflow.com/questions/72539900/schedule-trigger-github-action-workflow-with-input-parameters). Reviewed By: q10 Differential Revision: D52279882 fbshipit-source-id: 11b4dae8942450e849ab38d5a9045eb333f9b661
Summary: Pull Request resolved: pytorch#2226 FBGEMM kernel implementation for the CowClip optimizer (https://arxiv.org/pdf/2204.06240.pdf). It is based on counter-SGD so it can reuse the counter state. Reviewed By: sryap Differential Revision: D52268946 fbshipit-source-id: 65378409c02957baccaaf710a319c4885068e39f
…h#2221) Summary: Pull Request resolved: pytorch#2221 We need a new Buck mode for FBGEMM that specifies inference-only builds, so that dependencies are included based on this mode and training-related dependencies are excluded. To enable FBGEMM inference *only* mode, pass this on the buck command line: `-c fbcode.fbgemm_inference_mode=True` Reviewed By: sryap, jianyuh Differential Revision: D52231398 fbshipit-source-id: 6bd27718aadf0d8a52320fea85e07755f73da9de
Summary: - Move general build, installation, and test documentation into Sphinx Pull Request resolved: pytorch#2227 Reviewed By: spcyppt Differential Revision: D52323411 Pulled By: q10 fbshipit-source-id: acf3f71af2241d1da7cd5092d1f3520afa14d367
…pliant (pytorch#2231) Summary: Pull Request resolved: pytorch#2231 The previous abstract impl was completely bogus. This diff fixes it. Reviewed By: williamwen42 Differential Revision: D52265254 fbshipit-source-id: 93d630c57c862030d9afa333dfedd4dcd33013d0
Summary: The post-script on Nova was not updated to match recent changes to the OSS build and test scripts, so tests were not executed on Nova. This diff fixes the scripts so that tests run correctly. Pull Request resolved: pytorch#2233 Reviewed By: q10 Differential Revision: D52377515 fbshipit-source-id: d38605ccfff8f94f0d02d0a96697e73a45ece39a
Summary: - Update documentation on adding Python and C++ documentation - Add extensive documentation for `cumem_utils` Pull Request resolved: pytorch#2232 Reviewed By: spcyppt Differential Revision: D52393909 Pulled By: q10 fbshipit-source-id: 8d4561135b79d1e5b791e1e9204d8c8b81d3be4e
Summary: ROCm builds failed with the following errors on CI - https://github.com/pytorch/FBGEMM/actions/runs/7329180569 - https://github.com/pytorch/FBGEMM/actions/runs/7308329287

```
/__w/FBGEMM/FBGEMM/fbgemm_gpu/src/topology_utils_hip.cpp:55:15: error: expected ')'
      "%04" PRIu64 ":%02" PRIu64 ":%02" PRIu64 ".%0" PRIu64,
              ^
/__w/FBGEMM/FBGEMM/fbgemm_gpu/src/topology_utils_hip.cpp:53:12: note: to match this '('
  sprintf(
         ^
1 error generated when compiling for gfx908.
CMake Error at fbgemm_gpu_py_generated_topology_utils_hip.cpp.o.cmake:200 (message):
  Error generating file
  /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.8/cmake-build/CMakeFiles/fbgemm_gpu_py.dir/src/./fbgemm_gpu_py_generated_topology_utils_hip.cpp.o
```

This is probably due to a header being removed in the latest torch nightly. This diff explicitly adds the header. Reviewed By: q10 Differential Revision: D52420862 fbshipit-source-id: 0ac49b3f32536f4f57638b34ab84459d925b039b
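For illustration, a self-contained sketch of the failure mode (values are made up): `PRIu64` is a macro from `<cinttypes>`, so if no header in the include chain pulls that in, the preprocessor leaves `PRIu64` as a bare identifier inside the format-string concatenation and the compiler reports the mismatched parenthesis seen in the log above.

```cpp
// Sketch only: without the include below, "%04" PRIu64 ... does not
// preprocess into a single string literal and compilation fails as in the log.
#include <cinttypes>  // defines PRIu64
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t domain = 0, bus = 2, device = 3, function = 0;  // made-up values
  char pci_bus_id[32];
  std::snprintf(
      pci_bus_id, sizeof(pci_bus_id),
      "%04" PRIu64 ":%02" PRIu64 ":%02" PRIu64 ".%0" PRIu64,
      domain, bus, device, function);
  std::puts(pci_bus_id);
  return 0;
}
```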
Summary: Pull Request resolved: pytorch#2218 Pull Request resolved: pytorch#2187 Rewrite the kernel to use a cache_hit_rate enum as a template argument. We first check whether the cache is empty and pass that value as a template argument. Inside the first kernel, we then determine the cache conflict miss rate and use this value when invoking the second kernel, which performs the actual lookup work. We pass in uvm_cache_stats as a run-time argument here instead of passing the cache miss rate as a compile-time argument, because the uvm_cache_stats data is only available on the GPU, and invoking a templatized kernel with the cache miss rate as a template argument would require the cache miss information to first be passed back to the host, which is an expensive operation. This is based on the earlier work in stacks D48937380 and D49675672, which were based on very outdated branches of fbcode. Reviewed By: sryap, spcyppt Differential Revision: D51865590 fbshipit-source-id: 176b4ff457a392d3f04cfe167f70bd2300cea044
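A rough sketch of the dispatch pattern described above (names are illustrative, not the actual FBGEMM kernels): the host only knows whether the cache is empty, so only that coarse state becomes a template argument; the finer-grained miss rate stays a run-time value read from the device-resident `uvm_cache_stats`, avoiding a device-to-host copy before launch.

```cpp
// Sketch only; illustrative names, not the real FBGEMM kernels.
enum class CacheState { kEmpty, kPopulated };

template <CacheState kState>
void lookup_kernel(const int* uvm_cache_stats /* device-resident counters */) {
  if constexpr (kState == CacheState::kEmpty) {
    // Every request misses: skip the cache probe path entirely.
  } else {
    // Branch on the conflict-miss counters at run time; no host round trip
    // is needed because the counters are read on the device.
  }
}

void launch_lookup(bool cache_is_empty, const int* uvm_cache_stats) {
  // Only host-visible information can select a template instantiation.
  if (cache_is_empty) {
    lookup_kernel<CacheState::kEmpty>(uvm_cache_stats);
  } else {
    lookup_kernel<CacheState::kPopulated>(uvm_cache_stats);
  }
}
```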
Summary: Pull Request resolved: pytorch#2235 Unblock FBGEMM TBE (inference, training) usage on AMD GPUs. Reviewed By: zoranzhao, houseroad Differential Revision: D52425243 fbshipit-source-id: e5cf49222945f091b89e2690ea210b97f1c2e1f5
Summary: Pull Request resolved: pytorch#2236 - Switch to HIP-related TARGETS (with `_hip` suffix) when the AMD GPU build is used. - Add `supports_python_dlopen = True,` to support dlopen on related deps. - Add missing deps like `"//deeplearning/fbgemm/fbgemm_gpu:split_table_batched_embeddings_hip",` Reviewed By: q10, zoranzhao Differential Revision: D52435932 fbshipit-source-id: 7ad845f294b49c4bf69f120ed26a0e6742b6ce48
Summary: Pull Request resolved: pytorch#2238 For BF16-related CUDA code, we use the following macro to distinguish V100 from A100 (pre-A100 CUDA/NVIDIA GPUs don't support BF16):

```
#if !( \
    ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
     (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
```

On AMD GPUs (ROCm), this check always comes out false, disabling BF16. However, the MI250 / MI300 GPUs we have in house do support BF16, so we re-enable BF16 for ROCm-related usages. Reviewed By: houseroad, jiawenliu64 Differential Revision: D52438898 fbshipit-source-id: 4f63ca98fbcbe2dbbeb75021d06c74ea54a66375
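One way the guard might be extended for ROCm (a sketch under the assumption that the HIP build defines something like `USE_ROCM`; the actual macro used in the diff may differ):

```cpp
// Sketch only: keep the NVIDIA constraints (CUDA 11+, sm_80+) but treat
// ROCm builds as BF16-capable, since MI250 / MI300 support BF16.
// USE_ROCM is an assumption about how the build tags HIP compilation.
#if defined(USE_ROCM) ||                                   \
    !(((defined(CUDA_VERSION) && CUDA_VERSION < 11000) ||  \
       (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
#define FBGEMM_BF16_AVAILABLE 1
#endif
```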
Summary: - Add overview documentation for Jagged Tensor Ops - Add more docstrings for quantize ops Pull Request resolved: pytorch#2237 Test Plan: https://deploy-preview-2237--pytorch-fbgemm-docs.netlify.app/ Reviewed By: spcyppt Differential Revision: D52452267 Pulled By: q10 fbshipit-source-id: 3430e09859b2b5e8dcb20ce82aad8596523b41cc
Summary: Pull Request resolved: pytorch#2240 Reviewed By: sryap Differential Revision: D52469670 fbshipit-source-id: ebad4580a4b653967cbf0fcd15c8ebd4908aa80d
Summary: - Re-structure the Python documentation Pull Request resolved: pytorch#2239 Reviewed By: spcyppt Differential Revision: D52495567 Pulled By: q10 fbshipit-source-id: a46406c8755c61cee0dae6d6e06805f5f31f6afd
Summary: Pull Request resolved: pytorch#2243 Add `WeightDecayMode.COWCLIP` to activate CowClip from the front end. Related hyperparameters are also added to the interface. Reviewed By: sryap Differential Revision: D52495246 fbshipit-source-id: fee14060ad4f4af5ba28544b7a9173737380c8d0
Summary: Pull Request resolved: pytorch#2245 Enable VBE for `rowwise_adagrad_with_counter` Reviewed By: sryap Differential Revision: D52517415 fbshipit-source-id: 75daf25ec85f9eff96030d9ef4f955ff91b84e9c
Summary: - Append FBGEMM CPU documentation to the generated Sphinx docs - Re-organize the documentation in the front page Pull Request resolved: pytorch#2244 Reviewed By: spcyppt Differential Revision: D52528266 Pulled By: q10 fbshipit-source-id: 36ab286795a01d3ce1a83dc7ca5d674069e81132