Ifu 2024 01 05 #54

Merged
merged 26 commits into main on Jan 16, 2024

Conversation

@liligwu (Collaborator) commented Jan 16, 2024

No description provided.

Zheng Yan and others added 26 commits December 15, 2023 09:02
Summary:
Pull Request resolved: pytorch#2215

Fix `Could not find any similar ops to fbgemm::new_unified_tensor` (P876042425).
When loading a model that uses IntNBitTableBatchedEmbeddingBagsCodegen, we got `Could not find any similar ops to fbgemm::new_unified_tensor` (P876042425).
The error line https://fburl.com/code/j41vcjg1 shows that the full CPU predictor build lacks the dependency that registers new_unified_tensor. This diff adds that dep.

Reviewed By: jiayisuse

Differential Revision: D52176309

fbshipit-source-id: a8cf6c077d0df20566d9ab877dc32411fe065402
Summary:
- Support installing PyTorch packages from different channels

Pull Request resolved: pytorch#2219

Reviewed By: spcyppt

Differential Revision: D52188887

Pulled By: q10

fbshipit-source-id: ec74a400ead52d76284d04c351d3059435eb25aa
…ghted_cuda (pytorch#2205)

Summary:
Pull Request resolved: pytorch#2205

Title. Also mark the split_embedding_codegen_forward_[un]weighted_cuda ops as PT2 compliant (along with any split_embedding_codegen_lookup_{} functions I may have missed).
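For reference, the PT2-compliant tag is attached at op registration time. A minimal sketch of the pattern (the namespace and schema below are hypothetical; the real registrations live in the generated FBGEMM sources):

```
#include <torch/library.h>

// Hypothetical op and schema, for illustration only.
TORCH_LIBRARY_FRAGMENT(example_ns, m) {
  m.def(
      "example_lookup(Tensor weights, Tensor indices) -> Tensor",
      {at::Tag::pt2_compliant_tag});
}
```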

Reviewed By: zou3519

Differential Revision: D52067413

fbshipit-source-id: bffde107a4ee6b42260b58c5f4530b23e7af34ef
Summary:
- Clean up PIP install scripts

Pull Request resolved: pytorch#2220

Reviewed By: spcyppt

Differential Revision: D52223334

Pulled By: q10

fbshipit-source-id: 2c3021bfb570cd71061e320f2aa784eadf890184
Summary:
Pull Request resolved: pytorch#2225

It passes all tests.

Reviewed By: williamwen42

Differential Revision: D52256116

fbshipit-source-id: 0effe78581a78b439da0e4c59d55081fbdca0c17
Summary:
Pull Request resolved: pytorch#2224

It needed an abstract impl.
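For context, an abstract (meta) impl computes only output metadata (sizes, dtype, device) so tracing and torch.compile can run the op without real data. A minimal sketch of the pattern, with a hypothetical op name:

```
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical op: produce an output with the right metadata only;
// no real data is touched.
at::Tensor example_op_meta(const at::Tensor& input) {
  return at::empty_like(input);
}

TORCH_LIBRARY_IMPL(example_ns, Meta, m) {
  m.impl("example_op", TORCH_FN(example_op_meta));
}
```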

Reviewed By: williamwen42

Differential Revision: D52256098

fbshipit-source-id: 0bd7a37c13b23f42e0695a94307e1cbe90c5fac0
Summary:
Pull Request resolved: pytorch#2223

This macro checks a macro defined in torch/library.h. We need to include torch/library.h first; otherwise we erroneously set the macro to nothing.
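A generic illustration of the pitfall (the macro names here are hypothetical):

```
// Wrong order: torch/library.h has not been included yet, so the
// feature-test macro is undefined and HELPER(x) silently expands to nothing.
#ifdef TORCH_FEATURE_MACRO
#define HELPER(x) x
#else
#define HELPER(x)
#endif
#include <torch/library.h>  // defines TORCH_FEATURE_MACRO -- too late

// Right order: include torch/library.h first, then test the macro.
```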

Reviewed By: williamwen42

Differential Revision: D52256752

fbshipit-source-id: 50a8697509d88a07381a05152aea3516145b99b9
Summary:
Pull Request resolved: pytorch#2228

Default values are not set for the scheduled case, causing errors like https://github.com/pytorch/FBGEMM/actions/runs/723279023.

`github.event.inputs` are available to workflows triggered by the `workflow_dispatch` event only
(https://stackoverflow.com/questions/72539900/schedule-trigger-github-action-workflow-with-input-parameters).

Reviewed By: q10

Differential Revision: D52279882

fbshipit-source-id: 11b4dae8942450e849ab38d5a9045eb333f9b661
Summary:
Pull Request resolved: pytorch#2226

FBGEMM kernel implementation for the CowClip optimizer (https://arxiv.org/pdf/2204.06240.pdf). It builds on counter-SGD to reuse the counter state.

Reviewed By: sryap

Differential Revision: D52268946

fbshipit-source-id: 65378409c02957baccaaf710a319c4885068e39f
…h#2221)

Summary:
Pull Request resolved: pytorch#2221

We need a new buck mode for fbgemm that specifies inference mode, so that dependencies are included based on it and training-related dependencies are excluded.

To enable fbgemm inference-*only* mode, pass this on the buck command line:
   -c fbcode.fbgemm_inference_mode=True

Reviewed By: sryap, jianyuh

Differential Revision: D52231398

fbshipit-source-id: 6bd27718aadf0d8a52320fea85e07755f73da9de
Summary:
- Move general build, installation, and test documentation into Sphinx

Pull Request resolved: pytorch#2227

Reviewed By: spcyppt

Differential Revision: D52323411

Pulled By: q10

fbshipit-source-id: acf3f71af2241d1da7cd5092d1f3520afa14d367
…pliant (pytorch#2231)

Summary:
Pull Request resolved: pytorch#2231

The previous abstract impl was completely bogus. This diff fixes it.

Reviewed By: williamwen42

Differential Revision: D52265254

fbshipit-source-id: 93d630c57c862030d9afa333dfedd4dcd33013d0
Summary:
The post-script on Nova was not updated to match recent changes to the OSS build and test scripts, so tests were not executed on Nova. This diff fixes the post-script so that tests run correctly.

Pull Request resolved: pytorch#2233

Reviewed By: q10

Differential Revision: D52377515

fbshipit-source-id: d38605ccfff8f94f0d02d0a96697e73a45ece39a
Summary:
- Update documentation on adding Python and C++ documentation
- Add extensive documentation for `cumem_utils`

Pull Request resolved: pytorch#2232

Reviewed By: spcyppt

Differential Revision: D52393909

Pulled By: q10

fbshipit-source-id: 8d4561135b79d1e5b791e1e9204d8c8b81d3be4e
Summary:
ROCm builds failed with the following errors on CI
- https://github.com/pytorch/FBGEMM/actions/runs/7329180569
- https://github.com/pytorch/FBGEMM/actions/runs/7308329287

```
/__w/FBGEMM/FBGEMM/fbgemm_gpu/src/topology_utils_hip.cpp:55:15: error: expected ')'
        "%04" PRIu64 ":%02" PRIu64 ":%02" PRIu64 ".%0" PRIu64,
              ^
/__w/FBGEMM/FBGEMM/fbgemm_gpu/src/topology_utils_hip.cpp:53:12: note: to match this '('
    sprintf(
           ^
1 error generated when compiling for gfx908.
CMake Error at fbgemm_gpu_py_generated_topology_utils_hip.cpp.o.cmake:200 (message):
  Error generating file
  /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.8/cmake-build/CMakeFiles/fbgemm_gpu_py.dir/src/./fbgemm_gpu_py_generated_topology_utils_hip.cpp.o
```

This is probably due to a header being removed in the latest torch nightly. This diff explicitly adds the header.
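`PRIu64` and the other fixed-width format macros are defined in the standard `<cinttypes>` header (C's `<inttypes.h>`); if nothing pulls it in transitively, the tokens are undefined and the `sprintf` call fails to parse exactly as shown above. Presumably the fix is along these lines:

```
// PRIu64 comes from <cinttypes>; include it explicitly instead of
// relying on a torch header to pull it in transitively.
#include <cinttypes>
```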

Reviewed By: q10

Differential Revision: D52420862

fbshipit-source-id: 0ac49b3f32536f4f57638b34ab84459d925b039b
Summary:
Pull Request resolved: pytorch#2218

Pull Request resolved: pytorch#2187

Rewrite the kernel to use a cache_hit_rate enum as a template argument. We first check if the cache is empty and pass that value as a template argument. Inside the first kernel, we then determine the cache conflict miss rate, and use this value as a template parameter when invoking the second kernel, which performs the actual lookup work.

We pass in uvm_cache_stats as a run-time argument here instead of passing the cache miss rate as a compile-time argument, because uvm_cache_stats data is only available on the GPU, and invoking a templatized kernel with the cache miss rate as a template argument would require the cache miss information to first be passed back to the host, which is an expensive operation.
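A minimal sketch of this dispatch pattern (all names and signatures are hypothetical; the real kernels take many more arguments):

```
#include <cuda_runtime.h>

enum class CacheMissRate { kAllMiss, kMixed, kZeroMiss };

template <CacheMissRate kMissRate>
__global__ void lookup_kernel(const float* cache, const float* uvm,
                              float* out, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  if (kMissRate == CacheMissRate::kZeroMiss) {
    out[i] = cache[i];  // fast path: every row is cached
  } else {
    out[i] = uvm[i];    // slow path: (also) read rows through UVM
  }
}

// Host side: only "is the cache empty" is known without a device sync,
// so that is the one condition turned into a template argument here.
void run_lookup(bool cache_empty, const float* cache, const float* uvm,
                float* out, int n) {
  const dim3 blocks((n + 255) / 256);
  if (cache_empty) {
    lookup_kernel<CacheMissRate::kAllMiss><<<blocks, 256>>>(cache, uvm, out, n);
  } else {
    lookup_kernel<CacheMissRate::kMixed><<<blocks, 256>>>(cache, uvm, out, n);
  }
}
```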

This is based on the earlier work in stacks D48937380 and D49675672, which were based on very outdated branches of fbcode.

Reviewed By: sryap, spcyppt

Differential Revision: D51865590

fbshipit-source-id: 176b4ff457a392d3f04cfe167f70bd2300cea044
Summary:
Pull Request resolved: pytorch#2235

Unblock fbgemm TBE (inference and training) usage on AMD GPUs.

Reviewed By: zoranzhao, houseroad

Differential Revision: D52425243

fbshipit-source-id: e5cf49222945f091b89e2690ea210b97f1c2e1f5
Summary:
Pull Request resolved: pytorch#2236

- Switch to HIP-related TARGETS (with the _hip suffix) when the AMD GPU build is used.
- Add "supports_python_dlopen = True," to support dlopen on the related deps.
- Add missing deps like `"//deeplearning/fbgemm/fbgemm_gpu:split_table_batched_embeddings_hip",`

Reviewed By: q10, zoranzhao

Differential Revision: D52435932

fbshipit-source-id: 7ad845f294b49c4bf69f120ed26a0e6742b6ce48
Summary:
Pull Request resolved: pytorch#2238

For BF16-related CUDA code, we have the following macro to distinguish between V100 and A100 (pre-A100 CUDA/NVIDIA GPUs don't support BF16):
```
#if !(                                                  \
    ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
     (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
```
For AMD GPUs (ROCm), this condition always evaluates to false. However, the MI250 / MI300 GPUs we have in house do support BF16, so we re-enable BF16 for ROCm-related usages.
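One way to express the re-enable is to treat ROCm builds as BF16-capable alongside sm_80+ CUDA builds; a sketch (the actual FBGEMM change may differ, and the helper macro name is hypothetical):

```
#if defined(USE_ROCM) ||                                  \
    !(((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || \
       (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))))
// BF16 paths compiled in (hypothetical marker macro)
#define EXAMPLE_BF16_AVAILABLE 1
#endif
```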

Reviewed By: houseroad, jiawenliu64

Differential Revision: D52438898

fbshipit-source-id: 4f63ca98fbcbe2dbbeb75021d06c74ea54a66375
Summary:
- Add overview documentation for Jagged Tensor Ops
- Add more docstrings for quantize ops

Pull Request resolved: pytorch#2237

Test Plan: https://deploy-preview-2237--pytorch-fbgemm-docs.netlify.app/

Reviewed By: spcyppt

Differential Revision: D52452267

Pulled By: q10

fbshipit-source-id: 3430e09859b2b5e8dcb20ce82aad8596523b41cc
Summary: Pull Request resolved: pytorch#2240

Reviewed By: sryap

Differential Revision: D52469670

fbshipit-source-id: ebad4580a4b653967cbf0fcd15c8ebd4908aa80d
Summary:
- Re-structure the Python documentation

Pull Request resolved: pytorch#2239

Reviewed By: spcyppt

Differential Revision: D52495567

Pulled By: q10

fbshipit-source-id: a46406c8755c61cee0dae6d6e06805f5f31f6afd
Summary:
Pull Request resolved: pytorch#2243

Add `WeightDecayMode.COWCLIP` to activate CowClip from the front end. Other related hyperparameters are also added to the interface.

Reviewed By: sryap

Differential Revision: D52495246

fbshipit-source-id: fee14060ad4f4af5ba28544b7a9173737380c8d0
Summary:
Pull Request resolved: pytorch#2245

Enable VBE for `rowwise_adagrad_with_counter`

Reviewed By: sryap

Differential Revision: D52517415

fbshipit-source-id: 75daf25ec85f9eff96030d9ef4f955ff91b84e9c
Summary:
- Append the FBGEMM CPU documentation to the generated Sphinx docs
- Re-organize the documentation on the front page

Pull Request resolved: pytorch#2244

Reviewed By: spcyppt

Differential Revision: D52528266

Pulled By: q10

fbshipit-source-id: 36ab286795a01d3ce1a83dc7ca5d674069e81132
@liligwu liligwu self-assigned this Jan 16, 2024
@liligwu liligwu merged commit f53b42e into main Jan 16, 2024
24 of 38 checks passed