From de846c595db312af348c18177794db313948f6c2 Mon Sep 17 00:00:00 2001
From: Christopher Bate <cbate@nvidia.com>
Date: Wed, 18 Dec 2024 12:15:26 -0700
Subject: [PATCH] Move internal changes (#455)

## NFC: Simplify some aspects of options management (OptionsContext)

- Adds a convenience 'OptionsContext::Option' class that simplifies how
  options are declared.

- Closes a loophole where tuples of structs containing options can cause
  crashes if they populate options in their constructor. Due to how the
  external storage mechanism works, we can no longer use direct
  `std::tuple` of aggregate objects which invoke `addOption`. Instead, one
  must use `unique_ptr` to wrap those types when used as elements of a
  `std::tuple`.

- To help enforce this, we explicitly delete the move constructor of
  `OptionsProvider`.
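
The hazard described above can be sketched in isolation. The following is a
minimal illustration, not the actual `OptionsContext` implementation: the
names `DemoContext`, `DemoProvider`, and `DemoProviders` are hypothetical, and
the "external storage" is assumed to be a raw pointer to a member of the
provider. Because the context remembers that address, the provider must never
be relocated, which is why it is held through `unique_ptr` inside the tuple.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for OptionsContext: it records the address of
// externally owned storage for each option name.
struct DemoContext {
  std::map<std::string, int *> storage;
  void addOption(const std::string &name, int *location) {
    storage[name] = location;
  }
  void set(const std::string &name, int value) { *storage.at(name) = value; }
};

// Hypothetical provider that registers its own member in the constructor.
struct DemoProvider {
  int optLevel = 0;
  explicit DemoProvider(DemoContext &ctx) {
    ctx.addOption("opt-level", &optLevel);
  }
  // Copying or moving would leave the context pointing at the old object.
  DemoProvider(const DemoProvider &) = delete;
  DemoProvider(DemoProvider &&) = delete;
};

// Elements live on the heap, so moving the tuple never relocates them.
using DemoProviders = std::tuple<std::unique_ptr<DemoProvider>>;

// Sets the option by name after the tuple has been moved, then reads the
// value back through the provider itself.
inline int setAndReadOptLevel(int value) {
  DemoContext ctx;
  DemoProviders providers(std::make_unique<DemoProvider>(ctx));
  DemoProviders moved = std::move(providers); // provider address is unchanged
  ctx.set("opt-level", value);
  return std::get<0>(moved)->optLevel;
}
```

With a plain `std::tuple<DemoProvider>`, the deleted move constructor turns
any attempt to move the provider into a compile error rather than a dangling
pointer at run time.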

## [compiler|python] Update how cached pipelines/"Compiler Tasks" are registered

This change updates how registration functions for "compilation tasks"
are invoked. We now expose a C API method that can be invoked within the
Pybind11 module initializer. This decouples compiler task registration
from pass or dialect registration.

This change also cleans up the C API function naming for pass/dialect
registration functions.
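
As a rough sketch of the decoupling (all names below are hypothetical and do
not correspond to the real C API symbols), task registration becomes its own
idempotent entry point that a binding module initializer calls explicitly,
rather than a side effect of registering passes or dialects:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical process-wide registry of compiler task names.
inline std::set<std::string> &demoTaskRegistry() {
  static std::set<std::string> registry;
  return registry;
}

// Hypothetical pass registration: it no longer touches the task registry.
inline void demoRegisterPasses() {}

// Hypothetical task registration entry point; safe to call repeatedly
// because inserting an existing name into the set is a no-op.
inline void demoRegisterCompilerTasks() {
  demoTaskRegistry().insert("demo-task");
}

// What a module initializer might do: each registration step is explicit.
// Returns the number of registered tasks.
inline std::size_t demoModuleInit() {
  demoRegisterPasses();
  demoRegisterCompilerTasks();
  return demoTaskRegistry().size();
}
```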

## [python] Add more robust CMake logic for fixing missing CAPI dependency in core MLIR PyBind module

Adds CMake logic to ensure that the Core '_mlir' pybind extension has the
correct CAPI dependencies declared until the upstream CMake declarations
can be fixed.

## NFC: Remove unnecessary PyCapsule <-> CAPI casters in compiler and runtime bindings

Removes unnecessary custom PyBind11 capsule -> C API object casters.
These cast functions are only required when it is desired to allow
PyBind11 to extract the C API object from the C++ python wrapper type
automatically.

## [tensorrt|compiler] Drop "layer metadata callback" utility from TensorRT translation

This change removes the "layer metadata callback" feature from the
MLIR-to-TensorRT translation. It also removes the relevant APIs from the
MLIR-TensorRT compiler's C++ and Python APIs.

This capability was originally offered as a bridge for populating the
generated TensorRT ILayers with custom metadata. However, the mechanism
prevents caching of pass pipelines and is therefore too expensive to use.

In the future, any metadata passed to TensorRT should be derived from
the MLIR operations' location information.

## NFC: update various uses of "Stablehlo" in class and function names to have consistent capitalization

## NFC: Reorganize some directories

This change:

- Moves the top-level 'tools' to 'compiler/tools'
- Moves the top-level 'test' to 'compiler/test'
- Moves the 'mlir-tensorrt-tblgen' tool under 'tensorrt/tools'
  since the 'tensorrt' project is supposed to be independent.
- Similarly moves TensorRT-specific Python definitions under
  `tensorrt/python`.

## [executor]: Add a missing guard for builds without CUDA enabled.

Wrapping the makeCudaStringError function with MLIR_EXECUTOR_ENABLE_CUDA
fixes builds without CUDA enabled.
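
The shape of such a guard is sketched below. Only the macro name
`MLIR_EXECUTOR_ENABLE_CUDA` comes from this change; the helper names are
hypothetical, and the use of `#ifdef` (rather than `#if`) is an assumption
about how the macro is defined.

```cpp
#include <cassert>
#include <string>

#ifdef MLIR_EXECUTOR_ENABLE_CUDA
#include <cuda_runtime_api.h>

// Only compiled when CUDA is available, because it names CUDA types.
inline std::string demoCudaErrorString(cudaError_t err) {
  return std::string(cudaGetErrorName(err)) + ": " + cudaGetErrorString(err);
}
#endif

// Callers that must build either way go through an unguarded wrapper.
inline std::string demoDescribeError(int code) {
#ifdef MLIR_EXECUTOR_ENABLE_CUDA
  return demoCudaErrorString(static_cast<cudaError_t>(code));
#else
  return "CUDA support is disabled (code " + std::to_string(code) + ")";
#endif
}
```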

## [executor] Use Lua locals for block arguments

Previously, the Executor MLIR-to-Lua translator used Lua globals for
block arguments outside of the entry block, since the values that
represent block arguments need to be passed between blocks and the
scope of a Lua local variable is restricted to its enclosing block.
However, it is almost never a good idea to use Lua global variables in
our translation strategy. For coroutine functions, a translation that
uses globals is incorrect, since all Lua coroutines in a single Lua
environment share the same set of globals.

This change declares all block arguments up front as locals in the
"entry block" and just sets them to `nil` initially. Since we don't
declare a block scope for the entry block, all the following Lua block
scopes will have these locals in scope. This allows us to retain the
use of locals for all block arguments.
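
The emission strategy can be sketched as follows. This is a simplified
illustration and not the code in `TranslateToLua.cpp`: the `DemoBlock` type,
the `emitLuaFunction` helper, and the exact layout of labels and `do ... end`
scopes are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified view of a block: a label, the Lua names of its
// block arguments, and pre-rendered Lua statements for its body.
struct DemoBlock {
  std::string label;
  std::vector<std::string> args;
  std::vector<std::string> body;
};

// Emits a Lua function in which every non-entry block argument is declared
// as a local (initialized to nil) before any block scope is opened.
inline std::string emitLuaFunction(const std::string &name,
                                   const std::vector<DemoBlock> &blocks) {
  std::ostringstream os;
  os << "function " << name << "()\n";
  for (std::size_t i = 1; i < blocks.size(); ++i)
    for (const std::string &arg : blocks[i].args)
      os << "  local " << arg << " = nil\n";
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const bool isEntry = (i == 0);
    // The entry block gets no 'do ... end' scope, so the locals declared
    // above stay visible in every later block.
    if (!isEntry)
      os << "  ::" << blocks[i].label << "::\n  do\n";
    for (const std::string &stmt : blocks[i].body)
      os << (isEntry ? "  " : "    ") << stmt << "\n";
    if (!isEntry)
      os << "  end\n";
  }
  os << "end\n";
  return os.str();
}

// Two-block example: the entry block assigns the argument of 'bb1' and
// branches to it.
inline std::string demoLua() {
  DemoBlock entry{"entry", {}, {"b1_arg0 = 1", "goto bb1"}};
  DemoBlock bb1{"bb1", {"b1_arg0"}, {"return b1_arg0"}};
  return emitLuaFunction("f", {entry, bb1});
}
```

Because `b1_arg0` is a function-level local rather than a global, each
coroutine running this function gets its own copy of the value.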

GitOrigin-RevId: e9dd03c47eab6145e889ea8ff56fd1c71181f72a
---
 .github/workflows/mlir-tensorrt-ci.yml        |   4 +-
 mlir-tensorrt/CMakeLists.txt                  |   6 -
 .../build_tools/cmake/ManagedLLVM.cmake       |  15 ++
 mlir-tensorrt/compiler/CMakeLists.txt         |   4 +
 .../mlir-tensorrt-c/Compiler/Compiler.h       |  14 --
 .../Registration/RegisterAllDialects.h        |   7 +-
 .../Compiler/StableHloToExecutable.h          |  36 ++---
 .../TensorRTExtension/TensorRTExtension.h     |   4 +-
 .../Registration/RegisterMlirTensorRtPasses.h |   1 -
 .../compiler/lib/CAPI/Compiler/Compiler.cpp   |  46 ++-----
 .../CAPI/Compiler/Registration/CMakeLists.txt |   1 -
 .../Registration/RegisterAllDialects.cpp      |   9 +-
 .../lib/Compiler/StableHloToExecutable.cpp    |  68 ++++------
 .../TensorRTExtension/TensorRTExtension.cpp   |   6 +-
 .../{ => compiler}/test/CMakeLists.txt        |   0
 .../CUDAToExecutor/cublas-to-executor.mlir    |   0
 .../CUDAToExecutor/cuda-to-executor.mlir      |   0
 .../chlo-to-stablehlo-ext.mlir                |   0
 .../ChloToStablehloExt/lit.local.cfg          |   0
 .../PlanToExecutor/plan-to-executor.mlir      |   0
 .../stablehlo-scalar-to-arith.mlir            |   0
 .../Conversion/StablehloToScf/lit.local.cfg   |   0
 .../StablehloToScf/stablehlo-to-scf.mlir      |   0
 .../arith-to-tensorrt.mlir                    |   0
 .../StablehloToTensorRT/chlo-to-tensorrt.mlir |   0
 .../StablehloToTensorRT/lit.local.cfg         |   0
 .../stablehlo-control-flow.mlir               |   0
 .../StablehloToTensorRT/stablehlo-conv.mlir   |   0
 .../StablehloToTensorRT/stablehlo-gather.mlir |   0
 .../stablehlo-scatter.mlir                    |   0
 .../stablehlo-to-tensorrt-invalid-trt8.mlir   |   0
 .../stablehlo-to-tensorrt-invalid.mlir        |   0
 .../stablehlo-to-tensorrt-trt10.mlir          |   0
 .../stablehlo-to-tensorrt.mlir                |   0
 .../stablehlo-to-trtsoftmax.mlir              |   0
 .../tensorrt-runtime-to-executor.mlir         |   0
 .../Conversion/TensorRTToEmitC/lit.local.cfg  |   0
 .../TensorRTToEmitC/resnet50.trt.elided.mlir  |   0
 .../TensorRTToEmitC/tensorrt-to-emitc.mlir    |   0
 .../tensorrt-to-tensorrt-runtime.mlir         |   0
 .../test/Dialect/CUDA/invalid.mlir            |   0
 .../test/Dialect/CUDA/roundtrip.mlir          |   0
 .../test/Dialect/CUDA/side-effects.mlir       |   0
 .../test/Dialect/Plan/bounds-analysis.mlir    |   0
 .../cluster-and-outline-scalarizable-ops.mlir |   0
 .../Dialect/Plan/create-closed-regions.mlir   |   0
 .../Dialect/Plan/eliminate-shape-ops.mlir     |   0
 .../test/Dialect/Plan/invalid.mlir            |   0
 ...erialize-shape-calculations-composite.mlir |   0
 .../Plan/materialize-shape-calculations.mlir  |   0
 .../test/Dialect/Plan/plan-alloc-tensors.mlir |   0
 .../test/Dialect/Plan/plan-bufferize.mlir     |   0
 .../Plan/populate-func-bounds-attrs.mlir      |   0
 .../Plan/post-clustering-validation.mlir      |   0
 .../test/Dialect/Plan/refine-types.mlir       |   0
 .../test/Dialect/Plan/roundtrip.mlir          |   0
 .../Dialect/Plan/segmentation-pipeline.mlir   |   0
 .../Plan/stablehlo-clustering-invalid.mlir    |   0
 .../Dialect/Plan/stablehlo-clustering.mlir    |   0
 .../Dialect/Plan/tensor-kind-analysis.mlir    |   0
 .../canonicalize-convolution.mlir             |   0
 .../canonicalize-dot-general.mlir             |   0
 .../StableHloExt/canonicalize-gather.mlir     |   0
 .../StableHloExt/canonicalize-scatter-nd.mlir |   0
 .../StableHloExt/canonicalize-scatter.mlir    |   0
 .../constant-folding-bitwise.mlir             |   0
 .../StableHloExt/constant-folding-elided.mlir |   0
 .../constant-folding-invalid.mlir             |   0
 .../StableHloExt/constant-folding.mlir        |   0
 .../Dialect/StableHloExt/expand-tuples.mlir   |   0
 .../Dialect/StableHloExt/gather-to-slice.mlir |   0
 .../raise-qdq-block-dequantize.mlir           |   0
 .../raise-qdq-block-quantize.mlir             |   0
 .../raise-qdq-per-channel-dequantize.mlir     |   0
 .../raise-qdq-per-channel-quantize.mlir       |   0
 .../raise-qdq-per-tensor-dequantize.mlir      |   0
 .../raise-qdq-per-tensor-quantize.mlir        |   0
 .../Dialect/StableHloExt/refine-shapes.mlir   |   0
 .../reify-ranked-shaped-type.mlir             |   0
 .../StableHloExt/tensor-kind-analysis.mlir    |   0
 .../test/Dialect/TensorRTRuntime/inliner.mlir |   0
 .../test/Dialect/TensorRTRuntime/invalid.mlir |   0
 .../TensorRTRuntime/one-shot-bufferize.mlir   |   0
 .../Dialect/TensorRTRuntime/roundtrip.mlir    |   0
 .../TensorRTRuntime/tensor-kind-analysis.mlir |   0
 .../canonicalizer-stress-test.mlir            |   0
 .../dynamic-shape-refinement.mlir             |   0
 .../dynamic-shape-simplification.mlir         |   0
 .../end-to-end-binary.mlir                    |   0
 .../end-to-end-unary.mlir                     |   0
 .../ClusteringDynamicShape/lit.local.cfg      |   0
 .../Lua/IntegrationTests/buffer-ops-bf16.mlir |   0
 .../IntegrationTests/buffer-ops-dynamic.mlir  |   0
 .../Lua/IntegrationTests/buffer-ops-f16.mlir  |   0
 .../Lua/IntegrationTests/buffer-ops-f32.mlir  |   0
 .../IntegrationTests/buffer-ops-f8E4M3FN.mlir |   0
 .../Lua/IntegrationTests/buffer-ops-i1.mlir   |   0
 .../Lua/IntegrationTests/buffer-ops-i4.mlir   |   0
 .../Target/Lua/IntegrationTests/lit.local.cfg |   0
 .../Lua/IntegrationTests/memcpy-strided.mlir  |   0
 .../Target/Lua/IntegrationTests/memcpy.mlir   |   0
 .../test/Target/Lua/lit.local.cfg             |   0
 .../drop-nested-modules.mlir                  |   0
 ...plicate-function-elimination-upstream.mlir |   0
 ...nc-ext-duplicate-function-elimination.mlir |   0
 .../memref-cast-elimination.mlir              |   0
 .../scf-detensorize-loops.mlir                |   0
 .../StablehloMatchers/lit.local.cfg           |   0
 .../test_StablehloSoftmaxMatcher.mlir         |   0
 .../test/lib/Analysis/CMakeLists.txt          |   0
 .../test/lib/Analysis/TestBoundsAnalysis.cpp  |   0
 .../lib/Analysis/TestTensorKindAnalysis.cpp   |   0
 .../{ => compiler}/test/lib/CMakeLists.txt    |   0
 .../test/lib/Transforms/CMakeLists.txt        |   0
 .../lib/Transforms/Clustering/CMakeLists.txt  |   0
 .../Clustering/ClusteringBenchmarkMain.cpp    |   0
 mlir-tensorrt/{ => compiler}/test/lit.cfg.py  |   2 +-
 .../{ => compiler}/test/lit.site.cfg.py.in    |   4 +-
 .../test/models/bert.stablehlo.elided.mlir    |   0
 .../models/gpt2.stablehlo.bs2.elided.mlir     |   0
 .../test/models/gpt2.stablehlo.elided.mlir    |   0
 .../models/llama-68m.stablehlo.elided.mlir    |   0
 .../models/llama-v2.stablehlo.elided.mlir     |   0
 .../models/resnet50.stablehlo.elided.mlir     |   0
 .../test/models/single-relu.onnx              |   0
 .../test/models/swin.stablehlo.elided.mlir    |   0
 .../models/whisper-jax.stablehlo.elided.mlir  |   0
 .../IntegrationTests/TRT10/lit.local.cfg      |   0
 .../TRT10/test_stablehlo_add.py               |   0
 .../TRT10/test_stablehlo_dynamic_iota.py      |   0
 .../python/IntegrationTests/lit.local.cfg     |   0
 .../IntegrationTests/test_call_validation.py  |   0
 .../test_executable_serialize.py              |   0
 .../IntegrationTests/test_stablehlo_add.py    |   0
 .../test_stablehlo_dynamic.py                 |   0
 .../IntegrationTests/test_type_interop.py     |   0
 .../compiler_api/test_compiler_api.py         |   0
 .../compiler_api/test_compiler_debug_dump.py  |   0
 .../compiler_api/test_options_context.py      |   4 +-
 .../compiler_api/test_plugin_schema_api.py    |   0
 .../dialects/test_stablehlo.py                |   0
 .../dialects/test_tensorrt.py                 |   0
 .../dialects/test_upstream.py                 |   0
 .../mlir_tensorrt_compiler/lit.local.cfg      |   0
 .../mlir_tensorrt_runtime/lit.local.cfg       |   0
 .../test_create_memref.py                     |   0
 .../mlir_tensorrt_runtime/test_runtime_api.py |   8 --
 .../test_runtime_debug_dump.py                |   0
 .../{ => compiler}/tools/CMakeLists.txt       |  23 +---
 .../tools/mlir-tensorrt-lsp-server.cpp}       |   0
 .../tools/mlir-tensorrt-opt.cpp}              |   0
 .../tools/mlir-tensorrt-runner.cpp}           |   0
 .../tools/mlir-tensorrt-translate.cpp}        |   0
 .../executor/lib/Support/DeviceInfo.cpp       |   2 +
 .../lib/Target/Lua/TranslateToLua.cpp         |  32 +++--
 .../IntegrationTests/control-flow-nested.mlir | 128 ++++++++++++++++++
 .../test/Translation/translate-to-lua.mlir    |  30 ++--
 mlir-tensorrt/python/CompilerPackage.cmake    |  26 +---
 .../python/bindings/CPyBindInterop.h          |  32 ++++-
 .../bindings/Compiler/CompilerPyBind.cpp      |  55 +-------
 .../bindings/Compiler/SiteInitializer.cpp     |   5 +-
 .../python/bindings/Runtime/RuntimePyBind.cpp |  92 ++-----------
 mlir-tensorrt/python/bindings/Utils.h         |   6 +-
 mlir-tensorrt/tensorrt/CMakeLists.txt         |   6 +-
 .../tensorrt/cmake/TensorRTFunctions.cmake    |   9 +-
 .../CMakeLists.txt                            |   4 +-
 .../NetworkEncoder.h                          |  13 +-
 .../Target/TranslateToTensorRT.h              |  13 +-
 .../lib/Bindings/Python}/DialectTensorRT.cpp  |   5 +-
 .../NetworkEncoder.cpp                        |   2 -
 .../lib/Target/TranslateToTensorRT.cpp        |  34 ++---
 mlir-tensorrt/tensorrt/python/CMakeLists.txt  |  67 +++++++++
 .../dialects/PythonTensorRTOps.td}            |   7 +-
 .../mlir_tensorrt}/dialects/tensorrt.py       |   0
 mlir-tensorrt/tensorrt/tools/CMakeLists.txt   |   2 +
 .../{ => tools}/tensorrt-opt/CMakeLists.txt   |   0
 .../{ => tools}/tensorrt-opt/tensorrt-opt.cpp |   0
 .../tools/tensorrt-tblgen/CMakeLists.txt      |  11 ++
 .../tensorrt-tblgen/tensorrt-tblgen.cpp}      |   4 +-
 .../test_layer_metadata_callback.py           |  95 -------------
 180 files changed, 436 insertions(+), 506 deletions(-)
 rename mlir-tensorrt/{ => compiler}/test/CMakeLists.txt (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/CUDAToExecutor/cublas-to-executor.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/CUDAToExecutor/cuda-to-executor.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/ChloToStablehloExt/chlo-to-stablehlo-ext.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/ChloToStablehloExt/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/PlanToExecutor/plan-to-executor.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloScalarToArith/stablehlo-scalar-to-arith.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToScf/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToScf/stablehlo-to-scf.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/arith-to-tensorrt.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/chlo-to-tensorrt.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-control-flow.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-conv.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-gather.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-scatter.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid-trt8.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-trt10.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/StablehloToTensorRT/stablehlo-to-trtsoftmax.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/TensorRTRuntimeToExecutor/tensorrt-runtime-to-executor.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/TensorRTToEmitC/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/TensorRTToEmitC/resnet50.trt.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/TensorRTToEmitC/tensorrt-to-emitc.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Conversion/TensorRTToTensorRTRuntime/tensorrt-to-tensorrt-runtime.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/CUDA/invalid.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/CUDA/roundtrip.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/CUDA/side-effects.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/bounds-analysis.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/cluster-and-outline-scalarizable-ops.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/create-closed-regions.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/eliminate-shape-ops.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/invalid.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/materialize-shape-calculations-composite.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/materialize-shape-calculations.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/plan-alloc-tensors.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/plan-bufferize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/populate-func-bounds-attrs.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/post-clustering-validation.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/refine-types.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/roundtrip.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/segmentation-pipeline.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/stablehlo-clustering-invalid.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/stablehlo-clustering.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/Plan/tensor-kind-analysis.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/canonicalize-convolution.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/canonicalize-dot-general.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/canonicalize-gather.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/canonicalize-scatter-nd.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/canonicalize-scatter.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/constant-folding-bitwise.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/constant-folding-elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/constant-folding-invalid.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/constant-folding.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/expand-tuples.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/gather-to-slice.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/raise-qdq-block-dequantize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/raise-qdq-block-quantize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/raise-qdq-per-channel-dequantize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/raise-qdq-per-channel-quantize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/raise-qdq-per-tensor-dequantize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/raise-qdq-per-tensor-quantize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/refine-shapes.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/reify-ranked-shaped-type.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/StableHloExt/tensor-kind-analysis.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/TensorRTRuntime/inliner.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/TensorRTRuntime/invalid.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/TensorRTRuntime/one-shot-bufferize.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/TensorRTRuntime/roundtrip.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Dialect/TensorRTRuntime/tensor-kind-analysis.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Pipelines/StableHloInputPipeline/canonicalizer-stress-test.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Pipelines/StableHloInputPipeline/dynamic-shape-refinement.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Pipelines/StableHloInputPipeline/dynamic-shape-simplification.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-binary.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-unary.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-bf16.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-dynamic.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-f16.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-f32.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-f8E4M3FN.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-i1.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/buffer-ops-i4.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/memcpy-strided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/IntegrationTests/memcpy.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Target/Lua/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/DropNestedModules/drop-nested-modules.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/DuplicateFunctionElimination/duplicate-function-elimination-upstream.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/DuplicateFunctionElimination/func-ext-duplicate-function-elimination.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/MemRefCastElimination/memref-cast-elimination.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/SCFDetensorizeLoops/scf-detensorize-loops.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/StablehloMatchers/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/Transforms/StablehloMatchers/test_StablehloSoftmaxMatcher.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/Analysis/CMakeLists.txt (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/Analysis/TestBoundsAnalysis.cpp (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/Analysis/TestTensorKindAnalysis.cpp (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/CMakeLists.txt (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/Transforms/CMakeLists.txt (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/Transforms/Clustering/CMakeLists.txt (100%)
 rename mlir-tensorrt/{ => compiler}/test/lib/Transforms/Clustering/ClusteringBenchmarkMain.cpp (100%)
 rename mlir-tensorrt/{ => compiler}/test/lit.cfg.py (99%)
 rename mlir-tensorrt/{ => compiler}/test/lit.site.cfg.py.in (88%)
 rename mlir-tensorrt/{ => compiler}/test/models/bert.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/gpt2.stablehlo.bs2.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/gpt2.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/llama-68m.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/llama-v2.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/resnet50.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/single-relu.onnx (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/swin.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/models/whisper-jax.stablehlo.elided.mlir (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/TRT10/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/TRT10/test_stablehlo_add.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/TRT10/test_stablehlo_dynamic_iota.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/test_call_validation.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/test_executable_serialize.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/test_stablehlo_add.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/test_stablehlo_dynamic.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/IntegrationTests/test_type_interop.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_api.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_debug_dump.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py (89%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/compiler_api/test_plugin_schema_api.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/dialects/test_stablehlo.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/dialects/test_tensorrt.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/dialects/test_upstream.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_compiler/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_runtime/lit.local.cfg (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_runtime/test_create_memref.py (100%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_runtime/test_runtime_api.py (92%)
 rename mlir-tensorrt/{ => compiler}/test/python/mlir_tensorrt_runtime/test_runtime_debug_dump.py (100%)
 rename mlir-tensorrt/{ => compiler}/tools/CMakeLists.txt (85%)
 rename mlir-tensorrt/{tools/MlirTensorRtLspServer.cpp => compiler/tools/mlir-tensorrt-lsp-server.cpp} (100%)
 rename mlir-tensorrt/{tools/MlirTensorRtOpt.cpp => compiler/tools/mlir-tensorrt-opt.cpp} (100%)
 rename mlir-tensorrt/{tools/MlirTensorRtRunner.cpp => compiler/tools/mlir-tensorrt-runner.cpp} (100%)
 rename mlir-tensorrt/{tools/MlirTensorRtTranslate.cpp => compiler/tools/mlir-tensorrt-translate.cpp} (100%)
 create mode 100644 mlir-tensorrt/executor/test/IntegrationTests/control-flow-nested.mlir
 rename mlir-tensorrt/{python/bindings/Compiler/Dialects => tensorrt/lib/Bindings/Python}/DialectTensorRT.cpp (95%)
 create mode 100644 mlir-tensorrt/tensorrt/python/CMakeLists.txt
 rename mlir-tensorrt/{python/mlir_tensorrt_compiler/mlir_tensorrt/compiler/dialects/TensorRTOps.td => tensorrt/python/mlir_tensorrt/dialects/PythonTensorRTOps.td} (82%)
 rename mlir-tensorrt/{python/mlir_tensorrt_compiler/mlir_tensorrt/compiler => tensorrt/python/mlir_tensorrt}/dialects/tensorrt.py (100%)
 create mode 100644 mlir-tensorrt/tensorrt/tools/CMakeLists.txt
 rename mlir-tensorrt/tensorrt/{ => tools}/tensorrt-opt/CMakeLists.txt (100%)
 rename mlir-tensorrt/tensorrt/{ => tools}/tensorrt-opt/tensorrt-opt.cpp (100%)
 create mode 100644 mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/CMakeLists.txt
 rename mlir-tensorrt/{tools/MlirTensorRtTblgen.cpp => tensorrt/tools/tensorrt-tblgen/tensorrt-tblgen.cpp} (99%)
 delete mode 100644 mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_layer_metadata_callback.py

diff --git a/.github/workflows/mlir-tensorrt-ci.yml b/.github/workflows/mlir-tensorrt-ci.yml
index e6b7babf9..3298a5bf2 100644
--- a/.github/workflows/mlir-tensorrt-ci.yml
+++ b/.github/workflows/mlir-tensorrt-ci.yml
@@ -101,8 +101,8 @@ jobs:
             cat > run_format_check.sh <<EOF
             #!/bin/bash
             set -e
-            python3 -m black --check --exclude='.*\.pyi' mlir-tensorrt/test/
-            python3 -m black --check --exclude='.*\.pyi' mlir-tensorrt/python/
+            python3 -m black --check --extend-exclude='.*\.pyi' mlir-tensorrt/compiler/
+            python3 -m black --check --extend-exclude='.*\.pyi' mlir-tensorrt/python/
             git clang-format HEAD~1 --diff
             EOF
 
diff --git a/mlir-tensorrt/CMakeLists.txt b/mlir-tensorrt/CMakeLists.txt
index edd9512ea..c56c1f2ab 100644
--- a/mlir-tensorrt/CMakeLists.txt
+++ b/mlir-tensorrt/CMakeLists.txt
@@ -262,9 +262,3 @@ include_directories(${CMAKE_CURRENT_BINARY_DIR}/tensorrt/include)
 
 add_subdirectory(compiler)
 add_subdirectory(python)
-
-if(MLIR_TRT_ENABLE_TESTING)
-  add_subdirectory(test)
-endif()
-
-add_subdirectory(tools)
diff --git a/mlir-tensorrt/build_tools/cmake/ManagedLLVM.cmake b/mlir-tensorrt/build_tools/cmake/ManagedLLVM.cmake
index e52e559a4..4d43daf9f 100644
--- a/mlir-tensorrt/build_tools/cmake/ManagedLLVM.cmake
+++ b/mlir-tensorrt/build_tools/cmake/ManagedLLVM.cmake
@@ -13,4 +13,19 @@ macro(mtrt_llvm_project)
 
   set(LLVM_RUNTIME_OUTPUT_INTDIR ${CMAKE_BINARY_DIR}/bin)
   set(LLVM_LIBRARY_OUTPUT_INTDIR ${CMAKE_BINARY_DIR}/lib)
+
+  # The 'MLIRPythonExtension.Core' target upstream is missing an
+  # EMBED_CAPI_LINK_LIBS argument on 'MLIRCAPITransforms'. Instead, it's
+  # declared on the '_mlirRegisterEverything' extension, which appears to be wrong.
+  # TODO: fix this upstream.
+  if(MLIR_TRT_ENABLE_PYTHON)
+    get_property(mlir_core_pybind_capi_embed
+      TARGET MLIRPythonExtension.Core
+      PROPERTY mlir_python_EMBED_CAPI_LINK_LIBS)
+    list(FIND mlir_core_pybind_capi_embed MLIRCAPITransforms item_index)
+    if(item_index EQUAL -1)
+      set_property(TARGET MLIRPythonExtension.Core
+        APPEND PROPERTY mlir_python_EMBED_CAPI_LINK_LIBS MLIRCAPITransforms)
+    endif()
+  endif()
 endmacro()
diff --git a/mlir-tensorrt/compiler/CMakeLists.txt b/mlir-tensorrt/compiler/CMakeLists.txt
index 124cece96..d77fbaff6 100644
--- a/mlir-tensorrt/compiler/CMakeLists.txt
+++ b/mlir-tensorrt/compiler/CMakeLists.txt
@@ -1,3 +1,5 @@
+set(MLIR_TENSORRT_COMPILER_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
+
 include_directories(${CMAKE_CURRENT_LIST_DIR}/include)
 include_directories(${CMAKE_CURRENT_BINARY_DIR}/include)
 include_directories(${MLIR_TENSORRT_ROOT_DIR}/executor/include)
@@ -5,4 +7,6 @@ include_directories(${MLIR_TENSORRT_ROOT_BINARY_DIR}/executor/include)
 
 add_subdirectory(include)
 add_subdirectory(lib)
+add_subdirectory(test)
+add_subdirectory(tools)
 
diff --git a/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Compiler.h b/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Compiler.h
index a21a2125f..929bf4efc 100644
--- a/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Compiler.h
+++ b/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Compiler.h
@@ -84,13 +84,6 @@ typedef struct MTRT_StableHLOToExecutableOptions {
   void *ptr;
 } MTRT_StableHLOToExecutableOptions;
 
-/// A callback that allows the user to customize the metadata set for layers
-/// corresponding to each MLIR operation. The callback should invoke the
-/// provided append function in order to manipulate the result string.
-typedef void (*MTRT_MetadataCallback)(MlirOperation op,
-                                      MlirStringCallback append,
-                                      void *appendCtx, void *userData);
-
 MLIR_CAPI_EXPORTED MTRT_Status mtrtStableHloToExecutableOptionsCreate(
     MTRT_CompilerClient client, MTRT_StableHLOToExecutableOptions *options,
     int32_t tensorRTBuilderOptLevel, bool tensorRTStronglyTyped);
@@ -108,13 +101,6 @@ MLIR_CAPI_EXPORTED MTRT_Status mtrtStableHloToExecutableOptionsSetDebugOptions(
     const char **debugTypes, size_t debugTypeSizes,
     const char *dumpIrTreeDir = nullptr, const char *dumpTensorRTDir = nullptr);
 
-/// Sets the layer metadata callback. The `userData` argument is passed along
-/// to the callback when it is invoked.
-MLIR_CAPI_EXPORTED MTRT_Status
-mtrtStableHloToExecutableOptionsSetTensorRTTranslationMetadataCallback(
-    MTRT_StableHLOToExecutableOptions options, MTRT_MetadataCallback callback,
-    void *userData);
-
 MLIR_CAPI_EXPORTED MTRT_Status mtrtStableHloToExecutableOptionsDestroy(
     MTRT_StableHLOToExecutableOptions options);
 
diff --git a/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Registration/RegisterAllDialects.h b/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Registration/RegisterAllDialects.h
index d2b985ff6..88381cc17 100644
--- a/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Registration/RegisterAllDialects.h
+++ b/mlir-tensorrt/compiler/include/mlir-tensorrt-c/Compiler/Registration/RegisterAllDialects.h
@@ -32,10 +32,13 @@ extern "C" {
 
 /// Add all the dialects used by MLIR-TensorRT to the registry.
 MLIR_CAPI_EXPORTED void
-mlirTensorRTRegisterAllDialects(MlirDialectRegistry registry);
+mtrtCompilerRegisterDialects(MlirDialectRegistry registry);
 
 /// Register all the compiler passes used by MLIR-TensorRT.
-MLIR_CAPI_EXPORTED void mlirTensorRTRegisterAllPasses();
+MLIR_CAPI_EXPORTED void mtrtCompilerRegisterPasses();
+
+/// Register all the compiler task types (pass manager types).
+MLIR_CAPI_EXPORTED void mtrtCompilerRegisterTasks();
 
 #ifdef __cplusplus
 }
diff --git a/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/StableHloToExecutable.h b/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/StableHloToExecutable.h
index 946a8d6f2..e67b07bf2 100644
--- a/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/StableHloToExecutable.h
+++ b/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/StableHloToExecutable.h
@@ -49,17 +49,13 @@ namespace mlirtrt::compiler {
 // StableHLOToExecutableOptions
 //===----------------------------------------------------------------------===//
 
-class StableHloToExecutableTask;
+class StablehloToExecutableTask;
 
-struct StableHLOToExecutableOptions
+struct StablehloToExecutableOptions
     : public mlir::OptionsBundle<DebugOptions, ExecutorOptions, DeviceOptions> {
   /// Initializes the options. The extensions in the provided registry
   /// must be extensions for the StableHloToExecutable task.
-  StableHLOToExecutableOptions(TaskExtensionRegistry extensions);
-
-  /// Return the hash of the options. Returns `nullopt` when the TensorRT
-  /// layer metadata callback is set since that can't be reliably hashed.
-  std::optional<llvm::hash_code> getHash() const override;
+  StablehloToExecutableOptions(TaskExtensionRegistry extensions);
 
   /// Whether to disallow host tensors in TensorRT clusters.
   bool disallowHostTensorsInTensorRTClusters = false;
@@ -71,18 +67,16 @@ struct StableHLOToExecutableOptions
   /// Entrypoint function name.
   std::string entrypoint = "main";
 
-  std::function<std::string(mlir::Operation *)> layerMetadataCallback{nullptr};
-
   /// Base class for extensions associated with StableHloToExecutableTask.
   class ExtensionBase : public TaskExtensionBase {
   public:
     ExtensionBase(mlir::TypeID typeID)
         : TaskExtensionBase(typeID,
-                            mlir::TypeID::get<StableHloToExecutableTask>()) {}
+                            mlir::TypeID::get<StablehloToExecutableTask>()) {}
 
     static bool classof(const TaskExtensionBase *extension) {
       return extension->getTaskID() ==
-             mlir::TypeID::get<StableHloToExecutableTask>();
+             mlir::TypeID::get<StablehloToExecutableTask>();
     }
 
     enum class Phase {
@@ -98,7 +92,7 @@ struct StableHLOToExecutableOptions
     /// relative to each other (yet).
     virtual void
     populatePasses(mlir::OpPassManager &pm, Phase phase,
-                   const StableHLOToExecutableOptions &options) const = 0;
+                   const StablehloToExecutableOptions &options) const = 0;
   };
 
   /// A StableHLOToExecutableOptions::Extension is an extension that must
@@ -120,39 +114,39 @@ struct StableHLOToExecutableOptions
 /// A StableHloToExecutableTask is a concrete CompilationTask (PassManager) that
 /// accepts StableHLO input IR and lowers it down to Executor IR which can be
 /// translated into a MLIR-TensorRT executable.
-class StableHloToExecutableTask
-    : public CompilationTask<StableHloToExecutableTask,
-                             StableHLOToExecutableOptions> {
+class StablehloToExecutableTask
+    : public CompilationTask<StablehloToExecutableTask,
+                             StablehloToExecutableOptions> {
 public:
   using Base::Base;
 
   /// Build the clustering pipeline that occurs on Stablehlo Ops.
   static void
   buildStablehloClusteringPipeline(mlir::OpPassManager &pm,
-                                   const StableHLOToExecutableOptions &options);
+                                   const StablehloToExecutableOptions &options);
 
   /// Build the pipeline (bufferization and lowering) that runs after
   /// clustering.
   static void
   buildPostClusteringPipeline(mlir::OpPassManager &pm,
-                              const StableHLOToExecutableOptions &options);
+                              const StablehloToExecutableOptions &options);
 
   static void populatePassManager(mlir::PassManager &pm,
-                                  const StableHLOToExecutableOptions &options);
+                                  const StablehloToExecutableOptions &options);
 
   /// Compile a StableHLO module into a MLIR-TensorRT Runtime executable.
   /// This is the "functional" entrypoint that will allocate a new PassManager
   /// for a single run.
   static mlirtrt::StatusOr<std::unique_ptr<runtime::Executable>>
   compileStableHLOToExecutable(mlir::ModuleOp module,
-                               const StableHLOToExecutableOptions &options);
+                               const StablehloToExecutableOptions &options);
 
   /// Compile a StableHLO module into a MLIR-TensorRT Runtime executable.
   /// This is the "functional" entrypoint that will allocate a new PassManager
   /// for a single run.
   static mlirtrt::StatusOr<std::unique_ptr<runtime::Executable>>
   compileStableHLOToExecutable(CompilerClient &client, mlir::ModuleOp module,
-                               const StableHLOToExecutableOptions &options);
+                               const StablehloToExecutableOptions &options);
 };
 
 /// Register the task/options with the client's registry.
@@ -175,7 +169,7 @@ void registerStablehloClusteringPipelines();
 
 } // namespace mlirtrt::compiler
 
-MLIR_DECLARE_EXPLICIT_TYPE_ID(mlirtrt::compiler::StableHloToExecutableTask)
+MLIR_DECLARE_EXPLICIT_TYPE_ID(mlirtrt::compiler::StablehloToExecutableTask)
 
 #endif // MLIR_TRT_ENABLE_HLO
 #endif // MLIR_TENSORRT_COMPILER_STABLEHLOTOEXECUTABLE
diff --git a/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/TensorRTExtension/TensorRTExtension.h b/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/TensorRTExtension/TensorRTExtension.h
index f1fc5bec4..184481a9d 100644
--- a/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/TensorRTExtension/TensorRTExtension.h
+++ b/mlir-tensorrt/compiler/include/mlir-tensorrt/Compiler/TensorRTExtension/TensorRTExtension.h
@@ -34,7 +34,7 @@ namespace mlirtrt::compiler {
 //===----------------------------------------------------------------------===//
 
 class StableHLOToExecutableTensorRTExtension
-    : public StableHLOToExecutableOptions::Extension<
+    : public StablehloToExecutableOptions::Extension<
           StableHLOToExecutableTensorRTExtension> {
 public:
   StableHLOToExecutableTensorRTExtension();
@@ -45,7 +45,7 @@ class StableHLOToExecutableTensorRTExtension
   /// It is not guarunteed the order in which different extensions are run
   /// relative to each other (yet).
   void populatePasses(mlir::OpPassManager &pm, Phase phase,
-                      const StableHLOToExecutableOptions &options) const final;
+                      const StablehloToExecutableOptions &options) const final;
 
   /// Allows the extension to hook into the option parsing infrastructure.
   void addToOptions(mlir::OptionsContext &context) final {
diff --git a/mlir-tensorrt/compiler/include/mlir-tensorrt/Registration/RegisterMlirTensorRtPasses.h b/mlir-tensorrt/compiler/include/mlir-tensorrt/Registration/RegisterMlirTensorRtPasses.h
index c1440248d..b63b83d7f 100644
--- a/mlir-tensorrt/compiler/include/mlir-tensorrt/Registration/RegisterMlirTensorRtPasses.h
+++ b/mlir-tensorrt/compiler/include/mlir-tensorrt/Registration/RegisterMlirTensorRtPasses.h
@@ -53,7 +53,6 @@ inline void registerAllMlirTensorRtPasses() {
   mlir::registerConvertPDLToPDLInterp();
 
 #ifdef MLIR_TRT_ENABLE_HLO
-  mlirtrt::compiler::registerStableHloToExecutableTask();
   mlirtrt::compiler::registerStablehloClusteringPipelines();
   registerStableHloInputPipelines();
   stablehlo_ext::registerStableHloExtPasses();
diff --git a/mlir-tensorrt/compiler/lib/CAPI/Compiler/Compiler.cpp b/mlir-tensorrt/compiler/lib/CAPI/Compiler/Compiler.cpp
index 2a52eef11..c5d390f0f 100644
--- a/mlir-tensorrt/compiler/lib/CAPI/Compiler/Compiler.cpp
+++ b/mlir-tensorrt/compiler/lib/CAPI/Compiler/Compiler.cpp
@@ -47,7 +47,7 @@ using namespace mlir;
 #endif
 DEFINE_C_API_PTR_METHODS(MTRT_CompilerClient, CompilerClient)
 DEFINE_C_API_PTR_METHODS(MTRT_StableHLOToExecutableOptions,
-                         StableHLOToExecutableOptions)
+                         StablehloToExecutableOptions)
 DEFINE_C_API_PTR_METHODS(MTRT_OptionsContext, OptionsContext)
 #if defined(__GNUC__) || defined(__clang__)
 #pragma GCC diagnostic pop
@@ -84,7 +84,7 @@ MTRT_Status mtrtCompilerClientCreate(MlirContext context,
       ctx->getOrLoadDialect<mlir::plan::PlanDialect>();
   assert(planDialect && "expected loaded PlanDialect");
   if (failed(planDialect->extensionConstructors.addCheckedExtensionConstructor<
-             compiler::StableHloToExecutableTask,
+             compiler::StablehloToExecutableTask,
              compiler::StableHLOToExecutableTensorRTExtension>()))
     emitWarning(mlir::UnknownLoc::get(ctx))
         << "ignoring duplicate extension load request; TensorRTExtension is "
@@ -156,7 +156,7 @@ MTRT_Status mtrtStableHloToExecutableOptionsCreate(
       context->getLoadedDialect<mlir::plan::PlanDialect>();
   compiler::TaskExtensionRegistry extensions =
       planDialect->extensionConstructors
-          .getExtensionRegistryForTask<compiler::StableHloToExecutableTask>();
+          .getExtensionRegistryForTask<compiler::StablehloToExecutableTask>();
 
   // Check that default extension set is loaded and set options on the TRT
   // extension.
@@ -168,7 +168,7 @@ MTRT_Status mtrtStableHloToExecutableOptionsCreate(
   trtExtension->setOptions(translationOpts);
 
   auto result =
-      std::make_unique<StableHLOToExecutableOptions>(std::move(extensions));
+      std::make_unique<StablehloToExecutableOptions>(std::move(extensions));
 
   llvm::Error finalizeStatus = result->finalize();
 
@@ -194,7 +194,7 @@ MTRT_Status mtrtStableHloToExecutableOptionsCreateFromArgs(
       context->getLoadedDialect<mlir::plan::PlanDialect>();
   compiler::TaskExtensionRegistry extensions =
       planDialect->extensionConstructors
-          .getExtensionRegistryForTask<compiler::StableHloToExecutableTask>();
+          .getExtensionRegistryForTask<compiler::StablehloToExecutableTask>();
 
   // Check that default extension set is loaded.
   assert(
@@ -203,7 +203,7 @@ MTRT_Status mtrtStableHloToExecutableOptionsCreateFromArgs(
       "expected valid StableHLOToExecutableTensorRTExtension");
 
   auto result =
-      std::make_unique<StableHLOToExecutableOptions>(std::move(extensions));
+      std::make_unique<StablehloToExecutableOptions>(std::move(extensions));
   std::vector<llvm::StringRef> argvStrRef(argc);
   for (unsigned i = 0; i < argc; i++)
     argvStrRef[i] = llvm::StringRef(argv[i].data, argv[i].length);
@@ -234,7 +234,7 @@ MTRT_Status mtrtStableHloToExecutableOptionsSetDebugOptions(
     const char **debugTypes, size_t debugTypeSizes, const char *dumpIrTreeDir,
     const char *dumpTensorRTDir) {
 
-  StableHLOToExecutableOptions *cppOpts = unwrap(options);
+  StablehloToExecutableOptions *cppOpts = unwrap(options);
   cppOpts->get<DebugOptions>().enableLLVMDebugFlag = enableDebugging;
   for (unsigned i = 0; i < debugTypeSizes; i++)
     cppOpts->get<DebugOptions>().llvmDebugTypes.emplace_back(debugTypes[i]);
@@ -245,35 +245,9 @@ MTRT_Status mtrtStableHloToExecutableOptionsSetDebugOptions(
   return mtrtStatusGetOk();
 }
 
-MTRT_Status
-mtrtStableHloToExecutableOptionsSetTensorRTTranslationMetadataCallback(
-    MTRT_StableHLOToExecutableOptions options, MTRT_MetadataCallback callback,
-    void *userData) {
-  StableHLOToExecutableOptions *cppOpts = unwrap(options);
-
-  // Construct the append callback which we will pass to the callback provided
-  // by the user. We do it this way to avoid needing a string construct in the C
-  // API.
-  auto appendFunc = [](MlirStringRef str, void *appendCtx) {
-    std::string &accum = *reinterpret_cast<std::string *>(appendCtx);
-    accum += std::string(str.data, str.length);
-  };
-
-  // Capturing by reference here will cause `callback` to point to the wrong
-  // place at the time this callback is invoked.
-  cppOpts->layerMetadataCallback = [=](Operation *op) {
-    std::string accum;
-    void *appendCtx = reinterpret_cast<void *>(&accum);
-    callback(wrap(op), appendFunc, appendCtx, userData);
-    return accum;
-  };
-
-  return mtrtStatusGetOk();
-}
-
 MTRT_Status mtrtStableHloToExecutableOptionsDestroy(
     MTRT_StableHLOToExecutableOptions options) {
-  delete reinterpret_cast<StableHLOToExecutableOptions *>(options.ptr);
+  delete reinterpret_cast<StablehloToExecutableOptions *>(options.ptr);
   return mtrtStatusGetOk();
 }
 
@@ -288,7 +262,7 @@ mtrtStableHloPipelineGetCached(MTRT_CompilerClient client,
 
   mlir::PassManager *runner{};
   if (unwrap(options)->getHash()) {
-    runner = &unwrap(client)->getOrCreatePassManager<StableHloToExecutableTask>(
+    runner = &unwrap(client)->getOrCreatePassManager<StablehloToExecutableTask>(
         *unwrap(options));
     result->ptr = runner;
     return mtrtStatusGetOk();
@@ -340,7 +314,7 @@ MTRT_Status mtrtCompilerStableHLOToExecutable(
         "StableHLO-to-Executable compilation expects a ModuleOp");
 
   StatusOr<std::unique_ptr<mlirtrt::runtime::Executable>> exe =
-      compiler::StableHloToExecutableTask::compileStableHLOToExecutable(
+      compiler::StablehloToExecutableTask::compileStableHLOToExecutable(
           *unwrap(client), moduleOp, *unwrap(stableHloToExecutableOptions));
   if (!exe.isOk())
     return mtrtStatusCreate(MTRT_StatusCode::MTRT_StatusCode_InternalError,
diff --git a/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/CMakeLists.txt b/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/CMakeLists.txt
index 8498bc402..4462ce99c 100644
--- a/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/CMakeLists.txt
+++ b/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/CMakeLists.txt
@@ -5,6 +5,5 @@ add_mlir_tensorrt_public_c_api_library(MLIRTensorRTCAPIRegisterAllDialects
   LINK_LIBS PUBLIC
     MLIRTensorRTRegistration
     MLIRCAPIIR
-    MLIRCAPITransforms
     MLIRTensorRTCompilerStableHloToExecutable
 )
diff --git a/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/RegisterAllDialects.cpp b/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/RegisterAllDialects.cpp
index 53358c53b..987e96b0f 100644
--- a/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/RegisterAllDialects.cpp
+++ b/mlir-tensorrt/compiler/lib/CAPI/Compiler/Registration/RegisterAllDialects.cpp
@@ -23,14 +23,19 @@
 //===----------------------------------------------------------------------===//
 
 #include "mlir-tensorrt-c/Compiler/Registration/RegisterAllDialects.h"
+#include "mlir-tensorrt/Compiler/StableHloToExecutable.h"
 #include "mlir-tensorrt/Registration/RegisterMlirTensorRtDialects.h"
 #include "mlir-tensorrt/Registration/RegisterMlirTensorRtPasses.h"
 #include "mlir/CAPI/IR.h"
 
-void mlirTensorRTRegisterAllDialects(MlirDialectRegistry registry) {
+void mtrtCompilerRegisterDialects(MlirDialectRegistry registry) {
   mlir::registerAllMlirTensorRtDialects(*unwrap(registry));
 }
 
-void mlirTensorRTRegisterAllPasses() {
+void mtrtCompilerRegisterPasses() {
   mlir::tensorrt::registerAllMlirTensorRtPasses();
 }
+
+void mtrtCompilerRegisterTasks() {
+  mlirtrt::compiler::registerStableHloToExecutableTask();
+}
diff --git a/mlir-tensorrt/compiler/lib/Compiler/StableHloToExecutable.cpp b/mlir-tensorrt/compiler/lib/Compiler/StableHloToExecutable.cpp
index 0443208b5..3d609a93e 100644
--- a/mlir-tensorrt/compiler/lib/Compiler/StableHloToExecutable.cpp
+++ b/mlir-tensorrt/compiler/lib/Compiler/StableHloToExecutable.cpp
@@ -149,7 +149,7 @@ class HloToStdPass
 // StableHLOToExecutableOptions
 //===----------------------------------------------------------------------===//
 
-StableHLOToExecutableOptions::StableHLOToExecutableOptions(
+StablehloToExecutableOptions::StablehloToExecutableOptions(
     TaskExtensionRegistry extensions)
     : extensions(std::move(extensions)) {
 
@@ -167,30 +167,22 @@ StableHLOToExecutableOptions::StableHLOToExecutableOptions(
             llvm::cl::desc("entrypoint function name"));
 }
 
-std::optional<llvm::hash_code> StableHLOToExecutableOptions::getHash() const {
-  // If a callback is provided, we have no way of reliably hashing it.
-  if (layerMetadataCallback)
-    return std::nullopt;
-
-  return OptionsContext::getHash();
-}
-
 //===----------------------------------------------------------------------===//
 // StableHloToExecutableTask
 //===----------------------------------------------------------------------===//
 
 static void populateExtensionPasses(
-    mlir::OpPassManager &pm, const StableHLOToExecutableOptions &options,
-    StableHLOToExecutableOptions::ExtensionBase::Phase phase) {
+    mlir::OpPassManager &pm, const StablehloToExecutableOptions &options,
+    StablehloToExecutableOptions::ExtensionBase::Phase phase) {
   for (auto &[key, ext] : options.extensions) {
-    llvm::cast<StableHLOToExecutableOptions::ExtensionBase>(ext.get())
+    llvm::cast<StablehloToExecutableOptions::ExtensionBase>(ext.get())
         ->populatePasses(pm, phase, options);
   }
 }
 
-void StableHloToExecutableTask::buildStablehloClusteringPipeline(
-    OpPassManager &pm, const StableHLOToExecutableOptions &opts) {
-  using Phase = StableHLOToExecutableOptions::ExtensionBase::Phase;
+void StablehloToExecutableTask::buildStablehloClusteringPipeline(
+    OpPassManager &pm, const StablehloToExecutableOptions &opts) {
+  using Phase = StablehloToExecutableOptions::ExtensionBase::Phase;
   pm.addPass(createConvertStablehloToScfPass());
 
   // Add pre-clustering extension passes
@@ -223,9 +215,9 @@ void StableHloToExecutableTask::buildStablehloClusteringPipeline(
   pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
 }
 
-void StableHloToExecutableTask::buildPostClusteringPipeline(
-    OpPassManager &pm, const StableHLOToExecutableOptions &opts) {
-  using Phase = StableHLOToExecutableOptions::ExtensionBase::Phase;
+void StablehloToExecutableTask::buildPostClusteringPipeline(
+    OpPassManager &pm, const StablehloToExecutableOptions &opts) {
+  using Phase = StablehloToExecutableOptions::ExtensionBase::Phase;
   populateExtensionPasses(pm, opts, Phase::PreBufferization);
 
   // Perform bufferization.
@@ -259,8 +251,8 @@ void StableHloToExecutableTask::buildPostClusteringPipeline(
   pm.addPass(createDropNestedModulesPass());
 }
 
-void StableHloToExecutableTask::populatePassManager(
-    mlir::PassManager &pm, const StableHLOToExecutableOptions &options) {
+void StablehloToExecutableTask::populatePassManager(
+    mlir::PassManager &pm, const StablehloToExecutableOptions &options) {
   if (failed(setupPassManager(pm, options.get<DebugOptions>()))) {
     /// TODO: Ignored. This can fail if pass manager static CL options were not
     /// registered/initialized. This happens through invocation of e.g. this
@@ -292,7 +284,7 @@ void StableHloToExecutableTask::populatePassManager(
 /// (TensorRT + host clusters).
 static void
 maybePopulateDefaultClusterKinds(mlir::ModuleOp module,
-                                 const StableHLOToExecutableOptions &options) {
+                                 const StablehloToExecutableOptions &options) {
   if (!module->hasAttr(plan::PlanDialect::kModuleClusterKindsAttrName)) {
     SmallVector<Attribute> clusterKinds;
     clusterKinds.push_back(mlir::plan::TensorRTClusterKindAttr::get(
@@ -306,8 +298,8 @@ maybePopulateDefaultClusterKinds(mlir::ModuleOp module,
 }
 
 StatusOr<std::unique_ptr<runtime::Executable>>
-StableHloToExecutableTask::compileStableHLOToExecutable(
-    mlir::ModuleOp module, const StableHLOToExecutableOptions &options) {
+StablehloToExecutableTask::compileStableHLOToExecutable(
+    mlir::ModuleOp module, const StablehloToExecutableOptions &options) {
   LLVM_DEBUG({
     DBGS() << "compiling with options:\n";
     options.print(llvm::dbgs());
@@ -334,7 +326,7 @@ StableHloToExecutableTask::compileStableHLOToExecutable(
   // Setup pass manager
   //===----------------------------------------------------------------------===//
 
-  StableHloToExecutableTask runner(module->getContext(), options);
+  StablehloToExecutableTask runner(module->getContext(), options);
   if (failed(setupPassManager(runner, options.get<DebugOptions>()))) {
     /// TODO: Ignored. This can fail if pass manager static CL options were not
     /// registered/initialized. This happens through invocation of e.g. this
@@ -369,9 +361,9 @@ StableHloToExecutableTask::compileStableHLOToExecutable(
 }
 
 mlirtrt::StatusOr<std::unique_ptr<runtime::Executable>>
-StableHloToExecutableTask::compileStableHLOToExecutable(
+StablehloToExecutableTask::compileStableHLOToExecutable(
     CompilerClient &client, mlir::ModuleOp module,
-    const StableHLOToExecutableOptions &options) {
+    const StablehloToExecutableOptions &options) {
   if (client.getContext() != module->getContext())
     return getInternalErrorStatus("CompilerClient has a MLIRContext that is "
                                   "different from the ModuleOp's MLIRContext");
@@ -396,12 +388,12 @@ StableHloToExecutableTask::compileStableHLOToExecutable(
 #endif
 
   mlir::PassManager *runner;
-  std::unique_ptr<StableHloToExecutableTask> pm{};
+  std::unique_ptr<StablehloToExecutableTask> pm{};
 
   if (options.getHash())
-    runner = &client.getOrCreatePassManager<StableHloToExecutableTask>(options);
+    runner = &client.getOrCreatePassManager<StablehloToExecutableTask>(options);
   else {
-    pm.reset(new StableHloToExecutableTask(client.getContext(), options));
+    pm.reset(new StablehloToExecutableTask(client.getContext(), options));
     CompilerClient::setupPassManagerLogging(*pm, options.get<DebugOptions>());
     runner = pm.get();
   }
@@ -459,14 +451,14 @@ struct ClusteringPipelineCliOpts
 
 /// Convert a `ClusteringPipelineCliOpts` into a
 /// `StablehloClusteringPipelineOpts`.
-static StableHLOToExecutableOptions populateStablehloClusteringPipelineOpts(
+static StablehloToExecutableOptions populateStablehloClusteringPipelineOpts(
     const ClusteringPipelineCliOpts &cliOpts) {
   // Load a default extension set since we don't have access to MLIRContext at
   // this point.
   TaskExtensionRegistry extensions;
   extensions.getOrCreateExtension<StableHLOToExecutableTensorRTExtension>();
 
-  StableHLOToExecutableOptions opts(std::move(extensions));
+  StablehloToExecutableOptions opts(std::move(extensions));
   opts.get<DeviceOptions>().info.computeCapability =
       cliOpts.deviceComputeCapability;
   opts.get<DeviceOptions>().info.maxSharedMemoryPerBlockKb =
@@ -478,9 +470,9 @@ static StableHLOToExecutableOptions populateStablehloClusteringPipelineOpts(
 }
 
 void mlirtrt::compiler::registerStableHloToExecutableTask() {
-  registerOption("stable-hlo-to-executable",
-                 optionsCreateFromArgs<StableHLOToExecutableOptions,
-                                       StableHloToExecutableTask>);
+  registerOption("stablehlo-to-executable",
+                 optionsCreateFromArgs<StablehloToExecutableOptions,
+                                       StablehloToExecutableTask>);
 }
 
 void mlirtrt::compiler::registerStablehloClusteringPipelines() {
@@ -491,20 +483,20 @@ void mlirtrt::compiler::registerStablehloClusteringPipelines() {
       "stablehlo-clustering-pipeline",
       "apply clustering and initial transformations to stablehlo IR",
       [](OpPassManager &pm, const ClusteringPipelineCliOpts &opts) {
-        StableHloToExecutableTask::buildStablehloClusteringPipeline(
+        StablehloToExecutableTask::buildStablehloClusteringPipeline(
             pm, populateStablehloClusteringPipelineOpts(opts));
       });
 
   PassPipelineRegistration<ClusteringPipelineCliOpts>(
       "post-clustering-pipeline", "apply compilation post-clustering",
       [](OpPassManager &pm, const ClusteringPipelineCliOpts &opts) {
-        StableHLOToExecutableOptions finalOpts =
+        StablehloToExecutableOptions finalOpts =
             populateStablehloClusteringPipelineOpts(opts);
-        StableHloToExecutableTask::buildPostClusteringPipeline(pm, finalOpts);
+        StablehloToExecutableTask::buildPostClusteringPipeline(pm, finalOpts);
       });
 }
 
-MLIR_DEFINE_EXPLICIT_TYPE_ID(mlirtrt::compiler::StableHloToExecutableTask)
+MLIR_DEFINE_EXPLICIT_TYPE_ID(mlirtrt::compiler::StablehloToExecutableTask)
 
 #else
 
diff --git a/mlir-tensorrt/compiler/lib/Compiler/TensorRTExtension/TensorRTExtension.cpp b/mlir-tensorrt/compiler/lib/Compiler/TensorRTExtension/TensorRTExtension.cpp
index b9e8877b3..e453e5bf3 100644
--- a/mlir-tensorrt/compiler/lib/Compiler/TensorRTExtension/TensorRTExtension.cpp
+++ b/mlir-tensorrt/compiler/lib/Compiler/TensorRTExtension/TensorRTExtension.cpp
@@ -42,7 +42,7 @@ StableHLOToExecutableTensorRTExtension::
 
 void StableHLOToExecutableTensorRTExtension::populatePasses(
     mlir::OpPassManager &pm, Phase phase,
-    const StableHLOToExecutableOptions &options) const {
+    const StablehloToExecutableOptions &options) const {
   if (this->disabled)
     return;
 
@@ -64,8 +64,8 @@ void StableHLOToExecutableTensorRTExtension::populatePasses(
     auto &trtPM = pm.nest<tensorrt::TensorRTModuleOp>();
     tensorrt::buildTensorRTModuleTransformationPipeline(
         trtPM, translationOptions.enableStronglyTyped);
-    trtPM.addPass(tensorrt::createTranslateTensorRTPass(
-        nullptr, options.layerMetadataCallback, translationOptions));
+    trtPM.addPass(
+        tensorrt::createTranslateTensorRTPass(nullptr, translationOptions));
     return;
   }
 
diff --git a/mlir-tensorrt/test/CMakeLists.txt b/mlir-tensorrt/compiler/test/CMakeLists.txt
similarity index 100%
rename from mlir-tensorrt/test/CMakeLists.txt
rename to mlir-tensorrt/compiler/test/CMakeLists.txt
diff --git a/mlir-tensorrt/test/Conversion/CUDAToExecutor/cublas-to-executor.mlir b/mlir-tensorrt/compiler/test/Conversion/CUDAToExecutor/cublas-to-executor.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/CUDAToExecutor/cublas-to-executor.mlir
rename to mlir-tensorrt/compiler/test/Conversion/CUDAToExecutor/cublas-to-executor.mlir
diff --git a/mlir-tensorrt/test/Conversion/CUDAToExecutor/cuda-to-executor.mlir b/mlir-tensorrt/compiler/test/Conversion/CUDAToExecutor/cuda-to-executor.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/CUDAToExecutor/cuda-to-executor.mlir
rename to mlir-tensorrt/compiler/test/Conversion/CUDAToExecutor/cuda-to-executor.mlir
diff --git a/mlir-tensorrt/test/Conversion/ChloToStablehloExt/chlo-to-stablehlo-ext.mlir b/mlir-tensorrt/compiler/test/Conversion/ChloToStablehloExt/chlo-to-stablehlo-ext.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/ChloToStablehloExt/chlo-to-stablehlo-ext.mlir
rename to mlir-tensorrt/compiler/test/Conversion/ChloToStablehloExt/chlo-to-stablehlo-ext.mlir
diff --git a/mlir-tensorrt/test/Conversion/ChloToStablehloExt/lit.local.cfg b/mlir-tensorrt/compiler/test/Conversion/ChloToStablehloExt/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Conversion/ChloToStablehloExt/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Conversion/ChloToStablehloExt/lit.local.cfg
diff --git a/mlir-tensorrt/test/Conversion/PlanToExecutor/plan-to-executor.mlir b/mlir-tensorrt/compiler/test/Conversion/PlanToExecutor/plan-to-executor.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/PlanToExecutor/plan-to-executor.mlir
rename to mlir-tensorrt/compiler/test/Conversion/PlanToExecutor/plan-to-executor.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloScalarToArith/stablehlo-scalar-to-arith.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloScalarToArith/stablehlo-scalar-to-arith.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloScalarToArith/stablehlo-scalar-to-arith.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloScalarToArith/stablehlo-scalar-to-arith.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToScf/lit.local.cfg b/mlir-tensorrt/compiler/test/Conversion/StablehloToScf/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToScf/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToScf/lit.local.cfg
diff --git a/mlir-tensorrt/test/Conversion/StablehloToScf/stablehlo-to-scf.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToScf/stablehlo-to-scf.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToScf/stablehlo-to-scf.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToScf/stablehlo-to-scf.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/arith-to-tensorrt.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/arith-to-tensorrt.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/arith-to-tensorrt.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/arith-to-tensorrt.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/chlo-to-tensorrt.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/chlo-to-tensorrt.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/chlo-to-tensorrt.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/chlo-to-tensorrt.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/lit.local.cfg b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/lit.local.cfg
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-control-flow.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-control-flow.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-control-flow.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-control-flow.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-conv.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-conv.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-conv.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-conv.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-gather.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-gather.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-gather.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-gather.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-scatter.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-scatter.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-scatter.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-scatter.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid-trt8.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid-trt8.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid-trt8.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid-trt8.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-invalid.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-trt10.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-trt10.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-trt10.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt-trt10.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-tensorrt.mlir
diff --git a/mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-trtsoftmax.mlir b/mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-trtsoftmax.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/StablehloToTensorRT/stablehlo-to-trtsoftmax.mlir
rename to mlir-tensorrt/compiler/test/Conversion/StablehloToTensorRT/stablehlo-to-trtsoftmax.mlir
diff --git a/mlir-tensorrt/test/Conversion/TensorRTRuntimeToExecutor/tensorrt-runtime-to-executor.mlir b/mlir-tensorrt/compiler/test/Conversion/TensorRTRuntimeToExecutor/tensorrt-runtime-to-executor.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/TensorRTRuntimeToExecutor/tensorrt-runtime-to-executor.mlir
rename to mlir-tensorrt/compiler/test/Conversion/TensorRTRuntimeToExecutor/tensorrt-runtime-to-executor.mlir
diff --git a/mlir-tensorrt/test/Conversion/TensorRTToEmitC/lit.local.cfg b/mlir-tensorrt/compiler/test/Conversion/TensorRTToEmitC/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Conversion/TensorRTToEmitC/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Conversion/TensorRTToEmitC/lit.local.cfg
diff --git a/mlir-tensorrt/test/Conversion/TensorRTToEmitC/resnet50.trt.elided.mlir b/mlir-tensorrt/compiler/test/Conversion/TensorRTToEmitC/resnet50.trt.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/TensorRTToEmitC/resnet50.trt.elided.mlir
rename to mlir-tensorrt/compiler/test/Conversion/TensorRTToEmitC/resnet50.trt.elided.mlir
diff --git a/mlir-tensorrt/test/Conversion/TensorRTToEmitC/tensorrt-to-emitc.mlir b/mlir-tensorrt/compiler/test/Conversion/TensorRTToEmitC/tensorrt-to-emitc.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/TensorRTToEmitC/tensorrt-to-emitc.mlir
rename to mlir-tensorrt/compiler/test/Conversion/TensorRTToEmitC/tensorrt-to-emitc.mlir
diff --git a/mlir-tensorrt/test/Conversion/TensorRTToTensorRTRuntime/tensorrt-to-tensorrt-runtime.mlir b/mlir-tensorrt/compiler/test/Conversion/TensorRTToTensorRTRuntime/tensorrt-to-tensorrt-runtime.mlir
similarity index 100%
rename from mlir-tensorrt/test/Conversion/TensorRTToTensorRTRuntime/tensorrt-to-tensorrt-runtime.mlir
rename to mlir-tensorrt/compiler/test/Conversion/TensorRTToTensorRTRuntime/tensorrt-to-tensorrt-runtime.mlir
diff --git a/mlir-tensorrt/test/Dialect/CUDA/invalid.mlir b/mlir-tensorrt/compiler/test/Dialect/CUDA/invalid.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/CUDA/invalid.mlir
rename to mlir-tensorrt/compiler/test/Dialect/CUDA/invalid.mlir
diff --git a/mlir-tensorrt/test/Dialect/CUDA/roundtrip.mlir b/mlir-tensorrt/compiler/test/Dialect/CUDA/roundtrip.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/CUDA/roundtrip.mlir
rename to mlir-tensorrt/compiler/test/Dialect/CUDA/roundtrip.mlir
diff --git a/mlir-tensorrt/test/Dialect/CUDA/side-effects.mlir b/mlir-tensorrt/compiler/test/Dialect/CUDA/side-effects.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/CUDA/side-effects.mlir
rename to mlir-tensorrt/compiler/test/Dialect/CUDA/side-effects.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/bounds-analysis.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/bounds-analysis.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/bounds-analysis.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/bounds-analysis.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/cluster-and-outline-scalarizable-ops.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/cluster-and-outline-scalarizable-ops.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/cluster-and-outline-scalarizable-ops.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/cluster-and-outline-scalarizable-ops.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/create-closed-regions.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/create-closed-regions.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/create-closed-regions.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/create-closed-regions.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/eliminate-shape-ops.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/eliminate-shape-ops.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/eliminate-shape-ops.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/eliminate-shape-ops.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/invalid.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/invalid.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/invalid.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/invalid.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/materialize-shape-calculations-composite.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/materialize-shape-calculations-composite.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/materialize-shape-calculations-composite.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/materialize-shape-calculations-composite.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/materialize-shape-calculations.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/materialize-shape-calculations.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/materialize-shape-calculations.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/materialize-shape-calculations.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/plan-alloc-tensors.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/plan-alloc-tensors.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/plan-alloc-tensors.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/plan-alloc-tensors.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/plan-bufferize.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/plan-bufferize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/plan-bufferize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/plan-bufferize.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/populate-func-bounds-attrs.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/populate-func-bounds-attrs.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/populate-func-bounds-attrs.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/populate-func-bounds-attrs.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/post-clustering-validation.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/post-clustering-validation.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/post-clustering-validation.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/post-clustering-validation.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/refine-types.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/refine-types.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/refine-types.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/refine-types.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/roundtrip.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/roundtrip.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/roundtrip.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/roundtrip.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/segmentation-pipeline.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/segmentation-pipeline.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/segmentation-pipeline.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/segmentation-pipeline.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/stablehlo-clustering-invalid.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/stablehlo-clustering-invalid.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/stablehlo-clustering-invalid.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/stablehlo-clustering-invalid.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/stablehlo-clustering.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/stablehlo-clustering.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/stablehlo-clustering.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/stablehlo-clustering.mlir
diff --git a/mlir-tensorrt/test/Dialect/Plan/tensor-kind-analysis.mlir b/mlir-tensorrt/compiler/test/Dialect/Plan/tensor-kind-analysis.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/Plan/tensor-kind-analysis.mlir
rename to mlir-tensorrt/compiler/test/Dialect/Plan/tensor-kind-analysis.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-convolution.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-convolution.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-convolution.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-convolution.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-dot-general.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-dot-general.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-dot-general.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-dot-general.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-gather.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-gather.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-gather.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-gather.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-scatter-nd.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-scatter-nd.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-scatter-nd.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-scatter-nd.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-scatter.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-scatter.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/canonicalize-scatter.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/canonicalize-scatter.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/constant-folding-bitwise.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding-bitwise.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/constant-folding-bitwise.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding-bitwise.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/constant-folding-elided.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding-elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/constant-folding-elided.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding-elided.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/constant-folding-invalid.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding-invalid.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/constant-folding-invalid.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding-invalid.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/constant-folding.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/constant-folding.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/constant-folding.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/expand-tuples.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/expand-tuples.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/expand-tuples.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/expand-tuples.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/gather-to-slice.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/gather-to-slice.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/gather-to-slice.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/gather-to-slice.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-block-dequantize.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-block-dequantize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-block-dequantize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-block-dequantize.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-block-quantize.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-block-quantize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-block-quantize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-block-quantize.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-channel-dequantize.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-channel-dequantize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-channel-dequantize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-channel-dequantize.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-channel-quantize.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-channel-quantize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-channel-quantize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-channel-quantize.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-tensor-dequantize.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-tensor-dequantize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-tensor-dequantize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-tensor-dequantize.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-tensor-quantize.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-tensor-quantize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/raise-qdq-per-tensor-quantize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/raise-qdq-per-tensor-quantize.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/refine-shapes.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/refine-shapes.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/refine-shapes.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/refine-shapes.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/reify-ranked-shaped-type.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/reify-ranked-shaped-type.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/reify-ranked-shaped-type.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/reify-ranked-shaped-type.mlir
diff --git a/mlir-tensorrt/test/Dialect/StableHloExt/tensor-kind-analysis.mlir b/mlir-tensorrt/compiler/test/Dialect/StableHloExt/tensor-kind-analysis.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/StableHloExt/tensor-kind-analysis.mlir
rename to mlir-tensorrt/compiler/test/Dialect/StableHloExt/tensor-kind-analysis.mlir
diff --git a/mlir-tensorrt/test/Dialect/TensorRTRuntime/inliner.mlir b/mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/inliner.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/TensorRTRuntime/inliner.mlir
rename to mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/inliner.mlir
diff --git a/mlir-tensorrt/test/Dialect/TensorRTRuntime/invalid.mlir b/mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/invalid.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/TensorRTRuntime/invalid.mlir
rename to mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/invalid.mlir
diff --git a/mlir-tensorrt/test/Dialect/TensorRTRuntime/one-shot-bufferize.mlir b/mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/one-shot-bufferize.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/TensorRTRuntime/one-shot-bufferize.mlir
rename to mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/one-shot-bufferize.mlir
diff --git a/mlir-tensorrt/test/Dialect/TensorRTRuntime/roundtrip.mlir b/mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/roundtrip.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/TensorRTRuntime/roundtrip.mlir
rename to mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/roundtrip.mlir
diff --git a/mlir-tensorrt/test/Dialect/TensorRTRuntime/tensor-kind-analysis.mlir b/mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/tensor-kind-analysis.mlir
similarity index 100%
rename from mlir-tensorrt/test/Dialect/TensorRTRuntime/tensor-kind-analysis.mlir
rename to mlir-tensorrt/compiler/test/Dialect/TensorRTRuntime/tensor-kind-analysis.mlir
diff --git a/mlir-tensorrt/test/Pipelines/StableHloInputPipeline/canonicalizer-stress-test.mlir b/mlir-tensorrt/compiler/test/Pipelines/StableHloInputPipeline/canonicalizer-stress-test.mlir
similarity index 100%
rename from mlir-tensorrt/test/Pipelines/StableHloInputPipeline/canonicalizer-stress-test.mlir
rename to mlir-tensorrt/compiler/test/Pipelines/StableHloInputPipeline/canonicalizer-stress-test.mlir
diff --git a/mlir-tensorrt/test/Pipelines/StableHloInputPipeline/dynamic-shape-refinement.mlir b/mlir-tensorrt/compiler/test/Pipelines/StableHloInputPipeline/dynamic-shape-refinement.mlir
similarity index 100%
rename from mlir-tensorrt/test/Pipelines/StableHloInputPipeline/dynamic-shape-refinement.mlir
rename to mlir-tensorrt/compiler/test/Pipelines/StableHloInputPipeline/dynamic-shape-refinement.mlir
diff --git a/mlir-tensorrt/test/Pipelines/StableHloInputPipeline/dynamic-shape-simplification.mlir b/mlir-tensorrt/compiler/test/Pipelines/StableHloInputPipeline/dynamic-shape-simplification.mlir
similarity index 100%
rename from mlir-tensorrt/test/Pipelines/StableHloInputPipeline/dynamic-shape-simplification.mlir
rename to mlir-tensorrt/compiler/test/Pipelines/StableHloInputPipeline/dynamic-shape-simplification.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-binary.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-binary.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-binary.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-binary.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-unary.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-unary.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-unary.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/end-to-end-unary.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/lit.local.cfg b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/ClusteringDynamicShape/lit.local.cfg
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-bf16.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-bf16.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-bf16.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-bf16.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-dynamic.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-dynamic.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-dynamic.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-dynamic.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-f16.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-f16.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-f16.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-f16.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-f32.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-f32.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-f32.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-f32.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-f8E4M3FN.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-f8E4M3FN.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-f8E4M3FN.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-f8E4M3FN.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-i1.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-i1.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-i1.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-i1.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-i4.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-i4.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/buffer-ops-i4.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/buffer-ops-i4.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/lit.local.cfg b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/lit.local.cfg
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/memcpy-strided.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/memcpy-strided.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/memcpy-strided.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/memcpy-strided.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/IntegrationTests/memcpy.mlir b/mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/memcpy.mlir
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/IntegrationTests/memcpy.mlir
rename to mlir-tensorrt/compiler/test/Target/Lua/IntegrationTests/memcpy.mlir
diff --git a/mlir-tensorrt/test/Target/Lua/lit.local.cfg b/mlir-tensorrt/compiler/test/Target/Lua/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Target/Lua/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Target/Lua/lit.local.cfg
diff --git a/mlir-tensorrt/test/Transforms/DropNestedModules/drop-nested-modules.mlir b/mlir-tensorrt/compiler/test/Transforms/DropNestedModules/drop-nested-modules.mlir
similarity index 100%
rename from mlir-tensorrt/test/Transforms/DropNestedModules/drop-nested-modules.mlir
rename to mlir-tensorrt/compiler/test/Transforms/DropNestedModules/drop-nested-modules.mlir
diff --git a/mlir-tensorrt/test/Transforms/DuplicateFunctionElimination/duplicate-function-elimination-upstream.mlir b/mlir-tensorrt/compiler/test/Transforms/DuplicateFunctionElimination/duplicate-function-elimination-upstream.mlir
similarity index 100%
rename from mlir-tensorrt/test/Transforms/DuplicateFunctionElimination/duplicate-function-elimination-upstream.mlir
rename to mlir-tensorrt/compiler/test/Transforms/DuplicateFunctionElimination/duplicate-function-elimination-upstream.mlir
diff --git a/mlir-tensorrt/test/Transforms/DuplicateFunctionElimination/func-ext-duplicate-function-elimination.mlir b/mlir-tensorrt/compiler/test/Transforms/DuplicateFunctionElimination/func-ext-duplicate-function-elimination.mlir
similarity index 100%
rename from mlir-tensorrt/test/Transforms/DuplicateFunctionElimination/func-ext-duplicate-function-elimination.mlir
rename to mlir-tensorrt/compiler/test/Transforms/DuplicateFunctionElimination/func-ext-duplicate-function-elimination.mlir
diff --git a/mlir-tensorrt/test/Transforms/MemRefCastElimination/memref-cast-elimination.mlir b/mlir-tensorrt/compiler/test/Transforms/MemRefCastElimination/memref-cast-elimination.mlir
similarity index 100%
rename from mlir-tensorrt/test/Transforms/MemRefCastElimination/memref-cast-elimination.mlir
rename to mlir-tensorrt/compiler/test/Transforms/MemRefCastElimination/memref-cast-elimination.mlir
diff --git a/mlir-tensorrt/test/Transforms/SCFDetensorizeLoops/scf-detensorize-loops.mlir b/mlir-tensorrt/compiler/test/Transforms/SCFDetensorizeLoops/scf-detensorize-loops.mlir
similarity index 100%
rename from mlir-tensorrt/test/Transforms/SCFDetensorizeLoops/scf-detensorize-loops.mlir
rename to mlir-tensorrt/compiler/test/Transforms/SCFDetensorizeLoops/scf-detensorize-loops.mlir
diff --git a/mlir-tensorrt/test/Transforms/StablehloMatchers/lit.local.cfg b/mlir-tensorrt/compiler/test/Transforms/StablehloMatchers/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/Transforms/StablehloMatchers/lit.local.cfg
rename to mlir-tensorrt/compiler/test/Transforms/StablehloMatchers/lit.local.cfg
diff --git a/mlir-tensorrt/test/Transforms/StablehloMatchers/test_StablehloSoftmaxMatcher.mlir b/mlir-tensorrt/compiler/test/Transforms/StablehloMatchers/test_StablehloSoftmaxMatcher.mlir
similarity index 100%
rename from mlir-tensorrt/test/Transforms/StablehloMatchers/test_StablehloSoftmaxMatcher.mlir
rename to mlir-tensorrt/compiler/test/Transforms/StablehloMatchers/test_StablehloSoftmaxMatcher.mlir
diff --git a/mlir-tensorrt/test/lib/Analysis/CMakeLists.txt b/mlir-tensorrt/compiler/test/lib/Analysis/CMakeLists.txt
similarity index 100%
rename from mlir-tensorrt/test/lib/Analysis/CMakeLists.txt
rename to mlir-tensorrt/compiler/test/lib/Analysis/CMakeLists.txt
diff --git a/mlir-tensorrt/test/lib/Analysis/TestBoundsAnalysis.cpp b/mlir-tensorrt/compiler/test/lib/Analysis/TestBoundsAnalysis.cpp
similarity index 100%
rename from mlir-tensorrt/test/lib/Analysis/TestBoundsAnalysis.cpp
rename to mlir-tensorrt/compiler/test/lib/Analysis/TestBoundsAnalysis.cpp
diff --git a/mlir-tensorrt/test/lib/Analysis/TestTensorKindAnalysis.cpp b/mlir-tensorrt/compiler/test/lib/Analysis/TestTensorKindAnalysis.cpp
similarity index 100%
rename from mlir-tensorrt/test/lib/Analysis/TestTensorKindAnalysis.cpp
rename to mlir-tensorrt/compiler/test/lib/Analysis/TestTensorKindAnalysis.cpp
diff --git a/mlir-tensorrt/test/lib/CMakeLists.txt b/mlir-tensorrt/compiler/test/lib/CMakeLists.txt
similarity index 100%
rename from mlir-tensorrt/test/lib/CMakeLists.txt
rename to mlir-tensorrt/compiler/test/lib/CMakeLists.txt
diff --git a/mlir-tensorrt/test/lib/Transforms/CMakeLists.txt b/mlir-tensorrt/compiler/test/lib/Transforms/CMakeLists.txt
similarity index 100%
rename from mlir-tensorrt/test/lib/Transforms/CMakeLists.txt
rename to mlir-tensorrt/compiler/test/lib/Transforms/CMakeLists.txt
diff --git a/mlir-tensorrt/test/lib/Transforms/Clustering/CMakeLists.txt b/mlir-tensorrt/compiler/test/lib/Transforms/Clustering/CMakeLists.txt
similarity index 100%
rename from mlir-tensorrt/test/lib/Transforms/Clustering/CMakeLists.txt
rename to mlir-tensorrt/compiler/test/lib/Transforms/Clustering/CMakeLists.txt
diff --git a/mlir-tensorrt/test/lib/Transforms/Clustering/ClusteringBenchmarkMain.cpp b/mlir-tensorrt/compiler/test/lib/Transforms/Clustering/ClusteringBenchmarkMain.cpp
similarity index 100%
rename from mlir-tensorrt/test/lib/Transforms/Clustering/ClusteringBenchmarkMain.cpp
rename to mlir-tensorrt/compiler/test/lib/Transforms/Clustering/ClusteringBenchmarkMain.cpp
diff --git a/mlir-tensorrt/test/lit.cfg.py b/mlir-tensorrt/compiler/test/lit.cfg.py
similarity index 99%
rename from mlir-tensorrt/test/lit.cfg.py
rename to mlir-tensorrt/compiler/test/lit.cfg.py
index 86127d9bc..69535714a 100644
--- a/mlir-tensorrt/test/lit.cfg.py
+++ b/mlir-tensorrt/compiler/test/lit.cfg.py
@@ -31,7 +31,7 @@
 config.test_source_root = os.path.dirname(__file__)
 config.gpu_tools_script = os.path.join(
     config.test_source_root,
-    "../python/mlir_tensorrt_tools/mlir_tensorrt/tools/gpu_tools.py",
+    "../../python/mlir_tensorrt_tools/mlir_tensorrt/tools/gpu_tools.py",
 )
 
 
diff --git a/mlir-tensorrt/test/lit.site.cfg.py.in b/mlir-tensorrt/compiler/test/lit.site.cfg.py.in
similarity index 88%
rename from mlir-tensorrt/test/lit.site.cfg.py.in
rename to mlir-tensorrt/compiler/test/lit.site.cfg.py.in
index e307dadb8..6f0e65931 100644
--- a/mlir-tensorrt/test/lit.site.cfg.py.in
+++ b/mlir-tensorrt/compiler/test/lit.site.cfg.py.in
@@ -3,7 +3,7 @@ config.llvm_tools_dir = lit_config.substitute("@LLVM_TOOLS_DIR@")
 config.mlir_src_root = "@MLIR_MAIN_SRC_DIR@"
 
 config.mlir_tensorrt_obj_root = "@CMAKE_BINARY_DIR@"
-config.mlir_tensorrt_root = "@MLIR_TENSORRT_ROOT_DIR@"
+config.mlir_tensorrt_root = "@MLIR_TENSORRT_COMPILER_DIR@"
 
 config.host_os = "@HOST_OS@"
 config.host_cc = "@HOST_CC@"
@@ -31,4 +31,4 @@ import lit.llvm
 lit.llvm.initialize(lit_config, config)
 
 # Let the main config do the real work.
-lit_config.load_config(config, "@CMAKE_SOURCE_DIR@/test/lit.cfg.py")
+lit_config.load_config(config, "@MLIR_TENSORRT_COMPILER_DIR@/test/lit.cfg.py")
diff --git a/mlir-tensorrt/test/models/bert.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/bert.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/bert.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/bert.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/models/gpt2.stablehlo.bs2.elided.mlir b/mlir-tensorrt/compiler/test/models/gpt2.stablehlo.bs2.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/gpt2.stablehlo.bs2.elided.mlir
rename to mlir-tensorrt/compiler/test/models/gpt2.stablehlo.bs2.elided.mlir
diff --git a/mlir-tensorrt/test/models/gpt2.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/gpt2.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/gpt2.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/gpt2.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/models/llama-68m.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/llama-68m.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/llama-68m.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/llama-68m.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/models/llama-v2.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/llama-v2.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/llama-v2.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/llama-v2.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/models/resnet50.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/resnet50.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/resnet50.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/resnet50.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/models/single-relu.onnx b/mlir-tensorrt/compiler/test/models/single-relu.onnx
similarity index 100%
rename from mlir-tensorrt/test/models/single-relu.onnx
rename to mlir-tensorrt/compiler/test/models/single-relu.onnx
diff --git a/mlir-tensorrt/test/models/swin.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/swin.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/swin.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/swin.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/models/whisper-jax.stablehlo.elided.mlir b/mlir-tensorrt/compiler/test/models/whisper-jax.stablehlo.elided.mlir
similarity index 100%
rename from mlir-tensorrt/test/models/whisper-jax.stablehlo.elided.mlir
rename to mlir-tensorrt/compiler/test/models/whisper-jax.stablehlo.elided.mlir
diff --git a/mlir-tensorrt/test/python/IntegrationTests/TRT10/lit.local.cfg b/mlir-tensorrt/compiler/test/python/IntegrationTests/TRT10/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/TRT10/lit.local.cfg
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/TRT10/lit.local.cfg
diff --git a/mlir-tensorrt/test/python/IntegrationTests/TRT10/test_stablehlo_add.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/TRT10/test_stablehlo_add.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/TRT10/test_stablehlo_add.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/TRT10/test_stablehlo_add.py
diff --git a/mlir-tensorrt/test/python/IntegrationTests/TRT10/test_stablehlo_dynamic_iota.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/TRT10/test_stablehlo_dynamic_iota.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/TRT10/test_stablehlo_dynamic_iota.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/TRT10/test_stablehlo_dynamic_iota.py
diff --git a/mlir-tensorrt/test/python/IntegrationTests/lit.local.cfg b/mlir-tensorrt/compiler/test/python/IntegrationTests/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/lit.local.cfg
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/lit.local.cfg
diff --git a/mlir-tensorrt/test/python/IntegrationTests/test_call_validation.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/test_call_validation.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/test_call_validation.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/test_call_validation.py
diff --git a/mlir-tensorrt/test/python/IntegrationTests/test_executable_serialize.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/test_executable_serialize.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/test_executable_serialize.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/test_executable_serialize.py
diff --git a/mlir-tensorrt/test/python/IntegrationTests/test_stablehlo_add.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/test_stablehlo_add.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/test_stablehlo_add.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/test_stablehlo_add.py
diff --git a/mlir-tensorrt/test/python/IntegrationTests/test_stablehlo_dynamic.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/test_stablehlo_dynamic.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/test_stablehlo_dynamic.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/test_stablehlo_dynamic.py
diff --git a/mlir-tensorrt/test/python/IntegrationTests/test_type_interop.py b/mlir-tensorrt/compiler/test/python/IntegrationTests/test_type_interop.py
similarity index 100%
rename from mlir-tensorrt/test/python/IntegrationTests/test_type_interop.py
rename to mlir-tensorrt/compiler/test/python/IntegrationTests/test_type_interop.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_api.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_api.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_api.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_api.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_debug_dump.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_debug_dump.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_debug_dump.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_compiler_debug_dump.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py
similarity index 89%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py
index fdb93b596..a86299417 100644
--- a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py
+++ b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_options_context.py
@@ -15,7 +15,7 @@
 
     opts = api.OptionsContext(
         client,
-        "stable-hlo-to-executable",
+        "stablehlo-to-executable",
         # Set some options explicitly so we can spot check the `print` output.
         [
             "--tensorrt-builder-opt-level=3",
@@ -27,5 +27,5 @@
     print(opts)
 
 
-# CHECK: InvalidArgument: InvalidArgument: non-existent-options-type is not a valid option type. Valid options were: stable-hlo-to-executable
+# CHECK: InvalidArgument: InvalidArgument: non-existent-options-type is not a valid option type. Valid options were: stablehlo-to-executable
 # CHECK: Options[{{.*--tensorrt-workspace-memory-pool-limit=1073741824.*--tensorrt-strongly-typed=false.*--tensorrt-builder-opt-level=3.*}}]
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_plugin_schema_api.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_plugin_schema_api.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_plugin_schema_api.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/compiler_api/test_plugin_schema_api.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/dialects/test_stablehlo.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/dialects/test_stablehlo.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/dialects/test_stablehlo.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/dialects/test_stablehlo.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/dialects/test_tensorrt.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/dialects/test_tensorrt.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/dialects/test_tensorrt.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/dialects/test_tensorrt.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/dialects/test_upstream.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/dialects/test_upstream.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/dialects/test_upstream.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/dialects/test_upstream.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/lit.local.cfg b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_compiler/lit.local.cfg
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_compiler/lit.local.cfg
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_runtime/lit.local.cfg b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/lit.local.cfg
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_runtime/lit.local.cfg
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/lit.local.cfg
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_create_memref.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_create_memref.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_create_memref.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_create_memref.py
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_runtime_api.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_runtime_api.py
similarity index 92%
rename from mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_runtime_api.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_runtime_api.py
index da9b433ad..3021ffe2f 100644
--- a/mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_runtime_api.py
+++ b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_runtime_api.py
@@ -150,7 +150,6 @@ def test_host_memref():
             print(host_array.shape)
             print(host_array.strides)
             print(host_array.address_space)
-            print(host_array._CAPIPtr)
         except Exception as e:
             print("Exception caught: ", e)
 
@@ -160,37 +159,30 @@ def test_host_memref():
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 #  CHECK-NEXT: testing dtype float32
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 #  CHECK-NEXT: testing dtype float16
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 #  CHECK-NEXT: testing dtype int64
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 #  CHECK-NEXT: testing dtype int32
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 #  CHECK-NEXT: testing dtype int16
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 #  CHECK-NEXT: testing dtype int8
 #  CHECK-NEXT: [2, 8]
 #  CHECK-NEXT: [8, 1]
 #  CHECK-NEXT: PointerType.host
-#  CHECK-NEXT:  mlir_tensorrt.compiler.api.MemRefValue._CAPIPtr
 
 if __name__ == "__main__":
     for t in TESTS:
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_runtime_debug_dump.py b/mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_runtime_debug_dump.py
similarity index 100%
rename from mlir-tensorrt/test/python/mlir_tensorrt_runtime/test_runtime_debug_dump.py
rename to mlir-tensorrt/compiler/test/python/mlir_tensorrt_runtime/test_runtime_debug_dump.py
diff --git a/mlir-tensorrt/tools/CMakeLists.txt b/mlir-tensorrt/compiler/tools/CMakeLists.txt
similarity index 85%
rename from mlir-tensorrt/tools/CMakeLists.txt
rename to mlir-tensorrt/compiler/tools/CMakeLists.txt
index 450615d53..0882c9d20 100644
--- a/mlir-tensorrt/tools/CMakeLists.txt
+++ b/mlir-tensorrt/compiler/tools/CMakeLists.txt
@@ -20,21 +20,6 @@ set(LLVM_LINK_COMPONENTS
   Support
   )
 
-# ----------------------------------------------------------------------------
-# mlir-tensorrt-tblgen
-# ----------------------------------------------------------------------------
-add_mlir_tool(mlir-tensorrt-tblgen
-  MlirTensorRtTblgen.cpp
-  PARTIAL_SOURCES_INTENDED
-
-  DEPENDS
-  MLIRTblgenLib
-)
-target_link_libraries(mlir-tensorrt-tblgen PRIVATE MLIRTblgenLib)
-llvm_update_compile_flags(mlir-tensorrt-tblgen)
-mlir_check_all_link_libraries(mlir-tensorrt-tblgen)
-_mtrt_set_target_compile_defs(mlir-tensorrt-tblgen)
-
 # ----------------------------------------------------------------------------
 # mlir-tensorrt-translate
 # ----------------------------------------------------------------------------
@@ -47,7 +32,7 @@ if(MLIR_TRT_TARGET_CPP)
 endif()
 
 add_mlir_tool(mlir-tensorrt-translate
-  MlirTensorRtTranslate.cpp
+  mlir-tensorrt-translate.cpp
   PARTIAL_SOURCES_INTENDED
   )
 
@@ -62,7 +47,7 @@ _mtrt_set_target_compile_defs(mlir-tensorrt-translate)
 # mlir-tensorrt-lsp-server
 # ----------------------------------------------------------------------------
 add_mlir_tool(mlir-tensorrt-lsp-server
-  MlirTensorRtLspServer.cpp
+  mlir-tensorrt-lsp-server.cpp
   PARTIAL_SOURCES_INTENDED
 
   DEPENDS
@@ -83,7 +68,7 @@ set_target_properties(mlir-tensorrt-lsp-server PROPERTIES EXCLUDE_FROM_ALL TRUE)
 # mlir-tensorrt-opt
 # ----------------------------------------------------------------------------
 add_mlir_tool(mlir-tensorrt-opt
-  MlirTensorRtOpt.cpp
+  mlir-tensorrt-opt.cpp
   PARTIAL_SOURCES_INTENDED
 
   DEPENDS
@@ -111,7 +96,7 @@ export_executable_symbols_for_plugins(mlir-tensorrt-opt)
 if(MLIR_TRT_TARGET_LUA)
   add_llvm_executable(mlir-tensorrt-runner
     PARTIAL_SOURCES_INTENDED
-    MlirTensorRtRunner.cpp)
+    mlir-tensorrt-runner.cpp)
   _mtrt_set_target_compile_defs(mlir-tensorrt-runner)
   set(libs_
     MLIRExecutorRunnerLib
diff --git a/mlir-tensorrt/tools/MlirTensorRtLspServer.cpp b/mlir-tensorrt/compiler/tools/mlir-tensorrt-lsp-server.cpp
similarity index 100%
rename from mlir-tensorrt/tools/MlirTensorRtLspServer.cpp
rename to mlir-tensorrt/compiler/tools/mlir-tensorrt-lsp-server.cpp
diff --git a/mlir-tensorrt/tools/MlirTensorRtOpt.cpp b/mlir-tensorrt/compiler/tools/mlir-tensorrt-opt.cpp
similarity index 100%
rename from mlir-tensorrt/tools/MlirTensorRtOpt.cpp
rename to mlir-tensorrt/compiler/tools/mlir-tensorrt-opt.cpp
diff --git a/mlir-tensorrt/tools/MlirTensorRtRunner.cpp b/mlir-tensorrt/compiler/tools/mlir-tensorrt-runner.cpp
similarity index 100%
rename from mlir-tensorrt/tools/MlirTensorRtRunner.cpp
rename to mlir-tensorrt/compiler/tools/mlir-tensorrt-runner.cpp
diff --git a/mlir-tensorrt/tools/MlirTensorRtTranslate.cpp b/mlir-tensorrt/compiler/tools/mlir-tensorrt-translate.cpp
similarity index 100%
rename from mlir-tensorrt/tools/MlirTensorRtTranslate.cpp
rename to mlir-tensorrt/compiler/tools/mlir-tensorrt-translate.cpp
diff --git a/mlir-tensorrt/executor/lib/Support/DeviceInfo.cpp b/mlir-tensorrt/executor/lib/Support/DeviceInfo.cpp
index 9e9d4146f..aa417d95f 100644
--- a/mlir-tensorrt/executor/lib/Support/DeviceInfo.cpp
+++ b/mlir-tensorrt/executor/lib/Support/DeviceInfo.cpp
@@ -23,12 +23,14 @@
 
 using namespace mlirtrt;
 
+#ifdef MLIR_EXECUTOR_ENABLE_CUDA
 static Status makeCudaStringError(cudaError_t errCode,
                                   llvm::StringRef context) {
   // Create a detailed error message using llvm::createStringError
   return getInternalErrorStatus("{0}: {1}", context,
                                 cudaGetErrorString(errCode));
 }
+#endif
 
 StatusOr<DeviceInfo> mlirtrt::getDeviceInformationFromHost() {
 #ifdef MLIR_EXECUTOR_ENABLE_CUDA
diff --git a/mlir-tensorrt/executor/lib/Target/Lua/TranslateToLua.cpp b/mlir-tensorrt/executor/lib/Target/Lua/TranslateToLua.cpp
index cc85ff3f4..5cef45292 100644
--- a/mlir-tensorrt/executor/lib/Target/Lua/TranslateToLua.cpp
+++ b/mlir-tensorrt/executor/lib/Target/Lua/TranslateToLua.cpp
@@ -231,17 +231,11 @@ static LogicalResult emitAttribute(raw_ostream &os, Location loc,
 
 static LogicalResult printControlFlowOp(LuaEmitter &emitter, cf::BranchOp op) {
   Block *destBlock = op.getDest();
-  bool isEntry = op->getBlock()->isEntryBlock();
-
   // Declare non-local args to hold block arguments.
   for (auto [operand, blockArg] :
        llvm::zip(op.getDestOperands(), destBlock->getArguments())) {
-    // If we are branching from a entry block, we can can use a local.
+    // Assign to the block-argument local declared in the entry block.
-    if (isEntry)
-      emitter << "local ";
-    emitter << (isEntry
-                    ? emitter.createLocalVariableName(blockArg, "barg")
-                    : emitter.getOrCreateGlobalVariableName(blockArg, "barg"));
+    emitter << emitter.getVariableName(blockArg);
     emitter << " = " << emitter.getVariableName(operand) << ";\n";
   }
   emitter << "goto " << emitter.getOrCreateLabel(*op.getDest()) << ";\n";
@@ -251,8 +245,6 @@ static LogicalResult printControlFlowOp(LuaEmitter &emitter, cf::BranchOp op) {
 static LogicalResult printControlFlowOp(LuaEmitter &emitter,
                                         cf::CondBranchOp op) {
 
-  bool isEntry = op->getBlock()->isEntryBlock();
-
   SmallVector<Value> trueOperands, falseOperands;
 
   // Assign variables for the destination Block's BlockArguments. If this is the
@@ -263,12 +255,7 @@ static LogicalResult printControlFlowOp(LuaEmitter &emitter,
     // Declare non-local args to hold block arguments.
     for (auto [operand, blockArg] :
          llvm::zip(operands, destBlock->getArguments())) {
-      // If we are branching from a entry block, we can can use a local.
-      if (isEntry)
-        emitter << "local ";
-      emitter << (isEntry ? emitter.createLocalVariableName(blockArg, "barg")
-                          : emitter.getOrCreateGlobalVariableName(blockArg,
-                                                                  "barg"));
+      emitter << emitter.getVariableName(blockArg);
       emitter << " = " << emitter.getVariableName(operand) << ";\n";
     }
   };
@@ -836,7 +823,22 @@ LogicalResult LuaEmitter::emitBlock(Block &block, bool isEntryBlock) {
   if (!isEntryBlock) {
     os << "::" << getOrCreateLabel(block) << ":: do\n";
     os.indent();
+  } else {
+    // In the entry block, declare all of the block arguments needed throughout
+    // the region as local variables. Initialize them all to nil. This avoids
+    // having to use ad-hoc globals at the branch points.
+    Region *region = block.getParent();
+    for (auto [idx, otherBlock] : llvm::enumerate(region->getBlocks())) {
+      // We don't need to declare block arguments for the entry block; those are
+      // e.g. function arguments and are handled by the parent op.
+      if (idx == 0)
+        continue;
+      for (BlockArgument arg : otherBlock.getArguments())
+        getStream() << "local " << createLocalVariableName(arg, "barg")
+                    << " = nil;\n";
+    }
   }
+
   for (Operation &op : block.getOperations()) {
     if (failed(emitOperation(op)))
       return failure();
diff --git a/mlir-tensorrt/executor/test/IntegrationTests/control-flow-nested.mlir b/mlir-tensorrt/executor/test/IntegrationTests/control-flow-nested.mlir
new file mode 100644
index 000000000..1ed802724
--- /dev/null
+++ b/mlir-tensorrt/executor/test/IntegrationTests/control-flow-nested.mlir
@@ -0,0 +1,128 @@
+// RUN: executor-opt %s -executor-lowering-pipeline \
+// RUN:   | executor-translate -mlir-to-lua \
+// RUN:   | executor-runner -input-type=lua | FileCheck %s
+
+func.func @test_for(%lb: index, %ub: index, %step: index) {
+  %c0 = executor.constant 0 : index
+  %c1 = executor.constant 1 : index
+  %0 = scf.for %i = %lb to %ub step %step iter_args(%iter = %c0) -> index {
+    %0 = scf.for %j = %lb to %ub step %step iter_args(%iter1 = %iter) -> index {
+      %acc = executor.addi %iter1, %c1 : index
+      executor.print "i = %d, j = %d, acc = %d"(%i, %j, %acc : index, index, index)
+      scf.yield %acc : index
+    }
+    scf.yield %0 : index
+  }
+  executor.print "test_for = %d"(%0 : index)
+  return
+}
+func.func @main() -> i64 {
+  %c0 = executor.constant 0 : i64
+  %c0_index = executor.constant 0 : index
+  %c10 = executor.constant 10 : index
+  %c1 = executor.constant 1 : index
+  func.call @test_for(%c0_index, %c10, %c1) : (index, index, index) -> ()
+  return %c0 : i64
+}
+
+// CHECK-LABEL: i = 0, j = 0, acc = 1
+// CHECK-NEXT: i = 0, j = 1, acc = 2
+// CHECK-NEXT: i = 0, j = 2, acc = 3
+// CHECK-NEXT: i = 0, j = 3, acc = 4
+// CHECK-NEXT: i = 0, j = 4, acc = 5
+// CHECK-NEXT: i = 0, j = 5, acc = 6
+// CHECK-NEXT: i = 0, j = 6, acc = 7
+// CHECK-NEXT: i = 0, j = 7, acc = 8
+// CHECK-NEXT: i = 0, j = 8, acc = 9
+// CHECK-NEXT: i = 0, j = 9, acc = 10
+// CHECK-NEXT: i = 1, j = 0, acc = 11
+// CHECK-NEXT: i = 1, j = 1, acc = 12
+// CHECK-NEXT: i = 1, j = 2, acc = 13
+// CHECK-NEXT: i = 1, j = 3, acc = 14
+// CHECK-NEXT: i = 1, j = 4, acc = 15
+// CHECK-NEXT: i = 1, j = 5, acc = 16
+// CHECK-NEXT: i = 1, j = 6, acc = 17
+// CHECK-NEXT: i = 1, j = 7, acc = 18
+// CHECK-NEXT: i = 1, j = 8, acc = 19
+// CHECK-NEXT: i = 1, j = 9, acc = 20
+// CHECK-NEXT: i = 2, j = 0, acc = 21
+// CHECK-NEXT: i = 2, j = 1, acc = 22
+// CHECK-NEXT: i = 2, j = 2, acc = 23
+// CHECK-NEXT: i = 2, j = 3, acc = 24
+// CHECK-NEXT: i = 2, j = 4, acc = 25
+// CHECK-NEXT: i = 2, j = 5, acc = 26
+// CHECK-NEXT: i = 2, j = 6, acc = 27
+// CHECK-NEXT: i = 2, j = 7, acc = 28
+// CHECK-NEXT: i = 2, j = 8, acc = 29
+// CHECK-NEXT: i = 2, j = 9, acc = 30
+// CHECK-NEXT: i = 3, j = 0, acc = 31
+// CHECK-NEXT: i = 3, j = 1, acc = 32
+// CHECK-NEXT: i = 3, j = 2, acc = 33
+// CHECK-NEXT: i = 3, j = 3, acc = 34
+// CHECK-NEXT: i = 3, j = 4, acc = 35
+// CHECK-NEXT: i = 3, j = 5, acc = 36
+// CHECK-NEXT: i = 3, j = 6, acc = 37
+// CHECK-NEXT: i = 3, j = 7, acc = 38
+// CHECK-NEXT: i = 3, j = 8, acc = 39
+// CHECK-NEXT: i = 3, j = 9, acc = 40
+// CHECK-NEXT: i = 4, j = 0, acc = 41
+// CHECK-NEXT: i = 4, j = 1, acc = 42
+// CHECK-NEXT: i = 4, j = 2, acc = 43
+// CHECK-NEXT: i = 4, j = 3, acc = 44
+// CHECK-NEXT: i = 4, j = 4, acc = 45
+// CHECK-NEXT: i = 4, j = 5, acc = 46
+// CHECK-NEXT: i = 4, j = 6, acc = 47
+// CHECK-NEXT: i = 4, j = 7, acc = 48
+// CHECK-NEXT: i = 4, j = 8, acc = 49
+// CHECK-NEXT: i = 4, j = 9, acc = 50
+// CHECK-NEXT: i = 5, j = 0, acc = 51
+// CHECK-NEXT: i = 5, j = 1, acc = 52
+// CHECK-NEXT: i = 5, j = 2, acc = 53
+// CHECK-NEXT: i = 5, j = 3, acc = 54
+// CHECK-NEXT: i = 5, j = 4, acc = 55
+// CHECK-NEXT: i = 5, j = 5, acc = 56
+// CHECK-NEXT: i = 5, j = 6, acc = 57
+// CHECK-NEXT: i = 5, j = 7, acc = 58
+// CHECK-NEXT: i = 5, j = 8, acc = 59
+// CHECK-NEXT: i = 5, j = 9, acc = 60
+// CHECK-NEXT: i = 6, j = 0, acc = 61
+// CHECK-NEXT: i = 6, j = 1, acc = 62
+// CHECK-NEXT: i = 6, j = 2, acc = 63
+// CHECK-NEXT: i = 6, j = 3, acc = 64
+// CHECK-NEXT: i = 6, j = 4, acc = 65
+// CHECK-NEXT: i = 6, j = 5, acc = 66
+// CHECK-NEXT: i = 6, j = 6, acc = 67
+// CHECK-NEXT: i = 6, j = 7, acc = 68
+// CHECK-NEXT: i = 6, j = 8, acc = 69
+// CHECK-NEXT: i = 6, j = 9, acc = 70
+// CHECK-NEXT: i = 7, j = 0, acc = 71
+// CHECK-NEXT: i = 7, j = 1, acc = 72
+// CHECK-NEXT: i = 7, j = 2, acc = 73
+// CHECK-NEXT: i = 7, j = 3, acc = 74
+// CHECK-NEXT: i = 7, j = 4, acc = 75
+// CHECK-NEXT: i = 7, j = 5, acc = 76
+// CHECK-NEXT: i = 7, j = 6, acc = 77
+// CHECK-NEXT: i = 7, j = 7, acc = 78
+// CHECK-NEXT: i = 7, j = 8, acc = 79
+// CHECK-NEXT: i = 7, j = 9, acc = 80
+// CHECK-NEXT: i = 8, j = 0, acc = 81
+// CHECK-NEXT: i = 8, j = 1, acc = 82
+// CHECK-NEXT: i = 8, j = 2, acc = 83
+// CHECK-NEXT: i = 8, j = 3, acc = 84
+// CHECK-NEXT: i = 8, j = 4, acc = 85
+// CHECK-NEXT: i = 8, j = 5, acc = 86
+// CHECK-NEXT: i = 8, j = 6, acc = 87
+// CHECK-NEXT: i = 8, j = 7, acc = 88
+// CHECK-NEXT: i = 8, j = 8, acc = 89
+// CHECK-NEXT: i = 8, j = 9, acc = 90
+// CHECK-NEXT: i = 9, j = 0, acc = 91
+// CHECK-NEXT: i = 9, j = 1, acc = 92
+// CHECK-NEXT: i = 9, j = 2, acc = 93
+// CHECK-NEXT: i = 9, j = 3, acc = 94
+// CHECK-NEXT: i = 9, j = 4, acc = 95
+// CHECK-NEXT: i = 9, j = 5, acc = 96
+// CHECK-NEXT: i = 9, j = 6, acc = 97
+// CHECK-NEXT: i = 9, j = 7, acc = 98
+// CHECK-NEXT: i = 9, j = 8, acc = 99
+// CHECK-NEXT: i = 9, j = 9, acc = 100
+// CHECK-NEXT: test_for = 100
diff --git a/mlir-tensorrt/executor/test/Translation/translate-to-lua.mlir b/mlir-tensorrt/executor/test/Translation/translate-to-lua.mlir
index 26620c8e7..18a16f68d 100644
--- a/mlir-tensorrt/executor/test/Translation/translate-to-lua.mlir
+++ b/mlir-tensorrt/executor/test/Translation/translate-to-lua.mlir
@@ -666,6 +666,7 @@ func.func @cf_if_op(%arg0: i64, %arg1: i64) -> i64 attributes {executor.function
 
 // CHECK-LABEL: cf_if_op
 //  CHECK-SAME: ([[v1:.+]], [[v2:.+]])
+//  CHECK-NEXT:   local [[barg2:.+]] = nil
 //  CHECK-NEXT:   local [[v3:.+]] <const> = _icmp_eq_i64([[v1]], [[v2]])
 //  CHECK-NEXT:   if ([[v3]] == 1) or ([[v3]] == true) then
 //  CHECK-NEXT:     goto label1;
@@ -674,17 +675,17 @@ func.func @cf_if_op(%arg0: i64, %arg1: i64) -> i64 attributes {executor.function
 //  CHECK-NEXT:   end
 //  CHECK-NEXT:   ::label1:: do
 //  CHECK-NEXT:     local [[v4:.+]] <const> = [[v1]] + [[v2]];
-//  CHECK-NEXT:     [[v5:.+]] = [[v4]];
+//  CHECK-NEXT:     [[barg2:.+]] = [[v4]];
 //  CHECK-NEXT:     goto label3;
 //  CHECK-NEXT:   end
 //  CHECK-NEXT:   ::label2:: do
 //  CHECK-NEXT:     local [[v6:.+]] <const> = [[v1]] - [[v2]];
 //  CHECK-NEXT:     local [[v7:.+]] <const> = [[v1]] * [[v6]];
-//  CHECK-NEXT:     [[v5:.+]] = [[v7]];
+//  CHECK-NEXT:     [[barg2:.+]] = [[v7]];
 //  CHECK-NEXT:     goto label3;
 //  CHECK-NEXT:   end
 //  CHECK-NEXT:   ::label3:: do
-//  CHECK-NEXT:     return[[v5]];
+//  CHECK-NEXT:     return[[barg2]];
 //  CHECK-NEXT: end
 
 // -----
@@ -705,27 +706,30 @@ func.func @cf_cond_br_forward_entry(%arg0: i64, %arg1: i64) -> i64 attributes {e
 
 // CHECK-LABEL: function cf_cond_br_forward_entry
 // CHECK-SAME: ([[arg0:.+]], [[arg1:.+]])
+// CHECK-NEXT:   local [[barg2:barg.+]] = nil
+// CHECK-NEXT:   local [[barg3:barg.+]] = nil
+// CHECK-NEXT:   local [[barg4:barg.+]] = nil
 // CHECK-NEXT:   local [[l2:.+]] <const>  = _icmp_eq_i64([[arg0]], [[arg1]]);
 // CHECK-NEXT:   local [[l3:.+]] <const>  = 1;
-// CHECK-NEXT:   local [[barg4:barg.+]] = [[arg0]];
-// CHECK-NEXT:   local [[barg5:barg.+]] = [[arg1]];
+// CHECK-NEXT:   [[barg2]] = [[arg0]]
+// CHECK-NEXT:   [[barg3]] = [[arg1]]
 // CHECK-NEXT:   if ([[l2]] == 1) or ([[l2]] == true) then
 // CHECK-NEXT:     goto label1;
 // CHECK-NEXT:   else
 // CHECK-NEXT:     goto label2;
 // CHECK-NEXT:   end
 // CHECK-NEXT:   ::label1:: do
-// CHECK-NEXT:     local [[l6:.+]] <const>  = l3 + barg4;
-// CHECK-NEXT:     [[barg1:barg.+]] = [[l6]];
+// CHECK-NEXT:     local [[l6:.+]] <const>  = [[l3]] + [[barg2]];
+// CHECK-NEXT:     [[barg4]] = [[l6]];
 // CHECK-NEXT:     goto label3;
 // CHECK-NEXT:   end
 // CHECK-NEXT:   ::label2:: do
-// CHECK-NEXT:     local [[l6:.+]] <const>  = [[l3]] - barg5;
-// CHECK-NEXT:     [[barg1:barg.+]] = [[l6]];
+// CHECK-NEXT:     local [[l6:.+]] <const>  = [[l3]] - [[barg3]];
+// CHECK-NEXT:     [[barg4]] = [[l6]];
 // CHECK-NEXT:     goto label3;
 // CHECK-NEXT:   end
 // CHECK-NEXT:   ::label3:: do
-// CHECK-NEXT:     return [[barg1]];
+// CHECK-NEXT:     return [[barg4]];
 // CHECK-NEXT:   end
 // CHECK-NEXT: end
 
@@ -769,9 +773,11 @@ func.func @cf_for_op(%arg0: i64, %arg1: i64, %arg2: i64) -> i64 attributes {exec
 
 // CHECK-LABEL: cf_for_op
 //  CHECK-SAME: ([[v1:.+]], [[v2:.+]], [[v3:.+]])
+//  CHECK-NEXT:   local [[v5:barg.+]] = nil
+//  CHECK-NEXT:   local [[v6:barg.+]] = nil
 //  CHECK-NEXT:   local [[v4:.+]] <const> = 0;
-//  CHECK-NEXT:   local [[v5:barg.+]] = [[v1]];
-//  CHECK-NEXT:   local [[v6:barg.+]] = [[v4]];
+//  CHECK-NEXT:   [[v5]] = [[v1]];
+//  CHECK-NEXT:   [[v6]] = [[v4]];
 //  CHECK-NEXT:   goto label1;
 //  CHECK-NEXT:   ::label1:: do
 //  CHECK-NEXT:     local [[v7:.+]] <const> = _icmp_slt_i64([[v5]], [[v2]])
diff --git a/mlir-tensorrt/python/CompilerPackage.cmake b/mlir-tensorrt/python/CompilerPackage.cmake
index 94eb40141..44b3dd94e 100644
--- a/mlir-tensorrt/python/CompilerPackage.cmake
+++ b/mlir-tensorrt/python/CompilerPackage.cmake
@@ -65,34 +65,14 @@ foreach(dialect IN LISTS MLIR_TRT_PYTHON_UPSTREAM_DIALECTS_EMBED)
     MLIRPythonSources.Dialects.${dialect})
 endforeach()
 
-# Declare the TensorRT dialect python bindings.
-declare_mlir_dialect_python_bindings(
-  DIALECT_NAME tensorrt
-  ADD_TO_PARENT MLIRTensorRTPythonCompiler.Dialects
-  ROOT_DIR "${SRC_DIR}"
-  TD_FILE
-    dialects/TensorRTOps.td
-  SOURCES
-    dialects/tensorrt.py
-  )
+# Add the tensorrt dialect from the 'tensorrt/python' directory.
+set_property(TARGET MLIRTensorRTPythonCompiler.Dialects APPEND PROPERTY mlir_python_DEPENDS
+  MLIRTensorRTDialectPythonSources.Dialect.tensorrt)
 
 ################################################################################
 # Python extensions.
 ################################################################################
 
-# Declare the PyBind11 module associated with the TensorRT dialect bindings.
-declare_mlir_python_extension(MLIRTensorRTPythonCompiler.Dialects.tensorrt.PyBind
-  MODULE_NAME _tensorrt
-  ADD_TO_PARENT MLIRTensorRTPythonCompiler.Dialects.tensorrt
-  SOURCES
-    bindings/Compiler/Dialects/DialectTensorRT.cpp
-  EMBED_CAPI_LINK_LIBS
-    MLIRTensorRTCAPITensorRTDialect
-    MLIRCAPITransforms
-  PRIVATE_LINK_LIBS
-    LLVMSupport
-  )
-
 # Declare the site initializer.
 declare_mlir_python_extension(MLIRTensorRTPythonCompiler.SiteInitializer.PyBind
   MODULE_NAME _site_initialize_0
diff --git a/mlir-tensorrt/python/bindings/CPyBindInterop.h b/mlir-tensorrt/python/bindings/CPyBindInterop.h
index b398d718a..18cf89539 100644
--- a/mlir-tensorrt/python/bindings/CPyBindInterop.h
+++ b/mlir-tensorrt/python/bindings/CPyBindInterop.h
@@ -12,11 +12,8 @@
 #ifndef BINDINGS_CPYBINDINTEROP
 #define BINDINGS_CPYBINDINTEROP
 
-#if !defined(_MSC_VER)
-#include <Python.h>
-#endif
-
-#include "mlir-tensorrt-c/Compiler/Compiler.h"
+#include "pybind11/pybind11.h"
+#include "llvm/ADT/Twine.h"
 
 #define MTRT_PYTHON_CAPI_PTR_ATTR "_CAPIPtr"
 
@@ -27,7 +24,7 @@
   MTRT_PYTHON_COMPILER_API_NAMESPACE "." #x "." MTRT_PYTHON_CAPI_PTR_ATTR
 
 #define MTRT_RUNTIME_CAPI_PTR_PATH(x)                                          \
-  MTRT_PYTHON_COMPILER_API_NAMESPACE "." #x "." MTRT_PYTHON_CAPI_PTR_ATTR
+  MTRT_PYTHON_RUNTIME_API_NAMESPACE "." #x "." MTRT_PYTHON_CAPI_PTR_ATTR
 
 /// A utility macro that declares inline static functions
 /// `mtrtPython[objName]ToCapsule` and `mtrtPythonCapsuleTo[objName]`. These can
@@ -59,4 +56,27 @@
     return MTRT_##objName{ptr};                                                \
   }
 
+#define MTRT_QUOTE(x) #x
+
+inline pybind11::object mtrtApiObjectToCapsule(pybind11::handle apiObject) {
+  if (!pybind11::hasattr(apiObject, MTRT_PYTHON_CAPI_PTR_ATTR)) {
+    auto repr = pybind11::repr(apiObject).cast<std::string>();
+    throw pybind11::type_error(
+        (llvm::Twine("Expected an MLIR-TensorRT object (got ") + repr + ").")
+            .str());
+  }
+  return apiObject.attr(MTRT_PYTHON_CAPI_PTR_ATTR);
+}
+
+#define MTRT_DEFINE_PYBIND_CASTER(Name, CType)                                 \
+  template <>                                                                  \
+  struct type_caster<CType> {                                                  \
+    PYBIND11_TYPE_CASTER(CType, _(MTRT_QUOTE(Name)));                          \
+    bool load(handle src, bool) {                                              \
+      py::object capsule = mtrtApiObjectToCapsule(src);                        \
+      value = mtrtPythonCapsuleTo##Name(capsule.ptr());                        \
+      return !mtrt##Name##IsNull(value);                                       \
+    }                                                                          \
+  }
+
 #endif // BINDINGS_CPYBINDINTEROP
diff --git a/mlir-tensorrt/python/bindings/Compiler/CompilerPyBind.cpp b/mlir-tensorrt/python/bindings/Compiler/CompilerPyBind.cpp
index 51efb8473..07e1930a3 100644
--- a/mlir-tensorrt/python/bindings/Compiler/CompilerPyBind.cpp
+++ b/mlir-tensorrt/python/bindings/Compiler/CompilerPyBind.cpp
@@ -29,13 +29,6 @@
 namespace py = pybind11;
 using namespace mlirtrt;
 
-///===----------------------------------------------------------------------===//
-// CPython <-> CAPI utilities
-//===----------------------------------------------------------------------===//
-
-MTRT_DEFINE_COMPILER_INLINE_PY_CAPSULE_CASTER_FUNCS(
-    StableHLOToExecutableOptions)
-
 namespace {
 
 //===----------------------------------------------------------------------===//
@@ -71,18 +64,10 @@ class PyStableHLOToExecutableOptions
 public:
   using PyMTRTWrapper::PyMTRTWrapper;
   DECLARE_WRAPPER_CONSTRUCTORS(PyStableHLOToExecutableOptions);
-
   static constexpr auto kMethodTable =
       CAPITable<MTRT_StableHLOToExecutableOptions>{
           mtrtStableHloToExecutableOptionsIsNull,
-          mtrtStableHloToExecutableOptionsDestroy,
-          mtrtPythonCapsuleToStableHLOToExecutableOptions,
-          mtrtPythonStableHLOToExecutableOptionsToCapsule};
-
-  // We need this member so we can keep the Python callback alive long enough.
-  std::function<std::string(MlirOperation)> callback;
-
-  ~PyStableHLOToExecutableOptions() { callback = nullptr; }
+          mtrtStableHloToExecutableOptionsDestroy};
 };
 
 /// Python object type wrapper for `MlirPassManager`.
@@ -328,43 +313,7 @@ PYBIND11_MODULE(_api, m) {
           py::arg("enabled"),
           py::arg("debug_types") = std::vector<std::string>{},
           py::arg("dump_ir_tree_dir") = py::none(),
-          py::arg("dump_tensorrt_dir") = py::none())
-
-#ifdef MLIR_TRT_TARGET_TENSORRT
-      .def(
-          "set_tensorrt_translation_metadata_callback",
-          [](PyStableHLOToExecutableOptions &self,
-             std::function<std::string(MlirOperation)> pyCallback) {
-            // Since we're constructing a C callback, our closures must not
-            // capture. We can pass in the Python callback via the userData
-            // argument.
-            auto callback = [](MlirOperation op, MlirStringCallback append,
-                               void *appendCtx, void *userDataVoid) {
-              auto &pyCallback =
-                  *static_cast<std::function<std::string(MlirOperation)> *>(
-                      userDataVoid);
-
-              if (!pyCallback)
-                return;
-
-              std::string result;
-              try {
-                result = pyCallback(op);
-              } catch (const std::exception &e) {
-                llvm::errs() << e.what() << '\n';
-              }
-
-              append(MlirStringRef{result.data(), result.size()}, appendCtx);
-            };
-
-            self.callback = pyCallback;
-            THROW_IF_MTRT_ERROR(
-                mtrtStableHloToExecutableOptionsSetTensorRTTranslationMetadataCallback(
-                    self, callback, reinterpret_cast<void *>(&self.callback)));
-          },
-          py::arg("callback"), py::keep_alive<1, 2>{})
-#endif
-      ;
+          py::arg("dump_tensorrt_dir") = py::none());
 
   py::class_<PyStableHloPipeline>(m, "StableHloPipeline", py::module_local())
       .def(py::init<>([](PyCompilerClient &client,
diff --git a/mlir-tensorrt/python/bindings/Compiler/SiteInitializer.cpp b/mlir-tensorrt/python/bindings/Compiler/SiteInitializer.cpp
index d2dfb7561..9e71a2594 100644
--- a/mlir-tensorrt/python/bindings/Compiler/SiteInitializer.cpp
+++ b/mlir-tensorrt/python/bindings/Compiler/SiteInitializer.cpp
@@ -20,8 +20,9 @@ PYBIND11_MODULE(_site_initialize_0, m) {
   m.doc() = "MLIR all MLIR-TensorRT related dialects and passes";
 
   m.def("register_dialects", [](MlirDialectRegistry registry) {
-    mlirTensorRTRegisterAllDialects(registry);
+    mtrtCompilerRegisterDialects(registry);
   });
 
-  mlirTensorRTRegisterAllPasses();
+  mtrtCompilerRegisterPasses();
+  mtrtCompilerRegisterTasks();
 }
diff --git a/mlir-tensorrt/python/bindings/Runtime/RuntimePyBind.cpp b/mlir-tensorrt/python/bindings/Runtime/RuntimePyBind.cpp
index ddb0c74e2..76fec1d97 100644
--- a/mlir-tensorrt/python/bindings/Runtime/RuntimePyBind.cpp
+++ b/mlir-tensorrt/python/bindings/Runtime/RuntimePyBind.cpp
@@ -27,7 +27,6 @@
 #include "llvm/Support/raw_ostream.h"
 #include <exception>
 #include <memory>
-#include <numeric>
 #include <stdexcept>
 #include <string_view>
 
@@ -36,85 +35,18 @@ using namespace mlirtrt;
 
 //===----------------------------------------------------------------------===//
 // MTRT_* <-> PyCapsule utilities.
+// These are only needed in the case where we want to implicitly cast the
+// PyBind11 object to the original C API type. This is required to use
+// `std::optional<...>` of the original C API type as an argument type in the
+// functions bound to Python through Pybind11 below.
 //===----------------------------------------------------------------------===//
 
 MTRT_DEFINE_RUNTIME_INLINE_PY_CAPSULE_CASTER_FUNCS(Device)
 MTRT_DEFINE_RUNTIME_INLINE_PY_CAPSULE_CASTER_FUNCS(Stream)
-MTRT_DEFINE_RUNTIME_INLINE_PY_CAPSULE_CASTER_FUNCS(RuntimeValue)
-MTRT_DEFINE_RUNTIME_INLINE_PY_CAPSULE_CASTER_FUNCS(MemRefValue)
-MTRT_DEFINE_RUNTIME_INLINE_PY_CAPSULE_CASTER_FUNCS(ScalarValue)
-
-//===----------------------------------------------------------------------===//
-// PyBind Casters
-//===----------------------------------------------------------------------===//
-static py::object mtrtApiObjectToCapsule(py::handle apiObject) {
-  if (PyCapsule_CheckExact(apiObject.ptr()))
-    return py::reinterpret_borrow<py::object>(apiObject);
-  if (!py::hasattr(apiObject, MTRT_PYTHON_CAPI_PTR_ATTR)) {
-    auto repr = py::repr(apiObject).cast<std::string>();
-    throw py::type_error(
-        (llvm::Twine("Expected an MLIR-TensorRT object (got ") + repr + ").")
-            .str());
-  }
-  return apiObject.attr(MTRT_PYTHON_CAPI_PTR_ATTR);
-}
 
 namespace pybind11::detail {
-/// Casts object (capsule) -> MTRT_Device
-template <>
-struct type_caster<MTRT_Device> {
-  PYBIND11_TYPE_CASTER(MTRT_Device, _("MTRT_Device"));
-  bool load(handle src, bool) {
-    py::object capsule = mtrtApiObjectToCapsule(src);
-    value = mtrtPythonCapsuleToDevice(capsule.ptr());
-    return !mtrtDeviceIsNull(value);
-  }
-};
-
-/// Casts object (capsule) -> MTRT_Stream
-template <>
-struct type_caster<MTRT_Stream> {
-  PYBIND11_TYPE_CASTER(MTRT_Stream, _("MTRT_Stream"));
-  bool load(handle src, bool) {
-    py::object capsule = mtrtApiObjectToCapsule(src);
-    value = mtrtPythonCapsuleToStream(capsule.ptr());
-    return !mtrtStreamIsNull(value);
-  }
-};
-
-/// Casts object (capsule) -> MTRT_RuntimeValue
-template <>
-struct type_caster<MTRT_RuntimeValue> {
-  PYBIND11_TYPE_CASTER(MTRT_RuntimeValue, _("MTRT_RuntimeValue"));
-  bool load(handle src, bool) {
-    py::object capsule = mtrtApiObjectToCapsule(src);
-    value = mtrtPythonCapsuleToRuntimeValue(capsule.ptr());
-    return !mtrtRuntimeValueIsNull(value);
-  }
-};
-
-/// Casts object (capsule) -> MTRT_ScalarValue
-template <>
-struct type_caster<MTRT_ScalarValue> {
-  PYBIND11_TYPE_CASTER(MTRT_ScalarValue, _("MTRT_ScalarValue"));
-  bool load(handle src, bool) {
-    py::object capsule = mtrtApiObjectToCapsule(src);
-    value = mtrtPythonCapsuleToScalarValue(capsule.ptr());
-    return !mtrtScalarValueIsNull(value);
-  }
-};
-
-/// Casts object (capsule) -> MTRT_MemRefValue
-template <>
-struct type_caster<MTRT_MemRefValue> {
-  PYBIND11_TYPE_CASTER(MTRT_MemRefValue, _("MTRT_MemRefValue"));
-  bool load(handle src, bool) {
-    py::object capsule = mtrtApiObjectToCapsule(src);
-    value = mtrtPythonCapsuleToMemRefValue(capsule.ptr());
-    return !mtrtMemRefValueIsNull(value);
-  }
-};
-
+MTRT_DEFINE_PYBIND_CASTER(Device, MTRT_Device);
+MTRT_DEFINE_PYBIND_CASTER(Stream, MTRT_Stream);
 } // namespace pybind11::detail
 
 namespace {
@@ -182,12 +114,10 @@ class PyScalarValue : public PyMTRTWrapper<PyScalarValue, MTRT_ScalarValue> {
   DECLARE_WRAPPER_CONSTRUCTORS(PyScalarValue);
 
   static constexpr auto kMethodTable = CAPITable<MTRT_ScalarValue>{
-      mtrtScalarValueIsNull,
-      [](MTRT_ScalarValue value) {
+      mtrtScalarValueIsNull, [](MTRT_ScalarValue value) {
         (void)value;
         return mtrtStatusGetOk();
-      },
-      mtrtPythonCapsuleToScalarValue, mtrtPythonScalarValueToCapsule};
+      }};
 };
 
 /// Python wrapper around MTRT_MemRefValue.
@@ -197,8 +127,7 @@ class PyMemRefValue : public PyMTRTWrapper<PyMemRefValue, MTRT_MemRefValue> {
   DECLARE_WRAPPER_CONSTRUCTORS(PyMemRefValue);
 
   static constexpr auto kMethodTable = CAPITable<MTRT_MemRefValue>{
-      mtrtMemRefValueIsNull, mtrtMemRefValueDestroy,
-      mtrtPythonCapsuleToMemRefValue, mtrtPythonMemRefValueToCapsule};
+      mtrtMemRefValueIsNull, mtrtMemRefValueDestroy};
 
   MTRT_RuntimeClient getClient() { return mtrtMemRefGetClient(*this); }
 };
@@ -210,8 +139,7 @@ class PyRuntimeValue : public PyMTRTWrapper<PyRuntimeValue, MTRT_RuntimeValue> {
   DECLARE_WRAPPER_CONSTRUCTORS(PyRuntimeValue);
 
   static constexpr auto kMethodTable = CAPITable<MTRT_RuntimeValue>{
-      mtrtRuntimeValueIsNull, mtrtRuntimeValueDestroy,
-      mtrtPythonCapsuleToRuntimeValue, mtrtPythonRuntimeValueToCapsule};
+      mtrtRuntimeValueIsNull, mtrtRuntimeValueDestroy};
 };
 
 /// Python object type wrapper for `MTRT_StableHLOToExecutableOptions`.
diff --git a/mlir-tensorrt/python/bindings/Utils.h b/mlir-tensorrt/python/bindings/Utils.h
index fdc2dee19..cab35cd1e 100644
--- a/mlir-tensorrt/python/bindings/Utils.h
+++ b/mlir-tensorrt/python/bindings/Utils.h
@@ -165,8 +165,7 @@ class PyMTRTWrapper {
     if constexpr (cFuncTable.capsuleToCApi == nullptr) {
       throw py::value_error("object cannot be converted from opaque capsule");
     } else {
-      MTRT_StableHLOToExecutableOptions cObj =
-          cFuncTable.capsuleToCApi(capsule.ptr());
+      CType cObj = cFuncTable.capsuleToCApi(capsule.ptr());
       return py::cast(Derived(cObj), py::return_value_policy::move);
     }
   }
@@ -199,8 +198,7 @@ class PySharedMTRTWrapper {
     if constexpr (cFuncTable.capsuleToCApi == nullptr) {
       throw py::value_error("boject cannot be converted from opaque capsule");
     } else {
-      MTRT_StableHLOToExecutableOptions cObj =
-          cFuncTable.capsuleToCApi(capsule.ptr());
+      CType cObj = cFuncTable.capsuleToCApi(capsule.ptr());
       return py::cast(Derived(cObj), py::return_value_policy::move);
     }
   }
diff --git a/mlir-tensorrt/tensorrt/CMakeLists.txt b/mlir-tensorrt/tensorrt/CMakeLists.txt
index 4bac540ed..dd7ea0677 100644
--- a/mlir-tensorrt/tensorrt/CMakeLists.txt
+++ b/mlir-tensorrt/tensorrt/CMakeLists.txt
@@ -95,4 +95,8 @@ include_directories(${MLIR_TENSORRT_DIALECT_BINARY_DIR}/include)
 add_subdirectory(include/mlir-tensorrt-dialect)
 add_subdirectory(lib)
 add_subdirectory(test)
-add_subdirectory(tensorrt-opt)
+add_subdirectory(tools)
+
+if(MLIR_ENABLE_BINDINGS_PYTHON)
+  add_subdirectory(python)
+endif()
diff --git a/mlir-tensorrt/tensorrt/cmake/TensorRTFunctions.cmake b/mlir-tensorrt/tensorrt/cmake/TensorRTFunctions.cmake
index 9284172dd..675498260 100644
--- a/mlir-tensorrt/tensorrt/cmake/TensorRTFunctions.cmake
+++ b/mlir-tensorrt/tensorrt/cmake/TensorRTFunctions.cmake
@@ -1,18 +1,17 @@
 # --------------------------------------------------------------
-# Creates `targetName` that invokes mlir-tensorrt-tblgen
+# Creates `targetName` that invokes tensorrt-tblgen
 # on a [dialect]Ops.td file to generate implementations for the
 # TensorRTEncodingOpInterface's encodeOp interface method.
 # --------------------------------------------------------------
 function(add_tensorrt_encoding_def_gen targetName inputFileName outputFileName )
     list(TRANSFORM MLIR_INCLUDE_DIRS PREPEND "-I" OUTPUT_VARIABLE _mlir_includes)
     add_custom_command(OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${outputFileName}"
-    COMMAND mlir-tensorrt-tblgen --gen-tensorrt-layer-add-defs
+    COMMAND tensorrt-tblgen --gen-tensorrt-layer-add-defs
       "${inputFileName}"
-      -I "${MLIR_TENSORRT_ROOT_DIR}/include"
-      -I "${MLIR_TENSORRT_ROOT_DIR}/tensorrt/include"
+      -I "${MLIR_TENSORRT_DIALECT_SOURCE_DIR}/include"
       ${_mlir_includes}
       -o "${CMAKE_CURRENT_BINARY_DIR}/${outputFileName}"
-    DEPENDS "${inputFileName}" mlir-tensorrt-tblgen
+    DEPENDS "${inputFileName}" tensorrt-tblgen
     )
     add_custom_target(${targetName} DEPENDS
       "${CMAKE_CURRENT_BINARY_DIR}/${outputFileName}")
diff --git a/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/CMakeLists.txt b/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/CMakeLists.txt
index 06e75d5bd..f328bd891 100644
--- a/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/CMakeLists.txt
+++ b/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/CMakeLists.txt
@@ -4,13 +4,13 @@ include_directories(${MLIR_INCLUDE_DIRS})
 function(add_generate_enum_converters targetName)
   list(TRANSFORM MLIR_INCLUDE_DIRS PREPEND "-I" OUTPUT_VARIABLE _mlir_includes)
   add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/EnumConverters.inc.cpp
-    COMMAND mlir-tensorrt-tblgen --gen-tensorrt-enum-converter-defs
+    COMMAND tensorrt-tblgen --gen-tensorrt-enum-converter-defs
       ${MLIR_TENSORRT_DIALECT_SOURCE_DIR}/include/mlir-tensorrt-dialect/TensorRT/IR/TensorRTEnums.td
       -I ${MLIR_TENSORRT_DIALECT_SOURCE_DIR}/include
       ${_mlir_includes}
       -o ${CMAKE_CURRENT_BINARY_DIR}/EnumConverters.inc.cpp
     DEPENDS ${MLIR_TENSORRT_DIALECT_SOURCE_DIR}/include/mlir-tensorrt-dialect/TensorRT/IR/TensorRTEnums.td
-            mlir-tensorrt-tblgen
+            tensorrt-tblgen
   )
   add_custom_target(${targetName} DEPENDS
     ${CMAKE_CURRENT_BINARY_DIR}/EnumConverters.inc.cpp)
diff --git a/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/NetworkEncoder.h b/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/NetworkEncoder.h
index 777677d4a..6d1391310 100644
--- a/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/NetworkEncoder.h
+++ b/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TensorRTEncodingOpInterface/NetworkEncoder.h
@@ -74,14 +74,11 @@ static constexpr nvinfer1::Weights kNullWeights =
 
 class NvInferNetworkEncoder {
 public:
-  NvInferNetworkEncoder(
-      nvinfer1::INetworkDefinition *network,
-      nvinfer1::IOptimizationProfile *profile, TensorRTVersion version,
-      bool usesStronglyTyped,
-      std::function<std::string(Operation *)> metadataCallback)
+  NvInferNetworkEncoder(nvinfer1::INetworkDefinition *network,
+                        nvinfer1::IOptimizationProfile *profile,
+                        TensorRTVersion version, bool usesStronglyTyped)
       : network(network), profile(profile), version(std::move(version)),
-        usesStronglyTyped(usesStronglyTyped),
-        layerMetadataCallback(std::move(metadataCallback)) {}
+        usesStronglyTyped(usesStronglyTyped) {}
 
   /// Lookup the TRT ITensor* equivalent of a Value.
   nvinfer1::ITensor *lookup(Value v) const;
@@ -253,8 +250,6 @@ class NvInferNetworkEncoder {
   bool hasQDQOps{false};
 
   PluginManager pluginMgr;
-
-  std::function<std::string(Operation *)> layerMetadataCallback;
 };
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TranslateToTensorRT.h b/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TranslateToTensorRT.h
index 5f64c3e17..2d5cbb25f 100644
--- a/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TranslateToTensorRT.h
+++ b/mlir-tensorrt/tensorrt/include/mlir-tensorrt-dialect/Target/TranslateToTensorRT.h
@@ -228,18 +228,17 @@ class TensorRTSerializedTimingCache {
 /// `tensorrt.shape_profile` arguments have been populated for each argument
 /// that has unknown dimensions.
 /// TODO(cbate): add additional options here for builder configuration.
-FailureOr<TensorRTEngineResult> buildFunction(
-    mlir::FunctionOpInterface op, TensorRTBuilderContext &builderContext,
-    TensorRTSerializedTimingCache &serializedTimingCache,
-    const TensorRTTranslationOptions &options =
-        TensorRTTranslationOptions::fromCLFlags(),
-    std::function<std::string(Operation *)> layerMetadataCallback = nullptr);
+FailureOr<TensorRTEngineResult>
+buildFunction(mlir::FunctionOpInterface op,
+              TensorRTBuilderContext &builderContext,
+              TensorRTSerializedTimingCache &serializedTimingCache,
+              const TensorRTTranslationOptions &options =
+                  TensorRTTranslationOptions::fromCLFlags());
 
 /// Create an instance of a translate-to-tensorrt pass using an existing
 /// TensorRTBuilderContext.
 std::unique_ptr<mlir::Pass> createTranslateTensorRTPass(
     std::shared_ptr<tensorrt::TensorRTBuilderContext> context,
-    std::function<std::string(Operation *)> layerMetadataCallback,
     TensorRTTranslationOptions options =
         TensorRTTranslationOptions::fromCLFlags());
 
diff --git a/mlir-tensorrt/python/bindings/Compiler/Dialects/DialectTensorRT.cpp b/mlir-tensorrt/tensorrt/lib/Bindings/Python/DialectTensorRT.cpp
similarity index 95%
rename from mlir-tensorrt/python/bindings/Compiler/Dialects/DialectTensorRT.cpp
rename to mlir-tensorrt/tensorrt/lib/Bindings/Python/DialectTensorRT.cpp
index 394d14005..0e134a405 100644
--- a/mlir-tensorrt/python/bindings/Compiler/Dialects/DialectTensorRT.cpp
+++ b/mlir-tensorrt/tensorrt/lib/Bindings/Python/DialectTensorRT.cpp
@@ -1,7 +1,6 @@
-
-//===- TensorRTModule.cpp -------------------------------------------------===//
+//===- DialectTensorRT.cpp ------------------------------------------------===//
 //
-// Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+// Copyright (c) 2023-2024, NVIDIA CORPORATION. All rights reserved.
 //
 //===----------------------------------------------------------------------===//
 ///
diff --git a/mlir-tensorrt/tensorrt/lib/Target/TensorRTEncodingOpInterface/NetworkEncoder.cpp b/mlir-tensorrt/tensorrt/lib/Target/TensorRTEncodingOpInterface/NetworkEncoder.cpp
index da5f73408..8d0d9e1f4 100644
--- a/mlir-tensorrt/tensorrt/lib/Target/TensorRTEncodingOpInterface/NetworkEncoder.cpp
+++ b/mlir-tensorrt/tensorrt/lib/Target/TensorRTEncodingOpInterface/NetworkEncoder.cpp
@@ -269,8 +269,6 @@ void NvInferNetworkEncoder::setMetadata(nvinfer1::ILayer *layer,
                                         Operation *sourceOp) {
   std::string name = createName(namesSet, sourceOp);
   layer->setName(name.c_str());
-  if (layerMetadataCallback)
-    layer->setMetadata(layerMetadataCallback(sourceOp).c_str());
 }
 
 nvinfer1::ITensor *NvInferNetworkEncoder::lookup(Value v) const {
diff --git a/mlir-tensorrt/tensorrt/lib/Target/TranslateToTensorRT.cpp b/mlir-tensorrt/tensorrt/lib/Target/TranslateToTensorRT.cpp
index a2d775514..d551234c8 100644
--- a/mlir-tensorrt/tensorrt/lib/Target/TranslateToTensorRT.cpp
+++ b/mlir-tensorrt/tensorrt/lib/Target/TranslateToTensorRT.cpp
@@ -384,11 +384,11 @@ static LogicalResult maybeSetStronglyTypedOption(
 #endif
 }
 
-FailureOr<TensorRTEngineResult> tensorrt::buildFunction(
-    mlir::FunctionOpInterface op, TensorRTBuilderContext &builderContext,
-    TensorRTSerializedTimingCache &serializedTimingCache,
-    const TensorRTTranslationOptions &opts,
-    std::function<std::string(Operation *)> layerMetadataCallback) {
+FailureOr<TensorRTEngineResult>
+tensorrt::buildFunction(mlir::FunctionOpInterface op,
+                        TensorRTBuilderContext &builderContext,
+                        TensorRTSerializedTimingCache &serializedTimingCache,
+                        const TensorRTTranslationOptions &opts) {
   assert(builderContext.getBuilder() && "expected valid builder context");
   std::unique_ptr<nvinfer1::IBuilder> &builder = builderContext.getBuilder();
 
@@ -409,9 +409,9 @@ FailureOr<TensorRTEngineResult> tensorrt::buildFunction(
   nvinfer1::IOptimizationProfile *optimProfile =
       builder->createOptimizationProfile();
 
-  NvInferNetworkEncoder encoder(
-      network.get(), optimProfile, builderContext.getTensorRTVersion(),
-      opts.enableStronglyTyped, layerMetadataCallback);
+  NvInferNetworkEncoder encoder(network.get(), optimProfile,
+                                builderContext.getTensorRTVersion(),
+                                opts.enableStronglyTyped);
 
   // Currently we only support single-block functions with unique return
   // terminator ops.
@@ -733,10 +733,9 @@ class TranslateToTensorRTEnginePass
 
   explicit TranslateToTensorRTEnginePass(
       std::shared_ptr<TensorRTBuilderContext> builderContext,
-      TensorRTTranslationOptions options,
-      std::function<std::string(Operation *)> metadataCallback)
-      : builderContext(builderContext), translationOptions(std::move(options)),
-        layerMetadataCallback(std::move(metadataCallback)) {}
+      TensorRTTranslationOptions options)
+      : builderContext(builderContext), translationOptions(std::move(options)) {
+  }
 
   LogicalResult initialize(MLIRContext *context) final {
     if (!this->builderContext) {
@@ -806,9 +805,8 @@ class TranslateToTensorRTEnginePass
       LLVM_DEBUG(DBGS() << "starting to build TensorRT engine for function "
                         << func.getName() << "\n");
 
-      FailureOr<TensorRTEngineResult> engineResult =
-          buildFunction(func, *builderContext, *timingCache, translationOptions,
-                        layerMetadataCallback);
+      FailureOr<TensorRTEngineResult> engineResult = buildFunction(
+          func, *builderContext, *timingCache, translationOptions);
       if (failed(engineResult) || !engineResult->serializedEngine) {
         func.emitError() << "failed to translate function '" << func.getName()
                          << "' to a TensorRT engine";
@@ -889,15 +887,11 @@ class TranslateToTensorRTEnginePass
 
   /// Options affecting TensorRT translation.
   TensorRTTranslationOptions translationOptions;
-
-  std::function<std::string(Operation *)> layerMetadataCallback;
 };
 } // namespace
 
 std::unique_ptr<mlir::Pass> tensorrt::createTranslateTensorRTPass(
     std::shared_ptr<tensorrt::TensorRTBuilderContext> context,
-    std::function<std::string(Operation *)> layerMetadataCallback,
     TensorRTTranslationOptions options) {
-  return std::make_unique<TranslateToTensorRTEnginePass>(context, options,
-                                                         layerMetadataCallback);
+  return std::make_unique<TranslateToTensorRTEnginePass>(context, options);
 }
diff --git a/mlir-tensorrt/tensorrt/python/CMakeLists.txt b/mlir-tensorrt/tensorrt/python/CMakeLists.txt
new file mode 100644
index 000000000..6683bcad0
--- /dev/null
+++ b/mlir-tensorrt/tensorrt/python/CMakeLists.txt
@@ -0,0 +1,67 @@
+include(AddMLIRPython)
+
+# Specifies that all MLIR packages are co-located under the `mlir_tensorrt`
+# top-level package (the API has been embedded in a relocatable way).
+# TODO: Add an upstream cmake param for this vs having a global here.
+add_compile_definitions("MLIR_PYTHON_PACKAGE_PREFIX=mlir_tensorrt.")
+
+
+################################################################################
+# Sources
+################################################################################
+
+declare_mlir_python_sources(MLIRTensorRTDialectPythonSources)
+declare_mlir_python_sources(MLIRTensorRTDialectPythonSources.Dialect
+  ADD_TO_PARENT MLIRTensorRTDialectPythonSources)
+
+declare_mlir_dialect_python_bindings(
+  ADD_TO_PARENT MLIRTensorRTDialectPythonSources.Dialect
+  ROOT_DIR "${CMAKE_CURRENT_SOURCE_DIR}/mlir_tensorrt"
+  TD_FILE dialects/PythonTensorRTOps.td
+  SOURCES
+    dialects/tensorrt.py
+  DIALECT_NAME tensorrt)
+
+set(PYTHON_SOURCE_DIR "${MLIR_TENSORRT_DIALECT_SOURCE_DIR}/lib/Bindings/Python")
+
+declare_mlir_python_extension(MLIRTensorRTDialectPythonSources.Dialect.tensorrt.PyBind
+  MODULE_NAME _tensorrt
+  ADD_TO_PARENT MLIRTensorRTDialectPythonSources.Dialect.tensorrt
+  ROOT_DIR "${PYTHON_SOURCE_DIR}"
+  SOURCES
+    DialectTensorRT.cpp
+  EMBED_CAPI_LINK_LIBS
+    MLIRTensorRTCAPITensorRTDialect
+)
+
+################################################################################
+# Common CAPI
+################################################################################
+
+add_mlir_python_common_capi_library(MLIRTensorRTDialectPythonCAPI
+  INSTALL_COMPONENT MLIRTensorRTDialectPythonModules
+  INSTALL_DESTINATION python_packages/tensorrt/mlir_tensorrt/_mlir_libs
+  OUTPUT_DIRECTORY "${MLIR_TENSORRT_DIALECT_BINARY_DIR}/python_packages/tensorrt/mlir_tensorrt/_mlir_libs"
+  RELATIVE_INSTALL_ROOT "../../../.."
+  DECLARED_SOURCES
+    MLIRTensorRTDialectPythonSources
+    # TODO: Remove this in favor of showing fine grained registration once
+    # available.
+    MLIRPythonSources.Core
+)
+
+################################################################################
+# Instantiation of all Python modules
+################################################################################
+
+add_mlir_python_modules(MLIRTensorRTDialectPythonModules
+  ROOT_PREFIX "${MLIR_TENSORRT_DIALECT_BINARY_DIR}/python_packages/tensorrt/mlir_tensorrt"
+  INSTALL_PREFIX "python_packages/tensorrt/mlir_tensorrt"
+  DECLARED_SOURCES
+    MLIRTensorRTDialectPythonSources
+    # TODO: Remove this in favor of showing fine grained registration once
+    # available.
+    MLIRPythonSources.Core
+  COMMON_CAPI_LINK_LIBS
+    MLIRTensorRTDialectPythonCAPI
+  )
diff --git a/mlir-tensorrt/python/mlir_tensorrt_compiler/mlir_tensorrt/compiler/dialects/TensorRTOps.td b/mlir-tensorrt/tensorrt/python/mlir_tensorrt/dialects/PythonTensorRTOps.td
similarity index 82%
rename from mlir-tensorrt/python/mlir_tensorrt_compiler/mlir_tensorrt/compiler/dialects/TensorRTOps.td
rename to mlir-tensorrt/tensorrt/python/mlir_tensorrt/dialects/PythonTensorRTOps.td
index b5ab9ecb1..aff4f1b3a 100644
--- a/mlir-tensorrt/python/mlir_tensorrt_compiler/mlir_tensorrt/compiler/dialects/TensorRTOps.td
+++ b/mlir-tensorrt/tensorrt/python/mlir_tensorrt/dialects/PythonTensorRTOps.td
@@ -21,9 +21,10 @@
 /// TensorRT dialect ops defintion for automatic python bindings generation.
 ///
 //===----------------------------------------------------------------------===//
-#ifndef MLIR_TENSORRT_COMPILER_MLIR_TENSORRT_COMPILER_DIALECTS_TENSORRTOPS
-#define MLIR_TENSORRT_COMPILER_MLIR_TENSORRT_COMPILER_DIALECTS_TENSORRTOPS
+#ifndef TENSORRT_PYTHON_DIALECTS_PYTHONTENSORRTOPS
+#define TENSORRT_PYTHON_DIALECTS_PYTHONTENSORRTOPS
 
 include "mlir-tensorrt-dialect/TensorRT/IR/TensorRTOps.td"
 
-#endif // MLIR_TENSORRT_COMPILER_MLIR_TENSORRT_COMPILER_DIALECTS_TENSORRTOPS
+#endif // TENSORRT_PYTHON_DIALECTS_PYTHONTENSORRTOPS
+
diff --git a/mlir-tensorrt/python/mlir_tensorrt_compiler/mlir_tensorrt/compiler/dialects/tensorrt.py b/mlir-tensorrt/tensorrt/python/mlir_tensorrt/dialects/tensorrt.py
similarity index 100%
rename from mlir-tensorrt/python/mlir_tensorrt_compiler/mlir_tensorrt/compiler/dialects/tensorrt.py
rename to mlir-tensorrt/tensorrt/python/mlir_tensorrt/dialects/tensorrt.py
diff --git a/mlir-tensorrt/tensorrt/tools/CMakeLists.txt b/mlir-tensorrt/tensorrt/tools/CMakeLists.txt
new file mode 100644
index 000000000..06717ac95
--- /dev/null
+++ b/mlir-tensorrt/tensorrt/tools/CMakeLists.txt
@@ -0,0 +1,2 @@
+add_subdirectory(tensorrt-tblgen)
+add_subdirectory(tensorrt-opt)
diff --git a/mlir-tensorrt/tensorrt/tensorrt-opt/CMakeLists.txt b/mlir-tensorrt/tensorrt/tools/tensorrt-opt/CMakeLists.txt
similarity index 100%
rename from mlir-tensorrt/tensorrt/tensorrt-opt/CMakeLists.txt
rename to mlir-tensorrt/tensorrt/tools/tensorrt-opt/CMakeLists.txt
diff --git a/mlir-tensorrt/tensorrt/tensorrt-opt/tensorrt-opt.cpp b/mlir-tensorrt/tensorrt/tools/tensorrt-opt/tensorrt-opt.cpp
similarity index 100%
rename from mlir-tensorrt/tensorrt/tensorrt-opt/tensorrt-opt.cpp
rename to mlir-tensorrt/tensorrt/tools/tensorrt-opt/tensorrt-opt.cpp
diff --git a/mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/CMakeLists.txt b/mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/CMakeLists.txt
new file mode 100644
index 000000000..74f0c053c
--- /dev/null
+++ b/mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/CMakeLists.txt
@@ -0,0 +1,11 @@
+
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+add_llvm_executable(tensorrt-tblgen tensorrt-tblgen.cpp)
+llvm_update_compile_flags(tensorrt-tblgen)
+target_link_libraries(tensorrt-tblgen PRIVATE
+  MLIRTblgenLib
+  )
+
+mlir_check_all_link_libraries(tensorrt-tblgen)
diff --git a/mlir-tensorrt/tools/MlirTensorRtTblgen.cpp b/mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/tensorrt-tblgen.cpp
similarity index 99%
rename from mlir-tensorrt/tools/MlirTensorRtTblgen.cpp
rename to mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/tensorrt-tblgen.cpp
index 17d22c0da..bacc8cea1 100644
--- a/mlir-tensorrt/tools/MlirTensorRtTblgen.cpp
+++ b/mlir-tensorrt/tensorrt/tools/tensorrt-tblgen/tensorrt-tblgen.cpp
@@ -1,4 +1,4 @@
-//===- MlirTensorRtTblgen.cpp -----------------------------------*- C++ -*-===//
+//===- tensorrt-tblgen.cpp --------------------------------------*- C++ -*-===//
 //
 // SPDX-FileCopyrightText: Copyright 2024 NVIDIA CORPORATION & AFFILIATES.
 // All rights reserved.
@@ -18,7 +18,7 @@
 //
 //===----------------------------------------------------------------------===//
 ///
-/// This file contains the main function for mlir-tensorrt-tblgen.
+/// This file contains the main function for tensorrt-tblgen.
 ///
 //===----------------------------------------------------------------------===//
 
diff --git a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_layer_metadata_callback.py b/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_layer_metadata_callback.py
deleted file mode 100644
index 39bf413db..000000000
--- a/mlir-tensorrt/test/python/mlir_tensorrt_compiler/compiler_api/test_layer_metadata_callback.py
+++ /dev/null
@@ -1,95 +0,0 @@
-# REQUIRES: tensorrt-version-ge-10.0
-# REQUIRES: host-has-at-least-1-gpus
-# REQUIRES: debug-print
-# RUN: %PYTHON %s 2>&1 | FileCheck %s
-
-import mlir_tensorrt.compiler.api as api
-from mlir_tensorrt.compiler.ir import *
-import tempfile
-import glob
-import os
-import json
-import gc
-
-STATIC_ASM = """
-func.func @main(%arg0: tensor<2x3x4xf32>) -> tensor<2x3x4xf32> {
-  %1 = stablehlo.add %arg0, %arg0 : (tensor<2x3x4xf32>, tensor<2x3x4xf32>) -> tensor<2x3x4xf32>
-  func.return %1 : tensor<2x3x4xf32>
-}
-"""
-
-
-def layer_metadata_callback(op) -> str:
-    print("layer_metadata_callback CALLED")
-    return "TEST_CUSTOM_METADATA"
-
-
-def compile_asm():
-    with Context() as context:
-        m = Module.parse(STATIC_ASM)
-        client = api.CompilerClient(context)
-
-        with tempfile.TemporaryDirectory() as tmp:
-            opts = api.StableHLOToExecutableOptions(
-                client,
-                [
-                    "--tensorrt-builder-opt-level=3",
-                    "--tensorrt-strongly-typed=false",
-                    "--debug=true",
-                    "--debug-only=translate-to-tensorrt,stablehlo-clustering",
-                    f"--tensorrt-layer-info-dir={tmp}",
-                ],
-            )
-
-            opts.set_tensorrt_translation_metadata_callback(layer_metadata_callback)
-
-            api.compiler_stablehlo_to_executable(client, m.operation.clone(), opts)
-
-            json_file = glob.glob(os.path.join(tmp, "*"))[0]
-
-            metadata = json.load(open(json_file, "r"))["Layers"][-1]["Metadata"]
-            print(metadata)
-
-
-print("Compiling ASM")
-compile_asm()
-# CHECK-LABEL: Compiling ASM
-# CHECK: layer_metadata_callback CALLED
-# CHECK: TEST_CUSTOM_METADATA
-
-
-def layer_metadata_callback2(op) -> str:
-    print("layer_metadata_callback2 CALLED")
-    return "TEST_CUSTOM_METADATA2"
-
-
-def compile_multiple():
-    # Compile multiple times with different callbacks to ensure pass manager caching doesn't
-    # cause issues.
-    with Context() as context:
-        m = Module.parse(STATIC_ASM)
-        client = api.CompilerClient(context)
-        opts0 = api.StableHLOToExecutableOptions(
-            client,
-            ["--tensorrt-builder-opt-level=3", "--tensorrt-strongly-typed=false"],
-        )
-        opts0.set_tensorrt_translation_metadata_callback(layer_metadata_callback)
-        api.compiler_stablehlo_to_executable(client, m.operation.clone(), opts0)
-
-        del opts0
-        gc.collect()
-
-        opts1 = api.StableHLOToExecutableOptions(
-            client,
-            ["--tensorrt-builder-opt-level=3", "--tensorrt-strongly-typed=false"],
-        )
-        opts1.set_tensorrt_translation_metadata_callback(layer_metadata_callback2)
-        api.compiler_stablehlo_to_executable(client, m.operation.clone(), opts1)
-
-
-print("Checking multiple compile calls")
-compile_multiple()
-
-# CHECK-LABEL: Checking multiple compile calls
-# CHECK: layer_metadata_callback CALLED
-# CHECK: layer_metadata_callback2 CALLED