Merge pull request #8 from InfiniTensor/dev
Turn `tensor.Tensor.shape` and `tensor.Tensor.strides` into tuples and add logo
voltjia authored Sep 9, 2024
2 parents 895beec + c96c472 commit 34d3e04
Showing 6 changed files with 25 additions and 5 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -1,5 +1,7 @@
# NineToothed

![NineToothed Logo](docs/source/_static/ninetoothed-logo.png)

A domain-specific language (DSL) based on Triton but providing higher-level abstractions.

**Other language versions: [English](README.md), [简体中文](docs/README.zh.md).**
@@ -65,4 +67,4 @@ def matmul_kernel(a: a_tiled, b: b_tiled, c: c_tiled):

For matrix multiplication, we also have three tensor parameters, but the tiling method is more complex than for vector addition. We denote the three matrices as $A$, $B$, and $C$, where $A$ and $B$ are inputs and $C$ is the output. Tiling $C$ is simple; we just need to divide it into blocks of size `(BLOCK_SIZE_M, BLOCK_SIZE_N)` by rows and columns. Once each block computes its result, the entire $C$ is computed. However, how should we tile $A$ and $B$? The answer is to introduce another meta-parameter, `BLOCK_SIZE_K`. This way, we can divide $A$ into blocks of size `(BLOCK_SIZE_M, BLOCK_SIZE_K)` and $B$ into blocks of size `(BLOCK_SIZE_K, BLOCK_SIZE_N)`. However, for matrix multiplication, $A$ and $B$ do not correspond block by block; each row of $A$ needs to correspond to each column of $B$. Therefore, we need to further `tile` $A$ and $B$ by rows and columns, respectively. At this point, we have a set of row blocks of $A$ and a set of column blocks of $B$. However, each row block of $A$ must correspond to every column block of $B$. This is where `expand` comes in: we `expand` the row blocks of $A$ along the columns to the number of columns of $C$, and the column blocks of $B$ along the rows to the number of rows of $C$. With this, we have successfully tiled $A$, $B$, and $C$. In fact, the meta-operations up to this point are already enough to write a kernel function. However, we notice that the levels where the row blocks and column blocks reside are two-dimensional, with sizes of the forms `(1, ...)` and `(..., 1)`. This means that, if no further operations are performed, we would have to access the row blocks and column blocks as `a[0, k]` and `b[k, 0]`, and to find the range of `k` from `a`, we would need `a.shape[1]`. But dimensions of size `1` can be removed entirely, which is why we add the two lines of `squeeze`. The `dtype` here refers to the data type, which in PyTorch is generally an integer or floating-point type such as `torch.float32`; however, since meta-operations like `tile` can be performed in NineToothed, a `dtype` can also be a `Tensor`. In other words, NineToothed has a concept of "tensors that store tensors". In summary, these two lines operate on the tensors stored in the outermost tensor, removing their dimensions of size `1`. This way, we can access the row and column blocks as `a[k]` and `b[k]`, and find the range of `k` with `a.shape[0]`.
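Concretely, the tiling walkthrough above might be arranged as in the following sketch; the `Symbol`/`Tensor` constructors and the exact `tile`/`expand`/`squeeze` signatures are assumptions based on the surrounding text, not quoted from the repository:

```python
# A sketch of the tiling described above; API details are assumed.
from ninetoothed import Symbol, Tensor

BLOCK_SIZE_M = Symbol("BLOCK_SIZE_M", meta=True)
BLOCK_SIZE_N = Symbol("BLOCK_SIZE_N", meta=True)
BLOCK_SIZE_K = Symbol("BLOCK_SIZE_K", meta=True)

# Divide C into (BLOCK_SIZE_M, BLOCK_SIZE_N) blocks by rows and columns.
c_tiled = Tensor(2).tile((BLOCK_SIZE_M, BLOCK_SIZE_N))

# Divide A and B into blocks, then group A by rows and B by columns.
a_tiled = Tensor(2).tile((BLOCK_SIZE_M, BLOCK_SIZE_K)).tile((1, -1))
b_tiled = Tensor(2).tile((BLOCK_SIZE_K, BLOCK_SIZE_N)).tile((-1, 1))

# Expand the row blocks of A along the columns to the number of columns
# of C, and the column blocks of B along the rows to the number of rows of C.
a_tiled = a_tiled.expand((-1, c_tiled.shape[1]))
b_tiled = b_tiled.expand((c_tiled.shape[0], -1))

# Squeeze away the size-1 dimensions of the inner tensors, so the kernel
# can index a[k] and b[k] and take range(a.shape[0]).
a_tiled.dtype = a_tiled.dtype.squeeze(0)
b_tiled.dtype = b_tiled.dtype.squeeze(1)
```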

With tiling done, the rest is simple. In the function body, we define an `accumulator` to accumulate intermediate results. We then iterate through the corresponding row blocks of $A$ and column blocks of B, multiplying them and accumulating the results in `accumulator`. Finally, we place the `accumulator` in the corresponding block of $C$. Since each block of the parameter tensors undergoes this operation, the multiplication is completed for the whole tensors as well.
With tiling done, the rest is simple. In the function body, we define an `accumulator` to accumulate intermediate results. We then iterate through the corresponding row blocks of $A$ and column blocks of $B$, multiplying them and accumulating the results in `accumulator`. Finally, we place the `accumulator` in the corresponding block of $C$. Since each block of the parameter tensors undergoes this operation, the multiplication is completed for the whole tensors as well.
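Continuing the sketch above, the function body might be written as follows; the `ntl` helper names (`zeros`, `dot`, `float32`, `float16`) are assumptions modeled on Triton's language module, not quoted from the repository:

```python
# A sketch of the kernel body described above; helper names are assumed.
import ninetoothed.language as ntl
from ninetoothed import jit

@jit
def matmul_kernel(a: a_tiled, b: b_tiled, c: c_tiled):
    # Accumulate intermediate results over the k dimension.
    accumulator = ntl.zeros(c.shape, dtype=ntl.float32)
    for k in range(a.shape[0]):
        accumulator += ntl.dot(a[k], b[k])
    # Place the accumulator in the corresponding block of C.
    c = accumulator.to(ntl.float16)
```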
2 changes: 2 additions & 0 deletions docs/README.zh.md
@@ -1,5 +1,7 @@
# 九齿

![九齿 Logo](source/_static/ninetoothed-logo.png)

A domain-specific language (DSL) based on Triton but providing higher-level abstractions.

**Other language versions: [English](../README.md), [简体中文](README.zh.md).**
Binary file added docs/source/_static/ninetoothed-logo.png
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "ninetoothed"
version = "0.4.0"
version = "0.5.0"
authors = [{ name = "Jiacheng Huang", email = "huangjiacheng0709@outlook.com" }]
description = "A domain-specific language based on Triton but providing higher-level abstraction."
readme = "README.md"
2 changes: 1 addition & 1 deletion src/ninetoothed/jit.py
@@ -178,7 +178,7 @@ def visit_Attribute(self, node):
if isinstance(value, Tensor):
inner = value.dtype

return Symbol(inner.__dict__[node.attr]).node
return Symbol(getattr(inner, node.attr)).node

self.generic_visit(node)

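This one-line change matters for the new `shape`/`strides` properties introduced in `tensor.py` below: a property lives on the class, not in the instance `__dict__`, so the old `__dict__` lookup would fail for it, while `getattr` resolves it. A plain-Python illustration (not NineToothed code):

```python
class Example:
    @property
    def shape(self):
        return (2, 3)

e = Example()
print(getattr(e, "shape"))    # (2, 3) -- getattr resolves the property
print("shape" in e.__dict__)  # False  -- a __dict__ lookup would raise KeyError
```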
20 changes: 18 additions & 2 deletions src/ninetoothed/tensor.py
@@ -24,8 +24,8 @@ def __init__(
self.name = f"tensor_{type(self).num_instances}"

if ndim is not None:
self.shape = [Symbol(self.size_string(i)) for i in range(ndim)]
self.strides = [Symbol(self.stride_string(i)) for i in range(ndim)]
self.shape = (Symbol(self.size_string(i)) for i in range(ndim))
self.strides = (Symbol(self.stride_string(i)) for i in range(ndim))
else:
self.shape = shape

@@ -191,6 +191,22 @@ def stride(self, dim=None):

return self.strides[dim]

@property
def shape(self):
return self._shape

@shape.setter
def shape(self, value):
self._shape = tuple(value)

@property
def strides(self):
return self._strides

@strides.setter
def strides(self, value):
self._strides = tuple(value)

@property
def ndim(self):
return len(self.shape)
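Note how the two halves of this change fit together: `__init__` now assigns generator expressions, and the new setters normalize whatever iterable they receive with `tuple(value)`, so `shape` and `strides` always come out as tuples. A plain-Python sketch of the pattern (the `size_` names are hypothetical, not NineToothed code):

```python
class Minimal:
    def __init__(self, ndim):
        # A generator expression goes in ...
        self.shape = (f"size_{i}" for i in range(ndim))

    @property
    def shape(self):
        return self._shape

    @shape.setter
    def shape(self, value):
        # ... and an immutable tuple comes out.
        self._shape = tuple(value)

m = Minimal(3)
print(m.shape)  # ('size_0', 'size_1', 'size_2')
```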
