readArray methods #156

evanhaldane · 2018-05-04T23:00:50Z

evanhaldane · 2018-05-04T23:04:17Z

Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala

      *
      * @group slow
      */
    def flatArray: Future[Array[closure.JvmValue]] = {
      flatBuffer.intransitiveMap(closure.valueType.memory.toArray).run
    }

+    /** Recursive function to reshape a flat array into a multi-dimensional one
+      */
+    private[Tensors] def reshapeArray(a:Array[_], b:Array[Int]): Array[_] = {


This needs a check that the product of the dimensions in b equals the length of a.

Also, the variables should be renamed to something more informative.
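The length check requested above could look roughly like the following. This is a minimal sketch, not the code in the PR: `ReshapeSketch` and `reshape` are hypothetical names, and it works on `Seq[Any]` rather than `Array[_]` to sidestep the `ClassTag` handling that nested arrays would need.

```scala
object ReshapeSketch {

  /** Reshapes a flat sequence into nested sequences according to `shape`.
    * Hypothetical helper for illustration only.
    */
  def reshape(flat: Seq[Any], shape: Seq[Int]): Seq[Any] = {
    // A scalar has an empty shape and cannot be represented as a Seq.
    require(shape.nonEmpty, "a scalar (empty shape) cannot be returned as a Seq")
    require(shape.forall(_ > 0), "every dimension must be positive")
    // The check asked for in the review: the dimensions must account for every element.
    require(
      shape.product == flat.length,
      s"shape ${shape.mkString("x")} does not match ${flat.length} elements"
    )
    if (shape.length == 1) {
      flat
    } else {
      // Each slice along the first dimension holds this many elements.
      val sliceLength = flat.length / shape.head
      flat.grouped(sliceLength).map(reshape(_, shape.tail)).toVector
    }
  }
}
```

With this sketch, reshaping six elements to shape 2 x 3 yields two rows of three, while a shape of 2 x 2 is rejected by `require`.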

Atry · 2018-05-05T04:15:41Z

Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala

      *
      * @group slow
      */
    def flatArray: Future[Array[closure.JvmValue]] = {
      flatBuffer.intransitiveMap(closure.valueType.memory.toArray).run
    }

+    /** Recursive function to reshape a flat array into a multi-dimensional one
+      */
+    private[Tensors] def reshapeArray(array:Array[_], shape:Array[Int]): Array[_] = {


When shape is Array.empty, the tensor is a scalar value, not an Array.

This method should either throw an exception for a scalar value, or change the return type.

evanhaldane · 2018-05-07

Right! If scalars are considered distinct from 1D arrays, then I think it makes sense that the function readArray would throw an exception.

Atry · 2018-05-07

If scalars are considered distinct from 1D arrays, why do 1D arrays and 2D arrays share the same method? Consider that the Array[_] return type is not actually usable without an asInstanceOf.

There are three options:

Create two methods, one for scalars and one for arrays of one or more dimensions

Create one method returning Any for both scalars and arrays of one or more dimensions

Create n methods, one for each tensor rank

What do you think?
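As a rough illustration, the three options could correspond to signatures like these. All trait and method names are hypothetical, `Float` is assumed as the element type, and `scala.concurrent.Future` stands in for whichever `Future` type the library actually uses.

```scala
import scala.concurrent.Future

// Option 1: one method for scalars, one for arrays of any rank >= 1.
// The caller still has to cast the Array[_] to the nested type it expects.
trait TwoMethods {
  def readScalar: Future[Float]
  def readArray: Future[Array[_]]
}

// Option 2: a single method for everything; every caller casts the result.
trait OneAnyMethod {
  def read: Future[Any]
}

// Option 3: one method per rank. Results are statically typed, but the
// number of methods grows with the highest rank supported.
trait MethodPerRank {
  def readScalar: Future[Float]
  def read1DArray: Future[Array[Float]]
  def read2DArray: Future[Array[Array[Float]]]
}
```

Only the third option lets a caller use the result without an `asInstanceOf`.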

Atry · 2018-05-05T04:15:56Z

Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala

+
+    /** Recursive function to reshape a flat Seq into a multi-dimensional one
+      */
+    private[Tensors] def reshapeSeq(a:Seq[_], b:Array[Int]): Seq[_] = {


When shape is Array.empty, the tensor is a scalar value, not a Seq.

This method should either throw an exception for a scalar value, or change the return type.

Atry · 2018-05-05T04:20:25Z

Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala

+      * @group slow
+      */
+
+    def toArray = {flatArray.map {z=> reshapeArray(z, shape)}}


Since toArray and toSeq in the Scala collection library are not asynchronous, the name toArray will surprise people if it returns a Future.

Maybe we should rename it to readArray, which suggests a rather expensive operation.

evanhaldane · 2018-05-08T18:22:08Z

You're right that the user would need to know the number of dimensions and cast with asInstanceOf anyway, so how about this type of approach with explicit methods, sort of the same way Array.ofDim works?

(I haven't touched toSeq yet. Let me know what you think)
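A sketch of what explicit per-rank methods might look like, in the spirit of the `Array.ofDim` overloads. `FlatTensor` is a hypothetical stand-in for the real `Tensor` (a flat buffer plus a shape), `Float` is assumed as the element type, and `scala.concurrent.Future` is assumed in place of the library's own `Future`.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for a tensor: a flat buffer plus a shape.
final class FlatTensor(flat: Array[Float], shape: Array[Int])(implicit ec: ExecutionContext) {
  require(shape.forall(_ > 0), "every dimension must be positive")
  require(shape.product == flat.length, "shape does not match the number of elements")

  // Simulates the asynchronous read of the flat buffer.
  private def flatArray: Future[Array[Float]] = Future(flat.clone())

  // Note: a rank mismatch throws immediately instead of returning a failed Future.
  private def requireRank(n: Int): Unit =
    require(shape.length == n, s"expected rank $n but the tensor has rank ${shape.length}")

  def readScalar: Future[Float] = {
    requireRank(0)
    flatArray.map(_.head)
  }

  def read1DArray: Future[Array[Float]] = {
    requireRank(1)
    flatArray
  }

  def read2DArray: Future[Array[Array[Float]]] = {
    requireRank(2)
    flatArray.map(_.grouped(shape(1)).toArray)
  }

  def read3DArray: Future[Array[Array[Array[Float]]]] = {
    requireRank(3)
    flatArray.map { elements =>
      // Split into slices along the first dimension, then each slice into rows.
      elements.grouped(shape(1) * shape(2)).map(_.grouped(shape(2)).toArray).toArray
    }
  }
}
```

Each method has a precise return type, so the caller needs no cast, at the cost of one method per supported rank.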

Atry · 2018-05-09T01:16:06Z

The best solution is changing Tensor to Tensor[A], where A could be Float, Array[Float], Array[Array[Float]] etc.

But it requires a huge number of changes in the code base.

At the moment I think readArray: Future[Any] is acceptable.

evanhaldane · 2018-05-09T01:59:56Z

Do you think it is worth having to parametrize Tensor everywhere else for the purpose of solving this issue? Are there other advantages to that approach? Determining the type parameter for operations that change the shape of the tensor could end up being problematic.

What do you not like about the read2DArray, read3DArray approach?

evanhaldane · 2018-05-09T02:01:25Z

Ah I just saw your edited comment.

I think read2DArray, read3DArray, etc. is probably better than the Any approach, as that would be almost unusable beyond 5 dimensions anyway.

Atry · 2018-05-09T02:03:41Z

Eventually we will need type parameters for Tensor, when implementing other element types: #104

Atry · 2018-05-09T02:08:08Z

In practice the length of a tensor parameter may vary across multiple calls to a function, e.g. a different batch size when running a neural network.

However the number of dimensions of a tensor is usually stable. It is reasonable to statically type it.

Atry · 2018-05-09T02:09:57Z

If we eventually add the type parameter for Tensor, the readScalar / read2DArray / read3DArray approach is just a temporary solution.

evanhaldane · 2018-05-09T02:24:01Z

Right. Makes sense, though many changes would be necessary and there are some issues to solve (e.g. the function split).

For now should we go with the (probably temporary) solution and address those issues as part of #104?

Atry · 2018-05-09T02:50:02Z

If in the future it becomes the following type signature:

trait Tensor[A] {
  def read: Future[A]
}

Then I think a sane type signature for now could be:

trait Tensor {
  def read: Future[Any]
}

Do you think the Seq / Array distinction is important here?

Atry · 2018-05-09T02:50:55Z

Which type is preferred, Seq or Array, or both?

Atry · 2018-05-09T02:52:24Z

Note that an n-dimensional Seq can potentially be more efficient than an Array: it can be built from nested views of the underlying flat array, which avoids copying memory.
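To illustrate the idea of nested views, here is a minimal sketch for the two-dimensional case. `RowView` and `MatrixView` are hypothetical classes written for this example; they are not the view used by Tensor.split or anything from the PR.

```scala
// Read-only view of one row: element i is flat(offset + i); nothing is copied.
final class RowView(flat: Array[Float], offset: Int, val length: Int)
    extends scala.collection.IndexedSeq[Float] {
  def apply(i: Int): Float = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    flat(offset + i)
  }
}

// Read-only view of a rows x columns matrix stored row by row in `flat`.
final class MatrixView(flat: Array[Float], rows: Int, columns: Int)
    extends scala.collection.IndexedSeq[scala.collection.IndexedSeq[Float]] {
  require(rows >= 0 && columns >= 0 && rows * columns == flat.length,
          "shape does not match the number of elements")

  def length: Int = rows

  def apply(i: Int): scala.collection.IndexedSeq[Float] = {
    if (i < 0 || i >= rows) throw new IndexOutOfBoundsException(i.toString)
    // Creating a row allocates only a small wrapper object, not a new array.
    new RowView(flat, i * columns, columns)
  }
}
```

Because the views share the flat array, later writes to that array are visible through them; a caller that needs an independent snapshot would still have to copy.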

Atry · 2018-05-09T02:56:30Z

We have already created a view for Tensor.split: https://github.com/ThoughtWorksInc/Compute.scala/blob/f893fde/Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala#L1060

evanhaldane · 2018-05-09T03:07:01Z

If the main purpose of toArray or toSeq is to interact with other code, then is efficiency relevant? If the user needs an Array because of some other library or code base, then that's what matters, right?

evanhaldane · 2018-05-09T03:10:47Z

Or were you asking about the underlying parametrization of Tensor?

Atry · 2018-05-09T03:48:47Z

There are some issues to address in this PR:

Which type is preferred when reading a tensor into a JVM type: Seq, Array, or both?
What should the type signature of the reading methods be?

evanhaldane · 2018-05-09T19:18:48Z

Both.
I've added the appropriate signature for each of the individual read methods.

Atry · 2018-05-10T02:03:13Z

I can merge this PR for now.
Would you mind if those read*DArray methods are replaced by a single read method when the type parameter is introduced for Tensor? @evanhaldane

evanhaldane · 2018-05-10T02:05:29Z

Sure! I've subscribed to notifications on #104.

Atry · 2018-05-10T02:07:19Z

The incoming change will break backward compatibility... It's OK since the version number is still pre-1.0.

Atry · 2018-05-10T02:09:06Z

Please remove [WIP] from the title and rebase the branch onto the current git HEAD before it is ready to merge.

Atry · 2018-05-25T05:13:59Z

Merged.

Thank you! @evanhaldane

Atry requested changes May 5, 2018

Atry approved these changes May 10, 2018

evanhaldane added 2 commits May 9, 2018 23:14

methods to read Tensor into Array and Seq

498bb27

methods to read Tensor into Array and Seq update README

update README

511bbc5

evanhaldane force-pushed the toArray branch from 1c57442 to 511bbc5 Compare May 10, 2018 03:15

evanhaldane changed the title from "[WIP] toArray and toSeq methods" to "readArray methods" May 10, 2018

Atry merged commit d38b12e into ThoughtWorksInc:0.4.x May 25, 2018
