Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readArray methods #156

Merged
merged 2 commits into from
May 25, 2018
Merged

readArray methods #156

merged 2 commits into from
May 25, 2018

Conversation

evanhaldane
Copy link
Contributor

fixes /issues/130

*
* @group slow
*/
def flatArray: Future[Array[closure.JvmValue]] = {
flatBuffer.intransitiveMap(closure.valueType.memory.toArray).run
}

/** Recursive function to reshape a flat array into a multi-dimensional one
*/
private[Tensors] def reshapeArray(a:Array[_], b:Array[Int]): Array[_] = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a check that the product of the dimensions in b equals the length of a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also renaming those variables to something informative

*
* @group slow
*/
def flatArray: Future[Array[closure.JvmValue]] = {
flatBuffer.intransitiveMap(closure.valueType.memory.toArray).run
}

/** Recursive function to reshape a flat array into a multi-dimensional one
*/
private[Tensors] def reshapeArray(array:Array[_], shape:Array[Int]): Array[_] = {
Copy link
Collaborator

@Atry Atry May 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When shape is Array.empty, the tensor is a scalar value, not an Array.

This method should either throw an exception for scalar value, or change the return type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! If scalars are considered distinct from 1d arrays, then I think it makes sense that the function readArray would throw an exception.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If scalars are considered distinct from 1d arrays, why 1D arrays and 2D array shares the same method? Considering the Array[_] return type is not actually usable without an anInstanceOf.

There are three options:

  1. Create two methods for scalars and 1+ dimensional arrays, respectively
  2. Create one Any methods for both scalars and 1+ dimensional arrays
  3. Create n methods for different rank of tensors, respectively

What do you think?


/** Recursive function to reshape a flat Seq into a multi-dimensional one
*/
private[Tensors] def reshapeSeq(a:Seq[_], b:Array[Int]): Seq[_] = {
Copy link
Collaborator

@Atry Atry May 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When shape is Array.empty, the tensor is a scalar value, not a Seq.

This method should either throw an exception for scalar value, or change the return type.

* @group slow
*/

def toArray = {flatArray.map {z=> reshapeArray(z, shape)}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since toArray or toSeq in Scala collection library is not asynchronous, the name toArray will surprise people if it returns a Future.

Maybe we should rename it to readArray, which seems like a rather expensive operation.

@evanhaldane
Copy link
Contributor Author

You're right that the user would need to know the number of dimensions and cast with asInstanceOf anyway, so how about this type of approach with explicit methods, sort of the same way Array.ofDim works?

(I haven't touched toSeq yet. Let me know what you think)

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

The best solution is changing Tensor to Tensor[A], where A could be Float, Array[Float], Array[Array[Float]] etc.

But it requires huge number changes in the code base.

At the moment I think readArray: Future[Any] is acceptable.

@evanhaldane
Copy link
Contributor Author

Do you think it is worth having to parametrize Tensor everywhere else for the purpose of solving this issue? Are there other advantages to that approach? Determining the type parameter for operations that change the shape of the tensor could end up being problematic.

What do you not like about the read2DArray, read3DArray approach?

@evanhaldane
Copy link
Contributor Author

Ah I just saw your edited comment.

I think read2DArray, read3DArray, etc. is probably better than the Any approach, as that would be almost unusable beyond 5 dimensions anyway.

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

Eventually we will need type parameters for Tensor, when implementing other element types: #104

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

In practice the length of a tensor parameter may vary in multiple calls to a function, e.g. different batch size when running a neural network.

However the number of dimensions of a tensor is usually stable. It is reasonable to statically type it.

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

If we will eventually add the type parameter for tensor, the readScalar / read2DArray / read3DArray approach is just a temporary solution.

@evanhaldane
Copy link
Contributor Author

Right. Makes sense, though many changes necessary and some issues to solve (e.g. the function split).

For now should we go with the (probably temporary) solution and address those issues as part of #104?

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

If in the future it will become the following type signature:

trait Tensor[A] {
  def read: Future[A]
}

Then I thought a sane type signature for now could be:

trait Tensor {
  def read: Future[Any]
}

Do you think the Seq / Array distinct is important here?

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

Which type is preferred, Seq or Array, or both?

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

Note that the approach of n-dimensional Seq can be potentially more efficient than Array by creating nested views of underlying flat array, avoiding memory copy.

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

@evanhaldane
Copy link
Contributor Author

If the main purpose of toArray or toSeq is to interact with other code, then is efficiency relevant? If the user needs an Array because of some other library or code base, then that's what matters, right?

@evanhaldane
Copy link
Contributor Author

or were you asking about the underlying parametrization of Tensor?

@Atry
Copy link
Collaborator

Atry commented May 9, 2018

There are some issues to address in this PR:

  1. Which type is preferred when reading a tensor to a JVM type, Seq or Array, or both?
  2. What type signature is for the reading methods?

@evanhaldane
Copy link
Contributor Author

  1. Both.
  2. I've added the appropriate signature for each of the individual read methods.

@Atry
Copy link
Collaborator

Atry commented May 10, 2018

I can merge this PR for now.
Would you mind if those read*DArray methods will be replaced to a single read method when the type parameter is introduced for Tensor? @evanhaldane

@evanhaldane
Copy link
Contributor Author

Sure! I've subscribed to notifications on #104.

@Atry
Copy link
Collaborator

Atry commented May 10, 2018

The incoming change will break backward compatibility.... It's OK since the version number is still pre 1.0.

@Atry
Copy link
Collaborator

Atry commented May 10, 2018

Please remove [WIP] in the title and rebase it according to the git HEAD before it is ready to merge.

methods to read Tensor into Array and Seq

update README
@evanhaldane evanhaldane changed the title [WIP] toArray and toSeq methods readArray methods May 10, 2018
@Atry Atry merged commit d38b12e into ThoughtWorksInc:0.4.x May 25, 2018
@Atry
Copy link
Collaborator

Atry commented May 25, 2018

Merged.

Thank you! @evanhaldane

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Tensor#toSeq and Tensor#toArray methods
2 participants