
[SparkR-237, 238] Fix cleanClosure by including private function checks in package namespaces. #229

Open
wants to merge 4 commits into master

Conversation

@hlin09 (Contributor) commented Mar 20, 2015

  1. Fixes SPARKR-237 by including private function checks in package namespaces (a sketch of the idea follows).
  2. Adds a test for this.
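
A minimal sketch of the idea (an illustration, not the actual cleanClosure code): when resolving a free symbol in a closure, also check the loaded package namespaces, because unexported ("private") functions such as plyrmr's kv2rdd.list live only there and a plain search-path lookup cannot see them. The helper name is_private_package_function is made up for this sketch.

```r
# Hedged sketch, not SparkR's implementation: test whether a name refers to a
# function defined in some loaded package namespace, including unexported
# ("private") functions that exists() on the search path would miss.
is_private_package_function <- function(name) {
  for (pkg in loadedNamespaces()) {
    ns <- asNamespace(pkg)
    # inherits = FALSE confines the lookup to the namespace itself,
    # which holds exported and unexported objects alike
    if (exists(name, envir = ns, inherits = FALSE) &&
        is.function(get(name, envir = ns, inherits = FALSE))) {
      return(TRUE)
    }
  }
  FALSE
}

# With plyrmr's namespace loaded, this would return TRUE even though
# kv2rdd.list is not exported:
# is_private_package_function("kv2rdd.list")
```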

@shivaram (Contributor)

@piccolbo Would be great if you could help test this patch.

@piccolbo (Contributor)

Will do

@piccolbo (Contributor)

Does not pass. Fails a little later, same way. Details to follow shortly.

@piccolbo (Contributor)

This is the first failure I got:

```
> ### ** Examples
> 
> as.data.frame(
+   where(
+     input(mtcars),
+     cyl > 4))
Error in kv2rdd.list(if (ncol(k) == 0) f1(kv) else do.call(rbind, lapply(unname(split(kv,  : 
  could not find function "keys.spark"
Calls: source ... computeFunc -> <Anonymous> -> FUN -> FUN -> kv2rdd.list
Execution halted
15/03/20 14:03:32 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: R computation failed with
 Error in kv2rdd.list(if (ncol(k) == 0) f1(kv) else do.call(rbind, lapply(unname(split(kv,  : 
  could not find function "keys.spark"
Calls: source ... computeFunc -> <Anonymous> -> FUN -> FUN -> kv2rdd.list
```

So this is progress: kv2rdd.list is private and is now found, but keys.spark is also private and is not found. I patched the code to fully qualify that name, and it moves to a later point of failure:

```
Error in lazy_eval(x, c(data, list(.data = data))) : 
  could not find function "as.lazy"
Calls: source ... f1 -> do.call -> <Anonymous> -> lazy.eval -> lazy_eval
Execution halted
15/03/20 13:56:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: R computation failed with
 Error in lazy_eval(x, c(data, list(.data = data))) : 
  could not find function "as.lazy"
Calls: source ... f1 -> do.call -> <Anonymous> -> lazy.eval -> lazy_eval
Execution halted
    at edu.berkeley.cs.amplab.sparkr.BaseRRDD.compute(RRDD.scala:80)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
```

Now this is getting complicated. lazy_eval is in the package lazyeval, imported by plyrmr (just like SparkR). as.lazy is an exported function in that package. So my initial idea that this is an issue with private functions only is not correct.
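
To make the failure mode concrete, here is a small sketch (an illustration, not part of the patch, assuming lazyeval is installed): a packaged function resolves names through its namespace and that namespace's imports environment, a chain that a bare serialized closure loses on the worker.

```r
# Illustration only: where a packaged function looks up names.
f <- lazyeval::lazy_eval
environmentName(environment(f))              # "lazyeval" -- the defining namespace
environmentName(parent.env(environment(f)))  # "imports:lazyeval" -- its imports

# Fully qualifying a private name, as in the keys.spark workaround above,
# bypasses this lookup by reaching into the namespace directly
# (::: accesses unexported objects, :: only exported ones):
# plyrmr:::keys.spark
```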

@hlin09 (Contributor, Author) commented Mar 20, 2015

Thanks @piccolbo for reporting. Let me do more debugging on this later today.

@shivaram (Contributor)

BTW @hlin09 one thing you could try is to debug using plyrmr directly. @piccolbo can probably tell us if there are any setup instructions we should use.

@piccolbo (Contributor)

I think it's a good idea, also because of SPARKR-238. The instructions say to install Hadoop first; I think you can just ignore that.

If you cut to the chase, you just need to do:

```r
library(devtools)
install_github("RevolutionAnalytics/rmr2", subdir = "pkg")
install_github("RevolutionAnalytics/plyrmr", subdir = "pkg")
```

We don't normally give these instructions because we need regular users to install the official latest version, but you guys are not regular users. I see some warnings, but it seems to work. Otherwise let me know and I will point you to the long way.

Then `R CMD check path-to-plyrmr` will repro 237.


@hlin09 (Contributor, Author) commented Mar 22, 2015

@piccolbo Thanks for the helpful instructions. I have just done some tests. Please try this patch and let me know if it works.

@hlin09 hlin09 changed the title Fix 237 by including private function checks in package namespaces. [SparkR-237, 238] Fix cleanClosure by including private function checks in package namespaces. Mar 22, 2015
@piccolbo (Contributor)

The tests pass the original error point but fail elsewhere. From the error message, it seems to be an instance of SPARKR-238, not 237. My suggestion is that we assume 237 is fixed and focus on 238, but maybe wait to close it until all plyrmr tests pass cleanly (I am assuming that all the problems will turn out to be related to changes in SparkR, which is only a working hypothesis).
