
Add benchmarks for the core algorithm #4

Open
jlperla opened this issue Oct 21, 2020 · 5 comments

@jlperla
Member

jlperla commented Oct 21, 2020

For the most part, you will want to benchmark the time to do XXX rademachers for the larger problems; otherwise it might take too long to run multiple benchmarks.

Let's then do a sanity check on how our speed for the large and huge problems compares to the existing Matlab code. Maybe put the Matlab timing as a comment in the benchmarks so that we never need to rerun it.
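The timing pattern could be sketched as follows, assuming a Hutchinson-style use of the Rademacher draws (the function names and problem setup here are illustrative placeholders, not the repository's actual API):

```julia
using LinearAlgebra, Random

# One Rademacher pass: draw a ±1 vector x and form the quadratic form
# x' * A * x, whose expectation is tr(A) (Hutchinson-style trace estimation).
function rademacher_pass(A, rng)
    x = rand(rng, [-1.0, 1.0], size(A, 1))
    return dot(x, A * x)
end

# Average over a fixed number of passes. Benchmarking a fixed pass count,
# rather than the full algorithm, keeps timings feasible on large problems.
function estimate_trace(A, npasses; rng = MersenneTwister(1234))
    return sum(rademacher_pass(A, rng) for _ in 1:npasses) / npasses
end

# Placeholder problem: a symmetric random matrix standing in for the real system.
A = Symmetric(randn(MersenneTwister(0), 200, 200))
t = @elapsed est = estimate_trace(A, 500)
println("500 Rademacher passes: $(round(t; digits = 3)) s, trace estimate ≈ $(round(est; digits = 2))")
```

For real benchmark files, BenchmarkTools.jl's `@btime`/`@benchmark` would give more reliable timings than a single `@elapsed` call.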

@paulcorcuera
Collaborator

There is a benchmark\compute_whole.jl file that records the speed of Julia versus Matlab for the large problem. Both are reasonably close, and the Julia code gets faster (relative to Matlab) as the number of rademacher simulations increases. I will have to use the cluster to benchmark Julia for the huge network, since the problem is infeasible on my computer.

@jlperla
Member Author

jlperla commented Oct 23, 2020

Great!

It would be good to be able to report, for the number of rademachers given by the heuristic Raffa gave, the performance of the Matlab code on the cluster for the large and huge datasets. We should keep track of the Matlab numbers so we don't have to rerun Matlab for a long time. I think it would also be useful for you to record the time for the large system on your computer as a comment. I would put the number of random projections in the comment as well, so that if we change the heuristic we will know how many projections it was run with.

I think having these in comments in the benchmark Julia source is good enough for now. Eventually we won't even need them once this becomes the de facto benchmark.
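One way to keep those numbers together, sketched as a hypothetical comment header for the benchmark source (every value below is a placeholder, not a real measurement):

```julia
# Hypothetical header for the benchmark file; fill in actual values when run.
#
# Dataset:            large
# Random projections: p = <number given by the heuristic>
# Matlab (cluster):   <seconds> s, recorded <date>
# Julia  (cluster):   <seconds> s
# Julia  (laptop):    <seconds> s
```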

@paulcorcuera
Collaborator

All timings are recorded in comments in my code (both for my computer and for the cluster). For the huge dataset I encountered a killed process, with both Julia and Matlab, even when running a small number of repetitions (50; I have been allocating 3-4 hrs and 256GB of memory to avoid long queue times). I can attempt an even smaller number, like 5 or 10 rademachers, if you think it's useful to have it there. However, if you think having the benchmarks for the large network is enough, we could close this issue now. @rsaggio87

@rsaggio87
Contributor

Are you saying that the code crashed when running on the huge dataset? I thought we had this under control...

@paulcorcuera
Collaborator

Sorry, no. What I am saying is that I receive a killed message, which is most likely due to reaching the memory allocation limit (I am using 256GB).
