-
Hmm, something is going wrong here. In theory, you should just be able to do

You could also consider
-
I am getting slightly different results, but in my runs the multithreaded `i_mt` version (`Threads.@threads` with `@simd`) is still the fastest:

```julia
using LinearAlgebra, BenchmarkTools, LoopVectorization

@benchmark Matrix(1.0I, 20000, 20000)
```

```
BenchmarkTools.Trial: 5 samples with 1 evaluation.
Range (min … max): 843.900 ms … 1.179 s ┊ GC (min … max): 0.10% … 9.66%
Time (median): 1.111 s ┊ GC (median): 10.25%
Time (mean ± σ): 1.025 s ± 165.337 ms ┊ GC (mean ± σ): 9.48% ± 8.79%
█ ▁ ▁ ▁
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█ ▁
844 ms Histogram: frequency by time 1.18 s <
Memory estimate: 2.98 GiB, allocs estimate: 2.
```

```julia
function i_turbo()
    dp = Array{Float64}(undef, 20000, 20000)
    # single-threaded, SIMD-vectorized fill via LoopVectorization
    @turbo for j ∈ axes(dp, 2)
        for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_turbo()  # first call compiles the function
@benchmark i_turbo()
```

```
BenchmarkTools.Trial: 4 samples with 1 evaluation.
Range (min … max): 1.156 s … 1.562 s ┊ GC (min … max): 0.08% … 13.87%
Time (median): 1.333 s ┊ GC (median): 7.36%
Time (mean ± σ): 1.346 s ± 181.958 ms ┊ GC (mean ± σ): 7.68% ± 7.92%
█ █ █ █
█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
1.16 s Histogram: frequency by time 1.56 s <
Memory estimate: 2.98 GiB, allocs estimate: 2.
```

```julia
function i_tturbo()
    dp = Array{Float64}(undef, 20000, 20000)
    # @tturbo is the multithreaded form of @turbo; Julia must be started
    # with several threads (e.g. `julia -t auto`) for it to run in parallel
    @tturbo for j ∈ axes(dp, 2)
        for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_tturbo()
@benchmark i_tturbo()
```

```
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max): 499.124 ms … 1.105 s ┊ GC (min … max): 0.16% … 42.38%
Time (median): 963.943 ms ┊ GC (median): 36.06%
Time (mean ± σ): 861.720 ms ± 258.747 ms ┊ GC (mean ± σ): 30.91% ± 20.18%
█ █ █ █ █ █
█▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁█▁▁▁█ ▁
499 ms Histogram: frequency by time 1.11 s <
Memory estimate: 2.98 GiB, allocs estimate: 2.
```

```julia
function i_mt()
    dp = Array{Float64}(undef, 20000, 20000)
    # native threading over columns, SIMD within each column
    Threads.@threads for j ∈ axes(dp, 2)
        @simd for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_mt()
@benchmark i_mt()
```

```
BenchmarkTools.Trial: 11 samples with 1 evaluation.
Range (min … max): 312.225 ms … 740.457 ms ┊ GC (min … max): 0.31% … 51.99%
Time (median): 473.053 ms ┊ GC (median): 28.27%
Time (mean ± σ): 476.158 ms ± 114.606 ms ┊ GC (mean ± σ): 28.81% ± 17.80%
█
▇▁▁▁▁▁▇▁▁▁▁▁▁▁▇▁▇▁▁▁▁▁█▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
312 ms Histogram: frequency by time 740 ms <
Memory estimate: 2.98 GiB, allocs estimate: 199.
```
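A variation that was not benchmarked in this thread: instead of evaluating `ifelse(i == j, 1.0, 0.0)` for every element, zero each column with `fill!` and then store its single diagonal entry. The sketch below uses only `Base.Threads`; the function name `identity_fill_columns` is hypothetical, and it assumes Julia was started with several threads (e.g. `julia -t auto`), otherwise the loop simply runs on one thread.

```julia
using Base.Threads: @threads

# Hypothetical helper (not from the thread): build an n×n Float64 identity
# matrix by zeroing whole columns in parallel, then setting each column's
# diagonal entry. Each task writes only to its own columns, so there is no race.
function identity_fill_columns(n::Integer)
    A = Matrix{Float64}(undef, n, n)
    @threads for j in 1:n
        fill!(view(A, :, j), 0.0)  # zero column j
        A[j, j] = 1.0              # then set its diagonal entry
    end
    return A
end
```

Because the job is dominated by memory writes, any gain over `i_mt` may well be small; it would have to be measured with `@benchmark identity_fill_columns(20000)` on the same machine.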
-
Using the suggested method I could get similar performance. I wonder whether it could be improved further.

```julia
using Polyester, BenchmarkTools

function i_polyester()
    dp = Array{Float64}(undef, 20000, 20000)
    # Polyester's @batch splits the column loop across threads;
    # minbatch=1250 asks for at least 1250 columns per batch
    @batch minbatch=1250 for j ∈ axes(dp, 2)
        for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_polyester()
@benchmark i_polyester()
```

```
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 266.641 ms … 633.969 ms ┊ GC (min … max): 0.28% … 47.30%
Time (median): 508.365 ms ┊ GC (median): 34.35%
Time (mean ± σ): 501.235 ms ± 116.029 ms ┊ GC (mean ± σ): 31.46% ± 15.68%
▁ ▁ █▁▁ ▁ ▁ ▁ ▁
█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▁▁▁▁▁▁▁▁█▁▁▁█▁▁█▁▁▁█ ▁
267 ms Histogram: frequency by time 634 ms <
Memory estimate: 2.98 GiB, allocs estimate: 2.
```
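To get explicit control over batch size without the Polyester dependency, one rough stdlib analogue is to cut the columns into a fixed number of contiguous chunks and spawn one task per chunk. This is a sketch, not a description of how Polyester schedules work internally; `identity_chunked` and its `nchunks` keyword are hypothetical names.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical: fill an n×n identity matrix with one task per contiguous
# block of columns, similar in spirit to enforcing a minimum batch size.
function identity_chunked(n::Integer; nchunks::Integer = nthreads())
    A = Matrix{Float64}(undef, n, n)
    k = clamp(nchunks, 1, max(n, 1))  # never more chunks than columns
    # boundaries 0 = b[1] ≤ b[2] ≤ … ≤ b[k+1] = n split 1:n into k ranges
    bounds = round.(Int, range(0, n; length = k + 1))
    tasks = map(1:k) do c
        cols = (bounds[c] + 1):bounds[c + 1]
        @spawn for j in cols
            @inbounds for i in 1:n
                A[i, j] = ifelse(i == j, 1.0, 0.0)
            end
        end
    end
    foreach(wait, tasks)  # block until every chunk has been written
    return A
end
```

For 20000 columns, `nchunks = 16` would mirror the 1250-column batches used in `i_polyester`. Each `@spawn` creates an ordinary Julia task, so the per-call scheduling overhead may be somewhat higher than with `@batch`.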
-
Recently I found that creating a large identity matrix the default way (e.g. `Matrix(1.0I, 20000, 20000)`) can be relatively slow because of the limited memory bandwidth available to a single thread. To address that, I found I could do better with something like below:

I am wondering whether I could use `LoopVectorization` to achieve better performance, or simply minimize the use of the native `Threads.@threads`. Thanks in advance!
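Every variant benchmarked in this thread allocates a fresh 2.98 GiB array on each call, and in some samples GC accounts for roughly half of the time. If the surrounding code can keep one buffer around, an in-place fill avoids that allocation on repeated calls. A minimal sketch, assuming a square `Matrix{Float64}` buffer; `fill_identity!` is a hypothetical name, not a LinearAlgebra function.

```julia
using Base.Threads: @threads

# Hypothetical in-place variant: overwrite an existing square Float64 matrix
# with the identity so a large buffer can be allocated once and reused.
function fill_identity!(A::AbstractMatrix{Float64})
    n = size(A, 1)
    size(A, 2) == n || throw(DimensionMismatch("matrix must be square"))
    @threads for j in axes(A, 2)      # columns split across threads
        @simd for i in axes(A, 1)     # contiguous writes within a column
            @inbounds A[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return A
end
```

Usage would look like `buf = Matrix{Float64}(undef, 20000, 20000)` once, followed by `fill_identity!(buf)` wherever a fresh identity is needed. This changes the calling convention, so it only helps where a mutable buffer can be shared safely.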