Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Larger than memory heatmaps #973

Open
rafaqz opened this issue May 23, 2021 · 39 comments
Open

Larger than memory heatmaps #973

rafaqz opened this issue May 23, 2021 · 39 comments
Labels
enhancement Feature requests and enhancements Makie Backend independent issues (Makie core) plot Related to plot object

Comments

@rafaqz
Copy link
Contributor

rafaqz commented May 23, 2021

With spatial raster data it would be good to be able to plot larger-than-memory heatmaps and images, and zoom in to get detail.

I'm not sure how/where this could be implemented, just flagging this as something that would be very useful in future, and as a key requirement for some spatial plotting workflows.

See:
yeesian/ArchGDAL.jl#195
https://github.com/rafaqz/GeoData.jl/issues/172

@SimonDanisch
Copy link
Member

from #443 we got an example, that shows how one can quite easily implement some calculating/fetching of data when zooming in/out:

using GLMakie
fig = Figure()
ax = Axis(fig[1, 1])
function datashader(limits, pixelarea)
    # return your heatmap data
    # here, I just calculate a sine field as a demo
    xpixels, ypixels = widths(pixelarea)
    xmin, ymin = minimum(limits)
    xmax, ymax = maximum(limits)
    [cos(x) * sin(y) for x in LinRange(xmin, xmax, xpixels),
                         y in LinRange(ymin, ymax, ypixels)]
end
xrange = lift(x -> minimum(x)[1] .. maximum(x)[1], ax.finallimits)
yrange = lift(x -> minimum(x)[2] .. maximum(x)[2], ax.finallimits)
pixels = lift(datashader, ax.finallimits, ax.scene.px_area)
heatmap!(ax, xrange, yrange, pixels,
    xautolimits = false, yautolimits = false)
display(fig)

@rafaqz rafaqz closed this as completed May 25, 2021
@rafaqz rafaqz reopened this May 25, 2021
@rafaqz
Copy link
Contributor Author

rafaqz commented May 25, 2021

Thanks! Amazing that can be done already. Is it possible to put this in a recipe? It would be good if GeoData.jl could do this by default above a certain size.

@SimonDanisch
Copy link
Member

Yeah I'd be happy to... I can help with putting it in a recipe, if someone extends the above script with the required interpolation math etc ;)

@rafaqz
Copy link
Contributor Author

rafaqz commented May 26, 2021

Oh that's ok I can do it, I just wondered if it was possible at all! But didn't make that so clear. I guess lift does some callback magic to allow updating in the interface, which is very cool.

I'll let you know if I hit any snags.

@lazarusA
Copy link
Contributor

@rafaqz hey, so... how are you actually using datashader, let's say for a matrix 40,000x40,000 as the input.

@rafaqz
Copy link
Contributor Author

rafaqz commented Jun 30, 2021

Yep, it works fine in basic scripts. Im too busy this month and I'll wait for the recipes iterface to settle here, but the plan is that GeoData.jl will have lazy plot zooming leveraging DiskArrays.jl/Makie.jl, using something like this.

Edit: With the small text on my phone I missed the word "how" and thought you were just asking if I actually used this!

@SimonDanisch
Copy link
Member

Yep, it works fine in basic scripts

Would you mind already posting that basic script so others can play around with it?
Or is it still just the demo that doesn't actually load anything from disk/ram?

@rafaqz
Copy link
Contributor Author

rafaqz commented Jun 30, 2021

Ok, trying to clean up my example to post here, but it seems like the example above doesn't work anymore either?

julia> fig = GLMakie.Figure()
ERROR: type GridLayout has no field needs_update
Stacktrace:
  [1] getproperty(x::GridLayout, f::Symbol)
    @ Base ./Base.jl:33
  [2] (::AbstractPlotting.var"#831#832"{GridLayout})(al::Outside)
    @ AbstractPlotting ~/.julia/packages/AbstractPlotting/M8Nlv/src/figures.jl:80

And to clarify, in my now-broken the code, the disk loading is abstracted DiskArrays.jl/GeoData.jl internals, it's not apparent that's happening, it's just used like an array. But I'll post that if we can get the above working and I can fix my code too.

Here's a simple way to replace the sin/cos array used above (I think, can't test):

dxmin, dymin = map(max, map(x -> trunc(Int, x), (xmin, ymin)), map(first, axes(A)))
dxmax, dymax = map(min, map(x -> trunc(Int, x), (xmax, ymax)), map(last, axes(A)))
dxpixels = round(Int, (dxmax - dxmin) / (xmax - xmin) * xpixels)
dypiyels = round(Int, (dymax - dymin) / (ymax - ymin) * ypixels)
xs = [trunc(Int, x) for x in LinRange(dxmin, dxmax, xpixels)]
ys = [trunc(Int, y) for y in LinRange(dymin, dymax, ypixels)]
A[xs, ys]

You could replace trunc with interpolation if you need that.

The more complicated script uses DimensionalData.jl selectors to work in e.g. lat/lon instead of pixels, but I don't think that was finished and I also can't test it.

@jkrumbiegel
Copy link
Member

jkrumbiegel commented Jun 30, 2021

This is a regression in relation to GridLayoutBase and is already fixed, try updating Makie. You seem to be on an older version if AbstractPlotting is loaded.

@rafaqz
Copy link
Contributor Author

rafaqz commented Jun 30, 2021

Right. WebIO.jl needs to register a patch version, it's holding back Observables to 0.3.3. Thats why no new Makie...

@garibarba
Copy link

from #443 we got an example, that shows how one can quite easily implement some calculating/fetching of data when zooming in/out:

using GLMakie
fig = Figure()
ax = Axis(fig[1, 1])
function datashader(limits, pixelarea)
    # return your heatmap data
    # here, I just calculate a sine field as a demo
    xpixels, ypixels = widths(pixelarea)
    xmin, ymin = minimum(limits)
    xmax, ymax = maximum(limits)
    [cos(x) * sin(y) for x in LinRange(xmin, xmax, xpixels),
                         y in LinRange(ymin, ymax, ypixels)]
end
xrange = lift(x -> minimum(x)[1] .. maximum(x)[1], ax.finallimits)
yrange = lift(x -> minimum(x)[2] .. maximum(x)[2], ax.finallimits)
pixels = lift(datashader, ax.finallimits, ax.scene.px_area)
heatmap!(ax, xrange, yrange, pixels,
    xautolimits = false, yautolimits = false)
display(fig)

This works quite well for me, but I'm getting poor performance (unusable) when using WGLMakie instead of GLMakie (on macOS FYI).
I've found this discussion (https://discourse.julialang.org/t/lag-with-wglmakie/55399/8) pointing out that the data transfer (Julia-WebGL) should be expected to be slow. Am I missing something here or is this the full story? Thanks!

@SimonDanisch
Copy link
Member

SimonDanisch commented Oct 4, 2021

It should be much slower than GLMakie ...Although not very optimized at the moment... So if you want to do some profiling, you may find some low hanging fruits to improve performance...
Which reminds me, I just recently figured out a better way to serialize arrays more efficiently, which could be used in JSServe as well....

@rafaqz
Copy link
Contributor Author

rafaqz commented Mar 17, 2023

With #2779 I'm wondering if Makie should do this by default for all heatmaps? Just plotting regular in-memory arrays as heatmaps is slow if they get too big.

@SimonDanisch
Copy link
Member

Hm we don't have axis recipes yet, and this would need access to the finallimits... But maybe we can prototype this in some package, and then figure out how to integrate it into Makie going forward?

@asinghvi17
Copy link
Member

Isn't this what Tyler.jl does (more or less)? I think there's a PR there which would allow arbitrary tiles, which one could extend to tiles mapped or resampled from a backing DiskArray...

@SimonDanisch
Copy link
Member

Yeaah, I was thinking about that.... But didn't have a quick idea how to make it work nicely (I mean, something I could implement in ~30 min or so)

@jkrumbiegel
Copy link
Member

Hm we don't have axis recipes yet, and this would need access to the finallimits

isn't that similar to the stuff vlines etc do, this works with the camera observables we get via the parent scene already. In the future, that could just be part of the scene/plot interface so we don't have to hook the observables directly

@rafaqz
Copy link
Contributor Author

rafaqz commented Mar 17, 2023

Tyler.jl has 2x resolution changes though, so there will be noticeable jumps when you zoom if heatmaps work like Tyler. I'm just doing a rounding interpolation currently.

function datashader(A, limits, pixelarea; maxres = 500)
    xmin, ymin = minimum(limits)
    xmax, ymax = maximum(limits)
    xpixels, ypixels = widths(pixelarea)
    xres = round(Int, max(1, min(maxres, xpixels, xmax - xmin)))
    yres = round(Int, max(1, min(maxres, ypixels, ymax - ymin)))
    dxmin = max(xmin, 1)
    dxmax = min(xmax, size(A, 1))
    dymin = max(ymin, 1)
    dymax = min(ymax, size(A, 2))
    xrange = LinRange(dxmin, dxmax, xres)
    yrange = LinRange(dymin, dymax, yres)
    # Interpolate to vectors of indices
    xs = trunc.(Int, xrange)
    ys = trunc.(Int, yrange)
    return A[xs, ys]
end

xrange = lift(ax.finallimits) do fl
    max(minimum(fl)[1], 1) .. min(maximum(fl)[1], size(first(maps)[], 1))
end
yrange = lift(ax.finallimits) do fl
    max(minimum(fl)[2], 1) .. min(maximum(fl)[2], size(first(maps)[], 2))
end

pixels = lift(datashader, m, ax.finallimits, ax.scene.px_area)
p = heatmap!(ax, xrange, yrange, pixels)

(although this is slightly buggy) Should be mostly fixed

@jkrumbiegel
Copy link
Member

It could be that this implementation where x, y and the data is lifted from finallimits updates inefficiently because the heatmap args conversion is triggered three times per limit change. Might be better implemented with on, then mutating x and y before triggering the data.

@felixcremer
Copy link
Contributor

@rafaqz is this something we would like to look at next week during MakieCon?

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

Yeah we could totally hack on this.

I was thinking it would be good to have zoom level caching like Tyler.jl, which would need defining zoom levels... so @asinghvi17 is right its basically just generalising Tyler.jl.

But we could build the tile pyramid efficiently somehow, just doing one pass to make e.g. the first 4 layers of tiles. The do the rest lazily as they need less disk reads anyway.

We could also try using layers of COG tiffs and similar where the tile pyramid is already built. Rasters.jl doesnt access those currently so thats a little more work.

@asinghvi17
Copy link
Member

If we can get a good recipe for layers, we could make it easy to create a layers object from file, then just plot that. Perhaps an alternate dispatch for heatmap would also serve this purpose.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

Interesting, yeah if there was some kind of plottable Layers object we could return it from recipes and provide callbacks for getting the data.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

Regular heatmaps could also use it with a function like the one above for subsampling.

@timholy
Copy link
Contributor

timholy commented Apr 16, 2023

FYI this is something that ImageView does (via ImageBase.restrict), although there seem to be some needed optimizations to make it faster. You're welcome to steal anything that seems useful.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

Ah interesting ImageBase.jl restrict does proper anti-aliased pyramid building.

I was imagining for quick plotting of large disk-based data we would just use a nearest pixel algorithm so its fast. Then with some chunking patterns we can probably skip reading some of the data at all for the top level images.

But we could use restrict for better quality outputs.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

Some light reading:

https://www.researchgate.net/publication/311423420_An_Efficient_Tile-Pyramids_Building_Method_for_Fast_Visualization_of_Massive_Geospatial_Raster_Datasets

@felixcremer from your other Rasters.jl plotting issue where you mosiac a lot of plots I guess you would want that for this too?

We could build this so its efficient for multiple mosaic tiles.

@felixcremer
Copy link
Contributor

In my case the data of a single tile is 15000 x 15000 which already kills Julia if I try to plot all of it. I currently only plot every 10nth pixel. But yes, when we go to more tiles I would like to only plot a meaningful subset.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

We should be able to mosaic 20 of those and pan and zoom around with no lag

(I mean it should be possible to write it so we can do that 😅 )

@felixcremer
Copy link
Contributor

That was how I was reading this, yes. So lets do work on this on tuesday.

@timholy
Copy link
Contributor

timholy commented Apr 16, 2023

Yeah, the problem with plotting every nth is that it can be awfully misleading---there can be really interesting stuff happening in a superpixel and you can miss it entirely.

There's a certain sense in which I think the best option is to come up with a way of representing the extrema (min/max) of each superpixel, but I don't know of a good way of doing that. If you had to pick one, the mean seems as good as anything else I can think of.

@rafaqz
Copy link
Contributor Author

rafaqz commented Apr 16, 2023

The mean is always nice. But it depends how much patience we have for building the pyramid when its so easy to just zoom in and see more detail of a particular area.

Reading every Nth might mean reading 1 Nth of the data if e.g. the chunks are colums. So N times faster to get the first overview image.

@timholy
Copy link
Contributor

timholy commented Apr 17, 2023

I agree. But the downsides should be acknowledged. Maybe a practical example would make it clearer: imagine a photograph from astronomy and imagine stars are pointlike and very sparse. The every-nth pixel solution might return a completely black image, missing every single star in the image.

@ffreyer ffreyer added enhancement Feature requests and enhancements Makie Backend independent issues (Makie core) plot Related to plot object labels Aug 22, 2024
@asinghvi17
Copy link
Member

#4317 is an implementation of the every-nth-pixel method if I understand correctly, but it shouldn't be too hard to generalize that to some arbitrary accumulator.

@SimonDanisch
Copy link
Member

It also added a gaussian Pyramid, which helps to remove those problems... Interesting would be how this works with e.g. DiskArrays and how to interpolate larger than memory data...

@rafaqz
Copy link
Contributor Author

rafaqz commented Sep 12, 2024

We will need to handle the windowing, but DiskArrayEngine should be able to do this cc @meggart

@asinghvi17
Copy link
Member

Fabian is out of office for the next month unfortunately :D. Maybe @felixcremer has some idea? The ideas in generating a pyramid in PyramidScheme are probably similar.

@felixcremer
Copy link
Contributor

I tried to play around with the new Makie.Pyramid function, but this is killing my julia process with the example in the docs, because I get a memory usage of more than 15 GB.

PyramidScheme.jl is using DiskArrayEngine to do the aggregation of the data. We are not doing any interpolation, but aggregating every two by two pixel to a coarser resolution pixel. One could use PyramidScheme.Pyramid to prepare a Pyramid for plotting. This would not load the high resolution data into memory if it is a DiskArray and therefore have a smaller memory footprint.
It might be worth it to also implement the gaussian pyramid in a diskarray aware way.

@timholy
Copy link
Contributor

timholy commented Sep 16, 2024

One would surely have to allocate the pyramid using mmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and enhancements Makie Backend independent issues (Makie core) plot Related to plot object
Projects
Status: In Progress
Development

No branches or pull requests

9 participants