
Guidance on using the tile_size parameter? #7

Open
rsimon opened this issue Jun 13, 2023 · 8 comments

Comments

@rsimon

rsimon commented Jun 13, 2023

Hi,

do you have any guidance on how to choose the tile_size parameter? I've been experimenting with our dataset of geo-coordinates (projected, EPSG:3857, with X/Y coordinate ranges of roughly [-20,000,000, 20,000,000]).

This seems to work in deepscatter in principle. But rendering is extremely slow, not like the other online demos I've seen. It also appears to render all our data at the initial zoom level. I then changed the tile_size to 1,000,000 and recreated the tileset. This seemed to display the initial render much faster, but then didn't appear to render a higher-quality image after zooming.

Another issue I observed: the initial viewport wasn't automatically adjusted to the coordinate range of the points. Rather, the points were crammed into a tiny corner. It's easy to zoom into the correct region, of course, but I'm wondering if there's something I'm doing wrong with my coordinates?

@bmschmidt
Owner

As a rule I use a tile_size parameter of about 65,000. I don't know that there are major benefits to larger ones; the key sign that you need a larger tile size, in my mind, would be if the loading waterfall is always backlogged and transfers are spending much of their time in a state other than transferring data from server to client. It's relatively flexible, but there may be some cases where it breaks. (Off the top of my head, there is one place where I allocate memory on the GPU in blocks of 64MB, which means that a tile_size greater than 16 million might completely break things in deepscatter; I can't think of any others, but I also haven't tested much. Created an issue there.)
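For concreteness, a quadfeather run with that tile size would look something like this; the flag names below are from memory rather than this thread, so check quadfeather --help for the exact interface:

```sh
# Tile the input into a quadtree with roughly 65,000 points per tile.
# --files / --tile_size / --destination are my recollection of the
# quadfeather flags; verify against your installed version.
quadfeather --files points.csv \
            --tile_size 65000 \
            --destination tiles
```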

If you expose the scatterplot as a window variable plot, you can see what deepscatter thinks the extent is by entering plot.dataset.extent. A miscalculated extent can indeed cause problems, because it means the quadtree will be degenerate for several layers: rather than each tile having four populated children, it will have just one populated child for several levels of partitioning.
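For example, a minimal sketch (the element id and construction details here are illustrative, not taken from this thread):

```js
import Scatterplot from 'deepscatter';

// Build the plot against some container element and expose it
// globally so it can be inspected from the devtools console.
const plot = new Scatterplot('#deepscatter');
window.plot = plot;

// Then, in the browser console:
//   plot.dataset.extent
// For EPSG:3857 data, x and y should both sit roughly within
// [-20,000,000, 20,000,000]; anything wildly outside that range
// points to a miscalculated extent and a degenerate quadtree.
```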

My advice for now is, I think, the same as before: try populating not from CSV, but from a parquet file where you can be confident of the types. It's possible, e.g., that I have some North America-specific character localization choices in the way the Arrow CSV parser is set up. I don't know that I've ever built a quadtile set larger than 20 million points off of a CSV file, but I've done over 1 billion happily on parquet.
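As a sketch of that conversion with PyArrow (file and column names are placeholders; the point is that Parquet carries an explicit schema, so the coordinate types can't be mis-inferred the way a localized CSV can):

```python
import pyarrow as pa
import pyarrow.csv as pacsv
import pyarrow.parquet as pq

# Read the CSV with explicit types for the coordinate columns, so
# locale quirks (e.g. '.' as a thousands separator) can't silently
# turn floats into strings or integers.
table = pacsv.read_csv(
    "points.csv",
    convert_options=pacsv.ConvertOptions(
        column_types={"x": pa.float32(), "y": pa.float32()}
    ),
)
pq.write_table(table, "points.parquet")
```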

@rsimon
Author

rsimon commented Jun 15, 2023

Thanks for this. I converted to Parquet (using PyArrow). This did seem to improve things somewhat - and worked on the whole dataset! 🙏

But I still think I'm getting wrong behavior during tile rendering/zooming. Rendering still takes ages, and my machine freezes almost completely while it renders. (I'm attaching a screenshot below. It was taken mid-render, but after waiting at least 5 minutes or so.)

I also logged the inferred extent, as you suggested. Deepscatter claims it's approx. x [-3,433,116,378, 21,834,822,860], y [-968,008,534, 4,070,507,694]. This is way off, but probably matches exactly the initial viewport I'm seeing. Is there a way to fix the extent during the generation of the tileset?

[screenshot: mid-render state of the map]

@rsimon
Author

rsimon commented Jun 15, 2023

P.S.: for reference, I'm attaching a screenshot of the initial viewport. (It's the tiny orange dot, mid/left.)

[screenshot: initial viewport, points in a tiny orange dot at mid/left]

@rsimon
Author

rsimon commented Jun 15, 2023

Ah, never mind, found it. It's the limits argument.
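For later readers, the re-tiling invocation was along these lines (I'm reconstructing the --limits syntax from memory rather than pasting my shell history, so double-check it against the quadfeather docs):

```sh
# Re-tile with the extent pinned to the EPSG:3857 world bounds
# (x_min y_min x_max y_max) instead of letting quadfeather infer
# it from the data; value order and flag shape are my best guess.
quadfeather --files points.parquet \
            --tile_size 50000 \
            --limits -20000000 -20000000 20000000 20000000 \
            --destination tiles
```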

Ok, fixing the coordinate ranges yields much better results:

[screenshot: map rendering with the corrected extent]

Rendering is still a bit slow (as you can see from the missing tiles in North America), and my machine is still half-frozen. But I assume that's not avoidable?

@bmschmidt
Owner

There shouldn't be major problems rendering 100m points quickly. I'll send you a link to a 1.5b point map that should render fairly smoothly. I suspect the rendering problems and the limits problems are somehow related: if the limits are wrong, it will attempt to draw too many points at a time. Someone else reported a similar bug in quadfeather this week that I'm investigating.

Something else that can cause big performance degradation, because of the way GPUs work, is coincident points; it's possible that adding some random noise with the --randomize parameter might help here. But I suspect something else is going on.
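E.g. something along these lines; whether --randomize takes an explicit jitter amount I'd have to double-check, so treat the value as illustrative:

```sh
# Rebuild the tiles with a small random jitter added to each point,
# so exactly-coincident points stop piling up in a way the GPU
# handles badly.
quadfeather --files points.parquet \
            --tile_size 50000 \
            --randomize 0.1 \
            --destination tiles
```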

@rsimon
Author

rsimon commented Jun 15, 2023

Thanks, I'll try --randomize. But I don't think we should really have coincident points. At least not in significant amounts...

@bmschmidt
Owner

Are you still running with tile_size 1m? On further thought, that could conceivably slow things down a great deal, because it could lead to many unnecessary points being processed on the GPU.

@rsimon
Author

rsimon commented Jun 15, 2023

No, I went back to tile_size 50,000.
