Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell.boundary method slowness #7

Open
aaime opened this issue Sep 2, 2020 · 5 comments
Open

Cell.boundary method slowness #7

aaime opened this issue Sep 2, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@aaime
Copy link

aaime commented Sep 2, 2020

I'm trying to use the "boundary" method of Cell to get a better outline for dart and skew_quad cells, while displaying a map with cells, cell parents, and parent parents.

I figured I'd need 10 points per side in order for the parents and the "grandsons" to align, which seems to be the case. However, calling boundary is a lot slower than calling vertices, to the point that map display it not interactive any longer.

For reference, this map, produced using vertices(False), takes 2 seconds, which is slow but still acceptable for an output of this size (click to see full size):
image

This one instead, produced using boundary(10, False), requires 14 seconds, which is too much even for a large output:
image

I've reduced it to a pure python script and tested the results with "time", to make sure it was not the GeoServer Java code calling onto the interpreter, slowing down things.

Vertices test

from rhealpixdggs import dggs, ellipsoids
from rhealpixdggs.ellipsoids import Ellipsoid
from rhealpixdggs.dggs import RHEALPixDGGS, Cell

WGS84_TB16 = Ellipsoid(a=6378137.0, b=6356752.314140356, e=0.0578063088401, f=0.003352810681182, lon_0=-131.25)
dggs = RHEALPixDGGS(ellipsoid=WGS84_TB16, north_square=0, south_square=0, N_side=3)
cell = dggs.cell(('S', 2, 2, 4))
for i in range(1, 10000):
  list(cell.vertices(False))
 time python3 /tmp/test_vertices.py 

real	0m14,944s
user	0m15,375s
sys	0m1,694s

Boundary test

from rhealpixdggs import dggs, ellipsoids
from rhealpixdggs.ellipsoids import Ellipsoid
from rhealpixdggs.dggs import RHEALPixDGGS, Cell

WGS84_TB16 = Ellipsoid(a=6378137.0, b=6356752.314140356, e=0.0578063088401, f=0.003352810681182, lon_0=-131.25)
dggs = RHEALPixDGGS(ellipsoid=WGS84_TB16, north_square=0, south_square=0, N_side=3)
cell = dggs.cell(('S', 2, 2, 4))
for i in range(1, 10000):
  list(cell.boundary(10, False))

And output:

>time python3 /tmp/test_boundary.py 

real	1m14,551s
user	1m14,906s
sys	0m1,751s

Boundary does more work, so it's normal that it would be slower... but the overhead seem to be too big.

Side note, it would probably interesting to compromise and call the boundary method only on skew_quad and dart cells, but I don't see a method to determine the type of the cell. Is there any obvious test that can be performed to lean about the nature of the cell? I'm guessing that maybe it's just a matter of checking if the top-most parent is either N or S? (edit, this approach seems to work, reduces the time of the "accurate" map down to 6 seconds... which is still too much).

@rggibb
Copy link
Collaborator

rggibb commented Sep 2, 2020

I would stay with the vertices. There will be a simple algorithm to select which vertices from resoution n are on the boundary of resolution (n-1) or (n-2) etc. I'll give some thought to this & post another reply :).

There will also be a simple algorithm to determine whether the resulting vertices can be pruned because they form a straight line - essentially iso-latitudinal and iso-longitudinal boundaries are straight in lat, long space, as you note in your side-note, that means only the dart and skew_quad cells need to be processed.

Cell shape can be determined with the function cell.ellipsoidal_shape(self) returns the shape of the cell as one of (‘quad’, ‘cap’, ‘dart’, or ‘skew_quad’) when viewed on the ellipsoid. If you want to do some bulk sorting of which cells need to be called, then in a (lat,long) coordinate space:

  • O, P, Q & R faces only have quad cells, which have four straight boundaries - either iso-latitudnal or iso-longitudinal,
  • N & S faces all north and south cell boundaries are iso-latitudinal and therefore straight,
  • N & S dart cells and skew-quad cells have a pair of east and west boundaries that need to be interpolated, and
  • N & S cap cells have no boundaries needing to be interpolated since all their boundaries are either south or north iso-latitudinal boundaries respectively and are therefore all straight.

@rggibb
Copy link
Collaborator

rggibb commented Sep 3, 2020

@aaime OK, here is some pseudo-code. I havent developed the full code, because I'm not sure whether you will code it up in python or java.

As noted above, quad and cap cells, dont need further detail, they can just be drawn using their own vertices.

Noting the vertices for a cell C are returned with C.vertices().

To refine the boundaries of skew_quad cells using the vertices of its chjildren & noting that northern and southern bounadries don't need refining - the vertices we need from the chiildren are shown in the sketch as o:

C:
o-----+-----+-----o
|     |     |     |
|     |     |     |
|     |     |     |
o-----+-----+-----o
|     |     |     |
|  C3 |     | C5  |
|     |     |     |
o-----+-----+-----o
|     |     |     |
|     |     |     |
|     |     |     |
o-----+-----+-----o

In pseudo-code going counterclockwise from the nw or top-left vertex, these are:

c-vert-list = C.vertices(False);`
print (c-vert-list[0,1], 
       C.subcells()[5].vertices(False)[1,2], 
       c-vert-list[2,3], 
       Csubcells()[3].vertices(False)[3,0], 
       c-vert-list[0];
      )

This doesn't work directly in python because indexing for lists, arrays & dictionaries aren't the same thing, and also you would want all the elements to be at the same level of the list in the result & not nested as shown above, but my notation emphasizes that with suitable list/array element management, only three calls to vertices are required, one for the 4x vertices of C, one for the eastern vertices of C5 & one for the western vertices of C3.

To get the next l;evel of detail from the grandchildren, we need the vertices marked x below in addition to those marked o which we already have:

C:
o-------+-------+-------o
|       |       |       |
x   -   +       +   -   x
|C03    |       |    C25|
x   -   +       +   -   x
|       |       |       |
o-------+-------+-------o
|       |       |       |
x   -   +       +   -   x
|C33    |       |    C55|
x   -   +       +   -   x
|       |       |       |
o-------+-------+-------o
|       |       |       |
x   -   +       +   -   x
|C63    |       |    C85|
x   -   +       +   -   x
|       |       |       |
o-------+-------+-------o

In pseudo-code going counterclockwise from the nw or top-left vertex, these are :

c-vert-list = C.vertices(False);`
c5-vert-list = C.subcells()[5].vertices(False);`
c3-vert-list = C.subcells()[3].vertices(False);`
print (c-vert-list[0,1], 
       C.subcells()[2].subcells()[5].vertices()[1,2],
       c5-vert-list[1], 
       C.subcells()[5].subcells()[5].vertices()[1,2],
       c5-vert-list[2], 
       C.subcells()[8].subcells()[5].vertices()[1,2],
       c-vert-list[2,3], 
       C.subcells()[6].subcells()[3].vertices()[3,0],
       c3-vert-list[3], 
       C.subcells()[3].subcells()[3].vertices()[3,0],
       c3-vert-list[0], 
       C.subcells()[0].subcells()[3].vertices()[3,0],
       c-vert-list[0];
      )

So an additional six calls to vertices() for grandchildren.

For dart cells, we only need two sides to be refined, but those two sides will be neighbours instead of being on opposite sides. There isn't a function that can be called to quickly determine which two boundaries need to be processed, and judging by the code to determine whether neighbors are 'east' or 'west' it wont be trivial. So instead, noting that the boundaries that don't need processing are iso-latitudinal, a quick check of the latidudes for the vertices can be used. The following pseudo-code presumes that vertex[0] is a latitude.

c-vert-list = C.vertices(False);
c-refined-list[]
for v in [0, 1, 2, 3]:
 case 0: 
  c-refined-list[].append(c-vert-list[0])
  if c-vert-list[0][0] <> c-vert-list[1][0]
    c-refined-list[].append(C.subcells()[1].vertices(False)[0,1])

  case 1:
  c-refined-list[].append(c-vert-list[1])
  if c-vert-list[1][0] <> c-vert-list[2][0]
    c-refined-list[].append(C.subcells()[5].vertices(False)[1,2])

  case 2:
  c-refined-list[].append(c-vert-list[2])
  if c-vert-list[2][0] <> c-vert-list[3][0]
    c-refined-list[].append(C.subcells()[7].vertices(False)[2,3])

  case 3:
  c-refined-list[].append(c-vert-list[3])
  if c-vert-list(3][0] <> c-vert-list[0][0]
    c-refined-list[].append(C.subcells()[3].vertices(False)[3,0])

  c-refined-list[].append(c-vert-list[0])

Obviosly, this can be extended to grandchildren by iterating down a resolution level in a similar fashion to the code for skew_quad cells.

@aaime
Copy link
Author

aaime commented Sep 3, 2020

Tried it out, without good success.
Came out with this bit of python code that I was then invoking from Java, grabbing the value of "result" in the end.
Just added a few prints around to inspect intermediate results:

from rhealpixdggs import dggs, ellipsoids
from rhealpixdggs.ellipsoids import Ellipsoid
from rhealpixdggs.dggs import RHEALPixDGGS, Cell

WGS84_TB16 = Ellipsoid(a=6378137.0, b=6356752.314140356, e=0.0578063088401, f=0.003352810681182, lon_0=-131.25)
dggs = RHEALPixDGGS(ellipsoid=WGS84_TB16, north_square=0, south_square=0, N_side=3)
id = ('N', 3)
c = dggs.cell(id)

vertices = list(c.vertices(False))
print(vertices)
c5_vertices = list(dggs.cell(id + (5,)).vertices(False))
print(c5_vertices)
c3_vertices = list(dggs.cell(id + (3,)).vertices(False))
print(c3_vertices)
c25_vertices = list(dggs.cell(id + (2, 5)).vertices(False))
c55_vertices = list(dggs.cell(id + (5, 5)).vertices(False))
c85_vertices = list(dggs.cell(id + (8, 5)).vertices(False))
c63_vertices = list(dggs.cell(id + (6, 3)).vertices(False))
c33_vertices = list(dggs.cell(id + (3, 5)).vertices(False))
result = (vertices[0], vertices[1],
       c25_vertices[1], c25_vertices[2],
       c5_vertices[1], 
       c55_vertices[1], c55_vertices[2],
       c5_vertices[2], 
       c85_vertices[1], c85_vertices[2],
       vertices[2], vertices[3], 
       c63_vertices[3], c63_vertices[0],
       c3_vertices[3], 
       c33_vertices[3], c33_vertices[0],
       c3_vertices[3], c3_vertices[0], 
       vertices[0]
      )
del c5_vertices
del c3_vertices
del c25_vertices
del c55_vertices
del c63_vertices
del c33_vertices
print(result)

The result is not a continuous boundary, but something jumping around. Trying to figure out why, I had a better look at how the vertices and children are enumerated.
Vertices seem to always be in "top/left, top/right, bottom/right, bottom/left" order, no matter what cell they are called upon.
However, check in this picture the children of N1, N3 and N7:

image

Children do not enumerate in a consistent "cartesian" order over the polar caps, N10 is bottom right, N30 is bottom left, and N70 is top left. They do enumerate instead in a consistent order in the "middle belt", O, P, Q, R.

In the end, it probably does not matter much for the goal at hand. I've timed the results anyways (the CPU effort is related to grabbing the coordinates, not their particular order), and the timings are the same as calling boundary(10, False), so even getting the correct order would not end up improving performance, which was my original goal.

@rggibb
Copy link
Collaborator

rggibb commented Sep 3, 2020

You are quite right the cell's arent ordered in cartesian order, they are ordered on dggs face/zone order, and the top level faces are the sides of a cube.. If you unfold it onto a flat plane it looks like this:

17-1

The Nth and Sth square's letters are shown rotated in the vacant space, Then the zone numbering within each face follows a Z- space-filing curve:

17-2

and if we take it one more iteration and view it on a sphere looking at it from the NOP corner of the cube, it looks like this:

07-0

@rggibb
Copy link
Collaborator

rggibb commented Sep 4, 2020

Ok, in terms of the there being no advantage in compute timing for boundary with 10 points per edge, & getting the vertices. Possible strategies include:

  • am I corrct in saying that boundaries are computed per zone. So that since all boundaries are shared by two zones, then all boundaries are computed twice - with the exception of the boundaries on the edghes of the FOV,
    • this is inherant in the 'shapefile' structure for polygons but is there a way of optimising that in geoserver?
    • if geoserver can't optimise it we could look at optimising it in rhealpix. For instance cell.vertices, could be extended to provide all the vertices of the children or grandchildren etc of a designated cell. The call could return the result per zone for easy use by geoserver, but it would only compute each shared vertex once - which is actually a reduction in compute over-head by a factor of four.
  • in the pseudo-code above, most of the calls to C.vertices() only actually use two of the returned vertices - so the computation time for the other two vertices is wasted, we could have a default behaviour of returing all vertices, but also allow the caller to specify which vertices are wanted - maybe using the 0..3 index, with 0 being the nw_vertex and counting clockwise round the boarder.

@aaime ..thoughts?

@alpha-beta-soup alpha-beta-soup added the enhancement New feature or request label Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants