Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long coordinates #72

Open
eklinger opened this issue Nov 21, 2017 · 6 comments
Open

Long coordinates #72

eklinger opened this issue Nov 21, 2017 · 6 comments
Labels

Comments

@eklinger
Copy link

Hi,
I would love to use the RTree as an interval tree but would need int or long coordinates rather than float. Is there any reason it doesn't support long type? Thanks!

@ConradGroth
Copy link

ConradGroth commented Dec 15, 2017

Hi David,

we are currently evaluating your RTree implementation and found the API very nice, especially the rx java return types and the immutability, which lets us query the tree under parallel writes. Unfortunately the Rectangle internally uses floats, but we need double precision. Is there a reason for using floats? If it’s because of the memory consumption, is it maybe possible to provide another Rectangle with double precision?

Hi Conrad
Thanks for your interest in the project. I'm glad the reactive and immutability aspects are proving attractive.

The choice of float over double was made to reduce memory consumption, yes. Can you describe your use case to me a little so that I can understand why you need double precision? I'm a little surprised that you might need it because you can use the float representation as a first cut if you like in terms of searching and then refine as you like but including the full precision values in the value of the Entry (as opposed to the geometry). Are you referring to the Entry geometry requiring double precision or a search rectangle (or both)?

By the way, I much prefer having this sort of conversation in an issue on the github site so it turns up on searches for other people and so others can participate in the discussion if they want. Of course if you want to retain some confidentiality in this matter that's ok too.

Cheers
Dave

Thanks for you quick response.

I cannot tell you all the details about our data model, but one axis of our rectangles is the time in milliseconds and float reduces that to 24 usable bits (for the mantissa), which means a maximum of 2 hours between the lowest and the highest value in milliseconds precision.

I already found the issue #72 which is exactly what we need. I will add some information there too, to let other interested users participate in the discussion.

Regards
Conrad

@davidmoten
Copy link
Owner

davidmoten commented Dec 15, 2017

@eklinger @ConradGroth I'm happy to explore supporting other primitive types explicitly like long/double/integer. I'll have a look at the code base to see what's involved.

In the meantime, there are workarounds that may still give you what you need. When you add items to the r-tree you may not lose precision in a significant way. 2^23 bits still gives a grid of 8 million x 8 million to play with. Is that grid size insufficient for your use cases?

Just use the r-tree as a coarse search index and then refine with full resolution. Here's an example,

class Thing {
    long x1, y1, x2, y2;
    String content;
}
//build
RTree<Thing, Rectangle> tree = RTree.create();
tree = tree.add(
    new Thing(x1, y1, x2, y2, content), 
    Geometries.rectangle(floor(x1), floor(y1), ceil(x2), ceil(y2)));

//search
long xx1,xx2,yy1,yy2;
tree
    .search(Geometries.rectangle(floor(xx1), floor(yy1), ceil(xx2), ceil(yy2))) 
    .filter(entry -> gte(entry.value().x1, xx1) 
                    && lte(entry.value().x2, xx2) 
                    && gte(entry.value().y1, yy1) 
                    && lte(entry.value().y2, yy2))
  ...

I have not written these functions:

  • floor(x) returns largest float less than or equal to the long value x
  • ciel(x) returns smallest float greater than or equal to the long value x
  • gte(a, b) returns true if a is greater than or equal to b where a is float and b long
  • lte(a, b) returns true if a is less than or equal to b where a is float and b long

Most likely, BigDecimal would be used to implement these functions.

@davidmoten
Copy link
Owner

Here are the functions:

    private static final MathContext FLOOR = new MathContext(7, RoundingMode.FLOOR);
    private static final MathContext CEILING = new MathContext(7, RoundingMode.CEILING);

    public static float floor(long x) {
        return new BigDecimal(x).round(FLOOR).floatValue();
    }

    public static float ceil(long x) {
        return new BigDecimal(x).round(CEILING).floatValue();
    }

    public static boolean gte(float a, long b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) >=0;
    }
    ...

@davidmoten
Copy link
Owner

davidmoten commented Dec 16, 2017

I cannot tell you all the details about our data model, but one axis of our rectangles is the time in milliseconds and float reduces that to 24 usable bits (for the mantissa), which means a maximum of 2 hours between the lowest and the highest value in milliseconds precision.

@ConradGroth just thought I'd mention that the full range of long values represents an enormous range of time. If you map a realistic sub-range of time (say 20 years?) on to float you may get sufficient precision especially with the technique above. A quick estimate tells me that you could end up with 10 seconds between neighbouring float values for a 20 year range.

Mapping your axes to more compact ranges is a good idea also because euclidean distance is used to split rectangles by the algorithm and you want the axes equally weighted in that distance measurement if possible.

@davidmoten
Copy link
Owner

I've just about completed double support for all geometries including flatbuffers serialization.

I'm not planning on supporting axes of other types because of the plethora of combinations which would involve an explosion of types (RectangleFloatDouble, RectangleDoubleFloat, RectangleLongFloat, RectangleFloatLong, Circle*,Line*, Point*, etc..). Let me know if double doesn't suit your needs.

There are breaking changes in the API but are reasonably small (all geometries return their coordinates as double now regardless of how they are stored (float or double)). When you want a Geometry instance that supports double precision just pass double parameters to the Geometries creation method:

// create a double precision rectangle
Rectangle r = Geometries.rectangle(1.0, 1.0, 2.0, 2.0);

// create a single precision circle
Rectangle r = Geometries.circle(1.0f, 1.0f, 5.0f);

An interesting side-effect is that performance has improved (on my i5 laptop at least) with the switch to using double for all calculations. I'll add some details on perf improvements soon.

I've released 0.8.1beta to Maven Central if you'd like to trial @eklinger @ConradGroth .

@ConradGroth
Copy link

ConradGroth commented Jan 5, 2018

Thanks for your answers. Indeed the internal usage of float is no problem, if you do an exact filtering after searching. Also the methods floor and ceil are not really necessary, as we can rely on the rounding which is done by the JVM during conversion from double/long to float, because the coordinates of the search rectangle are rounded the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants