SubZero across different clusters #31

Open
PotatoSpud opened this issue Dec 13, 2018 · 4 comments

Comments

@PotatoSpud

Using SubZero across different Hazelcast clusters is problematic: the serialization IDs assigned on one cluster are not guaranteed to match those assigned on another. I worked around this by creating my own KryoStrategy implementations that use an ID generated from the fully qualified class name.
It gets tricky for two reasons, though:

  1. Generic (templated) classes can be a challenge, but there is a way.
  2. The generated IDs need to avoid those that Hazelcast uses internally.

I can attempt a fork if there is interest in this solution.

Best regards
Aongus

@jerrinot
Owner

Hi, what's your strategy for generating unique IDs from class names?

@PotatoSpud
Author

This is not perfect, as the hashes are not guaranteed to be unique. However, the chance of a clash is very low.
So I added new versions of TypedKryoStrategy and GlobalKryoStrategy, as follows:

public class IndigoTypedKryoStrategy<T> extends KryoStrategy<T> {

    private final Class<T>       clazz;
    private final UserSerializer userSerializer;

    public IndigoTypedKryoStrategy(final Class<T> clazz, final UserSerializer registrations) {
        this.clazz = clazz;
        this.userSerializer = registrations;
    }

    @Override
    public void registerCustomSerializers(final Kryo kryo) {
        this.userSerializer.registerSingleSerializer(kryo, this.clazz);
    }

    @Override
    void writeObject(final Kryo kryo, final Output output, final T object) {
        kryo.writeObject(output, object);
    }

    @Override
    T readObject(final Kryo kryo, final Input input) {
        return kryo.readObject(input, this.clazz);
    }

    @Override
    public int newId() {
        return HashUtil.serializionIdHash(this.clazz.getName());
    }
}
public class IndigoGlobalKryoStrategy<T> extends KryoStrategy<T> {
    private final UserSerializer userSerializer;
    private final int            id;
    private static final String  GLOBAL = "global";

    public IndigoGlobalKryoStrategy(final UserSerializer registrations) {
        this.userSerializer = registrations;
        String identifier = GLOBAL;
        try {
            final Type sooper = this.getClass().getGenericSuperclass();
            final Type t = ((ParameterizedType) sooper).getActualTypeArguments()[0];
            identifier = t.getTypeName();
        } catch (final Exception e) {
            // Fall through: keep the "global" identifier.
        }
        this.id = HashUtil.serializionIdHash(identifier);
    }

    @Override
    public void registerCustomSerializers(final Kryo kryo) {
        this.userSerializer.registerAllSerializers(kryo);
    }

    @Override
    void writeObject(final Kryo kryo, final Output output, final T object) {
        kryo.writeClassAndObject(output, object);
    }

    @SuppressWarnings("unchecked")
    @Override
    T readObject(final Kryo kryo, final Input input) {
        return (T) kryo.readClassAndObject(input);
    }

    @Override
    public int newId() {
        return this.id;
    }
}
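
To illustrate how the generic type lookup in the constructor above behaves, here is a minimal, self-contained sketch. `BaseStrategy`, `TokenStrategy` and `TypeTokenDemo` are hypothetical stand-ins invented for this example, not SubZero classes. It assumes the same reflection calls as the posted code and shows that the concrete type argument is only visible when the strategy is created through a subclass (for example an anonymous one); a direct instantiation only sees the type variable.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in for a generic base class such as KryoStrategy<T>.
abstract class BaseStrategy<T> {
}

// Hypothetical stand-in for the global strategy above: it resolves its
// identifier by inspecting the generic superclass at construction time.
class TokenStrategy<T> extends BaseStrategy<T> {
    final String identifier;

    TokenStrategy() {
        String resolved = "global";
        Type superType = getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) superType).getActualTypeArguments()[0];
            resolved = arg.getTypeName();
        }
        this.identifier = resolved;
    }
}

class TypeTokenDemo {
    public static void main(String[] args) {
        // Direct instantiation: the generic superclass is BaseStrategy<T>,
        // so only the type variable name is available.
        System.out.println(new TokenStrategy<String>().identifier);
        // Anonymous subclass: the generic superclass is
        // TokenStrategy<java.lang.String>, so the real type is available.
        System.out.println(new TokenStrategy<String>() { }.identifier);
    }
}
```

In this sketch the first line printed is `T` and the second is `java.lang.String`, which suggests that every directly instantiated strategy would hash the same identifier unless it is created via a subclass.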

The MurmurHash3_x86_32 algorithm was lifted from Hazelcast itself, but any decent hash would do the job:

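import java.nio.charset.StandardCharsets;

public class HashUtil {
    public static int serializionIdHash(final String text) {
        // Use a fixed charset: the platform default can differ between JVMs,
        // which would produce different IDs for the same class name.
        final byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // MurmurHash3_x86_32 is the method copied from Hazelcast (not shown here).
        int hash = HashUtil.MurmurHash3_x86_32(bytes, 0, bytes.length);
        // Avoid Hazelcast's internal registrations and our own space
        if ((hash > -400) && (hash < 100)) {
            hash += 500;
        }
        return hash;
    }
}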
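
Since any decent hash would do, here is a self-contained sketch that swaps in `java.util.zip.CRC32` from the JDK in place of MurmurHash3, so it runs without the Hazelcast code. `StableTypeId` is a hypothetical name, and the reserved band is taken from the check in the snippet above. The sketch also clears the sign bit, on the assumption that the serialization layer expects positive type IDs; that is worth verifying against the Hazelcast version in use, and it halves the available ID space.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: derives the same ID for the same class name on any JVM.
class StableTypeId {
    // Assumed upper edge of the reserved band, taken from the snippet above.
    static final int RESERVED_UPPER = 100;
    static final int SHIFT = 500;

    static int of(String className) {
        CRC32 crc = new CRC32();
        // A fixed charset keeps the bytes identical across platforms.
        crc.update(className.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the result is never negative.
        int id = (int) (crc.getValue() & 0x7FFFFFFFL);
        // Move small values out of the reserved band.
        if (id < RESERVED_UPPER) {
            id += SHIFT;
        }
        return id;
    }
}
```

Calling `StableTypeId.of("com.example.Order")` on two different clusters should return the same value, because the result depends only on the name's UTF-8 bytes.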

Hope this helps
Aongus

@jerrinot
Owner

jerrinot commented Jan 15, 2019

@PotatoSpud: I am not crazy about the probabilistic nature of this. It smells like the birthday paradox to me: the chance of a conflict grows quickly as the number of classes increases.
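
To put rough numbers on that concern, the sketch below uses the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2N)) for n classes hashed into N possible IDs. It assumes a perfectly uniform 32-bit hash, which a real hash over class names only approximates, and `BirthdayEstimate` is a hypothetical name for this example.

```java
// Hypothetical helper for estimating hash-ID collision risk.
class BirthdayEstimate {
    // Approximate probability that at least two of n uniformly distributed
    // IDs collide in a space of `space` values: 1 - exp(-n(n-1) / (2 * space)).
    static double collisionProbability(long n, double space) {
        // expm1 keeps precision when the probability is very small.
        return -Math.expm1(-(double) n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double space = Math.pow(2, 32); // all 32-bit values
        for (long n : new long[] {100, 1_000, 10_000, 100_000}) {
            System.out.printf("%d classes -> %.6f%n", n, collisionProbability(n, space));
        }
    }
}
```

Under these assumptions the risk is on the order of 0.01% for 1,000 classes, a little over 1% for 10,000, and well above 50% for 100,000.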

Is there a better way? Maybe a strategy with hard-coded IDs for well-known classes (think of JDK classes), combined with:

  1. Explicit ID configuration for custom domain classes
  2. Using the fully qualified class name for classes without an explicit ID assignment

Any other idea?
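
One way the layered idea could look is sketched below. This is not an existing SubZero API: `LayeredTypeIds`, its methods and all ID values are hypothetical, and CRC32 stands in for whatever hash is used for the name-based fallback. The point of the sketch is that a conflict fails loudly at registration time instead of corrupting data later.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical registry combining fixed, configured and derived type IDs.
class LayeredTypeIds {
    private final Map<String, Integer> idByClass = new HashMap<>();
    private final Map<Integer, String> classById = new HashMap<>();

    LayeredTypeIds() {
        // Tier 1: hard-coded IDs for well-known classes (illustrative values).
        assign("java.lang.String", 1_000);
        assign("java.util.ArrayList", 1_001);
    }

    // Tier 2: explicit configuration for a custom domain class.
    void configure(String className, int id) {
        assign(className, id);
    }

    // Tier 3: derive the ID from the class name when nothing was assigned.
    int idFor(String className) {
        Integer known = idByClass.get(className);
        if (known != null) {
            return known;
        }
        int derived = derive(className);
        assign(className, derived);
        return derived;
    }

    private void assign(String className, int id) {
        Integer previous = idByClass.get(className);
        if (previous != null && previous != id) {
            throw new IllegalStateException(className + " already has ID " + previous);
        }
        String owner = classById.putIfAbsent(id, className);
        if (owner != null && !owner.equals(className)) {
            throw new IllegalStateException("ID " + id + " is already used by " + owner);
        }
        idByClass.put(className, id);
    }

    private static int derive(String className) {
        CRC32 crc = new CRC32();
        crc.update(className.getBytes(StandardCharsets.UTF_8));
        // Keep derived IDs at or above 1,000,000, away from the fixed ones.
        return 1_000_000 + (int) (crc.getValue() % 1_000_000_000L);
    }
}
```

Note that the conflict check only sees classes registered in the same JVM, so two clusters would still need to agree on the explicit configuration for the guarantee to hold across them.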

@PotatoSpud
Author

@jerrinot: Agreed, my approach above is bound to create problems.

For your suggestions:

  1. This may help reduce the string length and significantly reduce the possibility of a clash. Is this what you had in mind?
  2. Not sure how this would work exactly. Conceptually, using the fully qualified class name should be fine.

If you can get away from explicit ID (int) assignment, you are halfway home. I now understand more clearly how serialization works; it is deserialization that is confounding. I assume the class names must be presented again so that subsequent IDs can be resolved.
