Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add World::try_clone() #372

Closed
wants to merge 7 commits into from
Closed

Add World::try_clone() #372

wants to merge 7 commits into from

Conversation

caspark
Copy link
Contributor

@caspark caspark commented May 16, 2024

I would like to clone World (see also #333), and as far as I can tell this is currently only possible in a roundabout way by using serialization, deserialization and the determinism patch from #364 (because I want the resulting world state to be identical so that subsequent simulation is deterministic, for rollback networking purposes).

So, this PR implements making World clonable.

In short:

  1. Users must register the types of all components used in the world into a new Cloner struct.
  2. The Cloner is passed to World::try_clone(). Most of World is copied using derived Clone implementations, but archetypes use the Cloner to bulk copy component data:
    a. For Copy components, a bitwise copy is made of all an archetype's data.
    b. For Clone components, each entity is copied one by one.
    c. (There's a straightforward future extension path here of allowing custom cloning strategies to allow users to e.g. turn structural sharing on/off, but I don't need that yet so I haven't added it.)
  3. If a type is not registered with the Cloner, an error capturing the unrecognized type is returned from World::try_clone(). Presumably most users will choose to unwrap it, but I figure it's nice to give folks the option of handling it.

(The implementation is based on the approach used by @adamreichold in rs-ecs, as he suggested in a comment on #332.)

I did see DynamicClone in bundle.rs, however that only clones a single instance of a type - this approach should be faster for Copy types - though I haven't done any speed tests.

I have written documentation and a few tests (& run them with miri), but I am new to this codebase and not experienced with unsafe Rust - so please check whether I have violated any invariants! Happy to adjust naming/etc as desired too.

Comment on lines +198 to +199
pending: self.pending.clone(),
free_cursor: AtomicIsize::new(self.free_cursor.load(Ordering::Relaxed)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be played safe and the concurrently modifiable state should not be cloned? For example, would pending updated applied to both worlds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll admit I do not fully understand exactly what invariants entity allocation must uphold, but with that said:

  • I cloned this state in the name of getting a "bit-for-bit identical clone", so that entities are allocated in a consistent order and have the same generations (I inferred this was necessary from @Ralith at Determinism through serialization cycle #332 (comment))
  • Since World::try_clone() takes &mut self and Entities is pub(crate), I believe there's no way for this to open up any holes that lead to unsafety.
    • Perhaps it might be worth making World::try_clone() error if the World needs to be flushed (probably via replacing Entities::clone() with Entities::try_clone(), on the assumption that cloning a world that needs to be flushed is probably a mistake on the part of the user? (I don't understand exactly when World::flush() (which is pub) would be expected to be called by the user, so I can't really judge whether there's a valid use case for cloning an unflushed world)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the free_cursor and I do not fear undefined behaviour here, but rather surprising behaviour if there are pending updates which still need to be applied.

Perhaps it might be worth making World::try_clone() error if the World needs to be flushed (probably via replacing Entities::clone() with Entities::try_clone(), on the assumption that cloning a world that needs to be flushed is probably a mistake on the part of the user? (I don't understand exactly when World::flush() (which is pub) would be expected to be called by the user, so I can't really judge whether there's a valid use case for cloning an unflushed world)

Yes, I think erring out if the world has pending changes (and documenting that requirement) would be a more ergonomic API.

src/world.rs Outdated Show resolved Hide resolved
src/world.rs Outdated Show resolved Hide resolved
@Ralith
Copy link
Owner

Ralith commented May 16, 2024

Have you tried implementing this in terms of the public API instead, like the example serialization implementations do? The same primitives used to support column-major serialization should be suitable for efficient bulk copies. This may allow you to avoid introducing any new unsafety as well.

@caspark
Copy link
Contributor Author

caspark commented May 17, 2024

Pushed some more changes that should make CI go green. (Embarrassingly, I had run miri and format - but then made more changes and forgot to run them again!)

Re using the column based APIs: initially when I tried that path I didn't understand enough of how hecs worked yet, but I just tried again. Here's my implementation of cloning World using the public column-based APIs, suitable for copy-pasting into examples/: https://gist.github.com/caspark/62a092d3068ffba7781008b2828b5320

Shortcomings of using the column-based APIs (ordered from worst to okay):

  1. Entity allocator state is not preserved, so entity generations and entity allocation order can be different in the clone. Could potentially be worked around by implementing this in impl World, removing the logic to reserve entities in the cloned world and duplicating-much-of World::spawn_column_batch() into the clone function for World? I am not sure if that would work because I think (in theory) ArchetypeSet's hashmap might end up with different ordering.
  2. To copy a slice of Copy types in one go (instead of instance-by-instance), we'd need to add something like BatchWriter::extend(&mut, slice: &[T]) where T: Copy, which uses ptr::copy_nonoverlapping(). That looks doable, but BatchWriter currently operates in terms of IterMut<MaybeUninit<T>> - so I think that'd have to be rewritten to deal with pointers directly? (which'd still be some new/changed unsafe code, though likely a smaller chunk than Archetype::try_clone() in this PR).
    • Perhaps the current implementation in the gist does actually compile down to copying from the slice directly? Hard for me to say as I don't have much experience in inspecting that sort of thing.
  3. Since that approach operates on static types (rather than "dynamic" types as in this PR), each type needs to be specified directly, and specified again a second time. That's not a big deal in and of itself (can always provide/write a macro) but it slightly complicates fixing 1. above: if preserving entity allocator state means moving the gist's clone_world into impl World then that function would need to accept a CloningContext like the column based serialization does.

Overall, it seems like it would work with sufficient poking and prodding, but I am not sure whether it's a clear win or not: it'd have some unsafe code too, and preserving entity allocator state in that type of approach seems harder (both to implement for me, and for you as a guarantee to maintain into the future).

What do you think @Ralith ?

@Ralith
Copy link
Owner

Ralith commented May 17, 2024

Entity allocator state is not preserved, so entity generations and entity allocation order can be different in the clone.

This is an unrelated problem, and one we should address in separate work, e.g. #364.

To copy a slice of Copy types in one go (instead of instance-by-instance), we'd need to [...]

LLVM is very good at optimizing memcpy-like loops into a literal memcpy or similarly efficient code. Handling Copy types separately in the absence of disassembly and benchmarking is a premature optimization, especially if it complicates the API

Since that approach operates on static types (rather than "dynamic" types as in this PR), each type needs to be specified directly, and specified again a second time.

This can be easily avoided using type erasure, much like you do in the existing PR. Something like:

struct WorldCloner {
    registry: TypeIdMap<&'static dyn Fn(&Archetype, &mut ColumnBatchBuilder)>,
}

impl WorldCloner {
    pub fn register<T: Component + Clone>(&mut self) {
        self.registry.insert(TypeId::of::<T>(), &|src, dest| {
            let mut column = dest.writer::<T>().unwrap();
            for component in &*src.get::<&T>().unwrap() {
                _ = column.push(component.clone());
            }
        });
    }
}

@caspark
Copy link
Contributor Author

caspark commented Jul 6, 2024

Sorry for falling off the face of the earth for a while - I've been busy using my fork for my project so responding here kept sliding down the todo list.

  1. Re tackling preserving entity allocator state in a separate PR - sure, that seems reasonable.
  2. Re handling copy types separately being a premature optimization - okay, yep, fair.
  3. This seems plausible, though some private types need to be exposed and ColumnBatchType needs to gain add_dynamic().

I'll close this PR and raise a new PR to implement cloning as an example using the public API.

@caspark caspark closed this Jul 6, 2024
@caspark caspark mentioned this pull request Jul 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants