Described in this document is the abstract overview of changes done by Folia. Folia splits the chunks within all loaded worlds into independently ticking regions so that the regions are ticked independently and in parallel. Described first will be intra region operations, and then inter region operations.
In order to ensure that regions are independent, the rules for maintaining regions must ensure that a ticking region has no directly adjacent neighbour regions which are ticking. The following rules guarantee the invariant is upheld:
- Any ticking region may not grow while it is ticking.
- Any ticking region must initially own a small buffer of chunks outside its perimeter.
- Regions may not begin to tick if they have a neighbouring adjacent region.
- Adjacent regions must eventually merge to form a single region.
Additionally, to ensure that a region is not composed of independent regions (which would hinder parallelism), regions composed of more than one independent area must be eventually split into independent regions when possible.
Finally, to ensure that ticking regions may store and maintain data about the current region (i.e tick count, entities within the region, chunks within the region, block/fluid tick lists, and more), regions have their own data object that may only be accessed while ticking the region and by the thread ticking the region. Also, there are callbacks to merging or splitting regions so that the data object may be updated appropriately.
The implementation of these rules is described by REGION_LOGIC.md.
The end result of applying these rules is that a ticking region can ensure that only the current thread has write access to any data contained within the region, and that at any given time the number of independent regions is close to maximum.
Intra region operations refer to any operations that only deal with data for a single region by the owning region, or to merge/split logic.
Independent regions tick independently and in parallel. To tick independently means that regions maintain their own deadlines for scheduling the next tick. For example, consider two regions A and B such that A's next tick start is at t=15ms and B's next tick start is at t=0ms. Consider the following sequence of events:
- At t = 0ms, B begins to tick.
- At t = 15ms, A begins to tick.
- At t = 20ms, B is finished its tick. It is then scheduled to tick again at t = 50ms.
- At t = 50ms, B begins its 2nd tick.
- At t = 70ms, B finishes its 2nd tick and is scheduled to tick again at t = 100ms.
- At t = 95ms, A finishes its first tick. It is scheduled to tick again at t = 95ms.
It is important to note that at no time was B's schedule affected by the fact that A fell behind its 20TPS target.
To implement the described behavior, each region maintains a repeating task on a scheduled executor (See SchedulerThreadPool) that schedules tasks according to an earliest start time first scheduling algorithm. The algorithm is similar to EDF, but schedules according to start time. However, given that the deadline for each tick is 50ms + the start time, it behaves identically to the EDF algorithm.
The EDF-like algorithm is selected so that as long as the thread pool is not maximally utilised, that all regions that take <= 50ms to tick will maintain 20TPS. However, the scheduling algorithm is neither NUMA aware nor CPU core aware - it will not make attempts (when n regions > m threads) to pin regions to certain cores.
Since regions tick independently, they maintain their own tick counters. The implications of this are described in the next section.
In standard Vanilla, there are several important tick counters: Current Tick, Game Time Tick, and Daylight Time Tick. The Current Tick counter is used for determining the tick number since the server has booted. The Game Time Tick is maintained per world and is used to schedule block ticks for redstone, fluids, and other physics events. The Daylight Time Tick is simply the number of ticks since noon, maintained per world.
In Folia, the Current Tick is maintained per region. The Game Time Tick is split into two counters: Redstone Time and Global Game Time. Redstone Time is maintained per region. Global Game Time and Daylight Time are maintained by the "global region."
At the start of each region tick, the global game time tick and daylight time tick are copied from the global region and any time the current region retrieves those values, it will retrieve from the copy received at the start of tick. This is to ensure that for any two calls to retrieve the tick number throughout the tick, that those two calls report the same tick number.
The global game time is maintained for a couple of reasons:
- There needs to be a counter representing how many ticks a world has existed for, since the game does track total number of days the world has gone on for.
- Significant amounts of new entity AI code uses game time (for a reason I cannot divine) to store absolute deadlines of tasks. It is not impossible to write code to adjust the deadlines of all of these tasks, but the amount of work is significant.
The global region is a single scheduled task that is always scheduled to run at 20TPS that is responsible for maintaining data that is not tied to any specific region: game rules, global game time, daylight time, console command handling, world border, weather, and others. Unlike the other regions, the global region does not need to perform any special logic for merging or splitting because it is never split or merged - there is only one global region at any time. The global region does not own any region specific data.
Since redstone and current ticks are maintained per region, there needs to be appropriate logic to adjust the tick deadlines used by the block/fluid tick scheduler and anything else that schedules by redstone/current absolute tick time so that the relative deadline is unaffected.
When merging a region x (from) into a region y (into or to), we can either adjust both the deadlines of x and y or just one of x and y. It is simply easier to adjust one, and arbitrarily the region x is chosen. Then, the deadlines of x must be adjusted so that considering the current ticks of y that the relative deadlines remain unchanged.
Consider a deadline d1 = from tick + relative deadline in region x. We then want the adjusted deadline d2 to be d2 = to tick + relative deadline in region y, so that the relative tick deadline is maintained. We can achieve this by applying an offset o to d1 so that d1 + o = d2, and the offset used is o = tick to - tick from. This offset must be calculated for redstone tick and current tick separately, since the logic to increase redstone tick can be turned off by the Level#tickTime field.
Finally, the split case is easy - when a split occurs, the independent regions from the split inherit the redstone/current tick from the parent region. Thus, the relative deadlines are maintained as there is no tick number change.
In all cases, redstone or any other events scheduled by current tick remain unaffected when regions split or merge as the relative deadline is maintained by applying an offset in the merge case and by copying the tick number in the split case.
Inter region refer to operations that work with other regions that are not the current ticking region that are in a completely unknown state. These regions may be transient, may be ticking, or may not even exist.
In order to assist in inter region operations, several utilities are provided. In NMS, these utilities are the EntityScheduler, the RegionizedTaskQueue, the global region task queue, and the region-local data provider RegionizedData. The Folia API has similar analogues, but does not have a region-local data provider as the NMS data provider holds critical locks and is invoked in critical areas of code when performing any callback logic and is thus highly susceptible to fatal plugin errors involving lengthy I/O or world state modification.
The EntityScheduler allows tasks to be scheduled to be executed on the region that owns the entity. This is particularly useful when dealing with entity teleportation, as once an entity begins an asynchronous teleport the entity cannot tick until the teleport has completed, and the timing is undefined.
The RegionizedTaskQueue allows tasks to be scheduled to be executed on the next tick of a region that owns a specific location, or creating such region if it does not exist. This is useful for tasks that may need to edit or retrieve world/block/chunk data outside the current region.
The global region task queue is simply used to perform edits on data that the global region owns, such as game rules, day time, weather, or to execute commands using the console command sender.
The RegionizedData class allows regions to define region-local data, which allow regions to store data without having to consider concurrent data access from other regions. For example, current per region entity/chunk/block/fluid tick lists are maintained so that regions do not need to consider concurrent access to these data sets.
The utilities allow various cross-region issues to be resolved in a
simple fashion, such as editing block/entity/world state from any region
by using tasks queues, or by avoiding concurrency issues by using
RegionizedData. More advanced operations such as teleportation,
player respawning, and portalling, all make use of these utilities
to ensure the operation is thread-safe.
Entities need special logic in order to teleport safely between other regions or other dimensions. In all cases however, the call to teleport/place an entity must be invoked on the region owning the entity. The EntityScheduler can be used to easily schedule code to execute in such a context.
In a simple teleportation, the entity already exists in a world at a location and the target location and dimension are known. This operation is split into two parts: transform and async place. In this case, the transform operation removes the entity from the current world, then adjusts the position. The async place operation schedules a task to the target location using the RegionizedTaskQueue to add the entity to the target dimension at the target position.
The various implementation details such as non-player entities being copied in the transform operation are left out, as those are not relevant for the high level overview.
Things such as player login and player respawn are generally considered simple teleportation. The player login case only differs since the player does not exist in any world at the start, and that the async transform must additionally find a place to spawn the player. The player respawn is similar to the player login as the respawn differs by having the player in the world at the time of respawn.
Portal teleport differs from simple teleportation as portalling does not know the exact location of the teleport. Thus, the transform step does not update the entity position, but rather a new operation is inserted between transform and async place: async search/create which is responsible for finding and/or creating the exit portal.
Additionally, the current Vanilla code can refuse a portal if the entity is non-player and the nether exit portal does not already exist. But since the portal location is only determined by the async place, it is too late to abort - so, portal logic has been re-done so that there is no difference between players and entities. Now both entities and players create exit portals, whether it be for the nether or end.
Since the teleport happens over multiple steps, the server shutdown process must deal with uncompleted teleportations manually.
The shutdown process occurs by spawning a separate shutdown thread, which then runs the shutdown logic:
- Shutdown the tick region scheduler, stopping any further ticks
- Halt metrics processing
- Disable plugins
- Stop accepting new connections
- Send disconnect (but do not remove) packets to all players
- Halt the chunk systems for all worlds
- Execute shutdown logic for all worlds by finish all pending teleports for all regions, then saving all chunks in the world, and finally saving the level data for the world (level.dat and other .dat files).
- Save all players
- Shutting down the resource manager
- Releasing the level lock
- Halting remaining executors (Util executor, region I/O threads, etc)
The important differences to Vanilla is that the player kick and world saving logic is replaced by steps 5-8.
For step 5, the players cannot be kicked before teleportations are finished, as kicking would save the player dat file. So, save is moved after.
For step 6, the chunk system halt is done before saving so that all chunk generation is halted. This will reduce the load on the server as it shuts down, which may be critical in memory-constrained scenarios.
For step 7, teleportations are completed differently depending on the type: simple or portal.
Simple teleportations are completed by forcing the entity being teleported to be added to the entity chunk specified by the target location. This allows the entity to be saved at the target position, as if the teleportation did complete before shutdown.
Portal teleportations are completed by forcing the entity being teleported to be added to the entity chunk specified from where the entity teleported from. Since the target location is not known, the entity can only be placed back at the origin. While this behavior is not ideal, the shutdown logic must account for any broken world state - which means that finding or create the target exit portal may not be an option.
The teleportation completion must be performed before the world save so that the teleport completed entities save.
For step 8, only save players after the teleportations are completed.
The remaining steps are Vanilla.