From 503737dab03df23a197d5868ba74a400c4d5c82e Mon Sep 17 00:00:00 2001 From: Michael Cuevas Date: Thu, 18 Jul 2024 14:56:24 -0700 Subject: [PATCH] format: apply lints to markdown files Summary: # Context I was really annoyed w/ the variable formatting of Markdown files. I decided to apply formatting to all the markdown files to make things consistent. # This diff Formats all the markdown files to be consistent. The next diff will enable an option in linttool to enforce formatting in all Markdown files under `eden/fs/**/*` Reviewed By: zertosh Differential Revision: D59930918 fbshipit-source-id: 20964f531fbe6be919e8cc391caf148d5c107ae1 --- eden/fs/benchmarks/README.md | 6 +- eden/fs/benchmarks/language/README.md | 6 +- eden/fs/docs/Caching.md | 21 +- eden/fs/docs/Data_Model.md | 70 ++- eden/fs/docs/Futures.md | 84 ++- eden/fs/docs/Globbing.md | 16 +- eden/fs/docs/Glossary.md | 121 ++--- eden/fs/docs/InodeLifetime.md | 188 ++++--- eden/fs/docs/InodeLocks.md | 85 ++- eden/fs/docs/InodeStorage.md | 148 +++--- eden/fs/docs/Inodes.md | 191 ++++--- eden/fs/docs/Overview.md | 97 ++-- eden/fs/docs/Paths.md | 107 ++-- eden/fs/docs/Process_State.md | 231 ++++---- eden/fs/docs/Redirections.md | 78 ++- eden/fs/docs/Rename.md | 64 +-- eden/fs/docs/Takeover.md | 168 +++--- eden/fs/docs/Threading.md | 41 +- eden/fs/docs/Windows.md | 185 +++---- eden/fs/docs/WindowsFsck.md | 491 +++++++++--------- eden/fs/docs/img/README.md | 6 +- eden/fs/docs/slides/Checkout.md | 73 ++- eden/fs/docs/stats/DynamicStats.md | 156 ++++-- eden/fs/docs/stats/EdenStats.md | 341 ++++++------ eden/fs/docs/stats/LocalStoreStats.md | 15 +- eden/fs/docs/stats/ObjectStoreStats.md | 30 +- eden/fs/docs/stats/OverlayStats.md | 68 +-- .../fs/docs/stats/SaplingBackingStoreStats.md | 103 ++-- eden/fs/docs/stats/Stats.md | 4 +- eden/fs/monitor/README.md | 28 +- 30 files changed, 1636 insertions(+), 1586 deletions(-) diff --git a/eden/fs/benchmarks/README.md b/eden/fs/benchmarks/README.md index 7017c3745b87f..0b5fe0fe850c4 100644 --- a/eden/fs/benchmarks/README.md +++ b/eden/fs/benchmarks/README.md @@ -1,5 +1,5 @@ # "Macro" Benchmarks -This directory contains benchmarks of EdenFS through its filesystem -and Thrift APIs. Several of these benchmarks allow comparison of -EdenFS's performance to native filesystems. +This directory contains benchmarks of EdenFS through its filesystem and Thrift +APIs. Several of these benchmarks allow comparison of EdenFS's performance to +native filesystems. diff --git a/eden/fs/benchmarks/language/README.md b/eden/fs/benchmarks/language/README.md index 4d537ec43bf06..9541f9e584861 100644 --- a/eden/fs/benchmarks/language/README.md +++ b/eden/fs/benchmarks/language/README.md @@ -1,5 +1,5 @@ # C++ Language Benchmarks -Sometimes it's useful to microbenchmark the compiler and standard -library itself. These microbenchmarks allow us to compare fundamental -costs across operating systems, compilers, and standard libraries. +Sometimes it's useful to microbenchmark the compiler and standard library +itself. These microbenchmarks allow us to compare fundamental costs across +operating systems, compilers, and standard libraries. diff --git a/eden/fs/docs/Caching.md b/eden/fs/docs/Caching.md index 8a9c3e99906a6..6bb983186354a 100644 --- a/eden/fs/docs/Caching.md +++ b/eden/fs/docs/Caching.md @@ -1,5 +1,4 @@ -Caching in Eden -=============== +# Caching in Eden [This captures the state of Eden as of November, 2018. The information below may change.] 
@@ -32,10 +31,10 @@ quick succession, and reloading the blob each time would be inefficient. The design of this cache attempts to satisfy competing objectives: -* Minimize blob reloads under Eden's various access patterns -* Fit in a mostly-capped memory budget -* Avoid performance cliffs under pathological access patterns -* Maximize memory available to the kernel's own caches, since they have the +- Minimize blob reloads under Eden's various access patterns +- Fit in a mostly-capped memory budget +- Avoid performance cliffs under pathological access patterns +- Maximize memory available to the kernel's own caches, since they have the highest leverage. The cache has a maximum size (default 40 MiB as of this writing), and blobs are @@ -50,7 +49,7 @@ experimentation. One interesting aspect of the blob cache is that Eden has a sense of whether a request is likely to occur again. For example, if the kernel does not support caching readlink calls over FUSE, then any symlink blob should be kept in Eden's -cache until evicted. If the kernel *does* cache readlink, then the blob can be +cache until evicted. If the kernel _does_ cache readlink, then the blob can be released as soon it's been read, making room for other blobs. A more complicated example is that of a series of reads across a large file. @@ -61,11 +60,11 @@ blob, Eden evicts the blob from its cache. Blobs are evicted from cache when: -* The blob cache is full and exceeds its minimum entry count. -* The blob has been read by the kernel and the kernel cache is populated. -* A file inode is materialized and future requests will be satisfied by the +- The blob cache is full and exceeds its minimum entry count. +- The blob has been read by the kernel and the kernel cache is populated. +- A file inode is materialized and future requests will be satisfied by the overlay. -* The kernel has evicted an inode from its own inode cache after reading some of +- The kernel has evicted an inode from its own inode cache after reading some of the blob. ## Blob Metadata diff --git a/eden/fs/docs/Data_Model.md b/eden/fs/docs/Data_Model.md index f2207bd084243..cf4b6838ba005 100644 --- a/eden/fs/docs/Data_Model.md +++ b/eden/fs/docs/Data_Model.md @@ -1,32 +1,30 @@ -Data Model -========== +# Data Model EdenFS is designed to serve file and directory state from an underlying source -control system. In order to do this, it has two parallel representations of the +control system. In order to do this, it has two parallel representations of the state: one that tracks the original immutable source control state, and one that tracks the current mutable file and directory structure being shown in the checkout. -Source Control Model -==================== +# Source Control Model EdenFS's model of source control state mimics the model used by -[Git](https://git-scm.com/) and EdenSCM. The source control repository is -viewed as an object storage system with 3 main object types: commits, trees -(aka directories), and blobs (aka files). +[Git](https://git-scm.com/) and EdenSCM. The source control repository is viewed +as an object storage system with 3 main object types: commits, trees (aka +directories), and blobs (aka files). The Git documentation has an [in-depth overview of the object model](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects). EdenFS expects to be able to look up objects by ID, where an object ID is an -opaque 20-byte key. In practice, both Git and EdenSCM are content-addressed +opaque 20-byte key. 
In practice, both Git and EdenSCM are content-addressed object stores, where the object IDs are computed from the object contents. However, EdenFS does not strictly care about this property, and simply requires being able to look up an object from its ID. These 3 types of objects are chained together in a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) to allow -representing the full commit history in a repository. Each commit contains the +representing the full commit history in a repository. Each commit contains the ID(s) of its parent commit(s), the ID of the tree that represents its root directory, plus additional information like the commit message and author information. @@ -35,52 +33,50 @@ information. Commit objects are referenced by variable-width identifiers whose meaning is defined by the concrete BackingStore implementation. For example, in Mercurial -and Git, they're 20-byte binary (40-byte hex) strings. Each mount remembers -its parent root ID across EdenFS restarts. +and Git, they're 20-byte binary (40-byte hex) strings. Each mount remembers its +parent root ID across EdenFS restarts. -Tree objects represent a directory and contain a list of the directory -contents. Each entry in the directory has the name of the child entry as well -as the object ID, which refers either to another tree object for a subdirectory -or to a blob object for a regular file. Each entry also contains some -additional information, such as flags tracking whether the entry is a file or -directory, whether it is executable, etc. +Tree objects represent a directory and contain a list of the directory contents. +Each entry in the directory has the name of the child entry as well as the +object ID, which refers either to another tree object for a subdirectory or to a +blob object for a regular file. Each entry also contains some additional +information, such as flags tracking whether the entry is a file or directory, +whether it is executable, etc. Additionally, tree entry objects can also contain information about the file -size and hashes of the file contents. This allows EdenFS to efficiently -respond to file attribute requests without having to fetch the entire blob data -from source control. Note that these fields are not present in Git's object -model, but are available when the underlying data is fetched from an EdenSCM -Mononoke server. +size and hashes of the file contents. This allows EdenFS to efficiently respond +to file attribute requests without having to fetch the entire blob data from +source control. Note that these fields are not present in Git's object model, +but are available when the underlying data is fetched from an EdenSCM Mononoke +server. ![Example Tree Object](img/tree_object.svg) -The blob type is the final object type and is the simplest. The blob object -type simply contains the raw file contents. Note that blob objects are used to -represent both regular files as well as symbolic links. For symbolic links, the +The blob type is the final object type and is the simplest. The blob object type +simply contains the raw file contents. Note that blob objects are used to +represent both regular files as well as symbolic links. For symbolic links, the blob contents are the symlink contents. ![Example Blob Object](img/blob_object.svg) EdenFS's classes representing these source control objects can be found in the -[`eden/fs/model`](../model) directory. The `Tree` class represents a source +[`eden/fs/model`](../model) directory. 
The `Tree` class represents a source control tree, and the `Blob` class represents a source control blob. Note that EdenFS is primarily concerned about showing the current working -directory state, and this mainly only requires using Tree and Blob objects. In +directory state, and this mainly only requires using Tree and Blob objects. In general, EdenFS does not need to process source control history related operations, and therefore does not deal much with commit objects. +# Parallels with the Inode State -Parallels with the Inode State -============================== - -The classes in `eden/fs/model` represent source control objects. These objects +The classes in `eden/fs/model` represent source control objects. These objects are immutable, as once a commit is checked in to source control it cannot be modified, only updated by a newer commit. In order to represent the current file and directory state of a checkout, EdenFS -has a separate set of inode data structures. These generally parallel the -source control model data structures: a `TreeInode` represents a directory, and -its contents may be backed by a `Tree` object loaded from source control. A -`FileInode` represents a file, and its contents may be backed by a `Blob` -object loaded from source control. +has a separate set of inode data structures. These generally parallel the source +control model data structures: a `TreeInode` represents a directory, and its +contents may be backed by a `Tree` object loaded from source control. A +`FileInode` represents a file, and its contents may be backed by a `Blob` object +loaded from source control. diff --git a/eden/fs/docs/Futures.md b/eden/fs/docs/Futures.md index 6f25b791c0bd1..c4bc79d36d333 100644 --- a/eden/fs/docs/Futures.md +++ b/eden/fs/docs/Futures.md @@ -1,42 +1,73 @@ # Futures and Asynchronous Code -This document assumes some working knowledge of folly::Future and folly::SemiFuture. Please read the [Future overview](https://github.com/facebook/folly/blob/master/folly/docs/Futures.md) first. +This document assumes some working knowledge of folly::Future and +folly::SemiFuture. Please read the +[Future overview](https://github.com/facebook/folly/blob/master/folly/docs/Futures.md) +first. ## Why Future? -EdenFS is largely concurrent and asynchronous. The traditional way to write this kind of code would be explicit state machines with requests and callbacks. It's easy to forget to call a callback or call one twice under rarely-executed paths like error handling. +EdenFS is largely concurrent and asynchronous. The traditional way to write this +kind of code would be explicit state machines with requests and callbacks. It's +easy to forget to call a callback or call one twice under rarely-executed paths +like error handling. -To make asynchronous code easier to reason about, Folly provides `folly::Future` and `folly::Promise`. Each Future and Promise form a pair, where `folly::Future` holds the eventual value and Promise is how the value is published. Readers can either block on the result (offering their thread to any callbacks that may run) or schedule a callback to be run when the value is available. `folly::Promise` is fulfilled on the writing side. +To make asynchronous code easier to reason about, Folly provides `folly::Future` +and `folly::Promise`. Each Future and Promise form a pair, where `folly::Future` +holds the eventual value and Promise is how the value is published. 
Readers can +either block on the result (offering their thread to any callbacks that may run) +or schedule a callback to be run when the value is available. `folly::Promise` +is fulfilled on the writing side. ## Why SemiFuture? -The biggest problem with Future is that callbacks may run either on the thread calling `Future::then` or on the thread calling `Promise::set`. Callbacks have to be written carefully, and if they acquire locks, any site that calls `Future::then` or `Promise::set` must not hold those locks. +The biggest problem with Future is that callbacks may run either on the thread +calling `Future::then` or on the thread calling `Promise::set`. Callbacks have +to be written carefully, and if they acquire locks, any site that calls +`Future::then` or `Promise::set` must not hold those locks. -`folly::SemiFuture` is a reaction to these problems. It's a Future without a `SemiFuture::then` method. Assuming no use of unsafe APIs (including any `InlineExecutor`), callbacks will never run on the thread that calls `Promise::set`. Any system with an internal thread pool that cannot tolerate arbitrary callbacks running on its threads should use `SemiFuture`. +`folly::SemiFuture` is a reaction to these problems. It's a Future without a +`SemiFuture::then` method. Assuming no use of unsafe APIs (including any +`InlineExecutor`), callbacks will never run on the thread that calls +`Promise::set`. Any system with an internal thread pool that cannot tolerate +arbitrary callbacks running on its threads should use `SemiFuture`. ## Why ImmediateFuture? -`folly::Future` and `folly::SemiFuture` introduce significant overhead. A `Future`/`Promise` pair hold a heap-allocated, atomic refcounted `FutureCore`. In EdenFS, it's common to make an asynchronous call that hits cache and can answer immediately. Heap allocating the result is comparatively expensive. We introduced `facebook::eden::ImmediateFuture` for those cases. ImmediateFuture either stores the result value inline or holds a SemiFuture. +`folly::Future` and `folly::SemiFuture` introduce significant overhead. A +`Future`/`Promise` pair hold a heap-allocated, atomic refcounted `FutureCore`. +In EdenFS, it's common to make an asynchronous call that hits cache and can +answer immediately. Heap allocating the result is comparatively expensive. We +introduced `facebook::eden::ImmediateFuture` for those cases. ImmediateFuture +either stores the result value inline or holds a SemiFuture. ## When should I use which Future? There are reasons to use each Future. 
-  | `Future` | `SemiFuture` | `ImmediateFuture` ---- | --- | --- | --- -Storage is heap-allocated | yes | yes | no -Callbacks run as early as the result is available | yes | no | no -Callbacks may run on the fulfiller's thread | yes | no | no -Callbacks may run immediately or asynchronously | yes | no | yes -sizeof, cost of move() | void* | void* | Depends on sizeof(T) with minimum of 40 bytes as of Oct 2021 +|   | `Future` | `SemiFuture` | `ImmediateFuture` | +| ------------------------------------------------- | -------- | ------------ | ------------------------------------------------------------ | +| Storage is heap-allocated | yes | yes | no | +| Callbacks run as early as the result is available | yes | no | no | +| Callbacks may run on the fulfiller's thread | yes | no | no | +| Callbacks may run immediately or asynchronously | yes | no | yes | +| sizeof, cost of move() | void\* | void\* | Depends on sizeof(T) with minimum of 40 bytes as of Oct 2021 | -`folly::Future` should be used when it's important the callback runs as early as possible. For example, measuring the duration of internal operations. +`folly::Future` should be used when it's important the callback runs as early as +possible. For example, measuring the duration of internal operations. -SemiFuture or ImmediateFuture should be used when it's important that chained callbacks never run on internal thread pools. +SemiFuture or ImmediateFuture should be used when it's important that chained +callbacks never run on internal thread pools. -ImmediateFuture should be used when the value is small and avoiding an allocation is important for performance. Large structs can use unique_ptr or shared_ptr. +ImmediateFuture should be used when the value is small and avoiding an +allocation is important for performance. Large structs can use unique_ptr or +shared_ptr. -It's important to note that, when a callback and its closures hold reference counts or are larger than the result value, it can be worth using Future, because the callbacks are collapsed into a value as early as possible. SemiFuture, even if the SemiFuture is held by an ImmediateFuture, will not collapse any chained callbacks until the SemiFuture is attached to an executor. +It's important to note that, when a callback and its closures hold reference +counts or are larger than the result value, it can be worth using Future, +because the callbacks are collapsed into a value as early as possible. +SemiFuture, even if the SemiFuture is held by an ImmediateFuture, will not +collapse any chained callbacks until the SemiFuture is attached to an executor. ## Safetyness and caveats @@ -73,14 +104,16 @@ As a general rule of thumb, any use of `folly::InlineLikeExecutor` is widely unsafe and should never be used. This is primarily due to forcing `Promise::set` to execute the `folly::Future` callbacks in the context of the fulfiller' thread -For instance, if we re-use the previous example, but where the `threadPool` is an -`InlineLikeExecutor` the `setValue` will also execute both continuation before -returning. +For instance, if we re-use the previous example, but where the `threadPool` is +an `InlineLikeExecutor` the `setValue` will also execute both continuation +before returning. This has been known to cause deadlocks in the past. 
This includes: - - `folly::SemiFuture::toUnsafeFuture` and any `Unsafe` methods as these are merely wrappers on `.via(&InlineExecutor::instance())`, - - `folly::Promise::getFuture` for the same reason, - - `folly::SemiFuture::via(&QueuedImmediateExecutor::instance())` + +- `folly::SemiFuture::toUnsafeFuture` and any `Unsafe` methods as these are + merely wrappers on `.via(&InlineExecutor::instance())`, +- `folly::Promise::getFuture` for the same reason, +- `folly::SemiFuture::via(&QueuedImmediateExecutor::instance())` `folly::InlineLikeExecutor` also have the downside to be incompatible with `folly::coro::Task` which is Folly's coroutine implementation. @@ -98,5 +131,6 @@ execute eagerly unless attached to an executor (and thus becoming ## TODO -* Unsafely mapping ImmediateFuture onto Future with .via(QueuedImmediateExecutor)? -* What about coroutines? +- Unsafely mapping ImmediateFuture onto Future with + .via(QueuedImmediateExecutor)? +- What about coroutines? diff --git a/eden/fs/docs/Globbing.md b/eden/fs/docs/Globbing.md index 663777dac54ad..3e63b549e1c5a 100644 --- a/eden/fs/docs/Globbing.md +++ b/eden/fs/docs/Globbing.md @@ -2,12 +2,12 @@ EdenFS supports glob patterns through the following interfaces: -* Ignore files (e.g. `.gitignore`) -* `globFiles` Thrift API +- Ignore files (e.g. `.gitignore`) +- `globFiles` Thrift API ## Ignore Files -EdenFS uses *ignore files* to exclude files in the `getScmStatus` Thrift API +EdenFS uses _ignore files_ to exclude files in the `getScmStatus` Thrift API (used by `hg status`, for example). The syntax for EdenFS' ignore files is compatible with the syntax for [`gitignore` files][gitignore] used by the Git version control system, even when an EdenFS checkout is backed by a Mercurial @@ -17,12 +17,12 @@ repository. EdenFS interprets the following tokens specially within glob patterns: -* `**`: Match zero, one, or more path components. -* `*`: Match zero, one, or more valid path component characters. -* `?`: Match exactly one valid path component characters. -* `[`: Match exactly one path component character in the given set of +- `**`: Match zero, one, or more path components. +- `*`: Match zero, one, or more valid path component characters. +- `?`: Match exactly one valid path component characters. +- `[`: Match exactly one path component character in the given set of characters. The set is terminated by `]`. -* `[!`, `[^`: Match exactly one path component character *not* in the given set +- `[!`, `[^`: Match exactly one path component character _not_ in the given set of characters. The set is terminated by `]`. EdenFS glob patterns are compatible with [`gitignore` patterns][gitignore] used diff --git a/eden/fs/docs/Glossary.md b/eden/fs/docs/Glossary.md index 20256efe1d166..c346574ae63d7 100644 --- a/eden/fs/docs/Glossary.md +++ b/eden/fs/docs/Glossary.md @@ -6,17 +6,17 @@ The backing repository is the local, on-disk, source control [repository](#repository) from which EdenFS fetches source control data for a [checkout](#checkout). -When fetching data from Mercurial or Git, EdenFS requires a separate, local, bare -repository to be kept somewhere for EdenFS to use to fetch source control data. -This bare repository is the backing repository. Multiple checkouts can share -the same backing repository. +When fetching data from Mercurial or Git, EdenFS requires a separate, local, +bare repository to be kept somewhere for EdenFS to use to fetch source control +data. This bare repository is the backing repository. 
Multiple checkouts can +share the same backing repository. ### Backing Store The term backing store is sometimes used to refer to the underlying data source -used to fetch source control object information. This term comes from the +used to fetch source control object information. This term comes from the [`BackingStore`](../store/BackingStore.h) class which provides an API for -fetching data from source control. This is an abstract API which can, in theory, +fetching data from source control. This is an abstract API which can, in theory, support multiple different source control types, such as EdenSCM, Mercurial, or Git. @@ -24,7 +24,7 @@ Fetching data from the backing store is generally expected to be an expensive option which may end up fetching data from a remote host. The backing store generally refers to EdenFS's internal implementation for -fetching source control data. The [backing repository](#backing-repository) is +fetching source control data. The [backing repository](#backing-repository) is the concrete, local, on-disk storage of the underlying source control state. ### Checkout @@ -33,7 +33,7 @@ When we use the term "checkout" in EdenFS we mean a local client-side source control checkout, particularly the working directory state. We use this in contrast with the term ["repository"](#repository), which we -generally use to refer to the source control metadata storage. The source +generally use to refer to the source control metadata storage. The source control repository stores information about historical commits and objects, whereas the checkout displays the current working directory state for the currently checked-out commit. @@ -42,41 +42,41 @@ EdenFS exposes checkouts to users; it fetches underlying source control data from a repository. Our usage of this terminology has evolved somewhat over the course of EdenFS's -development. Early in development we also used the terms "client" and "mount -point" to refer to checkouts. In a handful of locations you may still see +development. Early in development we also used the terms "client" and "mount +point" to refer to checkouts. In a handful of locations you may still see references to these older terms (in particular, the `EdenMount` class), but for most new code and documentation we have attempted to be consistent with the use of the term "checkout". ### Inode -An inode represents a file or directory in the filesystem. This terminology is -common to Unix filesystems. The +An inode represents a file or directory in the filesystem. This terminology is +common to Unix filesystems. The [inode wikipedia entry](https://en.wikipedia.org/wiki/Inode) has a more complete description. ### Journal The Journal is the data structure that EdenFS uses to record recent modifying -filesystem I/O operations. This is used to implement APIs like +filesystem I/O operations. This is used to implement APIs like `getFilesChangedSince()`, which is in turn used by watchman to tell clients about recent filesystem changes. ### Loaded / Unloaded The terms "loaded" and "unloaded" are used to refer to whether EdenFS has state -for a particular inode loaded in memory or not. If EdenFS has a `FileInode` or +for a particular inode loaded in memory or not. If EdenFS has a `FileInode` or `TreeInode` object in memory for a particular file or directory, then that file -is referred to as loaded. Otherwise, that file or directory is considered unloaded. +is referred to as loaded. Otherwise, that file or directory is considered +unloaded. 
-By default, when a checkout is first mounted, most inodes are unloaded. -EdenFS then lazily loads inodes on-demand as they are accessed. +By default, when a checkout is first mounted, most inodes are unloaded. EdenFS +then lazily loads inodes on-demand as they are accessed. ### Local Store -The local store refers to EdenFS's local cache of source control data. -This data is stored in the [EdenFS state directory](#state-directory) at -`.eden/storage`. +The local store refers to EdenFS's local cache of source control data. This data +is stored in the [EdenFS state directory](#state-directory) at `.eden/storage`. Over time EdenFS has been moving away from tracking data in the local store, instead relying more on the underlying source control data fetching mechanisms @@ -90,75 +90,76 @@ inodes have contents identical to a source control object. When a checkout is first cloned, all inodes are non-materialized, as we know that the root directory corresponds to the root source control tree for the -current commit. Each of its children correspond to its corresponding children -in source control, so they are also non-materialized. +current commit. Each of its children correspond to its corresponding children in +source control, so they are also non-materialized. When a file is modified from its source control state, it becomes materialized. This is because we can no longer fetch the file contents from source control. -Following this logic, brand new files that are created locally immediately -start in the materialized state. Also, if a file no longer corresponds to a -known source control object, the parent directory also no longer corresponds to -a known source control tree. This means that when a child inode is -materialized, its ancestors are also materialized recursively upwards until the -root of the repo or an already materialized tree is reached. +Following this logic, brand new files that are created locally immediately start +in the materialized state. Also, if a file no longer corresponds to a known +source control object, the parent directory also no longer corresponds to a +known source control tree. This means that when a child inode is materialized, +its ancestors are also materialized recursively upwards until the root of the +repo or an already materialized tree is reached. -Materialized files are stored in the [overlay](#overlay). -Non-materialized files do not need to be stored in the overlay, as their -contents can always be fetched from the source control repository. +Materialized files are stored in the [overlay](#overlay). Non-materialized files +do not need to be stored in the overlay, as their contents can always be fetched +from the source control repository. For more details see the [Inode Materialization](Inodes.md#inode-materialization) documentation. In ProjFS mounts, there is an additional special case for materialized files. -Files that have been renamed are considered materialized inodes. Technically, -we still know the source control object associated with the inode, however, -we no longer store this association in the overlay. ProjFS always will -make a read request for these files with the original path and reads are -only served from source control objects in ProjFS. +Files that have been renamed are considered materialized inodes. Technically, we +still know the source control object associated with the inode, however, we no +longer store this association in the overlay. 
ProjFS always will make a read +request for these files with the original path and reads are only served from +source control objects in ProjFS. ### Populated + Inodes or files are considered "populated" when their contents have been -observed by the kernel, but the file has not yet been modified. -For ProjFS mounts "populated" this means the contents are present on -the filesystem and reads are going to be directly handled by ProjFS -until we invalidate the file. In FUSE and NFS mounts this means the kernel -may have the file contents in its caches, though FUSE or NFS may have -decided to evict them from cache of its own will. +observed by the kernel, but the file has not yet been modified. For ProjFS +mounts "populated" this means the contents are present on the filesystem and +reads are going to be directly handled by ProjFS until we invalidate the file. +In FUSE and NFS mounts this means the kernel may have the file contents in its +caches, though FUSE or NFS may have decided to evict them from cache of its own +will. -For ProjFS mounts, "populated" is roughly equivalent to the hydrated -placeholder state for files. For directories "populated" is roughly equivalent -to placeholders without materialized children. +For ProjFS mounts, "populated" is roughly equivalent to the hydrated placeholder +state for files. For directories "populated" is roughly equivalent to +placeholders without materialized children. -No populated files and directories correspond to materialized inodes and -vice versa. These states are intentionally independent. i.e. populated -is defined as files the kernel knows about - materialized ones. +No populated files and directories correspond to materialized inodes and vice +versa. These states are intentionally independent. i.e. populated is defined as +files the kernel knows about - materialized ones. ### Overlay The overlay is where EdenFS stores information about [materialized](#materialized--non-materialized) files and directories. -Each checkout has its own separate overlay storage. This data is stored in the +Each checkout has its own separate overlay storage. This data is stored in the [EdenFS state directory](#state-directory) at `.eden/clients/CHECKOUT_NAME/local` The term "overlay" comes from the fact that it behaves like an -[overlay filesystem](https://en.wikipedia.org/wiki/Union_mount) (also known as -a union filesystem), where local modifications are overlayed on top of the +[overlay filesystem](https://en.wikipedia.org/wiki/Union_mount) (also known as a +union filesystem), where local modifications are overlayed on top of the underlying source control state. ### Repository -The term "repository" is used to refer to the source control system's storage -of source control commit, directory, and file data. +The term "repository" is used to refer to the source control system's storage of +source control commit, directory, and file data. -Contrast this to the term [checkout](#checkout) above, which refers -specifically to the working directory state. +Contrast this to the term [checkout](#checkout) above, which refers specifically +to the working directory state. ### State Directory -The state directory is where EdenFS stores all of its local state. The -default location of this directory can be controlled in the system -configuration (`/etc/eden/edenfs.rc`) or the user-specific configuration -(`$HOME/.edenrc`), but it generally defaults to `$HOME/.eden/`. However, -it defaults to `$HOME/local/.eden` in some Meta environments. 
+The state directory is where EdenFS stores all of its local state. The default +location of this directory can be controlled in the system configuration +(`/etc/eden/edenfs.rc`) or the user-specific configuration (`$HOME/.edenrc`), +but it generally defaults to `$HOME/.eden/`. However, it defaults to +`$HOME/local/.eden` in some Meta environments. diff --git a/eden/fs/docs/InodeLifetime.md b/eden/fs/docs/InodeLifetime.md index f635291fd9070..58bd7a473b9a7 100644 --- a/eden/fs/docs/InodeLifetime.md +++ b/eden/fs/docs/InodeLifetime.md @@ -1,60 +1,53 @@ -Inode Ownership -=============== +# Inode Ownership -Inodes are managed via `InodePtr` objects. `InodePtr` is a smart-pointer class +Inodes are managed via `InodePtr` objects. `InodePtr` is a smart-pointer class that maintains a reference count on the underlying `InodeBase` object, similar to `std::shared_ptr`. However, unlike `std::shared_ptr`, inodes are not necessarily deleted -immediately when their reference count drops to zero. Instead, they may remain +immediately when their reference count drops to zero. Instead, they may remain in memory for a while in case they are used again soon. -Owners ------- +## Owners -- `InodeMap` holds a reference to the root inode. This ensures that the root +- `InodeMap` holds a reference to the root inode. This ensures that the root inode remains in existence for as long as the `EdenMount` exists. -- Each inode holds a reference to its parent `TreeInode`. This ensures that - if an inode exists, all of its parents all the way to the mount point root - also exist. +- Each inode holds a reference to its parent `TreeInode`. This ensures that if + an inode exists, all of its parents all the way to the mount point root also + exist. - For all other call sites, callers obtain a reference to an inode when they - look it up. The lookup functions return `InodePtr` objects that the call - site should retain for as long as they need access to the inode. + look it up. The lookup functions return `InodePtr` objects that the call site + should retain for as long as they need access to the inode. -Non-Owners ----------- +## Non-Owners -- `InodeMap` does not hold a reference to the inodes it contains. Otherwise, - it would never be possible to unload or destroy any inodes. Instead, the - `InodeMap` holds raw pointers to inode objects. When `Inode` objects are +- `InodeMap` does not hold a reference to the inodes it contains. Otherwise, it + would never be possible to unload or destroy any inodes. Instead, the + `InodeMap` holds raw pointers to inode objects. When `Inode` objects are unloaded they are always explicitly removed from the `InodeMap`'s list of loaded inodes. -- A `TreeInode` does not hold a reference to any of its children. Otherwise, - this would cause circular reference, since each child holds a reference to - its parent `TreeInode`. The `TreeInode` is always explicitly informed when - one of its children inodes is unloaded, so it can remove the raw pointer to - the child from its child entries map. +- A `TreeInode` does not hold a reference to any of its children. Otherwise, + this would cause circular reference, since each child holds a reference to its + parent `TreeInode`. The `TreeInode` is always explicitly informed when one of + its children inodes is unloaded, so it can remove the raw pointer to the child + from its child entries map. - -Inode Lookup -============ +# Inode Lookup Inodes may be looked up in one of two ways, either by name or by inode number. 
-`TreeInode::getOrLoadChild()` is the API for doing inode lookups by name, -and `InodeMap::lookupinode()` is the API for doing inode lookups by inode -number. +`TreeInode::getOrLoadChild()` is the API for doing inode lookups by name, and +`InodeMap::lookupinode()` is the API for doing inode lookups by inode number. -Either of these two APIs may have to create the inode object. Alternatively, -if the specified inode already exists, they will increment the reference count -to the existing object and return it. It is possible the inode is already -present in the `InodeMap`, but was previously unreferenced, so these APIs may -increment the reference count from 0 to 1. +Either of these two APIs may have to create the inode object. Alternatively, if +the specified inode already exists, they will increment the reference count to +the existing object and return it. It is possible the inode is already present +in the `InodeMap`, but was previously unreferenced, so these APIs may increment +the reference count from 0 to 1. -Simultaneous Lookups --------------------- +## Simultaneous Lookups The `InodeMap` class keeps track of all currently loaded inodes as well as information about inodes that have inode numbers allocated but are not loaded. @@ -63,9 +56,7 @@ This allows `InodeMap` to avoid starting two load attempts for the same inode. If a second lookup attempt occurs for an inode already being loaded, `InodeMap` handles notifying both waiting callers when the single load attempt completes. - -Inode Unloading -=============== +# Inode Unloading Inode unloading can be triggered by several events: @@ -80,30 +71,29 @@ reference count goes to zero. If the inode is unlinked and its FUSE reference count is also zero, we also destroy the inode immediately. -In other cases we generally leave the inode object loaded, but it would be -valid to decide to unload it based on other criteria (for instance, we could -decide to immediately unload unreferenced inodes if we are low on memory). +In other cases we generally leave the inode object loaded, but it would be valid +to decide to unload it based on other criteria (for instance, we could decide to +immediately unload unreferenced inodes if we are low on memory). ## FUSE reference count going to zero When the FUSE reference count goes to zero, we should destroy the inode immediately if it is unlinked and its pointer reference count is also zero. -To simplify synchronization, we currently collapse this case into the one -above: we only decrement the FUSE reference count on a loaded inode when we are -holding a normal `InodePtr` reference to the inode. Therefore, we will always -see the normal reference count drop to zero at some point after the FUSE -reference count drops to zero, and we process the unload at that time. +To simplify synchronization, we currently collapse this case into the one above: +we only decrement the FUSE reference count on a loaded inode when we are holding +a normal `InodePtr` reference to the inode. Therefore, we will always see the +normal reference count drop to zero at some point after the FUSE reference count +drops to zero, and we process the unload at that time. ## On demand -We will likely add a periodic background task to unload unreferenced inodes -that have not been accessed in some time. This unload operation could also be +We will likely add a periodic background task to unload unreferenced inodes that +have not been accessed in some time. 
This unload operation could also be triggered in response to other events (for instance, a thrift call, or going over some memory usage limit). -Synchronization and the Acquire Count -------------------------------------- +## Synchronization and the Acquire Count Synchronization of inode loading and unloading is slightly tricky, particularly for unloading. @@ -111,108 +101,104 @@ for unloading. ### Loading When loading an inode, we always hold the `InodeMap` lock to check if the inode -in question is already loaded or if a load is in progress. Once the inode is +in question is already loaded or if a load is in progress. Once the inode is loaded, we acquire its parent `TreeInode`'s `contents_` lock, then the `InodeMap` lock (in that order), so we can insert the inode into it's parent's entry list and into the `InodeMap`'s list of loaded inodes. ### Updating Reference Counts -`InodePtr` itself does not hold any extra locks when performing reference -count updates. The main inode reference count is updated with atomic -operations, but without any other locks held. +`InodePtr` itself does not hold any extra locks when performing reference count +updates. The main inode reference count is updated with atomic operations, but +without any other locks held. However, there is one important item to note here: updates done via `InodePtr` -copying can never increment the reference count from 0 to 1. The lookup APIs +copying can never increment the reference count from 0 to 1. The lookup APIs (`TreeInode::getOrLoadChild()` and `InodeMap::lookupInode()`) are the only two -places that can ever increment the reference count from 0 to 1. Both of these +places that can ever increment the reference count from 0 to 1. Both of these lookup APIs hold a lock when potentially updating the reference count from 0 to + 1. `TreeInode::getOrLoadChild()` holds the parent `TreeInode`'s `contents_` lock, -and `InodeMap::lookupInode()` holds the `InodeMap` lock. This means that if -you hold both of these locks and you see that an inode's reference count is +and `InodeMap::lookupInode()` holds the `InodeMap` lock. This means that if you +hold both of these locks and you see that an inode's reference count is currently 0, no other thread can acquire a reference count to that inode. ### Preventing Multiple Unload Attempts Holding the parent `TreeInode`'s `contents_` lock and the `InodeMap` lock ensures that no other thread can acquire a new reference on an inode, but that -alone does not mean it is safe to destroy the inode. We still need to prevent +alone does not mean it is safe to destroy the inode. We still need to prevent multiple threads from both trying to destroy an inode. For instance, consider if thread A destroys the last `InodePtr` to an inode, -dropping its reference count to 0. However, before thread A has a chance to -grab the `TreeInode` and `InodeMap` locks and decide if it wants to unload the -inode, thread B looks up the inode, increasing the reference count from 0 to 1, -but then immediately destroys its `InodePtr`, dropping the reference count back +dropping its reference count to 0. However, before thread A has a chance to grab +the `TreeInode` and `InodeMap` locks and decide if it wants to unload the inode, +thread B looks up the inode, increasing the reference count from 0 to 1, but +then immediately destroys its `InodePtr`, dropping the reference count back to 0. In this situation thread A and thread B have both just dropped the reference -count to 0. 
We need to make sure that only one of these two threads can try to +count to 0. We need to make sure that only one of these two threads can try to destroy the inode. -This is achieved through another counter, called the "acquire" counter. -This counter is incremented each time the inode reference count goes from 0 to -1, and decremented each time the reference count goes from 1 to 0. However, -unlike the main reference count, the acquire counter is only modified while -holding some additional locks. +This is achieved through another counter, called the "acquire" counter. This +counter is incremented each time the inode reference count goes from 0 to 1, and +decremented each time the reference count goes from 1 to 0. However, unlike the +main reference count, the acquire counter is only modified while holding some +additional locks. -Increments to the acquire counter are only done while holding either the -parent `TreeInode`'s `contents_` lock (in the case of -`TreeInode::getOrLoadChild()`) or the `InodeMap` lock (in the case of -`InodeMap::lookupInode()`). +Increments to the acquire counter are only done while holding either the parent +`TreeInode`'s `contents_` lock (in the case of `TreeInode::getOrLoadChild()`) or +the `InodeMap` lock (in the case of `InodeMap::lookupInode()`). -Decrements to the acquire counter are only done while holding both the -parent `TreeInode`'s `contents_` lock and the `InodeMap` lock. +Decrements to the acquire counter are only done while holding both the parent +`TreeInode`'s `contents_` lock and the `InodeMap` lock. When thread A and thread B both see that the main reference count drops to 0, they both attempt to acquire both the `TreeInode` and `InodeMap` locks. Whichever thread acquires the locks first will see that the acquire count is non-zero (since both threads incremented it when bumping the main reference -count from 0 to 1). This thread decrements the acquire count and does nothing -else since the acquire count is non zero. The second thread can then acquire -the locks, decrement the acquire count and see that it is now zero. This -second thread can then perform the unload (while still holding both locks). +count from 0 to 1). This thread decrements the acquire count and does nothing +else since the acquire count is non zero. The second thread can then acquire the +locks, decrement the acquire count and see that it is now zero. This second +thread can then perform the unload (while still holding both locks). -EdenMount Destruction -===================== +# EdenMount Destruction All inode objects store a pointer to the `EdenMount` that they are a part of. This means that the `EdenMount` itself cannot be destroyed until all of its inodes are destroyed. -We achieve this via the root `TreeInode`'s reference count. During normal +We achieve this via the root `TreeInode`'s reference count. During normal operation, the `EdenMount` holds a reference to the root `TreeInode` -(technically the `InodeMap` holds the reference, but the `EdenMount` owns -the `InodeMap`). When the `EdenMount` needs to be destroyed, we release the -reference count on the root inode. When the root inode becomes unreferenced we -know that all of its children have been destroyed, and it is now safe to -destroy the `EdenMount` object itself. +(technically the `InodeMap` holds the reference, but the `EdenMount` owns the +`InodeMap`). When the `EdenMount` needs to be destroyed, we release the +reference count on the root inode. 
When the root inode becomes unreferenced we +know that all of its children have been destroyed, and it is now safe to destroy +the `EdenMount` object itself. -All of this is triggered through the `EdenMount::destroy()` function. This +All of this is triggered through the `EdenMount::destroy()` function. This function marks the mount as shutting down, which causes the `InodeMap` to -immediately unload any inodes that become newly unreferenced. We then trigger -an immediate unload scan to unload any inodes that were already unreferenced. -Once this is done, we release the `InodeMap`'s reference count on the root -inode, allowing it to become unreferenced once all of its children are -destroyed. - +immediately unload any inodes that become newly unreferenced. We then trigger an +immediate unload scan to unload any inodes that were already unreferenced. Once +this is done, we release the `InodeMap`'s reference count on the root inode, +allowing it to become unreferenced once all of its children are destroyed. -FUSE Reference Counts -===================== +# FUSE Reference Counts In addition to the reference count tracking how many `InodePtr` objects are currently referring to an inode, `InodeBase` also keeps track of how many -outstanding references to this inode exist in the FUSE layer (this is the -number of `lookup()`/`create()`/`mkdir()`/`symlink()`/`link()` calls made for -this inode, minus the number of times it was forgotten via `forget()`). +outstanding references to this inode exist in the FUSE layer (this is the number +of `lookup()`/`create()`/`mkdir()`/`symlink()`/`link()` calls made for this +inode, minus the number of times it was forgotten via `forget()`). However, the FUSE reference count is not directly related to the inode object lifetime. -Inode objects may be unloaded even when the FUSE reference count is non-zero. -In this case, the `InodeMap` retains enough information needed to re-create the +Inode objects may be unloaded even when the FUSE reference count is non-zero. In +this case, the `InodeMap` retains enough information needed to re-create the `Inode` object if the inode number is later looked up again by the FUSE API. The FUSE reference count is only adjusted while holding a normal InodePtr diff --git a/eden/fs/docs/InodeLocks.md b/eden/fs/docs/InodeLocks.md index 2f55598375e4f..9fa66f606d600 100644 --- a/eden/fs/docs/InodeLocks.md +++ b/eden/fs/docs/InodeLocks.md @@ -1,29 +1,28 @@ -Inode-related Locks -------------------- +## Inode-related Locks ## InodeBase's `location_` Lock: -No other locks should be acquired while holding this lock. -Two `location_` locks should never be held at the same time. +No other locks should be acquired while holding this lock. Two `location_` locks +should never be held at the same time. This field cannot be updated without holding both the EdenMount's rename lock and the `location_` lock for the InodeBase in question. -Note that `InodeBase::getLogPath()` acquires `location_` locks. This function -is used in log statements in many places, including in places where other locks -are held. It is therefore important to ensure that the `location_` lock -remains at the very bottom of our lock-ordering stack. +Note that `InodeBase::getLogPath()` acquires `location_` locks. This function is +used in log statements in many places, including in places where other locks are +held. It is therefore important to ensure that the `location_` lock remains at +the very bottom of our lock-ordering stack. 
## InodeMap `data_` Lock: No other locks should be acquired while holding this lock, apart from InodeBase -`location_` locks (InodeBase `location_` locks are only held with the -InodeMap lock already held for the purpose of calling `inode->getLogPath()` in -logging statements). +`location_` locks (InodeBase `location_` locks are only held with the InodeMap +lock already held for the purpose of calling `inode->getLogPath()` in logging +statements). In general, it should only be held very briefly while doing lookups/inserts on -the map data structures. Once we need to load an Inode, the InodeMap lock is -released for the duration of the load operation itself. It is re-acquired when +the map data structures. Once we need to load an Inode, the InodeMap lock is +released for the duration of the load operation itself. It is re-acquired when the load completes so we can insert the new Inode into the map. ## InodeMetadataTable `state_` Lock: @@ -35,8 +34,7 @@ InodeTable's index data structures. ## FileInode Lock: -The InodeBase `location_` lock may be acquired while holding a FileInode's -lock. +The InodeBase `location_` lock may be acquired while holding a FileInode's lock. ## TreeInode `contents_` Lock: @@ -52,47 +50,46 @@ lock. In some situations, the same thread acquires multiple `contents_` locks together. - - Some code paths hold a parent TreeInode's `contents_` lock while accessing - its children, and then acquires a child TreeInode's `contents_` lock while - still holding the parent TreeInode's lock. +- Some code paths hold a parent TreeInode's `contents_` lock while accessing its + children, and then acquires a child TreeInode's `contents_` lock while still + holding the parent TreeInode's lock. - - The `rename()` code may hold up to 3 TreeInode locks. It always holds the - `contents_` lock on both the source TreeInode and the destination - TreeInode. Additionally, if the destination name refers to an existing - TreeInode, the rename() holds its `contents_` lock as well, to ensure that - it is empty, and to prevent new entries from being created inside this - directory once the rename starts. +- The `rename()` code may hold up to 3 TreeInode locks. It always holds the + `contents_` lock on both the source TreeInode and the destination TreeInode. + Additionally, if the destination name refers to an existing TreeInode, the + rename() holds its `contents_` lock as well, to ensure that it is empty, and + to prevent new entries from being created inside this directory once the + rename starts. To prevent deadlocks, the lock ordering constraints for TreeInode `contents_` are as follows: -- If you are not holding the mountpoint rename lock, you can only acquire - a TreeInode `contents_` lock if the other `contents_` locks you are holding - are for this TreeInode's immediate parents (e.g., if you are already - holding another `contents_` lock, it must be for this TreeInode's parent. If - you are holding two other `contents_` locks, it must be for this TreeInode's - parent and grandparent). +- If you are not holding the mountpoint rename lock, you can only acquire a + TreeInode `contents_` lock if the other `contents_` locks you are holding are + for this TreeInode's immediate parents (e.g., if you are already holding + another `contents_` lock, it must be for this TreeInode's parent. If you are + holding two other `contents_` locks, it must be for this TreeInode's parent + and grandparent). - Note, however, that acquiring multiple TreeInode contents locks is discouraged. 
- When possible, it is preferred to release the lock on the parent TreeInode - before locking the child. Acquiring locks on more than 2 levels of the tree - hierarchy is technically safe from a lock ordering perspective, but is also - strongly discouraged. + Note, however, that acquiring multiple TreeInode contents locks is + discouraged. When possible, it is preferred to release the lock on the parent + TreeInode before locking the child. Acquiring locks on more than 2 levels of + the tree hierarchy is technically safe from a lock ordering perspective, but + is also strongly discouraged. - If you are holding the mountpoint rename lock, it is safe to acquire multiple - TreeInode locks at a time. However, if there is an ancestor/child - relationship between any of the TreeInodes, the ancestor lock must be - acquired first. This avoids lock ordering issues with other threads that are - not holding the rename lock. Among unrelated TreeInodes, no particular - ordering is required. + TreeInode locks at a time. However, if there is an ancestor/child relationship + between any of the TreeInodes, the ancestor lock must be acquired first. This + avoids lock ordering issues with other threads that are not holding the rename + lock. Among unrelated TreeInodes, no particular ordering is required. ## EdenMount's Rename Lock: -This lock is a very high level lock in our lock ordering stack--it is -acquired before any other individual inode-specific locks. +This lock is a very high level lock in our lock ordering stack--it is acquired +before any other individual inode-specific locks. -This lock is held for the duration of a rename or unlink operation. No -InodeBase `location_` fields may be updated without holding this lock. +This lock is held for the duration of a rename or unlink operation. No InodeBase +`location_` fields may be updated without holding this lock. ## EdenMount's Current Snapshot Lock: diff --git a/eden/fs/docs/InodeStorage.md b/eden/fs/docs/InodeStorage.md index 1311f5157a8b2..9f70a84bd27ca 100644 --- a/eden/fs/docs/InodeStorage.md +++ b/eden/fs/docs/InodeStorage.md @@ -6,44 +6,44 @@ We have some guiding principles that affect the design of Eden and its durability properties. We intend for Eden to reliably preserve user data if the Eden processes aborts -or is killed. If the process dies, none of the user's data should be lost. -Eden crashing ought to be rare, but, especially while it's in development, it's +or is killed. If the process dies, none of the user's data should be lost. Eden +crashing ought to be rare, but, especially while it's in development, it's realistic to expect things to go wrong, including stray `killall edenfs` commands. -However, we do not guarantee consistent data if a VM suddenly powers off -or if a disk fails. It is a substantial amount of work, and probably a -performance penalty, to be durable under those conditions. +However, we do not guarantee consistent data if a VM suddenly powers off or if a +disk fails. It is a substantial amount of work, and probably a performance +penalty, to be durable under those conditions. Fortunately, thanks to commit cloud, the risk of losing days of work due to disk -or machine shutdown is low. While many engineer-hours will be spent working in +or machine shutdown is low. While many engineer-hours will be spent working in an Eden checkout, the amount of work that builds up prior to a commit is -hopefully bounded. (And perhaps someday we will automatically snapshot your +hopefully bounded. 
(And perhaps someday we will automatically snapshot your working copy!) ## Concepts Git and Mercurial have abstract, hash-indexed tree data structures representing -a file hierarchy. (You'll find the corresponding code in `eden/fs/model`.) +a file hierarchy. (You'll find the corresponding code in `eden/fs/model`.) Version control trees and files have a subset of the possible states that a real -filesystem can be in. For example, neither Git nor Mercurial version a file's +filesystem can be in. For example, neither Git nor Mercurial version a file's user or group ownership, and the only versioned permission bit is -user-executable. Also, version control systems do not support hard links. +user-executable. Also, version control systems do not support hard links. In a non-Eden, traditional version control system, checkout operations immediately materialize that abstract tree data structure into actual -directories and files on disk. The downside of course is that checkout becomes +directories and files on disk. The downside of course is that checkout becomes O(repo) in disk operations and the entire tree is physically allocated on disk. What makes Eden useful is that it only fetches trees and blobs from version -control as the filesystem is explored. This makes checkout O(changes). But it +control as the filesystem is explored. This makes checkout O(changes). But it raises some questions about how to expose traditional filesystem concepts like timestamps, permission bits, and inode numbers. ## Inode States As the filesystem is explored through FUSE, inodes are allocated to represent a -accessed source control trees and files. A given inode can then transition +accessed source control trees and files. A given inode can then transition between states as filesystem operations are performed on it. ### Metadata State Machine @@ -51,42 +51,42 @@ between states as filesystem operations are performed on it. Eden inodes transition between a series of states: Once the parent tree has been loaded, the names, types, and hashes of its -children are known. At this point, questions like "does this entry exist?" or +children are known. At this point, questions like "does this entry exist?" or "what is its hash?" can be answered, in addition to providing any metadata we -have from the backing version control system. (For example, Mononoke will +have from the backing version control system. (For example, Mononoke will provide file sizes and SHA-1 hashes so Eden does not have to actually load the files and compute them.) To satisfy readdir() or stat() calls, however, we must give the entry an inode -number. Once an inode number has been allocated to an entry and handed out via +number. Once an inode number has been allocated to an entry and handed out via the filesystem, it must be remembered as long as programs can reasonably expect -them to be consistent. (e.g. for the program's lifetime or until a qualifying -"anything could happen" operation like `hg checkout`. See `#pragma -once` addendum below.) +them to be consistent. (e.g. for the program's lifetime or until a qualifying +"anything could happen" operation like `hg checkout`. See `#pragma once` +addendum below.) Inode metadata such as timestamps and permission bits, once accessed, should be -remembered as long as the inode numbers are. See `make` addendum below. When +remembered as long as the inode numbers are. See `make` addendum below. When Eden forgets an inode number, the timestamps and permission bits are forgotten -too. 
Moreover, when the inode number is forgotten, the inode numbers of its +too. Moreover, when the inode number is forgotten, the inode numbers of its children must be forgotten. There is only one type of inode metadata change that matters from the -perspective of version control: the user executable bit on files. If that bit +perspective of version control: the user executable bit on files. If that bit changes, the file and all of its parents must be marked potentially-modified. Other metadata changes are local-only and can be ignored by version control operations. -At the risk of repeating myself, here are some other rules. If a source control -tree entry has an inode number, its parent must also have an inode number. If -an inode is marked potentially-modified, its parent must also be marked -potentially-modified. Why? Because Eden needs to be able to crawl from the root +At the risk of repeating myself, here are some other rules. If a source control +tree entry has an inode number, its parent must also have an inode number. If an +inode is marked potentially-modified, its parent must also be marked +potentially-modified. Why? Because Eden needs to be able to crawl from the root tree and rapidly enumerate the potentially-modified set, even at process startup. During a checkout operation (or otherwise) we may determine that the contents of -a file or tree now matches its unmodified state. If so, to reduce the size of +a file or tree now matches its unmodified state. If so, to reduce the size of the tree Eden is tracking, it may dematerialize the tree (from the parents -down). Dematerialization must preserve inode numbers for any entries that may +down). Dematerialization must preserve inode numbers for any entries that may currently be referenced by FUSE, but since checkout is an "anything could happen" operation, inodes for other unmodified files could be forgotten. @@ -98,7 +98,7 @@ The previous section talks about inode numbers and inode metadata (e.g. timestamps, user, group, and mode bits). The other half of an inode is its data: the contents of a file (or symlink) and -the entries of a tree. (Note that it's possible for an inode's data to be +the entries of a tree. (Note that it's possible for an inode's data to be modified but metadata untouched or vice versa.) When an entry's parent is loaded, the child's name, type, and hash are known, @@ -131,19 +131,18 @@ control state at the end. [TODO: not sure where to put this section] This document talks about an inode entering and leaving the 'materialized' -state. It's a bit of an unintuitive concept. If an inode is materialized, -it is potentially modified relative to its original source control object, as +state. It's a bit of an unintuitive concept. If an inode is materialized, it is +potentially modified relative to its original source control object, as indicated by its parent's entry's source control hash. Note that being materialized is orthogonal to whether a file is considered -modified or not. If a file has been overwritten with its original contents, it +modified or not. If a file has been overwritten with its original contents, it will be materialized (at least temporarily) but not show up as modified from the -perspective of version control. On the other hand, if a subtree has been -renamed (imagine root/foo -> root/bar), then everything inside the subtree will -not be materialized, but will show up as modified from a status or diff -operation. +perspective of version control. 
On the other hand, if a subtree has been renamed +(imagine root/foo -> root/bar), then everything inside the subtree will not be +materialized, but will show up as modified from a status or diff operation. -If an inode is materialized, its parent must also be materialized. The +If an inode is materialized, its parent must also be materialized. The materialized status is used to rapidly determine which set of files is worth looking at when performing a status or diff operation. @@ -157,42 +156,42 @@ durability goals above? The InodeMap keeps track of loaded inodes and inodes that FUSE still has a reference to. -Note that the term "loaded" is used ambiguously in Eden. When talking about +Note that the term "loaded" is used ambiguously in Eden. When talking about whether an inode is loaded, it means that the InodeMap has in-memory data -tracking its state. On the other hand, a FileInode can have loaded its backing +tracking its state. On the other hand, a FileInode can have loaded its backing blob or not. (TODO: should we rename InodeMap's "loaded" and "unloaded" terminology to "known" and "remembered"?) -#### loadedInodes_ +#### loadedInodes\_ Inode tree nodes currently loaded in memory. -* For files, that includes their hashes, blob loading state, file handles - into the overlay, timestamps, and permission bits. -* For trees, that includes tree hashes, entries, timestamps. -* For both, the entry type, fuse reference count, internal reference count, +- For files, that includes their hashes, blob loading state, file handles into + the overlay, timestamps, and permission bits. +- For trees, that includes tree hashes, entries, timestamps. +- For both, the entry type, fuse reference count, internal reference count, location. -* If a child is in loadedInodes_, its parent must be in loadedInodes_ too. +- If a child is in loadedInodes*, its parent must be in loadedInodes* too. -#### unloadedInodes_ +#### unloadedInodes\_ -In-memory map from inode number to remembered inode state. When an inode is +In-memory map from inode number to remembered inode state. When an inode is unloaded, if it has a nonzero FUSE reference count, it is registered into this table, which contains: -* its FUSE refcount -* its hash (if not materialized) -* its permission bits -* parent inode number and child name (if not unlinked) +- its FUSE refcount +- its hash (if not materialized) +- its permission bits +- parent inode number and child name (if not unlinked) -If a child is in unloadedInodes_, its parent must be in unloadedInodes_ too. +If a child is in unloadedInodes*, its parent must be in unloadedInodes* too. -An inode cannot be in both loadedInodes_ and unloadedInodes_ at the same time. +An inode cannot be in both loadedInodes* and unloadedInodes* at the same time. If an inode has a nonzero FUSE reference count, it should exist in either -loadedInodes_ or unloadedInodes_. +loadedInodes* or unloadedInodes*. #### Overlay @@ -200,9 +199,9 @@ The Overlay is an on-disk map from inode number to its timestamps plus the file's or tree's contents. If a tree's child entry does not have a hash (that is, it's marked as -materialized), then data for that inode must be in the overlay. Because of this +materialized), then data for that inode must be in the overlay. Because of this invariant, we must write the child's overlay data prior to setting it -materialized in the parent. When dematerializing, we must mark the child as +materialized in the parent. 
When dematerializing, we must mark the child as dematerialized in the parent before deleting the child's overlay data, in case the process crashes in between those two operations. @@ -211,30 +210,33 @@ the process crashes in between those two operations. [This section may be incomplete.] Unknown ⟶ Loading: + - (First, load parent.) - If parent has this entry marked materialized, load child from overlay and - immediately transition to loaded. Otherwise... -- Insert entry in unloadedInodes_ + immediately transition to loaded. Otherwise... +- Insert entry in unloadedInodes\_ - Begin fetching object from ObjectStore Loading ⟶ Loaded: + - If this is a tree, when the load completes, check the overlay. - The overlay might have some remembered inode numbers. - TODO: if eden crashed while materializing up a tree, that state needs to be corrected or dropped here. - Construct Inode type -- Remove from unloadedInodes_ and insert into loadedInodes_ +- Remove from unloadedInodes* and insert into loadedInodes* Loaded ⟶ Unloaded: + - If the mount is being unmounted - If unlinked, remove it from the overlay (it can never be accessed again) - Otherwise, update metadata in Overlay - Otherwise (we probably need to remember the inode number) - If unlinked, remove it from the overlay - Otherwise, - - If fuseCount is nonzero, insert inode in unloadedInodes_ - - If inode is a tree and any of its children are in unloadedInodes_, - insert inode in unloadedInodes_ + - If fuseCount is nonzero, insert inode in unloadedInodes\_ + - If inode is a tree and any of its children are in unloadedInodes*, insert + inode in unloadedInodes* - Otherwise... forget everything about the inode. ### TreeInode State Machine @@ -242,10 +244,12 @@ Loaded ⟶ Unloaded: TreeInode can only make two state transitions: Unmaterialized ⟶ Materialized: + - When a tree is modified, it is marked materialized (recursively up the tree) - Its contents are written to the Overlay Materialized ⟶ Unmaterialized: + - When Eden notices the entries match the backing source control Tree, and it has no materialized children, it is marked dematerialized. - Note that the Tree's parent must be updated prior to removing the child's @@ -253,7 +257,7 @@ Materialized ⟶ Unmaterialized: ### FileInode State Machine -FileInode's transitions are relatively isolated and uninteresting. See the +FileInode's transitions are relatively isolated and uninteresting. See the comments in FileInode.h for details, but I'll enumerate the currently legal transitions here. @@ -270,7 +274,7 @@ transitions here. ### atime It is very hard and probably not useful for Eden to try to accurately maintain -last-access times for files. In fact, FUSE does not really try: +last-access times for files. In fact, FUSE does not really try: https://sourceforge.net/p/fuse/mailman/message/34448996/ @@ -286,20 +290,20 @@ rocksdb/src/db/memtable_list.h:40:7: error: previous definition of 'class rocksd The issue was that Eden would occasionally allocate a new inode number for a nonmaterialized file, and `#pragma once` relies on consistent inode numbers to -avoid including the same file twice. Previously, we had some open questions +avoid including the same file twice. Previously, we had some open questions about whether Eden really did need to provide 100% consistent inode numbers for nonmaterialized files, but it seems the answer is yes, at least while the mount is up (including graceful takeover). ### make -Make uses the filesystem to remember whether to rebuild a target. 
It does so -by comparing the mtime of the target with its dependencies. If the target is -newer than all dependency, it is not rebuilt. +Make uses the filesystem to remember whether to rebuild a target. It does so by +comparing the mtime of the target with its dependencies. If the target is newer +than all dependency, it is not rebuilt. For Eden to avoid spurious rebuilds with make projects, it must strive to remember mtimes allocated to unmodified files (and thus presumably the -unmodified file's inode number). If checking out from unmodified tree A to -tree B forgets that directory's inode numbers and the inode numbers of its -children, the mtimes allocated to the source files could appear to advance, -causing spurious builds. +unmodified file's inode number). If checking out from unmodified tree A to tree +B forgets that directory's inode numbers and the inode numbers of its children, +the mtimes allocated to the source files could appear to advance, causing +spurious builds. diff --git a/eden/fs/docs/Inodes.md b/eden/fs/docs/Inodes.md index 99bfb022169f9..5404c44e7afe6 100644 --- a/eden/fs/docs/Inodes.md +++ b/eden/fs/docs/Inodes.md @@ -1,16 +1,15 @@ -Inodes and the Linux VFS Layer -============================== +# Inodes and the Linux VFS Layer EdenFS represents the current file and directory state of a checkout using -[inodes](https://en.wikipedia.org/wiki/Inode). The inode data model is used by +[inodes](https://en.wikipedia.org/wiki/Inode). The inode data model is used by Linux's internal [VFS](https://www.kernel.org/doc/html/latest/filesystems/vfs.html) layer. -The VFS layer allows Linux to support many different filesystem -implementations. Each implementation provides a mechanism for exposing a -filesystem hierarchy to users, regardless of the underlying data -representation. Linux supports a variety of filesystem implementations for -storing data on disk, such as [ext4](https://en.wikipedia.org/wiki/Ext4), +The VFS layer allows Linux to support many different filesystem implementations. +Each implementation provides a mechanism for exposing a filesystem hierarchy to +users, regardless of the underlying data representation. Linux supports a +variety of filesystem implementations for storing data on disk, such as +[ext4](https://en.wikipedia.org/wiki/Ext4), [btrfs](https://en.wikipedia.org/wiki/Btrfs), and [xfs](https://en.wikipedia.org/wiki/XFS), as well as implementations that store data remotely over a network, such as @@ -18,105 +17,96 @@ data remotely over a network, such as [CIFS](https://en.wikipedia.org/wiki/Server_Message_Block). [FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) is simply another -Linux filesystem implementation. Rather than storing data on a disk, it calls +Linux filesystem implementation. Rather than storing data on a disk, it calls out to a userspace process to allow that userspace process to choose how to -store and manage data. EdenFS uses FUSE to expose checkouts to users on Linux. -On macOS EdenFS uses [FUSE for macOS](https://osxfuse.github.io/), which -behaves very similarly to Linux FUSE and shares the same inode model. +store and manage data. EdenFS uses FUSE to expose checkouts to users on Linux. +On macOS EdenFS uses [FUSE for macOS](https://osxfuse.github.io/), which behaves +very similarly to Linux FUSE and shares the same inode model. ![Linux VFS Model](img/fuse_vfs.svg) One key aspect of inodes is that each inode is identified by a unique inode -number. 
The inode number is what uniquely identifies a specific file or -directory in the filesystem, not a file path. A specific inode may be present -in the file system at multiple paths (i.e., -[hard links](https://en.wikipedia.org/wiki/Hard_link)). Alternatively, an -inode may be unlinked from the filesystem and may continue to exist without a -path as long as there is still an open file handle referring to it. +number. The inode number is what uniquely identifies a specific file or +directory in the filesystem, not a file path. A specific inode may be present in +the file system at multiple paths (i.e., +[hard links](https://en.wikipedia.org/wiki/Hard_link)). Alternatively, an inode +may be unlinked from the filesystem and may continue to exist without a path as +long as there is still an open file handle referring to it. A inode may be either a regular file or directory, or even other special types of files such as symbolic links, block or character devices, sockets, and named pipes. -Each inode contains some common attributes, such as the owning user ID and -group ID, permissions, and file access, change, and modification timestamps. -The remaining inode contents depend on the inode type. In general a directory -inode contains a list of children entries, with each child entry consisting of -a name and the inode number for that child. Regular files contain the file -data. +Each inode contains some common attributes, such as the owning user ID and group +ID, permissions, and file access, change, and modification timestamps. The +remaining inode contents depend on the inode type. In general a directory inode +contains a list of children entries, with each child entry consisting of a name +and the inode number for that child. Regular files contain the file data. ![File and Directory Inodes](img/inode_contents.svg) - -EdenFS Inodes -============= +# EdenFS Inodes EdenFS contains two primary classes that represent inode state: `TreeInode` represents a directory inode, and `FileInode` represents any non-directory inode, including regular files, symlinks, and sockets. Both `TreeInode` and `FileInode` objects share a common `InodeBase` base class. -Inode objects are reference counted using the `InodePtr` smart-pointer type. -The [Inode Lifetime document](InodeLifetime.md) describes inode lifetime -management in more detail. +Inode objects are reference counted using the `InodePtr` smart-pointer type. The +[Inode Lifetime document](InodeLifetime.md) describes inode lifetime management +in more detail. The `TreeInode` and `FileInode` objects are grouped together in a hierarchy: -each `TreeInode` has a list of its children. In addition there is a separate +each `TreeInode` has a list of its children. In addition there is a separate `InodeMap` object to allow looking up inode objects by their inode number. ![Inode Tree](img/inode_tree.svg) -Note that since EdenFS loads file information lazily some children entries of -a `TreeInode` may not be loaded. When a `TreeInode` object is loaded we know -the names of all children inodes, but we do not actually create inode objects -in memory for the children until they are accessed. - +Note that since EdenFS loads file information lazily some children entries of a +`TreeInode` may not be loaded. When a `TreeInode` object is loaded we know the +names of all children inodes, but we do not actually create inode objects in +memory for the children until they are accessed. 
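To make the lazy-loading behavior above concrete, here is a deliberately
simplified sketch of a directory whose child inodes are only created on first
access. The names (`TreeInodeSketch`, `DirEntry`, `getOrLoadChild`) are
hypothetical and do not match the real `TreeInode`/`InodeMap` implementation,
which also deals with locking, futures, and the overlay.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>

using InodeNumber = std::uint64_t;
using ObjectId = std::string; // stand-in for a 20-byte source control hash

// A loaded inode, reduced to the bare minimum for this sketch.
struct Inode {
  InodeNumber ino;
  ObjectId hash;
};

// A directory entry: once the parent tree is loaded we know the child's name
// and hash, but the child Inode object is only created when first accessed.
struct DirEntry {
  ObjectId hash;                  // source control object ID (if unmodified)
  std::optional<InodeNumber> ino; // allocated lazily, on first access
  std::shared_ptr<Inode> loaded;  // null until the child inode is created
};

class TreeInodeSketch {
 public:
  // Returns the child inode, creating ("loading") it on first access.
  std::shared_ptr<Inode> getOrLoadChild(const std::string& name) {
    auto it = entries_.find(name);
    if (it == entries_.end()) {
      return nullptr; // no such entry in this directory
    }
    DirEntry& entry = it->second;
    if (!entry.loaded) {
      if (!entry.ino) {
        entry.ino = nextInodeNumber_++; // monotonically increasing allocation
      }
      entry.loaded = std::make_shared<Inode>(Inode{*entry.ino, entry.hash});
    }
    return entry.loaded;
  }

 private:
  std::map<std::string, DirEntry> entries_;
  InodeNumber nextInodeNumber_ = 2; // 1 is conventionally the root directory
};
```

The key property the sketch preserves is that listing a directory reveals child
names and hashes, while inode objects (and inode numbers) appear only when a
child is actually touched.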
-Parent Pointers ---------------- +## Parent Pointers EdenFS inodes currently do maintain a pointer to the parent `TreeInode` that -contains them. This parent inode pointer is used for a few reasons: -* To determine the inode path so that we can record file change information in +contains them. This parent inode pointer is used for a few reasons: + +- To determine the inode path so that we can record file change information in the journal, since journal events are recorded by path name. -* To materialize the parent directory whenever an inode is materialized. -* To include the inode path in some debug log messages, for easier debugging. +- To materialize the parent directory whenever an inode is materialized. +- To include the inode path in some debug log messages, for easier debugging. This does mean that EdenFS does not currently support hard links, as we require -that an inode can only have a single parent directory. Source control systems +that an inode can only have a single parent directory. Source control systems generally cannot represent hard links, so hard links can never be checked in to -the repository. However build systems and other tools do sometimes want to be +the repository. However build systems and other tools do sometimes want to be able to create new hard links on the filesystem, even if they cannot ever be -committed. In the future it may be worth relaxing this requirement in order to +committed. In the future it may be worth relaxing this requirement in order to support hard links in EdenFS. - -Inode Number Allocation ------------------------ +## Inode Number Allocation Inode numbers are chosen by the underlying filesystem implementation when files -are created. In traditional disk-based filesystems the inode number may -represent information about where the data for the inode is stored; for -instance a block ID or an index into a table or hash map. - -Since EdenFS only lazily loads file information when it is accessed, EdenFS -does not necessarily know the full file or directory structure when a checkout -is first mounted or when it is changed to point to new commit state. Therefore -EdenFS assigns inode numbers on demand when an inode is first accessed. EdenFS +are created. In traditional disk-based filesystems the inode number may +represent information about where the data for the inode is stored; for instance +a block ID or an index into a table or hash map. + +Since EdenFS only lazily loads file information when it is accessed, EdenFS does +not necessarily know the full file or directory structure when a checkout is +first mounted or when it is changed to point to new commit state. Therefore +EdenFS assigns inode numbers on demand when an inode is first accessed. EdenFS simply allocates inode numbers using a monotonically increasing 64-bit ID. Note that EdenFS does need to remember the inode numbers that have been -assigned. This information is stored in the directory state in the overlay. -Note that this means that simply listing a directory does cause it to be -written to the overlay, since we have to record the inode numbers that -have been assigned to its children. In this case we will store the directory -in the overlay, but as long as the child contents are still the same as the -original source control tree we will record the source control tree ID in the -overlay too. - +assigned. This information is stored in the directory state in the overlay. 
Note +that this means that simply listing a directory does cause it to be written to +the overlay, since we have to record the inode numbers that have been assigned +to its children. In this case we will store the directory in the overlay, but as +long as the child contents are still the same as the original source control +tree we will record the source control tree ID in the overlay too. -Inode Materialization -===================== +# Inode Materialization One key attribute of EdenFS inodes is whether the inode is [materialized](Glossary.md#materialized--non-materialized) or not. @@ -135,49 +125,48 @@ repository using the `ObjectStore` API. ![Non-materialized Inode Data](img/non_materialized_inode.svg) However, if a file is modified we can no longer fetch the contents from source -control. EdenFS then removes the source control object ID from the inode +control. EdenFS then removes the source control object ID from the inode information, and instead stores the file contents in the [overlay](Glossary.md#overlay). ![Materialized Inode Data](img/materialized_inode.svg) -`FileInode`s are materialized when their contents are updated. -`TreeInode`s are materialized when a child entry is added or removed. Note -that as discussed above in [Inode Allocation](#inode-allocation) directory -information is often stored in the overlay even when the directory is -technically not materialized so that we can track the inode numbers that have -been allocated to the directory children. +`FileInode`s are materialized when their contents are updated. `TreeInode`s are +materialized when a child entry is added or removed. Note that as discussed +above in [Inode Allocation](#inode-allocation) directory information is often +stored in the overlay even when the directory is technically not materialized so +that we can track the inode numbers that have been allocated to the directory +children. -Even if the file is later modified to have its original contents again -EdenFS may keep the file materialized, as we do not have an efficient way -to determine that the contents now correspond to an existing source control -object ID again. EdenFS may de-materialize the file later during a subsequent -checkout or commit operation if it detects that the file contents are the same -as the object at that location in the newly checked out commit. +Even if the file is later modified to have its original contents again EdenFS +may keep the file materialized, as we do not have an efficient way to determine +that the contents now correspond to an existing source control object ID again. +EdenFS may de-materialize the file later during a subsequent checkout or commit +operation if it detects that the file contents are the same as the object at +that location in the newly checked out commit. Note that an inode's materialization state is orthogonal to the -[Loaded / Unloaded](#loaded--unloaded) state. All 4 possible combinations of +[Loaded / Unloaded](#loaded--unloaded) state. All 4 possible combinations of loaded/unloaded and materialized/non-materialized states are possible. Materialized inodes can be unloaded, and this state is common when EdenFS first starts and re-mounts an existing checkout that contains materialized inodes. Additionally, note that whether an inode is materialized is also completely -independent from whether it is "modified" from a source control perspective. 
-For instance, renaming a non-materialized file to a new location does not cause -it to be materialized, since the file contents still correspond to a known -source control object. However, from a source control perspective the file -must now be reported as modified. Operations like `hg reset` that update the -pointer to the currently checked out commit without changing the working -directory state can also result in files that must now be reported as modified -from the current commit, even if they are non-materialized. - -Parent Materialization ----------------------- - -Whenever a file or directory inode is materialized its parent inode must also -be materialized: since the file no longer corresponds to a known source control +independent from whether it is "modified" from a source control perspective. For +instance, renaming a non-materialized file to a new location does not cause it +to be materialized, since the file contents still correspond to a known source +control object. However, from a source control perspective the file must now be +reported as modified. Operations like `hg reset` that update the pointer to the +currently checked out commit without changing the working directory state can +also result in files that must now be reported as modified from the current +commit, even if they are non-materialized. + +## Parent Materialization + +Whenever a file or directory inode is materialized its parent inode must also be +materialized: since the file no longer corresponds to a known source control object, the parent directory also no longer corresponds to a known source -control tree. The object IDs in the source control tree fully identify the +control tree. The object IDs in the source control tree fully identify the contents of each directory entry, so whenever the contents of any directory change the directory contents itself are effectively modified. @@ -187,14 +176,14 @@ a number of children entries: ![TreeInode Pre-materialization](img/tree_inode_pre_materialization.svg) If the `install.sh` file is updated, its `FileInode` is materialized and the -source control object ID is removed from it. The entry for this child also -must be updated in its parent `TreeInode`, and the source control object ID -must be removed from this parent `TreeInode`: +source control object ID is removed from it. The entry for this child also must +be updated in its parent `TreeInode`, and the source control object ID must be +removed from this parent `TreeInode`: ![TreeInode Post-materialization](img/tree_inode_post_materialization.svg) Note that this process must be done recursively all the way up to the root inode: the parent `TreeInode` of inode 97 must also be materialized, since one -of its children was materialized. This materialization process continues -walking upwards to the root inode until it finds a `TreeInode` that has already -been materialized, as it can stop its upwards walk there. +of its children was materialized. This materialization process continues walking +upwards to the root inode until it finds a `TreeInode` that has already been +materialized, as it can stop its upwards walk there. diff --git a/eden/fs/docs/Overview.md b/eden/fs/docs/Overview.md index c18f0839b6c75..2e51b3ff2135c 100644 --- a/eden/fs/docs/Overview.md +++ b/eden/fs/docs/Overview.md @@ -1,47 +1,44 @@ -EdenFS Overview -=============== +# EdenFS Overview EdenFS is a virtual filesystem designed for efficiently serving large source control repositories. 
In particular, EdenFS is targeted at massive -[monorepos](https://en.wikipedia.org/wiki/Monorepo), where a single -repository may contain numerous projects, potentially spanning many millions of -files in total. In most situations individual developers may only need to -interact with a fraction of the files in the repository when working on their -specific projects. EdenFS speeds up workflows in this case by lazily fetching -file data, so that it only needs to fetch file information for portions of the -repository that are actually used. +[monorepos](https://en.wikipedia.org/wiki/Monorepo), where a single repository +may contain numerous projects, potentially spanning many millions of files in +total. In most situations individual developers may only need to interact with a +fraction of the files in the repository when working on their specific projects. +EdenFS speeds up workflows in this case by lazily fetching file data, so that it +only needs to fetch file information for portions of the repository that are +actually used. EdenFS aims to speed up several different types of operations: -* Determining files modified from the current source control state. - (e.g., computing the output for `hg status` or `git status`) -* Switching the filesystem state from one commit to another. - (e.g., performing an `hg checkout` or `git checkout` operation). -* Tracking and delivering notifications about modified files. - EdenFS can deliver notifications of file changes events through Watchman, - to allow downstream tools like build tools and IDEs to build functionality - that depends on file notification events. + +- Determining files modified from the current source control state. (e.g., + computing the output for `hg status` or `git status`) +- Switching the filesystem state from one commit to another. (e.g., performing + an `hg checkout` or `git checkout` operation). +- Tracking and delivering notifications about modified files. EdenFS can deliver + notifications of file changes events through Watchman, to allow downstream + tools like build tools and IDEs to build functionality that depends on file + notification events. Additionally, EdenFS also provides several additional features like efficiently -returning file content hashes. This allows downstream build tools to retrieve +returning file content hashes. This allows downstream build tools to retrieve file hashes without actually needing to read and hash the file contents. +## Operating System Interface -Operating System Interface --------------------------- - -EdenFS is supported on Linux, macOS, and Windows. The mechanism used to -interact with the filesystem layer is different across these three different -platforms. +EdenFS is supported on Linux, macOS, and Windows. The mechanism used to interact +with the filesystem layer is different across these three different platforms. On Linux, EdenFS uses [FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) to provide filesystem functionality. -On macOS, EdenFS uses either [FUSE for -macOS](https://osxfuse.github.io/) (which behaves very similarly to Linux FUSE) -or [NFSv3](https://datatracker.ietf.org/doc/html/rfc1813). As Apple moves to +On macOS, EdenFS uses either [FUSE for macOS](https://osxfuse.github.io/) (which +behaves very similarly to Linux FUSE) or +[NFSv3](https://datatracker.ietf.org/doc/html/rfc1813). As Apple moves to deprecate kernel extensions, EdenFS on macOS will move towards using NFS exclusively. 
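The following sketch only illustrates the layering implied above: one body of
inode logic sitting behind interchangeable OS-facing filesystem interfaces. The
class names (`InodeLayer`, `FsInterface`, and the per-platform subclasses) are
hypothetical and are not the actual EdenFS abstractions.

```cpp
#include <string>

// Platform-neutral inode logic that every OS interface ultimately calls into.
class InodeLayer {
 public:
  virtual ~InodeLayer() = default;
  virtual std::string readFile(const std::string& relativePath) = 0;
};

// Each OS-facing interface translates kernel requests into InodeLayer calls.
class FsInterface {
 public:
  virtual ~FsInterface() = default;
  virtual void serve(InodeLayer& inodes) = 0;
};

class FuseInterface : public FsInterface { // Linux (and macFUSE on macOS)
 public:
  void serve(InodeLayer& /*inodes*/) override { /* translate FUSE requests */ }
};

class NfsInterface : public FsInterface { // NFSv3 on macOS
 public:
  void serve(InodeLayer& /*inodes*/) override { /* translate NFS RPCs */ }
};

class ProjFsInterface : public FsInterface { // Projected FS on Windows
 public:
  void serve(InodeLayer& /*inodes*/) override { /* translate ProjFS callbacks */ }
};
```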
@@ -50,38 +47,34 @@ On Windows, EdenFS uses Microsoft's This behaves fairly differently from FUSE and NFS, but EdenFS still shares most of the same internal logic for tracking file state. -Parts of this design discussion focus primarily on the Linux and -macOS implementations. On Windows, the interface to the OS behaves a bit -differently, but internally EdenFS still tracks its state using the same inode -structure that is used on Linux and macOS. +Parts of this design discussion focus primarily on the Linux and macOS +implementations. On Windows, the interface to the OS behaves a bit differently, +but internally EdenFS still tracks its state using the same inode structure that +is used on Linux and macOS. - -High-Level Design -================= +# High-Level Design The following documents describe the design of relatively high-level aspects of EdenFS's behavior: -* [Process Overview](./Process_State.md) -* [Source Control Data Model](./Data_Model.md) -* [Inodes](./Inodes.md) -* [Glossary](./Glossary.md) - +- [Process Overview](./Process_State.md) +- [Source Control Data Model](./Data_Model.md) +- [Inodes](./Inodes.md) +- [Glossary](./Glossary.md) -Design Specifics -================ +# Design Specifics The following documents cover specific features and implementation details in more depth: -* [Configuration](./Config.md) -* [Caching](./Caching.md) -* [Globbing](./Globbing.md) -* [Inode Lifetime Management](./InodeLifetime.md) -* [Inode Locking](./InodeLocks.md) -* [Inode Storage](./InodeStorage.md) -* [Path Handling](./Paths.md) -* [Rename Handling](./Rename.md) -* [Redirections](./Redirections.md) -* [Threading](./Threading.md) -* [Windows](./Windows.md) +- [Configuration](./Config.md) +- [Caching](./Caching.md) +- [Globbing](./Globbing.md) +- [Inode Lifetime Management](./InodeLifetime.md) +- [Inode Locking](./InodeLocks.md) +- [Inode Storage](./InodeStorage.md) +- [Path Handling](./Paths.md) +- [Rename Handling](./Rename.md) +- [Redirections](./Redirections.md) +- [Threading](./Threading.md) +- [Windows](./Windows.md) diff --git a/eden/fs/docs/Paths.md b/eden/fs/docs/Paths.md index 5f37341baf405..586d53b3b12c8 100644 --- a/eden/fs/docs/Paths.md +++ b/eden/fs/docs/Paths.md @@ -15,75 +15,74 @@ is stored or non-stored (`Piece`). ### `PathComponent`/`PathComponentPiece` -* Represents a name within a directory -* Illegal to - * Contain directory separator ("/" or "\" on Windows) - * Be empty - * Be a relative component (".." or "..") +- Represents a name within a directory +- Illegal to + - Contain directory separator ("/" or "\" on Windows) + - Be empty + - Be a relative component (".." 
or "..") ### `RelativePath`/`RelativePathPiece` -* Represents any number of `PathComponent(Piece)`s strung together -* Illegal to begin with or be composed with an `AbsolutePath(Piece)` -* Allowed to be empty +- Represents any number of `PathComponent(Piece)`s strung together +- Illegal to begin with or be composed with an `AbsolutePath(Piece)` +- Allowed to be empty ### `AbsolutePath`/`AbsolutePathPiece` -* Must begin with a "/" or "\\?\" on Windows -* On Windows, the path separator is always a "\" -* May be composed with `PathComponent`s and `RelativePath`s -* May not be composed with other `AbsolutePath`s +- Must begin with a "/" or "\\?\" on Windows +- On Windows, the path separator is always a "\" +- May be composed with `PathComponent`s and `RelativePath`s +- May not be composed with other `AbsolutePath`s ## Construction -* Paths can be constructed from the following types: - * `folly::StringPiece` - * Stored path - * Non-stored path - * Default constructed to an empty value -* Paths can be move-constructed from `std::string` values and Stored values. +- Paths can be constructed from the following types: + - `folly::StringPiece` + - Stored path + - Non-stored path + - Default constructed to an empty value +- Paths can be move-constructed from `std::string` values and Stored values. ## Comparisons -* Comparisons can be made between Stored and Piece variations of the same type, +- Comparisons can be made between Stored and Piece variations of the same type, meaning one can compare a `RelativePath` to a `RelativePathPiece`, but cannot compare a `RelativePath` to an `AbsolutePath`. ## Iterator -* `ComposedPathIterator` - Used for iteration of a `RelativePath`/`AbsolutePath` +- `ComposedPathIterator` - Used for iteration of a `RelativePath`/`AbsolutePath` using various iteration methods (`paths()`, `allPaths()`, `suffixes()`, `findParents()`). An iterator over prefixes of a composed path. Iterating yields a series of composed path elements. For example, iterating the path "foo/bar/baz" will yield this series of Piece elements: - 1. "/" but only for `AbsolutePath` ("\\?\" on Windows) - 2. "foo" - 3. "foo/bar" - 4. "foo/bar/baz" -* Note: You may use the `dirname()` and `basename()` methods to focus on the + 1. "/" but only for `AbsolutePath` ("\\?\" on Windows) + 2. "foo" + 3. "foo/bar" + 4. "foo/bar/baz" +- Note: You may use the `dirname()` and `basename()` methods to focus on the portions of interest. -* `PathComponentIterator`- Used for iteration of a ComposedPath using the +- `PathComponentIterator`- Used for iteration of a ComposedPath using the iteration method `components()`. An iterator over components of a composed path. Iterating yields a series of independent path elements. For example, iterating the relative path "foo/bar/baz" will yield this series of PathComponentPiece elements: - 1. "foo" - 2. "bar" - 3. "baz" -* Note: Iterating the absolute path "/foo/bar/baz" would also yield the same + 1. "foo" + 2. "bar" + 3. "baz" +- Note: Iterating the absolute path "/foo/bar/baz" would also yield the same sequence. ## Lifetime -All the stored paths are merely a wrapper around an `std::string`, and the -piece version are also just a wrapper on top of a `folly::StringPiece` (which -has similar semantic as `std::string_view`), that is, a piece -merely holds a view of to the underlying `std::string` buffer. 
When a -stored path is being moved, the held `std::string` is also moved, which in most -cases prevents copying and re-allocating a string, this makes the move -operation fairly cheap and since the pieces were a view on that -first string memory allocation, these are still viewing valid and allocated -memory. +All the stored paths are merely a wrapper around an `std::string`, and the piece +version are also just a wrapper on top of a `folly::StringPiece` (which has +similar semantic as `std::string_view`), that is, a piece merely holds a view of +to the underlying `std::string` buffer. When a stored path is being moved, the +held `std::string` is also moved, which in most cases prevents copying and +re-allocating a string, this makes the move operation fairly cheap and since the +pieces were a view on that first string memory allocation, these are still +viewing valid and allocated memory. However, `std::string` have an optimization where small strings aren't heap allocated, but are stored in the `std::string` object itself, this is called SSO @@ -99,40 +98,40 @@ stored path is small enough that the SSO kicks-in. ## Utility Functions -* `stringPiece()` - Returns the path as a `folly::StringPiece` -* `copy()` - Returns a stored (deep) copy of this path -* `piece()` - Returns a non-stored (shallow) copy of this path -* `value()` - Returns a reference to the underlying stored value -* `basename()` - Given a path like "a/b/c", returns "c" -* `dirname()` - Given a path like "a/b/c", returns "a/b" -* `getcwd()` - Gets the current working directory as an `AbsolutePath` -* `canonicalPath()` - Removes duplicate "/" characters, resolves "/./" and +- `stringPiece()` - Returns the path as a `folly::StringPiece` +- `copy()` - Returns a stored (deep) copy of this path +- `piece()` - Returns a non-stored (shallow) copy of this path +- `value()` - Returns a reference to the underlying stored value +- `basename()` - Given a path like "a/b/c", returns "c" +- `dirname()` - Given a path like "a/b/c", returns "a/b" +- `getcwd()` - Gets the current working directory as an `AbsolutePath` +- `canonicalPath()` - Removes duplicate "/" characters, resolves "/./" and "/../" components. "//foo" is converted to "/foo". Does not resolve symlinks. If the path is relative, the current working directory is prepended to it. This succeeds even if the input path does not exist -* `joinAndNormalize()` - canonicalize a path string relative to a relative path +- `joinAndNormalize()` - canonicalize a path string relative to a relative path base -* `relpath()` - Converts an arbitrary unsanitized input string to a normalized +- `relpath()` - Converts an arbitrary unsanitized input string to a normalized `AbsolutePath`. This resolves symlinks, as well as "." and "." components in the input path. If the input path is a relative path, it is converted to an absolute path. This throws if the input path does not exist or if a parent directory is inaccessible -* `expandUser()` - Returns a new path with `~` replaced by the path to the +- `expandUser()` - Returns a new path with `~` replaced by the path to the current user's home directory. This function does not support expanding the home dir of arbitrary users, and will throw an exception if the string starts with `~` but not `~/`. 
The resulting path will be passed through `canonicalPath()` and returned -* `normalizeBestEffort()` - Attempts to normalize a path by first attempting +- `normalizeBestEffort()` - Attempts to normalize a path by first attempting `relpath()` and falling back to `canonicalPath()` on failure. -* `splitFirst()` - Splits a path into the first component and the remainder of +- `splitFirst()` - Splits a path into the first component and the remainder of the path. If the path has only one component, the remainder will be empty. If the path is empty, an exception is thrown -* `ensureDirectoryExists()` - Ensures that the specified path exists as a +- `ensureDirectoryExists()` - Ensures that the specified path exists as a directory. This creates the specified directory if necessary, creating any parent directories as required as well. Returns true if the directory was created, and false if it already existed. Throws an exception on error, including if the path or one of its parent directories is a file rather than a directory -* `removeRecursively()` - Recursively removes a directory tree. Returns false if +- `removeRecursively()` - Recursively removes a directory tree. Returns false if the directory did not exist in the first place, and true if the directory was successfully removed. Throws an exception on error. diff --git a/eden/fs/docs/Process_State.md b/eden/fs/docs/Process_State.md index a8ac63acf0ae4..544b6860bdfc9 100644 --- a/eden/fs/docs/Process_State.md +++ b/eden/fs/docs/Process_State.md @@ -1,41 +1,38 @@ -The EdenFS Daemon -================= +# The EdenFS Daemon -EdenFS runs as a normal user space process. In general each user on a system -will have their own long-lived EdenFS daemon process. The EdenFS daemon -provides two primary interfaces for other processes to interact with it. +EdenFS runs as a normal user space process. In general each user on a system +will have their own long-lived EdenFS daemon process. The EdenFS daemon provides +two primary interfaces for other processes to interact with it. The first of these is the file system interface ([FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) on Linux, [NFS](https://datatracker.ietf.org/doc/html/rfc1813)/[macFUSE](https://osxfuse.github.io/) on macOS, and [Projected FS](https://docs.microsoft.com/en-us/windows/win32/projfs/projected-file-system) -on Windows), through which it exposes virtual filesystems. Other -applications can interact with files and directories in EdenFS checkouts just -like they would on any other normal local filesystem. This allows other -applications to transparently interact with EdenFS checkouts without needing -any specific knowledge of EdenFS. +on Windows), through which it exposes virtual filesystems. Other applications +can interact with files and directories in EdenFS checkouts just like they would +on any other normal local filesystem. This allows other applications to +transparently interact with EdenFS checkouts without needing any specific +knowledge of EdenFS. -Additionally, EdenFS also exposes a thrift interface. This allows EdenFS-aware +Additionally, EdenFS also exposes a thrift interface. This allows EdenFS-aware applications to perform additional functionality that is not available through -standard filesystem APIs. For instance, EdenFS exposes thrift APIs for -checking out a different source control commit, comparing the current file -system state to a given source control commit, efficiently performing glob -queries against file names, getting file hashes, etc. 
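To tie the path-type summary above together, here is an illustrative usage
sketch that uses only the operations documented in this section. The include
path and namespace are assumptions and may differ in the actual tree, the
sketch is not compiled against the real sources, and return types are left to
`auto` rather than guessed.

```cpp
#include "eden/fs/utils/PathFuncs.h" // assumed header for the path types

using namespace facebook::eden; // assumed namespace

void pathExamples() {
  // Constructed from a string (the types accept folly::StringPiece).
  RelativePath rel{"foo/bar/baz"};

  auto base = rel.basename(); // "baz"
  auto dir = rel.dirname();   // "foo/bar"
  (void)base;
  (void)dir;

  // components(): independent elements "foo", "bar", "baz".
  for (auto component : rel.components()) {
    (void)component;
  }

  // paths(): composed prefixes "foo", "foo/bar", "foo/bar/baz".
  for (auto prefix : rel.paths()) {
    (void)prefix;
  }
}
```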
+standard filesystem APIs. For instance, EdenFS exposes thrift APIs for checking +out a different source control commit, comparing the current file system state +to a given source control commit, efficiently performing glob queries against +file names, getting file hashes, etc. ![High Level System Overview](img/system_overview.svg) +# Mount Points -Mount Points -============ - -A single EdenFS daemon can manage multiple checkouts for the user. On Linux -and Mac each checkout is exposed as a separate filesystem -[mount point](https://en.wikipedia.org/wiki/Mount_%28computing%29). On -Windows, each checkout is a separate ProjectedFS virtualization root. +A single EdenFS daemon can manage multiple checkouts for the user. On Linux and +Mac each checkout is exposed as a separate filesystem +[mount point](https://en.wikipedia.org/wiki/Mount_%28computing%29). On Windows, +each checkout is a separate ProjectedFS virtualization root. When the user clones a new checkout, EdenFS creates a new file system mount -point to expose the checkout. To remove a checkout users should use the +point to expose the checkout. To remove a checkout users should use the `edenfsctl rm` command: mount points cannot be removed normally with `rmdir`, but must instead be unmounted. @@ -45,42 +42,40 @@ a system, serving 3 different checkouts, named `fbsource1`, `fbsource2`, and ![Checkout Mount Points](img/edenfs_mounts.svg) -Each red box represents a mount point, the kernel's interface for -exposing a file system to user-space applications. The orange "FUSE" boxes -represent the FUSE interface between the EdenFS daemon and the kernel for -processing file system requests for a particular mount point. Two additional -mount points mapping to local on-disk file systems are also shown, at `/` and -`/data`. In this diagram EdenFS stores its own state in a directory under the -`/data` partition, which it accesses through the `/data` mount point. +Each red box represents a mount point, the kernel's interface for exposing a +file system to user-space applications. The orange "FUSE" boxes represent the +FUSE interface between the EdenFS daemon and the kernel for processing file +system requests for a particular mount point. Two additional mount points +mapping to local on-disk file systems are also shown, at `/` and `/data`. In +this diagram EdenFS stores its own state in a directory under the `/data` +partition, which it accesses through the `/data` mount point. Note that EdenFS could have been implemented by having a single daemon per -checkout, instead of a single daemon per user. There are various trade-offs -between these two design choices. Using a daemon per checkout would have -potentially provided better isolation between checkouts--if something goes -wrong with the EdenFS daemon managing one checkout then other EdenFS checkouts -can still continue running normally. The main factor that led us to choose a -single daemon per user is that this allows better sharing of resources between -checkouts. If users have multiple different checkouts of the same underlying +checkout, instead of a single daemon per user. There are various trade-offs +between these two design choices. Using a daemon per checkout would have +potentially provided better isolation between checkouts--if something goes wrong +with the EdenFS daemon managing one checkout then other EdenFS checkouts can +still continue running normally. 
The main factor that led us to choose a single +daemon per user is that this allows better sharing of resources between +checkouts. If users have multiple different checkouts of the same underlying repository we can more easily share source control state across all checkouts, -and avoid re-fetching and keeping multiple copies of data in memory. Having a +and avoid re-fetching and keeping multiple copies of data in memory. Having a single daemon per user also makes process management and upgrades slightly simpler, as there is only a single process per user to maintain. - -High Level Data Structures -========================== +# High Level Data Structures Within the EdenFS daemon there are several different high level components. ![Data Structures](img/edenfs_data_structures.svg) -In the code there is a single `EdenServer` object that manages all state for -the daemon. It has a single `EdenServiceHandler` object, which acts as the -thrift service handler, and is responsible for responding to all thrift -requests received over EdenFS's Unix domain thrift socket. +In the code there is a single `EdenServer` object that manages all state for the +daemon. It has a single `EdenServiceHandler` object, which acts as the thrift +service handler, and is responsible for responding to all thrift requests +received over EdenFS's Unix domain thrift socket. Additionally, there are several data per-[checkout](Glossary.md#checkout) data -structures. The `EdenMount` class contains all in-memory state for a checkout, +structures. The `EdenMount` class contains all in-memory state for a checkout, and the `EdenServer` maintains a map of all currently active `EdenMount` objects. @@ -90,57 +85,54 @@ checkout file and directory state, an `Overlay` object that tracks locally modified files and directories, and a `Journal` that records all recent modifying file I/O operations. -There is also some per-[repository](Glossary.md#repository) state. Note that +There is also some per-[repository](Glossary.md#repository) state. Note that from EdenFS's perspective, the repository is the location where it fetches source control data, whereas the checkout is the working directory view that -EdenFS exposes to users. A user may have multiple checkouts of the same +EdenFS exposes to users. A user may have multiple checkouts of the same repository, each providing a different working directory view of the repository data, potentially with different commits checked out in each. -The `ObjectStore` class provides the internal API that EdenFS uses to fetch -data from a source control repository. The `ObjectStore` is split into three -components internally: an in-memory cache that locally caches -fetched data, and a `LocalStore` to provide on-disk cache RocksDB, -a `BackingStore` that is responsible for actually fetching data from the repository. -They are in a chain of responsibility structure - if one of them fails, -it will send the request to the next layer. The primary `BackingStore` -implementation used by EdenFS is the `SaplingBackingStore` class, which fetches data -from an Sapling repository. (The name `SaplingBackingStore` dates back to before Sapling -was differentiated from Mercurial.) EdenFS also has a `GitBackingStore` implementation -that can fetch data from a git repository. 
However, as the git CLI does not currently have -any knowledge of EdenFS this is not particularly usable in practice: while -EdenFS can show a view of a git repository, operations like `git status` or +The `ObjectStore` class provides the internal API that EdenFS uses to fetch data +from a source control repository. The `ObjectStore` is split into three +components internally: an in-memory cache that locally caches fetched data, and +a `LocalStore` to provide on-disk cache RocksDB, a `BackingStore` that is +responsible for actually fetching data from the repository. They are in a chain +of responsibility structure - if one of them fails, it will send the request to +the next layer. The primary `BackingStore` implementation used by EdenFS is the +`SaplingBackingStore` class, which fetches data from an Sapling repository. (The +name `SaplingBackingStore` dates back to before Sapling was differentiated from +Mercurial.) EdenFS also has a `GitBackingStore` implementation that can fetch +data from a git repository. However, as the git CLI does not currently have any +knowledge of EdenFS this is not particularly usable in practice: while EdenFS +can show a view of a git repository, operations like `git status` or `git checkout` are not EdenFS aware. +# Persistent State Management -Persistent State Management -=========================== - -EdenFS stores its own state on local disk, in the EdenFS state directory. The -default location for this directory depends on your system configuration, but -is often `~/.eden` or `~/local/.eden`. On Windows, this is commonly located at +EdenFS stores its own state on local disk, in the EdenFS state directory. The +default location for this directory depends on your system configuration, but is +often `~/.eden` or `~/local/.eden`. On Windows, this is commonly located at `C:\Users\\.eden`. It is possible for a user to run multiple EdenFS daemons if you specify a -different state directory for each daemon. The state directory path can be -specified with the `--config-dir` parameter to the `edenfsctl` tool. In -general most users will not need to do this, but it can be useful for -developers in order to run a test version of EdenFS for development, separate -from their main EdenFS instance. +different state directory for each daemon. The state directory path can be +specified with the `--config-dir` parameter to the `edenfsctl` tool. In general +most users will not need to do this, but it can be useful for developers in +order to run a test version of EdenFS for development, separate from their main +EdenFS instance. The state directory contains a number of different items: ### `config.json` -This file contains the list of currently configured checkouts. It contains a +This file contains the list of currently configured checkouts. It contains a JSON dictionary mapping the absolute path to the checkout to the name of the EdenFS `clients/` state directory that contains the state for this checkout. -Early on in EdenFS development we stored most configuration as JSON, but we -have since migrated most configuration to -[TOML](https://github.com/toml-lang/toml). This is the last remaining -configuration file that is still JSON, and eventually its state should probably -be moved into the `config.toml` file below. +Early on in EdenFS development we stored most configuration as JSON, but we have +since migrated most configuration to [TOML](https://github.com/toml-lang/toml). 
+This is the last remaining configuration file that is still JSON, and eventually +its state should probably be moved into the `config.toml` file below. ### `config.toml` @@ -148,23 +140,23 @@ This file contains other instance-wide EdenFS configuration settings. ### `logs/edenfs.log` -This is the main EdenFS log file. This contains logs from the EdenFS daemon, -in [Google Logging](https://github.com/google/glog) format: +This is the main EdenFS log file. This contains logs from the EdenFS daemon, in +[Google Logging](https://github.com/google/glog) format: `LmmDD HH:MM:SS.USECS THREAD FILE:LINE] MSG` -* L: A 1-character code describing the log level (e.g., E for error, W for +- L: A 1-character code describing the log level (e.g., E for error, W for warning, I for info, V for "verbose" debug messages) -* mm: 2-digit month -* DD: 2-digit day -* HH: 2-digit hour, 24-hour format -* MM: 2-digit minute -* SS: 2-digit second -* USECS: 6-digit microseconds -* THREAD: Thread ID -* FILE: Filename (just the last component) -* LINE: Line number -* MSG: The actual log message +- mm: 2-digit month +- DD: 2-digit day +- HH: 2-digit hour, 24-hour format +- MM: 2-digit minute +- SS: 2-digit second +- USECS: 6-digit microseconds +- THREAD: Thread ID +- FILE: Filename (just the last component) +- LINE: Line number +- MSG: The actual log message In addition, output from commands spawned by EdenFS will also be directed to this log file. @@ -176,28 +168,28 @@ This directory primarily contains a cache of imported source control data. ### `clients/` The `clients` subdirectory contains one subdirectory per checkout managed by -EdenFS. The `config.json` file listed above contains the mapping between the +EdenFS. The `config.json` file listed above contains the mapping between the absolute paths of checkouts to their `clients/` state directory. Note that this directory is called `clients/` largely for historical reasons. -Early on in EdenFS development we used the term "client" to refer to a -checkout. This is one of very few remaining locations where this old -terminology is still used. Eventually it might be nice to rename this -directory, but we would need to build some tooling to help migrate existing -EdenFS state directories in order to do so. +Early on in EdenFS development we used the term "client" to refer to a checkout. +This is one of very few remaining locations where this old terminology is still +used. Eventually it might be nice to rename this directory, but we would need to +build some tooling to help migrate existing EdenFS state directories in order to +do so. ### `clients/NAME/config.toml` Inside each checkout state directory the `config.toml` file contains some -details about the checkout configuration. In particular, this includes +details about the checkout configuration. In particular, this includes information about the backing repository where source control data for this checkout can be found. ### `clients/NAME/SNAPSHOT` -This file contains the ID of the source control commit that is currently -checked out in this checkout. Each time a user checks out a different commit -EdenFS will update this file. +This file contains the ID of the source control commit that is currently checked +out in this checkout. Each time a user checks out a different commit EdenFS will +update this file. ### `clients/NAME/local/` @@ -205,34 +197,31 @@ This directory contains the overlay state for the checkout: information about files and directories that have been locally modified. 
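
As an aside, the `logs/edenfs.log` line format described above is easy to split
into its fields. The following is only an illustrative sketch (it is not part of
EdenFS or its tooling), and the sample line is made up for the example:

```python
import re

# Pattern for the glog-style format described above:
#   LmmDD HH:MM:SS.USECS THREAD FILE:LINE] MSG
GLOG_LINE = re.compile(
    r"^(?P<level>[A-Z])"                        # 1-character log level (E, W, I, V, ...)
    r"(?P<month>\d{2})(?P<day>\d{2}) "          # mmDD
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})\.(?P<usecs>\d{6}) "
    r"(?P<thread>\d+) "                         # thread ID
    r"(?P<file>[^:]+):(?P<line>\d+)\] "         # FILE:LINE]
    r"(?P<msg>.*)$"                             # the log message
)


def parse_glog_line(line):
    """Return a dict of fields, or None for lines that do not match
    (for example, output from commands spawned by EdenFS)."""
    match = GLOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical sample line, purely for illustration.
    sample = "I0718 14:56:24.123456 12345 EdenServer.cpp:901] mount initialized"
    print(parse_glog_line(sample))
```
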
For files that have never been modified, EdenFS knows it can retrieve the file -contents from source control. Therefore EdenFS normally only need to track a -source control object ID that can be used to fetch the file contents. However, -once a file has been modified locally there is no longer a source control -object ID that can be used to fetch the file contents. Therefore EdenFS must -store the full file contents, and it does so in this directory. +contents from source control. Therefore EdenFS normally only need to track a +source control object ID that can be used to fetch the file contents. However, +once a file has been modified locally there is no longer a source control object +ID that can be used to fetch the file contents. Therefore EdenFS must store the +full file contents, and it does so in this directory. Files that have been locally modified are referred to as -["materialized"](Glossary.md#materialized--non-materialized). Files are -tracked in the overlay if and only if they are materialized. - +["materialized"](Glossary.md#materialized--non-materialized). Files are tracked +in the overlay if and only if they are materialized. -The Privhelper Process -====================== +# The Privhelper Process For the most part EdenFS runs as a normal, unprivileged process with the -permissions of the user that started it. However, on Linux and Mac mounting -and unmounting directories requires elevated privileges, and cannot be done by +permissions of the user that started it. However, on Linux and Mac mounting and +unmounting directories requires elevated privileges, and cannot be done by normal users. In order to overcome this, EdenFS runs as a pair of processes on Linux and Mac: the main process running as the user, and a privileged helper process running -with root privileges. The privileged helper process exists only to perform -mount and unmount operations. +with root privileges. The privileged helper process exists only to perform mount +and unmount operations. -In order to start the privhelper process EdenFS must be started as root. This +In order to start the privhelper process EdenFS must be started as root. This can either be done by installing EdenFS as -[setuid](https://en.wikipedia.org/wiki/Setuid) root, or by invoking EdenFS -using `sudo`. Once EdenFS starts it immediately forks off the privileged -helper process and then drops privileges. The main EdenFS process can then -send mount and unmount requests to the privhelper process over a Unix domain -socket pair. +[setuid](https://en.wikipedia.org/wiki/Setuid) root, or by invoking EdenFS using +`sudo`. Once EdenFS starts it immediately forks off the privileged helper +process and then drops privileges. The main EdenFS process can then send mount +and unmount requests to the privhelper process over a Unix domain socket pair. diff --git a/eden/fs/docs/Redirections.md b/eden/fs/docs/Redirections.md index 7365b92b1d6b5..8c1ffa5bf57bb 100644 --- a/eden/fs/docs/Redirections.md +++ b/eden/fs/docs/Redirections.md @@ -1,74 +1,70 @@ -Redirections -============ +# Redirections EdenFS's main performance advantages come from lazily fetching data from source control, which is beneficial when checking out and reading files checked in to -source control. However, many applications also want to modify files inside the +source control. However, many applications also want to modify files inside the checkout, or write new files. 
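
Looking back at the privhelper design described in the previous section, the
startup sequence (fork a privileged helper over a Unix domain socket pair, then
drop privileges in the main process) can be sketched roughly as follows. This is
a hypothetical illustration only: the function names and request format are
invented, and the real EdenFS privhelper is C++ code that performs actual
mount/unmount system calls rather than the stub shown here.

```python
import os
import socket


def start_privhelper():
    """Sketch: fork a root helper while we still have privileges, then drop
    privileges in the main process and keep one end of a socket pair for
    sending mount/unmount requests."""
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Privileged helper: keeps root and serves requests until the main
        # process closes its end of the socket pair.
        parent_sock.close()
        while True:
            request = child_sock.recv(4096)
            if not request:
                break
            # A real helper would perform the mount/unmount here.
            child_sock.sendall(b"ok: " + request)
        os._exit(0)

    # Main process: drop root privileges (falling back to the current IDs if
    # we were not started via sudo), keep the socket for future requests.
    child_sock.close()
    os.setgid(int(os.environ.get("SUDO_GID", os.getgid())))
    os.setuid(int(os.environ.get("SUDO_UID", os.getuid())))
    return pid, parent_sock


if __name__ == "__main__":
    helper_pid, sock = start_privhelper()
    sock.sendall(b"mount /tmp/example-checkout")
    print(sock.recv(4096).decode())
    sock.close()
    os.waitpid(helper_pid, 0)
```
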
Unfortunately these modifying I/O operations are usually slower when using -EdenFS, compared to writing directly to local disk. This is because these I/O +EdenFS, compared to writing directly to local disk. This is because these I/O operations have to traverse through the kernel multiple times, instead of just once. When writing to a normal on-disk filesystem, the I/O operation is normally -handled directly in the kernel, which will store the data to disk. However, -when writing to an EdenFS mount point the kernel must send the I/O request to -EdenFS. EdenFS will then perform the write operation by updating the -corresponding file in its overlay. The overlay state is stored on local disk, -so this requires a separate I/O operation to the kernel, which will write the -overlay data to disk. Once the I/O operation is done, EdenFS records the I/O -operation its journal before responding to the FUSE request so that the kernel -can complete the initial I/O operation that triggered this entire chain of -events. +handled directly in the kernel, which will store the data to disk. However, when +writing to an EdenFS mount point the kernel must send the I/O request to EdenFS. +EdenFS will then perform the write operation by updating the corresponding file +in its overlay. The overlay state is stored on local disk, so this requires a +separate I/O operation to the kernel, which will write the overlay data to disk. +Once the I/O operation is done, EdenFS records the I/O operation its journal +before responding to the FUSE request so that the kernel can complete the +initial I/O operation that triggered this entire chain of events. ![FUSE I/O Write Path](img/edenfs_fuse_writes.svg) These extra hops from the kernel to EdenFS and then back to the kernel add -overhead. This generally makes it preferable to avoid performing large amounts +overhead. This generally makes it preferable to avoid performing large amounts of write I/O in an EdenFS checkout whenever possible. Unfortunately many build tools and existing user programs expect to be able to -write output files directly into specific directories inside a checkout. For +write output files directly into specific directories inside a checkout. For instance, [Buck](https://buck.build/) normally prefers to keep its build output -in a directory named `buck-out` inside the top-level source directory. A build +in a directory named `buck-out` inside the top-level source directory. A build operation can generate many thousands of files, containing many gigabytes of data. In order to make it easier to use EdenFS with these tools, EdenFS provides a mechanism to allow specific subdirectories to bypass EdenFS, and be stored -directly on local disk. The only caveat is that the redirected subdirectories -must be new subdirectories that only contain generated files, and do not -contain any files tracked in source control. +directly on local disk. The only caveat is that the redirected subdirectories +must be new subdirectories that only contain generated files, and do not contain +any files tracked in source control. The set of redirected subdirectories can be controlled through the `edenfsctl redirect` subcommand, or through a special `.eden-redirections` -configuration file in the top-level directory of the repository. Each time a -new commit is checked out the `.eden-redirections` file is parsed and the -current set of redirected directories is updated appropriately. +configuration file in the top-level directory of the repository. 
Each time a new +commit is checked out the `.eden-redirections` file is parsed and the current +set of redirected directories is updated appropriately. Directory redirection is implemented slightly differently on different -platforms, but the configuration mechanism is the same across all platforms. -On Linux redirections are primarily implemented using bind mounts, where a -local disk subdirectory is bind-mounted on top of the desired subdirectory in -the EdenFS checkout. Directory redirections can also be implemented using -symlinks, although this has some drawbacks compared to bind mounts, -particularly around the behavior of referring to `..` when inside the symlink -directory. +platforms, but the configuration mechanism is the same across all platforms. On +Linux redirections are primarily implemented using bind mounts, where a local +disk subdirectory is bind-mounted on top of the desired subdirectory in the +EdenFS checkout. Directory redirections can also be implemented using symlinks, +although this has some drawbacks compared to bind mounts, particularly around +the behavior of referring to `..` when inside the symlink directory. -The `Buck` build tool will automatically detect if it is being used inside of -an EdenFS checkout, and will configure a redirection for the `buck-out` -directory. This allows all generated build output to be written directly to -local disk, avoiding going through EdenFS. +The `Buck` build tool will automatically detect if it is being used inside of an +EdenFS checkout, and will configure a redirection for the `buck-out` directory. +This allows all generated build output to be written directly to local disk, +avoiding going through EdenFS. ![Redirected I/O Write Path](img/write_redirection.svg) Note that this does mean that all write operations inside the `buck-out` -subdirectory also bypass the EdenFS journal, and therefore cannot be reported -to subscribers through [Watchman](https://facebook.github.io/watchman/). -However, in most situations this is generally desirable: there is a high amount -of write I/O traffic to the build output directory during builds, and most -filesystem subscribers are not interested in these update events and want to -avoid the overhead if receiving these updates. Even in non-EdenFS checkouts -Watchman is typically configured to avoid watching build output directories -when possible. +subdirectory also bypass the EdenFS journal, and therefore cannot be reported to +subscribers through [Watchman](https://facebook.github.io/watchman/). However, +in most situations this is generally desirable: there is a high amount of write +I/O traffic to the build output directory during builds, and most filesystem +subscribers are not interested in these update events and want to avoid the +overhead if receiving these updates. Even in non-EdenFS checkouts Watchman is +typically configured to avoid watching build output directories when possible. diff --git a/eden/fs/docs/Rename.md b/eden/fs/docs/Rename.md index 26cddfb14c77f..ab36165e5776a 100644 --- a/eden/fs/docs/Rename.md +++ b/eden/fs/docs/Rename.md @@ -3,62 +3,62 @@ A few notes about renames: # Rename Lock There is a mountpoint-wide rename lock that is held during any rename or unlink -operation. An Inode's path cannot be changed without holding this lock. +operation. An Inode's path cannot be changed without holding this lock. However, we currently do not hold the rename lock when creating new files or -directories. 
Therefore TreeEntry `contents_.entries` fields may change even -when the rename lock is not held. (We could potentially revisit this choice -later and require holding the rename lock even when creating new inodes.) +directories. Therefore TreeEntry `contents_.entries` fields may change even when +the rename lock is not held. (We could potentially revisit this choice later and +require holding the rename lock even when creating new inodes.) # Renaming over directories Rename supports renaming one directory over an existing directory, as long as -the destination directory is empty. This means we must (a) be able to safely +the destination directory is empty. This means we must (a) be able to safely check if the directory is currently empty, and (b) be able to prevent new files or directories from being created inside the destination directory once the rename has started. We currently achieve this by acquiring the destination directory's `contents_` -lock. This does mean that a rename operation may hold up to 3 TreeInode locks +lock. This does mean that a rename operation may hold up to 3 TreeInode locks concurrently: the source directory, the destination parent directory, and the -destination child directory. The [InodeLocks](InodeLocks.md) document -describes the lock ordering requirements for acquiring these 3 locks. +destination child directory. The [InodeLocks](InodeLocks.md) document describes +the lock ordering requirements for acquiring these 3 locks. This also means that create() and mkdir() operations must check if the parent -directory is unlinked *after* acquiring the parent directory's contents lock. +directory is unlinked _after_ acquiring the parent directory's contents lock. # Handling unloaded children When rename() (and unlink()/rmdir()) is invoked, the parent directories have already been loaded (typically having been identified via inode number). -However, the affected children may not have been loaded yet, and are referred -to by name. +However, the affected children may not have been loaded yet, and are referred to +by name. We had a few choices for how to deal with this situation. For now we have opted to always load the child entries in question before -performing the rename. This is slightly tricky, as loading the child may take +performing the rename. This is slightly tricky, as loading the child may take some time, and another rename or unlink operation may also be in progress, and -may affect the child in question before our operation can take place. One -option would have been to hold the rename lock while waiting on the children to -be loaded. However, this would have blocked all other rename/unlink/rmdir -operations for the duration of the load, which seems undesirable. Instead, we -wait for the load to complete, then double check to confirm that the named -entry that we desire is actually loaded. The original inode we loaded may have -been renamed or unlinked, so we may find an unloaded entry or no entry at all. -If we find an unloaded entry we have to repeat the load operation. We -therefore may have to retry loading the requested children multiple times -before we can make progress, but we should eventually succeed or fail. Once -the children are loaded the rename itself is then fairly straightforward. +may affect the child in question before our operation can take place. One option +would have been to hold the rename lock while waiting on the children to be +loaded. 
However, this would have blocked all other rename/unlink/rmdir +operations for the duration of the load, which seems undesirable. Instead, we +wait for the load to complete, then double check to confirm that the named entry +that we desire is actually loaded. The original inode we loaded may have been +renamed or unlinked, so we may find an unloaded entry or no entry at all. If we +find an unloaded entry we have to repeat the load operation. We therefore may +have to retry loading the requested children multiple times before we can make +progress, but we should eventually succeed or fail. Once the children are loaded +the rename itself is then fairly straightforward. Another option would have been to allow the rename even though the requested -children are not loaded. The main downside with this approach is that we still -need to confirm if the destination child is an empty directory or not. This +children are not loaded. The main downside with this approach is that we still +need to confirm if the destination child is an empty directory or not. This would have meant either loading the destination child inode anyway, or storing -some extra data to track if an unloaded inode is an empty directory or not. -This also makes the inode loading code more complicated, as an inode may be -unlinked or renamed while it is already in the process of being loaded. When -the load completes we would need to double-check which parent TreeInode the new -entry needs to be inserted into. All-in-all this felt more complicated than -simply always loading the affected children before performing -rename/unlink/rmdir operations. +some extra data to track if an unloaded inode is an empty directory or not. This +also makes the inode loading code more complicated, as an inode may be unlinked +or renamed while it is already in the process of being loaded. When the load +completes we would need to double-check which parent TreeInode the new entry +needs to be inserted into. All-in-all this felt more complicated than simply +always loading the affected children before performing rename/unlink/rmdir +operations. diff --git a/eden/fs/docs/Takeover.md b/eden/fs/docs/Takeover.md index 435ff080880d2..4ae8c97bef5ac 100644 --- a/eden/fs/docs/Takeover.md +++ b/eden/fs/docs/Takeover.md @@ -12,61 +12,53 @@ support PrjFS mounts. There are 5 main components in the takeover directory: thrift serialization library, client, server, data, and handler. - ### Thrift serialization library -There are three main message classes that are exchanged over the takeover socket: - -* `struct TakeoverVersionQuery` - This is sent from the client to the server to -inform the server what features of the takeover protocol the client supports. -This allows us to evolve the protocol. This struct contains two fields: -`versions` and `capabilities`. - * `versions` is a legacy field. We use to use to distinguish versions of the - takeover protocol by number. This list of versions is suppose to contain all - the numbered versions of the protocol that the client supports. These days - we expect the versions list to be a singleton list containing the last - version number (7). - * `capabilities` These days we have migrated the protocol to be capability - based instead of version based. Instead of telling the server which versions - the client supports, the client tells the server directly which features or - capabilities of the takeover protocol the client supports. These - capabilities are represented as a bit mask. Each bit represents a certain - capability. 
You can see the different capabilities in `TakeoverData.h`. Some
-  capabilities are required these days since we have deprecated old versions
-  of the protocol and some capabilities have dependencies. See
-  `TakeoverData::computeCompatibleCapabilities` and the comments in
-  `TakeoverData.h`.
-* empty "ready" ping - An empty ping sent by the server to ensure the client is
-still alive and ready to receive takeover data
-* `union SerializedTakeoverData` - This is a legacy union. Modern versions of
-the protocol instead use `SerializedTakeoverResult` instead. This is a union of
-a list of `SerializedMountInfo` or a string error.
-  * `struct SerializedMountInfo` - see info in the lower section.
-* `union SerializedTakeoverResult`. This is either a `SerializedTakeoverInfo`
-or a string error. We migrated from `SerializedTakeoverData` to this so that
-we can pass general (non mount point specific) data in the non error case.
-  * `struct SerializedTakeoverInfo` - This is the struct representing all the
-    data we send for takeover in the non error case. This contains two fields:
-    `mounts` and `fileDescriptors`. `mounts` is a list of `SerializedMountInfo`
-    and `fileDescriptors` is a list of `FileDescriptorType`.
-    * `struct SerializedMountInfo` - Contains the mount path, the type
-      of mount in use, a state directory, a list of bind mount paths
-      (which is no longer used), connection information, and
-      a `SerializedInodeMap`
-      * `struct SerializedInodeMap` - A list of `SerializedInodeMapEntry`
-        unloaded inodes
-        * `struct SerializedInodeMapEntry` - contains inode
-          information like inodeNumber, parentInode, name, isUnlinked,
-          numFuseReferences, hash, and mode.
-  * `struct SerializedFileHandleMap` - currently empty
-  * `union FileDescriptorType` this is an enum of all the types of file
-    descriptors we can send during takeover (excluding mount specific fds). The
-    list of these represents the order the non mount specific file descriptors
-    are sent in the underlying sendmsg call. Mount point file descriptors are
-    sent in the same order as the list of SerializedMountInfo for the mount
-    points.
-
-
+There are three main message classes that are exchanged over the takeover
+socket:
+
+- `struct TakeoverVersionQuery` - This is sent from the client to the server to
+  inform the server what features of the takeover protocol the client supports.
+  This allows us to evolve the protocol. This struct contains two fields:
+  `versions` and `capabilities`.
+  - `versions` is a legacy field. We used to distinguish versions of the
+    takeover protocol by number. This list of versions is supposed to contain
+    all the numbered versions of the protocol that the client supports. These
+    days we expect the versions list to be a singleton list containing the last
+    version number (7).
+  - `capabilities` - These days we have migrated the protocol to be capability
+    based instead of version based. Instead of telling the server which
+    versions the client supports, the client tells the server directly which
+    features or capabilities of the takeover protocol the client supports.
+    These capabilities are represented as a bit mask. Each bit represents a
+    certain capability. You can see the different capabilities in
+    `TakeoverData.h`. Some capabilities are required these days since we have
+    deprecated old versions of the protocol and some capabilities have
+    dependencies. See `TakeoverData::computeCompatibleCapabilities` and the
+    comments in `TakeoverData.h`.
+- empty "ready" ping - An empty ping sent by the server to ensure the client is
+  still alive and ready to receive takeover data
+- `union SerializedTakeoverData` - This is a legacy union. Modern versions of
+  the protocol use `SerializedTakeoverResult` instead. This is a union of a
+  list of `SerializedMountInfo` or a string error.
+  - `struct SerializedMountInfo` - see info in the lower section.
+- `union SerializedTakeoverResult`. This is either a `SerializedTakeoverInfo`
+  or a string error. We migrated from `SerializedTakeoverData` to this so that
+  we can pass general (non mount point specific) data in the non error case.
+  - `struct SerializedTakeoverInfo` - This is the struct representing all the
+    data we send for takeover in the non error case. This contains two fields:
+    `mounts` and `fileDescriptors`. `mounts` is a list of `SerializedMountInfo`
+    and `fileDescriptors` is a list of `FileDescriptorType`.
+    - `struct SerializedMountInfo` - Contains the mount path, the type of
+      mount in use, a state directory, a list of bind mount paths (which is no
+      longer used), connection information, and a `SerializedInodeMap`
+      - `struct SerializedInodeMap` - A list of `SerializedInodeMapEntry`
+        unloaded inodes
+        - `struct SerializedInodeMapEntry` - contains inode information like
+          inodeNumber, parentInode, name, isUnlinked, numFuseReferences, hash,
+          and mode.
+  - `struct SerializedFileHandleMap` - currently empty
+  - `union FileDescriptorType` - this is an enum of all the types of file
+    descriptors we can send during takeover (excluding mount specific fds). The
+    list of these represents the order the non mount specific file descriptors
+    are sent in the underlying sendmsg call. Mount point file descriptors are
+    sent in the same order as the list of SerializedMountInfo for the mount
+    points.
 
 ### Client
 
@@ -90,11 +82,10 @@ and instead just receive the takeover data response.
 
 After we get the takeover data response, we either throw an exception if we do
 not get a message, or we deserialize the message and check its contents. We
-throw an exception if the number of sockets sent is not the expected size
-(num of mount points + 2 for the lock file and the thrift socket + 1 optional
-socket for mountd). Otherwise, if all is well, we save the lock file,
-thrift socket, mountd socket and all the mount points.
-
+throw an exception if the number of sockets sent is not the expected size (num
+of mount points + 2 for the lock file and the thrift socket + 1 optional socket
+for mountd). Otherwise, if all is well, we save the lock file, thrift socket,
+mountd socket and all the mount points.
 
 ### Server
 
@@ -104,49 +95,48 @@ the `EdenServer`'s main `EventBase` for driving its I/O.
 
 It has a few functions:
 
-* public function:
-  * start - This is called when the EdenFS daemon first starts. It begins
+- public function:
+  - start - This is called when the EdenFS daemon first starts. It begins
     listening on the takeover socket, waiting for a client to connect and
-    request to initiate a graceful restart. When a client connects, it verifies
+    request to initiate a graceful restart. When a client connects, it verifies
     that the client process is from the same user ID, and that the client and
-    server support a compatible takeover protocol capabilities. If the
-    capabilities are compatible, then the server starts to initiate shutdown
-    by calling `server_->getTakeoverHandler()->startTakeoverShutdown()`.
- After the shutdown is completed, the takeover server pings the takeover - client to ensure it is still waiting for the data. If the ping is - unsuccessful (timeout, error, etc), the takeover server stops the takeover - process and returns the untransmitted `TakeoverData` in an exception in - order to let the `EdenServer` recover itself and start serving again. - Finally, it closes its storage (local and backing stores) and sends the - takeover data over the takeover socket by serializing the information - (version, lock file, thrift socket, mount file descriptor) or error, - and sending it. -* private functions: - * `connectionAccepted` - callback function for allocating a connection - handler when the server gets a client. - * `acceptError` - callback function that simply logs on an accept() error on + server support a compatible takeover protocol capabilities. If the + capabilities are compatible, then the server starts to initiate shutdown by + calling `server_->getTakeoverHandler()->startTakeoverShutdown()`. After the + shutdown is completed, the takeover server pings the takeover client to + ensure it is still waiting for the data. If the ping is unsuccessful + (timeout, error, etc), the takeover server stops the takeover process and + returns the untransmitted `TakeoverData` in an exception in order to let the + `EdenServer` recover itself and start serving again. Finally, it closes its + storage (local and backing stores) and sends the takeover data over the + takeover socket by serializing the information (version, lock file, thrift + socket, mount file descriptor) or error, and sending it. +- private functions: + - `connectionAccepted` - callback function for allocating a connection handler + when the server gets a client. + - `acceptError` - callback function that simply logs on an accept() error on the takeover socket - * `connectionDone` - callback function that is declared in the .h file but + - `connectionDone` - callback function that is declared in the .h file but currently is not defined. ### Data This holds the set of capabilities supported by this build. It also holds the lock file, the server socket, mountd, expected order to serialize those file -descriptors, the mount points, and a takeover complete promise that -will be fulfilled by the `TakeoverServer` code once the `TakeoverData` has been -sent to the remote process. It has a function to serialize and deserialize -the `TakeoverData`. - +descriptors, the mount points, and a takeover complete promise that will be +fulfilled by the `TakeoverServer` code once the `TakeoverData` has been sent to +the remote process. It has a function to serialize and deserialize the +`TakeoverData`. ### Handler TakeoverHandler is a pure virtual interface for classes that want to implement graceful takeover functionality. This is primarily implemented by the -`EdenServer` class. However, there are also alternative implementations used -for unit testing. +`EdenServer` class. However, there are also alternative implementations used for +unit testing. -It has two pure virtual functions: `startTakeoverShutdown()` and `closeStorage()`. +It has two pure virtual functions: `startTakeoverShutdown()` and +`closeStorage()`. `startTakeoverShutdown()` will be called when a graceful shutdown has been requested, with a remote process attempting to take over the currently running @@ -157,7 +147,7 @@ When implemented, this should return a Future that will produce the ready to transfer its mounts. 
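
To summarize the hand-off sequence described in the Server section above and the
`closeStorage()` note below, here is a rough, hypothetical sketch in Python. The
real interface is the C++ `TakeoverHandler` used by `EdenServer`; the
`ping_client` and `send_takeover_data` callables here are stand-ins for the real
socket code.

```python
from abc import ABC, abstractmethod


class TakeoverHandler(ABC):
    """Simplified mirror of the pure virtual interface described above."""

    @abstractmethod
    def start_takeover_shutdown(self):
        """Begin shutting down and eventually return the takeover data."""

    @abstractmethod
    def close_storage(self):
        """Release local/backing store locks so the new process can take over."""


def run_takeover(handler, ping_client, send_takeover_data):
    """Server side: shut down, ping the waiting client, then either hand off
    the data or surface it so the caller can recover and resume serving."""
    takeover_data = handler.start_takeover_shutdown()
    if not ping_client():
        # The client went away; return the untransmitted data in an error so
        # the server can recover itself, as described above.
        raise RuntimeError("takeover client not ready", takeover_data)
    handler.close_storage()
    send_takeover_data(takeover_data)
```
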
`closeStorage()` will be called before sending the `TakeoverData` to the client,
-conditionally on a successful ready handshake (if applicable). This function
+conditionally on a successful ready handshake (if applicable). This function
 should close storage used by the server. In the case of an `EdenServer`, this
-function allows for locks to be released in order for the new process to
-take over this storage.
+function allows for locks to be released in order for the new process to take
+over this storage.
diff --git a/eden/fs/docs/Threading.md b/eden/fs/docs/Threading.md
index 9ce67fafef2a8..cd3bd98a99112 100644
--- a/eden/fs/docs/Threading.md
+++ b/eden/fs/docs/Threading.md
@@ -1,37 +1,38 @@
 # Eden's Threading Strategy
 
-There are `fuse:NumDispatcherThreads` (defaults to 16 as of Mar 2024) that block on reading
-the FUSE socket. The reason we do blocking reads is to avoid two syscalls on an
-incoming event: an epoll wakeup plus a read. Note that there is a FUSE socket
-per mount. So if you have 3 mounts, there will be `3*fuse:NumDispatcherThreads` threads.
+There are `fuse:NumDispatcherThreads` threads (defaults to 16 as of Mar 2024)
+that block on reading the FUSE socket. The reason we do blocking reads is to
+avoid two syscalls on an incoming event: an epoll wakeup plus a read. Note that
+there is a FUSE socket per mount. So if you have 3 mounts, there will be
+`3*fuse:NumDispatcherThreads` threads.
 
 The FUSE threads generally do any filesystem work directly rather than putting
 work on another thread.
 
-The Thrift server uses `thrift_num_workers` IO threads (defaults to ncores).
-We don't change the default number (ncores) of Thrift CPU threads. The
-IO threads receive incoming requests, but serialization/deserialization and
-actually handling the request is done on the CPU threads.
+The Thrift server uses `thrift_num_workers` IO threads (defaults to ncores). We
+don't change the default number (ncores) of Thrift CPU threads. The IO threads
+receive incoming requests, but serialization/deserialization and actually
+handling the request is done on the CPU threads.
 
-There is another pool of (8 as of Dec 2017) threads on which the SaplingBackingStore
-farms work out to (blocking) a Sapling retry processes. Because importing from
-Sapling is high-latency and mostly blocking, we avoid doing any post-import
-computation, so it's put into the following pool. Note that each SaplingBackingStore
-has its own pool, and there is one SaplingBackingStore per underlying Sapling
-repository.
+There is another pool of (8 as of Dec 2017) threads on which the
+SaplingBackingStore farms out work to (blocking) Sapling retry processes.
+Because importing from Sapling is high-latency and mostly blocking, we avoid
+doing any post-import computation, so it's put into the following pool. Note
+that each SaplingBackingStore has its own pool, and there is one
+SaplingBackingStore per underlying Sapling repository.
 
 Eden also creates a CPU pool (12 threads as of Dec 2017) for miscellaneous
-background tasks. These threads handle post-mount initialization, prefetching,
+background tasks. These threads handle post-mount initialization, prefetching,
 and post-retry logic.
 
 The queue to the miscellaneous CPU pool must be unbounded because, if it could
-block, there could be a deadlock between it and the other pools. To use a
To use a bounded queue and avoid deadlocks we'd have to guarantee anything that runs in -the miscellaneous CPU pool can then never block on the retry again. (Adding -to the retry queue blocks if it's full.) +the miscellaneous CPU pool can then never block on the retry again. (Adding to +the retry queue blocks if it's full.) ## Blocking -In general, we try to avoid blocking on other threads. The only places we ought -to block are talking to the filesystem and contending on locks. (Today, as +In general, we try to avoid blocking on other threads. The only places we ought +to block are talking to the filesystem and contending on locks. (Today, as mentioned above, we will block if inserting into the retry queue is full.) diff --git a/eden/fs/docs/Windows.md b/eden/fs/docs/Windows.md index bbb95199f5250..a02ae4d399cf7 100644 --- a/eden/fs/docs/Windows.md +++ b/eden/fs/docs/Windows.md @@ -6,41 +6,40 @@ own page. The rest of this document assumes prior knowledge about these two. ## Cached State -ProjectedFS was designed by Microsoft to have no overhead in -the common path: reading an already read or modified file. To achieve this, the -state of files is fully managed by ProjectedFS and is stored directly in the -working copy. EdenFS is only involved when providing the state of files that -ProjectedFS is not aware of. +ProjectedFS was designed by Microsoft to have no overhead in the common path: +reading an already read or modified file. To achieve this, the state of files is +fully managed by ProjectedFS and is stored directly in the working copy. EdenFS +is only involved when providing the state of files that ProjectedFS is not aware +of. For instance, the first time a file is being opened, ProjectedFS would first send EdenFS a [`PRJ_GET_PLACEHOLDER_INFO_CB`][PRJ_GET_PLACEHOLDER_INFO_CB] callback which will populate a placeholder file in the NTFS backing filesystem by calling the [PrjWritePlaceholderInfo][PrjWritePlaceholderInfo] API. -Similarly, on the first read, the -[`PRJ_GET_FILE_DATA_CB`][PRJ_GET_FILE_DATA_CB] is sent to EdenFS. EdenFS would -then write the file content by calling [`PrjWriteFileData`][PrjWriteFileData] -which will write the file to the working copy, the file is now considered to be -a hydrated placeholder. Subsequent open or reads will not involve EdenFS as -these will be served from the filesystem directly. +Similarly, on the first read, the [`PRJ_GET_FILE_DATA_CB`][PRJ_GET_FILE_DATA_CB] +is sent to EdenFS. EdenFS would then write the file content by calling +[`PrjWriteFileData`][PrjWriteFileData] which will write the file to the working +copy, the file is now considered to be a hydrated placeholder. Subsequent open +or reads will not involve EdenFS as these will be served from the filesystem +directly. While this allows for very fast reads to the working copy, it also leads to a surprising behavior: **files that have been read once will still be readable after EdenFS is stopped!** -One very important aspect of providing file data or metadata is that -ProjectedFS is the sole maintainer of the writeable working copy, and -thus EdenFS should only provide file data and metadata from the current -Mercurial commit. 
For instance, user created files should not be present in -directory enumeration, or more surprisingly, renamed files will always be -referred by ProjectedFS from their +One very important aspect of providing file data or metadata is that ProjectedFS +is the sole maintainer of the writeable working copy, and thus EdenFS should +only provide file data and metadata from the current Mercurial commit. For +instance, user created files should not be present in directory enumeration, or +more surprisingly, renamed files will always be referred by ProjectedFS from +their [pre-rename path and name](https://github.com/microsoft/ProjFS-Managed-API/issues/68). -For this reason, EdenFS rely solely on Mercurial trees to serve -ProjectedFS callbacks and will not consult the [inode](Inodes.md) -state. +For this reason, EdenFS rely solely on Mercurial trees to serve ProjectedFS +callbacks and will not consult the [inode](Inodes.md) state. -The rules are slightly different for directories as these will always be -queried even after the first directory listing. ProjectedFS will use three -callbacks for directory listing, starting with +The rules are slightly different for directories as these will always be queried +even after the first directory listing. ProjectedFS will use three callbacks for +directory listing, starting with [`PRJ_START_DIRECTORY_ENUMERATION_CB`][PRJ_START_DIRECTORY_ENUMERATION_CB] to open the directory. Reading it is done via the [`PRJ_GET_DIRECTORY_ENUMERATION_CB`][PRJ_GET_DIRECTORY_ENUMERATION_CB] callback @@ -51,49 +50,46 @@ Mercurial commit will not be receiving these callbacks. ## Inode State -While EdenFS on Windows makes little use of the inode state, it is -still fundamental to EdenFS inner working. To name a few, `getScmStatus`, -`checkoutRevision` or `globFiles` all rely on the inode state as they care -about the working copy state that ProjectedFS doesn't provide. +While EdenFS on Windows makes little use of the inode state, it is still +fundamental to EdenFS inner working. To name a few, `getScmStatus`, +`checkoutRevision` or `globFiles` all rely on the inode state as they care about +the working copy state that ProjectedFS doesn't provide. ### Write notifications Whenever a write operation is performed in the working copy (writing a file, renaming it, creating a directory, etc), the callback -[`PRJ_NOTIFICATION_CB`][PRJ_NOTIFICATION_CB] is invoked in EdenFS. This -callback is usually invoked after the write operation has taken place and thus -EdenFS cannot refuse the operation. - -The most subtle part about this callback is that ProjectedFS doesn't -provide any guarantee about the ordering of them. For instance, during a -concurrent directory hierarchy creation, a notification on a child directory -may be received prior to the notification of its parent directory! The same is -true for file and directory removal. - -In order for the inode state to stay in sync with the working copy -state, EdenFS handles all of the notification serially in a single background -thread. The handling of these notifications is done in a -non-blocking manner in EdenFS. On receiving a notification, EdenFS will first -inspect the state of the file/directory on which the notification occurs and -will then update the inode state accordingly: for a missing file, -it will remove it from inode hierarchy, for a missing directory, the entire -directory hierarchy will be removed, etc. - -This scheme means that during write heavy workloads, the inode -state will always be lagging behind the working copy. 
Since EdenFS only needs
-the query the inode state while servicing Thrift requests, EdenFS
-only needs to make sure that the inode state caught up with all the
-changes to the working copy prior to servicing the Thrift requests. This is
-done by simply enqueuing an empty notification and waiting for it to be
-serviced.
+[`PRJ_NOTIFICATION_CB`][PRJ_NOTIFICATION_CB] is invoked in EdenFS. This callback
+is usually invoked after the write operation has taken place and thus EdenFS
+cannot refuse the operation.
+
+The most subtle part about this callback is that ProjectedFS doesn't provide any
+guarantee about the ordering of them. For instance, during a concurrent
+directory hierarchy creation, a notification on a child directory may be
+received prior to the notification of its parent directory! The same is true for
+file and directory removal.
+
+In order for the inode state to stay in sync with the working copy state, EdenFS
+handles all of the notifications serially in a single background thread. The
+handling of these notifications is done in a non-blocking manner in EdenFS. On
+receiving a notification, EdenFS will first inspect the state of the
+file/directory on which the notification occurs and will then update the inode
+state accordingly: for a missing file, it will remove it from the inode
+hierarchy; for a missing directory, the entire directory hierarchy will be
+removed, etc.
+
+This scheme means that during write heavy workloads, the inode state will always
+be lagging behind the working copy. Since EdenFS only needs to query the inode
+state while servicing Thrift requests, EdenFS only needs to make sure that the
+inode state has caught up with all the changes to the working copy prior to
+servicing the Thrift requests. This is done by simply enqueuing an empty
+notification and waiting for it to be serviced.
 
 Since some clients ([Buck][Buck], [Watchman][Watchman]) often don't mind if the
 data returned is slightly out of date, all the Thrift queries accept a
 `SyncBehavior` argument that allows the client to control how long to wait for
 the inode to be synchronized with the working copy. Note that this only
-guarantees that all the writes made prior to the Thrift request have been
-synced up, writes that race with the Thrift query are not guaranteed to be
-synced up.
+guarantees that all the writes made prior to the Thrift request have been synced
+up, writes that race with the Thrift query are not guaranteed to be synced up.
 
 ## Invalidations
 
@@ -101,14 +97,13 @@ As mentioned above, ProjectedFS will only trigger callbacks in EdenFS the first
 time a file is read or opened, thus if during a checkout operation, a file that
 has been read changes, that file will need to be invalidated. This is done via
 the [PrjDeleteFile][PrjDeleteFile] API. For directories, and as described
-above, callbacks are only sent to directories present in the current commit,
-and never sent to user created directories, thus EdenFS needs to add a
-placeholder to them if the directory either changes, or is present in the
-destination commit during the checkout operation. This is done via the
+above, callbacks are only sent to directories present in the current commit, and
+never sent to user created directories, thus EdenFS needs to add a placeholder
+to them if the directory either changes, or is present in the destination commit
+during the checkout operation. This is done via the
 [PrjMarkDirectoryAsPlaceholder][PrjMarkDirectoryAsPlaceholder] API.
While -Microsoft's documentation doesn't document this API to be used for -invalidation, VFSForGit is using it to perform invalidation in the same way as -EdenFS. +Microsoft's documentation doesn't document this API to be used for invalidation, +VFSForGit is using it to perform invalidation in the same way as EdenFS. ## Pitfalls and caveats @@ -116,29 +111,29 @@ EdenFS. Invalidation has been the source of several bugs in EdenFS. Starting with passing a GUID that doesn't match the GUID of the root folder in -`PrjMarkDirectoryAsPlaceholder`. This sometimes leads to Windows throwing a -"The provider that supports file system virtualization is temporarily -unavailable" error. To avoid this issue, EdenFS stores the GUID used when -creating a mount in the mount configuration, and will use the same GUID for the -whole lifetime of the working copy. +`PrjMarkDirectoryAsPlaceholder`. This sometimes leads to Windows throwing a "The +provider that supports file system virtualization is temporarily unavailable" +error. To avoid this issue, EdenFS stores the GUID used when creating a mount in +the mount configuration, and will use the same GUID for the whole lifetime of +the working copy. Still on `PrjMarkDirectoryAsPlaceholder`, calling this API on a non-populated -directory will lead to recursive callbacks which have at times deadlocked -EdenFS due to trying to recursively take already held locks. +directory will lead to recursive callbacks which have at times deadlocked EdenFS +due to trying to recursively take already held locks. The `PrjDeleteFile` and [`PrjUpdateFileIfNeeded`][PrjUpdateFileIfNeeded] can only be used on an empty directory, or they will fail claiming that the -directory isn't empty. While this is expected for the former, this is -surprising for the latter. During callbacks, ProjectedFS passes the relative -path of the file as well as the +directory isn't empty. While this is expected for the former, this is surprising +for the latter. During callbacks, ProjectedFS passes the relative path of the +file as well as the [`PRJ_PLACEHOLDER_VERSION_INFO`][PRJ_PLACEHOLDER_VERSION_INFO] stored in the placeholder (which can be populated via `PrjWritePlaceholderInfo`), and EdenFS walks the Mercurial trees to serve the callback. An optimization would be shortcut this walk by storing the tree/file ID in the placeholder and using it to obtain the same data as the walk. Unfortunately, due to `PrjUpdateFileIfNeeded` not being able to update the placeholder of directories -containing untracked files, placeholders would become out of date after -checkout operations, rendering them unuseable. +containing untracked files, placeholders would become out of date after checkout +operations, rendering them unuseable. ### Renaming directories @@ -161,22 +156,34 @@ EdenFS is stopped. Some users have reported editing files long after EdenFS has stopped. At startup, EdenFS will scan the fully materialized directories to update its overlay to stay in sync with the filesystem state. 
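
The startup scan mentioned just above (walking the fully materialized
directories so the overlay catches up with edits made while EdenFS was stopped)
can be pictured with the simplified sketch below. It is hypothetical:
`overlay_entries` and `is_full_directory` stand in for EdenFS's real overlay
data structures, and the real scan (and the Windows FSCK described in the next
document) has to handle many more cases.

```python
import os


def scan_materialized(root, overlay_entries, is_full_directory):
    """Report names that were added or removed on disk, relative to what the
    overlay last recorded, walking only "full" (locally created/materialized)
    directories since only those can change behind EdenFS's back."""
    changes = []
    stack = [""]
    while stack:
        rel = stack.pop()
        if not is_full_directory(rel):
            # Placeholder directories are still owned by ProjectedFS; skip.
            continue
        on_disk = set(os.listdir(os.path.join(root, rel)))
        recorded = overlay_entries.get(rel, set())
        for name in sorted(on_disk - recorded):
            changes.append(("added", os.path.join(rel, name)))
        for name in sorted(recorded - on_disk):
            changes.append(("removed", os.path.join(rel, name)))
        for name in on_disk:
            child = os.path.join(rel, name)
            if os.path.isdir(os.path.join(root, child)):
                stack.append(child)
    return changes
```
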
- -[PrjFS]: https://docs.microsoft.com/en-us/windows/win32/projfs/projected-file-system +[PrjFS]: + https://docs.microsoft.com/en-us/windows/win32/projfs/projected-file-system [FUSE]: https://en.wikipedia.org/wiki/Filesystem_in_Userspace [NFS]: https://datatracker.ietf.org/doc/html/rfc1813 [NTFS]: https://en.wikipedia.org/wiki/NTFS -[PRJ_GET_PLACEHOLDER_INFO_CB]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_get_placeholder_info_cb -[PRJ_GET_FILE_DATA_CB]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_get_file_data_cb -[PrjWriteFileData]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjwritefiledata -[PrjWritePlaceholderInfo]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjwriteplaceholderinfo -[PRJ_NOTIFICATION_CB]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_notification_cb +[PRJ_GET_PLACEHOLDER_INFO_CB]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_get_placeholder_info_cb +[PRJ_GET_FILE_DATA_CB]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_get_file_data_cb +[PrjWriteFileData]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjwritefiledata +[PrjWritePlaceholderInfo]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjwriteplaceholderinfo +[PRJ_NOTIFICATION_CB]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_notification_cb [Buck]: https://buck.build [Watchman]: https://facebook.github.io/watchman/ -[PrjDeleteFile]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjdeletefile -[PrjUpdateFileIfNeeded]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjupdatefileifneeded -[PRJ_START_DIRECTORY_ENUMERATION_CB]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_start_directory_enumeration_cb -[PRJ_GET_DIRECTORY_ENUMERATION_CB]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_get_directory_enumeration_cb -[PRJ_END_DIRECTORY_ENUMERATION_CB]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_end_directory_enumeration_cb -[PrjMarkDirectoryAsPlaceholder]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjmarkdirectoryasplaceholder -[PRJ_PLACEHOLDER_VERSION_INFO]: https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/ns-projectedfslib-prj_placeholder_version_info +[PrjDeleteFile]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjdeletefile +[PrjUpdateFileIfNeeded]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjupdatefileifneeded +[PRJ_START_DIRECTORY_ENUMERATION_CB]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_start_directory_enumeration_cb +[PRJ_GET_DIRECTORY_ENUMERATION_CB]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_get_directory_enumeration_cb +[PRJ_END_DIRECTORY_ENUMERATION_CB]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nc-projectedfslib-prj_end_directory_enumeration_cb +[PrjMarkDirectoryAsPlaceholder]: + 
https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/nf-projectedfslib-prjmarkdirectoryasplaceholder +[PRJ_PLACEHOLDER_VERSION_INFO]: + https://docs.microsoft.com/en-us/windows/win32/api/projectedfslib/ns-projectedfslib-prj_placeholder_version_info diff --git a/eden/fs/docs/WindowsFsck.md b/eden/fs/docs/WindowsFsck.md index af11f6063bb77..13ff8380ec8b8 100644 --- a/eden/fs/docs/WindowsFsck.md +++ b/eden/fs/docs/WindowsFsck.md @@ -2,13 +2,13 @@ This was written in first person. I here refers to kmancini. -When Eden starts up on all platforms, we have an optional FSCK -(**F**ile **S**ystem **C**hec**K**) that makes sure Eden’s internal on-disk -state is not corrupted (think incorrect `hg status`) and fix it where possible. +When Eden starts up on all platforms, we have an optional FSCK (**F**ile +**S**ystem **C**hec**K**) that makes sure Eden’s internal on-disk state is not +corrupted (think incorrect `hg status`) and fix it where possible. On macOS and Linux this check is optional. We can skip it if Eden shutdown -cleanly. However, it is **not optional on Windows** for two reasons. Reason -#1: the notifications about files changing we get from the operating system on +cleanly. However, it is **not optional on Windows** for two reasons. Reason #1: +the notifications about files changing we get from the operating system on Windows are asynchronous. This means Eden might miss a file change before stopping. Reason #2: on Windows files can be modified while Eden isn’t running. @@ -24,9 +24,9 @@ Notifications from the operating system about files changing were always asynchronous on Windows. So FSCK was already prepared to handle Eden completely missing modifications to files. But last year we rolled out the BufferedOverlay ([post](https://fb.workplace.com/groups/edenfs/permalink/2004171086419780/)). -This makes Eden faster in the hot path by buffering writes to disk in-memory -but can mean that Eden can exit with partially persisted internal state. FSCK -needed to be updated to handle this extra case of asynchrony. +This makes Eden faster in the hot path by buffering writes to disk in-memory but +can mean that Eden can exit with partially persisted internal state. FSCK needed +to be updated to handle this extra case of asynchrony. Because FSCK was already slow, careful work was done to limit the number of filesystem operations per file to 1. FSCK had grown bloat over the years as we @@ -45,67 +45,67 @@ an Eden restart. This was fixed last fall by ignoring case in FSCK like we do everywhere else on Windows, but unearthed multiple other bugs: -* **“renamed-ness” of files was ignored by FSCK.** This means that if FSCK missed -a rename it might bring back the old location of the file, the file might have -the wrong contents, or `hg status` would report incorrect information for these -files. -* Removed files could be brought back by FSCK, and `hg status` could report -incorrect information about them. +- **“renamed-ness” of files was ignored by FSCK.** This means that if FSCK + missed a rename it might bring back the old location of the file, the file + might have the wrong contents, or `hg status` would report incorrect + information for these files. +- Removed files could be brought back by FSCK, and `hg status` could report + incorrect information about them. 
#### Here’s the nity-grity of how FSCK was incorrect for renames: -| File renamed while eden is …| Old parent is …|File is ...|Same named file in source in scm?|Same named file in destination in scm ?|Old parent is empty |Tombstone placed?|Source inode state after fsck|Source FS state after fsck|Destination inode state after fsck|Destination FS state after fsck|In sync|No re-appearing files|Matches sparse profile behavior| -|---|---|---|---|---|---|---|---|---|---|---|---|---|---| -|running|placeholder / full|placeholder|y|y|y/n|y|No inode|No file on disk|inode with destination scm contents|file with source scm contents|❌ (1)|✔️|❌ (2)| -|running|placeholder / full|placeholder|n|y|y/n|y|No inode|No file on disk|inode with destination scm contents|error reading file|❌ (1)|✔️|❌ (2)| -|running|placeholder / full|placeholder|y|n|y/n|y|No inode|No file on disk|inode with source scm contents|inode with source scm contents|✔️|✔️|❌ (2)| -|running|placeholder / full|placeholder|n|n|y/n|y|No inode|No file on disk|error when accessing inode|error when reading file|✔️|✔️|❌ (2)| -|running|placeholder|hydrated placeholder / full|y/n|y/n|y/n|y|No inode|No file on disk|inode with moved contents|file with moved contents|✔️|✔️|✔️| -|running|full|full|y|y/n|y|n|No inode|No file on disk|inode with moved contents|file with moved contents|✔️|✔️|✔️| -|running|full|full|y|y/n|n|n|Inode with scm hash|No file on disk|inode with moved contents|inode with moved contents|❌ (4)|❌ (4)|✔️| -|running|full|full|n|y/n|y/n|n|No inode|No file on disk|inode with moved contents|inode with moved contents|✔️|✔️|✔️| -|stopped|placeholder|full|y|y/n|y/n|n|inode with scm hash|file on disk with scm hash|inode with moved contents|inode with moved contents|✔️|❌(4)|✔️|| -stopped|placeholder|full|n|y/n|y/n|n|no inodes|no file on disk|inode with moved contents|inode with moved contents|✔️|✔️|✔️| -|stopped|placeholder / full|placeholder|y/n|y/n|y/n|y/n|inode with source scm hash|file on disk with scm content|inode with destination scm contents|file with source scm contents|❌ (1)|❌ (3)|✔️| -|stopped|placeholder / full|placeholder|y|n|y/n|n|inode with source scm hash|file on disk with scm content|inode with source scm contents|file with source scm contents|✔️|❌ (3)|❌ (2)| -|stopped|placeholder / full|placeholder|n|n|y/n|n|no inode|no file on disk|inode with source scm contents | inode with source scm contents|✔️|✔️|❌(2)| -|stopped|placeholder / full|placeholder|n|y|y/n|n|No inode|No file on disk|inode with destination scm contents|file with source scm contents|❌(1)|✔️|✔️| -|stopped|full|full|y|y/n|y/n|n|Inode with scm hash|No file on disk|inode with moved contents|inode with moved contents|❌(4)|❌(4)|✔️| -|stopped|full|full|n|y/n|y/n|n|No inode|No file on disk|inodes with moved contents|inode with moved contents|✔️|✔️|✔️| - -* Checks are good behavior; Xs are bad behavior. -* The most incorrect Eden behavior corresponds to the X in the first check/x -column. Out of sync means `hg status` will be wrong and `hg checkout` is likely -to fail in weird ways. -* The other Xs indicates Eden does something unexpected. Unexpected includes -bringing back removed files or contents being different than they would be on -unix platforms or sparse profiles. +| File renamed while eden is … | Old parent is … | File is ... | Same named file in source in scm? | Same named file in destination in scm ? | Old parent is empty | Tombstone placed? 
| Source inode state after fsck | Source FS state after fsck | Destination inode state after fsck | Destination FS state after fsck | In sync | No re-appearing files | Matches sparse profile behavior | +| ---------------------------- | ------------------ | --------------------------- | --------------------------------- | --------------------------------------- | ------------------- | ----------------- | ----------------------------- | ----------------------------- | ----------------------------------- | ------------------------------- | ------- | --------------------- | ------------------------------- | --- | +| running | placeholder / full | placeholder | y | y | y/n | y | No inode | No file on disk | inode with destination scm contents | file with source scm contents | ❌ (1) | ✔️ | ❌ (2) | +| running | placeholder / full | placeholder | n | y | y/n | y | No inode | No file on disk | inode with destination scm contents | error reading file | ❌ (1) | ✔️ | ❌ (2) | +| running | placeholder / full | placeholder | y | n | y/n | y | No inode | No file on disk | inode with source scm contents | inode with source scm contents | ✔️ | ✔️ | ❌ (2) | +| running | placeholder / full | placeholder | n | n | y/n | y | No inode | No file on disk | error when accessing inode | error when reading file | ✔️ | ✔️ | ❌ (2) | +| running | placeholder | hydrated placeholder / full | y/n | y/n | y/n | y | No inode | No file on disk | inode with moved contents | file with moved contents | ✔️ | ✔️ | ✔️ | +| running | full | full | y | y/n | y | n | No inode | No file on disk | inode with moved contents | file with moved contents | ✔️ | ✔️ | ✔️ | +| running | full | full | y | y/n | n | n | Inode with scm hash | No file on disk | inode with moved contents | inode with moved contents | ❌ (4) | ❌ (4) | ✔️ | +| running | full | full | n | y/n | y/n | n | No inode | No file on disk | inode with moved contents | inode with moved contents | ✔️ | ✔️ | ✔️ | +| stopped | placeholder | full | y | y/n | y/n | n | inode with scm hash | file on disk with scm hash | inode with moved contents | inode with moved contents | ✔️ | ❌(4) | ✔️ | | +| stopped | placeholder | full | n | y/n | y/n | n | no inodes | no file on disk | inode with moved contents | inode with moved contents | ✔️ | ✔️ | ✔️ | +| stopped | placeholder / full | placeholder | y/n | y/n | y/n | y/n | inode with source scm hash | file on disk with scm content | inode with destination scm contents | file with source scm contents | ❌ (1) | ❌ (3) | ✔️ | +| stopped | placeholder / full | placeholder | y | n | y/n | n | inode with source scm hash | file on disk with scm content | inode with source scm contents | file with source scm contents | ✔️ | ❌ (3) | ❌ (2) | +| stopped | placeholder / full | placeholder | n | n | y/n | n | no inode | no file on disk | inode with source scm contents | inode with source scm contents | ✔️ | ✔️ | ❌(2) | +| stopped | placeholder / full | placeholder | n | y | y/n | n | No inode | No file on disk | inode with destination scm contents | file with source scm contents | ❌(1) | ✔️ | ✔️ | +| stopped | full | full | y | y/n | y/n | n | Inode with scm hash | No file on disk | inode with moved contents | inode with moved contents | ❌(4) | ❌(4) | ✔️ | +| stopped | full | full | n | y/n | y/n | n | No inode | No file on disk | inodes with moved contents | inode with moved contents | ✔️ | ✔️ | ✔️ | + +- Checks are good behavior; Xs are bad behavior. +- The most incorrect Eden behavior corresponds to the X in the first check/x + column. 
Out of sync means `hg status` will be wrong and `hg checkout` is + likely to fail in weird ways. +- The other Xs indicates Eden does something unexpected. Unexpected includes + bringing back removed files or contents being different than they would be on + unix platforms or sparse profiles. As you can see there are more rows with Xs than without. Generally, the no-X -rows are a bit more common case, but I am sure there are plenty of users who -hit these Xs regularly. Though there are lots of incorrect rows, the issues can -be categorized into 4 root cause bugs. The cause of each X is labeled with the +rows are a bit more common case, but I am sure there are plenty of users who hit +these Xs regularly. Though there are lots of incorrect rows, the issues can be +categorized into 4 root cause bugs. The cause of each X is labeled with the identified issues below. Before I get into it though. We need to get on the same page about some terms. -* “full”: This is a ProjectedFS term. For files it means a file is locally -modified or locally created. For directories it only includes locally created -directories. -* “hydrated placeholder”: This is a ProjectedFS term. For files this means the -file has been read, and its contents are present on disk (in your repo -directory). Directories are never hydrated. -* “placeholder”: This is a ProjectedFS term. For files this is a file that has -never been written or read. For directories, this is all directories that were -not locally created (with like mkdir or something). -* “materialized”: This is an Eden term. On windows it generally means disk (in -your repo directory) is the source of truth for this file/directory. Reads for -these files are completed by reading the file off disk. -* “WCP”: This is a mercurial term. Short for “working copy parent”. This is the -last commit that you checked out. - - -1. **FSCK is unaware that renamed files are special snowflakes in Eden’s model.** +- “full”: This is a ProjectedFS term. For files it means a file is locally + modified or locally created. For directories it only includes locally created + directories. +- “hydrated placeholder”: This is a ProjectedFS term. For files this means the + file has been read, and its contents are present on disk (in your repo + directory). Directories are never hydrated. +- “placeholder”: This is a ProjectedFS term. For files this is a file that has + never been written or read. For directories, this is all directories that were + not locally created (with like mkdir or something). +- “materialized”: This is an Eden term. On windows it generally means disk (in + your repo directory) is the source of truth for this file/directory. Reads for + these files are completed by reading the file off disk. +- “WCP”: This is a mercurial term. Short for “working copy parent”. This is the + last commit that you checked out. + +1. **FSCK is unaware that renamed files are special snowflakes in Eden’s + model.** Renamed files are the only files that are placeholders, but their path does not map to a source control object in the WCP. FSCK does not properly handle this @@ -120,18 +120,18 @@ read the path out of a source control object for the WCP. However, if ProjectedFS were to ask us to read a renamed file, it would ask us with the original path of the file. So, we would return the source control object at the original path. In essence, FSCK needs to know to check for the source control -object of a renamed file at the original path instead of the current one. 
-Though there is a bit of a simpler solution. I’ll go through the solution later. +object of a renamed file at the original path instead of the current one. Though +there is a bit of a simpler solution. I’ll go through the solution later. 2. **Moved files are incorrectly handled generally by Windows Eden.** -Like I mentioned above Eden always reads file content out of the WCP. That -means that for renamed files we read their content from the original path in -the WCP. This is a problem if you checked out a new commit since you renamed -the file. We will read the contents of the file at the new checked out commit -instead of the one that was checked out when the rename happened. Worse(?) if -the file was removed in the new commit, you checkout, you will get an internal -error when reading the file!! +Like I mentioned above Eden always reads file content out of the WCP. That means +that for renamed files we read their content from the original path in the WCP. +This is a problem if you checked out a new commit since you renamed the file. We +will read the contents of the file at the new checked out commit instead of the +one that was checked out when the rename happened. Worse(?) if the file was +removed in the new commit, you checkout, you will get an internal error when +reading the file!! This is easier to understand with an example. This little repro will do it: @@ -152,28 +152,27 @@ Rename removes a file from the source location and adds it to the destination. Removing the file from the source location is subject to the same bugs we have with removed files. -This issue is a little easier to talk about when there are fewer moving parts -so I’ll describe it below. +This issue is a little easier to talk about when there are fewer moving parts so +I’ll describe it below. **4. Deleted files are not handled correctly when the parent is a full file.** same as 3. - #### Here’s the nity-grity of how FSCK was incorrect for removals: -|File removed while eden is …|Parent is …|Same named file in scm?|Parent is empty (in both inode and on disk)|Tombstone placed?|Indode state after fsck|Fs state after fsck|In sync|No re-appearing files| -|---|---|---|---|---|---|---|---|---| -|running|placeholder|y/n|y/n|y|No inode|No file on disk|✔️|✔️| -|running|full|y|y|n|No inode|No file on disk|✔️|✔️| -|running|full|y|n|n|Inode with scm hash|No file on disk|❌ (4)|❌ (4)| -|running|full|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|placeholder|y|y/n|n|inode with scm hash|file on disk with scm content|✔️| ❌ (3)| -|stopped|placeholder|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|full|y|y/n|n|Inode with scm hash|No file on disk|❌ (4)|❌ (4)| -|stopped|full|n|y/n|n|No inode|No file on disk|✔️|✔️| +| File removed while eden is … | Parent is … | Same named file in scm? | Parent is empty (in both inode and on disk) | Tombstone placed? 
| Indode state after fsck | Fs state after fsck | In sync | No re-appearing files | +| ---------------------------- | ----------- | ----------------------- | ------------------------------------------- | ----------------- | ----------------------- | ----------------------------- | ------- | --------------------- | +| running | placeholder | y/n | y/n | y | No inode | No file on disk | ✔️ | ✔️ | +| running | full | y | y | n | No inode | No file on disk | ✔️ | ✔️ | +| running | full | y | n | n | Inode with scm hash | No file on disk | ❌ (4) | ❌ (4) | +| running | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | placeholder | y | y/n | n | inode with scm hash | file on disk with scm content | ✔️ | ❌ (3) | +| stopped | placeholder | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | full | y | y/n | n | Inode with scm hash | No file on disk | ❌ (4) | ❌ (4) | +| stopped | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | - the 3rd and 7th rows are more problematic that the 5th. Since these cases are -corruption rather than unexpected behavior. + corruption rather than unexpected behavior. 3. **Deleted files are not handled correctly when the parent is a placeholder.** @@ -189,12 +188,12 @@ from directories. But is has a few unfortunate consequences including this bug. Normally, when a file is removed from a Placeholder, ProjectedFS will place a tombstone there. However, if a file is removed while Eden isn’t running, no placeholder is put in its place. Why ... that part I cannot reverse engineer. -Seems like a bug that has *** reasons *** behind it. Perhaps we should ask +Seems like a bug that has **_ reasons _** behind it. Perhaps we should ask Microsoft if they could do better. But anyways this causes a bit of an issue. After Eden restarts files removed from placeholder directories will -automagically be brought back by ProjectedFS because ProjectedFS doesn’t know -to subtract the file from the directory listing returned from Eden. +automagically be brought back by ProjectedFS because ProjectedFS doesn’t know to +subtract the file from the directory listing returned from Eden. To compensate for ProjectedFS’s behavior, Eden matches projectedFS’s behavior and brings the file back. This is ok-ish. But to be honest we are not bringing @@ -207,129 +206,127 @@ directories, we decided that if a file is missing from disk (without a tombstone), Eden should bring it back (I’m making a simplification). However, ProjectedFS doesn’t have the same behavior for full directories as -placeholder directories. ProjectedFS will not revive such removed files on -disk. Eden brings back removed files internally, but they will remain missing -from disk. Very bad Eden!! +placeholder directories. ProjectedFS will not revive such removed files on disk. +Eden brings back removed files internally, but they will remain missing from +disk. Very bad Eden!! ## Solutions ### Problem 1 -To recap this problem is that FSCK thinks it should be correcting renamed -files to make them match the source control object their path maps too. But -really it should be making them match the source control object at their -original path. - -However, I mentioned there is an even simpler solution. It's this: Instead -of even matching the file to a source control object, its sufficient to just -make the inode materialized in Eden. When materialized inodes are read -internally Eden reads it from disk (in your repo directory). 
Or when you (or
-more likely Buck) ask for the sha1 of a file, Eden reads the file from disk
-and hashes the contents (with some caching layers in between). This reading
-from disk thing is a little spooky. But a lot of Eden is tangled up in this
-reading from disk situation, so to avoid the web of issues from growing, we
-are gonna accept that reading files from disk is the reality of Eden on Windows.
-
-The reading from disk thing does give us an advantage here. All we have to do
-is mark renamed files materialized, and then we know that when Eden goes to
-read the file, it will read from disk, and ProjectedFS will ask Eden for the
-file at the original path and Eden will look it up the old path in the WCP.
-This materialize renamed files is what Eden already does when ProjectedFS tells
-us about a rename, so we just need to make FSCK do it too.
+To recap, this problem is that FSCK thinks it should be correcting renamed files
+to make them match the source control object their path maps to. But really it
+should be making them match the source control object at their original path.
+
+However, I mentioned there is an even simpler solution. It's this: Instead of
+even matching the file to a source control object, it's sufficient to just make
+the inode materialized in Eden. When materialized inodes are read internally,
+Eden reads them from disk (in your repo directory). Or when you (or more likely
+Buck) ask for the sha1 of a file, Eden reads the file from disk and hashes the
+contents (with some caching layers in between). This reading from disk thing is
+a little spooky. But a lot of Eden is tangled up in this reading from disk
+situation, so to avoid the web of issues from growing, we are gonna accept that
+reading files from disk is the reality of Eden on Windows.
+
+The reading from disk thing does give us an advantage here. All we have to do is
+mark renamed files materialized, and then we know that when Eden goes to read
+the file, it will read from disk, and ProjectedFS will ask Eden for the file at
+the original path, and Eden will look it up at the old path in the WCP.
+Materializing renamed files is what Eden already does when ProjectedFS tells us
+about a rename, so we just need to make FSCK do it too.
 
 So, in summary a solution here is to just make sure all renamed files are
 correctly marked materialized in Eden’s internal state.
 
-There are other solutions, but they involve overhauling Eden’s general
-treatment of renamed files which as I will get into in the Problem 2 section is
-messy to say the least. So we decided to go with this solution.
+There are other solutions, but they involve overhauling Eden’s general treatment
+of renamed files, which, as I will get into in the Problem 2 section, is messy
+to say the least. So we decided to go with this solution.
 
 To make this solution work Eden needs to detect a renamed file in FSCK. Lucky
-for us ProjectedFS sets a certain bit in the reparse point representing the
-file when its renamed (it also puts the original path in there too, but the
-bit is easier to use).
+for us ProjectedFS sets a certain bit in the reparse point representing the file
+when it's renamed (it also puts the original path in there too, but the bit is
+easier to use). 
This certain bit is not documented, but it is reliable, and we +have a pretty comprehensive suite of tests now that assert our assumptions about +the bit. Unluckily though, reading a reparse buffer is pretty slow, and it makes our -strict rule of 1 filesystem operation per file two operations per file. And -this has consequences. FSCK gets 2x slower. FSCK was already in the -multi-minutes for users. So 2x here really hurts. +strict rule of 1 filesystem operation per file two operations per file. And this +has consequences. FSCK gets 2x slower. FSCK was already in the multi-minutes for +users. So 2x here really hurts. -But it so happens this was the kick in the pants we needed to do something -about how slow FSCK was. The way FSCK roughly works is crawl all the tracked -files on disk and in Eden’s representation of them. Then fix each file as -needed. Single threaded. +But it so happens this was the kick in the pants we needed to do something about +how slow FSCK was. The way FSCK roughly works is crawl all the tracked files on +disk and in Eden’s representation of them. Then fix each file as needed. Single +threaded. - Xavier added multiple threads to FSCK, and bam reasonable start up times - (the rollout is in progress and final numbers will grace your workplace - feed soon :) ) +Xavier added multiple threads to FSCK, and bam reasonable start up times (the +rollout is in progress and final numbers will grace your workplace feed soon :) +) -Now FSCK is typically in the tens of seconds range, and that 2x doesn’t hurt -so bad. We are currently rolling out detect a renames and mark the inode as +Now FSCK is typically in the tens of seconds range, and that 2x doesn’t hurt so +bad. We are currently rolling out detect a renames and mark the inode as materialized solution. - ### Problem 2 To recap, this problem is that Eden always serves ProjectedFS read requests -directly from the WCP and that interacts bad with renamed files. To match -sparse profiles behavior, and Eden behavior on macOS and Linux, Eden really -should be reading the contents from the source control tree for the commit -that was checked out when the file was renamed. Or something that matches that -behavior. +directly from the WCP and that interacts bad with renamed files. To match sparse +profiles behavior, and Eden behavior on macOS and Linux, Eden really should be +reading the contents from the source control tree for the commit that was +checked out when the file was renamed. Or something that matches that behavior. Alright so what are the options. 1. **Do away with this whole read from source control objects and use our -inodes.** This unfortunately doesn’t work so well. This is how Eden use to -work, and there were three+ problems. issue a: ProjectedFS is going to ask -Eden to read the original path of renamed files, and this won’t exist in the -inodes. Eden would need to keep some mapping of renames ... see potential -solution #2 for why an exploration of why this is bad. issue b: This causes -ProjectedFS to over zealously create tombstones and cause lots of weird -behavior. issue c: Edens inode state has had a lot of reliability issues on -windows, so reading from source control is more reliable. See the diff -changing diff D32022639 and [thread](http://xavier%20deguillard%20https//github.com/microsoft/ProjFS-Managed-API/issues/68) -with Microsoft for more details. Overall, we would be adding more problems -than solving to go back to inodes. - -2. 
**Track renames in Eden and special case reading renamed files.** This -“tracking” would be internal Eden state that can fall out of sync with reality -(i.e. ProjectedFS). And the root root (this duplication is not a typo) cause of -all these problems I am writing about in this post is really that we duplicate -state in Eden that gets our of sync with the source of truth. So, adding more -duplicated state to fix our issue of duplicated state (in my opinion) is a bad -idea. - -3. _Make all renamed files full on disk._ Now this sounds bad, but hear me out, -it might not be so bad. So first I have to explain that we already materialize -all renamed files. For correctness reasons, Eden has to mark any renamed file -materialized in Eden. Generally, Eden’s “materialized” matches ProjectedFS’s -“full”. So we already do the Eden equivalent of “Make all renamed files full -on disk.” Renamed files is a case where materialized and full do not match up. -Lining these two up more is not such a bad idea. Additionally, we really are -only talking about files here, not directories. Placeholder directories cannot -be renamed in ProjectedFS. ProjectedFS just strait up refuses to let you rename -placeholder directories - you have to manually copy and delete (honestly thank -goodness, because we can barely handle file renames 😅) . Since we are only -talking about files here, we are not talking about crawling anything here, -it’s just marking single files as full. However, the biggest problem here is -that this will be inherintly racy. “Marking files full” means issue a write to -the file on disk. There will be some period when a file is renamed, but not -full. And theoretically the bug would still exist in that period. Plus, there -could be problems with writing the wrong thing to disk. Overall, I think this -is ok, but not an ideal solution. - -4. _Store the source control object in the reparse point._ We can add custom -data to ProjectedFs’s on disk representation for placeholders. We can use that -custom data to include the source control object hash and use that hash to -directly look up the right source control object for a file when asked to read -it. This would be complicated. The reparse buffer storage thing seems sketchy -— I think we have had reliability issues with it in the past. And it could have -performance implications broadly for Eden. However, this seems like the least -bad solution I can come up with. - + inodes.** This unfortunately doesn’t work so well. This is how Eden use to + work, and there were three+ problems. issue a: ProjectedFS is going to ask + Eden to read the original path of renamed files, and this won’t exist in the + inodes. Eden would need to keep some mapping of renames ... see potential + solution #2 for why an exploration of why this is bad. issue b: This causes + ProjectedFS to over zealously create tombstones and cause lots of weird + behavior. issue c: Edens inode state has had a lot of reliability issues on + windows, so reading from source control is more reliable. See the diff + changing diff D32022639 and + [thread](http://xavier%20deguillard%20https//github.com/microsoft/ProjFS-Managed-API/issues/68) + with Microsoft for more details. Overall, we would be adding more problems + than solving to go back to inodes. + +2. **Track renames in Eden and special case reading renamed files.** This + “tracking” would be internal Eden state that can fall out of sync with + reality (i.e. ProjectedFS). 
And the root root (this duplication is not a + typo) cause of all these problems I am writing about in this post is really + that we duplicate state in Eden that gets our of sync with the source of + truth. So, adding more duplicated state to fix our issue of duplicated state + (in my opinion) is a bad idea. + +3. _Make all renamed files full on disk._ Now this sounds bad, but hear me out, + it might not be so bad. So first I have to explain that we already + materialize all renamed files. For correctness reasons, Eden has to mark any + renamed file materialized in Eden. Generally, Eden’s “materialized” matches + ProjectedFS’s “full”. So we already do the Eden equivalent of “Make all + renamed files full on disk.” Renamed files is a case where materialized and + full do not match up. Lining these two up more is not such a bad idea. + Additionally, we really are only talking about files here, not directories. + Placeholder directories cannot be renamed in ProjectedFS. ProjectedFS just + strait up refuses to let you rename placeholder directories - you have to + manually copy and delete (honestly thank goodness, because we can barely + handle file renames 😅) . Since we are only talking about files here, we are + not talking about crawling anything here, it’s just marking single files as + full. However, the biggest problem here is that this will be inherintly + racy. “Marking files full” means issue a write to the file on disk. There + will be some period when a file is renamed, but not full. And theoretically + the bug would still exist in that period. Plus, there could be problems with + writing the wrong thing to disk. Overall, I think this is ok, but not an + ideal solution. + +4. _Store the source control object in the reparse point._ We can add custom + data to ProjectedFs’s on disk representation for placeholders. We can use + that custom data to include the source control object hash and use that hash + to directly look up the right source control object for a file when asked to + read it. This would be complicated. The reparse buffer storage thing seems + sketchy — I think we have had reliability issues with it in the past. And it + could have performance implications broadly for Eden. However, this seems + like the least bad solution I can come up with. Generally, I think we should go with 3 or 4 here. 3 will be simpler, but 4 is more solid. Mark has mentioned he is interested, but he is looking at other @@ -337,88 +334,88 @@ things first. At some point one of the Eden folks will take a look. But right now we are all dealing with more egregious bugs like problem 4 :). And with that, onto problem 3 and 4 ... - ### Problem 3 & 4 -To recap the problems here are that we try to resurrect deleted files to -match ProjectedFS behavior for placeholder directories. Here are the options: +To recap the problems here are that we try to resurrect deleted files to match +ProjectedFS behavior for placeholder directories. Here are the options: 1. **We could keep trying to match what ProjectedFS does for placeholders.** -Essentially, resurrecting files both on disk and in Eden’s state when they are missing. 
This would be this kind of behavior: - -|File removed while eden is …|Parent is …|Same named file in scm?|Parent is empty (in both inode and on disk)|Tombstone placed?|Inode state after fsck|Fs state after fsck|In sync|No re-appearing files| -|---|---|---|---|---|---|---|---|---| -|running|placeholder|y/n|y/n|y|No inode|No file on disk|✔️|✔️| -|running|full|y|y|n|No inode|No file on disk|✔️|✔️| -|running|full|y|n|n|Inode with scm hash|Inode with scm content|✔️|❌| -|running|full|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|placeholder|y|y/n|n|inode with scm hash|file on disk with scm content|✔️|❌ | -stopped|placeholder|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|full|y|y/n|n|Inode with scm hash|Inode with scm content|✔️|❌| -|stopped|full|n|y/n|n|No inode|No file on disk|✔️|✔️| + Essentially, resurrecting files both on disk and in Eden’s state when they + are missing. This would be this kind of behavior: + +| File removed while eden is … | Parent is … | Same named file in scm? | Parent is empty (in both inode and on disk) | Tombstone placed? | Inode state after fsck | Fs state after fsck | In sync | No re-appearing files | +| ---------------------------- | ----------- | ----------------------- | ------------------------------------------- | ----------------- | ---------------------- | ----------------------------- | ------- | --------------------- | +| running | placeholder | y/n | y/n | y | No inode | No file on disk | ✔️ | ✔️ | +| running | full | y | y | n | No inode | No file on disk | ✔️ | ✔️ | +| running | full | y | n | n | Inode with scm hash | Inode with scm content | ✔️ | ❌ | +| running | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | placeholder | y | y/n | n | inode with scm hash | file on disk with scm content | ✔️ | ❌ | +| stopped | placeholder | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | full | y | y/n | n | Inode with scm hash | Inode with scm content | ✔️ | ❌ | +| stopped | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | I don’t like this because we are bringing back files with potentially incorrect contents. Especially for files removed while Eden running, this feels egregious. -2. **We could do the first option, but use empty contents instead of source control contents.** +2. **We could do the first option, but use empty contents instead of source + control contents.** -|File removed while eden is …|Parent is …|Same named file in scm?|Parent is empty (in both inode and on disk)|Tombstone placed?|Inode state after fsck|Fs state after fsck|In sync|No re-appearing files| -|---|---|---|---|---|---|---|---|---| -|running|placeholder|y/n|y/n|y|No inode|No file on disk|✔️|✔️| -|running|full|y|y|n|No inode|No file on disk|✔️|✔️| -|running|full|y|n|n|Materialized inode|Empty file|✔️|❌| -|running|full|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|placeholder|y|y/n|n|Materialized inode|Empty file|✔️|❌ | -|stopped|placeholder|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|full|y|y/n|n|Inode with scm hash|Empty file|✔️|❌| -|stopped|full|n|y/n|n|No inode|No file on disk|✔️|✔️| +| File removed while eden is … | Parent is … | Same named file in scm? | Parent is empty (in both inode and on disk) | Tombstone placed? 
| Inode state after fsck | Fs state after fsck | In sync | No re-appearing files |
+| ---------------------------- | ----------- | ----------------------- | ------------------------------------------- | ----------------- | ---------------------- | ------------------- | ------- | --------------------- |
+| running | placeholder | y/n | y/n | y | No inode | No file on disk | ✔️ | ✔️ |
+| running | full | y | y | n | No inode | No file on disk | ✔️ | ✔️ |
+| running | full | y | n | n | Materialized inode | Empty file | ✔️ | ❌ |
+| running | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
+| stopped | placeholder | y | y/n | n | Materialized inode | Empty file | ✔️ | ❌ |
+| stopped | placeholder | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
+| stopped | full | y | y/n | n | Inode with scm hash | Empty file | ✔️ | ❌ |
+| stopped | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
 
 This seems maybe less egregious, but still pretty bad because of the file
 removed while eden is running changing after restart case.
 
-3. I think the ideal solution we want is to **keep deleted files deleted.**
-That looks like this:
-
-|File removed while eden is …|Parent is …|Same named file in scm?|Parent is empty (in both inode and on disk)|Tombstone placed?|Inode state after fsck|Fs state after fsck|In sync|No re-appearing files|
-|---|---|---|---|---|---|---|---|---|
-|running|placeholder|y/n|y/n|y|No inode|No file on disk|✔️|✔️|
-|running|full|y|y|n|No inode|No file on disk|✔️|✔️|
-|running|full|y|n|n|No inode|No file on disk|✔️|✔️|
-|running|full|n|y/n|n|No inode|No file on disk|✔️|✔️|
-|
-stopped|placeholder|y|y/n|n|No inode|No file on disk|✔️|✔️|
-|stopped|placeholder|n|y/n|n|No inode|No file on disk|✔️|✔️|
-|stopped|full|y|y/n|n|No inode|No file on disk|✔️|✔️|
-|stopped|full|n|y/n|n|No inode|No file on disk|✔️|✔️|
+3. I think the ideal solution we want is to **keep deleted files deleted.** That
+   looks like this:
+
+| File removed while eden is … | Parent is … | Same named file in scm? | Parent is empty (in both inode and on disk) | Tombstone placed? | Inode state after fsck | Fs state after fsck | In sync | No re-appearing files |
+| ---------------------------- | ----------- | ----------------------- | ------------------------------------------- | ----------------- | ---------------------- | ------------------- | ------- | --------------------- |
+| running | placeholder | y/n | y/n | y | No inode | No file on disk | ✔️ | ✔️ |
+| running | full | y | y | n | No inode | No file on disk | ✔️ | ✔️ |
+| running | full | y | n | n | No inode | No file on disk | ✔️ | ✔️ |
+| running | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
+| stopped | placeholder | y | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
+| stopped | placeholder | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
+| stopped | full | y | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
+| stopped | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ |
 
 This prevents files from re-appearing. Files re-appearing seems like pretty
 clearly unexpected behavior to me. But this solution is kinda tricky to
 implement. Particularly for the placeholder case. We have to delete the file
-through disk. - like `rm` the file in the repo. Which means more re-entrant
-IO. Additionally, this has to happen after Eden starts to ensure a tombstone
-gets placed. Currently, FSCK runs before eden starts, so we have queue up some
+through disk. - like `rm` the file in the repo. Which means more re-entrant IO.
+Additionally, this has to happen after Eden starts to ensure a tombstone gets
+placed. 
Currently, FSCK runs before eden starts, so we have queue up some deletes. And this is kinda messy. Don’t get me wrong I think this is what we want, but option 4 below is perhaps a good intermediary point. 4. We could **better match ProjectedFS’s behavior** (essentially fix “problem 4” -only and skip “problem 3” for now). ProjectedFS only resurrects files in -placeholder directories. So, we could only resurrect files in placeholder -directories and not full directories. This equates to: - - -|File removed while eden is …|Parent is …|Same named file in scm?|Parent is empty (in both inode and on disk)|Tombstone placed?|Inode state after fsck|Fs state after fsck|In sync|No re-appearing files| -|---|---|---|---|---|---|---|---|---| -|running|placeholder|y/n|y/n|y|No inode|No file on disk|✔️|✔️| -|running|full|y|y|n|No inode|No file on disk|✔️|✔️| -|running|full|y|n|n|No inode|No file on disk|✔️|✔️| -|running|full|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|placeholder|y|y/n|n|inode with scm hash|file on disk with scm content|✔️|❌ | -|stopped|placeholder|n|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|full|y|y/n|n|No inode|No file on disk|✔️|✔️| -|stopped|full|n|y/n|n|No inode|No file on disk|✔️|✔️| - -This is the easiest solution to implement, as we are purely changing Eden’s -view not ProjectedFS’s. And we get part way to solution 3. + only and skip “problem 3” for now). ProjectedFS only resurrects files in + placeholder directories. So, we could only resurrect files in placeholder + directories and not full directories. This equates to: + +| File removed while eden is … | Parent is … | Same named file in scm? | Parent is empty (in both inode and on disk) | Tombstone placed? | Inode state after fsck | Fs state after fsck | In sync | No re-appearing files | +| ---------------------------- | ----------- | ----------------------- | ------------------------------------------- | ----------------- | ---------------------- | ----------------------------- | ------- | --------------------- | +| running | placeholder | y/n | y/n | y | No inode | No file on disk | ✔️ | ✔️ | +| running | full | y | y | n | No inode | No file on disk | ✔️ | ✔️ | +| running | full | y | n | n | No inode | No file on disk | ✔️ | ✔️ | +| running | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | placeholder | y | y/n | n | inode with scm hash | file on disk with scm content | ✔️ | ❌ | +| stopped | placeholder | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | full | y | y/n | n | No inode | No file on disk | ✔️ | ✔️ | +| stopped | full | n | y/n | n | No inode | No file on disk | ✔️ | ✔️ | + +This is the easiest solution to implement, as we are purely changing Eden’s view +not ProjectedFS’s. And we get part way to solution 3. I have already implemented solution 4. And the next steps are to implement a full solution 3. @@ -429,6 +426,6 @@ We echo some bad behavior in ProjectedFS of resurrecting files in placeholder directories. #4 We over echo that same bad ProjectedFS behavior for full directories. -#1 and #4, the most critical issues (the ones that make `hg status` wrong), -have fixes on the way. #2 and #3, the others (that make Eden behave in weird -ways), are still pending. +#1 and #4, the most critical issues (the ones that make `hg status` wrong), have +fixes on the way. #2 and #3, the others (that make Eden behave in weird ways), +are still pending. 
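As a recap of where the removal handling lands, here is a rough sketch of the decision policy that solution 4 describes, for a file FSCK finds tracked by Eden but missing from disk. The types and function name are invented for illustration and this is not EdenFS's actual FSCK code; it just encodes the table above: a tombstone or a full parent means the delete sticks, and only placeholder parents echo ProjectedFS's resurrection behavior.

```
// Illustrative-only types; these are not EdenFS's real classes.
enum class ParentState { Placeholder, Full };

struct MissingFileInfo {
  ParentState parentState; // ProjectedFS state of the parent directory
  bool tombstonePresent;   // ProjectedFS left a tombstone for the delete
  bool existsInScm;        // same-named file exists in the working copy parent
};

// Sketch of the "solution 4" policy: only echo ProjectedFS's resurrection
// behavior for placeholder parents, and treat everything else as a real
// delete. Assumes FSCK has already gathered MissingFileInfo for a file that
// Eden tracks but that is absent from disk.
bool shouldResurrectMissingFile(const MissingFileInfo& info) {
  if (info.tombstonePresent) {
    // An explicit tombstone means the user deleted the file; keep it deleted.
    return false;
  }
  if (info.parentState == ParentState::Full) {
    // ProjectedFS never revives files under full (locally created)
    // directories, so Eden should not either.
    return false;
  }
  // Placeholder parent with no tombstone: ProjectedFS will serve the name
  // again from Eden's directory listing, so keep the inode only if the file
  // still exists in source control.
  return info.existsInScm;
}
```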
diff --git a/eden/fs/docs/img/README.md b/eden/fs/docs/img/README.md index a73d5e1a928f3..12e3e3835526a 100644 --- a/eden/fs/docs/img/README.md +++ b/eden/fs/docs/img/README.md @@ -1,4 +1,4 @@ All of the SVG documents in this directory were generated from the -`edenfs_diagrams.pptx` slide deck. If you want to update any of them you can -update the slide deck, and then simply re-save the specific slide in question -as an SVG file. +`edenfs_diagrams.pptx` slide deck. If you want to update any of them you can +update the slide deck, and then simply re-save the specific slide in question as +an SVG file. diff --git a/eden/fs/docs/slides/Checkout.md b/eden/fs/docs/slides/Checkout.md index 5e6fcd4e29803..fb03e9451ea56 100644 --- a/eden/fs/docs/slides/Checkout.md +++ b/eden/fs/docs/slides/Checkout.md @@ -12,37 +12,50 @@ As of 08/01/2023 - `DRY_RUN`: Ran first on `hg update`, merely reports conflicts to Mercurial - `NORMAL`: Ran after the first `DRY_RUN`, does the actual update -- `FORCE`: Ran on `hg update -C`, this will always prefer the destination commit file content on conflict. +- `FORCE`: Ran on `hg update -C`, this will always prefer the destination commit + file content on conflict. --- # Gist of the algorithm - Only do the minimum required to sync the working copy to the destination - * This means not recursing down to directories that the OS isn't aware of. - * But also not recursing down to directories that are identical between the working copy and the destination commit. -- For every files/directories whose content needs to be updated, EdenFS will notify the OS to invalidate the file/directory. + - This means not recursing down to directories that the OS isn't aware of. + - But also not recursing down to directories that are identical between the + working copy and the destination commit. +- For every files/directories whose content needs to be updated, EdenFS will + notify the OS to invalidate the file/directory. --- # Invalidation - This is done via: - * `invalidateChannelEntryCache`: informs the OS that the given filename in the directory has changed. This also needs to be called for new files to make sure the OS discards its negative path cache. - * `invalidateChannelDirCache`: informs the OS that the directory content has changed. In particular, if a file is added or removed from the directory, this needs to be called. + - `invalidateChannelEntryCache`: informs the OS that the given filename in the + directory has changed. This also needs to be called for new files to make + sure the OS discards its negative path cache. + - `invalidateChannelDirCache`: informs the OS that the directory content has + changed. In particular, if a file is added or removed from the directory, + this needs to be called. --- ## Invalidation on ProjectedFS -- `invalidateChannelEntryCache`: calls into `PrjDeleteFile` to remove the placeholder/full file from disk. Future requests to this file will have EdenFS re-create the file on disk. -- `invalidateChannelDirCache`: calls into `PrjMarkDirectoryAsPlaceholder`, this forces directory listing to always be served by EdenFS for that directory. This is always called after a directory is fully processed. +- `invalidateChannelEntryCache`: calls into `PrjDeleteFile` to remove the + placeholder/full file from disk. Future requests to this file will have EdenFS + re-create the file on disk. +- `invalidateChannelDirCache`: calls into `PrjMarkDirectoryAsPlaceholder`, this + forces directory listing to always be served by EdenFS for that directory. 
+ This is always called after a directory is fully processed. --- ## Invalidation on FUSE -FUSE was the original FsChannel of EdenFS and thus the invalidation were built around the FUSE semantics and the EdenFS invalidation function map 1:1 to FUSE invalidation opcodes. +FUSE was the original FsChannel of EdenFS and thus the invalidation were built +around the FUSE semantics and the EdenFS invalidation function map 1:1 to FUSE +invalidation opcodes. Of note is that invalidations are sent asynchronously. @@ -53,29 +66,51 @@ Of note is that invalidations are sent asynchronously. ## Invalidation on NFS -On NFS, there are no native mechanism to tell the kernel to invalidate its cache. Instead, EdenFS rely on NFS clients looking at the `mtime` of files/directories to invalidate its caches. +On NFS, there are no native mechanism to tell the kernel to invalidate its +cache. Instead, EdenFS rely on NFS clients looking at the `mtime` of +files/directories to invalidate its caches. Similarly to FUSE, NFS invalidations are sent asynchronously. -- `invalidateChannelEntryCache`: Does nothing, the code rely on the parent directory being invalidated with an updated `mtime` which flushes caches. -- `invalidateChannelDirCache`: Uses `chmod(mode)` to force a no-op `SETATTR` to be sent to EdenFS. Historically, EdenFS was merely opening and closing the file to take advantage of the "close to open" consistency, but macOS doesn't respect it. +- `invalidateChannelEntryCache`: Does nothing, the code rely on the parent + directory being invalidated with an updated `mtime` which flushes caches. +- `invalidateChannelDirCache`: Uses `chmod(mode)` to force a no-op `SETATTR` to + be sent to EdenFS. Historically, EdenFS was merely opening and closing the + file to take advantage of the "close to open" consistency, but macOS doesn't + respect it. --- # Core Checkout -- `TreeInode::checkout`: entry point for a directory. It spawns `CheckoutAction` by comparing the currently checked out `Tree` to the destination `Tree` in `TreeInode::computeCheckoutActions`. Once all actions have completed, invalidation is run for that directory and the overlay is updated. -- `TreeInode::processCheckoutEntry`: called by `TreeInode::checkout` and runs with the `contents_` lock held, this will handle addition and removal immediately and defers conflict checks to `CheckoutAction`. -- `CheckoutAction`: wrapper class to simplify loading `Blob` sha1 and `Tree`. Once loaded and conflicts are resolved, calls `TreeInode::checkoutUpdateEntry`. -- `TreeInode::checkoutUpdateEntry`: called once the `Blob` and `Tree` are loaded, this will take the `contents_` locks, revalidate it to ensure that no new conflicts arose since `TreeInode::checkout` released the lock, and perform in place modification. For directories, this recurses down by calling `TreeInode::checkout`. +- `TreeInode::checkout`: entry point for a directory. It spawns `CheckoutAction` + by comparing the currently checked out `Tree` to the destination `Tree` in + `TreeInode::computeCheckoutActions`. Once all actions have completed, + invalidation is run for that directory and the overlay is updated. +- `TreeInode::processCheckoutEntry`: called by `TreeInode::checkout` and runs + with the `contents_` lock held, this will handle addition and removal + immediately and defers conflict checks to `CheckoutAction`. +- `CheckoutAction`: wrapper class to simplify loading `Blob` sha1 and `Tree`. + Once loaded and conflicts are resolved, calls + `TreeInode::checkoutUpdateEntry`. 
+- `TreeInode::checkoutUpdateEntry`: called once the `Blob` and `Tree` are + loaded, this will take the `contents_` locks, revalidate it to ensure that no + new conflicts arose since `TreeInode::checkout` released the lock, and perform + in place modification. For directories, this recurses down by calling + `TreeInode::checkout`. --- # Overlay update -- At the end of `TreeInode::checkout`, after processing all the `CheckoutAction`, the overlay is updated to the destination state (`TreeInode::saveOverlayPostCheckout`). -- Since the overlay for this `TreeInode` is updated, this needs to be recorded in the parent directory overlay, which will force it to be materialized and written to disk (potentially recursively). - * The number of overlay writes for a directory is thus potentially `O(number of subdirectory)` +- At the end of `TreeInode::checkout`, after processing all the + `CheckoutAction`, the overlay is updated to the destination state + (`TreeInode::saveOverlayPostCheckout`). +- Since the overlay for this `TreeInode` is updated, this needs to be recorded + in the parent directory overlay, which will force it to be materialized and + written to disk (potentially recursively). + - The number of overlay writes for a directory is thus potentially + `O(number of subdirectory)` --- diff --git a/eden/fs/docs/stats/DynamicStats.md b/eden/fs/docs/stats/DynamicStats.md index 5bdb9f53ac57e..305ddf7a959af 100644 --- a/eden/fs/docs/stats/DynamicStats.md +++ b/eden/fs/docs/stats/DynamicStats.md @@ -1,25 +1,38 @@ -Dynamic Counters -=============== +# Dynamic Counters -The values of dynamic counters are computed and updated on-demand whenever they are requested by a monitoring tool or API. -When a monitoring tool or API requests the value of one of these counters (e.g., using the fb303 command-line client or the `getCounters()` Thrift method), the Facebook Base library (also known as fb303) will call the corresponding lambda function that was registered with the counter. This lambda function will then call the appropriate method to retrieve the current value of the counter. +The values of dynamic counters are computed and updated on-demand whenever they +are requested by a monitoring tool or API. When a monitoring tool or API +requests the value of one of these counters (e.g., using the fb303 command-line +client or the `getCounters()` Thrift method), the Facebook Base library (also +known as fb303) will call the corresponding lambda function that was registered +with the counter. This lambda function will then call the appropriate method to +retrieve the current value of the counter. + +Most of these counters register their call-back functions in `EdenServer.cpp` +with the following code: -Most of these counters register their call-back functions in `EdenServer.cpp` with the following code: ``` auto counters = fb303::ServiceData::get()->getDynamicCounters(); counters->registerCallback(counterName, lambdaFunction); ``` All of these counters should be unregistered on the deconstruction methods. + ``` counters->unregisterCallback(counterName); ``` ### Note: -The frequency at which the counters are updated depends on how often they are queried by monitoring tools or APIs. If the counters are not queried frequently, their values may become stale or outdated. + +The frequency at which the counters are updated depends on how often they are +queried by monitoring tools or APIs. If the counters are not queried frequently, +their values may become stale or outdated. 
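To make the register/unregister pattern above concrete, here is a minimal sketch of a class that owns a dynamic counter for its lifetime, assuming the fb303 callback returns the counter value as an integer. The `WorkQueue` type, the `example.queue.pending` counter name, and the header path are assumptions made up for illustration; only the `getDynamicCounters()`, `registerCallback()`, and `unregisterCallback()` calls come from the snippets above.

```
#include <fb303/ServiceData.h>

#include <cstdint>
#include <memory>

// Stand-in for whatever object owns the value we want to expose.
struct WorkQueue {
  int64_t size() const { return 42; }
};

// Hypothetical exporter that publishes the queue depth as a dynamic counter.
class QueueStatsExporter {
 public:
  explicit QueueStatsExporter(std::shared_ptr<WorkQueue> queue)
      : queue_{std::move(queue)} {
    auto* counters = fb303::ServiceData::get()->getDynamicCounters();
    // The lambda only runs when a monitoring tool actually asks for the
    // counter, e.g. via `eden stats` or the getCounters() Thrift method.
    counters->registerCallback(
        "example.queue.pending", [q = queue_] { return q->size(); });
  }

  ~QueueStatsExporter() {
    // Matching unregister so the callback can never touch a destroyed queue.
    fb303::ServiceData::get()->getDynamicCounters()->unregisterCallback(
        "example.queue.pending");
  }

 private:
  std::shared_ptr<WorkQueue> queue_;
};
```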
## SaplingBackingStore dynamic Counter -Based on the `RequestStage` enum we have two sets of dynamic counters in the BackingStore layer. + +Based on the `RequestStage` enum we have two sets of dynamic counters in the +BackingStore layer. + ``` /** * stages of requests that are tracked, these represent where any request @@ -36,72 +49,105 @@ enum RequestStage { LIVE, }; ``` -1. `store.sapling.pending_import.{xxx}.count` and `store.sapling.pending_import.{xxx}.max_duration_us` : -Show the number and max duration of the xxx (blob, blobmetadata, tree, prefetch) requests which are in the SaplingImportRequestQueue. This is the total time of waiting in the queue and being a live request. - -2. `store.sapling.pending_import.count` and `store.sapling.pending_import.max_duration_us` : -Show the number and max duration of all the objects(blob, prefetch blob, tree, blob metadata) requests which are in the SaplingImportRequestQueue. This is the total time of waiting in the queue and being a live request. - -3. `store.sapling.live_import.{xxx}.count` and `store.sapling.live_import.{xxx}.max_duration_us` : -Show the number and max duration of xxx (blob, blobmetadata, tree, prefetch) requests being fetched individually from the backing store. After sending a batch of requests to Sapling, only the failed requests will be sent individually to the backing store (getRetry functions). Therefore, these dynamic counters only get value when a retry happens. - -4. `store.sapling.live_import.batched_{xxx}.count` and `store.sapling.live_import.batched_{xxx}.max_duration_us` : -When SaplingBackingStore is preparing a batch of xxx (blob, blobmetadata, tree) request, it pairs the request with a watch list and starts a watch. This watch will stop when the request is fulfilled. Therefore, these dynamic counters show the number and max duration of batches of xxx (blob, blobmetadata, tree) requests right now are processing in backingstore. - -5. `store.sapling.live_import.count` and `store.sapling.live_import.max_duration_us` : -Show the number and max duration of all the object (blob, blob metadata, prefetch blob, tree) requests being fetched individually or in a batch from the backing store. +1. `store.sapling.pending_import.{xxx}.count` and + `store.sapling.pending_import.{xxx}.max_duration_us` : Show the number and + max duration of the xxx (blob, blobmetadata, tree, prefetch) requests which + are in the SaplingImportRequestQueue. This is the total time of waiting in + the queue and being a live request. + +2. `store.sapling.pending_import.count` and + `store.sapling.pending_import.max_duration_us` : Show the number and max + duration of all the objects(blob, prefetch blob, tree, blob metadata) + requests which are in the SaplingImportRequestQueue. This is the total time + of waiting in the queue and being a live request. + +3. `store.sapling.live_import.{xxx}.count` and + `store.sapling.live_import.{xxx}.max_duration_us` : Show the number and max + duration of xxx (blob, blobmetadata, tree, prefetch) requests being fetched + individually from the backing store. After sending a batch of requests to + Sapling, only the failed requests will be sent individually to the backing + store (getRetry functions). Therefore, these dynamic counters only get value + when a retry happens. + +4. 
`store.sapling.live_import.batched_{xxx}.count` and + `store.sapling.live_import.batched_{xxx}.max_duration_us` : When + SaplingBackingStore is preparing a batch of xxx (blob, blobmetadata, tree) + request, it pairs the request with a watch list and starts a watch. This + watch will stop when the request is fulfilled. Therefore, these dynamic + counters show the number and max duration of batches of xxx (blob, + blobmetadata, tree) requests right now are processing in backingstore. + +5. `store.sapling.live_import.count` and + `store.sapling.live_import.max_duration_us` : Show the number and max + duration of all the object (blob, blob metadata, prefetch blob, tree) + requests being fetched individually or in a batch from the backing store. ## FSChannel Dynamic Counters -1. `fs.task.count` : -Count the number of tasks queued up for the fschannelthreads. We will monitor this when we unbound to see how much memory we are using and ensure that things are staying reasonable. This will also help inform how many pending requests we should allow max. - -## inodeMap Dynamic counters -1. `inodemap.{mountBasename ex. fbsource or www}.loaded` : -Number of loaded inodes in the inodemap for an eden mount. `eden stats` command will show these counters -2. `inodemap.{mountBasename ex. fbsource or www}.unloaded` : -Number of unloaded inodes in the inodemap for an eden mount. `eden stats` command will show these counters +1. `fs.task.count` : Count the number of tasks queued up for the + fschannelthreads. We will monitor this when we unbound to see how much memory + we are using and ensure that things are staying reasonable. This will also + help inform how many pending requests we should allow max. -3. `inodemap.{mountBasename ex. fbsource or www}.unloaded_linked_inodes` : -The number of inodes that we have unloaded with our periodic linked inode unloading. Periodic linked inode unloading can be run at regular intervals on any mount type. This is the periodic task to clean up the inodes that are not used recently. +## inodeMap Dynamic counters -4. `inodemap.{mountBasename ex. fbsource or www}.unloaded_unlinked_inodes` : -The number of inodes that we have unloaded with our periodic unlinked inode unloading. Periodic unlinked inode unloading is run after operations that unlink lots of inodes like checkout on NFS mounts. This counter only has value in macOS. -NFSv3 has no inode invalidation flow built into the protocol. The kernel does not send us forget messages like we get in FUSE. The kernel also does not send us notifications when a file is closed. Thus EdenFS can not easily tell when all handles to a file have been closed. More details on the summary of this [commit](https://github.com/facebook/sapling/commit/ffa558bf847c5be4adc82899a793f3996619f332) +1. `inodemap.{mountBasename ex. fbsource or www}.loaded` : Number of loaded + inodes in the inodemap for an eden mount. `eden stats` command will show + these counters + +2. `inodemap.{mountBasename ex. fbsource or www}.unloaded` : Number of unloaded + inodes in the inodemap for an eden mount. `eden stats` command will show + these counters + +3. `inodemap.{mountBasename ex. fbsource or www}.unloaded_linked_inodes` : The + number of inodes that we have unloaded with our periodic linked inode + unloading. Periodic linked inode unloading can be run at regular intervals on + any mount type. This is the periodic task to clean up the inodes that are not + used recently. + +4. `inodemap.{mountBasename ex. 
fbsource or www}.unloaded_unlinked_inodes` : The + number of inodes that we have unloaded with our periodic unlinked inode + unloading. Periodic unlinked inode unloading is run after operations that + unlink lots of inodes like checkout on NFS mounts. This counter only has + value in macOS. NFSv3 has no inode invalidation flow built into the protocol. + The kernel does not send us forget messages like we get in FUSE. The kernel + also does not send us notifications when a file is closed. Thus EdenFS can + not easily tell when all handles to a file have been closed. More details on + the summary of this + [commit](https://github.com/facebook/sapling/commit/ffa558bf847c5be4adc82899a793f3996619f332) ## Journal Dynamic counters -1. `journal.{mountBasename ex. fbsource or www}.count` : -Show the number of entry in Journal -2. `journal.{mountBasename ex. fbsource or www}.duration_secs` : -Show how far back the Journal goes in seconds +1. `journal.{mountBasename ex. fbsource or www}.count` : Show the number of + entry in Journal -3. `journal.{mountBasename ex. fbsource or www}.files_accumulated.max` : -Show the maximum number of files that accumulated in Journal. +2. `journal.{mountBasename ex. fbsource or www}.duration_secs` : Show how far + back the Journal goes in seconds -4. `journal.{mountBasename ex. fbsource or www}.memory` : -Show the memory usage of the Journal. +3. `journal.{mountBasename ex. fbsource or www}.files_accumulated.max` : Show + the maximum number of files that accumulated in Journal. + +4. `journal.{mountBasename ex. fbsource or www}.memory` : Show the memory usage + of the Journal. ## Fuse Dynamic counters -1. `fuse.{mountBasename ex. fbsource or www}.live_requests.count` : -Show the number of live Fuse requests + +1. `fuse.{mountBasename ex. fbsource or www}.live_requests.count` : Show the + number of live Fuse requests 2. `fuse.{mountBasename ex. fbsource or www}.live_requests.max_duration_us` : -Show the maximum duration of Fuse live requests + Show the maximum duration of Fuse live requests -3. `fuse.{mountBasename ex. fbsource or www}.pending_requests.count` : -Show the number of Fuse pending requests +3. `fuse.{mountBasename ex. fbsource or www}.pending_requests.count` : Show the + number of Fuse pending requests ## Cache Dynamic counters -1. `blob_cache.memory` : -Show the total size of items in blob cache -2. `blob_cache.items` : -Count the number of items in blob cache +1. `blob_cache.memory` : Show the total size of items in blob cache + +2. `blob_cache.items` : Count the number of items in blob cache -3. `tree_cache.memory` : -Show the total size of items in tree cache +3. `tree_cache.memory` : Show the total size of items in tree cache -4. `tree_cache.items` : -Count the number of items in tree cache +4. `tree_cache.items` : Count the number of items in tree cache diff --git a/eden/fs/docs/stats/EdenStats.md b/eden/fs/docs/stats/EdenStats.md index 9f7801060efbe..9e9d3cb185dfb 100644 --- a/eden/fs/docs/stats/EdenStats.md +++ b/eden/fs/docs/stats/EdenStats.md @@ -1,218 +1,247 @@ -EdenStats Counter/Duration -=============== +# EdenStats Counter/Duration + +These stats are all listed in `EdenStats.h` file. There are two type of stats in +this file: -These stats are all listed in `EdenStats.h` file. There are two type of stats in this file: - `Counter` : The static counters can call `increment()` method to add a number -- `Duration` : These stats record duration of the events with `addDuration()` method. 
-> ## Note: These stats get turned into a histogram, and EdenFS reports the followings for them -> - Export Types -> - count(the number of times that `increment()` or `addDuration()` get called) -> - sum(accumulated value in the counter/duration) -> - average (sum/count) -> - rate -> - Sliding window: all these four export types are reported on sliding windows of -> - 1 min, 10 min, and 1 hour -> - Only `Durations` are turned into the following percentiles -> - P1, P10, P50, P90, and P99 - -You can see the list of all the EdenFS counter and their current values by running +- `Duration` : These stats record duration of the events with `addDuration()` + method. + > ## Note: These stats get turned into a histogram, and EdenFS reports the followings for them + > + > - Export Types + > - count(the number of times that `increment()` or `addDuration()` get + > called) + > - sum(accumulated value in the counter/duration) + > - average (sum/count) + > - rate + > - Sliding window: all these four export types are reported on sliding + > windows of + > - 1 min, 10 min, and 1 hour + > - Only `Durations` are turned into the following percentiles + > - P1, P10, P50, P90, and P99 + +You can see the list of all the EdenFS counter and their current values by +running + ``` $ eden debug thrift getCounters --json ``` The list of all the EdenStats Counter/Duration are as follows: + - [SaplingBackingStoreStats](./SaplingBackingStoreStats.md) - [ObjectStoreStats](./ObjectStoreStats.md) - [LocalStoreStats](./LocalStoreStats.md) - [OverlayStats](./OverlayStats.md) - JournalStats - 1. `Counter truncatedReads{"journal.truncated_reads"}` : - Number of times a truncated read happens in Journal. - 2. `Counter filesAccumulated{"journal.files_accumulated"}` : - Number of files accumulated in Journal. + 1. `Counter truncatedReads{"journal.truncated_reads"}` : Number of times a + truncated read happens in Journal. - 3. `Duration accumulateRange{"journal.accumulate_range_us"}` : - The duration of the journal accumulates range function. + 2. `Counter filesAccumulated{"journal.files_accumulated"}` : Number of files + accumulated in Journal. - 4. `Counter journalStatusCacheHit{"journal.status_cache_hit"}` : - Number of cache hits. This is updated when we have a valid SCM status result in cache to return given the current Journal sequence number. + 3. `Duration accumulateRange{"journal.accumulate_range_us"}` : The duration of + the journal accumulates range function. - 5. `Counter journalStatusCacheMiss{"journal.status_cache_miss"}` : - Number of cache misses. This is updated when we don't have a valid SCM status result in cache to return given the current Journal sequence number. + 4. `Counter journalStatusCacheHit{"journal.status_cache_hit"}` : Number of + cache hits. This is updated when we have a valid SCM status result in cache + to return given the current Journal sequence number. - 6. `Counter journalStatusCacheSkip{"journal.status_cache_skip"}` : - Number of cache insertion skipped. This is updated when we skip inserting a new entry into the cache when the number of the entries from the calculated result is larger than the limit configured [here](https://fburl.com/code/flwry2g4). + 5. `Counter journalStatusCacheMiss{"journal.status_cache_miss"}` : Number of + cache misses. This is updated when we don't have a valid SCM status result + in cache to return given the current Journal sequence number. + 6. `Counter journalStatusCacheSkip{"journal.status_cache_skip"}` : Number of + cache insertion skipped. 
This is updated when we skip inserting a new entry + into the cache when the number of the entries from the calculated result is + larger than the limit configured [here](https://fburl.com/code/flwry2g4). - ThriftStats - 1. `Duration streamChangesSince{ "thrift.StreamingEdenService.streamChangesSince.streaming_time_us"}` : - Duration of thrift stream change calls. - 2. `Duration streamSelectedChangesSince{"thrift.StreamingEdenService.streamSelectedChangesSince.streaming_time_us"}` : - Duration of thrift stream change calls for selected changes. + 1. `Duration streamChangesSince{ "thrift.StreamingEdenService.streamChangesSince.streaming_time_us"}` + : Duration of thrift stream change calls. - 3. `Counter globFilesSaplingRemoteAPISuccess{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_success"}` : - Count number of times globFiles succeed using remote pathway + 2. `Duration streamSelectedChangesSince{"thrift.StreamingEdenService.streamSelectedChangesSince.streaming_time_us"}` + : Duration of thrift stream change calls for selected changes. - 4. `Counter globFilesSaplingRemoteAPIFallback{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_fallback"}` : - Count number of times globFiles fails using the remote pathway and ends up using the fallback pathway + 3. `Counter globFilesSaplingRemoteAPISuccess{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_success"}` + : Count number of times globFiles succeed using remote pathway - 5. `Counter globFilesLocal{"thrift.EdenServiceHandler.glob_files.local_success"}` : - Count number of times globFiles succeed using the indicated local pathway + 4. `Counter globFilesSaplingRemoteAPIFallback{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_fallback"}` + : Count number of times globFiles fails using the remote pathway and ends + up using the fallback pathway - 6. `Duration globFilesSaplingRemoteAPISuccessDuration{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_success_duration_us"}` : - Duration for how long it takes globFiles to execute the remote pathway + 5. `Counter globFilesLocal{"thrift.EdenServiceHandler.glob_files.local_success"}` + : Count number of times globFiles succeed using the indicated local pathway - 7. `Duration globFilesSaplingRemoteAPIFallbackDuration{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_fallback_duration_us"}` : - Duration for how long it takes globFiles to execute the fallback pathway + 6. `Duration globFilesSaplingRemoteAPISuccessDuration{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_success_duration_us"}` + : Duration for how long it takes globFiles to execute the remote pathway - 8. `Duration globFilesLocalDuration{"thrift.EdenServiceHandler.glob_files.local_duration_us"}` : - Duration for how long it takes globFiles to execute in the indicated local pathway + 7. `Duration globFilesSaplingRemoteAPIFallbackDuration{"thrift.EdenServiceHandler.glob_files.sapling_remote_api_fallback_duration_us"}` + : Duration for how long it takes globFiles to execute the fallback pathway - 9. `Duration globFilesLocalOffloadableDuration{"thrift.EdenServiceHandler.glob_files.local_offloadable_duration_us"}` : - Duration for how long it takes globFiles to execute a potentially offloadable request locally + 8. `Duration globFilesLocalDuration{"thrift.EdenServiceHandler.glob_files.local_duration_us"}` + : Duration for how long it takes globFiles to execute in the indicated + local pathway + 9. 
`Duration globFilesLocalOffloadableDuration{"thrift.EdenServiceHandler.glob_files.local_offloadable_duration_us"}` + : Duration for how long it takes globFiles to execute a potentially + offloadable request locally - InodeMapStats - 1. `Counter lookupTreeInodeHit{"inode_map.lookup_tree_inode_hit"}` : - Count the number of Tree Inodes found in the InodeMap - 2. `Counter lookupBlobInodeHit{"inode_map.lookup_blob_inode_hit"}` : - Count the number of Blob Inodes found in the InodeMap + 1. `Counter lookupTreeInodeHit{"inode_map.lookup_tree_inode_hit"}` : Count the + number of Tree Inodes found in the InodeMap - 3. `Counter lookupTreeInodeMiss{"inode_map.lookup_tree_inode_miss"}` : - Count the number of Tree Inodes missed in the InodeMap + 2. `Counter lookupBlobInodeHit{"inode_map.lookup_blob_inode_hit"}` : Count the + number of Blob Inodes found in the InodeMap - 4. `Counter lookupBlobInodeMiss{"inode_map.lookup_blob_inode_miss"}` : - Count the number of Blob Inodes missed in the InodeMap + 3. `Counter lookupTreeInodeMiss{"inode_map.lookup_tree_inode_miss"}` : Count + the number of Tree Inodes missed in the InodeMap - 5. `Counter lookupInodeError{"inode_map.lookup_inode_error"}` : - Count the number of Inodes lookup errors + 4. `Counter lookupBlobInodeMiss{"inode_map.lookup_blob_inode_miss"}` : Count + the number of Blob Inodes missed in the InodeMap + 5. `Counter lookupInodeError{"inode_map.lookup_inode_error"}` : Count the + number of Inodes lookup errors - InodeMetadataTableStats - 1. `Counter getHit{"inode_metadata_table.get_hit"}` : - Count the number of hits in InodeMetadata Table - 2. `Counter getMiss{"inode_metadata_table.get_miss"}` : - Count the number of misses in InodeMetadata Table + 1. `Counter getHit{"inode_metadata_table.get_hit"}` : Count the number of hits + in InodeMetadata Table + 2. `Counter getMiss{"inode_metadata_table.get_miss"}` : Count the number of + misses in InodeMetadata Table - BlobCacheStats - 1. `Counter getHit{"blob_cache.get_hit"}` : - Number of times BlobCache request got hit - 2. `Counter getMiss{"blob_cache.get_miss"}` : - Number of times BlobCache request got miss + 1. `Counter getHit{"blob_cache.get_hit"}` : Number of times BlobCache request + got hit - 3. `Counter insertEviction{"blob_cache.insert_eviction"}` : - Number of blobs evicted from cache (The cache reaches its maximum size and the LRU (least recently used) item evicted from cache) + 2. `Counter getMiss{"blob_cache.get_miss"}` : Number of times BlobCache + request got miss - 4. `Counter objectDrop{"blob_cache.object_drop"}` : - Number of blobs dropped from cache (For some reason the object was invalid, and it got dropped from the cache) + 3. `Counter insertEviction{"blob_cache.insert_eviction"}` : Number of blobs + evicted from cache (The cache reaches its maximum size and the LRU (least + recently used) item evicted from cache) + 4. `Counter objectDrop{"blob_cache.object_drop"}` : Number of blobs dropped + from cache (For some reason the object was invalid, and it got dropped from + the cache) - TreeCacheStats - 1. `Counter getHit{"tree_cache.get_hit"}` : - Number of times TreeCache request got hit - 2. `Counter getMiss{"tree_cache.get_miss"}` : - Number of times TreeCache request got miss + 1. `Counter getHit{"tree_cache.get_hit"}` : Number of times TreeCache request + got hit - 3. `Counter insertEviction{"tree_cache.insert_eviction"}` : - Number of trees evicted from cache (The cache reaches its maximum size and the LRU (least recently used) item evicted from cache) + 2. 
`Counter getMiss{"tree_cache.get_miss"}` : Number of times TreeCache + request got miss - 4. `Counter objectDrop{"tree_cache.object_drop"}` : - Number of trees dropped from cache (For some reason the object was invalid and it got dropped from the cache) + 3. `Counter insertEviction{"tree_cache.insert_eviction"}` : Number of trees + evicted from cache (The cache reaches its maximum size and the LRU (least + recently used) item evicted from cache) + 4. `Counter objectDrop{"tree_cache.object_drop"}` : Number of trees dropped + from cache (For some reason the object was invalid and it got dropped from + the cache) - FakeStats - - This is a fake stats object that is used for testing. Counter/Duration objects can be added here to mirror variables used in real stats objects as needed. + - This is a fake stats object that is used for testing. Counter/Duration + objects can be added here to mirror variables used in real stats objects as + needed. - FuseStats - - In Fuse FS the following ODS Durations record the duration of each Fuse command in microseconds. Also, we have counters for all these durations for Successful/Failure events. - ``` - Duration lookup{"fuse.lookup_us"} - Duration forget{"fuse.forget_us"} - Duration getattr{"fuse.getattr_us"} - Duration setattr{"fuse.setattr_us"} - Duration readlink{"fuse.readlink_us"} - Duration mknod{"fuse.mknod_us"} - Duration mkdir{"fuse.mkdir_us"} - Duration unlink{"fuse.unlink_us"} - Duration rmdir{"fuse.rmdir_us"} - Duration symlink{"fuse.symlink_us"} - Duration rename{"fuse.rename_us"} - Duration link{"fuse.link_us"} - Duration open{"fuse.open_us"} - Duration read{"fuse.read_us"} - Duration write{"fuse.write_us"} - Duration flush{"fuse.flush_us"} - Duration release{"fuse.release_us"} - Duration fsync{"fuse.fsync_us"} - Duration opendir{"fuse.opendir_us"} - Duration readdir{"fuse.readdir_us"} - Duration releasedir{"fuse.releasedir_us"} - Duration fsyncdir{"fuse.fsyncdir_us"} - Duration statfs{"fuse.statfs_us"} - Duration setxattr{"fuse.setxattr_us"} - Duration getxattr{"fuse.getxattr_us"} - Duration listxattr{"fuse.listxattr_us"} - Duration removexattr{"fuse.removexattr_us"} - Duration access{"fuse.access_us"} - Duration create{"fuse.create_us"} - Duration bmap{"fuse.bmap_us"} - Duration forgetmulti{"fuse.forgetmulti_us"} - Duration fallocate{"fuse.fallocate_us"} - ``` + - In Fuse FS the following ODS Durations record the duration of each Fuse + command in microseconds. Also, we have counters for all these durations for + Successful/Failure events. 
+ + ``` + Duration lookup{"fuse.lookup_us"} + Duration forget{"fuse.forget_us"} + Duration getattr{"fuse.getattr_us"} + Duration setattr{"fuse.setattr_us"} + Duration readlink{"fuse.readlink_us"} + Duration mknod{"fuse.mknod_us"} + Duration mkdir{"fuse.mkdir_us"} + Duration unlink{"fuse.unlink_us"} + Duration rmdir{"fuse.rmdir_us"} + Duration symlink{"fuse.symlink_us"} + Duration rename{"fuse.rename_us"} + Duration link{"fuse.link_us"} + Duration open{"fuse.open_us"} + Duration read{"fuse.read_us"} + Duration write{"fuse.write_us"} + Duration flush{"fuse.flush_us"} + Duration release{"fuse.release_us"} + Duration fsync{"fuse.fsync_us"} + Duration opendir{"fuse.opendir_us"} + Duration readdir{"fuse.readdir_us"} + Duration releasedir{"fuse.releasedir_us"} + Duration fsyncdir{"fuse.fsyncdir_us"} + Duration statfs{"fuse.statfs_us"} + Duration setxattr{"fuse.setxattr_us"} + Duration getxattr{"fuse.getxattr_us"} + Duration listxattr{"fuse.listxattr_us"} + Duration removexattr{"fuse.removexattr_us"} + Duration access{"fuse.access_us"} + Duration create{"fuse.create_us"} + Duration bmap{"fuse.bmap_us"} + Duration forgetmulti{"fuse.forgetmulti_us"} + Duration fallocate{"fuse.fallocate_us"} + ``` - NfsStats - - In NFS the following ODS Durations record the duration of each NFS command in microseconds. Also, we have counters for all of these duration for Successful/Failure events. - ``` - Duration nfsNull{"nfs.null_us"} - Duration nfsGetattr{"nfs.getattr_us"} - Duration nfsSetattr{"nfs.setattr_us"} - Duration nfsLookup{"nfs.lookup_us"} - Duration nfsAccess{"nfs.access_us"} - Duration nfsReadlink{"nfs.readlink_us"} - Duration nfsRead{"nfs.read_us"} - Duration nfsWrite{"nfs.write_us"} - Duration nfsCreate{"nfs.create_us"} - Duration nfsMkdir{"nfs.mkdir_us"} - Duration nfsSymlink{"nfs.symlink_us"} - Duration nfsMknod{"nfs.mknod_us"} - Duration nfsRemove{"nfs.remove_us"} - Duration nfsRmdir{"nfs.rmdir_us"} - Duration nfsRename{"nfs.rename_us"} - Duration nfsLink{"nfs.link_us"} - Duration nfsReaddir{"nfs.readdir_us"} - Duration nfsReaddirplus{"nfs.readdirplus_us"} - Duration nfsFsstat{"nfs.fsstat_us"} - Duration nfsFsinfo{"nfs.fsinfo_us"} - Duration nfsPathconf{"nfs.pathconf_us"} - Duration nfsCommit{"nfs.commit_us"} - ``` + - In NFS the following ODS Durations record the duration of each NFS command + in microseconds. Also, we have counters for all of these duration for + Successful/Failure events. + + ``` + Duration nfsNull{"nfs.null_us"} + Duration nfsGetattr{"nfs.getattr_us"} + Duration nfsSetattr{"nfs.setattr_us"} + Duration nfsLookup{"nfs.lookup_us"} + Duration nfsAccess{"nfs.access_us"} + Duration nfsReadlink{"nfs.readlink_us"} + Duration nfsRead{"nfs.read_us"} + Duration nfsWrite{"nfs.write_us"} + Duration nfsCreate{"nfs.create_us"} + Duration nfsMkdir{"nfs.mkdir_us"} + Duration nfsSymlink{"nfs.symlink_us"} + Duration nfsMknod{"nfs.mknod_us"} + Duration nfsRemove{"nfs.remove_us"} + Duration nfsRmdir{"nfs.rmdir_us"} + Duration nfsRename{"nfs.rename_us"} + Duration nfsLink{"nfs.link_us"} + Duration nfsReaddir{"nfs.readdir_us"} + Duration nfsReaddirplus{"nfs.readdirplus_us"} + Duration nfsFsstat{"nfs.fsstat_us"} + Duration nfsFsinfo{"nfs.fsinfo_us"} + Duration nfsPathconf{"nfs.pathconf_us"} + Duration nfsCommit{"nfs.commit_us"} + ``` - PrjfsStats - - In prjFS the following ODS Durations record the duration of each command in microseconds. Also, we have counters for all of these duration for Successful/Failure events. 
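The FUSE, NFS, and PrjFS duration lists in this section all follow the same recording pattern: time the handler for one filesystem request, add the elapsed microseconds to the matching Duration (for example `fuse.lookup_us`), and bump the corresponding success or failure counter based on the outcome. The sketch below illustrates that pattern with a hypothetical RAII timer; the class name and the reporting calls are assumptions, not the actual EdenFS helpers.

```
// Illustrative sketch of how one per-operation Duration could be recorded.
#include <chrono>
#include <cstdio>

class ScopedRequestTimer {
 public:
  explicit ScopedRequestTimer(const char* durationKey)
      : key_(durationKey), start_(std::chrono::steady_clock::now()) {}

  ~ScopedRequestTimer() {
    auto elapsedUs = std::chrono::duration_cast<std::chrono::microseconds>(
                         std::chrono::steady_clock::now() - start_)
                         .count();
    // The real code would call addDuration() on the matching stat;
    // printing stands in for that here.
    std::printf("%s += %lld us\n", key_, static_cast<long long>(elapsedUs));
  }

 private:
  const char* key_;
  std::chrono::steady_clock::time_point start_;
};

bool handleLookupRequest() {
  ScopedRequestTimer timer{"fuse.lookup_us"};
  bool ok = true;  // ... perform the actual lookup work here ...
  if (ok) {
    // On success, the operation's success counter would be incremented.
  } else {
    // On failure, the failure counter would be incremented instead.
  }
  return ok;
}
```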
- ``` - Duration newFileCreated{"prjfs.newFileCreated_us"} - Duration fileOverwritten{"prjfs.fileOverwritten_us"} - Duration fileHandleClosedFileModified{"prjfs.fileHandleClosedFileModified_us"} - Duration fileRenamed{"prjfs.fileRenamed_us"} - Duration preDelete{"prjfs.preDelete_us"} - Duration preRenamed{"prjfs.preRenamed_us"} - Duration fileHandleClosedFileDeleted{"prjfs.fileHandleClosedFileDeleted_us"} - Duration preSetHardlink{"prjfs.preSetHardlink_us"} - Duration preConvertToFull{"prjfs.preConvertToFull_us"} - Duration openDir{"prjfs.opendir_us"} - Duration readDir{"prjfs.readdir_us"} - Duration lookup{"prjfs.lookup_us"} - Duration access{"prjfs.access_us"} - Duration read{"prjfs.read_us"} - Duration removeCachedFile{"prjfs.remove_cached_file_us"} - Duration addDirectoryPlaceholder{"prjfs.add_directory_placeholder_us"} - ``` + - In prjFS the following ODS Durations record the duration of each command in + microseconds. Also, we have counters for all of these duration for + Successful/Failure events. + ``` + Duration newFileCreated{"prjfs.newFileCreated_us"} + Duration fileOverwritten{"prjfs.fileOverwritten_us"} + Duration fileHandleClosedFileModified{"prjfs.fileHandleClosedFileModified_us"} + Duration fileRenamed{"prjfs.fileRenamed_us"} + Duration preDelete{"prjfs.preDelete_us"} + Duration preRenamed{"prjfs.preRenamed_us"} + Duration fileHandleClosedFileDeleted{"prjfs.fileHandleClosedFileDeleted_us"} + Duration preSetHardlink{"prjfs.preSetHardlink_us"} + Duration preConvertToFull{"prjfs.preConvertToFull_us"} + Duration openDir{"prjfs.opendir_us"} + Duration readDir{"prjfs.readdir_us"} + Duration lookup{"prjfs.lookup_us"} + Duration access{"prjfs.access_us"} + Duration read{"prjfs.read_us"} + Duration removeCachedFile{"prjfs.remove_cached_file_us"} + Duration addDirectoryPlaceholder{"prjfs.add_directory_placeholder_us"} + ``` diff --git a/eden/fs/docs/stats/LocalStoreStats.md b/eden/fs/docs/stats/LocalStoreStats.md index 084628ed58474..f769cca566c30 100644 --- a/eden/fs/docs/stats/LocalStoreStats.md +++ b/eden/fs/docs/stats/LocalStoreStats.md @@ -1,21 +1,20 @@ -LocalStoreStats -=============== +# LocalStoreStats 1. `Duration get{xxx}{"local_store.get_{xxx}_us"}` : The duration of fetching a xxx (blob, blobmetadata, tree) from Local Store - 2. `Counter get{xxx}Success{"local_store.get_{xxx}_success"}` : -Count the number of xxx (blob, blobmetadata, tree) that are successfully fetched from local store - +Count the number of xxx (blob, blobmetadata, tree) that are successfully fetched +from local store 3. `Counter get{xxx}Failure{"local_store.get_{xxx}_failure"}` : -Count the number of xxx (blob, blobmetadata, tree) that cannot get from local store - +Count the number of xxx (blob, blobmetadata, tree) that cannot get from local +store 4. `Counter get{xxx}Error{"local_store.get_{xxx}_error"}` : -Count the number of xxx (blob, blobmetadata, tree) that are fetched from local store but it cannot get parsed. +Count the number of xxx (blob, blobmetadata, tree) that are fetched from local +store but it cannot get parsed. diff --git a/eden/fs/docs/stats/ObjectStoreStats.md b/eden/fs/docs/stats/ObjectStoreStats.md index 6635b566894c0..af2443c2e28e1 100644 --- a/eden/fs/docs/stats/ObjectStoreStats.md +++ b/eden/fs/docs/stats/ObjectStoreStats.md @@ -1,46 +1,44 @@ -ObjectStoreStats -=============== +# ObjectStoreStats 1. `Duration get{xxx}{"store.get_{xxx}_us"}` : -The whole duration of get{xxx} xxx (blob, blobmetadata, tree) in ObjectStore. 
Consider that ObjectStore can get Object from memory (MemoryCache), LocalStore (OndiskCache), or BackingStore - +The whole duration of get{xxx} xxx (blob, blobmetadata, tree) in ObjectStore. +Consider that ObjectStore can get Object from memory (MemoryCache), LocalStore +(OndiskCache), or BackingStore 2. `Counter get{xxx}FromMemory{"object_store.get_{xxx}.memory"}` : -Count the number of xxx (blob, blobmetadata, tree) that are successfully obtained from MemoryCache. It doesn’t check the local store either. - +Count the number of xxx (blob, blobmetadata, tree) that are successfully +obtained from MemoryCache. It doesn’t check the local store either. 3. `Counter get{xxx}FromLocalStore{"object_store.get_{xxx}.local_store"}` : -Count the number of xxx (blob, blobmetadata, tree) that are successfully obtained from LocalStore (OnDiskCache). It doesn’t hit the BackingStore. - +Count the number of xxx (blob, blobmetadata, tree) that are successfully +obtained from LocalStore (OnDiskCache). It doesn’t hit the BackingStore. 4. `Counter get{xxx}FromBackingStore{"object_store.get_{xxx}.backing_store"}` : -Count the number of xxx (blob, blobmetadata, tree) that are obtained from BackingStore. - +Count the number of xxx (blob, blobmetadata, tree) that are obtained from +BackingStore. 5. `Counter get{xxx}Failed{"object_store.get_{xxx}_failed"}` : Count the number of xxx (blob, blobmetadata, tree) cannot be fetched. - 6. `Counter getBlobMetadataFromBlob{"object_store.get_blob_metadata.blob"}` : -Count the number of BlobMetadata that cannot be obtained from BackingStore, but we obtained Blob and from Blob we found the BlobMetadata. - +Count the number of BlobMetadata that cannot be obtained from BackingStore, but +we obtained Blob and from Blob we found the BlobMetadata. 7. `Duration getRootTree{"store.get_root_tree_us"}` : The whole duration of getRootTree in ObjectStore. - -8. `Counter getRootTreeFromBackingStore{ "Object_store.get_root_tree.backing_store"}` : +8. `Counter getRootTreeFromBackingStore{ "Object_store.get_root_tree.backing_store"}` + : Count the number of RootTree that are obtained from BackingStore. - 9. `Counter getRootTreeFailed{"object_store.get_root_tree_failed"}` : Count the number of RootTree cannot be fetched. diff --git a/eden/fs/docs/stats/OverlayStats.md b/eden/fs/docs/stats/OverlayStats.md index eadb4ee68dda1..72029e28d519b 100644 --- a/eden/fs/docs/stats/OverlayStats.md +++ b/eden/fs/docs/stats/OverlayStats.md @@ -1,196 +1,166 @@ -OverlayStats -=============== +# OverlayStats 1. `Duration saveOverlayDir{"overlay.save_overlay_dir_us"}` : Duration of saving a directory in Overlay - 2. `Duration loadOverlayDir{"overlay.load_overlay_dir_us"}` : Duration of loading an overlay directory - 3. `Duration openFile{"overlay.open_overlay_file_us"}` : Duration of opening a file in overlay - 4. `Duration createOverlayFile{"overlay.create_overlay_file_us"}` : Duration of creating an overlay file - 5. `Duration removeOverlayFile{"overlay.remove_overlay_file_us"}` : Duration of removing an Overlay file - 6. `Duration removeOverlayDir{"overlay.remove_overlay_dir_us"}` : Duration of removing an Overlay directory - -7. `Duration recursivelyRemoveOverlayDir{ "Overlay.recursively_remove_overlay_dir_us"}` : +7. `Duration recursivelyRemoveOverlayDir{ "Overlay.recursively_remove_overlay_dir_us"}` + : Duration of recursively removing an Overlay directory - 8. `Duration hasOverlayDir{"overlay.has_overlay_dir_us"}` : Duration of checking an Overlay directory's existance - 9. 
`Duration hasOverlayFile{"overlay.has_overlay_file_us"}` : Duration of checking an Overlay file's existance - 10. `Duration addChild{"overlay.add_child_us"}` : Duration of adding a child directory to a parent directory in Overlay - 11. `Duration removeChild{"overlay.remove_child_us"}` : Duration of removing a child directory from its parent in Overlay - 12. `Duration removeChildren{"overlay.remove_children_us"}` : -Duration of removing the entries for some children from a directory in the overlay. - +Duration of removing the entries for some children from a directory in the +overlay. 13. `Duration renameChild{"overlay.rename_child_us"}` : Duration of renaming entry in Overlay - 14. `Counter loadOverlayDirSuccessful{"overlay.load_overlay_dir_successful"}` : Counts the number of times that overlay directory is loaded successfully - 15. `Counter loadOverlayDirFailure{"overlay.load_overlay_dir_failure"}` : Counts the number of times that overlay directory load failed - 16. `Counter saveOverlayDirSuccessful{"overlay.save_overlay_dir_successful"}` : Count the number of successfully save directory in Overlay - 17. `Counter saveOverlayDirFailure{"overlay.save_overlay_dir_failure"}` : Count the number of save Overlay directory is failed - -18. `Counter openOverlayFileSuccessful{"overlay.open_overlay_file_successful"}` : +18. `Counter openOverlayFileSuccessful{"overlay.open_overlay_file_successful"}` + : Count the number of successfully open file in Overlay - 19. `Counter openOverlayFileFailure{"overlay.open_overlay_file_failure"}` : Count the number of failure open file in Overlay - -20. `Counter createOverlayFileSuccessful{"overlay.create_overlay_file_successful"}` : +20. `Counter createOverlayFileSuccessful{"overlay.create_overlay_file_successful"}` + : Count the number of Overlay files that are successfully created in the Overlay. - 21. `Counter createOverlayFileFailure{"overlay.create_overlay_file_failure"}` : Count the number of failure Overlay files creations. - -22. `Counter removeOverlayFileSuccessful{"overlay.remove_overlay_file_successful"}` : +22. `Counter removeOverlayFileSuccessful{"overlay.remove_overlay_file_successful"}` + : Count the number of file that successfully removed from Overlay. - 23. `Counter removeOverlayFileFailure{"overlay.remove_overlay_file_failure"}` : Count the number of failed remove file from Overlay - -24. `Counter removeOverlayDirSuccessful{"overlay.remove_overlay_dir_successful"}` : +24. `Counter removeOverlayDirSuccessful{"overlay.remove_overlay_dir_successful"}` + : Count the number of directory that successfully removed from Overlay. - 25. `Counter removeOverlayDirFailure{"overlay.remove_overlay_dir_failure"}` : Count the number of failed remove directory from Overlay +26. `Counter recursivelyRemoveOverlayDirSuccessful{ "Overlay.recursively_remove_overlay_dir_successful"}` + : -26. `Counter recursivelyRemoveOverlayDirSuccessful{ "Overlay.recursively_remove_overlay_dir_successful"}` : +Count the number of directories that successfully removed recursively from +Overlay. -Count the number of directories that successfully removed recursively from Overlay. - - -27. `Counter recursivelyRemoveOverlayDirFailure{ "Overlay.recursively_remove_overlay_dir_failure"}` : +27. `Counter recursivelyRemoveOverlayDirFailure{ "Overlay.recursively_remove_overlay_dir_failure"}` + : Count the number of failed recursively remove directory from Overlay - 28. 
`Counter hasOverlayDirSuccessful{"overlay.has_overlay_dir_successful"}` : Count the number of has Overlay directory which are successfully run - 29. `Counter hasOverlayDirFailure{"overlay.has_overlay_dir_failure"}` : Count the number of has Overlay directory which failed to run - 30. `Counter hasOverlayFileSuccessful{"overlay.has_overlay_file_successful"}` : Count the number of has Overlay file which are successfully run - 31. `Counter hasOverlayFileFailure{"overlay.has_overlay_file_failure"}` : Count the number of has Overlay file which failed to run - 32. `Counter addChildSuccessful{"overlay.add_child_successful"}` : Count the number of successfully add child commands in Overlay. - 33. `Counter addChildFailure{"overlay.add_child_failure"}` : Count the number of failure add child commands in Overlay. - 34. `Counter removeChildSuccessful{"overlay.remove_child_successful"}` : Count the number of successfully remove child commands in Overlay. - 35. `Counter removeChildFailure{"overlay.remove_child_failure"}` : Count the number of failure remove child commands in Overlay. - 36. `Counter removeChildrenSuccessful{"overlay.remove_children_successful"}` : Count the number of successfully remove children commands in Overlay. - 37. `Counter removeChildrenFailure{"overlay.remove_children_failure"}` : Count the number of failure remove children commands in Overlay. - 38. `Counter renameChildSuccessful{"overlay.rename_child_successful"}` : Count the number of successfully rename child commands in Overlay. - 39. `Counter renameChildFailure{"overlay.rename_child_failure"}` : Count the number of failure rename child commands in Overlay. diff --git a/eden/fs/docs/stats/SaplingBackingStoreStats.md b/eden/fs/docs/stats/SaplingBackingStoreStats.md index 2108920e5e0ef..c7fc5d4707d07 100644 --- a/eden/fs/docs/stats/SaplingBackingStoreStats.md +++ b/eden/fs/docs/stats/SaplingBackingStoreStats.md @@ -1,151 +1,146 @@ -SaplingBackingStoreStats -=============== +# SaplingBackingStoreStats 1. `Duration get{xxx}{"store.sapling.get_{xxx}_us"}` : -Duration of the whole get xxx (blob, blobmetadata, tree) SaplingBackingStore::get{xxx} in Microsecond. This includes looking in local first then if not found prepare the request, enqueue the request and then mark it as finished when it is fulfilled. - +Duration of the whole get xxx (blob, blobmetadata, tree) +SaplingBackingStore::get{xxx} in Microsecond. This includes looking in local +first then if not found prepare the request, enqueue the request and then mark +it as finished when it is fulfilled. 2. `Duration fetch{xxx}{"store.sapling.fetch_{xxx}_us"}` : -Duration of fetching xxx (blob, blobmetadata, tree) requests from the network in Microsecond. - +Duration of fetching xxx (blob, blobmetadata, tree) requests from the network in +Microsecond. 3. `Duration getRootTree{"store.sapling.get_root_tree_us"}` : Duration of getting a Root Tree from the Backing Store in Microsecond. - -4. `Duration importManifestForRoot{"store.sapling.import_manifest_for_root_us"}` : +4. `Duration importManifestForRoot{"store.sapling.import_manifest_for_root_us"}` + : Duration of getting a manifest for Root from the Backing Store in Microsecond. - 5. `Counter fetch{xxx}Local{"store.sapling.fetch_{xxx}_local"}` : Number of xxx (blob, blobmetadata, tree) fetching locally from hgcache - 6. 
`Counter fetch{xxx}Remote{"store.sapling.fetch_{xxx}_remote"}` : -Number of xxx (blob, blobmetadata, tree) fetching remotely from the network (EdenAPI) - +Number of xxx (blob, blobmetadata, tree) fetching remotely from the network +(EdenAPI) 7. `Counter fetch{xxx}Success{"store.sapling.fetch_{xxx}_success"}` : -Number of xxx (blob, blobmetadata, tree) that fetch successfully in the first try. (It could be local or remote) - +Number of xxx (blob, blobmetadata, tree) that fetch successfully in the first +try. (It could be local or remote) 8. `Counter fetch{xxx}Failure{"store.sapling.fetch_{xxx}_failure"}` : Number of xxx (blob, blobmetadata, tree) that failed in the first fetch try. - 9. `Counter fetch{xxx}RetrySuccess{"store.sapling.fetch_{xxx}_retry_success"}` : -Number of xxx (blob, tree) that fetch successfully in the retry. (It could be local or remote) +Number of xxx (blob, tree) that fetch successfully in the retry. (It could be +local or remote) - -10. `Counter fetch{xxx}RetryFailure{"store.sapling.fetch_{xxx}_retry_failure"}` : +10. `Counter fetch{xxx}RetryFailure{"store.sapling.fetch_{xxx}_retry_failure"}` + : Number of xxx (blob, tree) that failed in the fetch retry. - 11. `Counter getRootTreeLocal{"store.sapling.get_root_tree_local"}` : Number of root trees fetching locally from Cache - 12. `Counter getRootTreeRemote{"store.sapling.get_root_tree_remote"}` : Number of root trees fetching remotely from Sapling BackingStore - 13. `Counter getRootTreeSuccess{"store.sapling.get_root_tree_success"}` : -Number of root trees that fetch successfully in the first try. (It could be local or remote) - +Number of root trees that fetch successfully in the first try. (It could be +local or remote) 14. `Counter getRootTreeFailure{"store.sapling.get_root_tree_failure"}` : Number of root trees that failed in the first fetch try. +15. `Counter getRootTreeRetrySuccess{"store.sapling.get_root_tree_retry_success"}` + : -15. `Counter getRootTreeRetrySuccess{"store.sapling.get_root_tree_retry_success"}` : - -Number of root trees that fetch successfully in the retry. (It could be local or remote) - +Number of root trees that fetch successfully in the retry. (It could be local or +remote) -16. `Counter getRootTreeRetryFailure{"store.sapling.get_root_tree_retry_failure"}` : +16. `Counter getRootTreeRetryFailure{"store.sapling.get_root_tree_retry_failure"}` + : Number of root trees that failed in the fetch retry. - -17. `Counter importManifestForRootLocal{ "store.sapling.import_manifest_for_root_local"}` : +17. `Counter importManifestForRootLocal{ "store.sapling.import_manifest_for_root_local"}` + : Number of manifest for root fetching locally from Cache - -18. `Counter importManifestForRootRemote{"Store.sapling.import_manifest_for_root_remote"}` : +18. `Counter importManifestForRootRemote{"Store.sapling.import_manifest_for_root_remote"}` + : Number of manifest for root fetching remotely from Sapling BackingStore +19. `Counter importManifestForRootSuccess{"Store.sapling.import_manifest_for_root_success"}` + : -19. `Counter importManifestForRootSuccess{"Store.sapling.import_manifest_for_root_success"}` : - -Number of manifest for root that fetch successfully in the first try. (It could be local or remote) - +Number of manifest for root that fetch successfully in the first try. (It could +be local or remote) -20. `Counter importManifestForRootFailure{"Store.sapling.import_manifest_for_root_failure"}` : +20. 
`Counter importManifestForRootFailure{"Store.sapling.import_manifest_for_root_failure"}` + : Number of manifest for root that failed in the first fetch try. +21. `Counter importManifestForRootRetrySuccess{"Store.sapling.import_manifest_for_root_retry_success"}` + : -21. `Counter importManifestForRootRetrySuccess{"Store.sapling.import_manifest_for_root_retry_success"}` : +Number of manifests for root that fetch successfully in the retry. (It could be +local or remote) -Number of manifests for root that fetch successfully in the retry. (It could be local or remote) - - -22. `Counter importManifestForRootRetryFailure{"Store.sapling.import_manifest_for_root_retry_failure"}` : +22. `Counter importManifestForRootRetryFailure{"Store.sapling.import_manifest_for_root_retry_failure"}` + : Number of manifests for root that failed in the fetch retry. - 23. `Duration prefetchBlob{"store.sapling.prefetch_blob_us"}` : Duration of prefetching Blobs requests from BackingStore. - 24. `Counter prefetchBlobLocal{"store.sapling.prefetch_blob_local"}` : Number of Blobs prefetching locally from Cache - 25. `Counter prefetchBlobRemote{"store.sapling.prefetch_blob_remote"}` : Number of Blobs prefetching remotely from Sapling BackingStore - 26. `Counter prefetchBlobSuccess{"store.sapling.prefetch_blob_success"}` : -Number of Blobs that prefetch successfully in the first try. (It could be local or remote) - +Number of Blobs that prefetch successfully in the first try. (It could be local +or remote) 27. `Counter prefetchBlobFailure{"store.sapling.prefetch_blob_failure"}` : Number of Blobs that failed in the first prefetch try. +28. `Counter prefetchBlobRetrySuccess{"store.sapling.prefetch_blob_retry_success"}` + : -28. `Counter prefetchBlobRetrySuccess{"store.sapling.prefetch_blob_retry_success"}` : +Number of Blobs that prefetch successfully in the retry. (It could be local or +remote) -Number of Blobs that prefetch successfully in the retry. (It could be local or remote) - - -29. `Counter prefetchBlobRetryFailure{"store.sapling.prefetch_blob_retry_failure"}` : +29. `Counter prefetchBlobRetryFailure{"store.sapling.prefetch_blob_retry_failure"}` + : Number of Blobs that failed in the prefetch retry. - 30. `Counter loadProxyHash{"store.sapling.load_proxy_hash"}` : Count the number of times that a proxy hash gets loaded. diff --git a/eden/fs/docs/stats/Stats.md b/eden/fs/docs/stats/Stats.md index 8b02f8ff74e09..6c4e7a89c4fa0 100644 --- a/eden/fs/docs/stats/Stats.md +++ b/eden/fs/docs/stats/Stats.md @@ -1,6 +1,6 @@ -EdenFS ODS Counters and Duration -=============== +# EdenFS ODS Counters and Duration We have two set of stats in Eden: + 1. [Stats which are listed in EdenStats.h - Most common](./EdenStats.md) 2. [Dynamic Counters that are registered with a callback. Usually in EdenServer.cpp](./DynamicStats.md) diff --git a/eden/fs/monitor/README.md b/eden/fs/monitor/README.md index e248251ee5398..d190f31b1d73d 100644 --- a/eden/fs/monitor/README.md +++ b/eden/fs/monitor/README.md @@ -1,20 +1,20 @@ -This directory contains a wrapper process that monitors the EdenFS daemon. -This wrapper process serves a few purposes: +This directory contains a wrapper process that monitors the EdenFS daemon. This +wrapper process serves a few purposes: # Simplifies management of EdenFS across graceful restarts This monitoring process provides a single parent process that can be monitored by systemd and other system management daemons, even across EdenFS graceful -restarts. 
When a graceful restart is desired this wrapper daemon can spawn the +restarts. When a graceful restart is desired this wrapper daemon can spawn the new EdenFS instance, so that the new EdenFS instance is still part of the original service process hierarchy. -Note that using a wrapper for this purpose is not strictly required with -systemd (it is possible to inform systemd that the main process ID has changed -and it should monitor a new process moving forward). However, this wrapper -provides us a bit more flexibility and control around the restart mechanism, -and also makes it easier to monitor EdenFS with other service management -frameworks on other platforms. +Note that using a wrapper for this purpose is not strictly required with systemd +(it is possible to inform systemd that the main process ID has changed and it +should monitor a new process moving forward). However, this wrapper provides us +a bit more flexibility and control around the restart mechanism, and also makes +it easier to monitor EdenFS with other service management frameworks on other +platforms. # Log file management and rotation @@ -28,13 +28,13 @@ spawned Python subprocesses. # Intelligent Restarting of EdenFS when it is Idle -This wrapper process supports requests to trigger a restart at some point in -the future when EdenFS appears to be idle. +This wrapper process supports requests to trigger a restart at some point in the +future when EdenFS appears to be idle. While graceful restart should minimize user-visible disruption, it can still -introduce a delay for I/O operations while the restart is in progress. -Therefore it is still desirable to try and perform the restart while users are -not actively accessing the file system, if possible. +introduce a delay for I/O operations while the restart is in progress. Therefore +it is still desirable to try and perform the restart while users are not +actively accessing the file system, if possible. This functionality is provided by the wrapper primarily because the wrapper provides a convenient location to centralize this management in case multiple