format: apply lints to markdown files
Summary:
# Context

I was really annoyed w/ the variable formatting of Markdown files. I decided to apply formatting to all the markdown files to make things consistent.

# This diff

Formats all the markdown files to be consistent. The next diff will enable an option in linttool to enforce formatting in all Markdown files under `eden/fs/**/*`

Reviewed By: zertosh

Differential Revision: D59930918

fbshipit-source-id: 20964f531fbe6be919e8cc391caf148d5c107ae1
MichaelCuevas authored and facebook-github-bot committed Jul 18, 2024
1 parent a92eb43 commit 503737d
Showing 30 changed files with 1,636 additions and 1,586 deletions.
6 changes: 3 additions & 3 deletions eden/fs/benchmarks/README.md
@@ -1,5 +1,5 @@
# "Macro" Benchmarks

This directory contains benchmarks of EdenFS through its filesystem
and Thrift APIs. Several of these benchmarks allow comparison of
EdenFS's performance to native filesystems.
This directory contains benchmarks of EdenFS through its filesystem and Thrift
APIs. Several of these benchmarks allow comparison of EdenFS's performance to
native filesystems.
6 changes: 3 additions & 3 deletions eden/fs/benchmarks/language/README.md
@@ -1,5 +1,5 @@
# C++ Language Benchmarks

Sometimes it's useful to microbenchmark the compiler and standard
library itself. These microbenchmarks allow us to compare fundamental
costs across operating systems, compilers, and standard libraries.
Sometimes it's useful to microbenchmark the compiler and standard library
itself. These microbenchmarks allow us to compare fundamental costs across
operating systems, compilers, and standard libraries.
21 changes: 10 additions & 11 deletions eden/fs/docs/Caching.md
@@ -1,5 +1,4 @@
Caching in Eden
===============
# Caching in Eden

[This captures the state of Eden as of November, 2018. The information below may
change.]
@@ -32,10 +31,10 @@ quick succession, and reloading the blob each time would be inefficient.

The design of this cache attempts to satisfy competing objectives:

* Minimize blob reloads under Eden's various access patterns
* Fit in a mostly-capped memory budget
* Avoid performance cliffs under pathological access patterns
* Maximize memory available to the kernel's own caches, since they have the
- Minimize blob reloads under Eden's various access patterns
- Fit in a mostly-capped memory budget
- Avoid performance cliffs under pathological access patterns
- Maximize memory available to the kernel's own caches, since they have the
highest leverage.

The cache has a maximum size (default 40 MiB as of this writing), and blobs are
@@ -50,7 +49,7 @@ experimentation.
One interesting aspect of the blob cache is that Eden has a sense of whether a
request is likely to occur again. For example, if the kernel does not support
caching readlink calls over FUSE, then any symlink blob should be kept in Eden's
cache until evicted. If the kernel *does* cache readlink, then the blob can be
cache until evicted. If the kernel _does_ cache readlink, then the blob can be
released as soon as it's been read, making room for other blobs.

A more complicated example is that of a series of reads across a large file.
@@ -61,11 +60,11 @@ blob, Eden evicts the blob from its cache.

Blobs are evicted from cache when:

* The blob cache is full and exceeds its minimum entry count.
* The blob has been read by the kernel and the kernel cache is populated.
* A file inode is materialized and future requests will be satisfied by the
- The blob cache is full and exceeds its minimum entry count.
- The blob has been read by the kernel and the kernel cache is populated.
- A file inode is materialized and future requests will be satisfied by the
overlay.
* The kernel has evicted an inode from its own inode cache after reading some of
- The kernel has evicted an inode from its own inode cache after reading some of
the blob.
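
As an illustration of the eviction rules listed above, here is a minimal Python
sketch of a size-capped cache that also honors a minimum entry count. It is not
EdenFS's implementation: the class name `BlobCacheSketch`, its method names, and
the least-recently-used eviction order are all assumptions made for this
example.

```python
from collections import OrderedDict


class BlobCacheSketch:
    """Hypothetical byte-budgeted cache; names and LRU order are assumptions."""

    def __init__(self, max_bytes, min_entries):
        self.max_bytes = max_bytes
        self.min_entries = min_entries
        self.total_bytes = 0
        self._blobs = OrderedDict()  # blob_id -> bytes, oldest first

    def insert(self, blob_id, data):
        # Replace any existing entry so the byte count stays accurate.
        old = self._blobs.pop(blob_id, None)
        if old is not None:
            self.total_bytes -= len(old)
        self._blobs[blob_id] = data
        self.total_bytes += len(data)
        self._evict_if_needed()

    def get(self, blob_id):
        data = self._blobs.get(blob_id)
        if data is not None:
            # Mark the blob as recently used.
            self._blobs.move_to_end(blob_id)
        return data

    def release(self, blob_id):
        # Models "the kernel has read the blob and cached it": drop it early.
        data = self._blobs.pop(blob_id, None)
        if data is not None:
            self.total_bytes -= len(data)

    def _evict_if_needed(self):
        # Evict only while over budget AND above the minimum entry count.
        while (self.total_bytes > self.max_bytes
               and len(self._blobs) > self.min_entries):
            _, evicted = self._blobs.popitem(last=False)
            self.total_bytes -= len(evicted)
```

The minimum entry count means a single blob larger than the whole budget still
stays cached, which is one way to avoid reloading a large blob on every read.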

## Blob Metadata
70 changes: 33 additions & 37 deletions eden/fs/docs/Data_Model.md
@@ -1,32 +1,30 @@
Data Model
==========
# Data Model

EdenFS is designed to serve file and directory state from an underlying source
control system. In order to do this, it has two parallel representations of the
control system. In order to do this, it has two parallel representations of the
state: one that tracks the original immutable source control state, and one that
tracks the current mutable file and directory structure being shown in the
checkout.

Source Control Model
====================
# Source Control Model

EdenFS's model of source control state mimics the model used by
[Git](https://git-scm.com/) and EdenSCM. The source control repository is
viewed as an object storage system with 3 main object types: commits, trees
(aka directories), and blobs (aka files).
[Git](https://git-scm.com/) and EdenSCM. The source control repository is viewed
as an object storage system with 3 main object types: commits, trees (aka
directories), and blobs (aka files).

The Git documentation has an
[in-depth overview of the object model](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects).

EdenFS expects to be able to look up objects by ID, where an object ID is an
opaque 20-byte key. In practice, both Git and EdenSCM are content-addressed
opaque 20-byte key. In practice, both Git and EdenSCM are content-addressed
object stores, where the object IDs are computed from the object contents.
However, EdenFS does not strictly care about this property, and simply requires
being able to look up an object from its ID.

These 3 types of objects are chained together in a
[DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) to allow
representing the full commit history in a repository. Each commit contains the
representing the full commit history in a repository. Each commit contains the
ID(s) of its parent commit(s), the ID of the tree that represents its root
directory, plus additional information like the commit message and author
information.
@@ -35,52 +33,50 @@ information.

Commit objects are referenced by variable-width identifiers whose meaning is
defined by the concrete BackingStore implementation. For example, in Mercurial
and Git, they're 20-byte binary (40-byte hex) strings. Each mount remembers
its parent root ID across EdenFS restarts.
and Git, they're 20-byte binary (40-byte hex) strings. Each mount remembers its
parent root ID across EdenFS restarts.

Tree objects represent a directory and contain a list of the directory
contents. Each entry in the directory has the name of the child entry as well
as the object ID, which refers either to another tree object for a subdirectory
or to a blob object for a regular file. Each entry also contains some
additional information, such as flags tracking whether the entry is a file or
directory, whether it is executable, etc.
Tree objects represent a directory and contain a list of the directory contents.
Each entry in the directory has the name of the child entry as well as the
object ID, which refers either to another tree object for a subdirectory or to a
blob object for a regular file. Each entry also contains some additional
information, such as flags tracking whether the entry is a file or directory,
whether it is executable, etc.
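
To make the object model concrete, the following Python sketch stores blobs and
trees in a content-addressed dictionary keyed by 20-byte IDs and resolves a path
by walking tree entries. Everything here is hypothetical: `TreeEntry`,
`ObjectStoreSketch`, `read_path`, and the serialization used for hashing are
invented for illustration and do not mirror the classes in `eden/fs/model`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeEntry:
    name: str
    object_id: bytes  # ID of a child tree or blob
    is_tree: bool
    executable: bool = False


class ObjectStoreSketch:
    """Hypothetical in-memory object store keyed by SHA-1 digests."""

    def __init__(self):
        self._objects = {}

    def put_blob(self, contents):
        object_id = hashlib.sha1(b"blob:" + contents).digest()
        self._objects[object_id] = contents
        return object_id

    def put_tree(self, entries):
        # Sort entries so the same directory always hashes to the same ID.
        entries = tuple(sorted(entries, key=lambda e: e.name))
        object_id = hashlib.sha1(b"tree:" + repr(entries).encode()).digest()
        self._objects[object_id] = entries
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


def read_path(store, root_tree_id, path):
    """Walk from the root tree to the blob named by a slash-separated path."""
    object_id = root_tree_id
    for component in path.split("/"):
        tree = store.get(object_id)
        entry = next((e for e in tree if e.name == component), None)
        if entry is None:
            raise FileNotFoundError(path)
        object_id = entry.object_id
    return store.get(object_id)
```

Because IDs are derived from contents in this sketch, storing the same file
twice yields the same ID, which is the content-addressing property the text
notes EdenFS does not strictly depend on.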

Additionally, tree entry objects can also contain information about the file
size and hashes of the file contents. This allows EdenFS to efficiently
respond to file attribute requests without having to fetch the entire blob data
from source control. Note that these fields are not present in Git's object
model, but are available when the underlying data is fetched from an EdenSCM
Mononoke server.
size and hashes of the file contents. This allows EdenFS to efficiently respond
to file attribute requests without having to fetch the entire blob data from
source control. Note that these fields are not present in Git's object model,
but are available when the underlying data is fetched from an EdenSCM Mononoke
server.

![Example Tree Object](img/tree_object.svg)

The blob type is the final object type and is the simplest. The blob object
type simply contains the raw file contents. Note that blob objects are used to
represent both regular files as well as symbolic links. For symbolic links, the
The blob type is the final object type and is the simplest. The blob object type
simply contains the raw file contents. Note that blob objects are used to
represent both regular files as well as symbolic links. For symbolic links, the
blob contents are the symlink contents.

![Example Blob Object](img/blob_object.svg)

EdenFS's classes representing these source control objects can be found in the
[`eden/fs/model`](../model) directory. The `Tree` class represents a source
[`eden/fs/model`](../model) directory. The `Tree` class represents a source
control tree, and the `Blob` class represents a source control blob.

Note that EdenFS is primarily concerned about showing the current working
directory state, and this mainly only requires using Tree and Blob objects. In
directory state, and this mainly only requires using Tree and Blob objects. In
general, EdenFS does not need to process source control history related
operations, and therefore does not deal much with commit objects.

# Parallels with the Inode State

Parallels with the Inode State
==============================

The classes in `eden/fs/model` represent source control objects. These objects
The classes in `eden/fs/model` represent source control objects. These objects
are immutable, as once a commit is checked in to source control it cannot be
modified, only updated by a newer commit.

In order to represent the current file and directory state of a checkout, EdenFS
has a separate set of inode data structures. These generally parallel the
source control model data structures: a `TreeInode` represents a directory, and
its contents may be backed by a `Tree` object loaded from source control. A
`FileInode` represents a file, and its contents may be backed by a `Blob`
object loaded from source control.
has a separate set of inode data structures. These generally parallel the source
control model data structures: a `TreeInode` represents a directory, and its
contents may be backed by a `Tree` object loaded from source control. A
`FileInode` represents a file, and its contents may be backed by a `Blob` object
loaded from source control.
84 changes: 59 additions & 25 deletions eden/fs/docs/Futures.md
@@ -1,42 +1,73 @@
# Futures and Asynchronous Code

This document assumes some working knowledge of folly::Future and folly::SemiFuture. Please read the [Future overview](https://github.com/facebook/folly/blob/master/folly/docs/Futures.md) first.
This document assumes some working knowledge of folly::Future and
folly::SemiFuture. Please read the
[Future overview](https://github.com/facebook/folly/blob/master/folly/docs/Futures.md)
first.

## Why Future?

EdenFS is largely concurrent and asynchronous. The traditional way to write this kind of code would be explicit state machines with requests and callbacks. It's easy to forget to call a callback or call one twice under rarely-executed paths like error handling.
EdenFS is largely concurrent and asynchronous. The traditional way to write this
kind of code would be explicit state machines with requests and callbacks. It's
easy to forget to call a callback or call one twice under rarely-executed paths
like error handling.

To make asynchronous code easier to reason about, Folly provides `folly::Future` and `folly::Promise`. Each Future and Promise form a pair, where `folly::Future` holds the eventual value and Promise is how the value is published. Readers can either block on the result (offering their thread to any callbacks that may run) or schedule a callback to be run when the value is available. `folly::Promise` is fulfilled on the writing side.
To make asynchronous code easier to reason about, Folly provides `folly::Future`
and `folly::Promise`. Each Future and Promise form a pair, where `folly::Future`
holds the eventual value and Promise is how the value is published. Readers can
either block on the result (offering their thread to any callbacks that may run)
or schedule a callback to be run when the value is available. `folly::Promise`
is fulfilled on the writing side.

## Why SemiFuture?

The biggest problem with Future is that callbacks may run either on the thread calling `Future::then` or on the thread calling `Promise::set`. Callbacks have to be written carefully, and if they acquire locks, any site that calls `Future::then` or `Promise::set` must not hold those locks.
The biggest problem with Future is that callbacks may run either on the thread
calling `Future::then` or on the thread calling `Promise::set`. Callbacks have
to be written carefully, and if they acquire locks, any site that calls
`Future::then` or `Promise::set` must not hold those locks.

`folly::SemiFuture` is a reaction to these problems. It's a Future without a `SemiFuture::then` method. Assuming no use of unsafe APIs (including any `InlineExecutor`), callbacks will never run on the thread that calls `Promise::set`. Any system with an internal thread pool that cannot tolerate arbitrary callbacks running on its threads should use `SemiFuture`.
`folly::SemiFuture` is a reaction to these problems. It's a Future without a
`SemiFuture::then` method. Assuming no use of unsafe APIs (including any
`InlineExecutor`), callbacks will never run on the thread that calls
`Promise::set`. Any system with an internal thread pool that cannot tolerate
arbitrary callbacks running on its threads should use `SemiFuture`.
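
The callback-placement problem is not specific to Folly. As a loose analogy
only, Python's standard `concurrent.futures.Future` behaves similarly in
CPython: a callback added before completion runs on the thread that sets the
result, while a callback added after completion runs immediately on the caller's
thread. The helper function names below are made up for this sketch, and no
Folly API is being modeled.

```python
import threading
from concurrent.futures import Future


def callback_runs_on_fulfiller_thread():
    """Callback registered first, result set later from a worker thread."""
    future = Future()
    ran_on = []
    future.add_done_callback(lambda f: ran_on.append(threading.get_ident()))

    worker_ident = []

    def fulfil():
        worker_ident.append(threading.get_ident())
        future.set_result(42)  # the callback fires inside this call

    worker = threading.Thread(target=fulfil)
    worker.start()
    worker.join()
    return ran_on[0] == worker_ident[0]


def callback_runs_on_caller_thread():
    """Result already set, so the callback runs inline when it is added."""
    future = Future()
    future.set_result(42)
    ran_on = []
    future.add_done_callback(lambda f: ran_on.append(threading.get_ident()))
    return ran_on[0] == threading.get_ident()
```

In both cases the callback's author does not control which thread runs it, which
is why any lock the callback takes must not be held at either call site.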

## Why ImmediateFuture?

`folly::Future` and `folly::SemiFuture` introduce significant overhead. A `Future`/`Promise` pair hold a heap-allocated, atomic refcounted `FutureCore`. In EdenFS, it's common to make an asynchronous call that hits cache and can answer immediately. Heap allocating the result is comparatively expensive. We introduced `facebook::eden::ImmediateFuture` for those cases. ImmediateFuture either stores the result value inline or holds a SemiFuture.
`folly::Future` and `folly::SemiFuture` introduce significant overhead. A
`Future`/`Promise` pair hold a heap-allocated, atomic refcounted `FutureCore`.
In EdenFS, it's common to make an asynchronous call that hits cache and can
answer immediately. Heap allocating the result is comparatively expensive. We
introduced `facebook::eden::ImmediateFuture` for those cases. ImmediateFuture
either stores the result value inline or holds a SemiFuture.
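
The "inline value or deferred result" idea can be sketched in Python as follows.
This is a toy model, not `facebook::eden::ImmediateFuture`: the class
`ImmediateSketch` and its methods are hypothetical, and a
`concurrent.futures.Future` stands in for the held SemiFuture. The sketch runs
callbacks inline when the value is already present; otherwise it queues them and
applies them on the thread that finally asks for the result.

```python
from concurrent.futures import Future


class ImmediateSketch:
    """Hypothetical holder of either a ready value or a pending future."""

    def __init__(self, value=None, pending=None, callbacks=()):
        self._value = value
        self._pending = pending        # a Future, or None when ready
        self._callbacks = tuple(callbacks)

    @classmethod
    def ready(cls, value):
        return cls(value=value)

    @classmethod
    def deferred(cls, future):
        return cls(pending=future)

    def is_ready(self):
        return self._pending is None

    def then(self, fn):
        if self._pending is None:
            # Fast path: no allocation of shared state, run the callback now.
            return ImmediateSketch(value=fn(self._value))
        # Slow path: remember the callback; it is not run by the fulfiller.
        return ImmediateSketch(pending=self._pending,
                               callbacks=self._callbacks + (fn,))

    def get(self, timeout=None):
        if self._pending is None:
            return self._value
        value = self._pending.result(timeout)
        for fn in self._callbacks:
            value = fn(value)  # applied on the thread calling get()
        return value
```

A cache hit would return `ImmediateSketch.ready(value)` and a cache miss would
return `ImmediateSketch.deferred(future)`, so callers chain `then` the same way
in both cases.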

## When should I use which Future?

There are reasons to use each Future.

  | `Future` | `SemiFuture` | `ImmediateFuture`
--- | --- | --- | ---
Storage is heap-allocated | yes | yes | no
Callbacks run as early as the result is available | yes | no | no
Callbacks may run on the fulfiller's thread | yes | no | no
Callbacks may run immediately or asynchronously | yes | no | yes
sizeof, cost of move() | void* | void* | Depends on sizeof(T) with minimum of 40 bytes as of Oct 2021
|   | `Future` | `SemiFuture` | `ImmediateFuture` |
| ------------------------------------------------- | -------- | ------------ | ------------------------------------------------------------ |
| Storage is heap-allocated | yes | yes | no |
| Callbacks run as early as the result is available | yes | no | no |
| Callbacks may run on the fulfiller's thread | yes | no | no |
| Callbacks may run immediately or asynchronously | yes | no | yes |
| sizeof, cost of move() | void\* | void\* | Depends on sizeof(T) with minimum of 40 bytes as of Oct 2021 |

`folly::Future` should be used when it's important the callback runs as early as possible. For example, measuring the duration of internal operations.
`folly::Future` should be used when it's important the callback runs as early as
possible. For example, measuring the duration of internal operations.

SemiFuture or ImmediateFuture should be used when it's important that chained callbacks never run on internal thread pools.
SemiFuture or ImmediateFuture should be used when it's important that chained
callbacks never run on internal thread pools.

ImmediateFuture should be used when the value is small and avoiding an allocation is important for performance. Large structs can use unique_ptr or shared_ptr.
ImmediateFuture should be used when the value is small and avoiding an
allocation is important for performance. Large structs can use unique_ptr or
shared_ptr.

It's important to note that, when a callback and its closures hold reference counts or are larger than the result value, it can be worth using Future, because the callbacks are collapsed into a value as early as possible. SemiFuture, even if the SemiFuture is held by an ImmediateFuture, will not collapse any chained callbacks until the SemiFuture is attached to an executor.
It's important to note that, when a callback and its closures hold reference
counts or are larger than the result value, it can be worth using Future,
because the callbacks are collapsed into a value as early as possible.
SemiFuture, even if the SemiFuture is held by an ImmediateFuture, will not
collapse any chained callbacks until the SemiFuture is attached to an executor.

## Safety and caveats

@@ -73,14 +104,16 @@ As a general rule of thumb, any use of `folly::InlineLikeExecutor` is widely
unsafe and should never be used. This is primarily due to forcing `Promise::set`
to execute the `folly::Future` callbacks in the context of the fulfiller's thread.

For instance, if we re-use the previous example, but where the `threadPool` is an
`InlineLikeExecutor` the `setValue` will also execute both continuations before
returning.
For instance, if we re-use the previous example, but where the `threadPool` is
an `InlineLikeExecutor` the `setValue` will also execute both continuations
before returning.

This has been known to cause deadlocks in the past. This includes:
- `folly::SemiFuture::toUnsafeFuture` and any `Unsafe` methods as these are merely wrappers on `.via(&InlineExecutor::instance())`,
- `folly::Promise::getFuture` for the same reason,
- `folly::SemiFuture::via(&QueuedImmediateExecutor::instance())`

- `folly::SemiFuture::toUnsafeFuture` and any `Unsafe` methods as these are
merely wrappers on `.via(&InlineExecutor::instance())`,
- `folly::Promise::getFuture` for the same reason,
- `folly::SemiFuture::via(&QueuedImmediateExecutor::instance())`

`folly::InlineLikeExecutor` also has the downside of being incompatible with
`folly::coro::Task`, which is Folly's coroutine implementation.
@@ -98,5 +131,6 @@ execute eagerly unless attached to an executor (and thus becoming

## TODO

* Unsafely mapping ImmediateFuture onto Future with .via(QueuedImmediateExecutor)?
* What about coroutines?
- Unsafely mapping ImmediateFuture onto Future with
.via(QueuedImmediateExecutor)?
- What about coroutines?
16 changes: 8 additions & 8 deletions eden/fs/docs/Globbing.md
@@ -2,12 +2,12 @@

EdenFS supports glob patterns through the following interfaces:

* Ignore files (e.g. `.gitignore`)
* `globFiles` Thrift API
- Ignore files (e.g. `.gitignore`)
- `globFiles` Thrift API

## Ignore Files

EdenFS uses *ignore files* to exclude files in the `getScmStatus` Thrift API
EdenFS uses _ignore files_ to exclude files in the `getScmStatus` Thrift API
(used by `hg status`, for example). The syntax for EdenFS' ignore files is
compatible with the syntax for [`gitignore` files][gitignore] used by the Git
version control system, even when an EdenFS checkout is backed by a Mercurial
@@ -17,12 +17,12 @@ repository.

EdenFS interprets the following tokens specially within glob patterns:

* `**`: Match zero, one, or more path components.
* `*`: Match zero, one, or more valid path component characters.
* `?`: Match exactly one valid path component character.
* `[`: Match exactly one path component character in the given set of
- `**`: Match zero, one, or more path components.
- `*`: Match zero, one, or more valid path component characters.
- `?`: Match exactly one valid path component character.
- `[`: Match exactly one path component character in the given set of
characters. The set is terminated by `]`.
* `[!`, `[^`: Match exactly one path component character *not* in the given set
- `[!`, `[^`: Match exactly one path component character _not_ in the given set
of characters. The set is terminated by `]`.
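
The token list above can be illustrated with a small Python sketch that
translates a glob pattern into a regular expression. It is a simplification
written for this example, not EdenFS's matcher: the function names
`glob_to_regex` and `glob_matches` are hypothetical, `**` is only given
whole-component treatment when written as `**/`, and gitignore details such as
escapes, anchoring, and trailing slashes are ignored.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified glob pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")   # zero or more whole path components
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")            # anything, including separators
            i += 2
        elif c == "*":
            out.append(r"[^/]*")         # stays within one path component
            i += 1
        elif c == "?":
            out.append(r"[^/]")          # exactly one non-separator character
            i += 1
        elif c == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated character set")
            body = pattern[i + 1:end]
            negate = body[:1] in ("!", "^")
            if negate:
                body = body[1:]
            if not body:
                raise ValueError("empty character set")
            body = body.replace("\\", "\\\\")
            # A negated set must also refuse to match the path separator.
            out.append("[" + ("^/" if negate else "") + body + "]")
            i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return re.compile("".join(out))


def glob_matches(pattern, path):
    return glob_to_regex(pattern).fullmatch(path) is not None
```

For example, `glob_matches("**/*.md", "eden/fs/docs/Caching.md")` is true
because `**/` absorbs the three leading directories, while
`glob_matches("*.md", "docs/README.md")` is false because `*` cannot cross a
`/`.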

EdenFS glob patterns are compatible with [`gitignore` patterns][gitignore] used