Commit

Pass through the RTOS chapter, start adding examples for threads.
davidchisnall committed Dec 18, 2024
1 parent 50c11e4 commit f26fbda
Showing 4 changed files with 134 additions and 17 deletions.
2 changes: 1 addition & 1 deletion rtos-source
2 changes: 1 addition & 1 deletion text/Makefile
@@ -27,7 +27,7 @@ html: ${SOURCES} Makefile

cheriot-programmers-guide.pdf: ${SOURCES} Makefile
${IGK_PATH}/igk --plugin ${IGK_PATH}/libigk-clang.dylib --plugin ${IGK_PATH}/libigk-treesitter.dylib --lua-directory ${IGK_PATH}/../lua/ --lua-directory ../lua --file book.tex --pass include --pass fixme --pass metadata --pass clean-empty --pass begin-end --pass comment --pass blank-is-paragraph --pass autolabel --pass sile-admonitions --pass clang-listing --pass ts-listing --pass sile-lua --pass sile-listings --pass sile-paragraph --pass sile-keywords --pass sile-description-lists --pass sile-tables --pass sile-note --pass sile-xref --pass sile-boilerplate --pass clean-empty --pass XMLOutputPass --config sile_packages=cheriot.listings > cheriot-programmers-guide.xml
sile cheriot-programmers-guide.xml
./run_sile.sh cheriot-programmers-guide.xml

validate: html
for I in *.html ; do vnu $$I ; done
62 changes: 56 additions & 6 deletions text/core_rtos.tex
@@ -7,17 +7,35 @@ \section{Starting the system with the loader}

The \keyword{loader} runs on system startup.
It reads the compartment headers and populates each compartment with the set of capabilities that it needs.
The loader exists so that the system can be started from a firmware image that does not embed capabilities.
This is a useful property even if a particular target has persistent storage (non-volatile RAM) that \textem{can} hold capabilities because it ensures that there is an on-device \keyword{pointer provenance} flow for the firmware.

If a device has non-volatile storage that holds tags, you will typically run the loader once at install time or on first boot of a new firmware image.
This ensures that the image contains only capabilities handed to it.
This, in turn, enables multi-stage boot where some functionality, such as attestation, secure key storage, and so on, is provided by a bootloader.
These abstractions can all be built from capabilities and so, unlike systems based on protection rings such as TrustZone, an arbitrary number can be nested.

If a compartment contains a global that is a pointer, initialised to point to another global, the loader will initialise the pointer by deriving a capability from one of the compartment's code or data capabilities.
In addition, cross-compartment and cross-library calls and imported MMIO regions all require capabilities to be set up, as does the initial register file for each thread.
Again, this enforces provenance properties, this time within a firmware image.
A malicious compartment may provide a relocation that points to a global outside its own memory, but the loader will attempt to derive the capability only from the compartment's initial \reg{pcc} (code) and \reg{cgp} (globals) regions and so will fail.
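
The provenance check described above can be illustrated with a small host-side model. This is only a sketch: \c{Region}, \c{derive}, and \c{resolve_relocation} are hypothetical names invented for illustration, not part of the loader, and a plain base-and-length pair stands in for a real capability. It shows why a relocation that targets memory outside the compartment's own code and globals regions cannot be satisfied.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a capability: just a base address and a length.
struct Region
{
	uintptr_t base;
	size_t    length;
};

// Model of derivation: the result exists only if the requested range lies
// entirely within the parent region.
std::optional<Region> derive(const Region &parent, uintptr_t address, size_t size)
{
	if (address < parent.base)
	{
		return std::nullopt;
	}
	size_t offset = address - parent.base;
	if (offset > parent.length || size > parent.length - offset)
	{
		return std::nullopt;
	}
	return Region{address, size};
}

// Model of relocation handling: try the compartment's code region first,
// then its globals region. Nothing else is ever consulted.
std::optional<Region> resolve_relocation(const Region &code,
                                         const Region &globals,
                                         uintptr_t     target,
                                         size_t        size)
{
	if (auto fromCode = derive(code, target, size))
	{
		return fromCode;
	}
	return derive(globals, target, size);
}
```

In this model, a target inside the globals region resolves, whereas a target that lies outside both regions (or that straddles the end of one) yields no result, mirroring the failure described above.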

The loader must also provide all capabilities to compartments that allow them to communicate outside of their own private space.
This includes access to \keyword{memory-mapped I/O} (MMIO) regions, capabilities for pre-shared objects, for software-defined capabilities, and any capabilities for calling entry points exposed by other compartments or libraries.
The loader also creates the stacks and trusted stacks for each thread and creates their initial entry points.

The loader is the most privileged component in the system.
When a CHERIoT CPU boots, it will have a small set of \keyword{root capabilities} in registers.
These, between them, convey the full set of rights that can be granted by a capability.
Every capability in the running system is derived (often via many steps) from one of these.
As such, the loader is able to do anything.

\begin{note}
In a system with a multi-stage boot, the initial capabilities provided to the loader may be restricted, rather than the omnipotent set from CPU boot.
For example, an early loader may implement A/B booting by providing the RTOS loader with capabilities to only half of persistent memory.
\end{note}

The risk from the loader is mitigated by the fact that it does not run on untrusted data.
The loader operates only on the instructions generated by the linker and so it is possible to audit precisely what it will do.
The loader operates only on the instructions generated by the linker and so it is possible to audit precisely what it will do (see \ref{cheriot-audit}).
It is also possible to validate this by running the loader in a simulator and capturing the precise memory state after it has run.

The loader enforces some of the guarantees in the initial state.
@@ -36,6 +54,10 @@ \section{Changing trust domain with the switcher}
It is responsible for transitions between threads (context switches) and between compartments (cross-compartment calls and returns).
The switcher is a very small amount of code—under 500 instructions—that is expected to be amenable to formal verification.

\begin{note}
Work is underway to formally verify the security properties of the switcher, but it is still in its early stages.
\end{note}

The switcher is the only component in a running CHERIoT system that has access-system-registers permission.
It uses this primarily to access a single reserved register that holds the \keyword{trusted stack}.
The trusted stack is a region of memory containing the register save area for context switches and a small frame for every cross-compartment call that allows safe return even if the callee has corrupted all state that it has access to.
@@ -63,20 +85,35 @@ \section{Changing trust domain with the switcher}
Most errors simply forcibly unwind to the previous trusted stack frame, so a compartment that attempts to attack the switcher exits to its caller.}
\item{Like everything else in the system, it must follow the capability rules.
Unlike an operating system running in a privileged mode on mainstream hardware, it does not get to opt out of memory protection, it is not able to access beyond the bounds of capabilities passed to it or access any memory to which it does not have an explicit capability.}
\item{It is largely stateless, all state that it modifies is held in the trusted stack.}
\item{It is largely stateless: all state that it modifies is held in the trusted stack for the current thread.}
\end{itemize}

The switcher appears to the rest of the system as a library.
It can expose functions for inspecting or, in a small number of cases, modifying state.
These are defined in \file{switcher.h}.
For example, prior to performing a cross-compartment call, you may want to check that there is sufficient space on the trusted stack for the number of calls that it will need to make.
The \c{trusted_stack_has_space} function exposed by the switcher lets you query if the trusted stack has enough space for a specified number of cross-compartment calls.
The amount of (normal) stack space is directly visible in a compartment and so normal stack checks do not require the switcher to be involved.

\functiondoc{trusted_stack_has_space}

The switcher also implements the \c{thread_id_get} function, which provides a fast way for compartments to determine which thread they are currently running on.
This function is used in the implementation of priority-inheriting locks (see \ref{priority_inheritance}).
Implementing efficient priority-inheriting locks requires a fast mechanism for getting the current thread ID so that it can be stored in the lock.

\functiondoc{thread_id_get}

\section{Time slicing with the scheduler}

When the switcher receives an interrupt (including an explicit yield), it delegates the decision about what to run next to the _scheduler_.
When the switcher receives an interrupt (including an explicit yield), it delegates the decision about what to run next to the \keyword{scheduler}.
The scheduler has direct access to the interrupt controller but, in most respects, is just another compartment.

The switcher also holds a capability to a small stack for use by the scheduler.
This is not quite a full thread.
It cannot make cross-compartment calls and is not independently schedulable.
When the switcher takes an interrupt, it invokes the switcher's entry point on this stack.
When the switcher handles an interrupt, it invokes the scheduler's entry point on this stack.

The switcher also exposes other entry points that can be invoked by cross-compartment calls.
The scheduler also exposes other entry points that can be invoked by cross-compartment calls.
These fulfil a role similar to system calls on other operating systems, for example waiting for external events or performing inter-thread communication.
The scheduler implements blocking operations by moving the current thread from a run queue to a sleep queue and then issuing a software interrupt instruction to branch to the switcher.
When the switcher then invokes the scheduler to make a scheduling decision, it will discover that the current thread is no longer runnable and pick another.
@@ -90,6 +127,17 @@ \section{Time slicing with the scheduler}
The scheduler has no mechanism to inspect the state of an interrupted thread.
When invoked explicitly, it is called with a normal cross-compartment call and so has no access to anything other than the arguments.

As with the switcher, the scheduler mitigates these risks by being small (though larger than the switcher).
It currently compiles to under 4 KiB of object code.
This small size is accomplished by providing only a small set of features that can be used as building blocks for other tasks.

For example, some embedded operating systems provide features such as message queues in their kernel.
In CHERIoT RTOS, these are provided by a separate library, which relies on the \keyword{futex} (see \ref{futex}) facility exposed by the scheduler to allow a producer to block when the queue is full and allow consumers to block when the queue is empty.

Futexes are the \textem{only} mechanism that the scheduler provides for blocking.
Interrupts are mapped to futexes and so threads wait for hardware or software events in exactly the same way.
This narrow interface and clear separation of concerns helps improve overall system security.

\section{Sharing memory from the allocator}

The final core component is the memory allocator, which provides the heap, which is used for all dynamic memory allocations.
@@ -106,6 +154,8 @@ \section{Sharing memory from the allocator}
As such, it is in the TCB only with respect to heap allocations.
It cannot access globals (or code) held in other compartments, and so a compartment that does not use the heap does not need to trust the allocator.

The allocator also provides a rich set of mechanisms (described in \ref{shared_heap}) for two mutually distrusting compartments to ensure that memory is not deallocated at inconvenient times.

\section{Building firmware images}

CHERIoT RTOS uses the xmake build system.
85 changes: 76 additions & 9 deletions text/threads.tex
@@ -62,6 +62,26 @@ \section{Identifying the current thread}

\functiondoc{thread_count}

The \file{current_thread} example shows calling these functions.
The entry point function for this is shown in \ref{lst:currentthread} and the thread definitions from the \file{xmake.lua} file in \ref{lst:currentthreadxmake}.

\codelisting[filename=examples/current_thread/current.cc,marker=entry,label=lst:currentthread,caption="A simple example that prints the current thread"]{}

\lualisting[filename=examples/current_thread/xmake.lua,marker=threads,label=lst:currentthreadxmake,caption="The thread definitions for the current-thread example"]{}

Note that thread two has a higher priority than thread one.
When you run this example, you should see output like this:

\begin{console}
Current thread: 2 of 2
Current thread: 2 of 2
Current thread: 1 of 2
Current thread: 1 of 2
\end{console}

The higher-priority thread runs until it exits.
Normally, a higher-priority thread would \keyword{yield} to allow another thread to run, as we'll see later in this chapter.

\section{Using the \c{Timeout} structure}

Several RTOS APIs have timeouts.
@@ -75,11 +95,18 @@ \section{Using the \c{Timeout} structure}
Timeouts measure time in \keyword{scheduler ticks}.
A tick is a single scheduling quantum, which depends on the board configuration.
This is the minimum amount of time for which it is plausible for a thread to sleep.
If a thread sleeps then another thread becomes runnable and is then allowed to run (unless it also yields) for one tick.
If a thread sleeps then another thread becomes runnable and is then allowed to run (unless it also yields).

Although ticks exist as a unit of accounting, the CHERIoT RTOS scheduler is a \keyword{tickless scheduler}.
Traditional schedulers schedule a timer interrupt at a fixed quantum and make a scheduling decision at each interrupt.
This can be inefficient because a high-priority thread will be routinely interrupted and then rescheduled (because it remains the highest-priority thread).
A tickless scheduler avoids this and instead, before scheduling a thread, sets a timer interrupt to fire at the next point when another thread may be woken.

At the end of each tick, the scheduler receives the timer interrupt and chooses the next thread to run.
Threads may only run at all if no higher-priority thread is runnable.
Threads at the same priority level are round-robin scheduled.
For example, consider the case where a high-priority thread sleeps for three ticks and a lower-priority thread runs.
With a traditional scheduler, a timer interrupt will fire three times.
Each time, the scheduler will do some accounting and then reschedule the lower-priority thread.
In contrast, a tickless scheduler will configure the timer to fire once, after three ticks have elapsed.
At that point, the high-priority thread is runnable and so will be scheduled.
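
The difference in the three-tick example can be made concrete with a small host-side model. This is an illustrative sketch only: \c{SimulationResult}, \c{simulate_periodic}, and \c{simulate_tickless} are hypothetical names and do not correspond to anything in the scheduler. The model simply counts how many timer interrupts fire while the high-priority thread sleeps.

```cpp
#include <cstdint>

// Outcome of one simulated sleep of the high-priority thread.
struct SimulationResult
{
	uint32_t timerInterrupts; // Timer interrupts taken while it slept.
	uint32_t wakeTick;        // Tick at which it becomes runnable again.
};

// Traditional model: the timer fires at the end of every tick, and the
// scheduler reschedules the lower-priority thread until the sleep expires.
SimulationResult simulate_periodic(uint32_t sleepTicks)
{
	SimulationResult result{0, 0};
	for (uint32_t now = 1; now <= sleepTicks; now++)
	{
		result.timerInterrupts++;
		result.wakeTick = now;
	}
	return result;
}

// Tickless model: the timer is programmed once, for the point at which the
// sleeping thread may next be woken.
SimulationResult simulate_tickless(uint32_t sleepTicks)
{
	if (sleepTicks == 0)
	{
		return SimulationResult{0, 0};
	}
	return SimulationResult{1, sleepTicks};
}
```

For a three-tick sleep, both models wake the high-priority thread at tick three, but the periodic model takes three timer interrupts where the tickless model takes one.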

The timeout structure captures the number of ticks for which a call is still allowed to block and the number of ticks for which it has already blocked.
Each subsequent call that is passed the same timeout structure may increase the amount of slept time and decrease the remaining time.
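
The accounting described here can be sketched with a host-side model. \c{ModelTimeout} and \c{model_blocking_call} are hypothetical names used only for illustration. They are not the RTOS \c{Timeout} type, whose exact definition should be taken from the RTOS headers. The sketch assumes that a blocking call never blocks for longer than the remaining time.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model of a timeout: ticks already spent blocked and ticks
// that may still be spent blocked.
struct ModelTimeout
{
	uint32_t elapsed = 0;
	uint32_t remaining;

	explicit ModelTimeout(uint32_t ticks) : remaining(ticks) {}

	// Record that `ticks` have been spent blocked, never exceeding the
	// remaining allowance.
	void elapse(uint32_t ticks)
	{
		uint32_t used = std::min(ticks, remaining);
		elapsed += used;
		remaining -= used;
	}

	// True if a call given this timeout is still permitted to block.
	bool may_block() const
	{
		return remaining > 0;
	}
};

// Model of a blocking API: it wants to block for `wanted` ticks but is
// limited by the timeout, which it updates before returning.
uint32_t model_blocking_call(ModelTimeout &timeout, uint32_t wanted)
{
	uint32_t blocked = std::min(wanted, timeout.remaining);
	timeout.elapse(blocked);
	return blocked;
}
```

Passing the same ten-tick timeout to two calls that want four and eight ticks respectively lets the first block for four ticks and the second for only the six that remain, after which the timeout no longer permits blocking.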
@@ -115,12 +142,8 @@ \section{Sleeping}
\end{itemize}

The \c{thread_sleep} call supports both of these but understanding how they differ requires understanding a little of the scheduler's behaviour.
Recall that CHERIoT RTOS has a \keyword{tickless scheduler}.

Traditional OS schedulers from the earliest preemptive multitasking systems used a fixed scheduling quantum.
The scheduler would configure a periodic timer interrupt and would make a new scheduling decision at each interrupt or at explicit yields.
This fixed quantum is the origin of the tick abstraction in CHERIoT RTOS.

The CHERIoT RTOS scheduler is *tickless*.
This means that, although it uses ticks as an abstraction for defining scheduling quanta, it does not schedule a regular timer interrupt.
When two threads at the same priority level are runnable, the scheduler will request a timer interrupt to preempt the current one and switch to the other.
If the running thread has no peers, the scheduler will allow it to run until either it yields or another higher or equal-priority thread's timeout expires.
@@ -137,6 +160,50 @@ \section{Sleeping}
Even if no other threads are runnable, you have no useful work to do for a bit.
You can pass \c{ThreadSleepNoEarlyWake} as the \c{flags} argument to \c{thread_sleep} to indicate that you really want to sleep.

You can see the effect of sleeping in the \file{thread_sleep} example, as shown in \ref{lst:thread_sleep}.
This is a modified version of the \file{current_thread} example from earlier, now sleeping in each loop iteration.

\codelisting[filename=examples/thread_sleep/current.cc,marker=entry,label=lst:thread_sleep,caption="A simple example of thread sleeping"]{}

If you run this, you should see output that looks somewhat like this:

\begin{console}
Current thread: 2 of 2
Current thread: 1 of 2
Current thread: 2 of 2
Current thread: 1 of 2
Cycles elapsed: 262193
Cycles elapsed: 265806
\end{console}

As before, thread two runs first, but then it yields and allows thread one to run.
Thread one then yields and allows thread two to run, and so on.
If thread one did \textem{not} yield then it would be preempted after one tick.

\begin{note}
If you run this with the Sail simulator, do not be surprised if the cycle counts look very small.
Sail is not a cycle-accurate model and so the cycle count is guaranteed to be monotonic, but not to represent a real system in any way.
The output snippets in this section were produced with the Ibex simulator.
\end{note}

Try modifying this example, adding \c{ThreadSleepNoEarlyWake} as a second argument to the \c{thread_sleep} call.
You should now see output that looks very similar, but shows lower cycle counts at the end:

\begin{console}
Current thread: 2 of 2
Current thread: 1 of 2
Current thread: 2 of 2
Current thread: 1 of 2
Cycles elapsed: 249233
Cycles elapsed: 252273
\end{console}

Here you see that the total execution time has gone from 265,806 cycles to 252,273, a saving of 13,533 cycles.
In the original version, when thread one sleeps (after doing far less than one tick's worth of work), there are no runnable threads and so the scheduler does nothing for a while.
Eventually, thread two (the high-priority thread) is runnable again and it resumes.
In the version with \c{ThreadSleepNoEarlyWake}, thread two can resume as soon as thread one sleeps.
Similarly, when thread two yields for the second time, thread one will resume.

\section[label=futex]{Building locks with futexes}

The scheduler exposes a set of futex APIs as a building block for various notification and locking mechanisms.