index.html

<!DOCTYPE html>
<head>
  <meta charset="utf-8">
  <title>Store Halfword Byte-Reverse Indexed</title>
  <meta name="author" content="OzLabs">

  <link href="https://sthbrx.github.io/rss.xml" type="application/rss+xml" rel="alternate"
        title="Store Halfword Byte-Reverse Indexed RSS Feed" />


  <!-- http://t.co/dKP3o1e -->
  <meta name="HandheldFriendly" content="True">
  <meta name="MobileOptimized" content="320">
  <meta name="viewport" content="width=device-width, initial-scale=1">


    <link href="https://sthbrx.github.io/favicon.png" rel="icon">

  <link href="https://sthbrx.github.io/theme/css/main.css" media="screen, projection"
        rel="stylesheet" type="text/css">

  <link href="//fonts.googleapis.com/css?family=PT+Serif:regular,italic,bold,bolditalic"
        rel="stylesheet" type="text/css">
  <link href="//fonts.googleapis.com/css?family=PT+Sans:regular,italic,bold,bolditalic"
        rel="stylesheet" type="text/css">

  <script type="text/javascript">
document.addEventListener('DOMContentLoaded', function() {
  var ts = document.createElement('span')
  ts.className = 'toggle-sidebar'
  ts = document.getElementById('content').appendChild(ts);
  ts.addEventListener('click', function(e) {
    e.preventDefault();
    body = document.getElementsByTagName('body')[0];
    bodyClasses = body.classList.toggle('collapse-sidebar');
  });
  var sections = document.querySelectorAll('aside.sidebar > section');
  if (sections.length > 1) {
    for (index = 0; index < sections.length; index++) {
      section = sections[index];
      if ((sections.length >= 3) && index % 3 === 0) {
        section.classList.add("first");
      }
      var count = ((index +1) % 2) ? "odd" : "even";
      section.classList.add(count);
    }
  }
  if (sections.length >= 3) {
    document.querySelector('aside.sidebar').classList.add('thirds');
  }
});
  </script>
    <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-91189608-1', 'auto');

    ga('send', 'pageview');
    </script>
</head>

<body>
  <header role="banner"><hgroup>
  <h1><a href="https://sthbrx.github.io/">Store Halfword Byte-Reverse Indexed</a></h1>
    <h2>A Power Technical Blog</h2>
</hgroup></header>
  <nav role="navigation"><ul class="subscription" data-subscription="rss">
  <li><a href="https://sthbrx.github.io/rss.xml" rel="subscribe-rss">RSS</a></li>
</ul>


<ul class="main-navigation">
      <li >
        <a href="https://sthbrx.github.io/category/cryptography.html">Cryptography</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/development.html">Development</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/education.html">Education</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/openpower.html">OpenPOWER</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/performance.html">Performance</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/petitboot.html">Petitboot</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/snowpatch.html">snowpatch</a>
      </li>
      <li >
        <a href="https://sthbrx.github.io/category/virtualisation-and-emulation.html">Virtualisation and Emulation</a>
      </li>
</ul></nav>
  <div id="main">
    <div id="content">
<div class="blog-index">
  		<article>
<header>
      <h1 class="entry-title">
        <a href="https://sthbrx.github.io/blog/2023/08/07/going-out-on-a-limb-efficient-elliptic-curve-arithmetic-in-openssl/">Going out on a Limb: Efficient Elliptic Curve Arithmetic in OpenSSL</a>
      </h1>
    <p class="meta">
<time datetime="2023-08-07T12:00:00+10:00" pubdate>Mon 07 August 2023</time>    </p>
</header>

<div class="byline_index">
  <span class="byline author vcard">
    Posted by <span class="fn">
          <a href="https://sthbrx.github.io/author/rohan-mclure.html">Rohan McLure</a>
    </span>
  </span>
<time datetime="2023-08-07T12:00:00+10:00" pubdate>Mon 07 August 2023</time></div>

  <div class="entry-content"><p>So I've just managed to upstream some changes to OpenSSL for a <a href="https://github.com/openssl/openssl/blob/master/crypto/ec/ecp_nistp384.c">new strategy</a> I've developed for efficient arithmetic used in secp384r1, a curve prescribed by NIST for digital signatures and key exchange. In spite of its prevalence, its implementation in OpenSSL has remained somewhat unoptimised, even as less frequently used curves (P224, P256, P521) each have their own optimisations.</p>
<p>The strategy I have used could be called a 56-bit redundant limb implementation with <em>Solinas reduction</em>. Without too much micro-optimisation, we get ~5.5x speedup over the default (Montgomery Multiplication) implementation for creation of digital signatures.</p>
<p>How is this possible? Well first let's quickly explain some language:</p>
<h2>Elliptic Curves</h2>
<p>When it comes to cryptography, it's highly likely that those with a computer science background will be familiar with ideas such as key-exchange and private-key signing. The stand-in asymmetric cipher in a typical computer science curriculum is typically RSA. However, the heyday of Elliptic Curve ciphers has well and truly arrived, and their operation seems no less mystical than when they were just a toy for academia.</p>
<p>The word 'Elliptic' may seem to imply continuous mathematics. As a useful cryptographic problem, we fundamentally are just interested with the algebraic properties of these curves, whose points are elements of a <a href="https://en.wikipedia.org/wiki/Finite_field">finite field</a>. Irrespective of the underlying finite field, the algebraic properties of the elliptic curve group can be shown to exist by an application of <a href="https://en.wikipedia.org/wiki/Bézout%27s_theorem#:~:text=Bézout%27s%20theorem%20is%20a%20statement,the%20degrees%20of%20the%20polynomials.">Bézout's Theorem</a>. The <a href="https://en.wikipedia.org/wiki/Algebraic_group">group operator</a> on points on an elliptic curve for a particular choice of field involves the intersection of lines intersecting either once, twice or thrice with the curve, granting notions of addition and doubling for the points of intersection, and giving the 'point at infinity' as the group identity. A closed form exists for computing a point double/addition in arbitrary fields (different closed forms can apply, but determined by the field's <a href="https://en.wikipedia.org/wiki/Characteristic_(algebra)">characteristic</a>, and the same closed form applies for all large prime fields).</p>
<p>Our algorithm uses a field of the form <span class="math">\(\mathbb{F}_p\)</span>, that is the <a href="https://en.wikipedia.org/wiki/Finite_field#Existence_and_uniqueness">unique</a> field with <span class="math">\(p\)</span> (a prime) elements. The most straightforward construction of this field is arithmetic modulo <span class="math">\(p\)</span>. The other finite fields used in practise in ECC are of the form <span class="math">\(\mathbb{F}_{2^m}\)</span> and are sometimes called 'binary fields' (representible as polynomials with binary coefficients). Their field structure is also used in AES through byte substitution, implemented by inversion modulo <span class="math">\(\mathbb{F}_{2^8}\)</span>.</p>
<p>From a performance perspective, great optimisations can be made by implementing efficient fixed-point arithmetic specialised to modulo by single prime constant, <span class="math">\(p\)</span>. From here on out, I'll be speaking from this abstraction layer alone.</p>
<h2>Limbs</h2>
<p>We wish to multiply two <span class="math">\(m\)</span>-bit numbers, each of which represented with <span class="math">\(n\)</span> 64-bit machine words in some way. Let's suppose just for now that <span class="math">\(n\)</span> divides <span class="math">\(m\)</span> neatly, then the quotient <span class="math">\(d\)</span> is the minimum number of bits in each machine word that will be required for representing our number. Suppose we use the straightforward representation whereby the least significant <span class="math">\(d\)</span> bits are used for storing parts of our number, which we better call <span class="math">\(x\)</span> because this is crypto and descriptive variable names are considered harmful (apparently).</p>
<div class="math">$$x = \sum_{k = 0}^{n-1} 2^{dk} l_k$$</div>
<p>If we then drop the requirement for each of our <span class="math">\(n\)</span> machine words (also referred to as a 'limb' from hereon out) to have no more than the least significant <span class="math">\(d\)</span> bits populated, we say that such an implementation uses 'redundant limbs', meaning that the <span class="math">\(k\)</span>-th limb has high bits which overlap with the place values represented in the <span class="math">\((k+1)\)</span>-th limb.</p>
<h2>Multiplication (mod p)</h2>
<p>The fundamental difficulty with making modulo arithmetic fast is to do with the following property of multiplication.</p>
<p>Let <span class="math">\(a\)</span> and <span class="math">\(b\)</span> be <span class="math">\(m\)</span>-bit numbers, then <span class="math">\(0 \leq a &lt; 2^m\)</span> and <span class="math">\(0 \leq b &lt; 2^m\)</span>, but critically we cannot say the same about <span class="math">\(ab\)</span>. Instead, the best we can say is that <span class="math">\(0 \leq ab &lt; 2^{2m}\)</span>. Multiplication can in the worst case double the number of bits that must be stored, unless we can reduce modulo our prime.</p>
<p>If we begin with non-redundant, 56-bit limbs, then for <span class="math">\(a\)</span> and <span class="math">\(b\)</span> not too much larger than <span class="math">\(2^{384} &gt; p_{384}\)</span> that are 'reduced sufficiently' then we can multiply our limbs in the following ladder, so long as we are capable of storing the following sums without overflow.</p>
<div class="highlight"><pre><span></span><code><span class="w">    </span><span class="cm">/* and so on ... */</span>

<span class="w">    </span><span class="n">out</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">5</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>

<span class="w">    </span><span class="n">out</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">5</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">6</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>

<span class="w">    </span><span class="n">out</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">5</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">6</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>

<span class="w">    </span><span class="n">out</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">4</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">5</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="w">           </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">uint128_t</span><span class="p">)</span><span class="w"> </span><span class="n">in1</span><span class="p">[</span><span class="mi">6</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">in2</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>

<span class="w">    </span><span class="cm">/* ... and so forth */</span>
</code></pre></div>

<p>This is possible, if we back each of the 56-bit limbs with a 64-bit machine word, with products being stored in 128-bit machine words. The numbers <span class="math">\(a\)</span> and <span class="math">\(b\)</span> were able to be stored with 7 limbs, whereas we use 13 limbs for storing the product. If <span class="math">\(a\)</span> and <span class="math">\(b\)</span> were stored non-redundantly, than each of the output (redundant) limbs must contain values less than <span class="math">\(6 \cdot 2^{56} \cdot 2^{56} &lt; 2^{115}\)</span>, so there is no possibility of overflow in 128 bits. We even have room spare to do some additions/subtractions in cheap, redundant limb arithmetic.</p>
<p>But we can't keep doing our sums in redundant limb arithmetic forever, we must eventually reduce. Doing so may be expensive, and so we would rather reduce only when strictly necessary!</p>
<h2>Solinas-ish Reduction</h2>
<p>Our prime is a <em>Solinas</em> (<em>Pseudo/Generalised-Mersenne</em>) <em>Prime</em>. Mersenne Primes are primes expressible as <span class="math">\(2^m - 1\)</span>. This can be generalised to low-degree polynomials in <span class="math">\(2^m\)</span>. For example, another NIST curve uses <span class="math">\(p_{224} = 2^{224} - 2^{96} + 1\)</span> (a 224-bit number) where <span class="math">\(p_{224} = f(2^{32})\)</span> for <span class="math">\(f(t) = t^7 - t^3 + 1\)</span>. The simpler the choice of polynomial, the simpler the modular reduction logic.</p>
<p>Our choice of <span class="math">\(t\)</span> is <span class="math">\(2^{56}\)</span>. <a href="https://en.wikipedia.org/wiki/Solinas_prime#Modular_reduction_algorithm">Wikipedia</a> the ideal case for Solinas reduction where the bitwidth of the prime is divisible by <span class="math">\(\log_2{t}\)</span>, but that is not our scenario. We choose 56-bits for some pretty simple realities of hardware. 56 is less than 64 (standard machine word size) but not by too much, and the difference is byte-addressible (<span class="math">\(64-56=8\)</span>). Let me explain:</p>
<h2>Just the Right Amount of Reduction (mod p)</h2>
<p>Let's first describe the actual prime that is our modulus.</p>
<div class="math">$$p_{384} = 2^{384} - 2^{128} - 2^{96} + 2^{32} - 1$$</div>
<p>Yuck. This number is so yuck in fact, that noone has so far managed to upstream a Solinas' reduction method for it in OpenSSL, in spite of <code>secp384r1</code> being the preferred curve for ECDH (Elliptic Curve Diffie-Hellman key exchange) and ECDSA (Elliptic Curve Digital Signature Algorithm) by NIST.</p>
<p>In 56-bit limbs, we would express this number so:</p>
<p>Let <span class="math">\(f(t) = 2^{48} t^6 - 2^{16} t^2 - 2^{40} t + (2^{32} - 1)\)</span>, then observe that all coefficients are smaller than <span class="math">\(2^{56}\)</span>, and that <span class="math">\(p_{384} = f(2^{56})\)</span>.</p>
<p>Now let <span class="math">\(\delta(t) = 2^{16} t^2 + 2^{40} t - 2^{32} + 1\)</span>, consider that <span class="math">\(p_{384} = 2^{384} - \delta(2^{56})\)</span>, and thus <span class="math">\(2^{384} \equiv \delta(2^{56}) \mod{p_{384}}\)</span>. From now on let's call <span class="math">\(\delta(2^{56})\)</span> just <span class="math">\(\delta\)</span>. Thus, 'reduction' can be achieved as follows for suitable <span class="math">\(X\)</span> and <span class="math">\(Y\)</span>:</p>
<div class="math">$$ab = X + 2^{384} Y \equiv X + \delta Y \mod{p_{384}}$$</div>
<h3>Calculating <span class="math">\(\delta Y\)</span></h3>
<h4>First Substitution</h4>
<p>First make a choice of <span class="math">\(X\)</span> and <span class="math">\(Y\)</span>. The first thing to observe here is that this can actually be made a large number of ways! We choose:</p>
<div class="math">$$X_1 = \sum_{k=0}^8\texttt{in[k]} t^k$$</div>
<div class="math">$$Y_1 = 2^8 t^2 \sum_{k=9}^{12}\texttt{in[k]} t^{k-9} = 2^8 \sum_{k=9}^{12}\texttt{in[k]} t^{k-7}$$</div>
<p>'Where does the <span class="math">\(2^8 t^{2}\)</span> come from?' I hear you ask. See <span class="math">\(t^9 = t^2 \cdot t^7 = t^2 (2^8 \cdot 2^{384}) \equiv (2^8 t^2) \delta \mod{f(t)}\)</span>. It's clear to see that the place value of <code>in[9] ... in[12]</code> is greater than <span class="math">\(2^{384}\)</span>.</p>
<p>I'm using the subscripts here because we're in fact going to do a series of these reductions to reach a suitably small answer. That's because our equation for reducing <span class="math">\(t^7\)</span> terms is as follows:</p>
<div class="math">$$t^7 \equiv 2^8\delta \equiv 2^{24} t^2 + 2^{48} t + (-2^{40} + 2^8) \mod{f(t)}$$</div>
<p>Thus reducing <code>in[12]</code> involves computing:</p>
<div class="math">$$\texttt{in[12]} t^{12} = \texttt{in[12]} (t^5)(t^7) \equiv 2^8\delta \cdot \texttt{in[12]} t^5 \mod{f(t)}$$</div>
<p>But <span class="math">\(\delta\)</span> is a degree two polynomial, and so our numbers can still have two more limbs than we would want them to have. To be safe, let's store <span class="math">\(X_1 + \delta Y_1\)</span> in accumulator limbs <code>acc[0] ... acc[8]</code> (this will at first appear to be one more limb than necessary), then we can eliminate <code>in[12]</code> with the following logic.</p>
<div class="highlight"><pre><span></span><code><span class="w">    </span><span class="cm">/* assign accumulators to begin */</span>
<span class="w">    </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">9</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="w">        </span><span class="n">acc</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">in</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>

<span class="w">    </span><span class="cm">/* X += 2^128 Y */</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">32</span><span class="p">;</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="p">(</span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0xffffffff</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">24</span><span class="p">;</span>

<span class="w">    </span><span class="cm">/* X += 2^96 Y */</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">8</span><span class="p">;</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="p">(</span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0xff</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">48</span><span class="p">;</span>

<span class="w">    </span><span class="cm">/* X += (-2^32 + 1) Y */</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">16</span><span class="p">;</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="p">((</span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0xffff</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">40</span><span class="p">);</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">48</span><span class="p">;</span>
<span class="w">    </span><span class="n">acc</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="p">(</span><span class="n">in</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0xffffffffffff</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">8</span><span class="p">;</span>
</code></pre></div>

<p>Notice that for each term in <span class="math">\(\delta = 2^{128} + 2^{96} + (2^{32} - 1)\)</span> we do two additions/subtractions. This is in order to split up operands in order to minimise the final size of numbers and prevent over/underflows. Consequently, we need an <code>acc[8]</code> to receive the high bits of our <code>in[12]</code> substitution given above.</p>
<h4>Second Substitution</h4>
<p>Let's try and now eliminate through substitution <code>acc[7]</code> and <code>acc[8]</code>. Let</p>
<div class="math">$$X_2 = \sum^{6}_{k=0}\texttt{acc[k]}t^k $$</div>
<div class="math">$$Y_2 = 2^8(\texttt{acc[7]} t^7 + \texttt{acc[8]} t^8)$$</div>
<p>But this time, <span class="math">\(\delta Y_2\)</span> is a number that comfortably can take up just five limbs, so we can update <code>acc[0], ..., acc[5]</code> comfortably in-place.</p>
<h4>Third Substitution</h4>
<p>Finally, let's reduce all the high bits of <code>in[6]</code>. Since <code>in[6]</code> has place value <span class="math">\(t^6 = 2^{336}\)</span>, thus we wish to reduce all but the least significant <span class="math">\(384 - 336 = 48\)</span> bits.</p>
<p>A goal in designing this algorithm is to ensure that <code>acc[6]</code> has as tight a bound as reasonably possible. Intuitively, if we can cause <code>acc[6]</code> to be as large as possible by absorbing the high bits of lower limbs, we reduce the number of bits that must be carried forward later on. As such, we perform a carry of the high-bits of <code>acc[4]</code>, <code>acc[5]</code> into <code>acc[6]</code> before we begin our substitution.</p>
<p>Again, let</p>
<div class="math">$$X_3 = \sum^{5}_{k=0}\texttt{acc[k]}t^k + (\texttt{acc[6]} \text{(low bits)})t^6$$</div>
<div class="math">$$Y_3 = 2^{48}(\texttt{acc[6]} \text{(high bits, right shifted)}) t^6$$</div>
<p>The equation for eliminating <span class="math">\(2^{48}t^6\)</span> is pretty straightforward:</p>
<div class="math">$$2^{384} = 2^{48}t^6 \equiv 2^{16}t^2 + 2^{40}t + (-2^{32} + 1) \mod{f(t)}$$</div>
<h4>Carries</h4>
<p>Finally, as each of <code>acc[0], ..., acc[6]</code> can contain values larger than <span class="math">\(2^{56}\)</span>, we carry their respective high bits into <code>acc[6]</code> so as to remove any redundancy. Conveniently, our preemptive carrying before the third substitution has granted us a pretty tight bound on our final calculation - the final reduced number has the range <span class="math">\([0, 2^{384}]\)</span>.</p>
<h4>Canonicalisation</h4>
<p>This is 'just the right amount of reduction' but not <em>canonicalisation</em>. That is, since <span class="math">\(0 &lt; p_{384} &lt; 2^{384}\)</span>, there can be multiple possible reduced values for a given congruence class. <code>felem_contract</code> is a method which uses the fact that <span class="math">\(0  \leq x &lt; 2 p_{384}\)</span> to further reduce the output of <code>felem_reduce</code> into the range <span class="math">\([0, p_{384})\)</span> in constant time.</p>
<p>This code has many more dragons I won't explain here, but the basic premise to the calculations performed there is as follows:</p>
<p>Given a 385 bit input, checking whether our input (expressed as a concatenation of bits) <span class="math">\(b_{384}b_{383} \ldots b_1b_0\)</span> is greater than or equal to <span class="math">\(p_{384}\)</span> whose bits we denote <span class="math">\(q_{384}, \ldots, q_0\)</span> (<span class="math">\(q_{384} = 0\)</span>) is determined by the following logical predicate (<span class="math">\(G(384)\)</span>):</p>
<div class="math">$$G(k) \equiv (b_k \land \lnot q_k) \lor ((b_k = q_k) \land G(k-1))$$</div>
<div class="math">$$G(0) \equiv b_k = q_k$$</div>
<p>With <span class="math">\(p_{384}\)</span> being a Solinas'/Pseudo-Mersenne Prime, it has a large number of contiguous runs of repeated bits, so we can of course use this to massively simplify our predicate. Doing this in constant time involves some interesting bit-shifting/masking schenanigans. Essentially, you want a bit vector of all ones/zeros depending on the value of <span class="math">\(G(384)\)</span>, we then logically 'and' with this bitmask to 'conditionally' subtract <span class="math">\(p_{384}\)</span> from our result.</p>
<h3>A Side Note about the Weird Constants</h3>
<p>Okay so we're implementing our modular arithmetic with unsigned integer limbs that together represent a number of the following form:</p>
<div class="math">$$x = \sum_{k = 0}^{n-1} 2^{dk} l_k$$</div>
<p>How do we then do subtractions in a way which will make overflow impossible? Well computing <span class="math">\(a - b\)</span> is really straightforward if every limb of <span class="math">\(a\)</span> is larger than every limb of <span class="math">\(b\)</span>. We then add a suitable multiple of <span class="math">\(p_{384}\)</span> to <span class="math">\(a\)</span> that causes each limb of <span class="math">\(a\)</span> to be sufficiently large.</p>
<p>Thankfully, with redundant-limb arithmetic, we can do this easily by means of <em>telescopic sums</em>. For example, in <code>felem_reduce</code> we wanted all limbs of our <span class="math">\(p_{384}\)</span> multiple to be sufficiently large. We overshot any requirement and provided such a multiple which gives a lower bound <span class="math">\(2^{123}\)</span>. We first scale our prime accordingly so that its 'lead term' (speaking in the polynomial representation) is <span class="math">\(2^{124}\)</span>.</p>
<div class="math">$$2^{76} f(t) = 2^{124} t^6 - 2^{92} t^2 - 2^{116} t + (2^{108} - 2^{76}) t^0$$</div>
<p>Notice that most limbs of this multiple (the limbs will be the coefficients) are either too small or negative. We then transform this expression into a suitable telescopic sum. Observe that when <span class="math">\(t = 2^{56}\)</span>, <span class="math">\(2^{124} t^k = 2^{124-56}t^{k+1} = 2^{68} t^{k+1}\)</span>, and so simply introduce into each limb where required a <span class="math">\(2^{124}\)</span> term by means of addition, subtracting the same number from a higher limb.</p>
<div class="math">$$
\begin{align*}
2^{76} f(t) &amp;= (2^{124} - 2^{68}) t^6 \\
            &amp;+ (2^{124} - 2^{68}) t^5 \\
            &amp;+ (2^{124} - 2^{68}) t^4 \\
            &amp;+ (2^{124} - 2^{68}) t^3 \\
            &amp;+ (2^{124} - 2^{92} - 2^{68}) t^2 \\
            &amp;+ (2^{124} - 2^{116} - 2^{68}) t \\
            &amp;+ (2^{124} + 2^{108} - 2^{76})
\end{align*}
$$</div>
<p>We can then subtract values whose limbs are no larger than the least of these limbs above without fear of underflows providing us with an incorrect result. In our case, that upper bound for limb value is <span class="math">\(2^{124} - 2^{116} - 2^{68} &gt; 2^{123}\)</span>. Very comfortable.</p>
<h2>Concerning Timing Side-Channels</h2>
<p>Cryptographic routines must perform all of their calculations in constant time. More specifically, it is important that timing cryptography code should not reveal any private keys or random nonces used during computation. Ultimately, all of our work so far has been to speed up field arithmetic in the modulo field with prime <span class="math">\(p_{384}\)</span>. But this is done in order to facilitate calculations in the secp384r1 elliptic curve, and ECDSA/ECDH each depend on being able to perform scalar 'point multiplication' (repeat application of the group operator). Since such an operation is inherently iterative, it presents the greatest potential for timing attacks.</p>
<p>We implement constant-time multiplication with the <em>wNAF</em> ladder method. This relies on pre-computing a window of multiples of the group generator, and then scaling and selectively adding multiples when required. <a href="https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Point_multiplication">Wikipedia</a> provides a helpful primer to this method  by cumulatively building upon more naive approaches.</p>
<h2>Conclusion</h2>
<p>While the resulting code borrows from and uses common language of Solinas reduction, ultimately there are a number of implementation decisions that were guided by heuristic - going from theory to implementation was far from cut-and-dry. The limb size, carry order, choice of substitutions as well as pre and post conditions made here are ultimately arbitrary. You could easily imagine there being further refinements obtaining a better result. For now, I hope this post serves to demystify the inner workings of ECC implementations in OpenSSL. These algorithms, although particular and sophisticated, need not be immutable.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
    var align = "center",
        indent = "0em",
        linebreak = "false";

    if (false) {
        align = (screen.width < 768) ? "left" : align;
        indent = (screen.width < 768) ? "0em" : indent;
        linebreak = (screen.width < 768) ? 'true' : linebreak;
    }

    var mathjaxscript = document.createElement('script');
    mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
    mathjaxscript.type = 'text/javascript';
    mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';

    var configscript = document.createElement('script');
    configscript.type = 'text/x-mathjax-config';
    configscript[(window.opera ? "innerHTML" : "text")] =
        "MathJax.Hub.Config({" +
        "    config: ['MMLorHTML.js']," +
        "    TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
        "    jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
        "    extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
        "    displayAlign: '"+ align +"'," +
        "    displayIndent: '"+ indent +"'," +
        "    showMathMenu: true," +
        "    messageStyle: 'normal'," +
        "    tex2jax: { " +
        "        inlineMath: [ ['\\\\(','\\\\)'] ], " +
        "        displayMath: [ ['$$','$$'] ]," +
        "        processEscapes: true," +
        "        preview: 'TeX'," +
        "    }, " +
        "    'HTML-CSS': { " +
        "        availableFonts: ['STIX', 'TeX']," +
        "        preferredFont: 'STIX'," +
        "        styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
        "        linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
        "    }, " +
        "}); " +
        "if ('default' !== 'default') {" +
            "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
            "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
        "}";

    (document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
    (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script><script type='text/javascript'>if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
    var align = "center",
        indent = "0em",
        linebreak = "false";

    if (false) {
        align = (screen.width < 768) ? "left" : align;
        indent = (screen.width < 768) ? "0em" : indent;
        linebreak = (screen.width < 768) ? 'true' : linebreak;
    }

    var mathjaxscript = document.createElement('script');
    mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
    mathjaxscript.type = 'text/javascript';
    mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';

    var configscript = document.createElement('script');
    configscript.type = 'text/x-mathjax-config';
    configscript[(window.opera ? "innerHTML" : "text")] =
        "MathJax.Hub.Config({" +
        "    config: ['MMLorHTML.js']," +
        "    TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
        "    jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
        "    extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
        "    displayAlign: '"+ align +"'," +
        "    displayIndent: '"+ indent +"'," +
        "    showMathMenu: true," +
        "    messageStyle: 'normal'," +
        "    tex2jax: { " +
        "        inlineMath: [ ['\\\\(','\\\\)'] ], " +
        "        displayMath: [ ['$$','$$'] ]," +
        "        processEscapes: true," +
        "        preview: 'TeX'," +
        "    }, " +
        "    'HTML-CSS': { " +
        "        availableFonts: ['STIX', 'TeX']," +
        "        preferredFont: 'STIX'," +
        "        styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
        "        linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
        "    }, " +
        "}); " +
        "if ('default' !== 'default') {" +
            "MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
            "MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
                "var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
                "VARIANT['normal'].fonts.unshift('MathJax_default');" +
                "VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
                "VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
                "VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
            "});" +
        "}";

    (document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
    (document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script></div>
  <footer>
    <a rel="full-article" href="https://sthbrx.github.io/blog/2023/08/07/going-out-on-a-limb-efficient-elliptic-curve-arithmetic-in-openssl/">Read On &crarr;</a>
  </footer>
  		</article>
  		<article>
<header>
      <h1 class="entry-title">
        <a href="https://sthbrx.github.io/blog/2023/08/04/quirks-of-parsing-ssh-configs/">Quirks of parsing SSH configs</a>
      </h1>
    <p class="meta">
<time datetime="2023-08-04T18:00:00+10:00" pubdate>Fri 04 August 2023</time>    </p>
</header>

<div class="byline_index">
  <span class="byline author vcard">
    Posted by <span class="fn">
          <a href="https://sthbrx.github.io/author/benjamin-gray.html">Benjamin Gray</a>
    </span>
  </span>
<time datetime="2023-08-04T18:00:00+10:00" pubdate>Fri 04 August 2023</time></div>

  <div class="entry-content"><h2>Introduction</h2>
<p>I've been using the VSCodium 
<a href="https://open-vsx.org/extension/jeanp413/open-remote-ssh">Open Remote - SSH</a>
extension recently to great results. I can treat everything as a single
environment, without any worry about syncing between my local development files
and the remote. This is very different to mounting the remote as a network drive
and opening a local instance of VSCodium on it: in addition to crippling latency
on every action, a locally mounted drive doesn't bring the build context that
tools like <code>clangd</code> require (e.g., system headers).</p>
<p>Instead, the remote extension runs a server on the remote that performs most
actions, and the local VSCodium instance acts as a client that buffers and
caches data seamlessly, so the experience is nearly as good as developing
locally. </p>
<p>For example, a project wide file search on a network drive is unusably slow
because every file and directory read requires a round trip back to the remote,
and the latency is just too large to finish getting results back in a reasonable
time. But with the client-server approach, the client just sends the search
request to the server for it to fulfil, and all the server has to do is send the
matches back. This eliminates nearly all the latency effects, except for the
initial request and receiving any results.</p>
<p>However there has been one issue with using this for everything: the extension
failed to connect when I wasn't on the same network as the host machine. So I
wasn't able to use it when working from home over a VPN. In this post we find
out why this happened, and in the process look at some of the weird quirks of
parsing an SSH config.</p>
<h2>The issue</h2>
<p>As above, I wasn't able to connect to my remote machines when working from home.
The extension would abort with the following error:</p>
<div class="highlight"><pre><span></span><code>[Error  - 00:23:10.592] Error resolving authority
Error: getaddrinfo ENOTFOUND remotename.ozlabs.ibm.com
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)
</code></pre></div>

<p>So it's a DNS issue. This would make sense, as the remote machine is not exposed
to the internet, and must instead be accessed through a proxy. What's weird is
that the integrated terminal in VSCodium has no problem connecting to the
remote. So the extension seems to be doing something different than just a plain
SSH connection.</p>
<p>You might think that the extension is not reading the SSH config. But the
extension panel lists all the host aliases I've declared in the config, so it's
clearly aware of the config at least. Possibly it doesn't understand the proxy
config correctly? If it was trying to connect directly from the host, it would
make sense to fail a DNS lookup.</p>
<h2>Investigating</h2>
<p>Enough theorising, time to debug the extension as it tries to connect.</p>
<p>From the error above, the string <code>"Error resolving authority"</code> looks like
something I can search for. This takes me to the
<a href="https://github.com/jeanp413/open-remote-ssh/blob/521098e24f48b4b9e04d476895f9097b03f8c984/src/authResolver.ts#L226"><code>catch</code> case for a large try-catch block</a>.
It could be annoying to narrow down which part of the block
throws the exception, but fortunately debugging is as easy as installing the
dependencies and running the pre-configured 'Extension' debug target. This opens
a new window with the local copy of the extension active, and I can debug it in
the original window.</p>
<p>In this block, there is a conditional statement on whether the <code>ProxyJump</code> field
is present in the config. This is a good place to break on and see what the
computed config looks like. If it doesn't find a proxy then of course it's going
to run everything on the host.</p>
<p>And indeed, it doesn't think there is a proxy. This is progress, but why does
the extension's view of the config not match up with what SSH does? After all,
invoking SSH directly connects properly. Tracing back the source of the config
in the extension, it ultimately comes from manually reading in and parsing the
SSH config. When resolving the host argument it manually computes the config as
per <a href="https://man7.org/linux/man-pages/man5/ssh_config.5.html"><code>ssh_config(5)</code></a>.
Yet somewhere it makes a mistake, because it doesn't include the <code>ProxyJump</code>
field.</p>
<h2>Parsing SSH config</h2>
<p>To get to the bottom of this, we need to know the rules behind parsing SSH
configs. The <code>ssh_config(5)</code> manpage does a pretty decent job of explaining
this, but I'm going to go over the relevant information here. I reckon most
people have a vague idea of how it works, and can write enough to meet their
needs, but have never looked deeper into the actual rules behind how SSH parses
the config.</p>
<ol>
<li>
<p>For starters, the config is parsed line by line. Leading whitespace (i.e.,
   indentation) is ignored. So, while indentation makes it look like you are
   configuring properties for a particular host, this isn't quite correct.
   Instead, the <code>Host</code> and <code>Match</code> lines are special statements that enable or
   disable all subsequent lines until the next <code>Host</code> or <code>Match</code>.</p>
<p>There is no backtracking; previous conditions and lines are not re-evaluated
after learning more about the config later on.</p>
</li>
<li>
<p>When a config line is seen, and is active thanks to the most recent <code>Host</code> or
   <code>Match</code> succeeding, its value is selected if it is the first of that config
   to be selected. So the earliest place a value is set takes priority; this may
   be a little counterintuitive if you are used to having the latest value be
   picked, like enable/disable command line flags tend to work.</p>
</li>
<li>
<p>When <code>HostName</code> is set, it replaces the <code>host</code> value in <code>Match</code> matches. It
   is also used as the <code>Host</code> value during a final pass (if requested).</p>
</li>
<li>
<p>The last behaviour of interest is the <code>Match final</code> rule. There are several
   conditions a <code>Match</code> statement can have, and the <code>final</code> rule says make this
   active on the final pass over the config.</p>
</li>
</ol>
<p>Wait, final pass? Multiple passes? Yes. If <code>final</code> is a condition on a <code>Match</code>,
SSH will do another pass over the entire config, following all the rules above.
Except this time all the configs we read on the first pass are still active (and
can't be changed). But all the <code>Host</code> and <code>Matches</code> are re-evaluated, allowing
other configs to potentially be set. I guess that means rule (1) ought to have a
big asterisk next to it.</p>
<p>Together, these rules can lead to some quirky behaviours. Consider the following
config</p>
<div class="highlight"><pre><span></span><code>Match host=&quot;*.ozlabs.ibm.com&quot;
    ProxyJump proxy

Host example
    HostName example.ozlabs.ibm.com
</code></pre></div>

<p>If I run <code>ssh example</code> on the command line, will it use the proxy?</p>
<p>By rule (1), no. When testing the first <code>Match host</code> condition, our host value
is currently <code>example</code>. It is not until we reach the <code>HostName</code> config that we
start using <code>example.ozlabs.ibm.com</code> for these matches.</p>
<p>But by rule (4), the answer turns into <em>maybe</em>. If we end up doing a second pass
over the config thanks to a <code>Match final</code> that could be <em>anywhere</em> else, we
would now be matching <code>example.ozlabs.ibm.com</code> against the first line on the
second go around. This will pass, and, since nothing has set <code>ProxyJump</code> yet, we
would gain the proxy.</p>
<p>You may think, yes, but we don't have a <code>Match final</code> in that example. But if
you thought that, then you forgot about the system config.</p>
<p>The system config is effectively appended to the user config, to allow any
system wide settings. Most of the time this isn't an issue because of the
first-come-first-served rule with config matches (rule 2). But if the system
config includes a <code>Match final</code>, it will trigger the entire config to be
re-parsed, including the user section. And it so happens that, at least on
Fedora with the <code>openssh-clients</code> package installed, the system config does
contain a <code>Match final</code> (see <code>/etc/ssh/ssh_config.d</code>).</p>
<p>But wait, there's more! If we want to specify a custom SSH config file, then we
can use <code>-F path/to/config</code> in the command line. But this disables loading a
system config, so we would no longer get the proxy!</p>
<p>To sum up, for the above config:</p>
<ol>
<li><code>ssh example</code> doesn't have a proxy</li>
<li>...unless a system config contains <code>Match final</code></li>
<li>...but invoking it as <code>ssh -F ~/.ssh/config example</code> definitely won't have
   the proxy</li>
<li>...but if a subprocess invokes <code>ssh example</code> while trying to resolve another
   host, it'll probably not add the <code>-F ~/.ssh/config</code>, so we might get the
   proxy again (in the child process).</li>
</ol>
<p>Wait, how did that last one slip in? Well, unlike environment variables, it's a
lot harder for processes to propagate command line flags correctly. If resolving
the config involves running a script that itself tries to run SSH, chances are
the <code>-F</code> flag won't be propagated and you'll see some weird behaviour.</p>
<p>I swear that's all for now, you've probably learned more about SSH configs than
you will ever need to care about.</p>
<h2>Back to VSCodium</h2>
<p>Alright, armed now with this knowledge on SSH config parsing, we can work out
what's going on with the extension. It ends up being a simple issue: it doesn't
apply rules (3) and (4), so all <code>Host</code> matches are done against the original
host name.</p>
<p>In my case, there are several machines behind the proxy, but they all share a
common suffix, so I had a <code>Host *.ozlabs.ibm.com</code> rule to apply the proxy. I
also use aliases to refer to the machines without the <code>.ozlabs.ibm.com</code> suffix,
so failing to follow rule (3) lead to the situation where the extension didn't
think there was a proxy.</p>
<p>However, even if this were to be fixed, it still doesn't respect rule (4), or
most complex match logic in general. If the hostname bug is fixed then my setup
would work, but it's less than ideal to keep playing whack-a-mole with parsing
bugs. It would be a lot easier if there was a way to just ask SSH for the config
that a given host name resolves to.</p>
<p>Enter <code>ssh -G</code>. The <code>-G</code> flag asks SSH to dump the complete resolved config,
without actually opening the connection (it may execute arbitrary code while
resolving the config however!). So to fix the extension once and for all, we
could swap the manual parser to just invoking <code>ssh -G example</code>, and parsing the
output as the final config. No <code>Host</code> or <code>Match</code> or <code>HostName</code> or <code>Match final</code>
quirks to worry about.</p>
<p>Sure enough, if we replace the config backend with this 'native' resolver, we
can connect to all the machines with no problem. Hopefully the
<a href="https://github.com/jeanp413/open-remote-ssh/pull/103">pull request</a> to add this
support will get accepted, and I can stop running my locally patched copy of the
extension.</p>
<p>In general, I'd suggest avoiding any dependency on a second pass being done on
the config. Resolve your aliases early, so that the rest of your matches work
against the full hostname. If you later need to match against the name passed in
the command line, you can use <code>Match originalhost=example</code>. The example above
should always be written as</p>
<div class="highlight"><pre><span></span><code>Host example
    HostName example.ozlabs.ibm.com

Match host=&quot;*.ozlabs.ibm.com&quot;
    ProxyJump proxy
</code></pre></div>

<p>even if the reversed order might appear to work thanks to the weird interactions
described above. And after learning these parser quirks, I find the idea of
using <code>Host</code> match statements unreliable; that they may or may not be run
against the <code>HostName</code> value allows for truely strange bugs to appear. Maybe you
should remove this uncertainty by starting your config with <code>Match final</code> to at
least always be parsed the same way.</p></div>
  		</article>
  		<article>
<header>
      <h1 class="entry-title">
        <a href="https://sthbrx.github.io/blog/2023/04/05/detecting-rootless-docker/">Detecting rootless Docker</a>
      </h1>
    <p class="meta">
<time datetime="2023-04-05T13:00:00+10:00" pubdate>Wed 05 April 2023</time>    </p>
</header>

<div class="byline_index">
  <span class="byline author vcard">
    Posted by <span class="fn">
          <a href="https://sthbrx.github.io/author/andrew-donnellan.html">Andrew Donnellan</a>
    </span>
  </span>
<time datetime="2023-04-05T13:00:00+10:00" pubdate>Wed 05 April 2023</time></div>

  <div class="entry-content"><h2>Trying to do some fuzzing...</h2>
<p>The other day, for the first time in a while, I wanted to do something with <a href="https://github.com/google/syzkaller">syzkaller</a>, a system call fuzzer that has been used to find literally thousands of kernel bugs. As it turns out, since the last time I had done any work on syzkaller, I switched to a new laptop, and so I needed to set up a few things in my development environment again.</p>
<p>While I was doing this, I took a look at the syzkaller source again and found a neat little script called <a href="https://github.com/google/syzkaller/blob/master/tools/syz-env"><code>syz-env</code></a>, which uses a Docker image to provide you with a standardised environment that has all the necessary tools and dependencies preinstalled.</p>
<p>I decided to give it a go, and then realised I hadn't actually installed Docker since getting my new laptop. So I went to do that, and along the way I discovered <a href="https://docs.docker.com/engine/security/rootless/">rootless mode</a>, and decided to give it a try.</p>
<h2>What's rootless mode?</h2>
<p>As of relatively recently, Docker supports rootless mode, which allows you to run your <code>dockerd</code> as a non-root user. This is helpful for security, as traditional "rootful" Docker can trivially be used to obtain root privileges outside of a container. Rootless Docker is implemented using <a href="https://github.com/rootless-containers/rootlesskit">RootlessKit</a> (a fancy replacement for <a href="https://wiki.debian.org/FakeRoot">fakeroot</a> that uses user namespaces) to create a new user namespace that maps the UID of the user running <code>dockerd</code> to 0.</p>
<p>You can find more information, including details of the various restrictions that apply to rootless setups, <a href="https://docs.docker.com/engine/security/rootless/">in the Docker documentation</a>.</p>
<h2>The problem</h2>
<p>I ran <code>tools/syz-env make</code> to test things out. It pulled the container image, then gave me some strange errors:</p>
<div class="highlight"><pre><span></span><code>ajd@jarvis-debian:~/syzkaller$ tools/syz-env make NCORES=1
gcr.io/syzkaller/env:latest
warning: Not a git repository. Use --no-index to compare two paths outside a working tree
usage: git diff --no-index [&lt;options&gt;] &lt;path&gt; &lt;path&gt;

    ...

fatal: detected dubious ownership in repository at &#39;/syzkaller/gopath/src/github.com/google/syzkaller&#39;
To add an exception for this directory, call:

        git config --global --add safe.directory /syzkaller/gopath/src/github.com/google/syzkaller
fatal: detected dubious ownership in repository at &#39;/syzkaller/gopath/src/github.com/google/syzkaller&#39;
To add an exception for this directory, call:

        git config --global --add safe.directory /syzkaller/gopath/src/github.com/google/syzkaller
go list -f &#39;{{.Stale}}&#39; ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
error obtaining VCS status: exit status 128
        Use -buildvcs=false to disable VCS stamping.
error obtaining VCS status: exit status 128
        Use -buildvcs=false to disable VCS stamping.
make: *** [Makefile:155: descriptions] Error 1
</code></pre></div>

<p>After a bit of digging, I found that <code>syz-env</code> mounts the syzkaller source directory inside the container as a volume. <code>make</code> was running with UID 1000, while the files in the mounted volume appeared to be owned by root.</p>
<p>Reading the script, it turns out that <code>syz-env</code> invokes <code>docker run</code> with the <code>--user</code> option to set the UID inside the container to match the user's UID outside the container, to ensure that file ownership and permissions behave as expected.</p>
<p>This works in rootful Docker, where files appear inside the container to be owned by the same UID as they are outside the container. However, it breaks in rootless mode: due to the way RootlessKit sets up the namespaces, the user's UID is mapped to 0, causing the files to appear to be owned by root.</p>
<p>The workaround seemed pretty obvious: just skip the <code>--user</code> flag if running rootless.</p>
<h2>How can you check whether your Docker daemon is running in rootless mode?</h2>
<p>It took me quite a while, as a total Docker non-expert, to figure out how to definitively check whether the Docker daemon is running rootless or not. There's a variety of ways you could do this, such as checking the name of the current Docker context to see if it's called <code>rootless</code> (as used by the Docker rootless setup scripts), but I think the approach I settled on is the most correct one.</p>
<p>If you want to check whether your Docker daemon is running in rootless mode, use <code>docker info</code> to query the daemon's security options, and check for the <code>rootless</code> option.</p>
<div class="highlight"><pre><span></span><code>docker info -f &quot;{{println .SecurityOptions}}&quot; | grep rootless
</code></pre></div>

<p>If this prints something like:</p>
<div class="highlight"><pre><span></span><code>[name=seccomp,profile=builtin name=rootless name=cgroupns]
</code></pre></div>

<p>then you're running rootless.</p>
<p>If not, then you're running the traditional rootful.</p>
<p>Easy! (And I sent a fix which is now <a href="https://github.com/google/syzkaller/commit/340a1b9094e4b3fad232c98c62de653ec48954ab">merged into syzkaller!</a>)</p></div>
  		</article>
  		<article>
<header>
      <h1 class="entry-title">
        <a href="https://sthbrx.github.io/blog/2023/04/04/dumb-bugs-the-pci-device-that-wasnt/">Dumb bugs: the PCI device that wasn't</a>
      </h1>
    <p class="meta">
<time datetime="2023-04-04T15:55:00+10:00" pubdate>Tue 04 April 2023</time>    </p>
</header>

<div class="byline_index">
  <span class="byline author vcard">
    Posted by <span class="fn">
          <a href="https://sthbrx.github.io/author/russell-currey.html">Russell Currey</a>
    </span>
  </span>
<time datetime="2023-04-04T15:55:00+10:00" pubdate>Tue 04 April 2023</time></div>

  <div class="entry-content"><p>I was happily minding my own business one fateful afternoon when I received the following kernel bug report:</p>
<div class="highlight"><pre><span></span><code>BUG: KASAN: slab-out-of-bounds in vga_arbiter_add_pci_device+0x60/0xe00
Read of size 4 at addr c000000264c26fdc by task swapper/0/1

Call Trace:
dump_stack_lvl+0x1bc/0x2b8 (unreliable)
print_report+0x3f4/0xc60
kasan_report+0x244/0x698
__asan_load4+0xe8/0x250
vga_arbiter_add_pci_device+0x60/0xe00
pci_notify+0x88/0x444
notifier_call_chain+0x104/0x320
blocking_notifier_call_chain+0xa0/0x140
device_add+0xac8/0x1d30
device_register+0x58/0x80
vio_register_device_node+0x9ac/0xce0
vio_bus_scan_register_devices+0xc4/0x13c
__machine_initcall_pseries_vio_device_init+0x94/0xf0
do_one_initcall+0x12c/0xaa8
kernel_init_freeable+0xa48/0xba8
kernel_init+0x64/0x400
ret_from_kernel_thread+0x5c/0x64
</code></pre></div>

<p>OK, so <a href="https://www.kernel.org/doc/html/latest/dev-tools/kasan.html">KASAN</a> has helpfully found an out-of-bounds access in <code>vga_arbiter_add_pci_device()</code>.  What the heck is that?</p>
<h2>Why does my VGA require arbitration?</h2>
<p>I'd never heard of the <a href="https://en.wikipedia.org/wiki/VGA_connector">VGA</a> arbiter in the kernel (do kids these days know what VGA is?), or <code>vgaarb</code> as it's called.  What it does is irrelevant to this bug, but I found the history pretty interesting!  <a href="https://lists.freedesktop.org/archives/xorg/2005-March/006663.html">Benjamin Herrenschmidt proposed VGA arbitration back in 2005</a> as a way of resolving conflicts between multiple legacy VGA devices that want to use the same address assignments.  This was previously handled in userspace by the X server, but issues arose with multiple X servers on the same machine.  Plus, it's probably not a good idea for this kind of thing to be handled by userspace.  <a href="https://docs.kernel.org/gpu/vgaarbiter.html">You can read more about the VGA arbiter in the kernel docs</a>, but it's probably not something anyone has thought much about in a long time.</p>
<h2>The bad access</h2>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">vga_arbiter_add_pci_device</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">pci_dev</span><span class="w"> </span><span class="o">*</span><span class="n">pdev</span><span class="p">)</span>
<span class="p">{</span>
<span class="w">        </span><span class="k">struct</span><span class="w"> </span><span class="nc">vga_device</span><span class="w"> </span><span class="o">*</span><span class="n">vgadev</span><span class="p">;</span>
<span class="w">        </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">flags</span><span class="p">;</span>
<span class="w">        </span><span class="k">struct</span><span class="w"> </span><span class="nc">pci_bus</span><span class="w"> </span><span class="o">*</span><span class="n">bus</span><span class="p">;</span>
<span class="w">        </span><span class="k">struct</span><span class="w"> </span><span class="nc">pci_dev</span><span class="w"> </span><span class="o">*</span><span class="n">bridge</span><span class="p">;</span>
<span class="w">        </span><span class="n">u16</span><span class="w"> </span><span class="n">cmd</span><span class="p">;</span>

<span class="w">        </span><span class="cm">/* Only deal with VGA class devices */</span>
<span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="n">pdev</span><span class="o">-&gt;</span><span class="n">class</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">PCI_CLASS_DISPLAY_VGA</span><span class="p">)</span>
<span class="w">                </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
</code></pre></div>

<p>We're blowing up on the read to <code>pdev-&gt;class</code>, and it's not something like the data being uninitialised, it's out-of-bounds.  If we look back at the call trace:</p>
<div class="highlight"><pre><span></span><code>vga_arbiter_add_pci_device+0x60/0xe00
pci_notify+0x88/0x444
notifier_call_chain+0x104/0x320
blocking_notifier_call_chain+0xa0/0x140
device_add+0xac8/0x1d30
device_register+0x58/0x80
vio_register_device_node+0x9ac/0xce0
vio_bus_scan_register_devices+0xc4/0x13c
</code></pre></div>

<p>This thing is a VIO device, not a PCI device!  Let's jump into the caller, <code>pci_notify()</code>, to find out how we got our <code>pdev</code>.</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">pci_notify</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">notifier_block</span><span class="w"> </span><span class="o">*</span><span class="n">nb</span><span class="p">,</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">action</span><span class="p">,</span>
<span class="w">                      </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="w">        </span><span class="k">struct</span><span class="w"> </span><span class="nc">device</span><span class="w"> </span><span class="o">*</span><span class="n">dev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">;</span>
<span class="w">        </span><span class="k">struct</span><span class="w"> </span><span class="nc">pci_dev</span><span class="w"> </span><span class="o">*</span><span class="n">pdev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">to_pci_dev</span><span class="p">(</span><span class="n">dev</span><span class="p">);</span>
</code></pre></div>

<p>So <code>pci_notify()</code> gets called with our VIO device (somehow), and we're converting that <code>struct device</code> into a <code>struct pci_dev</code> with no error checking.  We could solve this particular bug by just checking that our device is <em>actually</em> a PCI device before we proceed - but we're in a function called <code>pci_notify</code>, we're expecting a PCI device to come in, so this would just be a bandaid.</p>
<p><code>to_pci_dev()</code> works like other struct containers in the kernel - <code>struct pci_dev</code> contains a <code>struct device</code> as a member, so the <code>container_of()</code> function returns an address based on where a <code>struct pci_dev</code> would have to be if the given <code>struct device</code> was actually a PCI device.  Since we know it's not actually a PCI device and this <code>struct device</code> does not actually sit inside a <code>struct pci_dev</code>, our <code>pdev</code> is now pointing to some random place in memory, hence our access to a member like <code>class</code> is caught by KASAN.</p>
<p>Now we know why and how we're blowing up, but we still don't understand how we got here, so let's back up further.</p>
<h2>Notifiers</h2>
<p>The kernel's device subsystem allows consumers to register callbacks so that they can be notified of a given event.  I'm not going to go into a ton of detail on how they work, because I don't fully understand myself, and there's a lot of internals of the device subsystem involved.
The best references I could find for this are <a href="https://elixir.bootlin.com/linux/latest/source/include/linux/notifier.h">notifier.h</a>, and for our purposes here, <a href="https://elixir.bootlin.com/linux/latest/source/include/linux/device/bus.h#L260">the register notifier functions in bus.h</a>.</p>
<p>Something's clearly gone awry if we can end up in a function named <code>pci_notify()</code> without passing it a PCI device.  We find where the notifier is registered in <code>vgaarb.c</code> here:</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">notifier_block</span><span class="w"> </span><span class="n">pci_notifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="p">.</span><span class="n">notifier_call</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pci_notify</span><span class="p">,</span>
<span class="p">};</span>

<span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">__init</span><span class="w"> </span><span class="nf">vga_arb_device_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="w">        </span><span class="cm">/* some stuff removed here... */</span>

<span class="w">        </span><span class="n">bus_register_notifier</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pci_bus_type</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">pci_notifier</span><span class="p">);</span>
</code></pre></div>

<p>This all looks sane.  A blocking notifier is registered so that <code>pci_notify()</code> gets called whenever there's a notification going out to PCI buses.  Our VIO device is distinctly <em>not</em> on a PCI bus, and in my debugging I couldn't find any potential causes of such confusion, so how on earth is a notification for PCI buses being applied to our non-PCI device?</p>
<p>Deep in the guts of the device subsystem, if we have a look at <code>device_add()</code> we find the following:</p>
<div class="highlight"><pre><span></span><code><span class="kt">int</span><span class="w"> </span><span class="nf">device_add</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">device</span><span class="w"> </span><span class="o">*</span><span class="n">dev</span><span class="p">)</span>
<span class="p">{</span>
<span class="w">        </span><span class="cm">/* lots of device init stuff... */</span>

<span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">bus</span><span class="p">)</span>
<span class="w">                </span><span class="n">blocking_notifier_call_chain</span><span class="p">(</span><span class="o">&amp;</span><span class="n">dev</span><span class="o">-&gt;</span><span class="n">bus</span><span class="o">-&gt;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">bus_notifier</span><span class="p">,</span>
<span class="w">                                             </span><span class="n">BUS_NOTIFY_ADD_DEVICE</span><span class="p">,</span><span class="w"> </span><span class="n">dev</span><span class="p">);</span>
</code></pre></div>

<p>If the device we're initialising is attached to a bus, then we call the bus notifier of that bus with the <code>BUS_NOTIFY_ADD_DEVICE</code> notification, and the device in question.  So we're going through the process of adding a VIO device, and somehow calling into a notifier that's only registered for PCI devices.  I did a bunch of debugging to see if our VIO device was somehow malformed and pointing to a PCI bus, or the <code>struct subsys_private</code> (that's the <code>bus-&gt;p</code> above) was somehow pointing to the wrong place, but everything seemed sane.  My thesis of there being confusion while matching devices to buses was getting harder to justify - everything still looked sane.</p>
<h2>Debuggers</h2>
<p>I do not like debuggers.  I am an avid <code>printk()</code> enthusiast.  There's no real justification for this, a bunch of my problems could almost certainly be solved easier by using actual tools, but my brain seemingly enjoys the routine of printing and building and running until I figure out what's going on.  It was becoming increasingly obvious, however, that <code>printk</code> could not save me here, and we needed to go deeper.</p>
<p>Very thankfully for me, even though this bug was discovered on real hardware, it reproduces easily in <a href="https://www.qemu.org">QEMU</a>, making iteration easy.  With <a href="https://qemu-project.gitlab.io/qemu/system/gdb.html">GDB attached to QEMU</a>, it's time to dive in to the guts of this issue and figure out what's happening.</p>
<p>Somehow, VIO buses are ending up with <code>pci_notify()</code> in their <code>bus_notifier</code> list.  Let's break down the data structures here with a look at <code>struct notifier_block</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">struct</span><span class="w"> </span><span class="nc">notifier_block</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="n">notifier_fn_t</span><span class="w"> </span><span class="n">notifier_call</span><span class="p">;</span>
<span class="w">        </span><span class="k">struct</span><span class="w"> </span><span class="nc">notifier_block</span><span class="w"> </span><span class="n">__rcu</span><span class="w"> </span><span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="w">        </span><span class="kt">int</span><span class="w"> </span><span class="n">priority</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div>

<p>So notifier chains are <a href="https://en.wikipedia.org/wiki/Linked_list#Singly_linked_list">singly linked lists</a>.  Callbacks are registered through functions like <code>bus_register_notifier()</code>, then after a long chain of breadcrumbs we reach <a href="https://elixir.bootlin.com/linux/latest/source/kernel/notifier.c#L22"><code>notifier_chain_register()</code></a> which walks the list of <code>-&gt;next</code> pointers until it reaches <code>NULL</code>, at which point it sets <code>-&gt;next</code> of the tail node to the <code>struct notifier_block</code> that was passed in.  It's very important to note here that the data being appended to the list here is <em>not just the callback function</em> (i.e. <code>pci_notify()</code>), but the <code>struct notifier_block</code> itself (i.e. <code>struct notifier_block pci_notifier</code> from earlier).  There's no new data being initialised, just updating a pointer to the object that was passed by the caller.</p>
<p>If you've guessed what our bug is at this point, great job!  If the same <code>struct notifier_block</code> gets registered to two different bus types, then both of their <code>bus_notifier</code> fields will point to the <em>same memory</em>, and any further notifiers registered to either bus will end up being referenced by both since they walk through the same node.</p>
<p>So we bust out the debugger and start looking at what ends up in <code>bus_notifier</code> for PCI and VIO buses with breakpoints and watchpoints.</p>
<h2>Candidates</h2>
<p>Walking the <code>bus_notifier</code> list gave me the following:</p>
<div class="highlight"><pre><span></span><code>__gcov_.perf_trace_module_free
fail_iommu_bus_notify
isa_bridge_notify
ppc_pci_unmap_irq_line
eeh_device_notifier
iommu_bus_notifier
tce_iommu_bus_notifier
pci_notify
</code></pre></div>

<p>Time to find out if our assumption is correct - the same <code>struct notifier_block</code> is being registered to both bus types.  Let's start going through them!</p>
<p>First up, we have <code>__gcov_.perf_trace_module_free</code>.  Thankfully, I recognised this as complete bait.  Trying to figure out what gcov and perf are doing here is going to be its own giant rabbit hole, and unless building without gcov makes our problem disappear, we skip this one and keep on looking.  Rabbit holes in the kernel never end, we have to be strategic with our time!</p>
<p>Next, we reach <code>fail_iommu_bus_notify</code>, so let's take a look at that.</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">notifier_block</span><span class="w"> </span><span class="n">fail_iommu_bus_notifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="p">.</span><span class="n">notifier_call</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fail_iommu_bus_notify</span>
<span class="p">};</span>

<span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">__init</span><span class="w"> </span><span class="nf">fail_iommu_setup</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="cp">#ifdef CONFIG_PCI</span>
<span class="w">        </span><span class="n">bus_register_notifier</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pci_bus_type</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">fail_iommu_bus_notifier</span><span class="p">);</span>
<span class="cp">#endif</span>
<span class="cp">#ifdef CONFIG_IBMVIO</span>
<span class="w">        </span><span class="n">bus_register_notifier</span><span class="p">(</span><span class="o">&amp;</span><span class="n">vio_bus_type</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">fail_iommu_bus_notifier</span><span class="p">);</span>
<span class="cp">#endif</span>

<span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>

<p>Sure enough, here's our bug.  The same node is being registered to two different bus types:</p>
<div class="highlight"><pre><span></span><code>+------------------+
| PCI bus_notifier \
+------------------+\
                     \+-------------------------+    +-----------------+    +------------+
                      | fail_iommu_bus_notifier |----| PCI + VIO stuff |----| pci_notify |
                     /+-------------------------+    +-----------------+    +------------+
+------------------+/
| VIO bus_notifier /
+------------------+
</code></pre></div>

<p>when it should be like:</p>
<div class="highlight"><pre><span></span><code>+------------------+    +-----------------------------+    +-----------+    +------------+
| PCI bus_notifier |----| fail_iommu_pci_bus_notifier |----| PCI stuff |----| pci_notify |
+------------------+    +-----------------------------+    +-----------+    +------------+

+------------------+    +-----------------------------+    +-----------+
| VIO bus_notifier |----| fail_iommu_vio_bus_notifier |----| VIO stuff |
+------------------+    +-----------------------------+    +-----------+
</code></pre></div>

<h2>The fix</h2>
<p>Ultimately, the fix turned out to be pretty simple:</p>
<div class="highlight"><pre><span></span><code>Author: Russell Currey &lt;ruscur@russell.cc&gt;
Date:   Wed Mar 22 14:37:42 2023 +1100

<span class="w"> </span>   powerpc/iommu: Fix notifiers being shared by PCI and VIO buses

<span class="w"> </span>   fail_iommu_setup() registers the fail_iommu_bus_notifier struct to both
<span class="w"> </span>   PCI and VIO buses.  struct notifier_block is a linked list node, so this
<span class="w"> </span>   causes any notifiers later registered to either bus type to also be
<span class="w"> </span>   registered to the other since they share the same node.

<span class="w"> </span>   This causes issues in (at least) the vgaarb code, which registers a
<span class="w"> </span>   notifier for PCI buses.  pci_notify() ends up being called on a vio
<span class="w"> </span>   device, converted with to_pci_dev() even though it&#39;s not a PCI device,
<span class="w"> </span>   and finally makes a bad access in vga_arbiter_add_pci_device() as
<span class="w"> </span>   discovered with KASAN:

<span class="w"> </span>   [stack trace redacted, see above]

<span class="w"> </span>   Fix this by creating separate notifier_block structs for each bus type.

<span class="w"> </span>   Fixes: d6b9a81b2a45 (&quot;powerpc: IOMMU fault injection&quot;)
<span class="w"> </span>   Reported-by: Nageswara R Sastry &lt;rnsastry@linux.ibm.com&gt;
<span class="w"> </span>   Signed-off-by: Russell Currey &lt;ruscur@russell.cc&gt;

<span class="gh">diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c</span>
<span class="gh">index ee95937bdaf1..6f1117fe3870 100644</span>
<span class="gd">--- a/arch/powerpc/kernel/iommu.c</span>
<span class="gi">+++ b/arch/powerpc/kernel/iommu.c</span>
<span class="gu">@@ -171,17 +171,26 @@ static int fail_iommu_bus_notify(struct notifier_block *nb,</span>
<span class="w"> </span>        return 0;
<span class="w"> </span>}

<span class="gd">-static struct notifier_block fail_iommu_bus_notifier = {</span>
<span class="gi">+/*</span>
<span class="gi">+ * PCI and VIO buses need separate notifier_block structs, since they&#39;re linked</span>
<span class="gi">+ * list nodes.  Sharing a notifier_block would mean that any notifiers later</span>
<span class="gi">+ * registered for PCI buses would also get called by VIO buses and vice versa.</span>
<span class="gi">+ */</span>
<span class="gi">+static struct notifier_block fail_iommu_pci_bus_notifier = {</span>
<span class="gi">+        .notifier_call = fail_iommu_bus_notify</span>
<span class="gi">+};</span>
<span class="gi">+</span>
<span class="gi">+static struct notifier_block fail_iommu_vio_bus_notifier = {</span>
<span class="w"> </span>        .notifier_call = fail_iommu_bus_notify
<span class="w"> </span>};

<span class="w"> </span>static int __init fail_iommu_setup(void)
<span class="w"> </span>{
<span class="w"> </span>#ifdef CONFIG_PCI
<span class="gd">-        bus_register_notifier(&amp;pci_bus_type, &amp;fail_iommu_bus_notifier);</span>
<span class="gi">+        bus_register_notifier(&amp;pci_bus_type, &amp;fail_iommu_pci_bus_notifier);</span>
<span class="w"> </span>#endif
<span class="w"> </span>#ifdef CONFIG_IBMVIO
<span class="gd">-        bus_register_notifier(&amp;vio_bus_type, &amp;fail_iommu_bus_notifier);</span>
<span class="gi">+        bus_register_notifier(&amp;vio_bus_type, &amp;fail_iommu_vio_bus_notifier);</span>
<span class="w"> </span>#endif

<span class="w"> </span>        return 0;
</code></pre></div>

<p>Easy!  Problem solved.  The <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6b9a81b2a45">commit that introduced this bug back in 2012</a> was written by the legendary <a href="http://antonblanchardfacts.com">Anton Blanchard</a>, so it's always a treat to discover an Anton bug.  Ultimately this bug is of little consequence, but it's always fun to catch dormant issues with powerful tools like KASAN.</p>
<h2>In conclusion</h2>
<p>I think this bug provides a nice window into what kernel debugging can be like.  Thankfully, things are made easier by not dealing with any specific hardware and being easily reproducible in QEMU.</p>
<p>Bugs like this have an absurd amount of underlying complexity, but you rarely need to understand all of it to comprehend the situation and discover the issue.  I spent way too much time digging into device subsystem internals, when the odds of the issue lying within were quite low - the combination of IBM VIO devices and VGA arbitration isn't exactly common, so searching for potential issues within the guts of a heavily utilised subsystem isn't going to yield results very often.</p>
<p>Is there something haunted in the device subsystem?  Is there something haunted inside the notifier handlers?  It's possible, but assuming the core guts of the kernel have a baseline level of sanity helps to let you stay focused on the parts more likely to be relevant.</p>
<p>Finally, the process was made much easier by having good code navigation.  A ludicrous amount of kernel developers still use plain vim or Emacs, maybe with tags if you're lucky, and get by on <code>git grep</code> (not even ripgrep!) and memory.  Sort yourselves out and get yourself an editor with LSP support.  I personally use <a href="https://github.com/doomemacs/doomemacs">Doom Emacs</a> with <a href="https://clangd.llvm.org/">clangd</a>, and with the amount of jumping around the kernel I had to do to solve this bug, it would've been a much bigger ordeal without that power.</p>
<p>If you enjoyed the read, why not follow me on <a href="https://ozlabs.house/@ruscur">Mastodon</a> or checkout <a href="https://sthbrx.github.io/blog/2023/03/24/dumb-bugs-when-a-date-breaks-booting-the-kernel/">Ben's recount of another cursed bug!</a>  Thanks for stopping by.</p></div>
  		</article>
  		<article>
<header>
      <h1 class="entry-title">
        <a href="https://sthbrx.github.io/blog/2023/03/24/dumb-bugs-when-a-date-breaks-booting-the-kernel/">Dumb bugs: When a date breaks booting the kernel</a>
      </h1>
    <p class="meta">
<time datetime="2023-03-24T00:00:00+11:00" pubdate>Fri 24 March 2023</time>    </p>
</header>

<div class="byline_index">
  <span class="byline author vcard">
    Posted by <span class="fn">
          <a href="https://sthbrx.github.io/author/benjamin-gray.html">Benjamin Gray</a>
    </span>
  </span>
<time datetime="2023-03-24T00:00:00+11:00" pubdate>Fri 24 March 2023</time></div>

  <div class="entry-content"><h2>The setup</h2>
<p>I've recently been working on internal CI infrastructure for testing kernels before sending them to the mailing list. As part of this effort, I became interested in <a href="https://reproducible-builds.org/">reproducible builds</a>. Minimising the changing parts outside of the source tree itself could improve consistency and ccache hits, which is great for trying to make the CI faster and more reproducible across different machines. This means removing 'external' factors like timestamps from the build process, because the time changes every build and means the results between builds of the same tree are no longer identical binaries. This also prevents using previously cached results, potentially slowing down builds (though it turns out the kernel does a good job of limiting the scope of where timestamps appear in the build).</p>
<p>As part of this effort, I came across the <code>KBUILD_BUILD_TIMESTAMP</code> environment variable. This variable is used to set the kernel timestamp, which is primarily for any users who want to know when their kernel was built. That's mostly irrelevant for our work, so an easy <code>KBUILD_BUILD_TIMESTAMP=0</code> later and... it still uses the current date.</p>
<p>Ok, checking <a href="https://docs.kernel.org/kbuild/kbuild.html#kbuild-build-timestamp">the documentation</a> it says</p>
<blockquote>
<p>Setting this to a date string overrides the timestamp used in the UTS_VERSION definition (uname -v in the running kernel). The value has to be a string that can be passed to date -d. The default value is the output of the date command at one point during build.</p>
</blockquote>
<p>So it looks like the timestamp variable is actually expected to be a date format. To make it obvious that it's not a 'real' date, let's set <code>KBUILD_BUILD_TIMESTAMP=0000-01-01</code>. A bunch of zeroes (and the ones to make it a valid month and day) should tip off anyone to the fact it's invalid.</p>
<p>As an aside, this is a different date to what I tried to set it to earlier; a 'timestamp' typically refers to the number of seconds since the UNIX epoch (1970), so my first attempt would have corresponded to 1970-01-01. But given we're passing a date, not a timestamp, there should be no problem setting it back to the year 0. And I like the aesthetics of 0000 over 1970.</p>
<p>Building and booting the kernel, we see <code>#1 SMP 0000-01-01</code> printed as the build timestamp. Success! After confirming everything works, I set the environment variable in the CI jobs and call it a day.</p>
<h2>An unexpected error</h2>
<p>A few days later I need to run the CI to test my patches, and something strange happens. It builds fine, but the boot tests that load a root disk image fail inexplicably: there is a kernel panic saying "VFS: Unable to mount root fs on unknown-block(253,2)".</p>
<div class="highlight"><pre><span></span><code>[    0.909648][    T1] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(253,2)
[    0.909797][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc2-g065ffaee7389 #8
[    0.909880][    T1] Hardware name: IBM pSeries (emulated by qemu) POWER8 (raw) 0x4d0200 0xf000004 of:SLOF,HEAD pSeries
[    0.910044][    T1] Call Trace:
[    0.910107][    T1] [c000000003643b00] [c000000000fb6f9c] dump_stack_lvl+0x70/0xa0 (unreliable)
[    0.910378][    T1] [c000000003643b30] [c000000000144e34] panic+0x178/0x424
[    0.910423][    T1] [c000000003643bd0] [c000000002005144] mount_block_root+0x1d0/0x2bc
[    0.910457][    T1] [c000000003643ca0] [c000000002005720] prepare_namespace+0x1d4/0x22c
[    0.910487][    T1] [c000000003643d20] [c000000002004b04] kernel_init_freeable+0x36c/0x3bc
[    0.910517][    T1] [c000000003643df0] [c000000000013830] kernel_init+0x30/0x1a0
[    0.910549][    T1] [c000000003643e50] [c00000000000df94] ret_from_kernel_thread+0x5c/0x64
[    0.910587][    T1] --- interrupt: 0 at 0x0
[    0.910794][    T1] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
[    0.910828][    T1] REGS: c000000003643e80 TRAP: 0000   Not tainted  (6.3.0-rc2-g065ffaee7389)
[    0.910883][    T1] MSR:  0000000000000000 &lt;&gt;  CR: 00000000  XER: 00000000
[    0.910990][    T1] CFAR: 0000000000000000 IRQMASK: 0
[    0.910990][    T1] GPR00: 0000000000000000 c000000003644000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR12: 0000000000000000 0000000000000000 c000000000013808 0000000000000000
[    0.910990][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.910990][    T1] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.911371][    T1] NIP [0000000000000000] 0x0
[    0.911397][    T1] LR [0000000000000000] 0x0
[    0.911427][    T1] --- interrupt: 0
qemu-system-ppc64: OS terminated: OS panic: VFS: Unable to mount root fs on unknown-block(253,2)
</code></pre></div>

<p>Above the panic was some more context, saying</p>
<div class="highlight"><pre><span></span><code>[    0.906194][    T1] Warning: unable to open an initial console.
...
[    0.908321][    T1] VFS: Cannot open root device &quot;vda2&quot; or unknown-block(253,2): error -2
[    0.908356][    T1] Please append a correct &quot;root=&quot; boot option; here are the available partitions:
[    0.908528][    T1] 0100           65536 ram0
[    0.908657][    T1]  (driver?)
[    0.908735][    T1] 0101           65536 ram1
[    0.908744][    T1]  (driver?)
...
[    0.909216][    T1] 010f           65536 ram15
[    0.909226][    T1]  (driver?)
[    0.909265][    T1] fd00         5242880 vda
[    0.909282][    T1]  driver: virtio_blk
[    0.909335][    T1]   fd01            4096 vda1 d1f35394-01
[    0.909364][    T1]
[    0.909401][    T1]   fd02         5237760 vda2 d1f35394-02
[    0.909408][    T1]
[    0.909441][    T1] fd10             366 vdb
[    0.909446][    T1]  driver: virtio_blk
[    0.909479][    T1] 0b00         1048575 sr0
[    0.909486][    T1]  driver: sr
</code></pre></div>

<p>This is even more baffling: if it's unable to open a console, then what am I reading these messages on? And error <code>-2</code>, or ENOENT, on opening 'vda2' implies that no such file or directory exists. But it then lists vda2 as a present drive with a known driver? So is vda2 missing or not?</p>
<h2>Living in denial</h2>
<p>As you've read the title of this article, you can probably guess as to what changed to cause this error. But at the time I had no idea what could have been the cause. I'd already confirmed that a kernel with a set timestamp can boot to userspace, and there was another (seemingly) far more likely candidate for the failure: as part of the CI design, patches are extracted from the submitted branch and rebased onto the maintainer's tree. This is great from a convenience perspective, because you don't need to worry about forgetting to rebase your patches before testing and submission. But if the maintainer has synced their branch with Linus' tree it means there could be a lot of things changed in the source tree between runs, even if they were only a few days apart.</p>
<p>So, when you're faced with a working test on one commit and a broken test on another commit, it's time to break out the <code>git bisect</code>. Downloading the kernel images from the relevant CI jobs, I confirmed that indeed one was working while the other was broken. So I bisected the relevant commits, and... everything kept working. Each step I would build and boot the kernel, and each step would reach userspace just fine. I was getting suspicious at this point, so skipped ahead to the known bad commit and built and tested it locally. It <em>also worked</em>.</p>
<p>This was highly confusing, because it meant there was something fishy going on. Some kind of state outside of the kernel tree. Could it be... surely not...</p>
<p>Comparing the boot logs of the two CI kernels, I see that the working one indeed uses an actual timestamp, and the broken one uses the 0000-01-01 fixed date. Oh no. Setting the timestamp with a local build, I can now reproduce the boot panic with a kernel I built myself.</p>
<h2>But... why?</h2>
<p>OK, so it's obvious at this point that the timestamp is affecting loading a root disk somehow. But why? The obvious answer is that it's before the UNIX epoch. Something in the build process is turning the date into an actual timestamp, and going wrong when that timestamp gets used for something.</p>
<p>But it's not like there was a build error complaining about it. As best I could tell, the kernel doesn't try to parse the date anywhere, besides passing it to <code>date</code> during the build. And if <code>date</code> had an issue with it, it would have broken the <em>build</em>. Not <em>booting</em> the kernel. There's no <code>date</code> utility being invoked during kernel boot!</p>
<p>Regardless, I set about tracing the usage of <code>KBUILD_BUILD_TIMESTAMP</code> inside the kernel. The stacktrace in the panic gave the end point of the search; the function <code>mount_block_root()</code> wasn't happy. So all I had to do was work out at which point <code>mount_block_root()</code> tried to access the <code>KBUILD_BUILD_TIMESTAMP</code> value.</p>
<p>In short, that went nowhere.</p>
<p><code>mount_block_root()</code> effectively just tries to open a file in the filesystem. There's massive amounts of code handling this, and any part could have had the undocumented dependency on <code>KBUILD_BUILD_TIMESTAMP</code>. Approaching from the other direction, <code>KBUILD_BUILD_TIMESTAMP</code> is turned into <code>build-timestamp</code> inside a Makefile, which is in turn related to a file <code>include/generated/utsversion.h</code>. This file <code>#define</code>s <code>UTS_VERSION</code> equal to the <code>KBUILD_BUILD_TIMESTAMP</code> value. Searching the kernel for <code>UTS_VERSION</code>, we hit <code>init/version-timestamp.c</code> which stores it in a struct with other build information:</p>
<div class="highlight"><pre><span></span><code><span class="k">struct</span><span class="w"> </span><span class="nc">uts_namespace</span><span class="w"> </span><span class="n">init_uts_ns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="p">.</span><span class="n">ns</span><span class="p">.</span><span class="n">count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">REFCOUNT_INIT</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span>
<span class="w">    </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="p">.</span><span class="n">sysname</span><span class="w">    </span><span class="o">=</span><span class="w"> </span><span class="n">UTS_SYSNAME</span><span class="p">,</span>
<span class="w">        </span><span class="p">.</span><span class="n">nodename</span><span class="w">   </span><span class="o">=</span><span class="w"> </span><span class="n">UTS_NODENAME</span><span class="p">,</span>
<span class="w">        </span><span class="p">.</span><span class="n">release</span><span class="w">    </span><span class="o">=</span><span class="w"> </span><span class="n">UTS_RELEASE</span><span class="p">,</span>
<span class="w">        </span><span class="p">.</span><span class="n">version</span><span class="w">    </span><span class="o">=</span><span class="w"> </span><span class="n">UTS_VERSION</span><span class="p">,</span>
<span class="w">        </span><span class="p">.</span><span class="n">machine</span><span class="w">    </span><span class="o">=</span><span class="w"> </span><span class="n">UTS_MACHINE</span><span class="p">,</span>
<span class="w">        </span><span class="p">.</span><span class="n">domainname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UTS_DOMAINNAME</span><span class="p">,</span>
<span class="w">    </span><span class="p">},</span>
<span class="w">    </span><span class="p">.</span><span class="n">user_ns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="n">init_user_ns</span><span class="p">,</span>
<span class="w">    </span><span class="p">.</span><span class="n">ns</span><span class="p">.</span><span class="n">inum</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PROC_UTS_INIT_INO</span><span class="p">,</span>
<span class="cp">#ifdef CONFIG_UTS_NS</span>
<span class="w">    </span><span class="p">.</span><span class="n">ns</span><span class="p">.</span><span class="n">ops</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="n">utsns_operations</span><span class="p">,</span>
<span class="cp">#endif</span>
<span class="p">};</span>
</code></pre></div>

<p>This is where the trail goes cold: I don't know if you've ever tried this, but searching for <code>.version</code> in the kernel's codebase is not a very fruitful endeavor when you're interested in a specific kind of version.</p>
<div class="highlight"><pre><span></span><code>$ rg &quot;(\.|\-&gt;)version\b&quot; | wc -l
5718
</code></pre></div>

<p>I tried tracing the usage of <code>init_uts_ns</code>, but didn't get very far.</p>
<p>By now I'd already posted this in chat and another developer, <a href="https://shenki.github.io/">Joel Stanley</a>, was also investigating this bizarre bug. They had been testing different timestamp values and made the horrifying discovery that the bug sticks around after a rebuild. So you could start with a broken build, set the timestamp back to the correct value, rebuild, and the resulting kernel would <em>still be broken</em>. The boot log would report the correct time, but the root disk mounter panicked all the same.</p>
<h2>Getting sidetracked</h2>
<p>I wasn't prepared to investigate the boot panic directly until the persistence bug was fixed. Having to run <code>make clean</code> and rebuild everything would take an annoyingly long time, even with ccache. Fortunately, I had a plan. All I had to do was work out which generated files are different between a broken and working build, and binary search by deleting half of them until deleting only one made the difference between the bug persisting or not. We can use <code>diff</code> for this. Running the initial diff we get</p>
<div class="highlight"><pre><span></span><code>$ diff -q --exclude System.map --exclude .tmp_vmlinux* --exclude tools broken/ working/
Common subdirectories: broken/arch and working/arch
Common subdirectories: broken/block and working/block
Files broken/built-in.a and working/built-in.a differ
Common subdirectories: broken/certs and working/certs
Common subdirectories: broken/crypto and working/crypto
Common subdirectories: broken/drivers and working/drivers
Common subdirectories: broken/fs and working/fs
Common subdirectories: broken/include and working/include
Common subdirectories: broken/init and working/init
Common subdirectories: broken/io_uring and working/io_uring
Common subdirectories: broken/ipc and working/ipc
Common subdirectories: broken/kernel and working/kernel
Common subdirectories: broken/lib and working/lib
Common subdirectories: broken/mm and working/mm
Common subdirectories: broken/net and working/net
Common subdirectories: broken/scripts and working/scripts
Common subdirectories: broken/security and working/security
Common subdirectories: broken/sound and working/sound
Common subdirectories: broken/usr and working/usr
Files broken/.version and working/.version differ
Common subdirectories: broken/virt and working/virt
Files broken/vmlinux and working/vmlinux differ
Files broken/vmlinux.a and working/vmlinux.a differ
Files broken/vmlinux.o and working/vmlinux.o differ
Files broken/vmlinux.strip.gz and working/vmlinux.strip.gz differ
</code></pre></div>

<p>Hmm, OK so only some top level files are different. Deleting all the different files doesn't fix the persistence bug though, and I know that a proper <code>make clean</code> does fix it, so what could possibly be the difference when all the remaining files are identical?</p>
<p>Oh wait. <code>man diff</code> reports that <code>diff</code> only compares the top level folder entries by default. So it was literally just telling me "yes, both the broken and working builds have a folder named X". How GNU of it. Re-running the diff command with actually useful options, we get a more promising story</p>
<div class="highlight"><pre><span></span><code>$ diff -qr --exclude System.map --exclude .tmp_vmlinux* --exclude tools build/broken/ build/working/
Files build/broken/arch/powerpc/boot/zImage and build/working/arch/powerpc/boot/zImage differ
Files build/broken/arch/powerpc/boot/zImage.epapr and build/working/arch/powerpc/boot/zImage.epapr differ
Files build/broken/arch/powerpc/boot/zImage.pseries and build/working/arch/powerpc/boot/zImage.pseries differ
Files build/broken/built-in.a and build/working/built-in.a differ
Files build/broken/include/generated/utsversion.h and build/working/include/generated/utsversion.h differ
Files build/broken/init/built-in.a and build/working/init/built-in.a differ
Files build/broken/init/utsversion-tmp.h and build/working/init/utsversion-tmp.h differ
Files build/broken/init/version.o and build/working/init/version.o differ
Files build/broken/init/version-timestamp.o and build/working/init/version-timestamp.o differ
Files build/broken/usr/built-in.a and build/working/usr/built-in.a differ
Files build/broken/usr/initramfs_data.cpio and build/working/usr/initramfs_data.cpio differ
Files build/broken/usr/initramfs_data.o and build/working/usr/initramfs_data.o differ
Files build/broken/usr/initramfs_inc_data and build/working/usr/initramfs_inc_data differ
Files build/broken/.version and build/working/.version differ
Files build/broken/vmlinux and build/working/vmlinux differ
Files build/broken/vmlinux.a and build/working/vmlinux.a differ
Files build/broken/vmlinux.o and build/working/vmlinux.o differ
Files build/broken/vmlinux.strip.gz and build/working/vmlinux.strip.gz differ
</code></pre></div>

<p>There are some new entries here: notably <code>init/version*</code> and <code>usr/initramfs*</code>. Binary searching these files results in a single culprit: <code>usr/initramfs_data.cpio</code>. This is quite fitting, as the <code>.cpio</code> file is an archive defining a filesystem layout, <a href="https://docs.kernel.org/filesystems/ramfs-rootfs-initramfs.html?highlight=initramfs#why-cpio-rather-than-tar">much like <code>.tar</code> files</a>. This file is actually embedded into the kernel image, and loaded as a bare-bones shim filesystem when the user doesn't provide their own initramfs<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>.</p>
<p>So it would make sense that if the CPIO archive wasn't being rebuilt, then the initial filesystem wouldn't change. And it would make sense for the initial filesystem to be causing mount issues of the proper root disk filesystem.</p>
<p>This just leaves the question of how <code>KBUILD_BUILD_TIMESTAMP</code> is breaking the CPIO archive. And it's around this time that a third developer, <a href="https://twitter.com/ajdlinux">Andrew</a>, who I'd roped into this bug hunt for having the (mis)fortune to sit next to me, pointed out that the generator script for this CPIO archive was passing the <code>KBUILD_BUILD_TIMESTAMP</code> to <code>date</code>. Whoop, we've found the murder weapon<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>!</p>
<p>The persistence bug could be explained now: because the script was only using <code>KBUILD_BUILD_TIMESTAMP</code> internally, <code>make</code> had no way of knowing that the archive generation depended on this variable. So even when I changed the variable to a valid value, <code>make</code> didn't know to rebuild the corrupt archive. Let's now get back to the main issue: why boot panics.</p>
<h2>Solving the case</h2>
<p>Following along the CPIO generation script, the <code>KBUILD_BUILD_TIMESTAMP</code> variable is turned into a timestamp by <code>date -d"$KBUILD_BUILD_TIMESTAMP" +%s</code>. Testing this in the shell with <code>0000-01-01</code> we get this (somewhat amusing, but also painful) result</p>
<div class="highlight"><pre><span></span><code>date -d&quot;$KBUILD_BUILD_TIMESTAMP&quot; +%s
-62167255492
</code></pre></div>

<p>This timestamp is then passed to a C program that assigns it to a variable <code>default_mtime</code>. Looking over the source, it seems this variable is used to set the <code>mtime</code> field on the files in the CPIO archive. The timestamp is stored as a <code>time_t</code>, which is an alias for <code>int64_t</code>. That's 64 bits of data, up to 16 hexadecimal characters. And yes, that's relevant: CPIO stores the <code>mtime</code> (and all other numerical fields) as 32 bit unsigned integers represented by ASCII hexadecimal characters. The <code>sprintf()</code> call that ultimately embeds the timestamp uses the <code>%08lX</code> format specifier. This formats a <code>long</code> as hexadecimal, padded to at least 8 characters. Hang on... <strong><em>at least</em></strong> 8 characters? What if our timestamp happens to be more?</p>
<p>It turns out that large timestamps are already guarded against. The program will error during build if the date is later than 2106-02-07 (maximum unsigned 8 hex digit timestamp).</p>
<div class="highlight"><pre><span></span><code><span class="cm">/*</span>
<span class="cm"> * Timestamps after 2106-02-07 06:28:15 UTC have an ascii hex time_t</span>
<span class="cm"> * representation that exceeds 8 chars and breaks the cpio header</span>
<span class="cm"> * specification.</span>
<span class="cm"> */</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">default_mtime</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mh">0xffffffff</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;ERROR: Timestamp too large for cpio format</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
<span class="w">    </span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>

<p>But we are using an <code>int64_t</code>. What would happen if one were to provide a negative timestamp?</p>
<p>Well, <code>sprintf()</code> happily spits out <code>FFFFFFF1868AF63C</code> when we pass in our negative timestamp representing <code>0000-01-01</code>. That's 16 characters, 8 too many for the CPIO header<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.</p>
<p>So at last we've found the cause of the panic: the timestamp is being formatted too long, which breaks the CPIO header and the kernel doesn't create an initial filesystem correctly. This includes the <code>/dev</code> folder (which surprisingly is not hardcoded into kernel, but must be declared by the initramfs). So when the root disk mounter tries to open <code>/dev/vda2</code>, it correctly complains that it failed to create a device in the non-existent <code>/dev</code>.</p>
<h2>Postmortem</h2>
<p>After discovering all this, I sent in a couple of patches to fix <a href="https://lore.kernel.org/all/20230320040839.660475-1-bgray@linux.ibm.com/">the CPIO generation</a> and <a href="https://lore.kernel.org/all/20230320040839.660475-2-bgray@linux.ibm.com/">rebuild logic</a>. They were not complicated fixes, but wow were they time consuming to track down. I didn't see the error initially because I typically only boot with my own initramfs over the embedded one, and not with the intent to load a root disk. Then the panic itself was quite far away from the real issue, and there were many dead ends to explore.</p>
<p>I also got curious as to why the kernel didn't complain about a corrupt initramfs earlier. A brief investigation showed a streaming parser that is <em>extremely</em> fault tolerant, silently skipping invalid entries (like ones missing or having too long a name). The corrupted header was being interpreted as an entry with an empty name and 2 gigabyte body contents, which meant that (1) the kernel skipped inserting it due to the empty name, and (2) the kernel skipped the rest of the initramfs because it thought that up to 2 GB of the remaining content was part of that first entry.</p>
<p>Perhaps this could be improved to require that all input is consumed without unexpected EOF, such as how the userspace <code>cpio</code> tool works (which, by the way, recognises the corrupt archive as such and refuses to decompress it). The parsing logic is mostly from the before-times though (i.e., pre initial git commit), so it's difficult to distinguish intentional leniency and bugs.</p>
<h2>Afterword</h2>
<p>Incidentally, in investigating this I came across another bug. There is a helper function <code>panic_show_mem()</code> in the initramfs that's meant to dump memory information and then call <code>panic()</code>. It takes in standard <code>printf()</code> style format string and arguments, and tries to forward them to <code>panic()</code> which ultimately prints them.</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">panic_show_mem</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span>
<span class="p">{</span>
<span class="w">    </span><span class="kt">va_list</span><span class="w"> </span><span class="n">args</span><span class="p">;</span>

<span class="w">    </span><span class="n">show_mem</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="w">    </span><span class="n">va_start</span><span class="p">(</span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">);</span>
<span class="w">    </span><span class="n">panic</span><span class="p">(</span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">);</span>
<span class="w">    </span><span class="n">va_end</span><span class="p">(</span><span class="n">args</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span><span class="w"> </span><span class="nf">panic</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="p">...);</span>
</code></pre></div>

<p>But variadic arguments don't quite work this way: instead of forwarding the list <code>args</code> as intended, <code>panic()</code> will instead interpret <code>args</code> as a single argument for the format string <code>fmt</code>. Standard library functions address this by providing <code>v*</code> variants of <code>printf()</code> and friends. For example,</p>
<div class="highlight"><pre><span></span><code><span class="kt">int</span><span class="w"> </span><span class="nf">printf</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="p">...);</span>

<span class="kt">int</span><span class="w"> </span><span class="nf">vprintf</span><span class="p">(</span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="kt">va_list</span><span class="w"> </span><span class="n">args</span><span class="p">);</span>
</code></pre></div>

<p>We might create a <code>vpanic()</code> function in the kernel that follows this style, but it seems easier to just make <code>panic_show_mem()</code> a macro and 'forward' the arguments in the source code</p>
<div class="highlight"><pre><span></span><code><span class="cp">#define panic_show_mem(fmt, ...) \</span>
<span class="cp">    ({ show_mem(0, NULL); panic(fmt, ##__VA_ARGS__); })</span>
</code></pre></div>

<p><a href="https://lore.kernel.org/all/20230320230534.50174-1-bgray@linux.ibm.com/">Patch sent</a>.</p>
<p>And that's where I've left things. Big thanks to Joel and Andrew for helping me with this bug. It was certainly a trip.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>initramfs, or initrd for the older format, are specific kinds of CPIO archives. The initramfs is intended to be loaded as the initial filesystem of a booted kernel, typically in preparation for loading your normal root filesystem. It might contain modules necessary to mount the disk for example.&#160;<a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">&#8617;</a></p>
</li>
<li id="fn:2">
<p>Hindsight again would suggest it was obvious to look here because it shows up when searching for <code>KBUILD_BUILD_TIMESTAMP</code>. I unfortunately wasn't familiar with the <code>usr/</code> source folder initially, and focused on the core kernel components too much earlier. Oh well, we found it eventually.&#160;<a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">&#8617;</a></p>
</li>
<li id="fn:3">
<p>I almost missed this initially. Thanks to the ASCII header format, <code>strings</code> was able to print the headers without any CPIO specific tooling. I did a double take when I noticed the headers for the broken CPIO were a little longer than the headers in the working one.&#160;<a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">&#8617;</a></p>
</li>
</ol>
</div></div>
  		</article>
<div class="pagination">
    <a class="prev" href="https://sthbrx.github.io/index2.html">&larr; Older</a>

  <br />
</div></div>
<aside class="sidebar">
  <section>
    <h1>Recent Posts</h1>
    <ul id="recent_posts">
      <li class="post">
          <a href="https://sthbrx.github.io/blog/2023/08/07/going-out-on-a-limb-efficient-elliptic-curve-arithmetic-in-openssl/">Going out on a Limb: Efficient Elliptic Curve Arithmetic in OpenSSL</a>
      </li>
      <li class="post">
          <a href="https://sthbrx.github.io/blog/2023/08/04/quirks-of-parsing-ssh-configs/">Quirks of parsing SSH configs</a>
      </li>
      <li class="post">
          <a href="https://sthbrx.github.io/blog/2023/04/05/detecting-rootless-docker/">Detecting rootless Docker</a>
      </li>
      <li class="post">
          <a href="https://sthbrx.github.io/blog/2023/04/04/dumb-bugs-the-pci-device-that-wasnt/">Dumb bugs: the PCI device that wasn't</a>
      </li>
      <li class="post">
          <a href="https://sthbrx.github.io/blog/2023/03/24/dumb-bugs-when-a-date-breaks-booting-the-kernel/">Dumb bugs: When a date breaks booting the kernel</a>
      </li>
    </ul>
  </section>
  <section>
      
    <h1>Categories</h1>
    <ul id="recent_posts">
        <li><a href="https://sthbrx.github.io/category/cryptography.html">Cryptography</a></li>
        <li><a href="https://sthbrx.github.io/category/development.html">Development</a></li>
        <li><a href="https://sthbrx.github.io/category/education.html">Education</a></li>
        <li><a href="https://sthbrx.github.io/category/openpower.html">OpenPOWER</a></li>
        <li><a href="https://sthbrx.github.io/category/performance.html">Performance</a></li>
        <li><a href="https://sthbrx.github.io/category/petitboot.html">Petitboot</a></li>
        <li><a href="https://sthbrx.github.io/category/snowpatch.html">snowpatch</a></li>
        <li><a href="https://sthbrx.github.io/category/virtualisation-and-emulation.html">Virtualisation and Emulation</a></li>
    </ul>
  </section>
 

  <section>
  <h1>Tags</h1>
    <a href="https://sthbrx.github.io/tag/ssh.html">ssh</a>,    <a href="https://sthbrx.github.io/tag/docker.html">Docker</a>,    <a href="https://sthbrx.github.io/tag/syzkaller.html">syzkaller</a>,    <a href="https://sthbrx.github.io/tag/linux.html">linux</a>,    <a href="https://sthbrx.github.io/tag/power8.html">power8</a>,    <a href="https://sthbrx.github.io/tag/distro.html">distro</a>,    <a href="https://sthbrx.github.io/tag/kernel.html">kernel</a>,    <a href="https://sthbrx.github.io/tag/hardening.html">hardening</a>,    <a href="https://sthbrx.github.io/tag/testing.html">testing</a>,    <a href="https://sthbrx.github.io/tag/conferences.html">conferences</a>,    <a href="https://sthbrx.github.io/tag/instruction-set-architecture.html">Instruction Set Architecture</a>,    <a href="https://sthbrx.github.io/tag/openpower.html">openpower</a>,    <a href="https://sthbrx.github.io/tag/firmware.html">firmware</a>,    <a href="https://sthbrx.github.io/tag/goodposts.html">goodposts</a>,    <a href="https://sthbrx.github.io/tag/realcontent.html">realcontent</a>,    <a href="https://sthbrx.github.io/tag/madposting.html">madposting</a>,    <a href="https://sthbrx.github.io/tag/op-test.html">op-test</a>,    <a href="https://sthbrx.github.io/tag/qemu.html">qemu</a>,    <a href="https://sthbrx.github.io/tag/pci.html">pci</a>,    <a href="https://sthbrx.github.io/tag/sparseposting.html">sparseposting</a>,    <a href="https://sthbrx.github.io/tag/petitboot.html">petitboot</a>,    <a href="https://sthbrx.github.io/tag/security.html">security</a>,    <a href="https://sthbrx.github.io/tag/vscode.html">vscode</a>,    <a href="https://sthbrx.github.io/tag/code.html">code</a>,    <a href="https://sthbrx.github.io/tag/openbmc.html">openbmc</a>,    <a href="https://sthbrx.github.io/tag/ipmi.html">ipmi</a>,    <a href="https://sthbrx.github.io/tag/opencapi.html">opencapi</a>,    <a href="https://sthbrx.github.io/tag/openpower-summit.html">openpower summit</a>,    <a href="https://sthbrx.github.io/tag/easyposts.html">easyposts</a>,    <a href="https://sthbrx.github.io/tag/linuxboot.html">linuxboot</a>,    <a href="https://sthbrx.github.io/tag/google.html">google</a>,    <a href="https://sthbrx.github.io/tag/intel.html">intel</a>,    <a href="https://sthbrx.github.io/tag/osfc.html">osfc</a>,    <a href="https://sthbrx.github.io/tag/shortposts.html">shortposts</a>,    <a href="https://sthbrx.github.io/tag/facebook.html">facebook</a>,    <a href="https://sthbrx.github.io/tag/performance.html">performance</a>,    <a href="https://sthbrx.github.io/tag/phoronix.html">phoronix</a>,    <a href="https://sthbrx.github.io/tag/benchmarks.html">benchmarks</a>,    <a href="https://sthbrx.github.io/tag/stupid-ideas.html">stupid ideas</a>,    <a href="https://sthbrx.github.io/tag/network.html">network</a>,    <a href="https://sthbrx.github.io/tag/power.html">power</a>,    <a href="https://sthbrx.github.io/tag/xdp.html">xdp</a>,    <a href="https://sthbrx.github.io/tag/networking.html">networking</a>,    <a href="https://sthbrx.github.io/tag/remoteposts.html">remoteposts</a>,    <a href="https://sthbrx.github.io/tag/ceph.html">ceph</a>,    <a href="https://sthbrx.github.io/tag/raid.html">raid</a>,    <a href="https://sthbrx.github.io/tag/storage.html">storage</a>,    <a href="https://sthbrx.github.io/tag/erasure.html">erasure</a>,    <a href="https://sthbrx.github.io/tag/lustre.html">lustre</a>,    <a href="https://sthbrx.github.io/tag/hpc.html">hpc</a>,    <a href="https://sthbrx.github.io/tag/nvlink.html">nvlink</a>,    <a href="https://sthbrx.github.io/tag/namd.html">namd</a>,    <a href="https://sthbrx.github.io/tag/cuda.html">cuda</a>,    <a href="https://sthbrx.github.io/tag/gpu.html">gpu</a>,    <a href="https://sthbrx.github.io/tag/minsky.html">minsky</a>,    <a href="https://sthbrx.github.io/tag/s822lc-for-hpc.html">S822LC for hpc</a>,    <a href="https://sthbrx.github.io/tag/debug.html">debug</a>,    <a href="https://sthbrx.github.io/tag/virtualisation.html">virtualisation</a>,    <a href="https://sthbrx.github.io/tag/dmesg.html">dmesg</a>,    <a href="https://sthbrx.github.io/tag/printk.html">printk</a>,    <a href="https://sthbrx.github.io/tag/boot.html">boot</a>,    <a href="https://sthbrx.github.io/tag/early.html">early</a>,    <a href="https://sthbrx.github.io/tag/error.html">error</a>,    <a href="https://sthbrx.github.io/tag/centos.html">centos</a>,    <a href="https://sthbrx.github.io/tag/centos7.html">centos7</a>,    <a href="https://sthbrx.github.io/tag/p8.html">p8</a>,    <a href="https://sthbrx.github.io/tag/bmc.html">bmc</a>,    <a href="https://sthbrx.github.io/tag/rhel.html">RHEL</a>,    <a href="https://sthbrx.github.io/tag/skiroot.html">skiroot</a>,    <a href="https://sthbrx.github.io/tag/devmapper.html">devmapper</a>,    <a href="https://sthbrx.github.io/tag/lvm.html">lvm</a>,    <a href="https://sthbrx.github.io/tag/cgroups.html">cgroups</a>,    <a href="https://sthbrx.github.io/tag/numa.html">numa</a>,    <a href="https://sthbrx.github.io/tag/development.html">Development</a>,    <a href="https://sthbrx.github.io/tag/netboot.html">netboot</a>,    <a href="https://sthbrx.github.io/tag/pxe.html">pxe</a>,    <a href="https://sthbrx.github.io/tag/education.html">Education</a>,    <a href="https://sthbrx.github.io/tag/work-experience.html">work experience</a>,    <a href="https://sthbrx.github.io/tag/asm.html">asm</a>,    <a href="https://sthbrx.github.io/tag/vdso.html">vdso</a>,    <a href="https://sthbrx.github.io/tag/snowpatch.html">snowpatch</a>,    <a href="https://sthbrx.github.io/tag/tools.html">tools</a>,    <a href="https://sthbrx.github.io/tag/intern.html">intern</a>,    <a href="https://sthbrx.github.io/tag/srop.html">SROP</a>,    <a href="https://sthbrx.github.io/tag/mitigation.html">mitigation</a>,    <a href="https://sthbrx.github.io/tag/double.html">double</a>,    <a href="https://sthbrx.github.io/tag/float.html">float</a>,    <a href="https://sthbrx.github.io/tag/hex.html">hex</a>,    <a href="https://sthbrx.github.io/tag/debugging.html">debugging</a>,    <a href="https://sthbrx.github.io/tag/skiboot.html">skiboot</a>,    <a href="https://sthbrx.github.io/tag/opal.html">OPAL</a>,    <a href="https://sthbrx.github.io/tag/fsp.html">FSP</a>,    <a href="https://sthbrx.github.io/tag/patches.html">patches</a>,    <a href="https://sthbrx.github.io/tag/based16.html">based16</a>,    <a href="https://sthbrx.github.io/tag/linux-gods.html">Linux Gods</a>,    <a href="https://sthbrx.github.io/tag/ozlabs.html">Ozlabs</a>,    <a href="https://sthbrx.github.io/tag/offtopic.html">offtopic</a>,    <a href="https://sthbrx.github.io/tag/autoboot.html">autoboot</a>,    <a href="https://sthbrx.github.io/tag/kexec.html">kexec</a>,    <a href="https://sthbrx.github.io/tag/aufs.html">aufs</a>,    <a href="https://sthbrx.github.io/tag/overlay.html">overlay</a>,    <a href="https://sthbrx.github.io/tag/php.html">php</a>,    <a href="https://sthbrx.github.io/tag/capi.html">capi</a>  </section>

  <section>
    <h1><a href="https://sthbrx.github.io/authors.html">Authors</a></h1>
    <ul id="authors_list">
        <li><a href="https://sthbrx.github.io/author/alastair-dsilva.html">Alastair D'Silva</a></li>
        <li><a href="https://sthbrx.github.io/author/andrew-donnellan.html">Andrew Donnellan</a></li>
        <li><a href="https://sthbrx.github.io/author/anton-blanchard.html">Anton Blanchard</a></li>
        <li><a href="https://sthbrx.github.io/author/benjamin-gray.html">Benjamin Gray</a></li>
        <li><a href="https://sthbrx.github.io/author/callum-scarvell.html">Callum Scarvell</a></li>
        <li><a href="https://sthbrx.github.io/author/cyril-bur.html">Cyril Bur</a></li>
        <li><a href="https://sthbrx.github.io/author/daniel-axtens.html">Daniel Axtens</a></li>
        <li><a href="https://sthbrx.github.io/author/daniel-black.html">Daniel Black</a></li>
        <li><a href="https://sthbrx.github.io/author/joel-stanley.html">Joel Stanley</a></li>
        <li><a href="https://sthbrx.github.io/author/nick-piggin.html">Nick Piggin</a></li>
        <li><a href="https://sthbrx.github.io/author/rashmica-gupta.html">Rashmica Gupta</a></li>
        <li><a href="https://sthbrx.github.io/author/rohan-mclure.html">Rohan McLure</a></li>
        <li><a href="https://sthbrx.github.io/author/russell-currey.html">Russell Currey</a></li>
        <li><a href="https://sthbrx.github.io/author/samuel-mendoza-jonas.html">Samuel Mendoza-Jonas</a></li>
        <li><a href="https://sthbrx.github.io/author/suraj-jitindar-singh.html">Suraj Jitindar Singh</a></li>
    </ul>
  </section>


    <section>
        <h1>Social</h1>
        <ul>
            <li><a href="https://sthbrx.github.io/rss.xml" type="application/rss+xml" rel="alternate">RSS</a></li>
            <li><a href="https://github.com/sthbrx/" target="_blank">GitHub</a></li>
            <li><a href="https://lists.ozlabs.org/listinfo/linuxppc-dev" target="_blank">linuxppc mailing list</a></li>
            <li><a href="https://lists.ozlabs.org/listinfo/skiboot" target="_blank">Skiboot mailing list</a></li>
        </ul>
    </section>
    <section>
        <h1>Blogroll</h1>
        <ul>
            <li><a href="http://ozlabs.org" target="_blank">OzLabs</a></li>
        </ul>
    </section>

    <section>
        <h1>Disclaimer</h1>
        <div>
This blog represents the views of the individual authors, and doesn't necessarily represent IBM's positions, strategies or opinions.        </div>
    </section>
</aside>    </div>
  </div>
  <footer role="contentinfo"><p>
    Copyright &copy;  2015&ndash;2023  OzLabs &mdash;
  <span class="credit">Powered by <a href="http://getpelican.com">Pelican</a></span>
</p></footer>
</body>
</html>