-
Notifications
You must be signed in to change notification settings - Fork 1
/
a-closer-look-at-chromes-security-understanding-v8.html
295 lines (254 loc) · 25 KB
/
a-closer-look-at-chromes-security-understanding-v8.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>A Closer Look at Chrome's Security: Understanding V8</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="Marina von Steinkirch">
<!-- Le styles -->
<link rel="stylesheet" href="./theme/css/bootstrap.dark.css" type="text/css" />
<style type="text/css">
body {
padding-top: 60px;
padding-bottom: 40px;
}
.tag-1 {
font-size: 13pt;
}
.tag-2 {
font-size: 11pt;
}
.tag-2 {
font-size: 10pt;
}
.tag-4 {
font-size: 8pt;
}
</style>
<link href="./theme/css/bootstrap-responsive.dark.css" rel="stylesheet">
<link href="./theme/css/font-awesome.css" rel="stylesheet">
<link href="./theme/css/pygments.css" rel="stylesheet">
<!-- Le fav and touch icons -->
<link rel="shortcut icon" href="./theme/images/favicon.ico">
<link rel="apple-touch-icon" href="./theme/images/apple-touch-icon.png">
<link rel="apple-touch-icon" sizes="72x72" href="./theme/images/apple-touch-icon-72x72.png">
<link rel="apple-touch-icon" sizes="114x114" href="./theme/images/apple-touch-icon-114x114.png">
<link href="./feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="chmod +x singularity.sh ATOM Feed" />
</head>
<body>
<div class="navbar navbar-fixed-top">
<div class="navbar-inner">
<div class="container-fluid">
<a class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</a>
<a class="brand" href="./index.html">chmod +x singularity.sh </a>
<div class="nav-collapse">
<ul class="nav">
<li class="divider-vertical"></li>
<ul class="nav pull-right">
<li><a href="./authors.html">About</a></li>
<li><a href="./archives.html"><b>Archives</b></a></li>
<li>
<a href="https://github.com/bt3gl">github
<!--<i class="icon-github-sign icon-large" ></i>-->
</a></li>
<li>
<a href="https://twitter.com/1bt337">
<!--<i class="icon-twitter-sign icon-large"></i> -->
twitter
</a></li>
<li><a href="http://bt3gl.github.io/projects_page/index.html">Bygone Playful Times
</a></li>
</ul>
</ul>
<!--<p class="navbar-text pull-right">Logged in as <a href="#">username</a></p>-->
</div><!--/.nav-collapse -->
</div>
</div>
</div>
<div class="container-fluid">
<div class="row">
<div class="span9" id="content">
<section id="content">
<article>
<header>
<h1>
<a href=""
rel="bookmark"
title="Permalink to A Closer Look at Chrome's Security: Understanding V8">
A Closer Look at Chrome's Security: Understanding V8
</a>
</h1>
</header>
<div class="entry-content">
<div class="well">
<footer class="post-info">
<abbr class="published" title="2014-11-01T04:20:00">
Sat 01 November 2014 </abbr>
<span class="label"> Category</span>
<a href="./category/web-security.html"><i class="icon-folder-open"></i>Web Security</a>
<span class="label">Tags</span>
<a href="./tag/v8.html"><i class="icon-tag"></i>V8</a>
<a href="./tag/jit.html"><i class="icon-tag"></i>JIT</a>
<a href="./tag/javascript.html"><i class="icon-tag"></i>JavaScript</a>
<a href="./tag/garbage_collection.html"><i class="icon-tag"></i>garbage_collection</a>
<a href="./tag/cache.html"><i class="icon-tag"></i>cache</a>
<a href="./tag/bytecode.html"><i class="icon-tag"></i>bytecode</a>
<a href="./tag/chrome.html"><i class="icon-tag"></i>Chrome</a>
<a href="./tag/python.html"><i class="icon-tag"></i>Python</a>
</footer><!-- /.post-info --> </div>
<p><a href="http://blogoscoped.com/google-chrome/">In 2008, Google released a sandbox-oriented browser</a>, that was assembled from several different code libraries from Google and third parties (for instance, it borrowed a rendering machinery from the open-source <a href="https://www.webkit.org/">Webkit layout engine</a>, later changing it to a forked version, <a href="http://en.wikipedia.org/wiki/Blink_(layout_engine)">Blink</a>). Six years later, Chrome has became the preferred browser for <a href="http://en.wikipedia.org/wiki/File:Usage_share_of_web_browsers_(Source_StatCounter).svg">half of users in the Internet</a>. This is enough reason to investigate further how security is dealt in this engine. With this motivation in mind, I summarize the main features of Chrome and its <a href="http://www.chromium.org/Home">Chromium Project</a>, describing the pristine way of processing JavaScript with the <strong>V8 JavaScript virtual machine</strong>.</p>
<h2>They way computers talk...</h2>
<p>In mainstream computer languages, a <a href="http://www.openbookproject.net/thinkcs/python/english2e/ch01.html">source code in a <strong>high level language</strong> is transformed to a <strong>low level language</strong></a> (a machine or assembly language) by either being <strong>compiled</strong> or <strong>interpreted</strong> . This is <a href="https://www.youtube.com/watch?v=_C5AHaS1mOA">a very simple concept</a> but it is a fundamental one!</p>
<h3>Compilers and Interpreters</h3>
<p><strong>Compilers</strong> produce an intermediate form called <strong>code object</strong>, which is like machine code but augmented with symbols tables to make executable blocks (library files, with file objects). A linker is used to finally combine them to form executables.</p>
<p><strong>Interpreters</strong> execute instructions without compiling into machine language first. They are first translated into a lower level intermediate representations such as <strong>byte code</strong> or <strong>abstract syntax trees</strong> (ASTs). Then they are interpreted by a <strong>virtual machine</strong>.</p>
<p>The truth is that things are generally mixed. For example, when you type some instruction in Python's REPL, <a href="http://akaptur.com/blog/2013/11/17/introduction-to-the-python-interpreter-3/">the language executes four steps</a>: <em>lexing</em> (breaks the code into pieces), <em>parsing</em> (generates an AST with those pieces - it is the syntax analysis), <em>compiling</em> (converts the AST into code objects - which are attributes of the function objects), and <em>interpreting</em> (executes the code objects).</p>
<p>In Python, byte-compiled code, in form of <strong>.pyc</strong> files, is used by the compiler to speed-up the start-up time (load time) for short programs that use a lot of standard modules. And, by the way, byte codes are attributes of the code object so to see them, you just need to call <code>func_code</code> (code object) and <code>co_code</code>(bytecode)[1]:</p>
<div class="highlight"><pre><span class="o">>>></span> <span class="k">def</span> <span class="nf">some_function</span><span class="p">():</span>
<span class="o">...</span> <span class="k">return</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">some_function</span><span class="o">.</span><span class="n">func_code</span><span class="o">.</span><span class="n">co_code</span>
<span class="s">'d</span><span class="se">\x00\x00</span><span class="s">S'</span>
</pre></div>
<p>On the other hand, traditional JavaScript code is represented as a bytecode or an AST, and then executed in a <em>virtual machine</em> or further compiled into machine code. When JavaScript interprets code, it executes roughly the following steps: <em>parsing</em> and <em>preprocessing</em>, <em>scope analysis</em>, and <em>bytecode or translation to native</em>. Just a note: the JavaScript engine represents bytecode using <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Internals/Bytecode">SpiderMonkey</a>.</p>
<p>So we see that when modern languages choose the way they compile or interpret code, they are trading off with the speed they want things to run. Since browsers are preoccupied with delivering content the faster they can, this is a fundamental concept.</p>
<h3>Method JITs and Tracing JITs</h3>
<p>To speed things up, instead of having the code being parsed and then executed (<a href="http://en.wikipedia.org/wiki/Ahead-of-time_compilation">one at time</a>), <strong>dynamic translators</strong> (<em>Just-in-time</em> translators, or JIT) can be used. JITs <em>translate intermediate representation into machine language at runtime</em>. They have the efficiency of running native code with the cost of startup time plus increased memory (when the bytecode or AST are first compiled).</p>
<p>Engines have different policies on code generation, which can roughly be grouped into types: <strong>tracing</strong> and <strong>method</strong>.</p>
<p><strong>Method JITs</strong> emit native code for every block (method) of code and update references dynamically. Method JITs can implement an <em>inline cache</em> for rewriting type lookups at runtime.</p>
<p>In <strong>tracing JITs</strong>, native code is only emitted when a certain block (method) is considered <em>important</em>. An example is given by traditional JavaScript: if you load a script with functions that are never used, they are never compiled. Additionally, in JavaScript a <em>cache</em> is usually implemented due to the nature of its <em>dynamic typing system</em>.</p>
<p>As we will see below, V8 performs direct JIT compilation from (JavaScript) source code to native machine code (IA-32, x86-64, ARM, or MIPS ISAs), <strong>without transforming it to bytecode first</strong>. In addition, V8 performs dynamic several optimizations at runtime (including <strong>inline caching</strong>). But let's not get ahead of ourselves! Also, as a note, Google has implemented a technology called <a href="http://code.google.com/p/nativeclient/"><strong>Native Client</strong></a> (NaCl), which allows one to provide compiled code to the Chrome browser.</p>
<hr />
<h2>The way JavaScript rolls...</h2>
<p>JavaScript's integration with <a href="http://en.wikipedia.org/wiki/Netscape_Navigator">Netscape Navigator</a> in the mid-90s made it easier for developers to access HTML page elements such as <em>forms</em>, <em>frames</em>, and <em>images</em>. This was essential for JavaScript's accession to become the most popular scripting engine for the web.</p>
<p>However, the language's high dynamical behavior (that I'm briefly discussing here) came with a price: in the mid-2000s browsers had very slow implementations that did not scale with code size or <em>object allocation</em>. Issues such as <em>memory leaks</em> when running web apps were becoming mainstream. It was clear that things would only get worse and a new JavaScript engine was a need.</p>
<h3>JavaScript's Structure</h3>
<p>In JavaScript, every object has a <em>prototype</em>, and the prototype is also an object. All JavaScript objects inherit their properties and methods from their prototype.</p>
<p>So, for example, supposing an application that has an object <em>Point</em> (borrowed from the <a href="https://developers.google.com/v8/design">official documentation</a>):</p>
<div class="highlight"><pre><span class="kd">function</span> <span class="nx">Point</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span><span class="nx">y</span><span class="p">){</span>
<span class="k">this</span><span class="p">.</span><span class="nx">x</span> <span class="o">=</span> <span class="nx">x</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">y</span> <span class="o">=</span> <span class="nx">y</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>We can create several objects:</p>
<div class="highlight"><pre><span class="n">var</span> <span class="n">a</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Point</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="n">var</span> <span class="n">b</span> <span class="o">=</span> <span class="n">new</span> <span class="n">Point</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">);</span>
</pre></div>
<p>And we can access the propriety <code>x</code> in these object by:</p>
<div class="highlight"><pre><span class="n">a</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">b</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
</pre></div>
<p>In the above implementation, we would have two different Point objects that do not share any structure. This is because JavaScript is <strong>classless</strong>: you create new objects on the fly and dynamically add or remove proprieties. Functions can move from an object to another. Objects with same type can appear in the same sites in the program with no constraints.</p>
<p>Furthermore, to store their object proprieties, most JavaScript engines use a <em>dictionary-like data structure</em>. Each property access demands a dynamic lookup to resolve their location in memory. This contrasts <em>static</em> languages such as Java, where instance variables are located at fixed offsets determined by the compiler (due to the <em>fixed</em> object layout by the <em>object's class</em>). In this case, access is given by a simple memory load or store (a single instruction).</p>
<h3>JavaScript's Garbage Collection</h3>
<p>Garbage collection is a form of <em>automatic memory management</em>: an attempt to reclaim the memory occupied by objects that are not being used any longer (<em>i.e.</em>, if an object loses its reference, the object's memory has to be reclaimed).</p>
<p>The other possibility is <em>manual memory management</em>, which requires the developer to specify which objects need to be deallocated. However, manual garbage collection can result in bugs such as:</p>
<ol>
<li>
<p><strong>Dangling pointers</strong>: when a piece of memory is freed while there are still pointers to it.</p>
</li>
<li>
<p><strong>Double free bugs</strong>: when the program tries to free a region of memory that it had already freed.</p>
</li>
<li>
<p><strong>Memory leaks</strong>: when the program fails to free memory occupied by an object that had became unreachable, leading to memory exhaustion.</p>
</li>
</ol>
<p>As one could guess, JavaScript has automatic memory management. Actually, the core design flaw of traditional JavaScript engines is <strong>bad garbage collection behavior</strong>. The problem is that JavaScript engines do not know exactly where all the pointers are, and they will search through the entire execution stack to see what data looks like pointers (for instance, integers can look like a pointer to an address in the heap).</p>
<hr />
<h2>Introducing V8</h2>
<p>A solution for the issues presented above came from Google, with the <strong>V8 Engine</strong>. V8 is an <a href="https://code.google.com/p/v8/">open source JavaScript engine</a> written in C++ that gave birth to Chrome. V8 has a way to categorize the highly-dynamic JavaScript objects into classes, bringing techniques from static class-based languages. In addition, as I mentioned in the the beginning, V8 compiles JavaScript to native machine code before executing it.</p>
<p>In terms of performance, besides direct compilation to native code, three main features in V8 are fundamental:</p>
<div class="highlight"><pre><span class="mf">1.</span> <span class="n">Hidden</span> <span class="n">classes</span><span class="p">.</span>
<span class="mf">2.</span> <span class="n">In</span><span class="o">-</span><span class="n">line</span> <span class="n">caching</span> <span class="n">as</span> <span class="n">an</span> <span class="n">optimization</span> <span class="n">technique</span><span class="p">.</span>
<span class="mf">3.</span> <span class="n">Efficient</span> <span class="n">memory</span> <span class="n">management</span> <span class="n">system</span> <span class="p">(</span><span class="n">garbage</span> <span class="n">collection</span><span class="p">).</span>
</pre></div>
<p>Let's take a look at each of them.</p>
<h3>V8's Hidden Class</h3>
<p>In V8, as execution goes on, objects that end up with the same properties will share the same <strong>hidden class</strong>. This way the engine applies dynamic optimizations.</p>
<p>Consider the Point example from before: we have two different objects, <code>a</code> and <code>b</code>. Instead of having them completely independent, V8 makes them share a hidden class. So instead of creating two objects, we have <em>three</em>. The hidden class shows that both objects have the same proprieties and an object changes its hidden class when a new property is added.</p>
<p>So, for our example, if another Point object is created:</p>
<ol>
<li>
<p>Initially the Point object has no properties so the newly created object refers to the initial class <strong>C0</strong>. The value is stored at offset zero of the Point object.</p>
</li>
<li>
<p>When property <code>x</code> is added, V8 follows the hidden class transition from <strong>C0</strong> to <strong>C1</strong> and writes the value of <code>x</code> at the offset specified by <strong>C1</strong>.</p>
</li>
<li>
<p>When property <code>y</code> is added, V8 follows the hidden class transition from <strong>C1</strong> to <strong>C2</strong> and writes the value of <code>y</code> at the offset specified by <strong>C2</strong>.</p>
</li>
</ol>
<p>Instead of having a generic lookup for a propriety, V8 generates an efficient machine code to search the propriety. The machine code generated for accessing <code>x</code> is something like this:</p>
<div class="highlight"><pre><span class="cp"># ebx = the point object</span>
<span class="n">cmp</span> <span class="p">[</span><span class="n">ebx</span><span class="p">,</span> <span class="o"><</span><span class="n">class</span> <span class="n">offset</span><span class="o">></span><span class="p">],</span> <span class="o"><</span><span class="n">cached</span> <span class="n">class</span><span class="o">></span>
<span class="n">jne</span> <span class="o"><</span><span class="kr">inline</span> <span class="n">cache</span> <span class="n">miss</span><span class="o">></span>
<span class="n">mv</span> <span class="n">eax</span><span class="p">,</span> <span class="p">[</span><span class="n">ebx</span><span class="p">,</span> <span class="o"><</span><span class="n">cached</span> <span class="n">x</span> <span class="n">offset</span><span class="o">></span><span class="p">]</span>
</pre></div>
<p>Instead of a complicated lookup at the propriety, the propriety reading translates into three machine operations!</p>
<p>It might seem inefficient to create a new hidden class whenever a property is added. However, because of the class transitions the hidden classes can be reused several times. It turns out that most of the access to objects are within the same hidden class.</p>
<h3>V8's Inline caching</h3>
<p>When the engine runs the code, it does not know about the hidden class. V8 optimizes property access by predicting that the class will also be used for all future objects accessed in the same section of code, and adds the information to the <strong>inline cache code</strong>.</p>
<p>Inline caching is a class-based object-oriented optimization technique employed by some language runtimes. The concept of inline caching is based on the observation that the objects that occur at a particular call site are often of the same type. Therefore, performance can be increased by storing the result of a method lookup <em>inline</em> (at the call site).</p>
<p>If V8 has predicted correctly the property's value, this is assigned in a single operation. If the prediction is incorrect, V8 patches the code to remove the optimization. To facilitate this process, call sites are assigned in four different states:</p>
<ol>
<li>
<p><strong>Unitilized</strong>: Initial state, for any object that was never seen before.</p>
</li>
<li>
<p><strong>Pre-monomorphic</strong>: Behaves like an uninitialized but do a one-time lookup and rewrite it to the monophorfic state. It's good for code executed only once (such as initialization and setup).</p>
</li>
<li>
<p><strong>Monomphorpic</strong>: Very fast. Recodes the hidden class of the object already seen.</p>
</li>
<li>
<p><strong>Megamorphic</strong>: Like the initialized stub (since it always does runtime lookup) except that it never replaces itself.</p>
</li>
</ol>
<p>In conclusion, the combination of using hidden classes to access properties with inline caching (plus machine code generation) does optimize in cases where type of objects are frequently created and accessed in a similar way. This greatly improves the speed at which most JavaScript code can be executed.</p>
<h3>V8's Efficient Garbage Collecting</h3>
<p>In V8, a <strong>precise garbage collection</strong> is used. <em>Every pointer's location are known on the execution stack</em>, so V8 is able to implement incremental garbage collection. V8 can migrate an object to another place and just rewire the pointer.</p>
<p>In summary, <a href="https://developers.google.com/v8/design#garb_coll">V8's garbage collection</a>:</p>
<div class="highlight"><pre><span class="mf">1.</span> <span class="n">stops</span> <span class="n">program</span> <span class="n">execution</span> <span class="n">when</span> <span class="n">performing</span> <span class="n">a</span> <span class="n">garbage</span> <span class="n">collection</span> <span class="n">cycle</span><span class="p">,</span>
<span class="mf">2.</span> <span class="n">processes</span> <span class="n">only</span> <span class="n">part</span> <span class="n">of</span> <span class="n">the</span> <span class="n">object</span> <span class="n">heap</span> <span class="n">in</span> <span class="n">most</span> <span class="n">collection</span> <span class="n">cycles</span> <span class="p">(</span><span class="n">minimizing</span> <span class="n">the</span> <span class="n">impact</span> <span class="n">of</span> <span class="n">stopping</span> <span class="n">the</span> <span class="n">application</span><span class="p">),</span>
<span class="mf">3.</span> <span class="n">always</span> <span class="n">knows</span> <span class="n">exactly</span> <span class="n">where</span> <span class="n">all</span> <span class="n">objects</span> <span class="n">and</span> <span class="n">pointers</span> <span class="n">are</span> <span class="n">in</span> <span class="n">memory</span> <span class="p">(</span><span class="n">avoiding</span> <span class="n">falsely</span> <span class="n">identifying</span> <span class="n">objects</span> <span class="n">as</span> <span class="n">pointers</span><span class="p">).</span>
</pre></div>
<hr />
<h2>Further Readings:</h2>
<ul>
<li><a href="https://noncombatant.org/2014/03/11/privacy-and-security-settings-in-chrome/">Privacy And Security Settings in Chrome</a></li>
</ul>
<p>[1] When the Python interpreter is invoked with the <code>-O</code> flag, optimized code is generated and stored in <strong><em>.pyo</em></strong> files. The optimizer removes assert statements.</p>
</div><!-- /.entry-content -->
<div class="comments">
<h2>Comments !</h2>
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_identifier = "a-closer-look-at-chromes-security-understanding-v8.html";
(function() {
var dsq = document.createElement('script');
dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = 'http://bt3gl.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] ||
document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
</div>
</article>
</section>
</div><!--/span-->
</div><!--/row-->
<footer>
<address id="about">
</address><!-- /#about -->
</footer>
</div><!--/.fluid-container-->
<script src="./theme/js/jquery-1.7.2.min.js"></script>
<script src="./theme/js/bootstrap.min.js"></script>
</body>
</html>