-
Notifications
You must be signed in to change notification settings - Fork 6
/
from-pythran-import-typing.html
501 lines (489 loc) · 46.2 KB
/
from-pythran-import-typing.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Pythran stories - from pythran import typing</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css"/>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.2/css/all.min.css"/>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto+Slab|Ruda"/>
<link rel="stylesheet" type="text/css" href="./theme/css/main.css"/>
<link href="http://serge-sans-paille.github.io/pythran-stories/
feeds/all.atom.xml"
type="application/atom+xml" rel="alternate" title="Pythran stories Atom Feed"/>
</head>
<body>
<style>.github-corner:hover .octo-arm {
animation: octocat-wave 560ms ease-in-out
}
@keyframes octocat-wave {
0%, 100% {
transform: rotate(0)
}
20%, 60% {
transform: rotate(-25deg)
}
40%, 80% {
transform: rotate(10deg)
}
}
@media (max-width: 500px) {
.github-corner:hover .octo-arm {
animation: none
}
.github-corner .octo-arm {
animation: octocat-wave 560ms ease-in-out
}
}</style><div id="container">
<header>
<h1><a href="./">Pythran stories</a></h1>
<ul class="social-media">
<li><a href="https://github.com/serge-sans-paille/pythran"><i class="fab fa-github fa-lg" aria-hidden="true"></i></a></li>
<li><a href="http://serge-sans-paille.github.io/pythran-stories/
feeds/all.atom.xml"
type="application/atom+xml" rel="alternate"><i class="fa fa-rss fa-lg"
aria-hidden="true"></i></a></li>
</ul>
<p><em></em></p>
</header>
<nav>
<ul>
<li><a href="./category/benchmark.html"> benchmark </a></li>
<li><a class="active" href="./category/compilation.html"> compilation </a></li>
<li><a href="./category/engineering.html"> engineering </a></li>
<li><a href="./category/examples.html"> examples </a></li>
<li><a href="./category/mozilla.html"> mozilla </a></li>
<li><a href="./category/optimisation.html"> optimisation </a></li>
<li><a href="./category/release.html"> release </a></li>
</ul>
</nav>
<main>
<article>
<h1>from pythran import typing</h1>
<aside>
<ul>
<li>
<time datetime="2016-12-10 00:00:00+01:00">Dec 10, 2016</time>
</li>
<li>
Categories:
<a href="./category/compilation.html"><em>compilation</em></a>
</li>
</li>
</ul>
</aside>
<p>Pythran is currently part of <a class="reference external" href="http://opendreamkit.org/">OpenDreamKit</a>,
a project that aims at improving the open source computational mathematics
ecosystem.</p>
<p>The goal of Pythran is indeed to improve some Python kernel computations, but
there's something that actually makes Pythran difficult to use for new
comers. What is it? Let's have a look at the following Python code, inspired by
a <a class="reference external" href="http://stackoverflow.com/questions/13815719/creating-grid-with-numpy-performance">stack overflow thread</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># create_grid.py</span>
<span class="c1">#pythran export create_grid(float [])</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="k">def</span> <span class="nf">create_grid</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">N</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">))</span>
<span class="n">z</span><span class="p">[:,:,</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">z</span><span class="p">[:,:,</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">x</span>
<span class="n">fast_grid</span> <span class="o">=</span> <span class="n">z</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">N</span><span class="o">*</span><span class="n">N</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="k">return</span> <span class="n">fast_grid</span>
</pre></div>
<p>An attempt to compile it with Pythran would return a very long C++ template instantiation trace, with very little clue concerning the origin of the problem.</p>
<div class="highlight"><pre><span></span>><span class="w"> </span>pythran<span class="w"> </span>create_grid.py
In<span class="w"> </span>file<span class="w"> </span>included<span class="w"> </span>from<span class="w"> </span>/tmp/tmpP0xYa2.cpp:10:
In<span class="w"> </span>file<span class="w"> </span>included<span class="w"> </span>from<span class="w"> </span>./pythran/pythonic/include/types/ndarray.hpp:12:
In<span class="w"> </span>file<span class="w"> </span>included<span class="w"> </span>from<span class="w"> </span>./pythran/pythonic/include/utils/broadcast_copy.hpp:4:
./pythran/pythonic/include/types/tuple.hpp:122:25:<span class="w"> </span>error:<span class="w"> </span>array<span class="w"> </span>is<span class="w"> </span>too<span class="w"> </span>large<span class="w"> </span><span class="o">(</span><span class="m">18446744073709551615</span><span class="w"> </span>elements<span class="o">)</span>
<span class="w"> </span>value_type<span class="w"> </span>buffer<span class="o">[</span>N<span class="w"> </span>?<span class="w"> </span>N<span class="w"> </span>:<span class="w"> </span><span class="m">1</span><span class="o">]</span><span class="p">;</span>
<span class="w"> </span>^~~~~~~~~
./pythran/pythonic/include/types/numpy_iexpr.hpp:57:26:<span class="w"> </span>note:<span class="w"> </span><span class="k">in</span><span class="w"> </span>instantiation<span class="w"> </span>of<span class="w"> </span>template<span class="w"> </span>class<span class="w"> </span><span class="s1">'pythonic::types::array<long, 18446744073709551615>'</span><span class="w"> </span>requested<span class="w"> </span>here
<span class="w"> </span>array<long,<span class="w"> </span>value><span class="w"> </span>_shape<span class="p">;</span>
<span class="w"> </span>^
<span class="o">[</span>...<span class="o">]</span>
<span class="w"> </span>^
./pythran/pythonic/include/utils/seq.hpp:19:19:<span class="w"> </span>note:<span class="w"> </span>use<span class="w"> </span>-ftemplate-depth<span class="o">=</span>N<span class="w"> </span>to<span class="w"> </span>increase<span class="w"> </span>recursive<span class="w"> </span>template<span class="w"> </span>instantiation<span class="w"> </span>depth
<span class="w"> </span>struct<span class="w"> </span>gens<span class="w"> </span>:<span class="w"> </span>gens<N<span class="w"> </span>-<span class="w"> </span><span class="m">1</span>,<span class="w"> </span>N<span class="w"> </span>-<span class="w"> </span><span class="m">1</span>,<span class="w"> </span>S...><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>^
<span class="m">3</span><span class="w"> </span>errors<span class="w"> </span>generated.
CRITICAL<span class="w"> </span>Cover<span class="w"> </span>me<span class="w"> </span>Jack.<span class="w"> </span>Jack?<span class="w"> </span>Jaaaaack!!!!
E:<span class="w"> </span><span class="o">(</span><span class="s1">'error: Command "clang++-3.8 -DNDEBUG -g -fwrapv -O2 -Wall -fno-strict-aliasing -g -O2 -fPIC -DUSE_GMP -DENABLE_PYTHON_MODULE -D__PYTHRAN__=2 -I./pythran -I./pythran/pythonic/patch -I/home/serge/.venvs/pythran/local/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c /tmp/tmpP0xYa2.cpp -o /tmp/tmpM2Eiso/tmp/tmpP0xYa2.o -std=c++11 -fno-math-errno -w -fwhole-program -fvisibility=hidden" failed with exit status 1'</span>,<span class="o">)</span>
What<span class="w"> </span>we<span class="w"> </span>now<span class="w"> </span>have<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>slightly<span class="w"> </span>friendlier<span class="w"> </span>message:
</pre></div>
<div class="highlight"><pre><span></span>><span class="w"> </span>pythran<span class="w"> </span>create_grid.py
CRITICAL<span class="w"> </span>You<span class="w"> </span>shall<span class="w"> </span>not<span class="w"> </span>pass!
E:<span class="w"> </span>Dimension<span class="w"> </span>mismatch<span class="w"> </span>when<span class="w"> </span>slicing<span class="w"> </span><span class="sb">`</span>Array<span class="o">[</span>2d,<span class="w"> </span>float<span class="o">]</span><span class="sb">`</span><span class="w"> </span><span class="o">(</span>create_grid.py,<span class="w"> </span>line<span class="w"> </span><span class="m">7</span><span class="o">)</span>
</pre></div>
<p>Indeed, the correct declaration for <tt class="docutils literal">z</tt> was <tt class="docutils literal">z = <span class="pre">np.zeros((N,</span> N, 3))</tt>.</p>
<div class="section" id="a-quick-glance-at-pythran-typing-system">
<h2>A Quick Glance at Pythran Typing System</h2>
<p>As you probably know, Python uses a dynamic type system, often called <em>duck typing</em>: what
matters is not the type of an object, but its structure, i.e. the available
methods and fields: <em>If it walks like a duck and talks like a duck, then it's a
duck</em>. That's a kind of <a class="reference external" href="https://en.wikipedia.org/wiki/Structural_type_system">structural typing</a>.</p>
<p>On the opposite side C++ uses a static type system, and if you adhere to OOP
<a class="footnote-reference" href="#footnote-2" id="footnote-reference-1">[1]</a> you may require an object to derive from the <tt class="docutils literal">Duck</tt> class to be
considered a duck; That's a kind of <a class="reference external" href="https://en.wikipedia.org/wiki/Nominal_type_system">nominal typing</a>.</p>
<p>Pythran uses a trick to make both world meet: <em>ad-hoc polymorphism</em>, as
supported in C++ through <tt class="docutils literal">template</tt> meta programing. Upon a template
instantiation, there's no type name verification, only a check that given
methods and attributes make sense in the current context. And that's exactly
what we need to get closer to Python typing!</p>
<p>This all is very nice, except in the case of a bad typing. Consider this trivial
Python code:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">twice</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="k">return</span> <span class="n">s</span> <span class="o">*</span> <span class="mi">2</span>
</pre></div>
<p>integer, for instance <tt class="docutils literal">str</tt>, <tt class="docutils literal">list</tt>, <tt class="docutils literal">int</tt>. The C++ equivalent would
be (taking into account move semantics):</p>
<div class="highlight"><pre><span></span><span class="k">template</span><span class="o"><</span><span class="k">typename</span><span class="w"> </span><span class="nc">T</span><span class="o">></span>
<span class="k">auto</span><span class="w"> </span><span class="n">twice</span><span class="p">(</span><span class="n">T</span><span class="o">&&</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>In Python's case, type checking is done at runtime, during a lookup in <tt class="docutils literal">s</tt> for
a <tt class="docutils literal">__mul__</tt> magic method. In C++ it's done at compile time, when performing
instantiation of <tt class="docutils literal">twice</tt> for a given type value of <tt class="docutils literal">T</tt>. What lacked was a
human-readable error message to warn about the coming winter. And that's
exactly the topic of this post ;-)</p>
</div>
<div class="section" id="a-few-words-about-mypy">
<h2>A Few Words About MyPy</h2>
<p>Type hints, as introduced by <a class="reference external" href="https://www.python.org/dev/peps/pep-0484/">PEP484</a>, make it possible to leverage on
arbitrary function annotations introduced by <a class="reference external" href="https://www.python.org/dev/peps/pep-3107">PEP 3107</a> to specify the expected type of a
function parameter and its resulting return type. No check occur at runtime, but
a third party compiler, say <a class="reference external" href="http://mypy-lang.org/">MyPy</a> can take advantage
of these hints to perform an ahead-of-time check. And that's <strong>great</strong>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">In this post, we use the type annotation introduced by PEP484 and used in
MyPy to describe types. <tt class="docutils literal">int</tt> is an integer, <tt class="docutils literal">List[str]</tt> is a list of
string and so on.</p>
</div>
<p>So, did we trade <tt class="docutils literal">#pythran export twice(str)</tt> for <tt class="docutils literal">def twice(s: str):</tt>? No.
Did we consider the option? Yes. First there's the issue of MyPy only running
on Python3. It can process Python2 code, but it runs on Python3. We've been
struggling so much to keep Python2.7 compatibility in addition to the recent
addition of broader Python3 support. We're not going to leave it apart without
good reasons.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">It also turns out that the <tt class="docutils literal">typing</tt> module has a different internal API
between Python2 and Python3. This makes it quite difficult to use for my
purpose. What a joy to discover this when you think you're done with all
your tests :-/</p>
</div>
<p>No, the main problem is <a class="reference external" href="https://github.com/python/mypy/issues/978">this MyPy issue</a> that basically states that Numpy
does not fit into the model:</p>
<blockquote>
Of course, the best behavior would be to provide a stub for Numpy, but some
features in Numpy make it difficult to provide a good stub</blockquote>
<p>Meanwhile, someone that did not read this issue wrote <a class="reference external" href="https://github.com/machinalis/mypy-data/tree/master/numpy-mypy">A Numpy stub for MyPy</a>. It turns
out that <a class="reference external" href="http://www.machinalis.com/blog/writing-type-stubs-for-numpy/">it' **is** a pain</a>, mostly due to
the flexibility of many Numpy methods.</p>
<p>Additionally, Pythran currently infers type inter-procedurally, while MyPy
requires type annotation on every functions, to keep the problem within
reasonable bounds.</p>
<p>But wait. MyPy author did his PhD on the subject, and he now works hand in hand
with Guildo van Rossum on the subject. Is there any chance for us to do a
better job? Let's be honest. There is not.</p>
<p>What can we do in such a situation? Take advantage of some extra assumptions
Pythran can afford. We focus on scientific computing, all existing types are
known (no user-defined types in Pythran) and we only need to handle small size
kernels, so we can spend some extra computing resources in the process.</p>
</div>
<div class="section" id="a-variant-of-hindley-milner-for-pythran">
<h2>A Variant of Hindley-Milner for Pythran</h2>
<p><a class="reference external" href="https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system">Hindley-Milner (HM)</a> is
a relatively easy to understand type system that supports parametric
polymorphism. A simple implementation has been <a class="reference external" href="http://smallshire.org.uk/sufficientlysmall/2010/04/11/a-hindley-milner-type-inference-implementation-in-python/">written in Python</a>,
but <em>not</em> for Python, even not for the subset supported by Pythran.</p>
<p>The main issue comes with overloaded functions. Consider the <tt class="docutils literal">map</tt> function:
it has a varying number of parameters, and for a given number of parameters,
two possible overloads exist (the first argument being <tt class="docutils literal">None</tt> or a
<tt class="docutils literal">Callable</tt>). Some extra stuff are not as critical but also important: it's
not possible to infer implicit option types (the one that comes with usage of
<tt class="docutils literal">None</tt>). Ocaml uses <tt class="docutils literal">Some</tt> as a counterpart of <tt class="docutils literal">None</tt> to handle this
issue. but there's no such hint in Python (and we don't want to introduce one).</p>
<p>Still, the whole subject of typing is reaaaaaalllllly difficult, and I wanted
to stick as close as possible to Hindley-Milner because of its simplicity. So
what got introduced is the concept of <tt class="docutils literal">MultiType</tt>, which is the type of an
object that can hold several types at the same time. So that's not exactly a
<tt class="docutils literal">UnionType</tt> which is the type of an object that can be of one type among
many. The difference exists because of the situation described by the following code:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">l</span><span class="p">,</span> <span class="n">m</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="k">pass</span>
<span class="n">foo</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">foo</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</pre></div>
<p>In that case foo really has two types, namely <tt class="docutils literal"><span class="pre">Callable[[Any],</span> None]</tt> and
<tt class="docutils literal"><span class="pre">Callalble[[Any,</span> Any], None]</tt>. That's what <tt class="docutils literal">MultiType</tt> represents.</p>
<div class="section" id="handling-overloading">
<h3>Handling Overloading</h3>
<p>So we handle overloading through a unique object that has a specific type, a
<tt class="docutils literal">MultiType</tt> that is just a list of possible types.</p>
<p>Abusing from <tt class="docutils literal">Multiype</tt> can quickly make the combinatorics of the type
possibilities go wild, so we had to make a decision. Consider the following code:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="k">return</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">x</span>
</pre></div>
<p>The <tt class="docutils literal">in</tt> operator could be implemented as a <tt class="docutils literal">MultiType</tt>, enumerating the
possible valid signature (remember we know of all possible types in Pythran):</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">Callable[[List[T0],</span> T0], bool]</tt>, a function that takes a list of <tt class="docutils literal">T0</tt> and a <tt class="docutils literal">T0</tt> and returns a boolean,</li>
<li><tt class="docutils literal"><span class="pre">Callable[[str,</span> str], bool]</tt>, a function that takes two strings and returns a boolean,</li>
</ul>
<p>And so on, including for numpy arrays, but we'll comme back to this later and
assume for now we only have these two types. So what is the type of <tt class="docutils literal">foo</tt>? From the
<tt class="docutils literal">x in y</tt> expression, HM tells us that <tt class="docutils literal">x</tt> can be a list of <tt class="docutils literal">T0</tt>, and in
that case <tt class="docutils literal">y</tt> must be of type <tt class="docutils literal">T0</tt>, <strong>or</strong> <tt class="docutils literal">x</tt> is a string and so must be
<tt class="docutils literal">y</tt>. And in both cases, a boolean is returned.</p>
<p>We could consider both alternatives, follow the two type paths and
in the end, compute the signature of <tt class="docutils literal">foo</tt> as a <tt class="docutils literal">MultiType</tt> holding the
outcome of all paths. But that could mean a lot! What we do is an
over-approximation: what is the common structure between <tt class="docutils literal">List[T0]</tt> and
<tt class="docutils literal">str</tt>? Both are iterable, therefeore <tt class="docutils literal">x</tt> must be iterable. Nothing good comes
from <tt class="docutils literal">T0</tt> and <tt class="docutils literal">str</tt>, and <tt class="docutils literal">bool</tt> compared to <tt class="docutils literal">bool</tt> results in a
<tt class="docutils literal">bool</tt>, so in the end <tt class="docutils literal">foo</tt> takes an iterable and any value, and returns a
boolean. That's not as strict as it could be, but that's definitively enough.
However our type system is no longer <em>sound</em> (it does not reject all bad program).</p>
<p>In order to make it easier to perform this approximation, we chose a dedicated representation for containers. In our type system (oh, it's named <em>tog</em> by the way, so in the tog type system), containers are roughly described as a tuple of <tt class="docutils literal">(name, sized, key, value, iter)</tt>:</p>
<ul class="simple">
<li>a <tt class="docutils literal">List[T0]</tt> is considered as <tt class="docutils literal">(List, Sized, int, T0, T0)</tt></li>
<li>a <tt class="docutils literal">Set[T0]</tt> is considered as <tt class="docutils literal">(Set, Sized, NoKey, T0, T0)</tt></li>
<li>a <tt class="docutils literal">Dict[T0, T1]</tt> is considered as <tt class="docutils literal">(Dict, Sized, T0, T1, T0)</tt></li>
<li>a <tt class="docutils literal">str</tt> is considered as <tt class="docutils literal">(Str, Sized, int, Str, Str)</tt></li>
<li>a <tt class="docutils literal">Generator[T0]</tt> is considered as <tt class="docutils literal">(Generator, NoSized, NoKey, T0, T0)</tt></li>
</ul>
<p>As a consequence, an <tt class="docutils literal">Iterable[T0]</tt>, to be compatible with the
over-approximation defined above, is a <tt class="docutils literal">(Any, Any, Any, Any, T0)</tt>.</p>
</div>
<div class="section" id="handling-option-types">
<h3>Handling Option Types</h3>
<p>When HM runs on the following Python code:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
<span class="k">if</span> <span class="n">a</span><span class="p">:</span>
<span class="n">n</span> <span class="o">=</span> <span class="mi">1</span>
<span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="k">return</span> <span class="n">n</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
</pre></div>
<p>It runs into some troubles. The <tt class="docutils literal">return</tt> from the <tt class="docutils literal">True</tt> branch sets the
return type of <tt class="docutils literal">foo</tt> to <tt class="docutils literal">int</tt> but the one from the <tt class="docutils literal">False</tt> branch sets it
to <tt class="docutils literal">None</tt>. How could we make this unification valid? Option types are
generally described as a parametric type, <tt class="docutils literal">Optional[T0]</tt>. To be able to unify
<tt class="docutils literal">int</tt> and <tt class="docutils literal">None</tt>, we would instead need to unify <tt class="docutils literal">Optional[int]</tt> and
<tt class="docutils literal">None</tt>, thus marking <tt class="docutils literal">n</tt> as <tt class="docutils literal">Optional[int]</tt>, which does not work, because
<tt class="docutils literal">range</tt> expects an <tt class="docutils literal">int</tt>.</p>
<p>The solution we have adopted is to make type inference control-flow sensitive. When
meeting an <tt class="docutils literal">if</tt>, we generate a new copy of the variable environment for each
branch, and we <em>merge</em> (not <em>unify</em>) the environments.</p>
<p>Likewise, if the condition is <em>explicitely</em> a check for <tt class="docutils literal">None</tt>, as in:</p>
<div class="highlight"><pre><span></span><span class="k">if</span> <span class="n">a</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">stuff</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">stuff</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
</pre></div>
<p>the environment in the <tt class="docutils literal">True</tt> branch holds the <tt class="docutils literal">None</tt> type for <tt class="docutils literal">a</tt>, and
the <tt class="docutils literal">int</tt> type in the <tt class="docutils literal">False</tt> branch. This could be improved, as we support
only a few patterns as the condition expression, there is something more
generic to be done there.</p>
<p>This even led to improvement in our test base, as the following code was no longer correct:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">v</span> <span class="o">+</span> <span class="mi">1</span>
</pre></div>
<p>Type inference computes that v is of type <tt class="docutils literal">Optional[T0]</tt>, which is not compatible with <tt class="docutils literal">v + 1</tt> and a <tt class="docutils literal">PythranTypeError</tt> is raised. A compatible way to write this would be:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">if</span> <span class="n">v</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">pass</span> <span class="c1"># or do stuff</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">v</span> <span class="o">+</span> <span class="mi">1</span>
</pre></div>
</div>
<div class="section" id="handling-type-promotion">
<h3>Handling Type Promotion</h3>
<p>It's not uncommon to find this kind of code:</p>
<div class="highlight"><pre><span></span><span class="n">l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">l</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">l</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mf">3.14</span><span class="p">)</span>
</pre></div>
<p>And there's nothing wrong with this in Python, but is this a type error for
Pythran? In classical HM systems, that's a type error: <tt class="docutils literal">[]</tt> is of type
<tt class="docutils literal">List[TO]</tt>, <tt class="docutils literal">list.append</tt> is of type <tt class="docutils literal"><span class="pre">Callable[[List[T0],</span> T0], None]</tt> so
unification sets <tt class="docutils literal">T0</tt> to <tt class="docutils literal">int</tt> after first <tt class="docutils literal">append</tt>, and fails upon the
second <tt class="docutils literal">append</tt> because unification between an <tt class="docutils literal">int</tt> and a <tt class="docutils literal">float</tt> fails.</p>
<p>Looking back in Python typing history, it seems that <a class="reference external" href="https://shedskin.github.io/">shedskin</a> made the decision to consider it's not an error
(see the <a class="reference external" href="http://shed-skin.blogspot.fr/2011/09/shed-skin-09.html">blogpost announce on the topic</a>. Several test cases
of Pythran test suite would fail with a stricter typing, so let's try to
achieve the same behavior as Shedskin, within HM.</p>
<p>The trick here is to consider a scalar as a tuple of four elements <a class="footnote-reference" href="#footnote-1" id="footnote-reference-2">[0]</a>, one per
scalar type we want to support. And then apply the following rule: the actual
type of the scalar is the type of the first non variable type, starting from
the lower index. Under that assumption,</p>
<ul class="simple">
<li>a <tt class="docutils literal">bool</tt> is a <tt class="docutils literal">(T0, T1, T2, bool)</tt></li>
<li>an <tt class="docutils literal">int</tt> is a <tt class="docutils literal">(T0, T1, int, T2)</tt></li>
<li>a <tt class="docutils literal">float</tt> is a <tt class="docutils literal">(T0, float, T1, T2)</tt></li>
<li>a <tt class="docutils literal">complex</tt> is a <tt class="docutils literal">(complex, T0, T1, T2)</tt></li>
</ul>
<p>When unifying an <tt class="docutils literal">int</tt> with a <tt class="docutils literal">float</tt>, regular unification yields <tt class="docutils literal">(T0,
float, int, T2)</tt> which is a <tt class="docutils literal">float</tt> according to the previous definition.</p>
<p>If we want to enforce an <tt class="docutils literal">int</tt>, say as argument of <tt class="docutils literal">range</tt>, then we can
define <tt class="docutils literal">strict_int</tt> as <tt class="docutils literal"><span class="pre">(no-complex,</span> <span class="pre">no-float,</span> int, T0)</tt> which still allows
up-casting from <tt class="docutils literal">bool</tt> to <tt class="docutils literal">int</tt> but prevents up-casting from <tt class="docutils literal">int</tt> to
<tt class="docutils literal">float</tt>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last"><tt class="docutils literal">numpy</tt> introduces many sized type for integers, floating point numbers
and complex numbers, with a set of rules to handle conversion between one
and the other. As these conversions are generally possible in <tt class="docutils literal">numpy</tt>
(i.e. they dont raise a <tt class="docutils literal">TypeError</tt>), we just use four scalar types:
<tt class="docutils literal">bool`, ``int</tt>, <tt class="docutils literal">complex</tt> and <tt class="docutils literal">float</tt>. <tt class="docutils literal">long</tt> is merged into
<tt class="docutils literal">int</tt>, which also makes the Python2/3 compatibility easier.</p>
</div>
</div>
<div class="section" id="handling-ndarray-type">
<h3>Handling NDArray Type</h3>
<p><tt class="docutils literal">numpy.ndarray</tt> is the corner stone of the <tt class="docutils literal">numpy</tt> package. And it's
super-flexible, allowing all kinds of broadcasting, reshaping, up-casting etc.
Even if Pythran is far from supporting all of its features, it does support a
wide set. The good news is that Pythran supports a lower version of <tt class="docutils literal">ndarray</tt>,
where the number of dimensions of an array does not change: it cannot be
reshaped in place. For instance the C++ type returned by <tt class="docutils literal"><span class="pre">numpy.ones((10,</span>
10))</tt> is <tt class="docutils literal"><span class="pre">types::ndarray<double</span> <span class="pre">/*dtype*/,</span> 2 <span class="pre">/*nbdim*/></span></tt>.</p>
<p>We've extended the <tt class="docutils literal">typing</tt> module to provide <tt class="docutils literal">NDArray</tt>. For Pythran, the
Python equivalent of the above C++ type is <tt class="docutils literal">NDArray[float, :, :]</tt>.</p>
<p>And as we want it to be compatible with the way we defined an <tt class="docutils literal">Iterable</tt>, an <tt class="docutils literal">NDArray</tt> is actually a:</p>
<ul class="simple">
<li><tt class="docutils literal">List[T0]</tt> is considered as <tt class="docutils literal">(List, Sized, int, T0, T0)</tt></li>
<li><tt class="docutils literal">Dict[T0, T1]</tt> is considered as <tt class="docutils literal">(Dict, Sized, T0, T1, T0)</tt></li>
<li>...</li>
<li><tt class="docutils literal">NDArray[complex, :]</tt> is considered as <tt class="docutils literal">(Array, Sized, T0, complex, complex)</tt></li>
<li><tt class="docutils literal">NDArray[complex, :, :]</tt> is considered as <tt class="docutils literal">(Array, Sized, T0, complex, NDArray[complex, :])</tt></li>
<li><tt class="docutils literal">NDArray[complex, :, :, :]</tt> is considered as <tt class="docutils literal">(Array, Sized, T0, complex, NDArray[complex, :, :])</tt></li>
</ul>
<p>That's a recursive definition, and that's pretty useful when used with our
<tt class="docutils literal">MultiType</tt> resolution. If we need to merge an <tt class="docutils literal">NDArray[complex, :, :]</tt> and
an <tt class="docutils literal">NDArray[complex, :, :, :]</tt>, we end up with <tt class="docutils literal">(Array, Sized, T0, complex,
(Array, Sized, T1, complex, T1))</tt> which actually means <em>an array of complex
with at least two dimensions</em>.</p>
</div>
</div>
<div class="section" id="testing-the-brew">
<h2>Testing the Brew</h2>
<p>Let's be honest: the <tt class="docutils literal">tog</tt> type system is more the result of tinkering than
great research. Type systems is a complex field and I did my best to apply what
I learned during my bibliography on the subject, but it still falls short in
various places. So instead of a formal proof, here is some testing results :-).</p>
<p>First, the whole test suite passes without much modifications. It helped
to spot a few <em>errors</em> in the tests, mostly code that was incorrect with
respect to option types. We also updated the way we specify tests input type to rely on PEP484. A typical Pythran unit-test now looks like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">test_shadow_import2</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">run_test</span><span class="p">(</span>
<span class="w"> </span><span class="sd">'''def shadow_import2(s):</span>
<span class="sd"> for set in s : set.add(1)'''</span><span class="p">,</span>
<span class="p">[{</span><span class="mi">1</span><span class="p">},{</span><span class="mi">2</span><span class="p">}],</span>
<span class="n">shadow_import2</span><span class="o">=</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="n">Set</span><span class="p">[</span><span class="nb">int</span><span class="p">]]]</span>
<span class="p">)</span>
</pre></div>
<p>where the <tt class="docutils literal">List[Set[int]]</tt> expression describes the type for which the code
must be instantiated.</p>
<p>The following code sample is adapted from the <a class="reference external" href="http://www.mypy-lang.org/examples.html">MyPy example page</a>. It requires a type comment to be
correctly typed, while Pythran correctly type checks it without annotation.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">wc</span><span class="p">(</span><span class="n">content</span><span class="p">):</span>
<span class="n">d</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">content</span><span class="o">.</span><span class="n">split</span><span class="p">():</span>
<span class="n">d</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="c1"># Use list comprehension</span>
<span class="n">l</span> <span class="o">=</span> <span class="p">[(</span><span class="n">freq</span><span class="p">,</span> <span class="n">word</span><span class="p">)</span> <span class="k">for</span> <span class="n">word</span><span class="p">,</span> <span class="n">freq</span> <span class="ow">in</span> <span class="n">d</span><span class="o">.</span><span class="n">items</span><span class="p">()]</span>
<span class="k">return</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">l</span><span class="p">)</span>
</pre></div>
<p>If we turn the <tt class="docutils literal">1</tt> into <tt class="docutils literal">"1"</tt>, we get the following error:</p>
<div class="highlight"><pre><span></span>><span class="w"> </span>pythran<span class="w"> </span>wc.py
CRITICAL<span class="w"> </span>You<span class="w"> </span>shall<span class="w"> </span>not<span class="w"> </span>pass!
E:<span class="w"> </span>Invalid<span class="w"> </span>operand<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="sb">`</span>+<span class="sb">`</span>:<span class="w"> </span><span class="sb">`</span>int<span class="sb">`</span><span class="w"> </span>and<span class="w"> </span><span class="sb">`</span>str<span class="sb">`</span><span class="w"> </span><span class="o">(</span>wc.py,<span class="w"> </span>line<span class="w"> </span><span class="m">5</span><span class="o">)</span>
</pre></div>
<p>And if we remove the <tt class="docutils literal">0</tt>, <tt class="docutils literal">d.get(word)</tt> may return <tt class="docutils literal">None</tt> and the error message becomes:</p>
<div class="highlight"><pre><span></span>><span class="w"> </span>pythran<span class="w"> </span>wc.py
CRITICAL<span class="w"> </span>You<span class="w"> </span>shall<span class="w"> </span>not<span class="w"> </span>pass!
E:<span class="w"> </span>Invalid<span class="w"> </span>operand<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="sb">`</span>+<span class="sb">`</span>:<span class="w"> </span><span class="sb">`</span>Option<span class="o">[</span>T0<span class="o">]</span><span class="sb">`</span><span class="w"> </span>and<span class="w"> </span><span class="sb">`</span>int<span class="sb">`</span><span class="w"> </span><span class="o">(</span>wc.py,<span class="w"> </span>line<span class="w"> </span><span class="m">5</span><span class="o">)</span>
</pre></div>
<p>Great!</p>
<p>Considering Numpy functions, we don't model all of them in tog, but we can
still detect several interesting errors. For instance on a gaussian kernel
(<a class="reference external" href="http://stats.stackexchange.com/questions/15798/how-to-calculate-a-gaussian-kernel-effectively-in-numpy">error-safe version from stackexchange</a>):</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="k">def</span> <span class="nf">vectorized_RBF_kernel</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">sigma</span><span class="p">):</span>
<span class="n">X2</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">X</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># sum colums of the matrix</span>
<span class="n">K0</span> <span class="o">=</span> <span class="n">X2</span> <span class="o">+</span> <span class="n">X2</span><span class="o">.</span><span class="n">T</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">X</span> <span class="o">*</span> <span class="n">X</span><span class="o">.</span><span class="n">T</span>
<span class="n">K</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">power</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="mf">1.0</span> <span class="o">/</span> <span class="n">sigma</span><span class="o">**</span><span class="mi">2</span><span class="p">),</span> <span class="n">K0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">K</span>
<span class="k">def</span> <span class="nf">badcall</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="k">return</span> <span class="n">vectorized_RBF_kernel</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
</pre></div>
<p>Pythran correctly catches the error on <tt class="docutils literal">vectorized_RBF_kernel</tt> call:</p>
<div class="highlight"><pre><span></span>><span class="w"> </span>pythran<span class="w"> </span>gaussian.py
CRITICAL<span class="w"> </span>You<span class="w"> </span>shall<span class="w"> </span>not<span class="w"> </span>pass!
E:<span class="w"> </span>Invalid<span class="w"> </span>argument<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="k">function</span><span class="w"> </span>call<span class="w"> </span>to<span class="w"> </span><span class="sb">`</span>Callable<span class="o">[[</span>int,<span class="w"> </span>T3<span class="o">]</span>,<span class="w"> </span>...<span class="o">]</span><span class="sb">`</span>,<span class="w"> </span>tried<span class="w"> </span>Callable<span class="o">[[</span>Array<span class="o">[</span><span class="m">1</span><span class="w"> </span>d+,<span class="w"> </span>T0<span class="o">]</span>,<span class="w"> </span>T1<span class="o">]</span>,<span class="w"> </span>Array<span class="o">[</span><span class="m">1</span><span class="w"> </span>d+,<span class="w"> </span>T2<span class="o">]]</span><span class="w"> </span><span class="o">(</span>gaussian.py,<span class="w"> </span>line<span class="w"> </span><span class="m">9</span><span class="o">)</span>
</pre></div>
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>I'm still not satisfied with the tog engine: it's relatively slow, not as
accurate as I'd like it to be, and it's just a type checker: another (simpler)
type engine is used to generate the actual C++ code. That's a lot of not very
enthusiastic concluding remarks, but... I'm French :-)</p>
<p>On the good side, I happened to learn a <em>lot</em> about typing and about Python, while
developing this. And Pythran is in a much better shape now, much more usable,
easier to maintain too, so that was worth the effort :-)</p>
<div class="section" id="acknowledgments">
<h3>Acknowledgments</h3>
<p>As usual, I'd like to thanks Pierrick Brunet for all his help. He keeps feeding
me with relevant insights, criticisms and great ideas. Thanks to <a class="reference external" href="http://opendreamkit.org/">OpenDreamKit</a> for sponsoring that work, and in particular to
<a class="reference external" href="http://www.logilab.fr/">Logilab</a> for their support. Thanks to Lancelot Six,
w1gz and Nicolas M. Thiéry for proof reading this post too :-)</p>
<p>And last, I'm in debt to all Pythran users for keeping the motivation high!</p>
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[0]</a></td><td>That could be more actually, for instance to distinguish single
precision float from double prcesion float, the <tt class="docutils literal">float32</tt> and <tt class="docutils literal">float64</tt>
from numpy. But four types is enough for the envisonned type checking.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>The OOP style in C++ is not enforced by the Standard Library as much as it is in the Java SDK though.</td></tr>
</tbody>
</table>
</div>
</div>
</article>
<section class="post-nav">
<div id="left-page">
<div id="left-link">
</div>
</div>
<div id="right-page">
<div id="right-link">
</div>
</div>
</section>
<div>
</div>
</main>
<footer>
<h6>
Rendered by <a href="http://getpelican.com/">Pelican</a> • Theme by <a
href="https://github.com/aleylara/Peli-Kiera">Peli-Kiera</a> • Copyright
© ‑ serge-sans-paille and other pythraners </h6>
</footer>
</div>
</body>
</html>