-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathen_ton.tex
6133 lines (5438 loc) · 325 KB
/
en_ton.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[12pt,oneside]{article}
\usepackage[T1]{fontenc}
%\usepackage{euler}
\usepackage{amssymb, amsmath, amsfonts, stmaryrd}
\usepackage[mathscr]{euscript}
\usepackage{mathrsfs}
\usepackage{theorem}
\usepackage[english]{babel}
\usepackage{bm}
\usepackage[all]{xy}
%\usepackage{chngcntr}
%\CompileMatrices
\usepackage[bookmarks=false,pdfauthor={Nikolai Durov},pdftitle={Telegram Open Network}]{hyperref}
\usepackage{fancyhdr}
\usepackage{caption}
%
\setlength{\headheight}{15.2pt}
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0.5pt}
%
\def\makepoint#1{\medbreak\noindent{\bf #1.\ }}
\def\zeropoint{\setcounter{subsection}{-1}}
\def\zerosubpoint{\setcounter{subsubsection}{-1}}
\def\nxpoint{\refstepcounter{subsection}%
\smallbreak\makepoint{\thesubsection}}
\def\nxsubpoint{\refstepcounter{subsubsection}%
\smallbreak\makepoint{\thesubsubsection}}
\def\nxsubsubpoint{\refstepcounter{paragraph}%
\makepoint{\paragraph}}
%\setcounter{secnumdepth}{4}
%\counterwithin{paragraph}{subsubsection}
\def\refpoint#1{{\rm\textbf{\ref{#1}}}}
\let\ptref=\refpoint
\def\embt(#1.){\textbf{#1.}}
\def\embtx(#1){\textbf{#1}}
\long\def\nodo#1{}
%
%\def\markbothsame#1{\markboth{#1}{#1}}
\fancyhf{}
\fancyfoot[C]{\thepage}
\def\markbothsame#1{\fancyhead[C]{#1}}
\def\mysection#1{\section{#1}\fancyhead[C]{\textsc{Chapter \textbf{\thesection.} #1}}}
\def\mysubsection#1{\subsection{#1}\fancyhead[C]{\small{\textsc{\textrm{\thesubsection.} #1}}}}
\def\myappendix#1{\section{#1}\fancyhead[C]{\textsc{Appendix \textbf{\thesection.} #1}}}
%
\let\tp=\textit
\let\vr=\textit
\def\workchainid{\vr{workchain\_id\/}}
\def\shardpfx{\vr{shard\_prefix}}
\def\accountid{\vr{account\_id\/}}
\def\currencyid{\vr{currency\_id\/}}
\def\uint{\tp{uint}}
\def\opsc#1{\operatorname{\textsc{#1}}}
\def\blkseqno{\opsc{blk-seqno}}
\def\blkprev{\opsc{blk-prev}}
\def\blkhash{\opsc{blk-hash}}
\def\Hash{\opsc{Hash}}
\def\Sha{\opsc{sha256}}
\def\height{\opsc{height}}
\def\len{\opsc{len}}
\def\leaf{\opsc{Leaf}}
\def\node{\opsc{Node}}
\def\root{\opsc{Root}}
\def\emptyroot{\opsc{EmptyRoot}}
\def\code{\opsc{code}}
\def\Ping{\opsc{Ping}}
\def\Store{\opsc{Store}}
\def\FindNode{\opsc{Find\_Node}}
\def\FindValue{\opsc{Find\_Value}}
\def\Bytes{\tp{Bytes}}
\def\Transaction{\tp{Transaction}}
\def\Account{\tp{Account}}
\def\State{\tp{State}}
\def\Maybe{\opsc{Maybe}}
\def\List{\opsc{List}}
\def\Block{\tp{Block}}
\def\Blockchain{\tp{Blockchain}}
\def\isValidBc{\tp{isValidBc}}
\def\evtrans{\vr{ev\_trans}}
\def\evblock{\vr{ev\_block}}
\def\Hashmap{\tp{Hashmap}}
\def\Type{\tp{Type}}
\def\nat{\tp{nat\/}}
\def\hget{\vr{hget\/}}
\def\bbB{{\mathbb{B}}}
\def\st#1{{\mathbf{#1}}}
%
\hfuzz=0.8pt
\title{Telegram Open Network}
\author{Dr.\ Nikolai Durov}% a.k.a. K.O.T.
\begin{document}
%\pagestyle{myheadings}
\maketitle
\begin{abstract}
The aim of this text is to provide a first description of the
Telegram Open Network (TON) and related blockchain, peer-to-peer,
distributed storage and service hosting technologies. To reduce the
size of this document to reasonable proportions, we focus mainly on
the unique and defining features of the TON platform that are
important for it to achieve its stated goals.
\end{abstract}
\section*{Introduction}
\markbothsame{Introduction}
The {\em Telegram Open Network (TON)} is a fast, secure and scalable
blockchain and network project, capable of handling millions of
transactions per second if necessary, and both user-friendly and
service provider-friendly. We aim for it to be able to host all
reasonable applications currently proposed and conceived. One might
think about TON as a huge distributed supercomputer, or rather a huge
``superserver'', intended to host and provide a variety of services.
This text is not intended to be the ultimate reference with respect to
all implementation details. Some particulars are likely to change
during the development and testing phases.
\clearpage
\tableofcontents
\clearpage
\mysection{Brief Description of TON Components}\label{sect:ton.components}
The {\em Telegram Open Network (TON)} is a combination of the
following components:
\begin{itemize}
\item A flexible multi-blockchain platform ({\em TON Blockchain};
cf.\ Chapter~\ptref{sect:blockchain}), capable of processing
millions of transactions per second, with Turing-complete smart
contracts, upgradable formal blockchain specifications,
multi-cryptocurrency value transfer, support for micropayment
channels and off-chain payment networks. {\em TON Blockchain\/}
presents some new and unique features, such as the ``self-healing''
vertical block\-chain mechanism (cf.~\ptref{sp:inv.sh.blk.corr}) and
Instant Hypercube Routing (cf.~\ptref{sp:instant.hypercube}), which
enable it to be fast, reliable, scalable and self-consistent at the
same time.
\item A peer-to-peer network ({\em TON P2P Network}, or just {\em TON
Network}; cf.\ Chapter~\ptref{sect:network}), used for accessing the
TON Block\-chain, sending transaction candidates, and receiving
updates about only those parts of the blockchain a client is
interested in (e.g., those related to the client's accounts and
smart contracts), but also able to support arbitrary distributed
services, blockchain-related or not.
\item A distributed file storage technology {\em (TON Storage;}
cf.~\ptref{sp:ex.ton.storage}), accessible through {\em TON
Network}, used by the TON Blockchain to store archive copies of
blocks and status data (snapshots), but also available for storing
arbitrary files for users or other services running on the platform,
with torrent-like access technology.
\item A network proxy/anonymizer layer {\em (TON Proxy;}
cf.~\ptref{sp:ex.ton.proxy} and~\ptref{sp:tunnels}), similar to the
$I^2P$ (Invisible Internet Project), used to hide the identity and
IP addresses of {\em TON Network\/} nodes if necessary (e.g., nodes
committing transactions from accounts with large amounts of
cryptocurrency, or high-stake blockchain validator nodes who wish to
hide their exact IP address and geographical location as a measure
against DDoS attacks).
\item A Kademlia-like distributed hash table ({\em TON DHT};
cf.~\ptref{sect:kademlia}), used as a ``torrent tracker'' for {\em
TON Storage} (cf.~\ptref{sp:distr.torr.tr}), as an ``input tunnel
locator'' for {\em TON Proxy\/} (cf.~\ptref{sp:loc.abs.addr}), and
as a service locator for {\em TON Services}
(cf.~\ptref{sp:loc.serv}).
\item A platform for arbitrary services ({\em TON Services};
cf.\ Chapter~\ptref{sect:services}), residing in and available
through {\em TON Network\/} and {\em TON Proxy}, with formalized
interfaces (cf.~\ptref{sp:pub.int.smartc}) enabling browser-like or
smartphone application interaction. These formal interfaces and
persistent service entry points can be published in the TON
Blockchain (cf.~\ptref{sp:ui.ton.dns}); actual nodes providing
service at any given moment can be looked up through the {\em TON
DHT\/} starting from information published in the TON Blockchain
(cf.~\ptref{sp:loc.serv}). Services may create smart contracts in
the TON Blockchain to offer some guarantees to their clients
(cf.~\ptref{sp:mixed.serv}).
\item {\em TON DNS\/} (cf.~\ptref{sp:ton.dns}), a service for
assigning human-readable names to accounts, smart contracts,
services and network nodes.
\item {\em TON Payments\/} (cf.\ Chapter~\ptref{sect:payments}), a
platform for micropayments, micropayment channels and a micropayment
channel network. It can be used for fast off-chain value transfers,
and for paying for services powered by {\em TON Services}.
\item TON will allow easy integration with third-party messaging and
social networking applications, thus making blockchain technologies
and distributed services finally available and accessible to
ordinary users (cf.~\ptref{sp:ton.www}), rather than just to a
handful of early cryptocurrency adopters. We will provide an example
of such an integration in another of our projects, the Telegram
Messenger (cf.~\ptref{sp:telegram.integr}).
\end{itemize}
While the TON Blockchain is the core of the TON project, and the other
components might be considered as playing a supportive role for the
blockchain, they turn out to have useful and interesting functionality
by themselves. Combined, they allow the platform to host more
versatile applications than it would be possible by just using the TON
Blockchain (cf.~\ptref{sp:blockchain.facebook}
and~\ptref{sect:ton.service.impl}).
\clearpage
\mysection{TON Blockchain}\label{sect:blockchain}
We start with a description of the Telegram Open Network (TON)
Blockchain, the core component of the project. Our approach here is
``top-down'': we give a general description of the whole first, and
then provide more detail on each component.
For simplicity, we speak here about {\em the\/} TON Blockchain, even
though in principle several instances of this blockchain protocol may
be running independently (for example, as a result of hard forks). We
consider only one of them.
\mysubsection{TON Blockchain as a Collection of 2-Blockchains}
The TON Blockchain is actually a {\em collection\/} of blockchains
(even a collection of {\em blockchains of blockchains}, or {\em
2-blockchains}---this point will be clarified later
in~\ptref{sp:inv.sh.blk.corr}), because no single blockchain project
is capable of achieving our goal of processing millions of
transactions per second, as opposed to the now-standard dozens of
transactions per second.
\nxsubpoint\label{sp:list.blkch.typ}
\embt(List of blockchain types.) The blockchains in this collection
are:
\begin{itemize}
\item The unique {\em master blockchain}, or {\em masterchain\/} for
short, containing general information about the protocol and the
current values of its parameters, the set of validators and their
stakes, the set of currently active workchains and their ``shards'',
and, most importantly, the set of hashes of the most recent blocks
of all workchains and shardchains.
\item Several (up to $2^{32}$) {\em working blockchains}, or {\em
workchains\/} for short, which are actually the ``workhorses'',
containing the value-transfer and smart-contract
transactions. Different workchains may have different ``rules'',
meaning different formats of account addresses, different formats of
transactions, different virtual machines (VMs) for smart contracts,
different basic cryptocurrencies and so on. However, they all must
satisfy certain basic interoperability criteria to make interaction
between different work\-chains possible and relatively simple. In
this respect, the TON Blockchain is {\em heterogeneous\/}
(cf.~\ptref{sp:blkch.hom.het}), similarly to the EOS
(cf.~\ptref{sp:discuss.EOS}) and PolkaDot
(cf.~\ptref{sp:discuss.PolkaDot}) projects.
\item Each workchain is in turn subdivided into up to $2^{60}$ {\em
shard blockchains}, or {\em shardchains\/} for short, having the
same rules and block format as the workchain itself, but responsible
only for a subset of accounts, depending on several first (most
significant) bits of the account address. In other words, a form of
sharding is built into the system
(cf.~\ptref{sp:shard.supp}). Because all these shardchains share a
common block format and rules, the TON Blockchain is {\em
homogeneous\/} in this respect (cf.~\ptref{sp:blkch.hom.het}),
similarly to what has been discussed in one of Ethereum scaling
proposals.\footnote{\url{https://github.com/ethereum/wiki/wiki/Sharding-FAQ}}
\item Each block in a shardchain (and in the masterchain) is actually
not just a block, but a small blockchain. Normally, this ``block
blockchain'' or ``vertical blockchain'' consists of exactly one
block, and then we might think this is just the corresponding block
of the shardchain (also called ``horizontal block\-chain'' in this
situation). However, if it becomes necessary to fix incorrect
shardchain blocks, a new block is committed into the ``vertical
block\-chain'', containing either the replacement for the invalid
``horizontal blockchain'' block, or a ``block difference'',
containing only a description of those parts of the previous version
of this block that need to be changed. This is a TON-specific
mechanism to replace detected invalid blocks without making a true
fork of all shardchains involved; it will be explained in more
detail in~\ptref{sp:inv.sh.blk.corr}. For now, we just remark that
each shardchain (and the masterchain) is not a conventional
blockchain, but a {\em blockchain of blockchains}, or {\em
2D-blockchain}, or just a {\em 2-blockchain}.
\end{itemize}
\nxsubpoint\label{sp:ISP} \embt(Infinite Sharding Paradigm.) Almost
all blockchain sharding proposals are ``top-down'': one first imagines
a single blockchain, and then discusses how to split it into several
interacting shardchains to improve performance and achieve
scalability.
The TON approach to sharding is ``bottom-up'', explained as follows.
Imagine that sharding has been taken to its extreme, so that exactly
one account or smart contract remains in each shardchain. Then we have
a huge number of ``account-chains'', each describing the state and
state transitions of only one account, and sending value-bearing
messages to each other to transfer value and information.
Of course, it is impractical to have hundreds of millions of blockchains, with updates (i.e., new blocks) usually appearing quite rarely in each of them. In order to implement them more efficiently, we group these ``account-chains'' into ``shardchains'', so that each block of the shardchain is essentially a collection of blocks of account-chains that have been assigned to this shard. Thus the ``account-chains'' have only a purely virtual or logical existence inside the ``shardchains''.
We call this perspective the {\em Infinite Sharding Paradigm}. It explains many of the design decisions for the TON Blockchain.
\nxsubpoint\label{sp:msg.IHR} \embt(Messages. Instant Hypercube Routing.)
The Infinite Sharding Para\-digm instructs us to regard each account
(or smart contract) as if it were in its own shardchain by
itself. Then the only way one account might affect the state of
another is by sending a {\em message\/} to it (this is a special
instance of the so-called Actor model, with accounts as Actors;
cf.~\ptref{sp:actors}). Therefore, a system of messages between
accounts (and shardchains, because the source and destination accounts
are, generally speaking, located in different shardchains) is of
paramount importance to a scalable system such as the TON
Blockchain. In fact, a novel feature of the TON Blockchain, called
{\em Instant Hypercube Routing\/} (cf.~\ptref{sp:instant.hypercube}),
enables it to deliver and process a message created in a block of one
shardchain into the very next block of the destination shardchain,
{\em regardless of the total number of shardchains in the system.}
\nxsubpoint \embt(Quantity of masterchains, workchains and
shardchains.) A TON Blockchain contains exactly one
masterchain. However, the system can potentially accommodate up to
$2^{32}$ workchains, each subdivided into up to $2^{60}$ shardchains.
\nxsubpoint \embt(Workchains can be virtual blockchains, not true
blockchains.) Because a workchain is usually subdivided into
shardchains, the existence of the workchain is ``virtual'', meaning
that it is not a true blockchain in the sense of the general
definition provided in~\ptref{sp:gen.blkch.def} below, but just a
collection of shardchains. When only one shardchain corresponds to a
workchain, this unique shardchain may be identified with the
workchain, which in this case becomes a ``true'' blockchain, at least
for some time, thus gaining a superficial similarity to customary
single-blockchain design. However, the Infinite Sharding Paradigm
(cf.~\ptref{sp:ISP}) tells us that this similarity is indeed
superficial: it is just a coincidence that the potentially huge number
of ``account-chains'' can temporarily be grouped into one blockchain.
\nxsubpoint \embt(Identification of workchains.) Each workchain is
identified by its {\em number\/} or {\em workchain identifier\/}
($\workchainid:\uint_{32}$), which is simply an unsigned 32-bit
integer. Workchains are created by special transactions in the
masterchain, defining the (previously unused) workchain identifier and
the formal description of the workchain, sufficient at least for the
interaction of this workchain with other workchains and for
superficial verification of this workchain's blocks.
\nxsubpoint \embt(Creation and activation of new workchains.) The
creation of a new workchain may be initiated by essentially any member
of the community, ready to pay the (high) masterchain transaction fees
required to publish the formal specification of a new
workchain. However, in order for the new workchain to become active, a
two-thirds consensus of validators is required, because they will need
to upgrade their software to process blocks of the new workchain, and
signal their readiness to work with the new workchain by special
masterchain transactions. The party interested in the activation of
the new workchain might provide some incentive for the validators to
support the new workchain by means of some rewards distributed by a
smart contract.
\nxsubpoint\label{sp:shard.ident} \embt(Identification of
shardchains.) Each shardchain is identified by a couple
$(w,s)=(\workchainid, \shardpfx)$, where $\workchainid:\uint_{32}$
identifies the corresponding workchain, and
$\shardpfx:\st2^{0\ldots60}$ is a bit string of length at most 60,
defining the subset of accounts for which this shardchain is
responsible. Namely, all accounts with $\accountid$ starting with
$\shardpfx$ (i.e., having $\shardpfx$ as most significant bits) will
be assigned to this shardchain.
\nxsubpoint \embt(Identification of account-chains.) Recall that
account-chains have only a virtual existence
(cf.~\ptref{sp:ISP}). However, they have a natural
identifier---namely, $(\workchainid,\accountid)$---because any
account-chain contains information about the state and updates of
exactly one account (either a simple account or smart contract---the
distinction is unimportant here).
\nxsubpoint\label{sp:dyn.split.merge} \embt(Dynamic splitting and
merging of shardchains; cf.~\ptref{sect:split.merge}.) A less
sophisticated system might use {\em static sharding}---for example, by
using the top eight bits of the $\accountid$ to select one of 256
pre-defined shards.
An important feature of the TON Blockchain is that it implements {\em
dynamic sharding}, meaning that the number of shards is not
fixed. Instead, shard $(w,s)$ can be automatically subdivided into
shards $(w,s.0)$ and $(w,s.1)$ if some formal conditions are met
(essentially, if the transaction load on the original shard is high
enough for a prolonged period of time). Conversely, if the load stays
too low for some period of time, the shards $(w,s.0)$ and $(w,s.1)$
can be automatically merged back into shard $(w,s)$.
Initially, only one shard $(w,\emptyset)$ is created for workchain
$w$. Later, it is subdivided into more shards, if and when this becomes necessary (cf.~\ptref{sp:split.necess} and~\ptref{sp:merge.necess}).
\nxsubpoint\label{sp:basic.workchain} \embt(Basic workchain or
Workchain Zero.) While up to $2^{32}$ workchains can be defined with
their specific rules and transactions, we initially define only one,
with $\workchainid=0$. This workchain, called Workchain Zero or the
basic workchain, is the one used to work with {\em TON smart
contracts\/} and transfer {\em TON coins}, also known as {\em
Grams\/} (cf.\ Appendix~\ref{app:coins}). Most applications are
likely to require only Workchain Zero. Shardchains of the basic
workchain will be called {\em basic shardchains}.
\nxsubpoint \embt(Block generation intervals.) We expect a new block
to be generated in each shardchain and the masterchain approximately
once every five seconds. This will lead to reasonably small
transaction confirmation times. New blocks of all shardchains are
generated approximately simultaneously; a new block of the masterchain
is generated approximately one second later, because it must contain
the hashes of the latest blocks of all shardchains.
\nxsubpoint\label{sp:sc.hash.mc} \embt(Using the masterchain to make
workchains and shardchains tightly coupled.) Once the hash of a block
of a shardchain is incorporated into a block of the masterchain, that
shardchain block and all its ancestors are considered ``canonical'',
meaning that they can be referenced from the subsequent blocks of all
shardchains as something fixed and immutable. In fact, each new
shardchain block contains a hash of the most recent masterchain block,
and all shardchain blocks referenced from that masterchain block are
considered immutable by the new block.
Essentially, this means that a transaction or a message committed in a
shardchain block may be safely used in the very next blocks of the
other shardchains, without needing to wait for, say, twenty
confirmations (i.e., twenty blocks generated after the original block
in the same blockchain) before forwarding a message or taking other
actions based on a previous transaction, as is common in most proposed
``loosely-coupled'' systems (cf.~\ptref{sp:blkch.interact}), such as
EOS. This ability to use transactions and messages in other
shardchains a mere five seconds after being committed is one of the
reasons we believe our ``tightly-coupled'' system, the first of its
kind, will be able to deliver unprecedented performance
(cf.~\ptref{sp:shard.supp} and~\ptref{sp:blkch.interact}).
\nxsubpoint \embt(Masterchain block hash as a global state.)
According to~\ptref{sp:sc.hash.mc}, the hash of the last masterchain
block completely determines the overall state of the system from the
perspective of an external observer. One does not need to monitor the
state of all shardchains separately.
\nxsubpoint \embt(Generation of new blocks by validators;
cf.~\ptref{sect:validators}.) The TON Blockchain uses a
Proof-of-Stake (PoS) approach for generating new blocks in the
shardchains and the masterchain. This means that there is a set of,
say, up to a few hundred {\em validators}---special nodes that have
deposited {\em stakes\/} (large amounts of TON coins) by a special
masterchain transaction to be eligible for new block generation and
validation.
Then a smaller subset of validators is assigned to each shard $(w,s)$
in a deterministic pseudorandom way, changing approximately every 1024
blocks. This subset of validators suggests and reaches consensus on
what the next shardchain block would be, by collecting suitable
proposed transactions from the clients into new valid block
candidates. For each block, there is a pseudorandomly chosen order on
the validators to determine whose block candidate has the highest
priority to be committed at each turn.
Validators and other nodes check the validity of the proposed block
candidates; if a validator signs an invalid block candidate, it may be
automatically punished by losing part or all of its stake, or by being
suspended from the set of validators for some time. After that, the
validators should reach consensus on the choice of the next block,
essentially by an efficient variant of the BFT (Byzantine Fault
Tolerant; cf.~\ptref{sp:dpos.bft}) consensus protocol, similar to
PBFT~\cite{PBFT} or Honey Badger BFT~\cite{HoneyBadger}. If consensus
is reached, a new block is created, and validators divide between
themselves the transaction fees for the transactions included, plus
some newly-created (``minted'') coins.
Each validator can be elected to participate in several validator
subsets; in this case, it is expected to run all validation and
consensus algorithms in parallel.
After all new shardchain blocks are generated or a timeout is passed,
a new masterchain block is generated, including the hashes of the
latest blocks of all shardchains. This is done by BFT consensus of
{\em all\/} validators.\footnote{Actually, two-thirds by stake is
enough to achieve consensus, but an effort is made to collect as
many signatures as possible.}
More detail on the TON PoS approach and its economical model is
provided in section~\ptref{sect:validators}.
\nxsubpoint \embt(Forks of the masterchain.) A complication that
arises from our tightly-coupled approach is that switching to a
different fork in the masterchain will almost necessarily require
switching to another fork in at least some of the shardchains. On the
other hand, as long as there are no forks in the masterchain, no forks
in the shardchain are even possible, because no blocks in the
alternative forks of the shardchains can become ``canonical'' by
having their hashes incorporated into a masterchain block.
The general rule is that {\em if masterchain block $B'$ is a
predecessor of $B$, $B'$ includes hash $\Hash(B'_{w,s})$ of
$(w,s)$-shardchain block $B'_{w,s}$, and $B$ includes hash
$\Hash(B_{w,s})$, then $B'_{w,s}$ {\bf must} be a predecessor of
$B_{w,s}$; otherwise, the masterchain block $B$ is invalid.}
We expect masterchain forks to be rare, next to non-existent, because
in the BFT paradigm adopted by the TON Blockchain they can happen only
in the case of incorrect behavior by a {\em majority\/} of validators
(cf.~\ptref{sp:validators} and~\ptref{sp:new.master.blk}), which would
imply significant stake losses by the offenders. Therefore, no true
forks in the shardchains should be expected. Instead, if an invalid
shardchain block is detected, it will be corrected by means of the
``vertical blockchain'' mechanism of the 2-blockchain
(cf.~\ptref{sp:inv.sh.blk.corr}), which can achieve this goal without
forking the ``horizontal blockchain'' (i.e., the shardchain). The same
mechanism can be used to fix non-fatal mistakes in the masterchain
blocks as well.
\nxsubpoint\label{sp:inv.sh.blk.corr} \embt(Correcting invalid
shardchain blocks.) Normally, only valid shardchain blocks will be
committed, because validators assigned to the shardchain must reach a
two-thirds Byzantine consensus before a new block can be
committed. However, the system must allow for detection of previously
committed invalid blocks and their correction.
Of course, once an invalid shardchain block is found---either by a
validator (not necessarily assigned to this shardchain) or by a
``fisherman'' (any node of the system that made a certain deposit to
be able to raise questions about block validity;
cf.~\ptref{sp:fish})---the invalidity claim and its proof are
committed into the masterchain, and the validators that have signed
the invalid block are punished by losing part of their stake and/or
being temporarily suspended from the set of validators (the latter
measure is important for the case of an attacker stealing the private
signing keys of an otherwise benign validator).
However, this is not sufficient, because the overall state of the
system (TON Block\-chain) turns out to be invalid because of the
invalid shardchain block previously committed. This invalid block must
be replaced by a newer valid version.
Most systems would achieve this by ``rolling back'' to the last block
before the invalid one in this shardchain and the last blocks
unaffected by messages propagated from the invalid block in each of
the other shardchains, and creating a new fork from these blocks. This
approach has the disadvantage that a large number of otherwise correct
and committed transactions are suddenly rolled back, and it is unclear
whether they will be included later at all.
The TON Blockchain solves this problem by making each ``block'' of
each shardchain and of the masterchain (``horizontal blockchains'') a
small blockchain (``vertical blockchain'') by itself, containing
different versions of this ``block'', or their
``differences''. Normally, the vertical blockchain consists of exactly
one block, and the shardchain looks like a classical
blockchain. However, once the invalidity of a block is confirmed and
committed into a masterchain block, the ``vertical blockchain'' of the
invalid block is allowed to grow by a new block in the vertical
direction, replacing or editing the invalid block. The new block is
generated by the current validator subset for the shardchain in
question.
The rules for a new ``vertical'' block to be valid are quite
strict. In particular, if a virtual ``account-chain block''
(cf.~\ptref{sp:ISP}) contained in the invalid block is valid by
itself, it must be left unchanged by the new vertical block.
Once a new ``vertical'' block is committed on top of the invalid
block, its hash is published in a new masterchain block (or rather in
a new ``vertical'' block, lying above the original masterchain block
where the hash of the invalid shardchain block was originally
published), and the changes are propagated further to any shardchain
blocks referring to the previous version of this block (e.g., those
having received messages from the incorrect block). This is fixed by
committing new ``vertical'' blocks in vertical blockchains for all
blocks previously referring to the ``incorrect'' block; new vertical
blocks will refer to the most recent (corrected) versions
instead. Again, strict rules forbid changing account-chains that are
not really affected (i.e., that receive the same messages as in the
previous version). In this way, fixing an incorrect block generates
``ripples'' that are ultimately propagated towards the most recent
blocks of all affected shardchains; these changes are reflected in new
``vertical'' masterchain blocks as well.
Once the ``history rewriting'' ripples reach the most recent blocks,
the new shardchain blocks are generated in one version only, being
successors of the newest block versions only. This means that they
will contain references to the correct (most recent) vertical blocks
from the very beginning.
The masterchain state implicitly defines a map transforming the hash
of the first block of each ``vertical'' blockchain into the hash of
its latest version. This enables a client to identify and locate any
vertical blockchain by the hash of its very first (and usually the
only) block.
\nxsubpoint \embt(TON coins and multi-currency workchains.) The TON
Block\-chain supports up to $2^{32}$ different ``cryptocurrencies'',
``coins'', or ``tokens'', distinguished by a 32-bit $\currencyid$. New
cryptocurrencies can be added by special transactions in the
masterchain. Each workchain has a basic cryptocurrency, and can have
several additional cryptocurrencies.
There is one special cryptocurrency with $\currencyid=0$, namely, the
{\em TON coin}, also known as the {\em Gram\/}
(cf.\ Appendix~\ref{app:coins}). It is the basic cryptocurrency of
Workchain Zero. It is also used for transaction fees and validator
stakes.
In principle, other workchains may collect transaction fees in other
tokens. In this case, some smart contract for automated conversion of
these transaction fees into Grams should be provided.
\nxsubpoint \embt(Messaging and value transfer.) Shardchains
belonging to the same or different workchains may send {\em
messages\/} to each other. While the exact form of the messages
allowed depends on the receiving workchain and receiving account
(smart contract), there are some common fields making inter-workchain
messaging possible. In particular, each message may have some {\em
value} attached, in the form of a certain amount of Grams (TON
coins) and/or other registered cryptocurrencies, provided they are
declared as acceptable cryptocurrencies by the receiving workchain.
The simplest form of such messaging is a value transfer from one
(usually not a smart-contract) account to another.
\nxsubpoint\label{sp:tonvm} \embt(TON Virtual Machine.) The {\em TON
Virtual Machine}, also abbreviated as {\em TON VM\/} or {\em TVM\/},
is the virtual machine used to execute smart-contract code in the
masterchain and in the basic workchain. Other workchains may use other
virtual machines alongside or instead of the TVM.
Here we list some of its features. They are discussed further
in~\ptref{sp:pec.tvm}, \ptref{sp:tvm.cells} and elsewhere.
\begin{itemize}
\item TVM represents all data as a collection of {\em (TVM) cells\/}
(cf.~\ptref{sp:tvm.cells}). Each cell contains up to 128 data bytes
and up to 4 references to other cells. As a consequence of the
``everything is a bag of cells'' philosophy
(cf.~\ptref{sp:everything.is.BoC}), this enables TVM to work with
all data related to the TON Blockchain, including blocks and
blockchain global state if necessary.
\item TVM can work with values of arbitrary algebraic data types
(cf.~\ptref{sp:pec.tvm}), represented as trees or directed acyclic
graphs of TVM cells. However, it is agnostic towards the existence
of algebraic data types; it just works with cells.
\item TVM has built-in support for hashmaps (cf.~\ptref{sp:patricia}).
\item TVM is a stack machine. Its stack keeps either 64-bit integers
or cell references.
\item 64-bit, 128-bit and 256-bit arithmetic is supported. All $n$-bit
arithmetic operations come in three flavors: for unsigned integers,
for signed integers and for integers modulo $2^n$ (no automatic
overflow checks in the latter case).
\item TVM has unsigned and signed integer conversion from $n$-bit to
$m$-bit, for all $0\leq m,n\leq 256$, with overflow checks.
\item All arithmetic operations perform overflow checks by default,
greatly simplifying the development of smart contracts.
\item TVM has ``multiply-then-shift'' and ``shift-then-divide''
arithmetic operations with intermediate values computed in a larger
integer type; this simplifies implementing fixed-point arithmetic.
\item TVM offers support for bit strings and byte strings.
\item Support for 256-bit Elliptic Curve Cryptography (ECC) for some
predefined curves, including Curve25519, is present.
\item Support for Weil pairings on some elliptic curves, useful for
fast implementation of zk-SNARKs, is also present.
\item Support for popular hash functions, including $\Sha$, is
present.
\item TVM can work with Merkle proofs
(cf.~\ptref{sp:ton.smart.pc.supp}).
\item TVM offers support for ``large'' or ``global'' smart
contracts. Such smart contracts must be aware of sharding
(cf.~\ptref{sp:loc.glob.smct} and \ptref{sp:tvm.data.shard}). Usual
(local) smart contracts can be sharding-agnostic.
\item TVM supports closures.
\item A ``spineless tagless $G$-machine'' \cite{STGM} can be easily
implemented inside TVM.
\end{itemize}
Several high-level languages can be designed for TVM, in addition to
the ``TVM assembly''. All these languages will have static types and
will support algebraic data types. We envision the following
possibilities:
\begin{itemize}
\item A Java-like imperative language, with each smart contract
resembling a separate class.
\item A lazy functional language (think of Haskell).
\item An eager functional language (think of ML).
\end{itemize}
\nxsubpoint\label{sp:config.param} \embt(Configurable parameters.) An
important feature of the TON Block\-chain is that many of its
parameters are {\em configurable}. This means that they are part of
the masterchain state, and can be changed by certain special
proposal/vote/result transactions in the masterchain, without any need
for hard forks. Changing such parameters will require collecting
two-thirds of validator votes and more than half of the votes of all
other participants who would care to take part in the voting process
in favor of the proposal.
\mysubsection{Generalities on Blockchains}
\nxsubpoint\label{sp:gen.blkch.def} \embt(General blockchain
definition.) In general, any {\em (true) blockchain\/} is a sequence
of {\em blocks}, each block $B$ containing a reference $\blkprev(B)$
to the previous block (usually by including the hash of the previous
block into the header of the current block), and a list of {\em
transactions}. Each transaction describes some transformation of the
{\em global blockchain state}; the transactions listed in a block are
applied sequentially to compute the new state starting from the old
state, which is the resulting state after the evaluation of the
previous block.
\nxsubpoint \embt(Relevance for the TON Blockchain.) Recall that the
{\em TON Block\-chain\/} is not a true blockchain, but a
collection of 2-blockchains (i.e., of blockchains of
blockchains; cf.~\ptref{sp:list.blkch.typ}), so the above
is not directly applicable to it. However, we start with
these generalities on true blockchains to use them as
building blocks for our more sophisticated constructions.
\nxsubpoint \embt(Blockchain instance and blockchain type.) One often
uses the word {\em blockchain\/} to denote both a general {\em
blockchain type\/} and its specific {\em blockchain instances},
defined as sequences of blocks satisfying certain conditions. For
example, \ptref{sp:gen.blkch.def} refers to blockchain instances.
In this way, a blockchain type is usually a ``subtype'' of the type
$\Block^*$ of lists (i.e., finite sequences) of blocks, consisting of
those sequences of blocks that satisfy certain compatibility and
validity conditions:
\begin{equation}
\Blockchain \subset \Block^*
\end{equation}
A better way to define $\Blockchain$ would be to say that
$\Blockchain$ is a {\em dependent couple type}, consisting of couples
$(\bbB,v)$, with first component $\bbB:\Block^*$ being of type
$\Block^*$ (i.e., a list of blocks), and the second component
$v:\isValidBc(\bbB)$ being a proof or a witness of the validity of
$\bbB$. In this way,
\begin{equation}
\Blockchain\equiv\Sigma_{(\bbB:\Block^*)}\isValidBc(\bbB)
\end{equation}
We use here the notation for dependent sums of types borrowed from~\cite{HoTT}.
\nxsubpoint \embt(Dependent type theory, Coq and TL.) Note that we
are using (Martin-L\"of) dependent type theory here, similar to that
used in the Coq\footnote{\url{https://coq.inria.fr}} proof
assistant. A simplified version of dependent type theory is also used
in {\em TL (Type
Language)},\footnote{\url{https://core.telegram.org/mtproto/TL}}
which will be used in the formal specification of the TON Blockchain
to describe the serialization of all data structures and the layouts
of blocks, transactions, and the like.
In fact, dependent type theory gives a useful formalization of what a
proof is, and such formal proofs (or their serializations) might
become handy when one needs to provide proof of invalidity for some
block, for example.
\nxsubpoint\label{sp:TL} \embt(TL, or the Type Language.) Since TL
(Type Language) will be used in the formal specifications of TON
blocks, transactions, and network datagrams, it warrants a brief
discussion.
TL is a language suitable for description of dependent algebraic {\em
types}, which are allowed to have numeric (natural) and type
parameters. Each type is described by means of several {\em
constructors}. Each constructor has a (human-readable) identifier
and a {\em name,} which is a bit string (32-bit integer by
default). Apart from that, the definition of a constructor contains a
list of fields along with their types.
A collection of constructor and type definitions is called a {\em
TL-scheme}. It is usually kept in one or several files with the
suffix \texttt{.tl}.
An important feature of TL-schemes is that they determine an
unambiguous way of serializing and deserializing values (or objects)
of algebraic types defined. Namely, when a value needs to be
serialized into a stream of bytes, first the name of the constructor
used for this value is serialized. Recursively computed serializations
of each field follow.
The description of a previous version of TL, suitable for serializing
arbitrary objects into sequences of 32-bit integers, is available at
\url{https://core.telegram.org/mtproto/TL}. A new version of TL,
called {\em TL-B}, is being developed for the purpose of describing
the serialization of objects used by the TON Project. This new version
can serialize objects into streams of bytes and even bits (not just
32-bit integers), and offers support for serialization into a tree of
TVM cells (cf.~\ptref{sp:tvm.cells}). A description of TL-B will be a
part of the formal specification of the TON Blockchain.
\nxsubpoint\label{sp:blk.transf} \embt(Blocks and transactions as
state transformation operators.) Normally, any blockchain (type)
$\Blockchain$ has an associated global state (type) $\State$, and a
transaction (type) $\Transaction$. The semantics of a blockchain are
to a large extent determined by the transaction application function:
\begin{equation}
\evtrans':\Transaction\times\State\to\State^?
\end{equation}
Here $X^?$ denotes $\Maybe X$, the result of applying the $\Maybe$
monad to type $X$. This is similar to our use of $X^*$ for $\List
X$. Essentially, a value of type $X^?$ is either a value of type $X$
or a special value $\bot$ indicating the absence of an actual value
(think about a null pointer). In our case, we use $\State^?$ instead
of $\State$ as the result type because a transaction may be invalid if
invoked from certain original states (think about attempting to
withdraw from an account more money than it is actually there).
We might prefer a curried version of $\evtrans'$:
\begin{equation}
\evtrans:\Transaction\to\State\to\State^?
\end{equation}
Because a block is essentially a list of transactions, the block
evaluation function
\begin{equation}
\evblock:\Block\to\State\to\State^?
\end{equation}
can be derived from $\evtrans$. It takes a block $B:\Block$ and the
previous blockchain state $s:\State$ (which might include the hash of
the previous block) and computes the next blockchain state
$s'=\evblock(B)(s):\State$, which is either a true state or a special
value $\bot$ indicating that the next state cannot be computed (i.e.,
that the block is invalid if evaluated from the starting state
given---for example, the block includes a transaction trying to debit
an empty account.)
\nxsubpoint \embt(Block sequence numbers.) Each block $B$ in the
blockchain can be referred to by its {\em sequence number}
$\blkseqno(B)$, starting from zero for the very first block, and
incremented by one whenever passing to the next block. More formally,
\begin{equation}
\blkseqno(B)=\blkseqno\bigl(\blkprev(B)\bigr)+1
\end{equation}
Notice that the sequence number does not identify a block uniquely in
the presence of {\em forks}.
\nxsubpoint \embt(Block hashes.) Another way of referring to a block
$B$ is by its hash $\blkhash(B)$, which is actually the hash of the
{\em header\/} of block $B$ (however, the header of the
block usually contains hashes that depend on all content
of block $B$). Assuming that there are no collisions for
the hash function used (or at least that they are very
improbable), a block is uniquely identified by its hash.
\nxsubpoint \embt(Hash assumption.) During formal analysis of
blockchain algorithms, we assume that there are no collisions for the
$k$-bit hash function $\Hash:\Bytes^*\to\st2^{k}$ used:
\begin{equation}\label{eq:hash.coll}
\Hash(s)=\Hash(s')\Rightarrow s=s'\quad\text{for any $s$,
$s'\in\Bytes^*$}
\end{equation}
Here $\Bytes=\{0\ldots255\}=\st2^8$ is the type of bytes, or the set
of all byte values, and $\Bytes^*$ is the type or set of arbitrary
(finite) lists of bytes; while $\st2=\{0,1\}$ is the bit type, and
$\st2^k$ is the set (or actually the type) of all $k$-bit sequences
(i.e., of $k$-bit numbers).
Of course, \eqref{eq:hash.coll} is impossible mathematically, because
a map from an infinite set to a finite set cannot be injective. A more
rigorous assumption would be
\begin{equation}\label{eq:hash.coll.prec}
\forall s, s': s\neq s', P\bigl(\Hash(s)=\Hash(s')\bigr)=2^{-k}
\end{equation}
However, this is not so convenient for the proofs. If
\eqref{eq:hash.coll.prec} is used at most $N$ times in a proof with
$2^{-k}N<\epsilon$ for some small $\epsilon$ (say,
$\epsilon=10^{-18}$), we can reason as if \eqref{eq:hash.coll} were
true, provided we accept a failure probability $\epsilon$ (i.e., the
final conclusions will be true with probability at least
$1-\epsilon$).
Final remark: in order to make the probability statement
of~\eqref{eq:hash.coll.prec} really rigorous, one must introduce a
probability distribution on the set $\Bytes^*$ of all byte
sequences. A way of doing this is by assuming all byte sequences of
the same length $l$ equiprobable, and setting the probability of
observing a sequence of length $l$ equal to $p^l-p^{l+1}$ for some
$p\to1-$. Then \eqref{eq:hash.coll.prec} should be understood as a
limit of conditional probability $P\bigl(\Hash(s)=\Hash(s')|s\neq
s'\bigr)$ when $p$ tends to one from below.
\nxsubpoint\label{sp:hash.change} \embt(Hash used for the TON
Blockchain.) We are using the 256-bit $\Sha$ hash for the TON
Blockchain for the time being. If it turns out to be weaker than
expected, it can be replaced by another hash function in the
future. The choice of the hash function is a configurable parameter of
the protocol, so it can be changed without hard forks as explained
in~\ptref{sp:config.param}.
\mysubsection{Blockchain State, Accounts and Hashmaps}
We have noted above that any blockchain defines a certain global
state, and each block and each transaction defines a transformation of
this global state. Here we describe the global state used by TON
blockchains.
\nxsubpoint \embt(Account IDs.) The basic account IDs used by TON
blockchains---or at least by its masterchain and Workchain Zero---are
256-bit integers, assumed to be public keys for 256-bit Elliptic Curve
Cryptography (ECC) for a specific elliptic curve. In this way,
\begin{equation}
\accountid:\Account=\uint_{256}=\st2^{256}
\end{equation}
Here $\Account$ is the account {\em type}, while $\accountid:\Account$
is a specific variable of type $\Account$.
Other workchains can use other account ID formats, 256-bit or
otherwise. For example, one can use Bitcoin-style account IDs, equal
to $\Sha$ of an ECC public key.
However, the bit length $l$ of an account ID must be fixed during the
creation of the workchain (in the masterchain), and it must be at
least 64, because the first 64 bits of $\accountid$ are used for
sharding and message routing.
\nxsubpoint \embt(Main component: {\em Hashmaps}.) The principal
component of the TON blockchain state is a {\em hashmap}. In some
cases we consider (partially defined) ``maps''
$h:\st2^n\dashrightarrow\st2^m$. More generally, we might be
interested in hashmaps $h:\st2^n\dashrightarrow X$ for a composite
type $X$. However, the source (or index) type is almost always
$\st2^n$.
Sometimes, we have a ``default value'' $\vr{empty}:X$, and the hashmap
$h:\st2^n\to X$ is ``initialized'' by its ``default value''
$i\mapsto\vr{empty}$.
\nxsubpoint \embt(Example: TON account balances.) An important
example is given by TON account balances. It is a hashmap
\begin{equation}
\vr{balance}:\Account\to\uint_{128}
\end{equation}
mapping $\Account=\st2^{256}$ into a Gram (TON coin) balance of type
$\uint_{128}=\st2^{128}$. This hashmap has a default value of zero,
meaning that initially (before the first block is processed) the
balance of all accounts is zero.
\nxsubpoint \embt(Example: smart-contract persistent storage.)
Another example is given by smart-contract persistent storage, which
can be (very approximately) represented as a hashmap
\begin{equation}
\vr{storage}:\st2^{256}\dashrightarrow\st2^{256}
\end{equation}
This hashmap also has a default value of zero, meaning that
uninitialized cells of persistent storage are assumed to be zero.
\nxsubpoint \embt(Example: persistent storage of all smart contracts.)
Because we have more than one smart contract, distinguished by
$\accountid$, each having its separate persistent storage, we must
actually have a hashmap
\begin{equation}
\vr{Storage}:\Account\dashrightarrow(\st2^{256}\dashrightarrow\st2^{256})
\end{equation}
mapping $\accountid$ of a smart contract into its persistent storage.
\nxsubpoint \embt(Hashmap type.) The hashmap is not just an abstract
(partially defined) function $\st2^n\dashrightarrow X$; it has a
specific representation. Therefore, we suppose that we have a special
hashmap type
\begin{equation}
\Hashmap (n,X):\Type
\end{equation}
corresponding to a data structure encoding a (partial) map
$\st2^n\dashrightarrow X$. We can also write
\begin{equation}
\Hashmap (n:\nat) (X:\Type) : \Type
\end{equation}
or
\begin{equation}
\Hashmap:\nat\to\Type\to\Type
\end{equation}
We can always transform $h:\Hashmap(n,X)$ into a map
$\hget(h):\st2^n\to X^?$. Henceforth, we usually write $h[i]$ instead
of $\hget(h)(i)$:
\begin{equation}
h[i]:\equiv\hget(h)(i):X^?\quad\text{for any $i:\st2^n$,
$h:\Hashmap(n,X)$}
\end{equation}
\nxsubpoint\label{sp:patricia} \embt(Definition of hashmap type as a
Patricia tree.) Logically, one might define $\Hashmap(n,X)$ as an
(incomplete) binary tree of depth $n$ with edge labels $0$ and $1$ and
with values of type $X$ in the leaves. Another way to describe the
same structure would be as a {\em (bitwise) trie\/} for binary strings
of length equal to $n$.
In practice, we prefer to use a compact representation of this trie,
by compressing each vertex having only one child with its parent. The
resulting representation is known as a {\em Patricia tree\/} or a {\em
binary radix tree\/}. Each intermediate vertex now has exactly two
children, labeled by two non-empty binary strings, beginning with zero
for the left child and with one for the right child.
In other words, there are two types of (non-root) nodes in a Patricia
tree:
\begin{itemize}
\item $\leaf(x)$, containing value $x$ of type $X$.
\item $\node(l,s_l,r,s_r)$, where $l$ is the (reference to the) left
child or subtree, $s_l$ is the bitstring labeling the edge
connecting this vertex to its left child (always beginning with 0),
$r$ is the right subtree, and $s_r$ is the bitstring labeling the
edge to the right child (always beginning with 1).
\end{itemize}
A third type of node, to be used only once at the root of the Patricia
tree, is also necessary: