-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathmiles_span.w
1744 lines (1553 loc) · 68.4 KB
/
miles_span.w
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
% This file is part of the Stanford GraphBase (c) Stanford University 1993
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
@i gb_types.w
\def\title{MILES\_\,SPAN}
\def\<#1>{$\langle${\rm#1}$\rangle$}
\prerequisite{GB\_\,MILES}
@* Minimum spanning trees.
A classic paper by R. L. Graham and Pavol Hell about the history of
@^Graham, Ronald Lewis@>
@^Hell, Pavol@>
algorithms to find the minimum-length spanning tree of a graph
[{\sl Annals of the History of Computing \bf7} (1985), 43--57]
describes three main approaches to that problem. Algorithm~1,
``two nearest fragments,'' repeatedly adds a shortest edge that joins
two hitherto unconnected fragments of the graph; this algorithm was
first published by J.~B. Kruskal in 1956. Algorithm~2, ``nearest
@^Kruskal, Joseph Bernard@>
neighbor,'' repeatedly adds a shortest edge that joins a particular
fragment to a vertex not in that fragment; this algorithm was first
published by V. Jarn\'{\i}k in 1930. Algorithm~3, ``all nearest
@^Jarn{\'\i}k, Vojt\u ech@>
fragments,'' repeatedly adds to each existing fragment the shortest
edge that joins it to another fragment; this method, seemingly the
most sophisticated in concept, also turns out to be the oldest,
being first published by Otakar Bor{\accent23u}vka in 1926.
@^Bor{\accent23u}vka, Otakar@>
The present program contains simple implementations of all three
approaches, in an attempt to make practical comparisons of how
they behave on ``realistic'' data. One of the main goals of this
program is to demonstrate a simple way to make machine-independent
comparisons of programs written in \CEE/, by counting memory
references or ``mems.'' In other words, this program is intended
to be read, not just performed.
The author believes that mem counting sheds considerable light on
the problem of determining the relative efficiency of competing
algorithms for practical problems. He hopes other researchers will
enjoy rising to the challenge of devising algorithms that find minimum
spanning trees in significantly fewer mem units than the algorithms
presented here, on problems of the size considered here.
Indeed, mem counting promises to be significant for combinatorial
algorithms of all kinds. The standard graphs available in the
Stanford GraphBase should make it possible to carry out a large
number of machine-independent experiments concerning the practical
efficiency of algorithms that have previously been studied
only asymptotically.
@ The graphs we will deal with are produced by the |miles| subroutine,
found in the {\sc GB\_\,MILES} module. As explained there,
|miles(n,north_weight,west_weight,pop_weight,0,max_degree,seed)| produces a
graph of |n<=128| vertices based on the driving distances between
North American cities. By default we take |n=100|, |north_weight=west_weight
=pop_weight=0|, and |max_degree=10|; this gives billions of different sparse
graphs, when different |seed| values are specified, since a different
random number seed generally results in the selection of another
one of the $\,128\,\choose100$ possible subgraphs.
The default parameters can be changed by specifying options on the
command line, at least in a \UNIX/ implementation, thereby obtaining a
variety of special effects. For example, the value of |n| can be
raised or lowered and/or the graph can be made more or less sparse.
The user can bias the selection by ranking cities according to their
population and/or position, if nonzero values are given to any of the
parameters |north_weight|, |west_weight|, or |pop_weight|.
Command-line options \.{-n}\<number>, \.{-N}\<number>, \.{-W}\<number>,
\.{-P}\<number>, \.{-d}\<number>, and \.{-s}\<number>
are used to specify non-default values of the respective quantities |n|,
|north_weight|, |west_weight|, |pop_weight|, |max_degree|, and |seed|.
If the user specifies a \.{-r} option, for example by saying `\.{miles\_span}
\.{-r10}', this program will investigate the spanning trees of a
series of, say, 10 graphs having consecutive |seed| values. (This
option makes sense only if |north_weight=west_weight=pop_weight=0|,
because |miles| chooses the top |n| cities by weight. The procedure rarely
needs to use random numbers to break ties when the weights are nonzero,
because cities rarely have exactly the same weight in that case.)
The special command-line option \.{-g}$\langle\,$filename$\,\rangle$
overrides all others. It substitutes an external graph previously saved by
|save_graph| for the graphs produced by |miles|.
@^UNIX dependencies@>
Here is the overall layout of this \CEE/ program:
@p
#include "gb_graph.h" /* the GraphBase data structures */
#include "gb_save.h" /* |restore_graph| */
#include "gb_miles.h" /* the |miles| routine */
@h@#
@<Global variables@>@;
@<Procedures to be declared early@>@;
@<Priority queue subroutines@>@;
@<Subroutines@>@;
main(argc,argv)
int argc; /* the number of command-line arguments */
char *argv[]; /* an array of strings containing those arguments */
{@+unsigned long n=100; /* the desired number of vertices */
unsigned long n_weight=0; /* the |north_weight| parameter */
unsigned long w_weight=0; /* the |west_weight| parameter */
unsigned long p_weight=0; /* the |pop_weight| parameter */
unsigned long d=10; /* the |max_degree| parameter */
long s=0; /* the random number seed */
unsigned long r=1; /* the number of repetitions */
char *file_name=NULL; /* external graph to be restored */
@<Scan the command-line options@>;
while (r--) {
if (file_name) g=restore_graph(file_name);
else g=miles(n,n_weight,w_weight,p_weight,0L,d,s);
if (g==NULL || g->n<=1) {
fprintf(stderr,"Sorry, can't create the graph! (error code %ld)\n",
panic_code);
return -1; /* error code 0 means the graph is too small */
}
@<Report the number of mems needed to compute a minimum spanning tree
of |g| by various algorithms@>;
gb_recycle(g);
s++; /* increase the |seed| value */
}
return 0; /* normal exit */
}
@ @<Global...@>=
Graph *g; /* the graph we will work on */
@ @<Scan the command-line options@>=
while (--argc) {
@^UNIX dependencies@>
if (sscanf(argv[argc],"-n%lu",&n)==1) ;
else if (sscanf(argv[argc],"-N%lu",&n_weight)==1) ;
else if (sscanf(argv[argc],"-W%lu",&w_weight)==1) ;
else if (sscanf(argv[argc],"-P%lu",&p_weight)==1) ;
else if (sscanf(argv[argc],"-d%lu",&d)==1) ;
else if (sscanf(argv[argc],"-r%lu",&r)==1) ;
else if (sscanf(argv[argc],"-s%ld",&s)==1) ;
else if (strcmp(argv[argc],"-v")==0) verbose=1;
else if (strncmp(argv[argc],"-g",2)==0) file_name=argv[argc]+2;
else {
fprintf(stderr,
"Usage: %s [-nN][-dN][-rN][-sN][-NN][-WN][-PN][-v][-gfoo]\n",
argv[0]);
return -2;
}
}
if (file_name) r=1;
@ We will try out four basic algorithms that have received prominent
attention in the literature. Graham and Hell's Algorithm~1 is represented
by the |krusk| procedure, which uses Kruskal's algorithm after the
edges have been sorted by length with a radix sort. Their Algorithm~2
is represented by the |jar_pr| procedure, which incorporates a
priority queue structure that we implement in two ways, either as
a simple binary heap or as a Fibonacci heap. And their Algorithm~3
is represented by the |cher_tar_kar| procedure, which implements a
method similar to Bor{\accent23u}vka's that was independently
discovered by Cheriton and Tarjan and later simplified and refined by
Karp and Tarjan.
@^Cheriton, David Ross@>
@^Tarjan, Robert Endre@>
@^Karp, Richard Manning@>
@d INFINITY (unsigned long)-1
/* value returned when there's no spanning tree */
@<Report the number...@>=
printf("The graph %s has %ld edges,\n",g->id,g->m/2);
sp_length=krusk(g);
if (sp_length==INFINITY) printf(" and it isn't connected.\n");
else printf(" and its minimum spanning tree has length %ld.\n",sp_length);
printf(" The Kruskal/radix-sort algorithm takes %ld mems;\n",mems);
@<Execute |jar_pr(g)| with binary heaps as the priority queue algorithm@>;
printf(" the Jarnik/Prim/binary-heap algorithm takes %ld mems;\n",mems);
@<Allocate additional space needed by the more complex algorithms;
or |goto done| if there isn't enough room@>;
@<Execute |jar_pr(g)| with Fibonacci heaps as
the priority queue algorithm@>;
printf(" the Jarnik/Prim/Fibonacci-heap algorithm takes %ld mems;\n",mems);
if (sp_length!=cher_tar_kar(g)) {
if (gb_trouble_code) printf(" ...oops, I've run out of memory!\n");
else printf(" ...oops, I've got a bug, please fix fix fix\n");
return -3;
}
printf(" the Cheriton/Tarjan/Karp algorithm takes %ld mems.\n\n",mems);
done:;
@ @<Glob...@>=
unsigned long sp_length; /* length of the minimum spanning tree */
@ When the |verbose| switch is nonzero, edges found by the various
algorithms will call the |report| subroutine.
@<Sub...@>=
report(u,v,l)
Vertex *u,*v; /* adjacent vertices in the minimum spanning tree */
long l; /* the length of the edge between them */
{ printf(" %ld miles between %s and %s [%ld mems]\n",
l,u->name,v->name,mems);
}
@*Strategies and ground rules.
Let us say that a {\sl fragment\/} is any subtree of a minimum
spanning tree. All three algorithms we implement make use of a basic
principle first stated in full generality by R.~C. Prim in 1957:
@^Prim, Robert Clay@>
``If a fragment~$F$ does not include all the vertices, and if $e$~is
a shortest edge joining $F$ to a vertex not in~$F$, then $F\cup e$
is a fragment.'' To prove Prim's principle, let $T$ be a minimum
spanning tree that contains $F$ but not~$e$. Adding $e$ to~$T$ creates
a circuit containing some edge $e'\ne e$, where $e'$ runs from a vertex
in~$F$ to a vertex not in~$F$. Deleting $e'$ from
$T\cup e$ produces a spanning tree~$T'$ of total length no larger
than the total length of~$T$. Hence $T'$ is a minimum spanning
tree containing $F\cup e$, QED.
@ The graphs produced by |miles| have special properties, and it is fair game
to make use of those properties if we can.
First, the length of each edge is a positive integer less than $2^{12}$.
Second, the $k$th vertex $v_k$ of the graph is represented in \CEE/ programs by
the pointer expression |g->vertices+k|. If weights have been assigned,
these vertices will be in order by weight. For example, if |north_weight=1|
but |west_weight=pop_weight=0|, vertex $v_0$ will be the most northerly city
and vertex $v_{n-1}$ will be the most southerly.
Third, the edges accessible from a vertex |v| appear in a linked list
starting at |v->arcs|. An edge from |v| to $v_j$ will precede an
edge from |v| to $v_k$ in this list if and only if $j>k$.
Fourth, the vertices have coordinates |v->x_coord| and |v->y_coord|
that are correlated with the length of edges between them: The
Euclidean distance between the coordinates of two vertices tends to be small
if and only if those vertices are connected by a relatively short edge.
(This is only a tendency, not a certainty; for example, some cities
around Chesapeake Bay are fairly close together as the crow flies, but not
within easy driving range of each other.)
Fifth, the edge lengths satisfy the triangle inequality: Whenever
three edges form a cycle, the longest is no longer than the sum of
the lengths of the two others. (It can be proved that
the triangle inequality is of no use in finding minimum spanning
trees; we mention it here only to exhibit yet another way in which
the data produced by |miles| is known to be nonrandom.)
Our implementation of Kruskal's algorithm will make use of the first
property, and it also uses part of the third to avoid considering an
edge more than once. We will not exploit the other properties, but a
reader who wants to design algorithms that use fewer mems to find minimum
spanning trees of these graphs is free to use any idea that helps.
@ Speaking of mems, here are the simple \CEE/ instrumentation macros that we
use to count memory references. The macros are called |o|, |oo|, |ooo|,
and |oooo|; hence Jon Bentley has called this a ``little oh analysis.''
@^Bentley, Jon Louis@>
Implementors who want to count mems are supposed to say, e.g., `|oo|,'
just before an assignment statement or boolean expression that makes
two references to memory. The \CEE/ preprocessor will convert this
to a statement that increases |mems| by~2 as that statement or expression
is evaluated.
The semantics of \CEE/ tell us that the evaluation of an expression
like `|a&&(o,a->len>10)|' will increment |mems| if and only if the
pointer variable~|a| is non-null. Warning: The parentheses are very
important in this example, because \CEE/'s operator |&&| (i.e.,
\.{\&\&}) has higher precedence than comma.
Values of significant variables, like |a| in the previous example,
can be assumed to be in ``registers,'' and no charge is made for
arithmetic computations that involve only registers. But the total
number of registers in an implementation must be finite and fixed,
independent of the problem size.
@^discussion of \\{mems}@>
\CEE/ does not allow the |o| macros to appear in declarations, so we cannot
take full advantage of \CEE/'s initialization mechanism when we are
counting mems. But it's easy to initialize variables in separate
statements after the declarations are done.
@d o mems++
@d oo mems+=2
@d ooo mems+=3
@d oooo mems+=4
@<Glob...@>=
long mems; /* the number of memory references counted */
@ Examples of these mem-counting conventions appear throughout the
program that follows. Some people will undoubtedly ask why the insertion of
macros by hand is being recommended here, when it would be possible to
develop a fancy system that counts mems automatically. The author
believes that it is best to rely on programmers to introduce |o| and
|oo|, etc., by themselves, for several reasons. (1)~The macros can be
inserted easily and quickly using a text editor. (2)~An implementation
need not pay for mems that could be avoided by a suitable optimizing
compiler or by making the \CEE/ program text slightly more complex;
thus, authors can use their good judgment to keep programs more
readable than if the code were overly hand-optimized. (3)~The
programmer should be able to see exactly where mems are being charged,
as an aid to bottleneck elimination. Occurrences of |o| and |oo| make
this plain without messing up the program text. (4)~An implementation
need not be charged for mems that merely provide diagnostic output, or
mems that do redundant computations just to double-check the validity
of ``proven'' assertions as a program is being tested.
@^discussion of \\{mems}@>
Computer architecture is converging rapidly these days to the
design of machines in which the exact running time of a program
depends on complicated interactions between pipelined circuitry and
the dynamic properties of cache mapping in a memory hierarchy,
not to mention the effects of compilers and operating systems.
But a good approximation to running time is usually obtained if we
assume that the amount of computation is proportional to the activity
of the memory bus between registers and main memory. This
approximation is likely to get even better in the future, as
RISC computers get faster and faster in comparison to memory devices.
Although the mem measure is far from perfect, it appears to be
significantly less distorted than any other measurement that can
be obtained without considerably more work. An implementation that
is designed to use few mems will almost certainly be efficient
on today's sequential computers, as well as on the sequential computers
we can expect to be built in the foreseeable future. And the converse
statement is even more true: An algorithm that runs fast will not
consume many mems.
Of course authors are expected to be reasonable and fair when they
are competing for minimum-mem prizes. They must be ready to
submit their programs to inspection by impartial judges. A good
algorithm will not need to abuse the spirit of realistic mem-counting.
Mems can be analyzed theoretically as well as empirically.
This means we can attach constants to estimates of running time, instead of
always resorting to $O$~notation.
@*Kruskal's algorithm.
The first algorithm we shall implement and instrument is the simplest:
It considers the edges one by one in order of nondecreasing length,
selecting each edge that does not form a cycle with previously
selected edges.
We know that the edge lengths are less than $2^{12}$, so we can sort them
into order with two passes of a $2^6$-bucket radix sort.
We will arrange to have them appear in the buckets as linked lists
of |Arc| records; the two utility fields of an |Arc| will be called
|from| and |klink|, respectively.
@d from a.V /* an edge goes from vertex |a->from| to vertex |a->tip| */
@d klink b.A /* the next longer edge after |a| will be |a->klink| */
@<Put all the edges into |bucket[0]| through |bucket[63]|@>=
o,n=g->n;
for (l=0;l<64;l++) oo,aucket[l]=bucket[l]=NULL;
for (o,v=g->vertices;v<g->vertices+n;v++)
for (o,a=v->arcs;a&&(o,a->tip>v);o,a=a->next) {
o,a->from=v;
o,l=a->len&0x3f; /* length mod 64 */
oo,a->klink=aucket[l];
o,aucket[l]=a;
}
for (l=63;l>=0;l--)
for (o,a=aucket[l];a;) {@+register long ll;
register Arc *aa=a;
o,a=a->klink;
o,ll=aa->len>>6; /* length divided by 64 */
oo,aa->klink=bucket[ll];
o,bucket[ll]=aa;
}
@ @<Glob...@>=
Arc *aucket[64], *bucket[64]; /* heads of linked lists of arcs */
@ Kruskal's algorithm now takes the following form.
@<Sub...@>=
unsigned long krusk(g)
Graph *g;
{@+@<Local variables for |krusk|@>@;@#
mems=0;
@<Put all the edges...@>;
if (verbose) printf(" [%ld mems to sort the edges into buckets]\n",mems);
@<Put all the vertices into components by themselves@>;
for (l=0;l<64;l++)
for (o,a=bucket[l];a;o,a=a->klink) {
o,u=a->from;
o,v=a->tip;
@<If |u| and |v| are already in the same component, |continue|@>;
if (verbose) report(a->from,a->tip,a->len);
o,tot_len+=a->len;
if (--components==1) return tot_len;
@<Merge the components containing |u| and |v|@>;
}
return INFINITY; /* the graph wasn't connected */
}
@ Lest we forget, we'd better declare all the local variables we've
been using.
@<Local variables for |krusk|@>=
register Arc *a; /* current edge of interest */
register long l; /* current bucket of interest */
register Vertex *u,*v,*w; /* current vertices of interest */
unsigned long tot_len=0; /* total length of edges already chosen */
long n; /* the number of vertices */
long components;
@ The remaining things that |krusk| needs to do are easily recognizable
as an application of ``equivalence algorithms'' or ``union/find''
data structures. We will use a simple approach whose average running
time on random graphs was shown to be linear by Knuth and Sch\"onhage
@^Knuth, Donald Ervin@>
@^Sch\"onhage, Arnold@>
in {\sl Theoretical Computer Science\/ \bf 6} (1978), 281--315.
The vertices of each component (that is, of each connected fragment defined by
the edges selected so far) will be linked circularly by |clink| pointers.
Each vertex also has a |comp| field that points to a unique vertex
representing its component. Each component representative also has
a |csize| field that tells how many vertices are in the component.
@d clink z.V /* pointer to another vertex in the same component */
@d comp y.V /* pointer to component representative */
@d csize x.I /* size of the component (maintained only for representatives) */
@<If |u| and |v| are already in the same component, |continue|@>=
if (oo,u->comp==v->comp) continue;
@ We don't need to charge any mems for fetching |g->vertices|, because
|krusk| has already referred to it.
@^discussion of \\{mems}@>
@<Put all the vertices...@>=
for (v=g->vertices;v<g->vertices+n;v++) {
oo,v->clink=v->comp=v;
o,v->csize=1;
}
components=n;
@ The operation of merging two components together requires us to
change two |clink| pointers, one |csize| field, and the |comp|
fields in each vertex of the smaller component.
Here we charge two mems for the first |if| test, since |u->csize| and
|v->csize| are being fetched from memory. Then we charge only one mem
when |u->csize| is being updated, since the values being added together
have already been fetched. True, the compiler has to be smart to
realize that it's safe to add the fetched values |u->csize+v->csize|
even though |u| and |v| might have been swapped in the meantime;
but we are assuming that the compiler is extremely clever. (Otherwise we
would have to clutter up our program every time we don't trust the compiler.
After all, programs that count mems are intended primarily to be read.
They aren't intended for production jobs.) % Prim-arily?
@^discussion of \\{mems}@>
@<Merge the components containing |u| and |v|@>=
u=u->comp; /* |u->comp| has already been fetched from memory */
v=v->comp; /* ditto for |v->comp| */
if (oo,u->csize<v->csize) {
w=u;@+u=v;@+v=w;
} /* now |v|'s component is smaller than |u|'s (or equally small) */
o,u->csize+=v->csize;
o,w=v->clink;
oo,v->clink=u->clink;
o,u->clink=w;
for (;;o,w=w->clink) {
o,w->comp=u;
if (w==v) break;
}
@* Jarn{\'\i}k and Prim's algorithm.
A second approach to minimum spanning trees is also pretty simple,
except for one technicality: We want to write it in a sufficiently
general manner that different priority queue algorithms can be plugged in.
The basic idea is to choose an arbitrary vertex $v_0$ and connect it to its
nearest neighbor~$v_1$, then to connect that fragment to its nearest
neighbor~$v_2$, and so on. A priority queue holds all vertices that
are adjacent to but not already in the current fragment; the key value
stored with each vertex is its distance to the current fragment.
We want the priority queue data structure to support the four
operations |init_queue(d)|, |enqueue(v,d)|, |requeue(v,d)|, and
|del_min()|, described in the {\sc GB\_\,DIJK} module. Dijkstra's
algorithm for shortest paths, described there, is remarkably similar
to Jarn{\'\i}k and Prim's algorithm for minimum spanning trees; in
fact, Dijkstra discovered the latter algorithm independently, at the
@^Dijkstra, Edsger Wybe@>
same time as he came up with his procedure for shortest paths.
As in {\sc GB\_\,DIJK}, we define pointers to priority queue subroutines
so that the queueing mechanism can be varied.
@d dist z.I /* this is the key field for vertices in the priority queue */
@d backlink y.V /* this vertex is the stated |dist| away */
@<Glob...@>=
void @[@] (*init_queue)(); /* create an empty priority queue */
void @[@] (*enqueue)(); /* insert a new element in the priority queue */
void @[@] (*requeue)(); /* decrease the key of an element in the queue */
Vertex *(*del_min)(); /* remove an element with smallest key */
@ The vertices in this algorithm are initially ``unseen''; they become
``seen'' when they enter the priority queue, and finally ``known''
when they leave it and enter the current fragment.
We will put a special constant in the |backlink| field
of known vertices. A vertex will be unseen if and only if its
|backlink| is~|NULL|.
@d KNOWN (Vertex*)1 /* special |backlink| to mark known vertices */
@<Sub...@>=
unsigned long jar_pr(g)
Graph *g;
{@+register Vertex *t; /* vertex that is just becoming known */
long fragment_size; /* number of vertices in the tree so far */
unsigned long tot_len=0; /* sum of edge lengths in the tree so far */
mems=0;
@<Make |t=g->vertices| the only vertex seen; also make it known@>;
while (fragment_size<g->n) {
@<Put all unseen vertices adjacent to |t| into the queue,
and update the distances of the other vertices adjacent to~|t|@>;
t=(*del_min)();
if (t==NULL) return INFINITY; /* the graph is disconnected */
if (verbose) report(t->backlink,t,t->dist);
o,tot_len+=t->dist;
o,t->backlink=KNOWN;
fragment_size++;
}
return tot_len;
}
@ Notice that we don't charge any mems for the subroutine call
to |init_queue|, except for mems counted in the subroutine itself.
What should we charge in general for subroutine linkage when we are
counting mems? The parameters to subroutines generally go into
registers, and registers are ``free''; also, a compiler can often
choose to implement a procedure in line, thereby reducing the
overhead to zero. Hence, the recommended method for charging mems
with respect to subroutines is: Charge nothing if the subroutine
is not recursive; otherwise charge twice the number of things that need
to be saved on a runtime stack. (The return address is one of the
things that needs to be saved.)
@^discussion of \\{mems}@>
@<Make |t=g->vertices| the only vertex seen; also make it known@>=
for (oo,t=g->vertices+g->n-1;t>g->vertices;t--) o,t->backlink=NULL;
o,t->backlink=KNOWN;
fragment_size=1;
(*init_queue)(0L); /* make the priority queue empty */
@ @<Put all unseen vertices adjacent to |t| into the queue,
and update the distances of the other vertices adjacent to~|t|@>=
{@+register Arc *a; /* an arc leading from |t| */
for (o,a=t->arcs; a; o,a=a->next) {
register Vertex *v; /* a vertex adjacent to |t| */
o,v=a->tip;
if (o,v->backlink) { /* |v| has already been seen */
if (v->backlink>KNOWN) {
if (oo,a->len<v->dist) {
o,v->backlink=t;
(*requeue)(v,a->len); /* we found a better way to get there */
}
}
}@+else { /* |v| hasn't been seen before */
o,v->backlink=t;
o,(*enqueue)(v,a->len);
}
}
}
@*Binary heaps.
To complete the |jar_pr| routine, we need to fill in the four
priority queue functions. Jarn{\'\i}k wrote his original paper before
computers were known; Prim and Dijkstra wrote theirs before efficient priority
queue algorithms were known. Their original algorithms therefore
took $\Theta(n^2)$ steps.
Kerschenbaum and Van Slyke pointed out in 1972 that binary heaps could
@^Kerschenbaum, A.@>
@^Van Slyke, Richard Maurice@>
do better. A simplified version of binary heaps (invented by Williams
@^Williams, John William Joseph@>
in 1964) is presented here.
A binary heap is an array of $n$ elements, and we need space for it.
Fortunately the space is already there; we can use utility field
|u| in each of the vertex records of the graph. Moreover, if
|heap_elt(i)| points to vertex~|v|, we will arrange things so that
|v->heap_index=i|.
@d heap_elt(i) (gv+i)->u.V /* the |i|th vertex of the heap; |gv=g->vertices| */
@d heap_index v.I
/* the |v| utility field says where a vertex is in the heap */
@<Glob...@>=
Vertex *gv; /* |g->vertices|, the base of the heap array */
long hsize; /* the number of elements currently in the heap */
@ To initialize the heap, we need only initialize two ``registers'' to
known values, so we don't have to charge any mems at all. (In a production
implementation, this code would appear in-line as part of the
spanning tree algorithm.)
@^discussion of \\{mems}@>
Important Note: This routine refers to the global variable |g|, which is
set in |main| (not in |jar_pr|). Suitable changes need to be made
if these binary heap routines are used in other programs.
@<Priority queue subroutines@>=
void init_heap(d) /* makes the heap empty */
long d;
{
gv=g->vertices;
hsize=0;
}
@ The key invariant property that makes heaps work is
$$\hbox{|heap_elt(k/2)->dist<=heap_elt(k)->dist|, \qquad for |1<k<=hsize|.}$$
(A reader who has not seen heap ordering before should stop at this
point and study the beautiful consequences of this innocuously simple
set of inequalities.) The enqueueing operation turns out to be quite simple:
@<Priority queue subroutines@>=
void enq_heap(v,d)
Vertex *v; /* vertex that is entering the queue */
long d; /* its key (aka |dist|) */
{@+register unsigned long k; /* position of a ``hole'' in the heap */
register unsigned long j; /* the parent of that position */
register Vertex *u; /* |heap_elt(j)| */
o,v->dist=d;
k=++hsize;
j=k>>1; /* |k/2| */
while (j>0 && (oo,(u=heap_elt(j))->dist>d)) {
o,heap_elt(k)=u; /* the hole moves to parent position */
o,u->heap_index=k;
k=j;
j=k>>1;
}
o,heap_elt(k)=v;
o,v->heap_index=k;
}
@ And in fact, the general requeueing operation is almost identical to
enqueueing. This operation is popularly called ``siftup,'' because
the vertex whose key is being reduced may displace its ancestors
higher in the heap. We could have implemented enqueueing by first
placing the new element at the end of the heap, then requeueing it;
that would have cost at most a couple mems more.
@<Priority queue subroutines@>=
void req_heap(v,d)
Vertex *v; /* vertex whose key is being reduced */
long d; /* its new |dist| */
{@+register unsigned long k; /* position of a ``hole'' in the heap */
register unsigned long j; /* the parent of that position */
register Vertex *u; /* |heap_elt(j)| */
o,v->dist=d;
o,k=v->heap_index; /* now |heap_elt(k)=v| */
j=k>>1; /* |k/2| */
if (j>0 && (oo,(u=heap_elt(j))->dist>d)) { /* change is needed */
do@+{
o,heap_elt(k)=u; /* the hole moves to parent position */
o,u->heap_index=k;
k=j;
j=k>>1; /* |k/2| */
}@+while (j>0 && (oo,(u=heap_elt(j))->dist>d));
o,heap_elt(k)=v;
o,v->heap_index=k;
}
}
@ Finally, the procedure for removing the vertex with smallest key is
only a bit more difficult. The vertex to be removed is always
|heap_elt(1)|. After we delete it, we ``sift down'' |heap_elt(hsize)|,
until the basic heap inequalities hold once again.
At a crucial point in this process, we have |j->dist<u->dist|. We cannot
then have
|j=hsize+1|, because the previous steps have made |(hsize+1)->dist=u->dist=d|.
@<Prior...@>=
Vertex *del_heap()
{@+Vertex *v; /* vertex to return */
register Vertex *u; /* vertex being sifted down */
register unsigned long k; /* hole in the heap */
register unsigned long j; /* child of that hole */
register long d; /* |u->dist|, the key of the vertex being sifted */
if (hsize==0) return NULL;
o,v=heap_elt(1);
o,u=heap_elt(hsize--);
o,d=u->dist;
k=1;
j=2;
while (j<=hsize) {
if (oooo,heap_elt(j)->dist>heap_elt(j+1)->dist) j++;
if (heap_elt(j)->dist>=d) break;
o,heap_elt(k)=heap_elt(j); /* NB: we cannot have |j>hsize|, see above */
o,heap_elt(k)->heap_index=k;
k=j; /* the hole moves to child position */
j=k<<1; /* |2k| */
}
o,heap_elt(k)=u;
o,u->heap_index=k;
return v;
}
@ OK, here's how we plug binary heaps into Jarn{\'\i}k/Prim.
@<Execute |jar_pr(g)| with binary heaps as the priority queue algorithm@>=
init_queue=init_heap;
enqueue=enq_heap;
requeue=req_heap;
del_min=del_heap;
if (sp_length!=jar_pr(g)) {
printf(" ...oops, I've got a bug, please fix fix fix\n");
return -4;
}
@*Fibonacci heaps.
The running time of Jarn{\'\i}k/Prim with binary heaps, when the algorithm is
applied to a connected graph with $n$ vertices and $m$ edges, is $O(m\log n)$,
because the total number of operations is $O(m+n)=O(m)$ and each
heap operation takes at most $O(\log n)$ time.
Fibonacci heaps were invented by Fredman and Tarjan in 1984, in order
@^Fibonacci, Leonardo, heaps@>
@^Fredman, Michael Lawrence@>
@^Tarjan, Robert Endre@>
to do better than this. The Jarn{\'\i}k/Prim algorithm does $O(n)$
enqueueing operations, $O(n)$ delete-min operations, and $O(m)$
requeueing operations; so Fredman and Tarjan designed a data structure
that would support requeueing in ``constant amortized time.'' In other
words, Fibonacci heaps allow us to do $m$ requeueing operations with a
total cost of~$O(m)$, even though some of the individual requeueings
might take longer. The resulting asymptotic running time is then
$O(m+n\log n)$. (This turns out to be optimum within a constant
factor, when the same technique is applied to Dijkstra's algorithm for
shortest paths. But for minimum spanning trees the Fibonacci method is
not always optimum; for example, if $m\approx n\sqrt{\mathstrut\log n}$, the
algorithm of Cheriton and Tarjan has slightly better asymptotic
behavior, $O(m\log\log n)$.)
Fibonacci heaps are more complex than binary heaps, so we can expect
that overhead costs will make them non-competitive unless $m$ and $n$ are
quite large. Furthermore, it is not clear that the running time with simple
binary heaps will behave as $m\log n$ on realistic data, because
$O(m\log n)$ is a worst-case estimate based on rather pessimistic
assumptions. (For example, requeueing might rarely require many
iterations of the siftup loop.) But it will be instructive to
implement Fibonacci heaps as best we can, just to see how good they
look in actual practice.
Let us say that the {\sl rank\/} of a node in a forest is the number
of children it has. A Fibonacci heap is an unordered forest of trees
in which the key of each node is less than or equal to the key of each
child of that node, and in which the following further condition,
called property~F, also holds: The ranks $\{r_1,r_2,\ldots,r_k\}$ of the
children of every node of rank~$k$, when put into nondecreasing
order $r_1\le r_2\le\cdots\le r_k$, satisfy $r_j\ge j-2$ for all~$j$.
As a consequence of property F, we can prove by induction that every
node of rank~$k$ has at least $F_{k+2}$ descendants (including itself).
Therefore, for example, we cannot have a node of rank $\ge30$ unless
the total size of the forest is at least $F_{32}=2{,}178{,}309$. We cannot
have a node of rank $\ge46$ unless the total size of the forest
exceeds~$2^{32}$.
@ We will represent a Fibonacci heap with a rather elaborate data structure,
in order to guarantee the efficiency of all the necessary operations.
Each node will have four pointers: |parent|, the node's parent (or
|NULL| if the node is a root); |child|, one of the node's children
(or undefined if the node has no children); |lsib| and |rsib|, the
node's left and right siblings. The children of each node, and the
roots of the forest, are doubly linked by |lsib| and |rsib| in
circular lists; the nodes in these lists can appear in any convenient
order, and the |child| pointer can point to any child.
Besides the four pointers, there is a \\{rank} field, which tells how
many children exist, and a \\{tag} field, which is either 0 or~1.
Suppose a node has children of ranks $\{r_1,r_2,\ldots,r_k\}$, where
$r_1\le r_2\le\cdots\le r_k$. We know that $r_j\ge j-2$ for all~$j$;
we say that the node has $l$ {\sl critical\/} children if there are
$l$ cases of equality, where $r_j=j-2$. Our implementation will
guarantee that any node with $l$ critical children will have at
least $l$ tagged children of the corresponding ranks. For example,
suppose a node has seven children, of respective ranks $\{1,1,1,2,4,4,6\}$.
Then it has three critical children, because $r_3=1$, $r_4=2$, and
$r_6=4$. In our implementation, at least one of the children of
rank~1 will have $\\{tag}=1$, and so will the child of rank~2; so will
one of the children of rank~4.
There is an external pointer called |F_heap|, which indicates a node
whose key is smallest. (If the heap is empty, |F_heap| is~|NULL|.)
@<Prior...@>=
void init_F_heap(d)
long d;
{@+F_heap=NULL;@+}
@ @<Glob...@>=
Vertex *F_heap; /* pointer to the ring of root nodes */
@ We can save a bit of space and time by combining the \\{rank} and \\{tag}
fields into a single |rank_tag| field, which contains $\\{rank}*2+\\{tag}$.
Vertices in GraphBase graphs have six utility fields. That's just enough
for |parent|, |child|, |lsib|, |rsib|, |rank_tag|, and the key field
|dist|. But unfortunately we also need the |backlink| field, so
we are over the limit. That's not really so bad, however; we
can set up another array of $n$ records, and point to it. The
extra running time needed for indirect pointing does not have to
be charged to mems, because a production system involving Fibonacci
heaps would simply redefine |Vertex| records to have seven utility
fields instead of six. In this way we can simulate the behavior of larger
records without changing the basic GraphBase conventions.
@^discussion of \\{mems}@>
We will want an |Arc| record for each vertex in our next algorithm,
so we might as well allocate storage for it now even though Fibonacci
heaps need only two of the five fields.
@d newarc u.A /* |v->newarc| points to an |Arc| record associated with |v| */
@d parent newarc->tip
@d child newarc->a.V
@d lsib v.V
@d rsib w.V
@d rank_tag x.I
@<Allocate additional space needed by the more complex algorithms...@>=
{@+register Arc *aa;
register Vertex *uu;
aa=gb_typed_alloc(g->n,Arc,g->aux_data);
if (aa==NULL) {
printf(" and there isn't enough space to try the other methods.\n\n");
goto done;
}
for (uu=g->vertices;uu<g->vertices+g->n;uu++,aa++)
uu->newarc=aa;
}
@ The {\sl potential energy\/} of a Fibonacci heap, as we are
representing it, is defined to be the number of trees in the forest
plus twice the total number of tagged children. When we operate on a
heap, we will store potential energy to be used up later; then it will
be possible to do the later operations with only a small incremental
cost to the running time. (Potential energy is just a way to prove
that the amortized cost is small; it does not appear explicitly in our
implementation. It simply explains why the number of mems we compute
will always be $O(m+n\log n)$.)
Enqueueing is easy: We simply insert the new element as a new tree in
the forest. This costs a constant amount of time, including the cost of
one new unit of potential energy for the new tree.
We can assume that |F_heap->dist| appears in a register, so we need not
charge a mem to fetch~it.
@<Prior...@>=
void enq_F_heap(v,d)
Vertex *v; /* vertex that is entering the queue */
long d; /* its key (aka |dist|) */
{
o,v->dist=d;
o,v->parent=NULL;
o,v->rank_tag=0; /* |v->child| need not be set */
if (F_heap==NULL) {
oo,F_heap=v->lsib=v->rsib=v;
}@+else {@+register Vertex *u;
o,u=F_heap->lsib;
o,v->lsib=u;
o,v->rsib=F_heap;
oo,F_heap->lsib=u->rsib=v;
if (F_heap->dist>d) F_heap=v;
}
}
@ Requeueing is of medium difficulty. If the key is being decreased in
a root node, or if the decrease doesn't make the key less than the key
of its parent, no links need to change (except possibly |F_heap|
itself). Otherwise we detach the node and its descendants from its
present family and put this former subtree into the forest as a new
tree. (One unit of potential energy must be stored with it.)
The rank of the former parent, |p|, decreases by~1. If |p| is a root,
we're done. Otherwise if |p| was not tagged, we tag it (and pay for
two additional units of energy). Property~F still holds, because an
untagged node can always admit a decrease in rank. If |p| was tagged,
however, we detach |p| and its remaining descendants, making it another
new tree of the forest, with |p| no longer tagged. Removing the tag
releases enough stored energy to pay for the extra work of moving~|p|.
Then we must decrease the rank of |p|'s parent, and so on, until finally
we get to a root or to an untagged node. The total net cost is at most
three units of energy plus the cost of relinking the original node,
so it is $O(1)$.
We needn't clear the tag fields of root nodes, because we never
look at them.
@<Prior...@>=
void req_F_heap(v,d)
Vertex *v; /* vertex whose key is being reduced */
long d; /* its new |dist| */
{@+register Vertex *p,*pp; /* parent and grandparent of |v| */
register Vertex *u,*w; /* other vertices being modified */
register long r; /* twice the rank plus the tag */
o,v->dist=d;
o,p=v->parent;
if (p==NULL) {
if (F_heap->dist>d) F_heap=v;
}@+else if (o,p->dist>d)
while(1) {
o,r=p->rank_tag;
if (r>=4) /* |v| is not an only child */
@<Remove |v| from its family@>;
@<Insert |v| into the forest@>;
o,pp=p->parent;
if (pp==NULL) { /* the parent of |v| is a root */
o,p->rank_tag=r-2;@+break;
}
if ((r&1)==0) { /* the parent of |v| is untagged */
o,p->rank_tag=r-1;@+break; /* now it's tagged */
}@+else o,p->rank_tag=r-2; /* tagged parent will become a root */
v=p;@+p=pp;
}
}
@ @<Remove |v| from its family@>=
{
o,u=v->lsib;
o,w=v->rsib;
o,u->rsib=w;
o,w->lsib=u;
if (o,p->child==v) o,p->child=w;
}
@ @<Insert |v| into the forest@>=
o,v->parent=NULL;
o,u=F_heap->lsib;
o,v->lsib=u;
o,v->rsib=F_heap;
oo,F_heap->lsib=u->rsib=v;
if (F_heap->dist>d) F_heap=v; /* this can happen only with the original |v| */
@ The |del_min| operation is even more interesting; this, in fact,
is where most of the action lies. We know that |F_heap| points to the
vertex~$v$ we will be deleting. That's nice, but we need to figure out
the new value of |F_heap|. So we have to look at all the children of~$v$
and at all the root nodes in the forest. We have stored up enough
potential energy to do that, but we can reclaim the potential only if
we rebuild the Fibonacci heap so that the rebuilt version contains
relatively few trees.
The solution is to make sure that the new heap has at most one root
of each rank. Whenever we have two tree roots of equal rank, we can
make one the child of the other, thus reducing the number of
trees by~1. (The new child does not violate Property~F, nor is it
critical, so we can mark it untagged.) The largest rank is always
$O(\log n)$, if there are $n$ nodes altogether, and we can afford to
pay $\log n$ units of time for the work that isn't reclaimed from
potential energy.
An array of pointers to roots of known rank is used to help control
this part of the process.
@<Glob...@>=
Vertex *new_roots[46]; /* big enough for queues of size $2^{32}$ */
@ @<Prio...@>=
Vertex *del_F_heap()
{@+Vertex *final_v=F_heap; /* the node to return */
register Vertex *t,*u,*v,*w; /* registers for manipulation of links */
register long h=-1; /* the highest rank present in |new_roots| */
register long r; /* rank of current tree */
if (F_heap) {
if (o,F_heap->rank_tag<2) o,v=F_heap->rsib;
else {
o,w=F_heap->child;
o,v=w->rsib;
oo,w->rsib=F_heap->rsib;
/* link children of deleted node into the list */
for (w=v;w!=F_heap->rsib;o,w=w->rsib)
o,w->parent=NULL;
}
while (v!=F_heap) {
o,w=v->rsib;
@<Put the tree rooted at |v| into the |new_roots| forest@>;
v=w;
}
@<Rebuild |F_heap| from |new_roots|@>;
}
return final_v;
}
@ The work we do in this step is paid for by the unit of potential
energy being freed as |v| leaves the old forest, except for the
work of increasing~|h|; we charge the latter to the $O(\log n)$ cost of
building |new_roots|.
@<Put the tree rooted at |v| into the |new_roots| forest@>=