Program reads a set of n rooted trees given in the Newick format and for each pair of trees resolves so-called maximum agreement subtree problem – it calculates the minimum number of leaves that should be removed from each tree to make them isomorphic.
Here are two illustrative trees, both having 10 identical leaves labelled from 1 to 10:
(4,(8,1,(5,3)),(9,2,(10,(7,6))));
(10,(8,(9,(5,4)),(6,2,3)),(7,1));
Given trees are stored in the memory in the form of general trees having one pointer to the child node, one pointer to the brother node and one pointer to the parent node. Below is shown such representation of two previously mentioned trees:
Pointers (arrows) pointing downwards represent parent-child relation; pointers ponting leftwards represent brother-brother relation. Pointers pointing upwards represent son-parent relation – every node except the root has such pointer.
All nodes in every tree are given labels – a number for leaves (e.g. 2) and a combination of ‘I’ letter and a number for internal nodes (e.g. I4).
For each pair of trees:
- a square array is created where dimensions contain all vertices of first and second tree respectively
- the array is filled according to the following rules:
- leaf-leaf comparison: if leaves have same label – put 1, else – put 0
- internal node-leaf comparison (and vice versa): if inside an internal node (i.e. in a subtree where the current internal node is the root) of one tree there is a leaf corresponding to the leaf of the second tree – put 1, else – put 0
- internal node-internal node comparison: three possibilities need to be checked and the algorithm chooses the one giving best result (i.e. the largest number of corresponding leaves, which is then inserted into the array):
- check if one of the sons of AX node can be identified with the BX node
- check if one of the sons of BX node can be identified with the AX node
- identify the sons of AX node with the sons of BX node (not necessarily all) so that the number of identical nodes is as large as possible
The last comparison – comparison of the roots – gives the number of leaves of the maximum agreement subtree. Subtracting it from the initial number of leaves in one tree gives the result.
File test_input.txt contains example of input – 100 trees in Newick format.