Algorithm for determining the genealogical relationship (consanguinity) between two members in a family tree, all contained within a simple PHP class.
This algorithm was created with one purpose in mind: to find the relation between one person to another person within a family tree. For example, if person a was born two generations before person b in a family tree, and person a is person b's direct ancestor, then person a is person b's grandparent. I have a working example of this algorithm with my own family here. When designing this algorithm, I followed the relationships outlined in this table of consanguinity from Wikipedia:
In it's simplest form, this algorithm's dataset consists of members that make up a family tree, each with a unique id and the id of their parent. The family tree needs to have a root ancestor, which the tree will stem down from. The root ancestor needs to have an id of 1 and it's parent id is to be set to 0, as it has no parent within the dataset. In any practical implementation, it would be wise to give each member a name, like below. An example dataset with 6 members:
Name | ID | Parent ID |
John Doe (root) | 1 | 0 |
Mary Doe | 2 | 1 |
Jane Doe | 3 | 1 |
Bob Doe | 4 | 1 |
Robert Doe | 5 | 2 |
Betty Doe | 6 | 4 |
Tree Hierarchy
John Doe - 1
Mary Doe - 2
Robert Doe - 5
Jane Doe - 3
Bob Doe - 4
Betty Doe - 6
In the above example, there are 3 generations, with John Doe having 3 children and 2 grandchildren. Generation 1 always has only one member, the root ancestor. All of the root ancestor's children are in generation 2, the root ancestor's grandchildren are in generation 3, and so on.
Along with the main algorithm, there are 4 "sub" algorithms to simplify the process.
General Definitions:
-
Tree
The tree, or family tree, is the dataset with the need information for the algorithm to work, and makes up a directed graph with nodes pointing to their parents.
-
Member
A member is a node, or vertex, within the tree. One node makes up one person.
-
Root Ancestor
The member in the tree from which the rest of the tree stems from.
The following are the dataset definitions, or the information each node needs to have and others information that can be added. Necessary fields are in italics:
-
ID
Each member needs to have a unique ID. The root ancestor always needs to have their ID set to 1.
-
Parent ID
Each member also needs to have a Parent ID, which makes up the hierarchy within the tree. The root ancestor's Parent ID needs to be set to 0, as they have no parent in the tree.
-
Name
Though technically not required, in any practical use of this algorithm, you will want to supply each member a name.
-
Gender/Sex
This is useful to have, as it allows you to use more specific pronouns, e.g. "mother" instead of "parent", "brother" vs. "sibling", or "aunt" vs. "aunt/uncle". The way I did it was 0 for male, 1 for female, and 2 for unspecified.
-
Spouse
If member is married, this can also be useful to add, and if not just set it to NULL. In a more advanced implementation, I had a separate table in the database just for spouses and also included divorce history. It is also worth noting that this algorithm doesn't take incest or inbreeding into account.
-
Birth/Death
With this, you can indicate if someone is alive or dead, and their age/lifespan. If in a database, these can be kept as a date. If member is still alive, then death can be set to NULL.
-
Other
In my more advanced implementation, I split up the parts of the name into title, first-name, nickname, middle-name, and last-name, for more control over displaying the names. There are many other fields you could add too, such as birthplace and information about that person.
Algorithm Terms and Variables:
-
a and b
The two input variables. a and b are both the ID of members in the tree. The algorithm finds a's relationship to b.
-
P(n)
The parent of the nth member of the tree.
-
deg
If a and b are cousins, then deg, or the degree of the cousins, is calculated. It is then put into cousin_prefixes below to determine the suffix.
-
cousin_prefixes[]
An array, or at least a group, for the degree a cousin is, e.g "third cousin" or "first cousin". The first key in the array should be set to 1 ideally instead of 0, or subtract the degree by 1. Values: First, Second, Third, Fourth, Fifth, Sixth, Seventh, Eighth, & Ninth.
-
cousin_suffixes[]
An array, or at least a group, of suffixes to denote how many times removed two cousins are, e.g. "first cousin twice removed" or "Second Cousin 5 times removed". The first key in the array should be set to 1 ideally instead of 0, or subtract the generation difference by 1. Values: Once, Twice, Thrice, 4 times, 5 times, 6 times, 7 times, 8 times, & 9 times.
"Sub" Algorithms/Functions Used In Algorithm (Explained in detail below):
- gen() - Find generation
- gendif() - Finds number of generations between two members
- common() - Finds lowest common ancestor of two members
- gen_prefix() - Determines how many "greats" to use
- gen_anc() - Finds a member's ancestor from a certain generation
The "sub" algorithms are as follows:
-
Generation
Returns the generation of the given member in the tree.
int gen(int a):
-
input a
a is the ID of any given member in the dataset
-
int gen = 1
-
if a == 1 then goto 7
-
a = P(a)
Go up one generation at a time until a is the root ancestor
-
gen = gen + 1
Every time a goes up a generation increment gen
-
goto 3
-
return gen
-
-
Generation Difference
Returns the number of generations between 2 members in the tree.
int gendif(int a, int b):
-
input a and b
a is the ID of any given member in the dataset b is the ID of any given member in the dataset
-
int dif = gen(a) - gen(b)
-
dif = abs(dif)
absolute value of dif
-
return dif
-
-
Common Ancestor
Returns the lowest common ancestor of two members in the tree. If one of the members is a direct ancestor of the other member, that member is considered the common ancestor of the two.
int common(int a, int b):
-
input a and b
a is the ID of any given member in the dataset b is the ID of any given member in the dataset
-
if a == b then return 0
-
int gen_a = gen(a) and int gen_b = gen(b)
-
if a == b then goto 20
-
if gen_a < gen_b then continue else goto 10
-
b = P(b)
-
gen_b = gen_b - 1
-
if a == 1 or b == 1 then return 1
-
goto 4
-
if gen_a > gen_b then continue else goto 15
-
a = P(a)
-
gen_a = gen_a - 1
-
if a == 1 or b == 1 then return 1
-
goto 4
-
if gen_a == gen_b then continue
-
a = P(a)
-
b = P(b)
-
if a == 1 or b == 1 then return 1
-
goto 4
-
return a
-
-
Generation Prefix
Generates the proper prefix based on generation difference if one member is a direct descendant or ancestor of the other member to go in front of "parent" or "child". For example if there is a 1 generation difference, then no prefix would be added, if there were two generations then "grand" would added, and if more than two generation difference, then gendif() * "great" would be added.
string gen_prefix(int gd):
-
input gd
gd is the gendif() between two members of the tree
-
if gd == 1 then return ""
-
if gd == 2 then return "grand"
-
elseif gd > 2 then string pre = "grand"
-
for(int i = 1; i <= gd - 2; i++) pre = "great " + pre
-
return pre
-
-
Ancestor From Generation
This function finds the ancestor of a member from a certain generation. For example, if you wanted to find a member's grandparent, you would input that member as a and the generation as gen(a) - 2.
int gen_anc(int a, int g):
-
input a and g
a is the ID of any member in the tree g is the generation of the unknown ancestor of a to be found
-
if gen(a) <= g then goto 4
-
a = P(a)
-
goto 1
-
return a
-
The main algorithm returns a string containing the relation between a and b. Alternatively, it can also work and return a number instead of a string indicating the relation, such as 0 for direct ancestor, 1 for sibling, 2 for aunt/uncle, and 3 for cousin. With this output you could then put it into another function to convert the numerical representation into a string. If you choose this route, then you will also have to supply the generation difference, and for cousins, you will have to supply the degree. Including the sex of the member in the dataset can also be useful for using specific pronouns, e.g. "mother" instead of "parent" or "nephew" instead of "nephew/niece". This algorithm doesn't take situations like incest and inbreeding into account for simplicity.
string relation(int a, int b):
-
input a and b
a is the ID of any given member in the dataset b is the ID of any given member in the dataset
-
if P(a) == P(b) then return "sibling"
-
if a == common(a, b) then return gen_prefix(gendif(a, b )) + "parent"
-
if b == common(a, b) then return gen_prefix(gendif(a, b )) + "child"
-
if P(a) == P( gen_anc( b, gen( a ) ) then return gen_prefix(gendif(a, b)) + "aunt/uncle"
-
if P(b) == P( gen_anc( a, gen( b ) ) then return gen_prefix(gendif(a, b)) + "nephew/niece"
-
if 2 <= gen(a) - gen(common(a, b)
- if gen(a) <= gen(b) then continue else goto 7.4
- int deg = (gen(a) - gen(common(a, b)) - 1
- goto 8
- int deg = (gen(b) - gen(common(a, b)) - 1
- goto 8
-
return cousin_prefixes[deg] + " cousin " + cousin_suffixes[gendif(a, b)]