Degrees of separation on a tree algorithm
Imagine that you are like me and looking for an algorithm to compute degrees of separation along a hierarchical tree like the ones pictured below. Your tree could represent any data — let’s pretend it’s a company orgchart with node A as the CEO, nodes B and I are execs, and so on. The distance, or degrees of separation, between any two nodes is the number of links along the shortest path that separates them.
Concept
Intuitively, it seems quite easy to count up and down a tree as in the pictures above. But when your data is a flat CSV, the pattern is a little tricky to see. The computation depends on whether one person is a manager, directly or indirectly, of the other person.
Say we have selected two nodes m and n, where m = node B and n = node F. Notice that B is F’s eventual manager (via node D). In this case, B’s distance from A, or X, = 1; F’s distance from A, or Y, = 3. The only node B and F have in common is A, so their intersection Z = 1. B and F’s degrees of separation = 2 because 1+3–2(1) = 2.
Now let’s select two nodes which are not eventual managers of each other — nodes F and H. Both node F’s and H’s distance from A, or X and Y, = 3. Their intersection Z = 3 because they have nodes A, B, and D in common. F and H’s degrees of separation = 2 because 3+3–2(3–1) = 2.
Wait a second, where did that minus 1 come from?! Well, technically it’s minus 2 once you distribute the 2, and it comes from eliminating the double-counting that happens when links are shared by both m and n on their paths to A. I urge you to trace some paths of your own and compute the math to make evident this concept.
Data Structure
If your tree data exists as a nested JSON, consider converting it to a flat CSV for the purpose of this tutorial. I used a CSV and R to achieve my goal along with d3.js collapsible trees to visualize my trees. Ideal data structure for the algorithm looks something like this:
Code
degrees <- function(m,n){ ## compute m & n's distance from A
## subtract 1 to remove "person" column
mFromA <- sum(!is.na(data[m, ])) — 1
nFromA <- sum(!is.na(data[n, ])) — 1 ## compute how many nodes m and n have in common
## in many cases, m and n only have A in common
overlap <- intersect(data[m, which(!is.na(data[m, ]))], data[n, which(!is.na(data[n, ]))]) ## (optional) print above values
print(paste(“Distance”, data[m,1] ,”to A: “, mFromA))
print(paste(“Distance”, data[n,1] ,”to A: “, nFromA))
print(paste(“length of overlap: “, length(overlap)))
print(overlap) ## ok, algorithm time: ## but first, let's check if the user entered the same person
## i.e. does m == n? if so, degrees of separation = 0
if(m == n){
degreesOfSep <- 0
}
## check if m or n is (eventual) manager of n or m
## i.e. check if higher node manages lower node
## if so, compute distance as X+Y-2*Z
else if(mFromA != nFromA){
if(mFromA > nFromA){ lower <- m; higher <- n }
if(nFromA > mFromA){ lower <- n; higher <- m }
for (i in 2:length(data[lower, ])) {
if(grepl(data[higher,1], data[lower,i])){
manager <- TRUE
print(paste(data[higher,1], “is an (eventual) manager of”, data[lower,1]))
break
}
else{
manager <- FALSE
}
}
if(manager){
degreesOfSep <- mFromA + nFromA — 2*(length(overlap))
}
## if one is not (eventual) manager of other,
## compute as X+Y-2*(Z-1)
else {
degreesOfSep <- mFromA + nFromA — 2*(length(overlap) — 1)
}
}
else {
degreesOfSep <- mFromA + nFromA — 2*(length(overlap) — 1)
}
print(paste(“Degrees of Separation: “, degreesOfSep))
}
Cool! Now, what?
Now we have an algorithm that accepts any two nodes as arguments and spits out their degrees of separation! Yay!
Perhaps a degrees of separation metric could help you predict attrition—how does a manager’s resignation ripple-effect through his or her network? Are a manager’s direct reports (+1 degree of separation) more likely to leave than his or her skip-level reports (2+ degrees of separation)?
Degrees of separation make all sorts of team analyses possible—do we perform better as teams or as individuals?
In the future, I’d like weight degrees of separation based on external data, like email communications and LinkedIn connections. In the case of the orgchart, I believe the future lies in networks, not trees. Tree orgcharts are simple which is why we use them, but in reality teamwork and collaboration can happen even amongst those with the highest degrees of separation. Fortunately, there’s a lot more literature online about network algorithms than single-parent tree algorithms.