The Boyer-Moore majority vote algorithm1 is an efficient algorithm used for finding the majority element, if any, in a sequence of elements using linear time complexity
A majority element is an element that appears more than
Based on this definition, there can be only one majority element (or none) in a given sequence.
One way to solve this problem is by simply iterating through the array, supposing that the current element is indeed the majority element, counting its number of occurrences, and if it is repeated more than
This approach uses nested loops, so its time complexity is
def majority_element(array):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
threshold = len(array) // 2
for candidate in array:
count = 0
for element in array:
if candidate == element:
count += 1
if count > threshold:
return candidate
raise ValueError("Majority element does not exist.")
Another, more efficient, way of solving this problem is by using a dictionary or a hashmap to store the counts of unique elements in the input array and then returning the majority element if it exists.
This approach has a time complexity of
Although to be more accurate, the time complexity is an *amortized
As for the space complexity, it would be an
def majority_element(array):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
threshold = len(array) // 2
counts = {}
for candidate in array:
if candidate not in counts:
counts[candidate] = 0
counts[candidate] += 1
if counts[candidate] > threshold:
return candidate
raise ValueError("Majority element does not exist.")
Or we can use collections.Counter
which is more efficient for such a task
from collections import Counter
def majority_element(array):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
counts = Counter(array)
threshold = len(array) // 2
[(candidate, count)] = counts.most_common(1)
if count > threshold:
return candidate
raise ValueError("Majority element does not exist.")
Boyer-Moore Algorithm4
Suppose that an election has taken place and that there was a clear winner with a number of votes greater than at least half the number of total votes (or voters).
The winner represents the majority element in our case, and we are interested in fiding it of course.
Well, let's suppose that humans weren't very civilized at this period of history, and that the way to settle an election was a bit (ok, maybe too much)
bloody.
The voters had to knock each other out until one clear winner survived.
The rules are that a voter for a certain candidate can only be knocked out by a voter for another, different, candidate.
And he must be knocked out in the next round by another voter, for a different candidate of course, if such a voter exists. In other words, no voter can win more than one round, he has to be eliminated after he wins.
If a winner exists, that is a majority element, then he would have more than half of the voters.
If he wins by a margin, that is by a single vote, then he would have m + 1
voters on his side, and the other candidates would have all the remaining voters, that is m
voters, where m = (total number of voters) / 2
.
This means that if the winner's voters were against their enemies in pairs of two, they would cancel each other out, one by one, until one person was left.
Let us try to visualize this with an example: say we have 3 candidates: A
, B
and C
.
Candidate A
has 4 voters: A1
, A2
, A3
and A4
.
Candidate B
has 2 voters: B1
and B2
.
Candidate C
has 1 voter : C1
.
- In the first round
A1
goes againstB1
, andB1
wins (it doesn't really matter who wins first; but we're just trying to simulate a worst case and this aligns perfectly with it I guess). - In the second round we have
B1
, the winner of the previous round, againstA2
for example. According to the rules,B1
can't win a second time, so he gets knocked out byA2
who proceeds to the next round. - In the 3rd round
B2
knocks outA2
. - In the 4th round
A3
knocks outB2
. - In the 5th round
C1
knocks outA3
. - In the 6th and final round
A4
knocks outC1
. - No more voters are remaining and the only one standing is
A4
, soA
is the winner.
And this process would work if the (winner) majority element had way more votes (occurrences) than the remaining candidates since in the worst case we had at least one voter who's still standing.
The previous process, the fighting phase, is formerly known as the pairing phase, because the votes (voters) are taken down one by one in a pairs.
Now, what happens if a majority element doesn't exist? There would still be someone who's standing at the end even though its corresponding candidate doesn't represent tha majority element.
This is why a second phase is necessary, called the counting phase where we would verify if the selected candidate is indeed a majority element, by going through all of the votes one more time, counting the number of votes in favor of this candidate only and comparing it with the threshold (
Now how do we translate this algorithm into actual code?
It's quite simple actually. We only need 2 variables, one to keep track of the current candidate and a counter to keep track of its current number of supporters or voters.
Note: The elements of our array would represent the voters, and their values would be that of the candidates, so if a voter supports candidate X
then its value would be X
.
For example:
- Voters(Array): [1, 1, 2, 2, 1, 2]
- Candidates: 1 and 2
We can choose any element from the array to be a candidate (simply choose the first element) and we initialize the counter to 0 (since no supporters have been found before iterating through the array).
Then, as we go through the array, if we encounter a supporter, that is the same value as the candidate we would increment the counter, otherwise we should decrement it, since that supporter would be knocked out (but the remaining supporters, if any; depending on the value of counter, are still there to knock or be knocked out).
What if the counter reaches the value 0? That would mean that all supporters of the current candidate have been knocked out, and the one standing right now is the current voter, so we would have to update our candidate variable.
And this process continues until we exhaust the entire array. This was the pairing phase, and I believe that the counting phase is self-explanatory and quite easy to implement.
This makes the time complexity of this algorithm
def majority_element(array):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
# Set the initial candidate and counter
candidate = array[0]
count = 0
# Loop through the array
for element in array:
# If we find the current candidate, increment the counter
if element == candidate:
count +=1
# If the counter reaches zero, switch to a new candidate and
# reset the counter
elif count == 0:
candidate = element
count = 1
# Otherwise, decrement the counter for any non-matching element
else:
count -= 1
# Check if the candidate has enough votes to be the majority
threshold = len(array) // 2
candidate_count = 0
for element in array:
if element == candidate:
candidate_count += 1
# Alternatively you could use: array.count(candidate) > threshold
if candidate_count > threshold:
return candidate
raise ValueError("Majority element does not exist.")
The Frequent Items Problem and Misra-Gries Summary5
A generalization of the previous problem is when we want to find not only the majority element, but all frequent elements which are elements occurring more than a given fraction of the time, that is more than a certain proportion of the input size.
For example, we might want to find all elements that appear more than
Why the 2 most frequent elements you say? It's because that is the maximum number possible given the constraint.
And in the general case, there would be a maximum of
So how can we generalize the Boyer-Moore Majority Vote Algorithm (which can be thought of as the 1 most frequent element problem) to solve this problem? We simply need to imagine another fight :)
Since there can be at most
If we encounter one of our candidates, we simply increment its count as usual.
If we encounter a completely new candidate, then we should decide whether to eliminate one of our previous candidates and add this one or not.
This is where the tricky part comes into play.
The first thing to note is that there's no need to have a fight between our
So then, when should a fight happen? It should happen when we already have a list of
Alternatively, you can choose to eliminate only one candidate randomly (for example, the first one with an insufficient number of votes), thus always maintaining
You might have doubts at this point regarding the correctness of this algorithm, but I can assure you that it works fine in both cases, since if a most frequent element exists, then he should appear again in the list of candidates in an upcoming round.
And if we delete only one candidate from the list, then we should be able to verify its eligibility when we perform an extra check at the end, just like in the original algorithm.
This makes the time complexity of the algorithm linear
Let's make this into Python code:
def most_frequent(array, k):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
if k <= 0:
raise ValueError("Invalid k value: must be positive integer")
# No need to do all the extra work if it isn't necessary
if k >= len(array):
return array.copy()
# Use a dictionary to store element counts.
counts = {}
# First loop: count occurrences of each element.
for element in array:
if element in counts:
counts[element] += 1
else:
counts[element] = 1
# If exceeding maximum candidates, keep only elements with
# positive count.
if len(counts) > k:
# Dictionaries can't be mutated as you loop through them
# that's why we're creating a new dictionary to select
# the corresponding candidates
# Alternatively, we could've saved the current candidates
# in a list, looped through the dictionary and deleted
# the corresponding candidates without any problems
# but this approach is simpler
new_counts = {}
for candidate, count in counts.items():
# because count - 1 == 0 and that candidate
# should be eliminated
if count > 1:
new_counts[candidate] = count - 1
counts = new_counts
# Create a separate dictionary to count again for final verification.
counts_check = dict.fromkeys(counts, 0)
# Second loop: verify counts against threshold.
for element in array:
if element in counts_check:
counts_check[element] += 1
# Calculate the minimum required count for an element to be considered
# frequent. Do you see why it's (k + 1)?
threshold = len(array) // (k + 1)
# Return the list of elements exceeding the threshold
return [
candidate
for candidate, count in counts_check.items()
if count > threshold
]
It is important to treat each problem separately and take advantage of its constraints whenever possible.
If the input array was sorted, the majority element problem would be greatly simplified (it is not worth it to sort the array yourself, since the time complexity needed would be $O(nlog(n))$).
We would simply loop through the array one item at a time and increment a counter variable as we're still consuming the same element (since the array is sorted, equal elements would be held next to each other in contiguous slots).
If this element has been repeated as many times as needed, then we've found our majority element; otherwise, if we counter a different element, we would have to repeat the same process again.
def majority_element(array):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
array = sorted(array)
count = 0
candidate = array[0]
threshold = len(array) // 2
for element in array:
if element == candidate:
count += 1
else:
candidate = element
count = 1
if count > threshold:
return candidate
raise ValueError("Majority element does not exist.")
One minor remark is that there's no need to check if the candidate's count is greater than the threshold until you've looped through at least half of the array. I leave it as an exercise to the reader to try and adapt this remark to the code if he wishes.
One other way to take advantage of a sorted array is the fact that if a majority element exists, then it must be the median of this array6.
Since the majority element would have at least n / 2 + 1
occurrences, where n is the size of the array, we would get the following:
- If the first occurrence of this element is at the first position of the array, then it would at least reach the
n / 2
th position which represents the median of the array. - If the last occurrence of this element is at the last position of the array, then it would at least start from the
n / 2
th position, which represents the median of the array again. (you can imagine the array in reverse order if you're having a hard time believing that the median would indeed be included). - Any other possibility would be bound by the two previous cases, making it inevitable to have the median as the majority element.
Of course, the majority check is not necessary for this algorithm if it is guaranteed to have a majority element in the input. (Neither is it necessary for the Boyer-Moore algorithm; that is the counting phase can be eliminated).
def majority_element(array):
if len(array) == 0:
raise ValueError("No majority element: Empty sequence")
array = sorted(array)
median = len(array) // 2 # the median index is the same as the threshold
candidate = array[median]
if array.count(candidate) > median:
return candidate
raise ValueError("Majority element does not exist.")