Catalan numbers are a sequence of natural numbers that occur in various counting problems. This article aims to explain the intuition behind them ; how they are derived and how they come up in seemingly unrelated problems. Consider this
"Suppose m = a+b where a=b, votes were cast in an election, with candidate A receiving a votes and candidate B receiving b votes. The ballots are counted individually in some random order, giving rise to a sequence of a As and b Bs. Assuming all such sequences are equally likely, in how many such sequences can candidate A led throughout the counting of the ballots?"
Another way of asking this would be if every vote for A is considered as +1 and every vote for B is a -1, in how many ways can the votes be counted so that the sum while counting is always non negative? Of course since there are an equal number of +1s and -1s the sum at the end will be 0 but we want the sum during the counting to be non negative. For example, for a=b=5,the sequences (+,+,−,+,−,−,+,+,−,−) and (+,+,−,+,+,−,+,−,−,−) can be represented as
This is therefore equivalent to asking how many such non negative paths exists from (0,0) to (2n,0) where n=a=b. This is where it gets interesting.
First, take a positive path(never touches 0) from (0,0) to (2n,0). Note that this path must cross (1,1) and (2n-1,1). In other words, the first number in the sequence should be +1 and the last should be -1 as the total sum is zero.
Let's now consider the segment of this path between (1,1) and (2n-1,1).
If the origin is shifted to (1,1), this can be seen as a non negative path of n-1 steps. Also note that every path P is uniquely determined by this segment alone because the first and the last steps are always fixed (+1 and -1). After some clever manipulation (http://faculty.kfupm.edu.sa/ICS/malalla/Teaching/ICS253/TextBook/Apps_Ch7.pdf), we can conclude that the total number of such non-negative sequences of n-1 steps is the nth Catalan number.
Catalan numbers have a way to cropping up in situations where you least expect them to. Consider a second problem,
"How many “mountain ranges” can you form with n upstrokes and n downstrokes that all stay above the original line?"
For n=3 we have the following 5 ways.
It is clear to see how this relates to the previous problem. Now consider this
"In a grid of n×n squares, how many paths are there of length 2n that lead from the upper left corner to the lower right corner that do not touch the diagonal dotted line from upper left to lower right? In other words, how many paths stay on or above the main diagonal?"
Just rotate the previous image and you get this
Suppose you have n pairs of parentheses and you would like to form valid groupings of them, where “valid” means that each open parenthesis has a matching closed parenthesis. For example, “(()())” is valid, but “())()(” is not. How many groupings are there for each value of n?
Perhaps a more precise definition of the problem would be this: A string of parentheses is valid if there are an equal number of open and closed parentheses and if you begin at the left as you move to the right, add 1 each time you pass an open and subtract 1 each time you pass a closed parenthesis, then the sum is always non-negative.
For n=3 we have ()()(), ()(()), (())(), (()()) and ((())). Note that this can be obtained by replacing / with ( and \ with ) in the mountain range problem.
The same concept can be extended to find the number of full binary search trees as well.
COMMENTS