You are given a pile of blocks and asked to split them into groups, each with at least one block. The order of the groups does not matter. In how many different ways could you do it? The answer to this simple question is more complex than you might think.
The partition numbers are a sequence of integers that count the ways a given positive integer can be written as a sum of positive integers - or, equivalently, split into groups. For example, 5 can be written as 5, 4+1, 3+2, 3+1+1, 2+2+1, 2+1+1+1, or 1+1+1+1+1, for a total of 7 different partitions. Thus, the 5th partition number, usually denoted p(5), is 7. (The order of the groups does not matter; 2+3 and 3+2 are not distinguished.) Finding these by hand becomes increasingly difficult for larger numbers because the number of partitions grows rapidly and it is easy to overlook some. Take the partitions of 10, for example.
10 = 9+1 = 8+2 = 8+1+1 = 7+3 = 7+2+1 = 7+1+1+1 = 6+4 = 6+3+1 = 6+2+2 = 6+2+1+1 = 6+1+1+1+1 = 5+5 = 5+4+1 = 5+3+2 = 5+3+1+1 = 5+2+2+1 = 5+2+1+1+1 = 5+1+1+1+1+1 = 4+4+2 = 4+4+1+1 = 4+3+3 = 4+3+2+1 = 4+3+1+1+1 = 4+2+2+2 = 4+2+2+1+1 = 4+2+1+1+1+1 = 4+1+1+1+1+1+1 = 3+3+3+1 = 3+3+2+2 = 3+3+2+1+1 = 3+3+1+1+1+1 = 3+2+2+2+1 = 3+2+2+1+1+1 = 3+2+1+1+1+1+1 = 3+1+1+1+1+1+1+1 = 2+2+2+2+2 = 2+2+2+2+1+1 = 2+2+2+1+1+1+1 = 2+2+1+1+1+1+1+1 = 2+1+1+1+1+1+1+1+1 = 1+1+1+1+1+1+1+1+1+1
So p(10) = 42 - a manageable calculation if you're careful. For a number as modest as 100, though, the count of partitions explodes to nearly 200 million, far more than anyone could enumerate by hand. Clearly, we need a computer to calculate p(100) and other large partition numbers, but how would a program actually count all possible partitions without missing any or double-counting?
My solution starts by asking not just "How many partitions of some integer n exist?" but the more general "How many partitions of a exist with no group larger than b?" This generalization is the key, because every such partition falls into exactly one of two cases. For example, partitioning 9 elements with a maximum group size of 4 gives these options:
1. Create a group of size 4, leaving 5 elements to partition with the same max of 4, or
2. Don't create any groups of size 4, so the max decreases to 3, but there are still 9 elements left.
In the general case, with a elements and a maximum size of b, the options would be:
1. Create a group of size b, leaving a-b elements to partition with the same max of b, or
2. Don't create any groups of size b, so the max decreases to b-1, but there are still a elements left.
The number of partitions of a with groups containing up to b elements can be written as a function pUpTo(a, b), and the number of partitions satisfying the two options described above would be expressed as pUpTo(a-b, b) and pUpTo(a, b-1), respectively. Since the options do not overlap and account for all partitions in pUpTo(a, b),
pUpTo(a, b) = pUpTo(a-b, b) + pUpTo(a, b-1)
The issue here, though, is that trying to calculate this will lead to the expression expanding forever, with the number of copies of the function doubling each time:
pUpTo(9, 4)
= pUpTo(5, 4) + pUpTo(9, 3)
= (pUpTo(1, 4)+pUpTo(5, 3)) + (pUpTo(6, 3)+pUpTo(9, 2))
= ((pUpTo(-3, 4)+pUpTo(1, 3)) + (pUpTo(2, 3)+pUpTo(5, 2))) + ((pUpTo(3, 3)+pUpTo(6, 2)) + (pUpTo(7, 2)+pUpTo(9, 1)))
= ...
and so on. Notice that in the last expression, the term pUpTo(-3, 4) appears, which means "the number of ways to write -3 elements as a sum of positive integers, with no term in the sum being greater than 4". Since negative numbers cannot be written as a sum of positive numbers, pUpTo(-3, 4) must be equal to 0. In general,
if a < 0, pUpTo(a, b) = 0.
The same holds when b is negative. In pUpTo(2, -5), for example, group sizes would have to be at least 1 but no more than -5; there are no such sizes and thus no possible partitions. The same reasoning applies when b = 0:
If b <= 0, pUpTo(a, b) = 0.
This second rule has a single exception at a = 0 and b = 0, because 0 can be partitioned in exactly one way - into zero groups. This edge case looks strange, but it is crucial: it is the base case that stops the recursion.
Combining these three rules,
pUpTo(a, b) = {
if a = 0 and b = 0, 1
otherwise, if a < 0 or b <= 0, 0
otherwise, pUpTo(a-b, b) + pUpTo(a, b-1) }
Finally, we define the partition function itself. No group in a partition of n can contain more than n elements, so:
p(n) = pUpTo(n, n)
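Translated directly into Python, the two definitions might look like this (a minimal sketch - the names mirror the definitions above, not necessarily the downloadable scripts):

```python
def p_up_to(a, b):
    """Number of partitions of a into parts no larger than b."""
    if a == 0 and b == 0:
        return 1            # the single empty partition of 0
    if a < 0 or b <= 0:
        return 0            # no valid partitions
    # Either use a part of size b, or rule out parts of size b entirely.
    return p_up_to(a - b, b) + p_up_to(a, b - 1)

def p(n):
    """The nth partition number: no part can exceed n."""
    return p_up_to(n, n)

print(p(5), p(10))  # 7 42
```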
We are done! Writing these two definitions into any computer program will enable us to calculate partition numbers as large as we please. I decided to write one version (download link) of the program in Python and another (download link) in Haskell, a functional programming language. The Python program will print all partition numbers from p(0) to p(100), inclusive. To run the Haskell program, follow these steps:
1. Download and install the GHC (Glasgow Haskell Compiler) here.
2. Open a command prompt and navigate to the directory where the script is located. (If you do not know what a command prompt is, I suggest that you just run the Python script, which can often be done by clicking on it.)
3. Type 'ghci' into the prompt. This should give you a new prompt that looks like this:
Prelude>
4. Type ':load partitions' (with the colon) into the prompt, which should change to:
*Main>
5. Type 'p 100' to calculate p(100), the number of partitions of 100, or change the '100' to any n to find the nth partition number. Keep reading, though, for some notes on speed.
On my medium- to high-end laptop, computing p(100) with the Haskell script took a few hours. The Python script, on the other hand, does it in the blink of an eye. What makes one program run so many orders of magnitude faster? The Python script uses a technique known as memoization.
The Haskell script takes so long to evaluate p(100) because, within its recursive tree of function calls, it runs pUpTo with the same arguments many, many times over, re-evaluating it separately each time. This is extremely wasteful and causes the time to evaluate p(n) to grow exponentially with n. What if, instead, every call to pUpTo stored its answer somewhere once it finished? Then, if pUpTo were called again with the same arguments, it could fetch the pre-computed solution instead of going through the whole process a second time.
The Python script implements this using a lookup table. A lookup table is a data structure similar to an array, but instead of being indexed by integer positions within a fixed range, a lookup table (known in Python as a dict, or dictionary) can be indexed by virtually any immutable value - integers, floats, strings, or tuples. (Mutable structures like lists can't serve as dict keys, but that doesn't matter here: all we need is for a and b to be any integers.)
The dict cache is indexed by tuples, or ordered pairs, of a and b. The first thing pUpTo does is check whether its arguments are already a key in cache, i.e. whether a pre-computed solution is stored at (a, b). If so, it returns that solution and that's the end of it. Otherwise, this is the first call with those precise arguments, so the function body runs as defined above. Once the answer is found, it is stored in the cache for all future evaluations with the same arguments.
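A memoized version of the naive definition might look like the following (a sketch based on the description above, not a copy of the downloadable script):

```python
cache = {}  # maps (a, b) tuples to previously computed answers

def p_up_to(a, b):
    """Memoized count of partitions of a into parts no larger than b."""
    if (a, b) in cache:      # already computed? fetch the stored answer
        return cache[(a, b)]
    if a == 0 and b == 0:
        result = 1
    elif a < 0 or b <= 0:
        result = 0
    else:
        result = p_up_to(a - b, b) + p_up_to(a, b - 1)
    cache[(a, b)] = result   # store for all future calls
    return result

def p(n):
    return p_up_to(n, n)

print(p(100))  # 190569292
```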
With memoization, the evaluation time of p(n) plummets from exponential (around 2^n) to roughly n^2, since there are only about n^2 distinct argument pairs and each is computed just once. There is no need to wait hours for the Haskell script to finish when the Python program can find partition numbers as large as Python's recursion limit allows (anything beyond p(1000) on my laptop causes a stack overflow).
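If you want to go past the recursion limit, one standard workaround (not used in either of the scripts above) is to fill the same table bottom-up with loops instead of recursion:

```python
def p(n):
    """nth partition number, computed iteratively: table[a][b] = pUpTo(a, b)."""
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1  # base case: the single empty partition of 0
    for a in range(n + 1):
        for b in range(1, n + 1):
            take = table[a - b][b] if a - b >= 0 else 0  # use a part of size b
            skip = table[a][b - 1]                       # forbid parts of size b
            table[a][b] = take + skip
    return table[n][n]
```

Because each entry depends only on entries with a smaller a or a smaller b, the loops visit them in a valid order, and no call stack is involved, so the depth of the computation no longer matters.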
Happy calculating!