Blog

Binomial Coefficients and Stirling Numbers in C#

As long as I’m on a roll with my posts on number theory in C#, I thought I’d briefly discuss how to generate binomial coefficients, and then move on to Stirling numbers of the second kind, both of which are extremely useful in combinatorics and finite calculus.

Some prerequisites for calculating binomial coefficients include a standard factorial function, nothing special:

long factorial(long n)
{
    if (n == 0) return 1;
    long t = n;
    while(n-- > 2) t *= n;
    return t;
}

Note that I use longs whenever possible because, despite the performance hit from 64-bit operations, it’s worth it to be able to work with numbers that are double the magnitude of ints.

We will also need a function for calculating a falling power of a number, which is defined as: $$x^{\underline{n}} = x(x-1)(x-2)\ldots(x-(n-1))$$You’ll see why in a moment.

long fallingPower(long n, long p)
{
    long t = 1;
    for (long i = 0; i < p; i++) t *= n--;
    return t;
}

Notice that the above function handles the case $$n^{\underline{0}}$$ returning 1. However, it does not handle the case of p > n, which would give an incorrect result, but this is not necessary for our purposes.

Binomial Coefficients

Recall that the definition for the binomial coefficient is $${n \choose k} = \frac{n!}{k!(n-k)!}$$ However, using this exact formula to compute binomial coefficients is a bit naive. If we use falling powers (sometimes called falling factorials), the above formula easily reduces to: $$\frac{n^{\underline{k}}}{k!}$$

We can improve the algorithm a bit more by adding the condition:

$${n \choose k} =
\begin{cases}
\frac{n^{\underline{k}}}{k!} \quad \mbox{if } k \leq \lfloor n/2 \rfloor,\\
\frac{n^{\underline{n-k}}}{(n-k)!} \quad \mbox{if } k > \lfloor n/2 \rfloor.
\end{cases}
$$

Not only is this algorithm faster, but it can also handle larger coefficients than the original formula, since neither the falling power nor the factorial ever gets larger than n/2.
The code for this is straightforward:

long binomialCoeff(long n, long k)
{
    if ((k < 0) || (k > n)) return 0;
    k = (k > n / 2) ? n - k : k;
    return fallingPower(n, k) / factorial(k);
}

However, this is still not as optimal as it can be. The most optimal approach would be to accumulate the falling power while dividing by each factor of the factorial in place. This would minimize the chance of overflow errors, and allow for even larger coefficients to be calculated. The disadvantage of this algorithm is the necessary use of floating-point math:

long binomialCoeff(long n, long k) {
    if ((k < 0) || (k > n)) return 0;
    k = (k > n / 2) ? n - k : k;
    double a = 1;
    for (long i = 1; i <= k; i++) a = (a * (n-k+i)) / i;
    return a + 0.5;
}

Stirling Numbers

Stirling numbers (of the second kind) are useful for, among other things, enumerating the coefficients of the falling-power expansion of a regular power. For example, how would we express x3 in terms of falling powers of x? That is, how do we arrive at an equation of the form $$x^3 = ax^{\underline 3} + bx^{\underline 2} + cx^{\underline 1}$$ Well, we could just solve the equation directly, but this would get unwieldy for higher powers. A neat way of doing this involves the use of Stirling numbers of the second kind, $$\left\{\begin{matrix} n \\ k \end{matrix}\right\}$$ A useful theorem for computing these numbers is

$$ \left\{\begin{matrix} n \\ k \end{matrix}\right\}=\frac{1}{k!} \sum_{i=0}^{k}(-1)^i{k \choose i}(k-i)^n $$$

long stirling(long n, long k)
{
    long sum = 0, neg = 1;
    for (long i = 0; i <= k; i++)
    {
        sum += neg * binomialCoeff(k, i) * pow(k - i, n);
        neg = -neg;
    }
    sum /= factorial(k);
    return sum;
}

With the Stirling numbers in hand, we can now obtain the coefficients for falling power expansions:

$$x^m = \sum_{k=0}^{m}\left\{\begin{matrix} m \\ k \end{matrix}\right\}x^{\underline k}$$

Project Euler — Problem 197

I never thought that solving random math problems can be addictive, but Project Euler does exactly that. Not only does it make you flex your math muscles, it also challenges you to take your programming language of choice to its limits. So it’s a total win-win: you brush up on your math skills, and broaden your programming repertoire at the same time.

Once I discovered Project Euler, I couldn’t pull myself away from the computer until I solved as many problems as I could. As of this writing I’ve solved 187 of their 211 problems.

Problem 197 has to do with finding the nth term of a particular recursively defined sequence: $$u_{n+1} = f(u_n)$$ with $$u_0 = -1, f(x) = \lfloor 2^{30.403243784-x^2}\rfloor \cdot 10^{-9}$$

Of course, as with most Project Euler problems, the value for n is set ridiculously high, presumably to eliminate the possibility of brute-forcing the problem (within the lifetime of the universe).

Fortunately, it takes little more than a superficial examination to see that this problem is actually quite simple in disguise. If we look at the first few terms of the sequence, we can already guess that this sequence appears to be “converging” to a function that oscillates between two values, approximately 0.681 and 1.029 (I won’t give precise numbers, since that would give away the solution).

This means that all we need to do is go far enough into the sequence that the deviation of the oscillations is less than the desired precision asked by the problem (10-9). And it so happens that we don’t need to go out far at all. The sequence actually settles on its two oscillatory values as early as the 1000th term (probably even earlier)! Therefore, the sum of the 1000th and 1001st term will be equivalent to the sum of the 1012th and (1012+1)st term, which is what the problem asks for!

The code to do this is elementary. I accomplished it with a mere 5 lines of C# code. Can you do better?

Elementary Number Theory in C#

I thought I’d post a few code snippets in C# that have to do with basic number theory, since I use them in my programs from time to time. These snippets are by no means optimized, and the use of C# pretty much precludes their use in high-performance applications. Still, for relatively small arguments, these routines run surprisingly fast.

Prime numbers

To generate a list of prime numbers, we use the familiar Sieve of Eratosthenes. We can write one routine to generate the sieve:

bool[] GetPrimeSieve(long upTo)
{
    long sieveSize = upTo + 1;
    bool[] sieve = new bool[sieveSize];
    Array.Clear(sieve, 0, (int)sieveSize);
    sieve[0] = true;
    sieve[1] = true;
    long p, max = (long)Math.Sqrt(sieveSize) + 1;
    for (long i = 2; i <= max; i++)
    {
        if (sieve[i]) continue;
        p = i + i;
        while (p < sieveSize) { sieve[p] = true; p += i; }
    }
    return sieve;
}

The above function returns an array of booleans (the size of the given parameter), each of which is false if the array index is a prime number, or true if it's not a prime number. To get an actual list of prime numbers, we can use a function such as this:

long[] GetPrimesUpTo(long upTo)
{
    if (upTo < 2) return null;
    bool[] sieve = GetPrimeSieve(upTo);
    long[] primes = new long[upTo + 1];

    long index = 0;
    for (long i = 2; i <= upTo; i++) if (!sieve[i]) primes[index++] = i;

    Array.Resize(ref primes, (int)index);
    return primes;
}

The above function returns an actual list of primes less than or equal to the specified high limit. This means that we can easily compute the prime-counting function $$\pi(x)$$ for any integer x by counting the number of elements in the array returned by the function. We can write a similar function to generate a list of composites:

long[] GetCompositesUpTo(long upTo)
{
    if (upTo < 2) return null;
    bool[] sieve = GetPrimeSieve(upTo);
    long[] composites = new long[upTo + 1];

    long index = 0;
    for (long i = 2; i <= upTo; i++)
        if (sieve[i]) composites[index++] = i;

    Array.Resize(ref composites, (int)index);
    return composites;
}

As for determining if a single certain number is prime (without having to generate a giant sieve), we can simply use a function that attempts to factor the number. If the number happens to be divisible by an integer greater than 1 and less than or equal to its square root, then the number is not prime:

bool IsPrime(long n)
{
    long max = (long)Math.Sqrt(n);
    for (long i = 2; i <= max; i++)
        if (n % i == 0)
            return false;
    return true;
}

Greatest Common Divisor and Totient

To obtain the greatest common divisor (GCD) of two numbers, we use the usual Euclidean algorithm:

long gcd(long a, long b)
{
    long temp;
    while (b != 0)
    {
        temp = b;
        b = a % b;
        a = temp;
    }
    return a;
}

There is a slightly different binary algorithm for computing the GCD which is theoretically more efficient, but in practice (at least in C#) it's actually slightly less efficient than the algorithm above:

ulong gcdb(ulong a, ulong b)
{
    int shift;
    if (a == 0 || b == 0) return a | b;

    for (shift = 0; ((a | b) & 1) == 0; ++shift) { a >>= 1; b >>= 1; }
    while ((a & 1) == 0) a >>= 1;

    do {
        while ((b & 1) == 0) b >>= 1;

        if (a < b) {
            b -= a;
        } else {
            ulong temp = a - b;
            a = b;
            b = temp;
        }
        b >>= 1;
    } while (b != 0);
    return a << shift;
}

With the GCD readily available, determining whether two numbers are coprime is just a matter of telling whether or not their GCD is equal to 1. Also, calculating the LCM (least common multiple) of two numbers becomes trivial:

long lcm(long a, long b)
{
    long ret = 0, temp = gcd(a, b);
    if (temp != 0)
    {
        if (b > a) ret = (b / temp) * a;
        else ret = (a / temp) * b;
    }
    return ret;
}

Also using the GCD algorithm, it becomes easy to calculate Euler's totient function for a certain number, since the totient function is simply the number of integers less than or equal to n that are coprime to n:

long eulerTotient(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        if (gcd(i, n) == 1) sum++;
    return sum;
}

The above is a really naive algorithm. A much more efficient algorithm (one that is usually given in textbooks) is as follows:

int eulerTotient(int n)
{
    primes = GetPrimesUpTo(n+1);    //this can be precalculated beforehand
    int numPrimes = primes.Length;

    int totient = n;
    int currentNum = n, temp, p, prevP = 0;
    for (int i = 0; i < numPrimes; i++)
    {
        p = (int)primes[i];
        if (p > currentNum) break;
        temp = currentNum / p;
        if (temp * p == currentNum)
        {
            currentNum = temp;
            i--;
            if (prevP != p) { prevP = p; totient -= (totient / p); }
        }
    }
    return totient;
}

Reviving the Veo Observer

Recently I came across an old Veo Observer camera. I remember the Veo cameras as being refreshingly easy to use, and quite inexpensive for all the functionality you get.

This camera seemed to power up normally, and acquired an IP address as expected. However, when I logged on to the camera with a web browser, all it gave was a “404 Not Found” error. Also, when I tried to use the Veo Observer Studio software from the CD that supposedly came with the camera, the software said that there was a “Protocol Version Error.”

This led me to believe that someone may have tried to upgrade the firmware in the camera, and either disconnected before completing the upgrade, or loaded the wrong firmware. So all I had to do was find the correct firmware, as well as the correct utility for loading it onto the camera. This turned out to be a lot more difficult than I thought. The manufacturer (Veo) no longer exists, and all I could find on the Web were complaints from users who are just as SOL as I was. Fortunately, I stumbled on an obscure website that turned out to contain a repository of old device drivers, one of which happened to be the Veo Setup Utility and the Veo firmware. I was then able to load the firmware successfully, and then log on to the camera and see the video stream from it. I’ve decided to host the Veo Setup Utility and the latest firmware here on my website, in case someone else comes across the same problems.

During my search for Veo software, I also found that someone has written a clever Perl module for communicating with the camera (making it usable from virtually any OS), and another person has written Java code for it, too. This inspired me to make a quick-and-dirty C++ application based on the Perl code. My little program controls pretty much all the features of the Veo observer, and displays the image stream from the camera.

Download the program here.

The Growing Importance of Strong Passwords

I received a call today from the Fraud Prevention service of my credit card company, saying that “someone” called in to Customer Service, posing as me, and attempted to gather information about my account. This person had my credit card number, but failed to get past the additional security questions asked by the support staff. The support staff then promptly called me, and asked if it was I who tried to call in to Customer Service. The moment I said “no,” the operator told me that my account will be immediately deactivated to prevent any fraudulent charges, and that a new credit card would be mailed to me within 5 business days.

Despite the inconvenience of having my credit card account shut down, and being issued a new card, I applaud the support staff for taking their users’ security so seriously. But this incident also got me thinking about the current level of security used by online retailers, as well as online banking and credit card websites. After all, how exactly did a would-be identity thief get a hold of my credit card number? All of my online purchases are through very reputable stores like Amazon and Newegg. All the items I purchase are completely legal — i.e. no kinky horse-on-girl porn from shady Russian websites. All of my transactions are over SSL, and I’m quite sure that I don’t have a keylogger installed on my system.

This leads to one of the following conclusions, arranged from least to most likely:

  • Someone cracked my SSL session with an online retailer. This is astronomically unlikely, but still possible.
  • Someone hacked one of the online retailer’s servers, and retrieved the raw database of credit card numbers for thousands of customers.
  • Someone hacked one of the company’s servers, and retrieved password hashes for thousands of users, and decoded the passwords at his/her own leisure. If the hacker is an employee of the company, no hacking would even be necessary. The database would be readily available for copying and selling to the black market.

As the incredibly eye-opening Ophcrack project has shown, old-style passwords are no longer safe (i.e. passwords shorter than 15 characters, consisting only of letters and numbers). Any Windows system administrator who hasn’t disabled LM Hashes has been living in a cave, and any Linux administrator who isn’t using shadow passwords is almost equally neglectful. And of course, any administrator or developer who stores users’ passwords in plain-text format should be fired on the spot, and have the infraction recorded as a felony in his criminal record.

The point is, many forms of identity theft can be prevented by using strong passwords — that is, passwords that are generously long (15 or more characters), that contain uppercase and lowercase letters, numbers, and special characters like $, %, &, space, and maybe even exténdeð­ characters, or even Unicode!

The question is, are online retailers and banking websites “ready” for strong passwords?

No!

At least that’s the short answer. As an example, let’s take a look at what happened when I tried to change my password on my banking website, which happens to be Huntington. I typed in a strong password with letters, numbers, and special characters, and this is what I got:
I beg your pardon?! Of all the websites in the world, banking sites should be the most secure by definition. And yet, here we are with Huntington’s website telling us to limit our password to 16 characters, and not use any special characters!

In Huntington’s defense, they do provide an additional level of security by asking a secret question (in addition to the password), if a user logs in from a different IP.

I can understand enforcing minimum requirements for password strength, which Huntington does, but setting limitations on password strength? What gives?

Let’s move on to my credit card website, which is Chase. Attempting to change my password there, I get the following:
Again, they tell us not to use special characters, and to limit the length of our password. Even though the length limitation here is 32 characters, my question is why is there a length limitation? And why can’t special characters be used?

If the excuse is that the underlying software that runs the website doesn’t support passwords with special characters, then the software is in serious need of revision. The password hashing algorithm should not care about what characters are passed into it.

I’m a big fan of passphrases, too — that is, passwords like “How many licks does it take?” or “Density = Mass/Volume” or “E = m*c^2”, all of which are much stronger than passwords of equal length with just letters and numbers. But, since these websites don’t allow passphrases, I’m forced to come up with a weaker password that fits all their guidelines and restrictions. Even worse, since each website may have slightly different restrictions on passwords, I’m forced to come up with a least-common-denominator password if I want to use one password for multiple sites.

With this kind of “password mess” on the most secure internet websites, it’s no wonder ordinary users become confused about what kind of passwords they are and aren’t allowed to use, and default to using common, easy-to-remember passwords that are just waiting to be cracked by malicious individuals.