Differences Between Random Numbers

Have you ever taken a list of numbers, and produced a new list by taking the difference between each adjacent pair? For instance:

1 4 9 16 25       =>       3 5 7 9

It’s easy to see that with the second list, along with the first number of the original list, we can reconstruct the entire original list. In fact, if we notice a pattern with the second list, we can add a number there and extend the original list

3 5 7 9 [11]       =>       1 4 9 16 25 [36]

I used to do this in church, where there were numbers hanging on the walls (Bible verses, I assume). By building these “difference lists,” I decided I could predict the next number in any sequence. Does it work? Well, the results are often interesting but not always very useful.

One thing I noticed while making difference lists is that the differences are generally substantially smaller than the original numbers. Take this list of 10 random numbers between 1 and 100, inclusive

(65 79 77 29 9 2 75 55 89 63)     average value => 54.3

The difference list, using absolute values, is

(14 2 48 20 7 73 20 34 26)     average value => 27.1

This example seems to point to a 2:1 ratio between the average value of the original list and the average value of the difference list, so I ran a couple of thousand simulations, and the resulting data were quite different. The average value of a difference list constructed from 10000 whole numbers chosen from 1 to 100 was 33.1. When the numbers were chosen from 1 to 300, the average difference was 99.5. Now I put forth the question: given two numbers chosen randomly from 1 to N, what is the expected value of their difference?

From the simulations, I knew I was looking for something in the range of N/3. Here’s how I worked it out.

My method was to build up the expected difference from probabilities. Think about picking the first number, then working out the likelihoods of the second number being a certain distance from the first. Let N = 100 for now, and say the first number picked is 30. If we’re working between 1 and 100, and the first number picked is 30, the distance can’t very well be over 70, can it? Every other difference between 0 and 70 is still possible, but the differences 1 through 30 are especially likely. If is a possible difference somewhere between 1 and 30, then 30 – k and 30 + k are both candidates for a second number which would produce a difference of k. So any number less than the first number picked is twice as likely as other numbers to be the difference. In fact, this needs a little refining: if 70 is the first number picked, not all numbers under 70 are twice as likely, but we’ll see that our formula accounts for this nicely.

So here’s the strategy: there’s a 1/N chance of the first draw being a certain number k. Then we add up probabilities, with each possible difference having a 1/N baseline chance, and numbers that could be a difference on either side of k having a extra 1/N chance of being the difference. The formula looks like this:

\displaystyle\frac{1}{N}\sum_{k=1}^{N}\frac{1}{N}\left(\sum_{j=1}^{N-k}j + \sum_{j=1}^{k-1}j\right)

A bit of explanation: if k is less than or equal to N/2, then all the numbers from 0 to N - k are possible differences, and all numbers from 1 to k - 1 get the extra, bonus chance of being the difference. If is greater than N/2, then all the numbers from 0 to k - 1 are possible differences, while all the numbers from 1 to N – k get the bonus chance. Since terms where j = 0 are worth nothing, we can let both sums start at 1 and take care of both cases! Their roles simply switch depending on the value of k.

I arrived at this formula several different ways, so I’m pretty sure it’s accurate. From here, it’s some fun algebra and use of the formulas for sums:

\displaystyle\frac{1}{N}\sum_{k=1}^{N}\frac{1}{N}\left(\sum_{j=1}^{N-k}j + \sum_{j=1}^{k-1}j\right)

\displaystyle\frac{1}{N^2}\sum_{k=1}^{N}\left(\sum_{j=1}^{N-k}j + \sum_{j=1}^{k-1}j\right)

\displaystyle\frac{1}{N^2}\sum_{k=1}^{N}\left(\frac{(N-k)(N-k+1)}{2} + \frac{(k-1)k}{2}\right)

\displaystyle\frac{1}{2N^2}\sum_{k=1}^N\left(2k^2 - 2k - 2Nk + N^2 + N\right)

\displaystyle\frac{1}{N^2}\left(\sum_{k=1}^{N}k^2 - (N+1)\sum_{k=1}^{N}k + \sum_{k=1}^N\frac{N^2 + N}{2}\right)

\displaystyle\frac{1}{N^2}\left(\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)(N+1)}{2} + \frac{N^3 + N^2}{2} \right)

\displaystyle\frac{1}{N}\left(\frac{2N^2 - 2}{6}\right)

\displaystyle\frac{N^2 - 1}{3N}

I rather like this. It might as well be N/3 for large values of N, but there’s a big difference when N is small. What if you have to know the average difference between two standard six-sided dice when thrown? Sure, 2 is a fine guess, but I’d have to say that 1.9444… works even better.

Can you think of a better proof of this, or a mistake in mine? Leave me a comment!

About these ads
5 comments
  1. Kat said:

    Hey, this is really cool.

  2. LNsong said:

    I love how you did this for fun. I need to appreciate mathy beauty more…

    Here’s a cool math fact I learned from Casson… at any point in time, there is a pair of antipodal points on the earth that have the exact same pressure and temperature (this comes from the assumption that temperature and pressure are continuous maps, and follows from the Brouwer Fixed Pt Theorem).

  3. Eric Thoma said:

    You definitely should keep posting more interesting stuff like this. The people over at this math community (http://www.reddit.com/r/math/comments/vusut/finding_the_average_difference_between_two_random/) took a look at your post. They have what they claim to be a simpler solution:

    “Let I, J be independent discrete uniform in range 1…n. Then E[|J-I|]] = n-2 sum_{1 <= i, j <= n} |j-i|. If you write |j-i| in a matrix, then the main diagonal is all zeros, the diagonal above that is all ones, and so on, and the matrix is its own transpose. So the sum is simply twice the sum of the upper triangle. In the triangle there are n-1 diagonals d = 1..n-1, and diagonal d has n-d copies of the value d in it. So the total is sum_{d=1}{n-1} (n-d)d, which simplifies to sums over d and d2, for which closed forms are known. Then just simplify."

  4. allenvarney said:

    What would be the average difference between two dice with different numbers of sides, such as one six-sided and one 20-sided? Is there a general answer?

  5. Kevin said:

    “in church, where there were numbers hanging on the walls (Bible verses, I assume)”

    They would be the numbers of the hymns to be sung from the hymn book, not Bible verses.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: