Have you ever taken a list of numbers, and produced a new list by taking the difference between each adjacent pair? For instance:
1 4 9 16 25 => 3 5 7 9
It’s easy to see that with the second list, along with the first number of the original list, we can reconstruct the entire original list. In fact, if we notice a pattern with the second list, we can add a number there and extend the original list
3 5 7 9  => 1 4 9 16 25 
I used to do this in church, where there were numbers hanging on the walls (Bible verses, I assume). By building these “difference lists,” I decided I could predict the next number in any sequence. Does it work? Well, the results are often interesting but not always very useful.
One thing I noticed while making difference lists is that the differences are generally substantially smaller than the original numbers. Take this list of 10 random numbers between 1 and 100, inclusive
(65 79 77 29 9 2 75 55 89 63) average value => 54.3
The difference list, using absolute values, is
(14 2 48 20 7 73 20 34 26) average value => 27.1
This example seems to point to a 2:1 ratio between the average value of the original list and the average value of the difference list, so I ran a couple of thousand simulations, and the resulting data were quite different. The average value of a difference list constructed from 10000 whole numbers chosen from 1 to 100 was 33.1. When the numbers were chosen from 1 to 300, the average difference was 99.5. Now I put forth the question: given two numbers chosen randomly from 1 to N, what is the expected value of their difference?
From the simulations, I knew I was looking for something in the range of N/3. Here’s how I worked it out.
My method was to build up the expected difference from probabilities. Think about picking the first number, then working out the likelihoods of the second number being a certain distance from the first. Let N = 100 for now, and say the first number picked is 30. If we’re working between 1 and 100, and the first number picked is 30, the distance can’t very well be over 70, can it? Every other difference between 0 and 70 is still possible, but the differences 1 through 30 are especially likely. If k is a possible difference somewhere between 1 and 30, then 30 – k and 30 + k are both candidates for a second number which would produce a difference of k. So any number less than the first number picked is twice as likely as other numbers to be the difference. In fact, this needs a little refining: if 70 is the first number picked, not all numbers under 70 are twice as likely, but we’ll see that our formula accounts for this nicely.
So here’s the strategy: there’s a 1/N chance of the first draw being a certain number k. Then we add up probabilities, with each possible difference having a 1/N baseline chance, and numbers that could be a difference on either side of k having a extra 1/N chance of being the difference. The formula looks like this:
A bit of explanation: if k is less than or equal to N/2, then all the numbers from 0 to N - k are possible differences, and all numbers from 1 to k - 1 get the extra, bonus chance of being the difference. If k is greater than N/2, then all the numbers from 0 to k - 1 are possible differences, while all the numbers from 1 to N – k get the bonus chance. Since terms where j = 0 are worth nothing, we can let both sums start at 1 and take care of both cases! Their roles simply switch depending on the value of k.
I arrived at this formula several different ways, so I’m pretty sure it’s accurate. From here, it’s some fun algebra and use of the formulas for sums:
I rather like this. It might as well be N/3 for large values of N, but there’s a big difference when N is small. What if you have to know the average difference between two standard six-sided dice when thrown? Sure, 2 is a fine guess, but I’d have to say that 1.9444… works even better.
Can you think of a better proof of this, or a mistake in mine? Leave me a comment!