Update
Since I originally wrote this blog post, I have discovered an even better solution which satisfies a new fifth condition I didn’t think was achievable. If you already read this blog post pre-update, skip ahead to the last section to see the new solution.
Required conditions
The bit-shift solution was inspired by this Stackoverflow post.
The solution linked above does not establish nor require the between-integers size-ordering constraint like this blog post does, but instead, hardcodes a maximum integer size ahead of time. Use whichever solution is more suitable for you.
Let’s say you have three positive integers (not including zero, but you can easily add one to the integers to handle that case), labelled a, b, & c, where these integers follow a size-ordering constraint of c >= b >= a
. You want to identify these three integers by a single integer identifier which satisfies four conditions:
- The ID is unique to the three integers.
- If put in a list or table, the ID is in the same relative position as the three integers would be.
1-1-1
comes before2-1-1
, which comes before2-2-1
, etc.
- The ID is a reasonably small number that is suitable to be used as a key.
- The ID is relatively easy to compute, or at least faster than a typical hash function.
- (Optional) If put in a list or table, the IDs leave no gaps or unused integers.
- If this is a requirement for you, see the last solution.
Prime-based solution
To demonstrate what the three-to-one conversion actually looks like in practice, here’s what computing 2^c * 3^b * 5^a
as the ID gives you:
Unfortunately, this only satisfies the first of the outlined conditions, as these IDs don’t preserve order, they grow in size really quickly, and the ID is increasingly difficult to compute the greater the three integers get.
As it turns out however, the idea of using exponents, and specifically two to the power of some other number, is heading in the right direction…
Bit-shift solution
To cut to the chase, here’s a Lua script that demonstrates this solution:
To verify that this works, swap the
./script.lua | sort -cn
to test the ordering and./script.lua | sort -n | uniq -d
to test that the IDs never repeat.
Explanation
The <<
syntax in Lua means we’re performing a left bit-shift operation, where all the bits in a given sequence are moved to the left. For instance, doing one left bit-shift on the bit sequence 00000001
would yield 00000010
, which is the same operation as performing integer * 2^shift
(in other words, each left bit-shift doubles the number).
If bit-shifting is unavailable to you, you could do
b * 2^bits
&c * 2^(bits * 2)
instead, however, remember that math operations are slower than bit manipulation, even if both yield the same answers in nearly all cases.
The math.ceil(math.log(c, 2))
line computes the minimum number of bits needed to represent the largest integer we’ve been given, which because of the size condition of c >= b >= a
is always the value of the integer c.
Left bit-shifting b by bits bits, c by bits * 2
bits, and adding them to integer a forms a new bit string where each integer is allocated its own non-overlapping range (e.g. 2-1-1
is the bit string 10 01 01
). More simply, the three integers have been concatenated together as binary.
Naturally then, if concatenating the three integers into a string satisfies the uniqueness and ordering properties, as said string is literally what the script prints beside each ID to test if their unique and ordered, then interpreting the computed bit string as an integer will always yield an integer which itself is unique and preserves the same ordering.
…And more?
Ah right, another nice property of the bit-shifting solution is that it can be generalised to any number of integers; all it takes to find the ID for four integers (where d >= c >= b >= a
) is to amend the ID formula to be written as:
bits = math.ceil(math.log(d, 2))
ra = a
rb = b << bits
rc = c << bits * 2
rd = d << bits * 3
id = ra + rb + rc + rd
Triangular & tetrahedral numbers solution
In an ideal world, the tuple-to-ID conversion would leave no gaps between ID values, like this:
Fortunately, the c >= b >= a
size condition allows us to do exactly this. Here’s a solution which uses triangular & tetrahedral number sequence formulas to find unique, ordered, and gapless ID values:
Like the bit-shift solution, this can be extended to include more integers, assuming a similar size constraint such as d >= c >= b >= a
, by including equivalent formulas for higher dimensional shapes. For a fourth number, d
, you would include the pentatope number sequence formula.