OVERHEAD

Unique, order preserving IDs for any number of positive integers

Scripting

Update

Since I originally wrote this blog post, I have discovered an even better solution which satisfies a new fifth condition I didn’t think was achievable. If you already read this blog post pre-update, skip ahead to the last section to see the new solution.

Required conditions

The bit-shift solution was inspired by this Stackoverflow post.

The solution linked above does not establish nor require the between-integers size-ordering constraint like this blog post does, but instead, hardcodes a maximum integer size ahead of time. Use whichever solution is more suitable for you.

Let’s say you have three positive integers (not including zero, but you can easily add one to the integers to handle that case), labelled a, b, & c, where these integers follow a size-ordering constraint of c >= b >= a. You want to identify these three integers by a single integer identifier which satisfies four conditions:

  1. The ID is unique to the three integers.
  2. If put in a list or table, the ID is in the same relative position as the three integers would be.
    • 1-1-1 comes before 2-1-1, which comes before 2-2-1, etc.
  3. The ID is a reasonably small number that is suitable to be used as a key.
  4. The ID is relatively easy to compute, or at least faster than a typical hash function.
  5. (Optional) If put in a list or table, the IDs leave no gaps or unused integers.

Prime-based solution

To demonstrate what the three-to-one conversion actually looks like in practice, here’s what computing 2^c * 3^b * 5^a as the ID gives you:

Prime-based ID sequence for the range 1-1-1 to 4-4-4.

1-1-1 -> 30
2-1-1 -> 60
2-2-1 -> 180
2-2-2 -> 900
3-1-1 -> 120
3-2-1 -> 360
3-2-2 -> 1800
3-3-1 -> 1080
3-3-2 -> 5400
3-3-3 -> 27000
4-1-1 -> 240
4-2-1 -> 720
4-2-2 -> 3600
4-3-1 -> 2160
4-3-2 -> 10800
4-3-3 -> 54000
4-4-1 -> 6480
4-4-2 -> 32400
4-4-3 -> 162000
4-4-4 -> 810000

Unfortunately, this only satisfies the first of the outlined conditions, as these IDs don’t preserve order, they grow in size really quickly, and the ID is increasingly difficult to compute the greater the three integers get.

As it turns out however, the idea of using exponents, and specifically two to the power of some other number, is heading in the right direction…

Bit-shift solution

To cut to the chase, here’s a Lua script that demonstrates this solution:

Prime-based Lua script for printing the IDs of three integers ranging from start, start, start to upto, upto, upto, where start & upto are the minimum and maximum integers.

#!/bin/lua

local a, b, c, id
local ra, rb, rc, bits

-- Smallest integer of a, b, and c.
local start = 1

-- Largest integer of a, b, and c.
local upto = 4


for n = start, upto do
	c = n
	for n = start, upto do
		-- These break lines ensure 'c >= b >= a' is true.
		if n > c then break end

		b = n
		for n = start, upto do
			if n > b then break end

			a = n

			bits = math.ceil(math.log(c, 2))
			ra = a
			rb = b << bits
			rc = c << bits * 2
			id = ra + rb + rc

			-- Swap these print lines when testing the IDs properties.
--			print(string.format("%s", id))
			print(string.format("%s-%s-%s -> %s", c, b, a, id))
		end
	end
end

To verify that this works, swap the print line for the one which only prints the ID, and then (on a Linux system) run the commands ./script.lua | sort -cn to test the ordering and ./script.lua | sort -n | uniq -d to test that the IDs never repeat.

Explanation

The << syntax in Lua means we’re performing a left bit-shift operation, where all the bits in a given sequence are moved to the left. For instance, doing one left bit-shift on the bit sequence 00000001 would yield 00000010, which is the same operation as performing integer * 2^shift (in other words, each left bit-shift doubles the number).

If bit-shifting is unavailable to you, you could do b * 2^bits & c * 2^(bits * 2) instead, however, remember that math operations are slower than bit manipulation, even if both yield the same answers in nearly all cases.

The math.ceil(math.log(c, 2)) line computes the minimum number of bits needed to represent the largest integer we’ve been given, which because of the size condition of c >= b >= a is always the value of the integer c.

Left bit-shifting b by bits bits, c by bits * 2 bits, and adding them to integer a forms a new bit string where each integer is allocated its own non-overlapping range (e.g. 2-1-1 is the bit string 10 01 01). More simply, the three integers have been concatenated together as binary.

Naturally then, if concatenating the three integers into a string satisfies the uniqueness and ordering properties, as said string is literally what the script prints beside each ID to test if their unique and ordered, then interpreting the computed bit string as an integer will always yield an integer which itself is unique and preserves the same ordering.

…And more?

Ah right, another nice property of the bit-shifting solution is that it can be generalised to any number of integers; all it takes to find the ID for four integers (where d >= c >= b >= a) is to amend the ID formula to be written as:

bits = math.ceil(math.log(d, 2))
ra = a
rb = b << bits
rc = c << bits * 2
rd = d << bits * 3
id = ra + rb + rc + rd

Triangular & tetrahedral numbers solution

In an ideal world, the tuple-to-ID conversion would leave no gaps between ID values, like this:

Ideal ID sequence for the range 1-1-1 to 4-4-4, where the IDs increase from 1 to 20.

1-1-1 -> 1
2-1-1 -> 2
2-2-1 -> 3
2-2-2 -> 4
3-1-1 -> 5
3-2-1 -> 6
3-2-2 -> 7
3-3-1 -> 8
3-3-2 -> 9
3-3-3 -> 10
4-1-1 -> 11
4-2-1 -> 12
4-2-2 -> 13
4-3-1 -> 14
4-3-2 -> 15
4-3-3 -> 16
4-4-1 -> 17
4-4-2 -> 18
4-4-3 -> 19
4-4-4 -> 20

Fortunately, the c >= b >= a size condition allows us to do exactly this. Here’s a solution which uses triangular & tetrahedral number sequence formulas to find unique, ordered, and gapless ID values:

Triangular & tetrahedral numbers-based Lua script for printing the IDs of three integers ranging from start, start, start to upto, upto, upto, where start & upto are the minimum and maximum integers.

#!/bin/lua

local a, b, c, id, tri, tet

-- Smallest integer of a, b, and c.
local start = 1

-- Largest integer of a, b, and c.
local upto = 4


-- Generates a unique number given a, b, and c.
for n = start, upto do
	c = n
	for n = start, upto do
		if n > c then break end
		b = n
		for n = start, upto do
			-- These break lines ensure 'c >= b >= a' is true.
			if n > b then break end
			a = n

			tri = (b * (b + 1)) / 2
			tet = (c * (c - 1) * (c + 1)) / 6

			-- math.floor() is purely for removing the '.0' part.
			id = math.floor(tet + tri - (b - a))

			-- Swap these print lines when testing the IDs properties.
--			print(string.format("%s", id))
			print(string.format("%s-%s-%s -> %s", c, b, a, id))
		end
	end
end

Like the bit-shift solution, this can be extended to include more integers, assuming a similar size constraint such as d >= c >= b >= a, by including equivalent formulas for higher dimensional shapes. For a fourth number, d, you would include the pentatope number sequence formula.