Encoding multiple values in a single number

The Problem:

There have been countless times when I have needed to combine multiple values into a single number. This could be because of storage space constraints, the need to conform to a restricted format that does not allow the multiple values to be supplied separately, transmission efficiency, or a host of other reasons.

The usual way that I have seen this done, and I have used myself, is to multiply each of the values by an appropriate power of 10 and add the numbers together, For instance, if I was storing a 1-digit number (3), a 2-digit number (62), and 2 more 1-digit numbers (9, 4), I could just use

(v1 * 10⁴) + 
(v2 * 10²) + 
(v3 * 10¹) * 
(v4 * 10⁰) => x

(3 * 10000) + 
(62 * 100) + 
(9 * 10) + 
(4 * 1) => 36294

This is easy enough. However, a downside of this method is the operations needed to extract the individual numbers:

v1 = floor(x / 10⁴) 
v2 = floor((x % 10⁴) / 10²) 
v3 = floor((x % 10²) / 10¹) 
v4 = x % 10

v1 = floor(36294 / 10000) => 3
v2 = floor((36294 % 10000) / 100) => 62
v3 = floor((36294 % 100) / 10) => 9
v4 = 394 % 10 => 4

The method is simple enough and I have utilized it countless times. But I never liked the complexity of the extractions (especially when dealing with storing more that 2 or 3 values). I wondered if there was a different, and perhaps, better way.

Is There A Solution?

There is, but, while it is easier to use, it is much harder to set up. It uses the Chinese Remainder Theorem.

Firstly though: I am not a mathematician and understanding this does require some math skills beyond basic chequebook balancing. It took me some time to figure out what all the terminology and symbols meant before I could understand how this worked. That is one reason for this blog entry: to allow me to re-understand all this in the future (i.e. the day after tomorrow).

For example, the first line of the Wikipedia entry for the Chinese Remainder Theorem reads

In number theory, the Chinese remainder theorem states that if one knows the remainders of the Euclidean division of an integer n by several integers, then one can determine uniquely the remainder of the division of n by the product of these integers, under the condition that the divisors are pairwise coprime.

Did you follow that? I didn’t! I figured it out and I still can’t follow that!

I’ll try to explain some of the terminology and I will avoid all the math symbols that I can (however, I will still be using % to represent modulo). And please note, I am not a mathematics teacher and I will not be explaining in detail the Chinese Remainder Theorem, the Extended Euclidean Algorithm, Modulo Arithmetic, nor any of the mathematical techniques being used. I hope this article may help you if you are trying to understand any of them, but that was not my goal in writing this.

But first, some terminology.

Modulo:
This is one I thought I knew: the remainder after a division, right? Wrong. Modulo and Remainder are different operations that give different results when dealing with negative numbers. This one bit me in the butt when I was programming (since most programming languages have a % operation (or Mod function), which they call a “modulo operation”, but most often is a “remainder operation”).
So, what is a modulo operation? The simple answer is that gives a set of signed values equivalent to the possible remainders after divisions. The actual answer is a little fuzzy and I won’t get into it here. You can look it up on Wikipedia, or read Rob Conery’s excellent summary here.

So, while we programmers are used to seeing

7 % 5 = 2

-7 % 5 = -2

7 % -5 = 2

the better answer is

7 % 5 = 2, 7, 12, 17...

-7 % 5 = you don't want to know

7 % -5 = OMG ! ! !

For the rest of this post, a modulo operator, %, is assumed to return the lowest positive remainder after a division (formally: the remainder of the Euclidean division), so 7 % 5 = 2.

Modulus:
This is another one I always got wrong. I was in the habit of referring to the % symbol as the ‘modulus‘ operator (in fact, many programming languages do this too). It is actually the modulo operator. The modulus is the the divisor in the modulo operation. In 7 % 5 = 2, 5 is the modulus.

Coprime:
I’d seen this term, but until now, I never had to learn what it meant. Simply put, if a set of numbers do not share any prime factors then they are coprime. 10 and 9 are coprime since 10’s prime factors (2, 5) do not appear in 9’s prime factors (3). 10 and 12 are not coprime since 2 appears in both 10’s prime factors (2, 5) and 12’s prime factors (2, 3).

Modular Multiplicative Inverse:
This one is key to this method. The modular multiplicative inverse can be thought of this way: given two numbers m and n, what number, y, can we multiply m by so that the result is 1 greater than a multiple of n (i.e. y*m % n = 1). So for m=7 and n=5, a modular multiplicative inverse could be 3, since (3*7) % 5 = 21 % 5 = 1. But 8 and 13 would also work as inverses, as would a whole series of other numbers.

A Simple Example:

The purpose of this example is to show the primary concepts involved in this method of encoding multiple values. Using the Chinese Remainder Theorem for this problem is definitely overkill, but I will do it anyway.

In this example two teams are competing in a set of 7 games and we wish to record the number of games won so far by each. We want to record both scores in a single number, yet be able to get each score out of that number.

So, in this case we have 2 values (v1, v2), each of which can have 1 of 8 values (0-7) that we wish to encode in a single number, x.

The first step is pick a number that is larger than v1‘s range. We will use 9. This will be the modulus that we will use to get v1‘s values from x. We will call this value n1.

Next, we pick a second number that is larger than v2‘s range, but it must be coprime with n1. We can use 10. So n2 = 10.

Note that there is nothing special about 9 and 10 (other than they meet the criteria that they are large enough to hold the various values of v1 and v2 and they are coprime). We could have chosen 8 and 51, or 240 and 11. But let’s stick with 9 and 10.

So that we can work this out, let us assign values to v1 and v2. The first team has won 3 games, and the second team has won 2.

v1 = 3, n1 = 9
v2 = 2, n2 = 10

Our goal is find a number x so that x % 9 will give us a value of 3, and x % 10 will give us 2. To do this we need two more numbers that we will label m1 and m2. Each of these numbers will be the product of all the n#s for the other values. An easy way to obtain these numbers is get the product of all the n# values and divide out the particular n# that corresponds to the v#:

m1 = (n1*n2) / n1
m2 = (n1*n2) / n2

m1 = (9*10) / 9 = 10
m2 = (9*10) / 10 = 9

Since we only have 2 values, it is easy to see that the m# for one value is the n# of the other. But this won’t be the case when we have more than 2 values later.

For each value, we also want to find a modular multiplicative inverse of m# and n#. There is a process to find this number, but I won’t show that until we are working on the more complex example below. Besides, you have probably already worked out the easy-to-determine inverses for 10 and 9 (y=1) and 9 and 10 (y=9).

y1 = 1 : (m1*y1) % n1 = (10*1) % 9 = 1
y2 = 9 : (m2*y2) % n2 = (9*9) % 10 = 1

However, though we have worked out m1, m2, y1, and y2, those are not the numbers we are actually interested in. The numbers we want to remember are m1*y1 and m2*y2:

z1 = y1*m1 = 10*1 = 10
z2 = y2*m2 = 9*9 = 81

So we have

v1 = 3, n1 = 9,  m1 = 10, y1 = 1, z1 = 10
v2 = 2, n2 = 10, m2 = 9,  y2 = 9, z2 = 81

Why do we need all these numbers? What are they for?
– We know v1 and v2 are the values we are storing.
– n1 and n2 will be used to get our v1 and v2 out of x (once we determine what x is).
– m1, m2, y1 and y2 were only need to calculate z1 and z2. We can ignore them from now on.
– z1 and z2 are new numbers that we will use to encode v1 and v2 in x.

Our final value x can now be calculated by getting the sum of the products of v# and z#.

x = v1*z1 + v2*z2 = 3*10 + 2*81 = 30 + 162 = 192

We finally have x, 192. When we wish to extract v1 or v2 from x, we can just do a modulo operation with n1 or n2

v1 = x % n1 = 192 % 9 = 3
v2 = x % n2 = 192 % 10 = 2

Tada! That was a lot of work just to encode two numbers, I agree. But, now that we have n1, n2, z1 and z2, we can easily encode and retrieve any v1 and v2.

v1 = 5, v2 = 1
x = 5*10 + 1*81 = 131

v1 = 131 % 9 = 5
v2 = 131 % 10 = 1

Okay, but why does this work? The secret is in how we derived z1 and z2.

Let’s look at z1:
z1 10 is m1*y1, and we know that (m1*y1) % n1 = 1 (10 % 9 = 1). Note then that if we multiply z1 by 2 we will double the result when we modulo with n1. Multiply by 3 and we triple the result:

2*(m1*y1) % n1 = 2
3*(m1*y1) % n1 = 3
...

2*(10) % 9 = 2
3*(10) % 9 = 3
...

This is key to how we will encode v1 3. We multiply v1*z1 3*10 and end up with a number 30 that modulo n1 9 gives us v1 3. The same relationship applies to v2, z2 and n2.

But we added the products together (v1*z1 + v2*z2). How does that not mess up the modulo operations?

Again, looking at z1:
z1 10 is the product of m1 and y1, and m1 10 is a multiple of all the other n# values (in our case, just the one: n2=10). That means:
– z1 10 is a multiple of the other n#s (10).
– v1*z1, also, is always a multiple of the other n#s, so (v1*z1) % n# is always 0 (e.g: 30 % 10 = 0).
– we can add as many z1s 10s as we want to into x and it won’t change the result of x % n2.
– adding the product of z1 10 and any number to x only affects the result of x % 9, and never affects x % 10.

Similarly, z2 81 is a multiple of n1 9, so we can add as many 81s as we want into x (changing the result of x % 10) without ever changing x % 9.

Since each of these products (v#*z#) can be added into x without affecting the result when a modulo operation is done with a different n#, then we can safely add the products together to get x.

A More Complicated Example (including the determination of y#):

In this case, we wish to store multiple values, each with differing ranges. In the case that I was working on, I was storing a game state which had to include the following information:
– v1: A value indicating the current number of players (range: 1-4)
– v2: A value indicating the current player number (range: 1-4)
– v3: A value indicating the number of rounds played so far (range: 1-50)
– v4: A value indicating the step in the current round (range: 1-35)
– v5: A value indicating the points collectively earned by the players (range: 1-50)

First, we obtain the numbers we will use to extract the v# values from x:

n1 = 5
n2 = 7
n3 = 53
n4 = 37
n5 = 59

Note: In this example, all the n# are primes. As we saw in the first example, this is not necessary: the requirement is that all the n# are coprime with each other. However, prime numbers are, by definition, coprime with all other prime numbers, and it is easier to pick prime numbers from a list than to examine the prime factors of different values in a search for suitable n#s.

We can create the m# values

n1*n2*n3*n4*n5 = 4049465

n1 = 5,  m1 = 4049465 / 5 = 809893   (= n2*n3*n4*n5)
n2 = 7,  m2 = 4049465 / 7 = 578495   (= n1*n3*n4*n5)
n3 = 53, m3 = 4049465 / 53 = 76405   (= n1*n2*n4*n5)
n4 = 37, m4 = 4049465 / 37 = 109445  (= n1*n2*n3*n5)
n5 = 59, m5 = 4049465 / 59 = 68635   (= n1*n2*n3*n4)

Next we need the y# values, the modulo multiplicative inverses for each m# and n#. As you can see, it is going to be a little harder to work this one out mentally. To find these values we must employ the Extended Euclidean Algorithm.

I used the Euclidean Algorithm in an earlier post. This extended version is similar and just adds the calculation of two addition values (one of which we will not be using and is not included in the code below).

public double MMI(double m, double n) 
{ 
  double r1 = m;
  double r2 = n;
  double s1 = 1;
  double s2 = 0;
  double r, s, q, gcd, mmi;
  
  while (r1 != 0) 
  {
    q = Math.Floor(r1 / r2); 
  
    r = r1;
    r1 = r2;
    r2 = r - q * r2;  
  
    s = s1;
    s1 = s2;
    s2 = s - q * s2;
  } 
  gcd = r; // this is the greatest common divisor of m and n
  mmi = s + ((s < 0) ? n : 0); // this is the modulo multiplicative inverse for m and n
  return mmi;
}

The value gcd is not used in our operations and can be ignored. The value mmi is our desired y#. Using this method we can obtain

n1 = 5,  m1 = 809893, y1 = 2,  z1 = 809893 * 2  = 1619786
n2 = 7,  m2 = 578495, y2 = 1,  z2 = 578495 * 1  = 578495
n3 = 53, m3 = 76405,  y3 = 5,  z3 = 76405 * 5   = 382025
n4 = 37, m4 = 109445, y4 = 36, z4 = 109445 * 36 = 3940020 
n5 = 59, m5 = 68635,  y5 = 23, z5 = 68635 * 23  = 1578605

With our set of z#s we can now encode any set of 5 v#s into an x.

To encode that player 3 of 4 is on step 12 after 6 rounds and that the players have accumulated 22 points:

x = v1*z1     + v2*z2    + v3*z3    + v4*z4      + v5*z5 
x = 4*1619786 + 3*578495 + 6*382085 + 12*3940020 + 22*1578605
x = 6479144   + 1735485  + 2292510  + 47280240   + 34729310 = 92516329

So the number 92516329 encapsulates our game state. Decoding the values just needs the use of our n#s:

numPlayers = x % n1 = 4
player     = x % n2 = 3
round      = x % n3 = 6
step       = x % n4 = 12
points     = x % n5 = 22

One reason why I like this method is illustrated above. The modulus operations are all independent of each other. I do not need to involve any other values, their ranges, their storage sizes in x, etc. in order to obtain a value. If I want v4, I just modulo x with n4.

And to show how and why this works, let’s repeat the logic we used from the first example:
Let’s look at z1:
z1 1619786 is m1*y1, and we know that (m1*y1) % n1 = 1 (1619786 % 5 = 1). Note then that if we multiply z1 by 2 we will double the result when we modulo with n1. Multiply by 3 and we triple the result:

2*(m1*y1) % n1 = 2
3*(m1*y1) % n1 = 3
...

2*1619786 % 5 = 2
3*1619786 % 5 = 3
...

This is how we encoded v1 4. We multiplied v1*z1 4*1619786 and ended up with a number 6479144 that modulo n1 5 gives us v1 4. The same relationship applies to v2, z2 and n2, v3, z3 and n3, etc.

To show that all the products can be added to create a viable x, we will look at z1 again:
z1 1619786 is the product of m1 and y1, and m1 809893 is a multiple of all the other n# values (n2*n3*n4*n5 = 809893). That means:
– z1 16198786 is a multiple of each of the other n#s (7, 53, 37, and 59).
– v1*z1 is always a multiple of each of the other n#s, so (v1*z1) % n# is always 0 (e.g.: 6479144 % 7 = 0, 6479144 % 53 = 0, 6479144 % 37 = 0, 6479144 % 59 = 0).
– we can add as many z1s 1619786s as we want to into x and it won’t change the result of x % n2, x % n3, x % n4, or x % n5.
– adding the product of z1 1619786 and any number to x only affects the result of x % 5, and never affects x % 7, x % 53, x % 37, or x % 59.

Similarly, z2 578495 is a multiple of n1, n3, n4 and n5 so we can add as many 578495s as we want into x (changing the result of x % 7) without ever changing x % 5, x % 53, x % 37, or x % 59. The same relationships applies to z3, z4 and z5.

Since each of these products (v#*z#) can be added into x without affecting the result when a modulo operation is done with a different n#, then we can safely add all the products together to get x 92516329.

Conclusion:

As I said earlier, this method is harder to set up than others (which is probably why I have not seen it being used before). But once the n# and z# numbers have been determined, it is easier and cleaner to use than any other method I have seen to date. The method works for 2 values, 5 values, 10 values, or however many values you can fit in whatever data type you are using. And it nicely handles sets of numbers with different ranges. It allows independent extraction of values. And I find that the non-obvious value x deters others from trying to tamper with stored values.

I hope that these examples will be enough to understand how this method works and how to make use of it.

Edit:

I can’t believe it! I got caught by the ‘does % mean modulo or remainder‘ gotcha again!

C#, the language I use the most, and which I used to write the example code to find the modular multiplicative inverse above, is another language that incorrectly says that the % operator is a modulo operator! It is not. % in C# returns the remainder, not the result of a modulo operation (how have I used this language for 2 decades and not run across this before?)

To fix this, I made a small correction to the MMI function above, which I highlighted in yellow. This change just ensures that the value returned for the z#s is not negative. As long as you don’t use negative v#s then this should address the problem. I have also changed the values shown for y4, z4 and x.

2 thoughts on “Encoding multiple values in a single number”

Pingback: Avoiding unnecessary updates in React Context - TechTalk7
1. cpottinger says:
  
  2023/09/25 at 07:11
  
  […] values are cyclic, although of different cycles. The cycles are known, though. I guess I could encode the values into a single integer but this seems a bit insane to do every 30ms and also seems overly complex for such a simple […]
  
  I am not sure of the context of your comment, so forgive me if my interpretations are incorrect.
  
  If you are referring to re-calculating the z# values every 30 ms then: OMG! NO. The idea was to use this technique in situations where the ranges for the values are static. Calculation of the z# values would be done, typically, once, and the z# values would be used repeatedly to calculated the encoded values.
  
  If you are referring to calculating the encoded values every 30 ms then: I guess that would depend strongly on your application. The encoding step involves only N multiplication and N-1 addition operations and should not tax any modern CPU too heavily.
  However, in many of today’s high-volume, time-sensitive environments milliseconds can mean a lot. I would then ask myself “what is the relative cost difference between using this method and whatever other method I have to record the information?” e.g. Does it cost more than recording the numbers separately? Or multiplying them by powers of 10? Or converting them to strings and concatenating them?
  
  LikeLike