Hurray! ! ! Today’s entry is not about SQL!
Well, not directly, anyway. I needed to do this for a problem I was solving in SQL, but the solution is not specific to SQL.
So given an integer x
, how can you find the values of the highest (leftmost) and/or lowest (rightmost) bits that are set? Can it done without looping or bit-by-bit comparisons?
YES
If you know you will only be dealing with positive numbers, finding the indices of the high and low bits can be done using:
highbit = FLOOR(LOG(x) / LOG(2)) lowbit = LOG(x & -x) / LOG(2) C# users: highbit = Math.Floor(Math.Log(x, 2)); lowbit = Math.Log(x & -x, 2)
where the bits
will be numbered from the rightmost bit starting at 0.
If you think you may also be processing negative numbers then you also need to know the number of bits that the integer is being stored in.
bits = 8 highbit = FLOOR(LOG(x & (2 ^ bits - 1)) / LOG(2)) lowbit = LOG(x & -x) / LOG(2) C# users: highbit = Math.Floor(Math.Log(x & (int)Math.Pow(2, bits) - 1, 2)); lowbit = Math.Log(x & -x, 2);
In both cases, test for a value of 0 first, as highbit
and lowbit
will both be undefined for 0.
So, how does this work? Warning: math ahead !
First, let’s look at the logarithmic (LOG
) operations:
A common trick used in programming is to set a bit to 1 by OR
ing a value with 2(index of the bit)
.
flags = flags | (2 ^ bitnum)
This works because ny
is 1 followed by y 0s in base n.
27 = 010000000 in binary (1 followed by 7 0s) 104 = 10000 in decimal (1 followed by 4 0s)
Since logarithm is the inverse of exponentiation, then the log of ny
in base n is y
LOG(ny)n = y e.g.: Log(27)2 = Log(128)2 = 7 Log(104)10 = Log(10000)10 = 4
So LOG(x)n
is the number of digits (less 1) in x when x is represented in base n. Which is another way of saying the index of the highest digit (starting from 0).
If x is not a power of n then LOG(x)n still returns the number of digits, but that number is now a non-integer. So we apply FLOOR to get the closest lower integer value. Therefore
FLOOR(LOG(x)n)
gets us index of the highest digit.
e.g.: FLOOR(LOG(0110012)2) = FLOOR(LOG(25)2) = FLOOR(4.64385619) = 4 (011001 has 5 significant digits) FLOOR(LOG(2001)10) = FLOOR(3.301247089) = 3 (2001 has 4 digits)
But why divide by LOG(2)?
Well, unlike C#, most languages do not have a convenient way of calculating LOG(x)n
directly. In most programming languages, LOG(x)
is the natural logarithm of x (log of x in base e: LOG(x)e
).
Fortunately, it doesn’t matter what base your system uses for LOG()
. It just so happens that
LOG(x)a / LOG(b)a = LOG(x)b
So we can get LOG(x)2
by dividing LOG(x)
by LOG(2)
regardless of what base is used by the system. Hence
highbit = FLOOR(LOG(x) / LOG(2))
So wherever you see LOG(x) / LOG(2)
think ‘index of the highest bit in x’.
For the lowbit
, before getting the LOG() we AND
the value with the negative of the value.
(x & -x)
Why?
Because of the way computers represent negative numbers. Almost all systems use the two’s complement of a number to represent the negative of a number. The two’s complement is the number subtracted from 2b
where b
is the number bits of storage being used.
e.g. 28 is 100000000 28 stored in 8 bits is 00011100 10000000 - 00011100 = 11100100 = the two's complement of 28 = -28
The two’s complement can also be seen as the binary representation of the number with all of the bits flipped, plus 1.
e.g. 28 stored in 8 bits is 00011100 flipping the bits gives 11100011 + 1 = 11100100 = the two's complement of 28 = -28
This value has an interesting property:
Since it is the original number x
with all the bits flipped (e.g. 11100011), AND
ing it with x
would produce 0. However, since 1 is added, all the consecutive 1 bits from the right (0 bits in the original x
) are flipped to 0 and the rightmost 0 (1 in x
) is flipped to 1. So, in the two’s complement:
- all digits to the left of
x
‘s lowest bit are flipped: whenAND
ed with
x
, they will result in 0s. - all digits to the right of
x
‘s lowest bit are 0: whenAND
ed with x, they will result in 0s.
x
‘s lowest bit is set to 1: whenAND
ed with
x
, it will be set to 1.
e.g. 28 stored in 8 bits is 00011100 -28 is 11100100 AND gives 00000100
Taking the LOG(x) / LOG(2)
of this result gives us the index of the x
‘s lowest bit. And since one, and only one, bit is set to 1 in the result, the result is always a power of 2 and we do not need to use the FLOOR()
function. So:
lowbit = LOG(x & -x) / LOG(2)
Handling negative numbers:
The last item to explain is why, when dealing with possible negative numbers, the formula for the highbit
changes from
highbit = FLOOR(LOG(x) / LOG(2))
to
highbit = FLOOR(LOG(x & (2 ^ bits - 1)) / LOG(2))
This is because we cannot take the LOG()
of a negative number: the result is an imaginary number. So we have to engage in a little skullduggery.
Negative numbers (as we saw above) are stored as two’s complements of the corresponding positive number. As a result of this, all positive numbers begin with 0 in the leftmost bit and all negative numbers have 1 there. In other words, the leftmost bit is just a sign bit and not actually part of the number. It should be ignored when determining the highest order bit for either positive or negative values.
To do this we create a mask: 2(number of bits) - 1
. This will give us a value equal to 1 in every bit except the leftmost one. When we AND
this value with x
, we effectively eliminate the leftmost bit.
-17 in 16 bits is 11111111111101111 216 - 1 is 01111111111111111 AND gives 01111111111101111 (the original value without the leftmost bit) +17 in 16 bits is 00000000000010001 216 - 1 is 01111111111111111 AND gives 00000000000010001 (the original value without the leftmost bit)
Since the result of this operation always has a 0 in the leftmost bit, it is seen as a positive number by the system and can safely be used in the LOG() operation. Since the remaining bits have the same bit pattern as the original value, our formula will return the correct highbit
value for the original number.
We do not need to do this for the lowbit
because we AND
the number with its negative, which ensures that the leftmost bit is always AND
ed with its opposite, thus always resulting in 0 in the leftmost bit, and so, the result is safe to use with LOG()
.