Finding the highest and lowest set bits in a value.

Hurray! ! ! Today’s entry is not about SQL!

Well, not directly, anyway. I needed to do this for a problem I was solving in SQL, but the solution is not specific to SQL.

So given an integer x, how can you find the values of the highest (leftmost) and/or lowest (rightmost) bits that are set? Can it done without looping or bit-by-bit comparisons?

YES

If you know you will only be dealing with positive numbers, finding the indices of the high and low bits can be done using:

highbit = FLOOR(LOG(x) / LOG(2))
lowbit = LOG(x & -x) / LOG(2)

C# users:
highbit = Math.Floor(Math.Log(x, 2));
lowbit = Math.Log(x & -x, 2)

where the bits will be numbered from the rightmost bit starting at 0.

If you think you may also be processing negative numbers then you also need to know the number of bits that the integer is being stored in.

bits = 8
highbit = FLOOR(LOG(x & (2 ^ bits - 1)) / LOG(2)) 
lowbit = LOG(x & -x) / LOG(2)

C# users:
highbit = Math.Floor(Math.Log(x & (int)Math.Pow(2, bits) - 1, 2));
lowbit = Math.Log(x & -x, 2);

In both cases, test for a value of 0 first, as highbit and lowbit will both be undefined for 0.

So, how does this work? Warning: math ahead !

First, let’s look at the logarithmic (LOG) operations:

A common trick used in programming is to set a bit to 1 by ORing a value with 2^{(index of the bit)}.

flags = flags | (2 ^ bitnum)

This works because n^y is 1 followed by y 0s in base n.

2⁷ = 010000000 in binary (1 followed by 7 0s)
10⁴ = 10000 in decimal (1 followed by 4 0s)

Since logarithm is the inverse of exponentiation, then the log of n^y in base n is y

LOG(n^y)_n = y
e.g.:
Log(2⁷)₂ = Log(128)₂ = 7
Log(10⁴)₁₀ = Log(10000)₁₀ = 4

So LOG(x)_n is the number of digits (less 1) in x when x is represented in base n. Which is another way of saying the index of the highest digit (starting from 0).

If x is not a power of n then LOG(x)_n still returns the number of digits, but that number is now a non-integer. So we apply FLOOR to get the closest lower integer value. Therefore

FLOOR(LOG(x)_n)

gets us index of the highest digit.

e.g.:
FLOOR(LOG(011001₂)₂) = FLOOR(LOG(25)₂) = FLOOR(4.64385619) = 4 (011001 has 5 significant digits)
FLOOR(LOG(2001)₁₀) = FLOOR(3.301247089) = 3 (2001 has 4 digits)

But why divide by LOG(2)?

Well, unlike C#, most languages do not have a convenient way of calculating LOG(x)_n directly. In most programming languages, LOG(x) is the natural logarithm of x (log of x in base e: LOG(x)_e).

Fortunately, it doesn’t matter what base your system uses for LOG(). It just so happens that

LOG(x)_a / LOG(b)_a = LOG(x)_b

So we can get LOG(x)₂ by dividing LOG(x) by LOG(2) regardless of what base is used by the system. Hence

highbit = FLOOR(LOG(x) / LOG(2))

So wherever you see LOG(x) / LOG(2) think ‘index of the highest bit in x’.

For the lowbit, before getting the LOG() we AND the value with the negative of the value.

(x & -x)

Why?

Because of the way computers represent negative numbers. Almost all systems use the two’s complement of a number to represent the negative of a number. The two’s complement is the number subtracted from 2^b where b is the number bits of storage being used.

e.g.                  2⁸ is 100000000
     28 stored in 8 bits is  00011100
      10000000 - 00011100 =  11100100 = the two's complement of 28 = -28

The two’s complement can also be seen as the binary representation of the number with all of the bits flipped, plus 1.

e.g. 28 stored in 8 bits is  00011100
     flipping the bits gives 11100011
                       + 1 = 11100100 = the two's complement of 28 = -28

This value has an interesting property:

Since it is the original number x with all the bits flipped (e.g. 11100011), ANDing it with x would produce 0. However, since 1 is added, all the consecutive 1 bits from the right (0 bits in the original x) are flipped to 0 and the rightmost 0 (1 in x) is flipped to 1. So, in the two’s complement:

all digits to the left of x‘s lowest bit are flipped: when ANDed with x, they will result in 0s.
all digits to the right of x‘s lowest bit are 0: when ANDed with x, they will result in 0s.
x‘s lowest bit is set to 1: when ANDed with x, it will be set to 1.

e.g. 28 stored in 8 bits is 00011100
                     -28 is 11100100 
                  AND gives 00000100

Taking the LOG(x) / LOG(2) of this result gives us the index of the x‘s lowest bit. And since one, and only one, bit is set to 1 in the result, the result is always a power of 2 and we do not need to use the FLOOR() function. So:

lowbit = LOG(x & -x) / LOG(2)

Handling negative numbers:

The last item to explain is why, when dealing with possible negative numbers, the formula for the highbit changes from

highbit = FLOOR(LOG(x) / LOG(2))

highbit = FLOOR(LOG(x & (2 ^ bits - 1)) / LOG(2))

This is because we cannot take the LOG() of a negative number: the result is an imaginary number. So we have to engage in a little skullduggery.

Negative numbers (as we saw above) are stored as two’s complements of the corresponding positive number. As a result of this, all positive numbers begin with 0 in the leftmost bit and all negative numbers have 1 there. In other words, the leftmost bit is just a sign bit and not actually part of the number. It should be ignored when determining the highest order bit for either positive or negative values.

To do this we create a mask: 2^{(number of bits)} - 1. This will give us a value equal to 1 in every bit except the leftmost one. When we AND this value with x, we effectively eliminate the leftmost bit.

-17 in 16 bits is 11111111111101111
       2¹⁶ - 1 is 01111111111111111 
        AND gives 01111111111101111 (the original value without the leftmost bit)

+17 in 16 bits is 00000000000010001
       2¹⁶ - 1 is 01111111111111111 
        AND gives 00000000000010001 (the original value without the leftmost bit)

Since the result of this operation always has a 0 in the leftmost bit, it is seen as a positive number by the system and can safely be used in the LOG() operation. Since the remaining bits have the same bit pattern as the original value, our formula will return the correct highbit value for the original number.

We do not need to do this for the lowbit because we AND the number with its negative, which ensures that the leftmost bit is always ANDed with its opposite, thus always resulting in 0 in the leftmost bit, and so, the result is safe to use with LOG().

Finding the highest and lowest set bits in a value.

Related

Leave a comment Cancel reply