Hurray! ! ! Today’s entry is not about SQL!

Well, not directly, anyway. I needed to do this for a problem I was solving in SQL, but the solution is not specific to SQL.

So given an integer `x`

, how can you find the values of the highest (leftmost) and/or lowest (rightmost) bits that are set? Can it done without looping or bit-by-bit comparisons?

YES

If you know you will only be dealing with positive numbers, finding the indices of the high and low bits can be done using:

highbit = FLOOR(LOG(x) / LOG(2)) lowbit = LOG(x & -x) / LOG(2) C# users: highbit = Math.Floor(Math.Log(x, 2)); lowbit = Math.Log(x & -x, 2)

where the `bits`

will be numbered from the rightmost bit starting at 0.

If you think you may also be processing negative numbers then you also need to know the number of bits that the integer is being stored in.

bits = 8 highbit = FLOOR(LOG(x & (2 ^ bits - 1)) / LOG(2)) lowbit = LOG(x & -x) / LOG(2) C# users: highbit = Math.Floor(Math.Log(x & (int)Math.Pow(2, bits) - 1, 2)); lowbit = Math.Log(x & -x, 2);

In both cases, test for a value of 0 first, as `highbit`

and `lowbit`

will both be undefined for 0.

**So, how does this work?** **Warning: math ahead !**

First, let’s look at the logarithmic (`LOG`

) operations:

A common trick used in programming is to set a bit to 1 by `OR`

ing a value with `2`

.^{(index of the bit)}

flags = flags | (2 ^ bitnum)

This works because `n`

is 1 followed by y 0s in base n.^{y}

2^{7}= 010000000 in binary (1 followed by 7 0s) 10^{4}= 10000 in decimal (1 followed by 4 0s)

Since logarithm is the inverse of exponentiation, then the log of `n`

in base n is y^{y}

LOG(n^{y})_{n}= y e.g.: Log(2^{7})_{2}= Log(128)_{2}= 7 Log(10^{4})_{10}= Log(10000)_{10}= 4

So `LOG(x)`

is the number of digits (less 1) in x when x is represented in base n. Which is another way of saying the index of the highest digit (starting from 0)._{n}

If x is not a power of n then LOG(x)_{n} still returns the number of digits, but that number is now a non-integer. So we apply FLOOR to get the closest lower integer value. Therefore

FLOOR(LOG(x)_{n})

gets us index of the highest digit.

e.g.: FLOOR(LOG(011001_{2})_{2}) = FLOOR(LOG(25)_{2}) = FLOOR(4.64385619) = 4 (011001 has 5 significant digits) FLOOR(LOG(2001)_{10}) = FLOOR(3.301247089) = 3 (2001 has 4 digits)

**But why divide by LOG(2)?**

Well, unlike C#, most languages do not have a convenient way of calculating `LOG(x)`

directly. In most programming languages, _{n}`LOG(x)`

is the natural logarithm of x (log of x in base *e*: `LOG(x)`

)._{e}

Fortunately, it doesn’t matter what base your system uses for `LOG()`

. It just so happens that

LOG(x)_{a}/ LOG(b)_{a}= LOG(x)_{b}

So we can get `LOG(x)`

by dividing _{2} `LOG(x)`

by `LOG(2)`

regardless of what base is used by the system. Hence

highbit = FLOOR(LOG(x) / LOG(2))

So wherever you see `LOG(x) / LOG(2)`

think ‘index of the highest bit in x’.

For the `lowbit`

, before getting the LOG() we `AND`

the value with the negative of the value.

(x & -x)

Why?

Because of the way computers represent negative numbers. Almost all systems use the *two’s complement* of a number to represent the negative of a number. The *two’s complement* is the number subtracted from `2`

where ^{b}`b`

is the number bits of storage being used.

e.g. 2^{8}is 100000000 28 stored in 8 bits is 00011100 10000000 - 00011100 = 11100100 = thetwo's complementof 28 = -28

The *two’s complement* can also be seen as the binary representation of the number with all of the bits flipped, plus 1.

e.g. 28 stored in 8 bits is 00011100 flipping the bits gives 11100011 + 1 = 11100100 = thetwo's complementof 28 = -28

This value has an interesting property:

Since it is the original number `x`

with all the bits flipped (e.g. 11100011), `AND`

ing it with `x`

would produce 0. However, since 1 is added, all the consecutive 1 bits from the right (0 bits in the original `x`

) are flipped to 0 and the rightmost 0 (1 in `x`

) is flipped to 1. So, in the *two’s complement*:

- all digits to the left of
`x`

‘s lowest bit are flipped: when`AND`

`ed with`

`x`

, they will result in 0s. - all digits to the right of
`x`

‘s lowest bit are 0: when`AND`

`ed with x, they will result in 0s.`

`x`

‘s lowest bit is set to 1: when`AND`

`ed with`

`x`

, it will be set to 1.

e.g. 28 stored in 8 bits is 00011100 -28 is 11100100 AND gives 00000100

Taking the `LOG(x) / LOG(2)`

of this result gives us the index of the `x`

‘s lowest bit. And since one, and only one, bit is set to 1 in the result, the result is always a power of 2 and we do not need to use the `FLOOR()`

function. So:

lowbit = LOG(x & -x) / LOG(2)

**Handling negative numbers:**

The last item to explain is why, when dealing with possible negative numbers, the formula for the `highbit`

changes from

highbit = FLOOR(LOG(x) / LOG(2))

to

highbit = FLOOR(LOG(x & (2 ^ bits - 1)) / LOG(2))

This is because we cannot take the `LOG()`

of a negative number: the result is an imaginary number. So we have to engage in a little skullduggery.

Negative numbers (as we saw above) are stored as two’s complements of the corresponding positive number. As a result of this, all positive numbers begin with 0 in the leftmost bit and all negative numbers have 1 there. In other words, the leftmost bit is just a sign bit and not actually part of the number. It should be ignored when determining the highest order bit for either positive or negative values.

To do this we create a mask: `2`

. This will give us a value equal to 1 in every bit except the leftmost one. When we ^{(number of bits)} - 1`AND`

this value with `x`

, we effectively eliminate the leftmost bit.

-17 in 16 bits is 11111111111101111 2^{16 }- 1 is 01111111111111111 AND gives 01111111111101111 (the original value without the leftmost bit) +17 in 16 bits is 00000000000010001 2^{16 }- 1 is 01111111111111111 AND gives 00000000000010001 (the original value without the leftmost bit)

Since the result of this operation always has a 0 in the leftmost bit, it is seen as a positive number by the system and can safely be used in the LOG() operation. Since the remaining bits have the same bit pattern as the original value, our formula will return the correct `highbit`

value for the original number.

We do not need to do this for the `lowbit`

because we `AND`

the number with its negative, which ensures that the leftmost bit is always `AND`

ed with its opposite, thus always resulting in 0 in the leftmost bit, and so, the result is safe to use with `LOG()`

.