Thursday, 14 November 2013

Bit Hacking Part-1

Here I'm gonna post some problems and their different solutions, all using bit magic!!!

Problem 1:

Swap 2 integers x and y.


temp=x;
x=y;
y=temp;

What could be simpler? Can we do it without the temporary?

x=x^y;
y=x^y;
x=x^y;

Wow!!! What's going on there?

Let's see an example:

x=2;
y=3;
x=x^y;   //x= (000010 ^ 000011)=(000001)

y=x^y;   //y= (000001 ^ 000011)=(000010)
x=x^y;   //x= (000001 ^ 000010)=(000011)


Here the swapping is done!
Why does it work?

A great property of XOR is that it is its own inverse, i.e.
(x^y)^y=x

Does it perform well? Let's evaluate the performance...
Performance:

It is poor at exploiting Instruction-Level Parallelism (ILP).
Instruction-Level Parallelism refers to the fact that a processor can issue more than one instruction at a given step, but in this sequence of operations each step has to wait until the previous one is computed before it can execute. So there is no ILP here, and the performance is not particularly high, but there are other places where this property is useful.
                               

!!!_NEAT_BIT_HACK_!!! 



Problem 2:


Find the minimum of 2 integers.

Naive approach is as follows:

if(x<y)
    minimum=x;
else
    minimum=y;

OR

If you know a little bit of the C language, then:

minimum=(x<y)?x:y;

What's wrong with this??

NOTHING!!! If you are happy to use SLOW CODE !!!

The compiler might optimize it, but maybe not.

Performance:


An unpredictable branch empties the processor pipeline.
The compiler might be smart enough to avoid the unpredictable branch, but maybe not.

The processor has within it a branch prediction unit: whenever it comes to a branch, it guesses which way the branch is going to go and proceeds speculatively to execute along that path. If the guess turns out to be wrong, it changes path and proceeds the other way. To do that it empties the processor pipeline, which on the machine we were using costs 16 cycles. So you don't want branches that are unpredictable; in particular, what you want to look at in a conditional instruction is whether it is predictable. A branch that goes the same way almost all the time is predictable: the hardware is very smart about figuring that out, will make the right prediction, and you won't pay any performance penalty.
But with code like the above, the compiler and the architecture have no way to know: for arbitrary pairs of x and y, the prediction is a 50-50 guess, and half the time it will predict the wrong thing.
So the compiler might be smart enough, but maybe not. Here is a way of being sure.

Finding minimum of x and y without a branch.

minimum=y^((x^y)& -(x<y));

Why does it work?
Well, C represents Boolean TRUE and FALSE with the integers 1 and 0 respectively.
So applying the operator < to x and y performs a comparison: it returns 1 or 0 depending on whether x is less than y. When you take the minus of that result you get either 0 or -1, and what is -1 in two's-complement arithmetic? It is a word filled entirely with 1s. So you get either a word all filled with 0s or a word all filled with 1s.
If x<y, the mask is all 1s. ANDing (x^y) with all 1s is a no-op: ANDing anything with 1 gives back whatever the thing is, so this evaluates to (x^y). Then XORing with y gives back x (the inverse property of XOR), so minimum gets the value of x.
If x is not less than y, the mask is all 0s, the AND yields 0, and y^0 is just y, so minimum gets the value of y.
This seems awful, but is it better?
YES it is, because all of these instructions execute within the processing unit rather than anything having to do with memory.
It gets the values of x and y to begin with, and each instruction within the processing unit typically takes 1 cycle; if there is any parallelism in it, the processor will be able to execute more than 1 op/cycle.
 
Let's trace the instruction-level parallelism in y^((x^y) & -(x<y)): cycle 1 computes (x<y) and, in parallel, (x^y), since the two are independent; cycle 2 computes the negation -(x<y) of the Boolean result; cycle 3 computes the AND of the two; cycle 4 computes the final XOR with y.


To find maximum:


maximum=x ^ ((x ^ y) & -(x < y));




Problem 3
Determine whether the given integer is a power of 2 or not.

What's the naive approach to solve this problem?
One can keep increasing the power of 2, comparing it against the given number, and iterate until the power of 2 becomes greater than or equal to it.

for(int i=0; ; i++)   // start from i=0 so that n=1 (=2^0) is handled
{
   k=pow(2,i);        // note: pow() returns a double
   if(k==n)
     return TRUE;
   if(k>n)
     return FALSE;
}


Or one can also think of repeatedly dividing the given number by 2 and checking whether it reduces to 1 (pretty simple).

Is this OK to use? NO!!! Programs are meant to be correct, and this one is, but it is not as fast as it could be. It's too slow to use!


                             
 ? Thinking About BIT HACK ? 
                             

Well using bit hacking we can determine that whether it is power of 2 or not very efficiently as follows:

return (n && !(n & (n-1)));

Wow!!! A single-line statement solves the problem.
How?
If n is a power of 2, then in binary exactly one bit is set and the rest are 0. In (n-1), all the lower-order bits are set and that single bit is clear, so the AND n & (n-1) results in all bits being 0, and the negation !(n & (n-1)) is 1. Now both n and !(n & (n-1)) are non-zero, which finally returns the result TRUE.
If n is not a power of 2, then n & (n-1) is non-zero, so !(n & (n-1)) is 0 and the final statement is (non-zero && zero), returning FALSE (can be analysed pretty simply).
Let's see an example (16-bit integer).
n is a power of 2, e.g. suppose n=8:

n=8;      // 8=0000000000001000
n-1= 7; // 7=0000000000000111
n&(n-1); //0000000000001000 & 0000000000000111= 0000000000000000
now n is non-zero and (n&(n-1)) is zero (here !(n&(n-1)) = 1)
final result TRUE.

if n is not a power of 2, e.g. n=12:
n=12;  // 12=0000000000001100
n-1=11; //11=0000000000001011
n&(n-1); // 0000000000001100 & 0000000000001011= 0000000000001000

now see, n is non-zero and (n&(n-1)) is also non-zero (here !(n&(n-1))=0)
so final result is FALSE.


Problem 4

METHOD 1: Count the set bits in a 32-bit integer.

Here is another frequently asked problem: count the total number of set bits in an integer.
Do not say "convert the number to binary and then count the 1's in it" -- that's really naive.
What is Brian Kernighan's way to compute it?


unsigned int n;   // count the number of set bits in n
unsigned int cnt; // cnt accumulates the number of set bits in n
for(cnt=0; n; cnt++)
{
  n &= n-1; // clear the least significant set bit
}


What's going on here? Take an example and work through the operations to learn it. If you want an explanation, I can explain in the comments. Hope you can do it.

There are many other ways to count the number of set bits very efficiently. I will post them as soon as possible.



!!! KEEP CALM & CODE ON !!!
!!! HAPPY CODING !!!