Playing With Bits in C
Let's say you have been tasked with rewriting the file API of C and so far you are nailing it! You've created a struct
named file
that contains all the information of a file like its path, name etc.
Now you want to implement permissions. If you have used UNIX, you're probably familiar with the permission system. For the uninitiated, let's recap.
Each file has 9 (actually not strictly, but let's focus on 9) permissions and they're divided into three categories - owner, group and world. Each category has three values - r
, w
and x
.
r
stands for read
, w
stands for write
and x
stands for execute
.
Each file has an owner user and owner group. The first set of permissions dictate what the owner is permitted to do. The next three dictate what the user of the owner group can do and the last is for everyone else.
Let's take an example. I have a file named bot.sh
. Let me view its permissions. I run the command -
ls -l bot.sh
And here's the output -
-rwxr-xr-x 1 aniket wheel 381 May 26 12:38 bot.sh*
This tells me the user aniket
is the owner and wheel
is the owner group.
The permission string is the part -rwxr-xr-x
. Let's break it down.
The first -
means the file is a file, and not a directory. For a directory this would be d
.
The next set rwx
means the user aniket
can read, write and execute this file.
The next set r-x
means other users in the group wheel
can read and execute but not write.
Finally the last r-x
means anyone else can also read and execute but not write.
Now you want to implement this in your file
. Let's say you come up with this -
struct file {
/* other stuff goes here */
int owner_can_read;
int owner_can_write;
int owner_can_execute;
int group_can_read;
int group_can_write;
int group_can_execute;
int world_can_read;
int world_can_write;
int world_can_execute;
};
You have 9 different int
for the 9 permissions. You quickly realize some major drawbacks.
Each int
is going to be just 0
or 1
, which is just one bit, but you're wasting, if we assume int
on your system is 4 bytes = 32 bits, a whooping 31
bits for each permission.
Then, let's say you want to create a new file. Let's assume its permissions are rwxrwxr--
. So you have to do something like -
struct file f;
f.owner_can_read = 1;
f.owner_can_write = 1;
f.owner_can_execute = 1;
f.group_can_read = 1;
f.group_can_write = 1;
f.group_can_execute = 1;
f.world_can_read = 1;
f.world_can_write = 0;
f.world_can_execute = 0;
Not exactly pleasant to write. You can save yourself some strokes by using aggregate initialization -
struct file f = {.owner_can_read = 1, .owner_can_write = 1, ... };
It's still not funny at all.
So, how do you overcome this problem?
You come up with this genius idea. Since each permission is just 0
or 1
, you can concatenate them one after another to create just one int
. For example, you convert rwxrwxrwx
to 111111111
, rwxrw-rw-
to 111110110
and so on. Now you don't have to deal with only one int
instead of 9.
But how do you extract the bits from this large int
so that you know which permission does the file have?
Enter bit masking.
The bitwise operators
C has 6 bitwise operators -
&
(bitwise AND)|
(bitwise OR)^
(bitwise XOR)~
(bitwise complement)>>
(bitwise right shift)<<
(bitwise left shift)
As the name suggests, these operators work on each bit. We will check the first three one by one as these are the ones needed for this post.
Bitwise AND
Bitwise AND performs an AND operation on each bit of its operands. In the result, a bit is 1
if the corresponding bits of both the operands are 1
and it's 0
if the corresponding bits of any one operand is 0
.
15 & 40 = 8
Here first each operand is converted to binary -
15 = 001111
40 = 101000
------------
001000 = 8
Then bitwise AND is performed. The first bit is 0
in one operand so the first bit is 0
in the result. The next bit is 0
in both operands so it is 0
in the result. The next bit is 1
in both the operands so it is 1
in the result.
Finally the result is converted back to decimal and we get 8
.
Bitwise OR
Bitwise OR performs an OR operation on each bit of its operands. In the result, a bit is 1
if any one of the corresponding bits in the operands are 1
and it's 0
if the corresponding bits of both the operands is 0
.
15 | 40 = 47
Here also both the operands are converted to binary -
15 = 001111
40 = 101000
------------
101111 = 47
But this time an OR is performed on each bit.
Bitwise XOR
Bitwise XOR similary performs XOR on each bit. In case you have forgotten the truth table for XOR -
0 ^ 0 = 0
1 ^ 0 = 1
0 ^ 1 = 1
1 ^ 1 = 0
And so you can figure this one out -
15 ^ 40 = 39
15 = 001111
40 = 101000
------------
100111 = 39
Bitwise Complement
This one is easy. It changes 1
to 0
and 0
to 1
in the binary representation.
#include <stdio.h>
int main()
{
int i = 15;
unsigned int j = 15;
printf("~i = %d, ~j = %u", ~i, ~j);
return 0;
}
prints
~i = -16, ~j = 4294967280
You probably know by now that it changes 15
to binary and changes 0
to 1
and 1
to 0
. If you can explain why complement of i
is -16
and why complement of j
is 4294967280
, congrats! You have mastered integer representation!
Bitwise Right and Left Shift
These operators takes the binary representation of a number and shift all the bits rightwards or leftwards and fills the empty bits at the left or right with zeroes.
For a 4 bit number it'd look something like this -
1100 >> 1 = 0110
1100 >> 2 = 0011
1100 >> 3 = 0001
0010 << 1 = 0100
0010 << 3 = 1000
Bit masking
So how does this help with our issue? Remember that our issue is to perform four operations with the large int
we created -
- Set a bit
- Clear a bit
- Extract a bit
- Toggle a bit
Now watch what happens if we perform |
on 1001
and 0100
-
1001
0100
----
1101
See, only the 2nd bit got changed to 1
keeping others intact! This is exactly what we wanted for setting a bit.
So, if we wanted to set the n-th bit of the number, we would take a number whose n-th bit is 1
and rest 0
and |
it with out number. For example, lets say a file permission is rwxrw-rw-
which is 111110110
and we want to make it executable for group, which means we want to set the 6th bit.
So we do 111110110 | 000001000
and voila! We have 111111110
.
Now let's try AND'ing 1100
with 0100
1100
0100
----
0100
And AND'ing 1010
with 0100
1010
0100
----
0000
See, in the second number, the second bit is not set so the result is 0
, but in the first number, the 2nd bit is set so the result is non zero. This is how we can extract a bit.
So, if we want to extract the n-th bit of a number, we would take a number whose n-th bit is 1
and rest 0
and &
it with out number. If the result is 0
, we know the bit is not set, and if it's non zero the bit is set.
For example, let's say we have a permission 111101011
and we want to know if the group can read (that's the 4th bit), so we take 000100000
and do 111101011 & 000100000 = 100000
which is non zero, so we know the permission is set.
Now let's say we want to clear the 6th bit from 111111110
. So, we would do 111111110 & 111110111
111111110
111110111
---------
111110110
See, how AND'ing with 111110111
(which is just complement of 000001000
) clears the 6th bit and keeps others unchanged?
And finally, bit
So, so far this is what we have -
0000 | 0100 = 0100 /* set 2nd bit */
1100 & 0100 = 0100 /* extract 2nd bit */
1111 & ~0100 = 1011 /* Clear 2nd bit */
1011 ^ 0100 = 1111 /* Toggle 2nd bit */
1100 ^ 0100 = 1000 /* Toggle 2nd bit */
So, you see, just by creating one "flag" for one bit, we can set, clear, extract and toggle that bit. So, for example, if we want to handle the 1st bit, that is the 9th bit from the left, we can create a flag with this -
int OWNER_CAN_READ = 1 << 9;
And then we can do manipulations with our permission int -
permissions |= OWNER_CAN_READ; /* Set */
permissions & OWNER_CAN_READ; /* Extract */
permissions & ~OWNER_CAN_READ; /* Clear */
permissions ^= OWNER_CAN_READ; /* Toggle */
Isn't that amazing? So, we can clean up our struct
int OWNER_CAN_READ = 1 << 9;
int OWNER_CAN_WRITE = 1 << 8;
int OWNER_CAN_EXECUTE = 1 << 7;
int GROUP_CAN_READ = 1 << 6;
int GROUP_CAN_WRITE = 1 << 5;
int GROUP_CAN_EXECUTE = 1 << 4;
int WORLD_CAN_READ = 1 << 3;
int WORLD_CAN_READ = 1 << 2;
int WORLD_CAN_READ = 1;
int NO_PERMISSION = 0;
struct file {
/* Other stuff */
int permissions;
}
Now handling stuff is easy too. For example, if you have to create a file with the permission rwx---rwx
, you can do -
struct file f;
f.permissions = NO_PERMISSION
| OWNER_CAN_READ
| OWNER_CAN_WRITE
| OWNER_CAN_EXECUTE
| WORLD_CAN_READ
| WORLD_CAN_WRITE
| WORLD_CAN_EXECUTE;
And then you can do stuff like -
f.permissions ^= OWNER_CAN_READ; /* Toggle owner's read */
f.permissions &= ~OWNER_CAN_READ; /* Revoke owner's read */
if(f.permissions & OWNER_CAN_READ) {
/* Check if owner can read */
}
f.permissions &= NO_PERMISSION; /* Revoke all permissions */
The code is now clear and concise, and DRY.