What exactly does ++a + ++a do in C?
This question was (and still is) one of our professors' favourite questions -
int a = 5;
int b = ++a + ++a;
/* What will be the values of a and b? */
And its variants -
int a = 5;
printf("%d %d %d", a, a++, ++a);
And so on..
And their reasoning (which they copied from Let Us C, which is the worst book on C if you want to learn it in 2020) was -
a is first 5. Then ++a increments a and then uses the value of a. So the left-hand side of + is 6 and a is also 6. Next, ++a first increments a and then uses its value. So the right-hand side of + is 7 and a is also 7. So finally, b = 13 and a = 7.
And for the second one, they said "In a function call, arguments are processed right to left". So first ++a is executed, giving 6, and a is now 6. Then a++ is executed, giving 6, and a becomes 7. Finally a is evaluated, giving 7. So it will print 7 6 6.
Guess what? They are absolutely, horribly, terribly WRONG! Both of these answers are wrong.
But you say the answers match when you run them? That's probably because you're using Turbo C. Please stop using it, for the sake of humanity and for your own sanity.
Alright, enough ranting! Let's see what's going on in the code.
First, let me compile the programs with different compilers and see the results.
#include <stdio.h>

int main(void) {
    int a = 5;
    int b = ++a + ++a;
    printf("%d %d", a, b);
    return 0;
}
This one outputs
- With X86-64 Clang 10.0.0 -
7 13
- With X86-64 GCC 10.1 -
7 14
And for the second one
#include <stdio.h>

int main(void) {
    int a = 5;
    printf("%d %d %d", a, a++, ++a);
    return 0;
}
- With X86-64 Clang 10.0.0 -
5 5 7
- With X86-64 GCC 10.1 -
7 6 7
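Which of them is correct? Both of them, in the sense that the C standard allows any result here.
Chances are that if you use a different compiler, or even a different platform, you'll see different results.
Why this kind of weird result? Simply because modifying the same variable more than once in one expression, with no sequence point in between, is undefined behaviour in C. So is modifying a variable and separately reading it in the same unsequenced expression, which is what the printf call does. In fact, Clang produces a warning -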
unsequenced modification and access to 'a'
What exactly does undefined behavior mean in C? Simply put, undefined behavior means the standard places no restrictions on what the program does. The compiler can do anything it wants. It can generate code that gives 7 14 as the answer, or 7 13, or probably launch a missile, because you broke the law.
When you modify the same variable more than once in a single statement and also want to read its value, the naive left-to-right order we assumed might not hold. The standard doesn't specify any order; it is left to the compiler, which is free to choose whatever order it sees fit. It might happen that all the modifications are done in a batch and only then is the value read.
This will be clear once we look at the assembly code for the first program. Don't worry if you can't read assembly. It's pretty intuitive. We'll only look at the int b = ++a + ++a; line. Here's what Clang generates -
mov eax, dword ptr [rbp - 8]
add eax, 1
mov dword ptr [rbp - 8], eax
mov ecx, dword ptr [rbp - 8]
add ecx, 1
mov dword ptr [rbp - 8], ecx
add eax, ecx
mov dword ptr [rbp - 12], eax
[rbp - 8] holds a (= 5).
The first three lines load the value at [rbp - 8] into eax, add 1 to eax, and store the incremented value back to [rbp - 8]. This is the first ++a. So eax now contains 6 and [rbp - 8] also contains 6.
The next three lines do the same but with ecx in place of eax. This is the other ++a. Now ecx contains 7 and [rbp - 8] also contains 7.
Finally, eax and ecx are added to give 13, which is stored in [rbp - 12], the variable b.
This is exactly the left-to-right reasoning we walked through earlier.
Now here's what GCC has to say -
add DWORD PTR [rbp-4], 1
add DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
add eax, eax
mov DWORD PTR [rbp-8], eax
Here's the kicker. Here [rbp-4] holds a, but notice how the first two lines increment a by 1 each, right at the start? Then the value 7 is moved to eax and added to itself to give 14.
The compiler noticed that the only effect of the statement on a was that it would get incremented twice, so it did that right at the start!
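The two listings above can be mirrored in plain C, where every step is its own statement and therefore fully defined. This is only a sketch: the names clang_style, gcc_style and show_both are made up for illustration, and they model the order of operations seen in the assembly, not anything either compiler promises.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: mirrors the order of operations in the Clang listing.
   Each increment's result is captured in its own temporary before the add. */
int clang_style(int *a) {
    assert(a != NULL);
    int left = ++(*a);    /* a becomes 6, left = 6 (when a starts at 5) */
    int right = ++(*a);   /* a becomes 7, right = 7 */
    return left + right;
}

/* Hypothetical helper: mirrors the GCC listing.
   Both increments happen first, then a is read once and added to itself. */
int gcc_style(int *a) {
    assert(a != NULL);
    *a += 1;
    *a += 1;
    int value = *a;
    return value + value;
}

/* Prints "a b" for both strategies, starting from a = 5 each time. */
void show_both(void) {
    int a1 = 5, a2 = 5;
    int b1 = clang_style(&a1);
    int b2 = gcc_style(&a2);
    printf("clang-style: %d %d\n", a1, b1);
    printf("gcc-style:   %d %d\n", a2, b2);
}
```

Calling show_both() from main prints 7 13 for the first strategy and 7 14 for the second. Both functions are well defined; only the original one-liner is not.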
Similarly, in the case of the second program, there is no guarantee that the arguments will be evaluated right-to-left. They might be evaluated left-to-right, right-to-left, or maybe all the increments are performed in a batch beforehand or afterwards. Who knows!
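If the right-to-left story really is what you want from the second program, the only reliable way to get it is to spell the steps out so that each modification sits in its own statement. A minimal sketch, assuming that reading of the intent; format_right_to_left is a hypothetical helper, and it writes into a caller-supplied buffer instead of printing so the result is easy to check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: performs the steps of the "right to left" story,
   one statement at a time, then formats the three values into out.
   Returns whatever snprintf returns (the length of the formatted text). */
int format_right_to_left(int *a, char *out, size_t out_size) {
    assert(a != NULL && out != NULL);
    int third = ++(*a);   /* pre-increment: a is bumped, new value is used */
    int second = (*a)++;  /* post-increment: old value is used, then a is bumped */
    int first = *a;       /* plain read of the final value */
    return snprintf(out, out_size, "%d %d %d", first, second, third);
}
```

Starting from a = 5, the buffer ends up holding 7 6 6 on every conforming compiler, because nothing is left for the compiler to reorder.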
So, what other statements are undefined?
i = i++;
z = i++ * ++i;
a[i] = a[i++];
And so on.
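None of these has a single intended meaning, so any rewrite has to pick one. The sketch below picks one plausible reading of each statement and spells it out as separate statements. The function names and the chosen readings are my assumptions, not something the standard implies.

```c
#include <assert.h>
#include <stddef.h>

/* One reading of `i = i++;`: the author simply wanted to increment i. */
int increment_once(int i) {
    i++;
    return i;
}

/* One reading of `z = i++ * ++i;`: left operand first, then the right.
   The left side uses the old i, the right side uses i after both increments.
   The updated i is written back through the pointer. */
int product_left_to_right(int *i) {
    assert(i != NULL);
    int left = *i;    /* value i++ would yield */
    *i += 2;          /* both increments applied */
    int right = *i;   /* value ++i would yield after both increments */
    return left * right;
}

/* One reading of `a[i] = a[i++];`: right side first, so the old element is
   read, i advances, and the value lands in the new slot.
   Returns the advanced index. */
size_t copy_forward(int *a, size_t len, size_t i) {
    assert(a != NULL && i + 1 < len);
    a[i + 1] = a[i];
    return i + 1;
}
```

With i = 5, product_left_to_right returns 35 and leaves i at 7. A different reading would give a different, equally defensible rewrite, which is exactly why the one-line versions are a bad idea.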
The order of evaluation is left unspecified so that the compiler is free to optimize. Consider this expression -
int e = (a * b) + (c * d);
The compiler might generate intermediate code like -
t1 = a * b
t2 = c * d
e = t1 + t2
In this case, observe that t1 and t2 do not depend on each other. So the compiler can exploit instruction-level parallelism. In simpler words, t1 and t2 might be calculated simultaneously, or in either order, instead of strictly one after the other.
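The same freedom is visible without any undefined behaviour when the operands are function calls. In the sketch below, left() and right() (hypothetical names, as are the other helpers) each append a letter to a small log. The sum is always 3, but whether the log reads LR or RL is up to the compiler. Function calls are never interleaved with each other, so this is merely an unspecified order, not undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Small call log shared by the two operand functions. */
static char call_log[3];
static size_t call_count;

/* Appends tag to the log and passes value through unchanged. */
static int record(char tag, int value) {
    assert(call_count < sizeof call_log - 1);
    call_log[call_count] = tag;
    call_count++;
    call_log[call_count] = '\0';
    return value;
}

int left(void)  { return record('L', 1); }
int right(void) { return record('R', 2); }

/* Clears the log, evaluates left() + right() and returns the sum.
   The order in which the two calls run is unspecified. */
int sum_operands(void) {
    call_count = 0;
    call_log[0] = '\0';
    return left() + right();
}

/* Returns the log of the most recent sum_operands() call. */
const char *last_call_order(void) {
    return call_log;
}
```

Code that only needs the sum is fine either way. Code that depends on the log reading LR has a portability bug, even though it may pass on the compiler at hand.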
What about this?
int x = a++ && a++;
Now this is defined!
The operator && introduces a sequence point. A sequence point is a point in the program at which all side effects of the preceding evaluations are guaranteed to be complete before anything after it is evaluated. So the first a++ is fully computed, including its increment, before the second a++ is evaluated.
The reason this is defined for && is short-circuit evaluation. && yields 1 only when both operands are nonzero, so if the first operand is 0, the second one is not evaluated at all. That only works if evaluation is guaranteed to go left to right.
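Because && sequences its operands, the result of a++ && a++ can be worked out by hand for any starting value. A small sketch, where and_twice is a hypothetical helper that updates a through a pointer so the final value can be inspected:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: evaluates a++ && a++ on *a and returns the result.
   The sequence point after the left operand makes this well defined. */
int and_twice(int *a) {
    assert(a != NULL);
    return (*a)++ && (*a)++;
}
```

Starting from 5, the left operand yields 5 and a becomes 6, then the right operand yields 6 and a becomes 7, so the result is 1. Starting from 0, the left operand yields 0, the right operand is skipped, and a ends at 1 with a result of 0. Starting from -1, the left operand yields -1 and a becomes 0, then the right operand yields 0 and a becomes 1, so the result is 0.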
The rule of thumb: don't write a statement that modifies the same variable more than once. And don't believe anyone who reads Let Us C.