Monday, July 03, 2006

"Long" Live The King

I was recently involved in a framework that had to behave consistently on both 32-bit and 64-bit systems, which is when I came across the weird conventions some compilers follow for data-type sizes, specifically those of int, long and a pointer (abbreviated as ILP). Both gcc (Linux) and the VC++ compiler (Windows) use ILP32 when compiling for 32-bit systems: an int, a long and a pointer are all 32 bits wide. To support a 64-bit data type, the type long long was added in later compilers. On a 64-bit machine, though, things start to get a little hazy. There exist 3 (yes, three!!!) standard conventions:
ILP64 (int, long and pointer are all 64 bits)
LP64 (an int is still 32 bits, but a long and a pointer are 64 bits)
LLP64 (both an int and a long are 32 bits, and only pointers are 64 bits)
Faced with these choices, one would think ILP64 to be the fairly obvious pick. I mean, what's the use of a 64-bit number-crunching machine if one can't do native 64-bit arithmetic on it? But Linux chose LP64, while Windows embraced LLP64.
One of the reasons both systems had against ILP64 was that int was (and still is) C's most popular data type, and extending it to 64 bits would waste even more space (4 extra bytes per int) than it does now. Face it, the average programmer tends to use an int to return even a single bit such as 1 or 0. So a programmer should be pushed (by the compiler) to use 64 bits only when it's absolutely necessary.
But the other argument against ILP64 is an interesting one. Consider the types int and long on a 32-bit system. Both are 32 bits wide and are indistinguishable from each other. But why? Because in the age of 16-bit systems an int was 16 bits wide, while a long was 32 bits. As we moved on to 32-bit systems, an int became 32 bits, but a long remained the same. This is where we lost it. Shouldn't long have become 64 bits? I don't know why this was done (insufficient hardware support could've been the reason), but it was this mistake that made int and long equal in size, which made programmers careless enough to start intermixing them. So the best fix is to keep int at 32 bits while extending long to 64, and this was the approach Linux took. Windows, however, faced a bigger issue: that of "not breaking" existing code which assumed an int and a long to be the same size. So they went the way of LLP64, with int and long remaining at 32 bits and only the pointer extended to 64 bits.
But there's a subtle loophole here. The size of a long long variable on a 64-bit system is not, as you might expect, 128 bits; it remains at 64. So what happens when we move from 64-bit to 128-bit systems? The same problems that we struggled to avoid will resurface. The legacy lives on.


Rajesh said...

Have you read Joel's article on the fight between the legacy camp and the clean-slate camp in M$? It seems the legacy camp is winning..

Good for us, or bad..

ash said...

nope, I haven't read it. will do that.

Prashanth Guha said...

Great information dude.. I may have read it late.. but no matter it was useful.. :)