C/C++ Memory structs, objects and unions.
[INTR] Introduction
[TSOS] The space of structs
[TSOO] The space of objects
[TSOU] The space of unions
[MFOI] Misc. facts of interest
Introduction [INTR]
For those reading this, it is assumed that the person (meaning you) has at some point in their life used a struct in C, C++ or any other language that has it and implements it in at least a seemingly similar way. Before really starting though, I ask the reader to take a look at the following code, with the assumption that a char has a size of one byte, an int four and a double eight.
#include <stdio.h>

struct
{
    char a;
    double b;
    int c;
    char d;
} A;

int main()
{
    /* sizeof yields a size_t, which is printed with %zu rather than %d */
    printf("%zu\n", sizeof(A));
    return 0;
}
If you know what the output from this will be then you probably know everything in this article already and might as well stop reading; however, at the very least read the following sentence as well. If you thought that the output would be 14, then you were wrong.
The space of structs [TSOS]
It's natural to assume that the output of the code will be 14, as the struct holds two chars, an int and a double, which should mean: 2*1 + 4 + 8 = 14. As mentioned, this is not the case, and the reason is padding, or in more technical terms: data structure alignment. What this basically means for the computer is that, for the sake of easier access, the data gets padded with extra bytes. What it means to you is that you get a bunch of wasted space.

What happens with the padding is that each variable is placed at an offset from the start of the struct that is a multiple of its own alignment (for the basic types on common platforms, that is simply the type's size), and the struct as a whole is rounded up to a multiple of the largest alignment in it. This is probably a bit much for one sentence as it basically sums up everything, so I will explain it in a bit more detail.

First of all, when you make a struct, the order in which you put the variables counts; this is because C/C++ loves you and has a truly deep respect for your wishes and decisions and will never ever say that you are wrong about anything (compilers, not so much). In other words, the variables stay in the order you declared them and are not rearranged to save space.

Secondly, to avoid unnecessary work when accessing data inside a struct, the computer likes the variables to be at an offset from the start of the struct that is evenly divisible by their size. This means that an int cannot be sitting two or three bytes from the beginning, but only at beginning + 0, 4, 8, etc. When the variables before it do not add up to such an offset, padding bytes are inserted in front of it, and padding is also added at the end so that the total size is a multiple of the largest variable's size. In my example, the largest type is a double with size 8, which on a typical 64-bit platform leads to the memory looking like this:
a-------|bbbbbbbb|ccccd--- Size: 24 bytes (one character per byte, - is padding)
As you can see with variables c and d, they are nicely put after each other instead of being in their own eight byte blocks. So naturally, if we just put the variables that come after the double before the double instead (as seen below), we will only get a size of 8 + 8 = 16 bytes.
struct
{
    char a;
    int c;
    char d;
    double b;
} A;
Wrong. It will still be hogging 24 bytes; the memory will just look different. The reason for this is that, as mentioned before, every variable has to be at an offset from the start that is a multiple of its own type's size, not just the largest one. This means that you can group variables together to take up less space; you just have to fit them together correctly. What happens in this new struct is that the int is placed at the first available position after the variables before it, just like every other variable. Since variable a is hogging position 0, the first position available for c is position 4. Eight bytes are now taken: a, the three bytes of padding that the int needed, and the int itself. Then d hogs position 8, and since the double needs a position that is a multiple of 8 and both 0 and 8 are taken, it is placed at position 16, with seven bytes of padding applied before it. The memory will look like this:
a---cccc|d-------|bbbbbbbb Size: still an annoying 24 bytes
By now presumably everyone has figured out how to reduce the size for this particular example. Just group the chars together and the wasted space shrinks to a mere two bytes, which brings the total size down to 16 bytes. One of the alternatives for the new struct:
struct
{
    char a, d;
    int c;
    double b;
} A;
For those who are uncertain, I will say that yes, even here
char a, d;
is the same as
char a;
char d;
So do not worry about that. Our new and truly inspirational memory (only in the example I've given, mind you) thus ends up like this:
ad--cccc|bbbbbbbb Size: a lovable 16 bytes
The space of objects [TSOO]
With structs covered, there is surprisingly little to write for this; for those who wonder why, I'll explain and also cover a little other information that is relevant to structs too. The reason there is virtually nothing to write here is that it can be summed up as: see structs. Objects are in essence just structs, as far as the memory of their variables is concerned at least. In C++ the only real difference between the struct and class keywords is that members of a struct are public by default while members of a class are private by default. What people usually have in mind with objects is that they have their very own functions, including constructors and destructors, which are basically just special functions. You can also set up different relations and hierarchies amongst classes and whatnot, but let's leave that.

The thing to take note of here is mainly that functions do not exist in the object's space in memory, so they do not count towards its size. How does this affect structs? For those who do not know, or who simply had it slip their mind, structs in C++ can have functions and constructors just like classes can. A constructor, just like a destructor, being merely a special function, therefore does not take up any space in whatever object or struct variable you make. The one exception worth knowing about is virtual functions: a type with at least one virtual function typically carries a hidden pointer in every object, and that pointer does add to its size.
The space of unions [TSOU]
Let's start this part with an example of a union; in fact, just for the sake of nostalgia, let's make it similar to the first struct we used.
union
{
    char a;
    double b;
    int c;
    char d;
} A;
That looks suspiciously like any old struct; structs and unions must be almost completely identical. Most of you have by now probably realized that making such assumptions about things in this article is a bad idea. So if I asked what the size of that union is, you would probably not say 24 or 14 bytes, and you would be right, as the size is in fact 8 bytes. You might be wondering how it could possibly store these variables in a mere 8 bytes when the double alone is 8 bytes large and there are two more bytes of chars and four more of int on top of that. The short answer is: it can't.

A union is in essence a shared space; it is only as big as the biggest type in it (rounded up, if needed, to satisfy alignment). All the list of variables does is provide you with ways to access the information within the union as if it was of that variable's type. This means that you could, even though it's very unnecessary, but just for the sake of argument, have as many variables of whatever type you want and the union will never become larger than the largest type requires. Furthermore, it means that if you modify one variable in it, you will most likely change the value of every other variable in the union. The reason it is only most likely, and not definitely, is that the types differ in size: a smaller variable only covers some of the bytes of a larger one, and a write may happen to leave those particular bytes with the same contents as before. For example:
A.a='y';
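This gives both chars in the union the value y. Since A is a global and therefore started out zeroed, the int will have a value of 121 (the character code of y) on a little-endian machine such as x86. If we afterwards do this: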
A.c=3449;
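The int and anything larger than a char will have changed, but the chars will still have the value y, since the lowest byte of that int, which is the byte the chars share with it on a little-endian machine, is the same as it was before (3449 is 0x0D79, and 0x79 is 121). Strictly speaking, C allows reading a variable of a union other than the one last written, while C++ leaves it undefined, although most compilers accept it. Something with multiple different values where changing one will change all seems a bit impractical at first and might just look like an unnecessarily complicated way to cast things. However, its purpose is mainly to reduce space usage, and it comes in handy for optimizations when you might be using several different temporary variables that are never needed at the same time.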
Miscellaneous facts of interest [MFOI]
Due to how structs and objects lay out their memory, you can reach most of their variables as if they were in an array. You can either take the address of a variable inside it and step through the neighbouring variables of the same type, or take the address of the struct variable/object itself, which gives you the address of the first variable. This also makes it possible to both read and modify private variables inside an object, which you should otherwise have no access to other than through the object's functions. Keep in mind that the language does not promise that any of this works: padding may sit between the variables, and stepping from one variable to the next with pointer arithmetic is formally undefined behaviour.

There is one kind of variable, though, that you can neither access nor modify through this method in a struct or object: static ones. A static variable in a struct or class is shared by all objects of that type, so just like functions it is stored in a different place and does not affect the size of the struct variable/object itself. This is easily mixed up with const, because constants in a class are commonly declared static const. A plain, non-static const variable, however, sits inside the object like any other variable and counts towards its size.

Since const exists for the programmer's benefit, letting the compiler catch any attempt at changing the variable, it is possible to get around it by accessing the variable indirectly, for example through a non-const pointer that happens to be pointing at a const variable. Doing so is undefined behaviour though: it may appear to work, the value may seem to remain unmodified because the compiler already substituted it wherever it is used, or the program may crash because the value was placed in read-only memory.

C has const as well, but there a const variable does not count as a compile-time constant (it cannot be used as a case label, for instance), which is why C code often uses #define for single constant values instead.

There is a major difference between const and #define though, which is that const behaves like a variable would, as you can see by its syntax. #define, on the other hand, is a preprocessor directive (as you can see by the # in front of it), which means that before your code is compiled, the preprocessor does a find and replace for the name - value combination that you defined. This in turn leads to very different behaviour from a variable and means that for #define, none of the facts for const apply: what it defines has no type and no address. This doesn't mean that it will have the polar opposite behaviour of the facts I've given about const; that would be very odd, as const is often used as a replacement for #define. What I simply mean is that you should not assume that the underlying workings are the same. Enough information has been provided for those who understand it to know what the differences are, or rather why there is a difference, but an in-depth look at the underlying workings of #define and const is outside the scope of this article and should not be needed anyway.
GTADarkDude 14 years ago
Great article! Very interesting information. Didn't know this, even though I usually use lots of structs and objects in my C++ codes. Well-written too.
ghost 14 years ago
Edited the article. When I wrote it I wrote it in an environment where all characters are of same width, forgot that it will not be the case everywhere. So changed the register at the beginning as well as the depiction of memory to ensure it to be correct, clear and understandable. Removed an unnecessary space in one of the struct codes to make it look nicer, as well as added a part about #define at the end to give an explanation for C's version of const as well. Thank you all for your nice comments and votes. And a special thanks goes out to 454447415244 for his vote and his invaluable feedback in the comments about how I could improve my article, it really helped a lot :)