Node: Struct size
Q: When I call sizeof on a struct, I sometimes get values which are
larger than the sum of the sizes of the struct members, whereas in
Borland C++ I always get the correct result. Is it a bug in GCC?
Q: I have a program that reads struct contents from a binary file.
It works OK when compiled with BC, but reads garbage when compiled with
DJGPP. This must be a bug in DJGPP, right?
A: No, it's not a compiler bug. GCC generates 32-bit code, and in that mode there is a significant run-time performance penalty for unaligned accesses, such as accessing a 16-bit short which isn't aligned on a 16-bit word boundary, or a 32-bit int which isn't aligned on a 32-bit dword boundary. To produce faster code, GCC pads struct members so that each one can be accessed without delays; this sometimes produces a struct whose size is larger than the sum of the sizes of its members. If you need to minimize this padding (e.g., if your program uses large arrays of such structs, where padding will waste a lot of memory), lay out your structures so that the longer members come before the shorter ones. For example, let's say that you have a struct defined thus:
struct my_struct { char name[7]; unsigned long offset; double quality; };
To make such a struct use the least number of bytes, rearrange the members, like this:
struct my_struct { double quality; unsigned long offset; char name[7]; };
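How much a given reordering saves depends on the sizes and alignment rules of the target, so it is worth measuring rather than guessing. The following is a minimal sketch that reports the padding with sizeof and offsetof; the structs scattered and ordered and the function names are made up for this illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structs, for illustration only: same members, different order. */
struct scattered { char tag; double value; char flag; long count; };
struct ordered   { double value; long count; char tag; char flag; };

/* Sum of the member sizes; identical for both layouts. */
#define MEMBER_BYTES (sizeof(double) + sizeof(long) + 2 * sizeof(char))

/* Padding = total size minus the bytes the members themselves need. */
unsigned long scattered_padding(void)
{
  assert(sizeof(struct scattered) >= MEMBER_BYTES);
  return (unsigned long)(sizeof(struct scattered) - MEMBER_BYTES);
}

unsigned long ordered_padding(void)
{
  assert(sizeof(struct ordered) >= MEMBER_BYTES);
  return (unsigned long)(sizeof(struct ordered) - MEMBER_BYTES);
}

/* Print where the compiler actually placed each member. */
void print_layouts(void)
{
  printf("scattered: tag@%lu value@%lu flag@%lu count@%lu size=%lu pad=%lu\n",
         (unsigned long)offsetof(struct scattered, tag),
         (unsigned long)offsetof(struct scattered, value),
         (unsigned long)offsetof(struct scattered, flag),
         (unsigned long)offsetof(struct scattered, count),
         (unsigned long)sizeof(struct scattered), scattered_padding());
  printf("ordered:   value@%lu count@%lu tag@%lu flag@%lu size=%lu pad=%lu\n",
         (unsigned long)offsetof(struct ordered, value),
         (unsigned long)offsetof(struct ordered, count),
         (unsigned long)offsetof(struct ordered, tag),
         (unsigned long)offsetof(struct ordered, flag),
         (unsigned long)sizeof(struct ordered), ordered_padding());
}
```

Calling print_layouts() from main shows the offsets chosen by your compiler. The exact numbers differ between targets, which is why no expected output is given here.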
If the layout of the structure cannot be changed (e.g., when it must
match some external specification, like a block of data returned by a
system call), you can use the __attribute__((packed)) extension of GCC
(see the GNU C/C++ Manual) to prevent GCC from padding the structure
members; this will make accesses to some of the members significantly
slower.
Beginning with version 2.7.0, GCC has a command-line option
-fpack-struct which causes GCC to pack all members of all structs
together without any holes, just as if you used __attribute__((packed))
on every struct declaration in the source file you compile with that
switch. If you use this switch, be sure that source files which you
compile with it don't use any of the structures defined by library
functions, or you will get some members garbled (because the library
functions weren't compiled with that switch). Also, GCC 2.95.1 and
2.95.2 had bugs in their support of -fpack-struct (these are corrected
in v2.96 and later).
Alternatively, you could declare a particular structure to be packed, like so:
struct my_struct { char name[7]; unsigned long offset; double quality; } __attribute__ ((packed));
However, note that the latter will only work when you compile it as a C source; C++ doesn't allow such syntax, and you will have to fall back to declaring each struct member with the packed attribute. Therefore, it's best to use declarations such as the one above only if you are certain the file will never be compiled as a C++ source.
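The per-member form mentioned above might look like the sketch below. The name my_packed_struct and the helper check_packed_layout are invented for this example; the attribute itself is the GCC extension already discussed. The name member carries no attribute because a char array has no alignment requirement to begin with.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as my_struct, with the attribute attached to each member
   that would otherwise be aligned, instead of to the whole declaration. */
struct my_packed_struct {
  char name[7];
  unsigned long offset __attribute__((packed));
  double quality __attribute__((packed));
};

/* Abort if the members are not laid out back to back. */
void check_packed_layout(void)
{
  assert(offsetof(struct my_packed_struct, name) == 0);
  assert(offsetof(struct my_packed_struct, offset) == 7);
  assert(offsetof(struct my_packed_struct, quality)
         == 7 + sizeof(unsigned long));
}
```

With every alignment-sensitive member packed, the struct needs no holes between members, so offset starts right after the seven bytes of name.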
The padding of struct members should be considered when you read or
write struct contents from or to a disk file. In general, this should
only be done if the file is read and written by the same program,
because the exact layout of the struct members depends on some subtle
aspects of code generation and the compiler switches used, and these may
differ between programs, even if they were compiled by the same compiler
on the same system. If you do need this method, be aware of the struct
member padding and don't assume that the number of the file bytes that
the structure uses is equal to the sum of the members' sizes, even if
you instructed the compiler to pack structs: GCC can still add some
padding after the last member. So always use sizeof (struct foo)
to read and write a structure.
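As a minimal sketch of that advice, the two helpers below move a whole structure with a byte count taken from sizeof rather than from a hand-computed sum. The struct foo layout and the names write_foo and read_foo are assumptions made for this example, and the stream is assumed to have been opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type used only for this illustration. */
struct foo { char name[7]; unsigned long offset; double quality; };

/* Write one record, padding included.  Returns 0 on success, -1 on error.
   fp is assumed to be open for writing in binary mode ("wb"). */
int write_foo(FILE *fp, const struct foo *rec)
{
  assert(fp != NULL && rec != NULL);
  return fwrite(rec, sizeof (struct foo), 1, fp) == 1 ? 0 : -1;
}

/* Read one record written by write_foo in a build of the same program.
   Returns 0 on success, -1 on error or end of file. */
int read_foo(FILE *fp, struct foo *rec)
{
  assert(fp != NULL && rec != NULL);
  return fread(rec, sizeof (struct foo), 1, fp) == 1 ? 0 : -1;
}
```

Because both sides use the same sizeof expression, the file position advances by the padded size of the struct, whatever that happens to be for the compiler and switches in use.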
Another problem with porting programs that read structs from binary
files is that the size of some data types might be different under
different compilers. Specifically, an int is 16 bits wide in most
DOS-based compilers, but 32 bits wide in DJGPP.
You should never read whole structures if they were written by
other programs. Instead, read the struct members one by one, and make
sure the member declarations are consistent with their definitions in
the program that wrote the struct. For example, if a struct member was
declared int in a 16-bit program, you need to declare it short in a
DJGPP program.
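Reading member by member could look like the following sketch. The record layout, struct rec, and read_rec are hypothetical; the example also assumes that the writing program stored the members back to back and used the same byte order as the reader.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record.  In the 16-bit program that wrote the file it was
     struct rec { int id; unsigned int flags; char tag; };
   with 16-bit ints, so the 32-bit build declares those members short. */
struct rec {
  short id;               /* was int in the 16-bit program */
  unsigned short flags;   /* was unsigned int in the 16-bit program */
  char tag;
};

/* Read one record, one member at a time, so that any padding the
   compiler put into struct rec never touches the file position.
   Returns 0 on success, -1 on error or end of file. */
int read_rec(FILE *fp, struct rec *r)
{
  assert(fp != NULL && r != NULL);
  if (fread(&r->id, sizeof r->id, 1, fp) != 1)
    return -1;
  if (fread(&r->flags, sizeof r->flags, 1, fp) != 1)
    return -1;
  if (fread(&r->tag, sizeof r->tag, 1, fp) != 1)
    return -1;
  return 0;
}
```

Each call consumes exactly the bytes the members occupy in the file, even when sizeof (struct rec) is larger because of padding after tag.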
The best, most robust and portable way to read and write structs is
through a char buffer, which your code then uses to move the contents
into or out of the struct members, one by one. This way, you always
know what you are doing and your program will not break down if the
padding rules change one day, or if you port it to another OS/compiler.
The ANSI-standard offsetof macro comes in handy in many such cases. If
you need to change the byte order in struct members that occupy more
than a single byte, use special library functions such as ntohl and
htons.
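The buffer approach can be sketched as follows. The 6-byte little-endian record format, struct record, and the pack/unpack helpers are all assumptions invented for this example. It assembles the values with shifts instead of calling ntohl or htons, so that the result does not depend on the byte order of the machine running the code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical on-disk record: 6 bytes, little-endian, no padding.
     bytes 0-1  id      16-bit unsigned
     bytes 2-5  offset  32-bit unsigned */
#define RECORD_BYTES 6

struct record { unsigned short id; unsigned long offset; };

/* Assemble a 16-bit little-endian value from two bytes. */
static unsigned short get_le16(const unsigned char *p)
{
  return (unsigned short)(p[0] | (p[1] << 8));
}

/* Assemble a 32-bit little-endian value from four bytes. */
static unsigned long get_le32(const unsigned char *p)
{
  return (unsigned long)p[0] | ((unsigned long)p[1] << 8)
       | ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Move the contents of a raw buffer into the struct, member by member. */
void unpack_record(const unsigned char *buf, struct record *r)
{
  assert(buf != NULL && r != NULL);
  r->id = get_le16(buf);
  r->offset = get_le32(buf + 2);
}

/* Move the struct members into a raw buffer of RECORD_BYTES bytes. */
void pack_record(const struct record *r, unsigned char *buf)
{
  assert(buf != NULL && r != NULL);
  buf[0] = (unsigned char)(r->id & 0xFF);
  buf[1] = (unsigned char)((r->id >> 8) & 0xFF);
  buf[2] = (unsigned char)(r->offset & 0xFF);
  buf[3] = (unsigned char)((r->offset >> 8) & 0xFF);
  buf[4] = (unsigned char)((r->offset >> 16) & 0xFF);
  buf[5] = (unsigned char)((r->offset >> 24) & 0xFF);
}
```

A program would fread RECORD_BYTES bytes into an unsigned char array and pass it to unpack_record, or fill the array with pack_record before calling fwrite. Neither the padding nor the member sizes of struct record influence what ends up in the file.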