22.11 What should sizeof (struct xyzzy) return?

Q: When I call sizeof on a struct, I sometimes get values which are larger than the sum of the sizes of the struct members, whereas in Borland C++ I always get the correct result. Is it a bug in GCC?

Q: I have a program that reads struct contents from a binary file. It works OK when compiled with BC, but reads garbage when compiled with DJGPP. This must be a bug in DJGPP, right?

A: No, it's not a compiler bug. GCC generates 32-bit code, and in that mode there is a significant run-time penalty for unaligned accesses, such as accessing a 16-bit short which isn't aligned on a 16-bit word boundary, or a 32-bit int which isn't aligned on a 32-bit dword boundary. To produce faster code, GCC pads struct members so that each one can be accessed without delays; this sometimes makes the size of a struct larger than the sum of the sizes of its members. If you need to minimize this padding (e.g., if your program uses large arrays of such structs, where padding would waste a lot of memory), lay out your structures so that the longer members come before the shorter ones. For example, let's say that you have a struct defined thus:

  struct my_struct {
    char name[7];
    unsigned long offset;
    double quality;
  };

To make such a struct use the least number of bytes, rearrange the members, like this⁴⁰:

  struct my_struct {
    double quality;
    unsigned long offset;
    char name[7];
  };
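
If you want to see what your compiler actually does with these two layouts, a small sketch like the one below can help. The names struct original and struct reordered, and the helper functions, are made up for this illustration. The numbers it reports depend on the compiler, the target and the switches used, so none are quoted here; on some targets the two layouts may even come out the same size.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

/* Same members as in the text, in the original order. */
struct original {
  char name[7];
  unsigned long offset;
  double quality;
};

/* Longer members first, as suggested above. */
struct reordered {
  double quality;
  unsigned long offset;
  char name[7];
};

/* Sum of the member sizes, i.e. the size with no padding at all. */
size_t members_total (void)
{
  return 7 + sizeof (unsigned long) + sizeof (double);
}

/* Number of padding bytes in a struct of the given size. */
size_t padding_bytes (size_t struct_size)
{
  assert (struct_size >= members_total ());
  return struct_size - members_total ();
}

/* Print size, member offsets and padding for both layouts. */
void print_layouts (void)
{
  printf ("original:  size %lu, offset at %lu, quality at %lu, padding %lu\n",
          (unsigned long) sizeof (struct original),
          (unsigned long) offsetof (struct original, offset),
          (unsigned long) offsetof (struct original, quality),
          (unsigned long) padding_bytes (sizeof (struct original)));
  printf ("reordered: size %lu, offset at %lu, name at %lu, padding %lu\n",
          (unsigned long) sizeof (struct reordered),
          (unsigned long) offsetof (struct reordered, offset),
          (unsigned long) offsetof (struct reordered, name),
          (unsigned long) padding_bytes (sizeof (struct reordered)));
}
```

Call print_layouts from your main function and compare the two lines it prints; the difference between the reported sizes, if any, is what the reordering saves per struct.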

If the layout of the structure cannot be changed (e.g., when it must match some external specification, like a block of data returned by a system call), you can use the __attribute__((packed)) extension of GCC (see the GNU C/C++ Manual) to prevent GCC from padding the structure members. Be aware that this will make accesses to some of the members significantly slower.

Beginning with version 2.7.0, GCC has a command-line option -fpack-struct which causes GCC to pack all members of all structs together without any holes, just as if you used __attribute__((packed)) on every struct declaration in the source file you compile with that switch. If you use this switch, be sure that source files which you compile with it don't use any of the structures used by library functions, or you will get some members garbled (because the library functions weren't compiled with that switch). Also, GCC 2.95.1 and 2.95.2 had bugs in their support of -fpack-struct (these were corrected in v2.96 and later).

Alternatively, you could declare a particular structure to be packed, like so:

  struct my_struct {
    char name[7];
    unsigned long offset;
    double quality;
  } __attribute__ ((packed));
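
To check that the attribute really removed the holes, you could compare the packed struct against the sum of its member sizes, roughly as in the sketch below. The names padded_struct, packed_struct and the two helper functions are invented for this example, and the code assumes GCC or a compiler that accepts the same attribute syntax.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Default layout: the compiler may insert padding. */
struct padded_struct {
  char name[7];
  unsigned long offset;
  double quality;
};

/* Same members, packed with the GCC extension. */
struct packed_struct {
  char name[7];
  unsigned long offset;
  double quality;
} __attribute__ ((packed));

/* Size the struct would have with no padding at all. */
size_t unpadded_size (void)
{
  return 7 + sizeof (unsigned long) + sizeof (double);
}

/* Abort if the packed struct contains any hole. */
void check_packed_layout (void)
{
  assert (sizeof (struct packed_struct) == unpadded_size ());
  assert (offsetof (struct packed_struct, offset) == 7);
  assert (offsetof (struct packed_struct, quality)
          == 7 + sizeof (unsigned long));
}
```

A call to check_packed_layout early in the program makes a wrong layout assumption fail loudly instead of silently garbling data.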

However, note that this declaration will only work when you compile it as a C source; C++ doesn't allow such syntax, and you will have to fall back to declaring each struct member with the packed attribute. Therefore, it's best to use declarations such as the one above only if you are certain the code will never be compiled as a C++ source.
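
The per-member form might look like the sketch below. It is shown here as C; the check function is an invention for this example. The char array is left without the attribute because its alignment is already one byte, and some GCC versions warn that the attribute is ignored on such a member.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct my_struct {
  char name[7];                                   /* alignment is already 1 */
  unsigned long offset __attribute__ ((packed));  /* no padding before it */
  double quality __attribute__ ((packed));        /* no padding before it */
};

/* Abort if any padding is left between or after the members. */
void check_member_packing (void)
{
  assert (offsetof (struct my_struct, offset) == 7);
  assert (offsetof (struct my_struct, quality)
          == 7 + sizeof (unsigned long));
  assert (sizeof (struct my_struct)
          == 7 + sizeof (unsigned long) + sizeof (double));
}
```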

The padding of struct members should be considered when you read or write struct contents from or to a disk file. In general, this should only be done if the file is read and written by the same program, because the exact layout of the struct members depends on some subtle aspects of code generation and the compiler switches used, and these may differ between programs, even if they were compiled by the same compiler on the same system. If you do need this method, be aware of the struct member padding and don't assume that the number of file bytes that the structure uses is equal to the sum of the members' sizes, even if you instructed the compiler to pack structs: GCC can still add some padding after the last member. So always use sizeof (struct foo) as the number of bytes to read or write for a structure.
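
When the same program both writes and reads the file, whole-struct I/O could be wrapped roughly as follows. This is only a sketch: the members of struct foo and the names write_foo and read_foo are hypothetical. The point is that the byte count always comes from sizeof, never from adding up member sizes by hand.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical struct; the compiler will most likely pad it. */
struct foo {
  char tag;
  long value;
};

/* Write one struct; returns 1 on success, 0 on failure. */
int write_foo (FILE *fp, const struct foo *rec)
{
  assert (fp != NULL && rec != NULL);
  return fwrite (rec, sizeof (struct foo), 1, fp) == 1;
}

/* Read one struct; returns 1 on success, 0 on failure or EOF. */
int read_foo (FILE *fp, struct foo *rec)
{
  assert (fp != NULL && rec != NULL);
  return fread (rec, sizeof (struct foo), 1, fp) == 1;
}
```

Remember to open the file in binary mode ("wb" or "rb"), otherwise the DOS text-mode translation of line endings will corrupt the data.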

Another problem with porting programs that read structs from binary files is that the size of some data types might be different under different compilers. Specifically, an int is 16 bits wide in most DOS-based compilers, but in DJGPP it's 32 bits wide.

You should never read whole structures if they were written by other programs. Instead, read the struct members one by one, and make sure the member declarations are consistent with their definitions in the program that wrote the struct. For example, if a struct member was declared int in a 16-bit program, you need to declare it short in a DJGPP program.
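
Reading member by member could look like the sketch below. The record format is invented for the example: it assumes a 16-bit program wrote an int followed by an 8-byte name with no padding in between, and that both programs use the same byte order (true for two DOS programs on x86). The names struct record and read_record are likewise hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The 16-bit writer declared this as
     struct { int count; char name[8]; };
   so the 16-bit int becomes a short here. */
struct record {
  short count;
  char name[8];
};

/* Read the members one by one, so that any padding inside struct record
   in this program has no effect on what is read from the file.
   Returns 1 on success, 0 on failure or EOF. */
int read_record (FILE *fp, struct record *rec)
{
  assert (fp != NULL && rec != NULL);
  if (fread (&rec->count, sizeof rec->count, 1, fp) != 1)
    return 0;
  if (fread (rec->name, sizeof rec->name, 1, fp) != 1)
    return 0;
  return 1;
}
```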

The best, most robust and portable way to read and write structs is through a char buffer, which your code then uses to move the contents into or out of the struct members, one by one. This way, you always know what you are doing and your program will not break down if the padding rules change one day, or if you port it to another OS/compiler. The ANSI-standard offsetof macro comes in handy in many such cases. If you need to change the byte order in struct members that occupy more than a single byte, use special library functions such as ntohl and htons.
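
Here is one way the char buffer approach might be sketched. The 13-byte record layout, the struct and the function names are all assumptions made for this example. Note also that it assembles multi-byte values with explicit shifts instead of calling ntohl or htons; those functions convert from and to big-endian network order, while the hypothetical file format here is little-endian, and the shifts work the same on any host.

```c
#include <assert.h>
#include <string.h>   /* memcpy */

/* Assumed file layout, little-endian, no padding:
   bytes 0-1 id, bytes 2-5 offset, bytes 6-12 name. */
#define RECORD_BYTES 13

struct record {
  unsigned short id;
  unsigned long offset;
  char name[7];
};

/* Move the contents of a raw buffer into the struct members, one by one. */
void decode_record (const unsigned char *buf, struct record *rec)
{
  assert (buf != NULL && rec != NULL);
  rec->id = (unsigned short) (buf[0] | (buf[1] << 8));
  rec->offset = (unsigned long) buf[2]
                | ((unsigned long) buf[3] << 8)
                | ((unsigned long) buf[4] << 16)
                | ((unsigned long) buf[5] << 24);
  memcpy (rec->name, buf + 6, sizeof rec->name);
}

/* Move the struct members into a raw buffer, one by one. */
void encode_record (const struct record *rec, unsigned char *buf)
{
  assert (buf != NULL && rec != NULL);
  buf[0] = (unsigned char) (rec->id & 0xff);
  buf[1] = (unsigned char) ((rec->id >> 8) & 0xff);
  buf[2] = (unsigned char) (rec->offset & 0xff);
  buf[3] = (unsigned char) ((rec->offset >> 8) & 0xff);
  buf[4] = (unsigned char) ((rec->offset >> 16) & 0xff);
  buf[5] = (unsigned char) ((rec->offset >> 24) & 0xff);
  memcpy (buf + 6, rec->name, sizeof rec->name);
}
```

The program then reads or writes exactly RECORD_BYTES bytes per record with fread or fwrite, and the in-memory layout of struct record no longer matters to the file format.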