Dave,
Just a postscript: this is not an academic notion to which I allude,
but a common error in just the kind of code I am paid to fix every day.
The circumstances tend to be communications code, where the programmer
has created a generic n-byte buffer:
unsigned char commbuf[SOME_SIZE];
and a few message structures:
#pragma packed(4)
typedef struct {
int msgtype;
int source;
int othercrap;
} HEADER;
typedef struct {
HEADER hdr;
int someparm;
char someotherparm;
int yetanotherparm;
} MESSAGE_0;
typedef struct {
HEADER hdr;
char athing;
char another;
int athird;
} MESSAGE_1;
etc...
and then later in their code they accumulate a certain number of bytes
in that buffer (via RS-232, TCP, etc). Now, let's say this is a
simple case where the first byte is a message type code. So, the
programmer does something like (forget the poor form here, the point
is the alignment issue):
switch (commbuf[0])
{
case 0:
MESSAGE_0 *p = (MESSAGE_0*)commbuf;
p->someparm = 8;
...
I can tell you that this code fails in just the way you describe on
almost every compiler. The packing #pragma causes the compiler to be
complacent about accessing the ints within the MESSAGE_x structures.
It PRESUMES that any time you statically/automatically allocate an
instance of the structure, say:
MESSAGE_0 aCopyOfMessage0;
it will simply force an alignment equal to the internal alignment, so
that it can access the internal members most efficiently.
BUT, if you end-run the compiler by creating an instance of this
structure out of thin air by casting a structure pointer to a "char"
buffer, it has NO IDEA if that char buffer is aligned on an odd
address, a word address, or a dword address. Just because it's a
"char" buffer does not mean it MUST be on an odd address. It just
means it could be at ANY address. And it is the *linker*, not the
compiler, that would know this, since the compiler had no reason to
give specific direction to the linker about its placement.
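To make that concrete, here is a minimal sketch of a run-time check on where a buffer actually ended up. The helper name is_aligned is hypothetical (it is not part of the code under discussion), and the sketch assumes the usual flat address space in which converting a pointer to uintptr_t yields its numeric address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p lies on an 'align'-byte boundary.
 * 'align' must be a nonzero power of two (1, 2, 4, 8, ...). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Assumes uintptr_t holds the plain numeric address of p. */
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Calling is_aligned(&commbuf[i], 4) answers the question for one particular build and one particular i. That answer is only available at run time (or from the linker map), not to the compiler while it translates the cast.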
In situations like the above, the common fix is to force alignment of
the "char commbuf[]" buffer -- depending on the compiler, something like:
#pragma alignment=4
unsigned char commbuf[SOME_SIZE];
Then, the combination of the overall alignment and the internal
structure alignment will ensure that all MESSAGE_x types cast to the
contents of the buffer will be accessible using the
(aligned-presuming) code the compiler generated for accesses within
structures of those types.
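Where a C11 compiler is available, the same effect can be had without a vendor pragma. The following is a sketch only: it assumes C11's <stdalign.h>, the value 64 for SOME_SIZE is a placeholder, and as_message0 is a hypothetical helper name.

```c
#include <stdalign.h>   /* C11: alignas, alignof */
#include <stddef.h>
#include <stdint.h>

#define SOME_SIZE 64    /* placeholder size, not from the original code */

typedef struct {
    int msgtype;
    int source;
    int othercrap;
} HEADER;

typedef struct {
    HEADER hdr;
    int    someparm;
    char   someotherparm;
    int    yetanotherparm;
} MESSAGE_0;

/* The buffer itself is forced onto a boundary suitable for MESSAGE_0. */
static alignas(MESSAGE_0) unsigned char commbuf[SOME_SIZE];

/* Hypothetical helper: hand back a structure pointer only when the
 * address is suitably aligned; otherwise return NULL. */
static MESSAGE_0 *as_message0(unsigned char *buf)
{
    if (((uintptr_t)buf % alignof(MESSAGE_0)) != 0)
        return NULL;
    return (MESSAGE_0 *)buf;
}
```

This addresses alignment only. With the buffer aligned, as_message0(commbuf) is non-NULL, while as_message0(commbuf + 1) is rejected instead of faulting later. Whether the overlay is otherwise acceptable to an optimizer that enforces the language's aliasing rules is a separate question that the sketch does not settle.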
You are not alone, of course, in wishing that compilers spent their
time analyzing your code to remove all potential error conditions, and
many do a darn good job of it as far as it goes. But, the "cast"
operator in particular is a kind of "get out of jail free" card for
the compiler. It says: "Don't bother me boy; I know what I'm doing!"
Even when you don't mean it :-)
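For completeness, one way to avoid the cast altogether (a different technique from the alignment fix above) is to copy the bytes into a real structure instance with memcpy. This is a sketch under assumptions: read_message1 is a hypothetical helper, and it presumes sender and receiver agree on structure layout and byte order.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    int msgtype;
    int source;
    int othercrap;
} HEADER;

typedef struct {
    HEADER hdr;
    char   athing;
    char   another;
    int    athird;
} MESSAGE_1;

/* Hypothetical helper: copy one MESSAGE_1 out of raw bytes that may start
 * at ANY address. Returns 0 on success, -1 if fewer than
 * sizeof(MESSAGE_1) bytes are available. */
static int read_message1(const unsigned char *src, size_t len, MESSAGE_1 *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, src, sizeof *out);  /* byte-wise copy, no alignment demand on src */
    return 0;
}
```

Because "out" is an ordinary object that the compiler allocated, every member access on it is aligned the way the generated code expects, at the cost of one copy per message.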
Cordeg
--- In lpc2000@yahoogroups.com, "dsidlauskas1" <dsidlauskas@...> wrote:
>
> Cordeg,
>
> Thanks for taking the time to comment. You've presented a tough case,
> but I'm not going to let my compiler off so easily :-).
>
> In your case I think the compiler should throw an error, although it
> does know that buf is byte aligned and...
>
> In my case I think the compiler should compile working code.
>
> I won't repeat all of the reasons, which are spread through the thread.
>
> Thanks again.
>
> Dave
>
> --- In lpc2000@yahoogroups.com, "garycordelli" <gary@> wrote:
> >
> > Dave:
> >
> > While what you said may sound reasonable to humans, it is demonstrably
> > not so certain in software. :-)
> >
> > To be specific, what you have in your original example is *not* a
> > simple structure. Your new example "sets up" the compiler to *know*
> > precisely what is going on, so -- if it is inclined to generate some
> > nice byte-swapping/byte-copying code -- it knows that it will have to
> > do so.
> >
> > However, let's change your new example to something that would "look"
> > more like the conditions present in your original example:
> > #pragma packed(4)
> > struct eg {
> > char x;
> > int y;
> > char z;
> > int w;
> > };
> >
> > Now you have "properly aligned" ints, just as you did in your original
> > example (in the case of x[4], for example). But, then you introduced
> > the casting operation. Hmmm. So, this would be, in effect, doing
> > something like:
> > char buf[] = { 1, 2, 3, 4, 5, 6, 7, 8, ...};
> >
> > for (i = 0; i < 4; i++) {
> > struct eg *p = (struct eg*)&buf[i];
> > ...
> >
> > Now, what happens when you try to read/write the p->y and p->w members
> > of this structure?
> >
> > "It depends!"
> >
> > See, the compiler is likely to think that -- because you properly
> > packed the structure -- it does not have to generate byte-copy kludges
> > to access these members. They appear properly aligned to access using
> > LDR. HOWEVER, what happens if the whole darn structure STARTS at an
> > odd address? Well, that odd *external* alignment quite simply defeats
> > the whole benefit of the proper *internal* alignment, ruining the
> > best-laid plans of the compiler and you. And, what exactly is it that
> > defines "bad" *external* alignment of the structure? Is it when "i"
> > is not a multiple of 4? NO. NO? NO. It is when the ADDRESS OF
> > "buf[i]" is not a multiple of 4 -- and this depends not only on the
> > value of "i", but on the actual starting address of "buf"...something
> > that the compiler DOES NOT KNOW. That depends on where the linker
> > puts it (or, if it were allocated dynamically, rather than as you show
> > it, it would depend on where the memory allocation routine found space
> > for it).
> >
> > If you try out the above properly aligned structure definition with a
> > "cast" to a char buffer of unknown alignment, you will find that just
> > about every compiler will fail to produce the "right" code to work in
> > every circumstance, and that the use of "proper alignment" in the
> > structure definition will fool it into thinking there is no cause for
> > even so much as a warning to you.
> >
> > This, I think, is the downfall of your original example. You declared
> > some "auto" or static ints (x[4]), can't tell which from the code
> > excerpt. The compiler knew enough to force their alignment on the
> > stack or in memory to an appropriate boundary to make them accessible
> > with a normal int-aligned addressing mode. However, the compiler had
> > no reason to care a bit about the alignment of "char buf[]", since it
> > expects you to be using it to hold chars (silly compiler). But, then
> > you threw the compiler a curve by using an int* to access an arbitrary
> > spot in a byte-aligned buffer and assign this to one of those x[i]'s.
> > What's a poor compiler to do?
> >
> > You owe the compiler an apology.
> >
> > cordeg
> >
> >
> > --- In lpc2000@yahoogroups.com, "dsidlauskas1" <dsidlauskas@> wrote:
> > >
> > > ints are not always word aligned. For example:
> > >
> > > #pragma packed(1)
> > > struct eg
> > > {
> > > char x;
> > > int y;
> > > char z;
> > > int w;
> > > }test;
> > >
> > > One of the integers in this structure is not word aligned and yet
> > > reference to the non word aligned integer works fine, as should
> > >
> > > int x, *ip;
> > > ip = &test.w;
> > > x = *ip;
> > >
> > >
> > > --- In lpc2000@yahoogroups.com, "fordp2002" <SimonEllwood@> wrote:
> > > >
> > > > By casting buf to an int pointer you have instructed the compiler
> > > > NOT to treat it as a byte.
> > > >
> > > > It is incumbent on you, the software engineer, to ensure whatever you
> > > > cast is compatible with what you cast it to. In this case you are
> > > > saying that &buf[i] is word aligned, which will be true for only one
> > > > in four cases of i. If the compiler aligns the start of buf to a word
> > > > boundary, which will happen in some cases, &buf[0] will be the only
> > > > legal cast.
> > > >
> > > > FordP
> > > >
> > > >
> > > > --- In lpc2000@yahoogroups.com, David Hawkins <dwh@> wrote:
> > > > >
> > > > > dsidlauskas1 wrote:
> > > > > > Consider the following code:
> > > > > >
> > > > > > ============================
> > > > > > char buf[]={1,2,3,4,5,6,7,8};
> > > > > > int *ip, x[4];
> > > > > >
> > > > > > for (i=0; i<4; i++)
> > > > > > {
> > > > > > ip = (int *)&buf[i];
> > > > > > x[4] = *ip;
> > > > > > }
> > > > >
> > > > > Er, given the fact that x is of length 4,
> > > > > the statement x[4] = *ip is actually out-of-bounds.
> > > > >
> > > > > Dave
> > > > >
> > > >
> > >
> >
>
Message
Re: For C Experts
2006-03-31 by garycordelli