[ub] data overlays: reinterpret_cast vs placement new

Discussion:

David Krauss

2014-03-18 05:06:48 UTC

I have a POD data overlay class template whose only member is a char[]. It performs byte swapping and interfaces to a blob of data from the network.

template< typename native >
struct net_word {
    char raw[ sizeof (native) ];

    operator native () const {
        native ret;
        COPY_OP( raw, raw + sizeof raw, reinterpret_cast< char * >( & ret ) );
        return ret;
    }

    net_word & operator = ( native const & value ) {
        COPY_OP( reinterpret_cast< char const * >( & value ), reinterpret_cast< char const * >( & value + 1 ), raw );
        return * this;
    }
};

Supposing the implementation aligns such a class the same as a char, is it safe to use it in the old-fashioned, unsafe C idiom:

uint32_t datum = * (net_word< uint32_t > *) buf_ptr;

Is it any safer to jump through a little hoop with placement new?

uint32_t datum = * new( buf_ptr ) net_word< uint32_t >;

Would any part of this mayhem be vulnerable to future semantic restrictions?

For the sake of argument, assume that the underlying memory came straight from malloc (and the NIC) and it's never been assigned a dynamic type, or referenced in any way besides char *.
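
For concreteness, here is a minimal self-contained sketch of the overlay. It assumes that COPY_OP stands for std::reverse_copy (a little-endian host reading big-endian wire data); the post does not say what COPY_OP actually is. The static_asserts spell out the layout assumption stated above, and round_trip is a hypothetical helper that only exercises the two member functions on a properly declared object, so it does not touch the aliasing question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: COPY_OP reverses byte order. The original post leaves it undefined.
#define COPY_OP std::reverse_copy

template< typename native >
struct net_word {
    char raw[ sizeof (native) ];

    // Read: copy the wire bytes, reversed, into a native value.
    operator native () const {
        native ret;
        COPY_OP( raw, raw + sizeof raw, reinterpret_cast< char * >( & ret ) );
        return ret;
    }

    // Write: copy the native value's bytes, reversed, into the wire buffer.
    net_word & operator = ( native const & value ) {
        COPY_OP( reinterpret_cast< char const * >( & value ),
                 reinterpret_cast< char const * >( & value + 1 ), raw );
        return * this;
    }
};

// Layout assumptions from the question; the standard does not guarantee them.
static_assert( sizeof( net_word< std::uint32_t > ) == sizeof( std::uint32_t ),
               "no padding assumed" );
static_assert( alignof( net_word< std::uint32_t > ) == alignof( char ),
               "char alignment assumed" );

// Hypothetical helper: store a value and read it back through a real object.
inline std::uint32_t round_trip( std::uint32_t value ) {
    net_word< std::uint32_t > w;
    w = value;   // operator = swaps into raw
    return w;    // operator native swaps back
}
```

Swapping twice restores the original value, so round_trip( x ) == x on any host byte order.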

Jens Maurer

2014-03-19 21:25:13 UTC

Post by David Krauss
I have a POD data overlay class template whose only member is a char[]. It performs byte swapping and interfaces to a blob of data from the network.
template< typename native >
struct net_word {
char raw[ sizeof (native) ];
operator native () const {
native ret;
COPY_OP( raw, raw + sizeof raw, reinterpret_cast< char * >( & ret ) );
return ret;
}
net_word & operator = ( native const & value ) {
COPY_OP( reinterpret_cast< char const * >( & value ), reinterpret_cast< char const * >( & value + 1 ), raw );
return * this;
}
};
uint32_t datum = * (net_word< uint32_t > *) buf_ptr;

What's "buf_ptr"? Anyway, you seem to be aliasing a net_word
with a uint32_t, which seems to be undefined behavior according
to 3.10p10.

Post by David Krauss
Is it any safer to jump through a little hoop with placement new?
uint32_t datum = * new( buf_ptr ) net_word< uint32_t >;

This destroys the previous contents of *buf_ptr, from a
specification point-of-view.

Post by David Krauss
Would any part of this mayhem be vulnerable to future semantic restrictions?

Which part of the future do you wish me to predict?

Jens

David Krauss

2014-03-20 11:05:12 UTC

Post by David Krauss
uint32_t datum = * (net_word< uint32_t > *) buf_ptr;

What's "buf_ptr"?

A blob of data with no dynamic type. It's supposed to be idiomatic, cast-happy C using a memory overlay struct.

Anyway, you seem to be aliasing a net_word
with a uint32_t, which seems to be undefined behavior according
to 3.10p10.

"If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [...] a char or unsigned char type."

The only value being accessed after the alias is through an lvalue of char type.

The address of the net_word is the same as the address of its first member, so if member access expressions run afoul of 3.10/10 (certainly on-topic for this question; it's not clear to me that they do), the functions could reinterpret_cast< char * >( this ) and there would be no access to the class object whatsoever.

Merely forming a pointer is not aliasing. You can cast pointers to whatever and back per [expr.reinterpret.cast] 5.2.10/7.
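
To make the char-only access concrete, here is a rough sketch that never names the class type at all. load_swapped and store_swapped are hypothetical helpers, not part of the original code, and std::reverse_copy is assumed as the byte-swapping copy. Every access to the buffer goes through an lvalue of char type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a byte-swapped value straight out of the buffer.
// The buffer is only ever accessed through char lvalues.
template< typename native >
native load_swapped( char const * p ) {
    native ret;
    std::reverse_copy( p, p + sizeof (native), reinterpret_cast< char * >( & ret ) );
    return ret;
}

// Hypothetical helper: write a value into the buffer in swapped byte order.
template< typename native >
void store_swapped( char * p, native const & value ) {
    char const * src = reinterpret_cast< char const * >( & value );
    std::reverse_copy( src, src + sizeof (native), p );
}
```

With these, the read in question would be spelled load_swapped< uint32_t >( buf_ptr ), assuming buf_ptr is (or is cast to) a char pointer.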

Post by David Krauss
Is it any safer to jump through a little hoop with placement new?
uint32_t datum = * new( buf_ptr ) net_word< uint32_t >;

This destroys the previous contents of *buf_ptr, from a
specification point-of-view.

It runs the constructor of a trivially-constructible class, which does nothing and has no particular significance. The object lifetime already began when "storage with the proper alignment and size for type T is obtained," which occurred when the data blob was allocated.

Post by David Krauss
Would any part of this mayhem be vulnerable to future semantic restrictions?

Which part of the future do you wish me to predict?

There's been quite a bit of discussion on this group about adjusting the aliasing and lifetime rules. The last I recall, new-expressions were being considered to become more significant with respect to object lifetime and dynamic type.

Johannes Schaub

2014-03-23 13:48:36 UTC

Post by David Krauss
uint32_t datum = * (net_word< uint32_t > *) buf_ptr;

What's "buf_ptr"?

A blob of data with no dynamic type. It’s supposed to be idiomatic,
cast-happy C using a memory overlay struct.

The Standard is not especially clear about the difference between an
object whose lifetime hasn't started yet and an object that doesn't exist at
all. As far as I am aware, such a difference only exists for class type
objects (during construction and destruction, inside constructors and
destructors).

In your case there was no construction of the class type
"net_word<uint32_t>", therefore its lifetime could not start. Hence I think that
3.8p6 applies, which renders your program undefined because "the glvalue is
used to access a non-static data member or call a non-static member function
of the object, or" (you are calling a conversion function).

Regarding the start of lifetime of the net_word object, I think the lifetime
begins when you complete the invocation of the constructor of "net_word<
uint32_t >". 3.8p1 says something else, but it has previously been shown
that this rule is defective (because it allows infinitely many objects whose
sizes and alignments are compatible to be at the same memory location at the
same time). Therefore I assume that this paragraph still does not reflect the
actual intent, for types like int and float but also for class types with
trivial constructors.

Post by David Krauss
Is it any safer to jump through a little hoop with placement new?
uint32_t datum = * new( buf_ptr ) net_word< uint32_t >;

This destroys the previous contents of *buf_ptr, from a
specification point-of-view.

It runs the constructor of a trivially-constructible class, which does
nothing and has no particular significance. The object lifetime already
began when “storage with the proper alignment and size for type T is
obtained,” which occurred when the data blob was allocated.

Your code renders the value of the data member array indeterminate, by
5.3.4p17 (default initialization happens), 12.6.2p8 (the member array is
default initialized), 5.3.4p1 (storage duration is dynamic) and 8.5p12 (the
member array has indeterminate value).

I'm not sure what rule to apply to infer that, when that array is memcpy'ed
into a uint32_t, the uint32_t then also contains an indeterminate
value. The paragraphs 3.9p2 and 3.9p3 both assume that you previously had
another uint32_t object that you grabbed the bytes from that make up the
array.

In case nothing else covers it yet, should 3.9p2 say that when the array has
an indeterminate value (regardless of whether the array was itself copied
from another object), the target object copied to also has an
indeterminate value?
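
To illustrate the initialization point, here is a hedged sketch using overlay4, a hypothetical stand-in for net_word< uint32_t > with the same single char-array member. construct_zeroed shows that a value-initializing new-expression definitely overwrites the buffer. construct_preserving is one possible workaround rather than anything proposed in this thread: it saves the bytes, runs the default-initializing placement new, and copies the bytes back, so that no indeterminate value is ever read.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical stand-in for the overlay type: a POD wrapping a char array.
struct overlay4 {
    char raw[ 4 ];
};

// Value-initialization zeroes raw, so the old buffer contents are gone.
inline overlay4 * construct_zeroed( char * buf ) {
    return new( buf ) overlay4();
}

// Save the bytes, create the object, then copy the bytes back in.
inline overlay4 * construct_preserving( char * buf ) {
    char saved[ sizeof (overlay4) ];
    std::memcpy( saved, buf, sizeof saved );
    overlay4 * p = new( buf ) overlay4;          // default-init: raw is indeterminate here
    std::memcpy( p->raw, saved, sizeof saved );  // raw holds definite values again
    return p;
}
```

An optimizer may well reduce construct_preserving to nothing, but that is a property of a particular implementation, not something the sketch relies on.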