[ub] Proposal: make self-initialized references ill-formed (C++17?)

Discussion:


John Zwinck

2014-09-13 13:52:07 UTC

I recently happened upon some code which, boiled down to its essence, was
like this:

for (int ii = 0; ii < 1; ++ii)
{
    const std::string& str = str; // !!
    std::cout << str << std::endl;
}

Much to my surprise, this code compiled (and produced a segfault at runtime).
I say surprise because I had all warnings enabled (as errors) in GCC 4.7
and 4.9, yet there was no complaint. I got a good answer from Jonathan
Wakely (http://stackoverflow.com/a/25720743/4323) explaining why GCC failed
to catch it, but this got me thinking: why does C++ allow this at all?

So, a proposal: perhaps in C++17 we could declare that self-initialized
references are ill-formed. I did consider whether this might impact
existing code; the only use case that came to mind might be SFINAE, though
I surely have never seen it used that way.

I would appreciate any thoughts on this, and hope I have come to the right
place to discuss it.

David Krauss

2014-09-13 14:24:22 UTC

Post by John Zwinck
I would appreciate any thoughts on this, and hope I have come to the right place to discuss it.

You have certainly come to the right place.

The problem can't be solved in general, because member references generate a case where mutual recursion is possible:

struct s {
    int & a = b;
    int & b = a;
};

Globals allow similar evil:

// a.cpp
extern int & b;
int & a = b;

// b.cpp
extern int & a;
int & b = a;

Any solid rule forbidding self-initialized references would need to have exemptions for such cases, which would be a serious devaluation. It comes down to quality of implementation (QOI).

However, I think there is a problem that such programs are well-formed but only produce UB at runtime. The compiler should be allowed to complain that the reference is initialized without a referent object.

So, a good specification would be that a program is ill-formed but no diagnosis is required, if a reference initializer never refers to an object.

David Krauss

2014-09-13 14:29:08 UTC

Post by David Krauss
So, a good specification would be that a program is ill-formed but no diagnosis is required, if a reference initializer never refers to an object.

Er, never refers to a well-defined storage location suitable for an object of the given type. References can certainly refer to things that only exist in the future.
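
As an illustration of that last point (not taken from the thread), here is a minimal sketch of a reference bound to storage whose object is only initialized afterwards. The names Widget, ref, value, and read_through_ref are hypothetical, and the sketch assumes the reference is not read until construction has finished.

```cpp
// Hypothetical type: 'ref' is declared, and therefore initialized, before 'value'.
struct Widget {
    int& ref;   // bound first, while 'value' has not been initialized yet
    int value;  // initialized second

    Widget() : ref(value), value(42) {}
};

// Reads through the reference only after the constructor has completed.
int read_through_ref() {
    Widget w;
    return w.ref;
}
```

The binding itself never reads value; it only names the storage location, which is why a rule phrased in terms of "a well-defined storage location" would not reject it.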

Richard Smith

2014-09-18 19:42:34 UTC

Post by John Zwinck
I recently happened upon some code which, boiled down to its essence, was
for (int ii = 0; ii < 1; ++ii)
{
const std::string& str = str; // !!
std::cout << str << std::endl;
}
Much to my surprise, this code compiled (and produced a segfault at
runtime). I say surprise because I had all warnings enabled (as errors) in
GCC 4.7 and 4.9, yet there was no complaint. I got a good answer from
Jonathan Wakely (http://stackoverflow.com/a/25720743/4323) explaining why
GCC failed to catch it, but this got me thinking: why does C++ allow this
at all?
So, a proposal: perhaps in C++17 we could declare that self-initialized
references are ill-formed. I did consider whether this might impact
existing code; the only use case that came to mind might be SFINAE, though
I surely have never seen it used that way.

FYI, this is core issue 504:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#504


Jens Maurer

2014-09-21 12:27:43 UTC

Post by John Zwinck
for (int ii = 0; ii < 1; ++ii)
{
    const std::string& str = str; // !!
    std::cout << str << std::endl;
}
Much to my surprise, this code compiled (and produced a segfault at runtime). I say surprise because I had all warnings enabled (as errors) in GCC 4.7 and 4.9, yet there was no complaint. I got a good answer from Jonathan Wakely (http://stackoverflow.com/a/25720743/4323) explaining why GCC failed to catch it, but this got me thinking: why does C++ allow this at all?
So, a proposal: perhaps in C++17 we could declare that self-initialized references are ill-formed. I did consider whether this might impact existing code; the only use case that came to mind might be SFINAE, though I surely have never seen it used that way.

Post by Richard Smith
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#504

... which doesn't necessarily mean it will be automatically addressed by CWG
in the near future.

Feel free to write a short paper suggesting specific wording changes to address
this issue. (Or, if just 1-2 sentences, send your wording changes to Mike Miller
if you won't attend WG21 meetings in person.)

Jens

John Zwinck

2014-09-22 12:15:58 UTC

Post by Jens Maurer

Post by Richard Smith
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#504

... which doesn't necessarily mean it will be automatically addressed by CWG
in the near future.
Feel free to write a short paper suggesting specific wording changes to address
this issue. (Or, if just 1-2 sentence, send your wording changes to Mike Miller
if you won't attend WG21 meetings in person.)

Understood. I will write some proposed wording here in the hope that
someone might give feedback. In [dcl.init.ref] I would add a clause:

---
The initializer shall not mention the reference being initialized.
Any use of a reference which has not yet been initialized is
ill-formed.
---

I currently have no plans to attend WG21 meetings, but appreciate
your time here.

John Zwinck

Jens Maurer

2014-09-22 16:02:08 UTC

Post by John Zwinck
The initializer shall not mention the reference being initialized.

So, something like the following would be ill-formed?

const int& i = sizeof(i);
const int& j = f<decltype(j)>();
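
For context, both declarations name the reference only inside an unevaluated operand, so they appear to be well-defined today. A minimal sketch, assuming a hypothetical function template f (the thread does not define one) that merely reports whether its template argument is a reference type; the wrapper functions size_via_self and trait_via_self are likewise hypothetical:

```cpp
#include <type_traits>

// Hypothetical stand-in for f: returns 1 if T is a reference type, else 0.
template <class T>
int f() {
    return std::is_reference<T>::value ? 1 : 0;
}

// sizeof(i) does not evaluate i; the reference binds to a temporary int
// holding sizeof(int), and that temporary's lifetime is extended.
int size_via_self() {
    const int& i = sizeof(i);
    return i;
}

// decltype(j) is const int&, again without evaluating j.
int trait_via_self() {
    const int& j = f<decltype(j)>();
    return j;
}
```

A rule that bans any mention of the reference in its initializer would reject both functions, even though neither ever reads an unbound reference.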

Post by John Zwinck
Any use of a reference which has not yet been initialized is
ill-formed.

That seems unimplementable, because order-of-initialization
for global variables is unspecified between translation units.

Jens

David Krauss

2014-09-23 03:03:55 UTC

Post by Jens Maurer
So, something like the following would be ill-formed?

…

Post by Jens Maurer
That seems unimplementable, because order-of-initialization
for global variables is unspecified between translation units.

I think what we’re fishing for is ODR-use and ill-formed/NDR.

Richard Smith

2014-09-24 01:39:41 UTC

Post by Jens Maurer
So, something like the following would be ill-formed?

…

Post by Jens Maurer
That seems unimplementable, because order-of-initialization
for global variables is unspecified between translation units.

Post by David Krauss
I think what we're fishing for is ODR-use and ill-formed/NDR.

I think we simply want to say that if an id-expression naming a reference
appears in its own initializer, the program is ill-formed unless the
id-expression is an unevaluated operand or subexpression thereof.

I don't think ill-formed, NDR is a good approach here: this is easy to
diagnose in the "obvious" cases, and no different from other similar
(non-reference) cases that lead to UB in the "non-obvious" cases, so I
think we should make the obvious case ill-formed and leave the other cases
as UB.

Also, ill-formed, NDR implies that *all* executions of the program have
undefined behavior (if the compiler accepts it, which it's permitted to),
even if they don't actually execute the UB. For instance,

void f() { int &r = r; }
int main() {}

is a well-formed program with defined behavior today, but does not have
defined behavior and does not require a diagnostic if we made this
ill-formed, NDR.

I don't think that odr-use is a good approach here, since odr-use means
something else (and in particular, you can name a reference in an evaluated
context without odr-using it, if it's initialized by a constant
expression). That is, I want this to be ill-formed:

const int &r = true ? 0 : r;

... even though the mention of 'r' here happens to not be an odr-use.

David Krauss

2014-09-24 01:59:50 UTC

Post by Richard Smith
I think we simply want to say that if an id-expression naming a reference appears in its own initializer, the program is ill-formed unless the id-expression is an unevaluated operand or subexpression thereof.
I don't think ill-formed, NDR is a good approach here: this is easy to diagnose in the "obvious" cases, and no different from other similar (non-reference) cases that lead to UB in the "non-obvious" cases, so I think we should make the obvious case ill-formed and leave the other cases as UB.

That’s only QOI. No need for standardization.

Post by Richard Smith
Also, ill-formed, NDR implies that *all* executions of the program have undefined behavior (if the compiler accepts it, which it's permitted to), even if they don't actually execute the UB. For instance,

This was my intent. If a compiler with stronger static analysis finds any circular reference initialization, it should be allowed to balk because the program is nonsense before it ever runs. A reference is supposed to have a referent. Otherwise, it may need to implement an effort to pull a result out of thin air before issuing a mere warning.

Post by Richard Smith
const int &r = true ? 0 : r;
... even though the mention of 'r' here happens to not be an odr-use.

Use in a potentially evaluated context sounds better than ODR-use, but the compiler doesn’t know that whole initializer is a constant expression at the time it’s processing the self-reference.

Richard Smith

2014-09-24 20:52:32 UTC

Post by Richard Smith
I think we simply want to say that if an id-expression naming a
reference appears in its own initializer, the program is ill-formed unless
the id-expression is an unevaluated operand or subexpression thereof.
I don't think ill-formed, NDR is a good approach here: this is easy to
diagnose in the "obvious" cases, and no different from other similar
(non-reference) cases that lead to UB in the "non-obvious" cases, so I
think we should make the obvious case ill-formed and leave the other cases
as UB.

Post by David Krauss
That's only QOI. No need for standardization.

Post by Richard Smith
Also, ill-formed, NDR implies that *all* executions of the program have
undefined behavior (if the compiler accepts it, which it's permitted to),
even if they don't actually execute the UB. For instance,

Post by David Krauss
This was my intent.

Then I'm strongly opposed. It does not seem acceptable to silently change
existing valid and well-defined code into having undefined behavior. I'm
sure I'm not the only one who'll feel this way.

Post by David Krauss
If a compiler with stronger static analysis finds any circular reference
initialization, it should be allowed to balk because the program is
nonsense before it ever runs.

It should be allowed to warn, and that is the status quo; if people want
errors, compilers commonly have a feature to turn their warnings into
errors. We don't need a language change to allow that. But allowing one
compiler to reject (in its conforming mode) where another compiler accepts,
for a program that runs without undefined behavior, is not reasonable.
That's a disaster for portability and predictability.

Post by David Krauss
A reference is supposed to have a referent. Otherwise, it may need to
implement an effort to pull a result out of thin air before issuing a mere
warning.

If a compiler happens to stumble on undefined behavior when emitting code,
it doesn't need to put in an effort to do anything in particular. Any code
it emits is fine.

Post by Richard Smith
I don't think that odr-use is a good approach here, since odr-use means
something else (and in particular, you can name a reference in an evaluated
context without odr-using it, if it's initialized by a constant
expression). That is, I want this to be ill-formed:
const int &r = true ? 0 : r;
... even though the mention of 'r' here happens to not be an odr-use.

Post by David Krauss
Use in a potentially evaluated context sounds better than ODR-use, but the
compiler doesn't know that whole initializer is a constant expression at
the time it's processing the self-reference.

Exactly; that's one reason why I think we shouldn't rely on odr-use here.

David Krauss

2014-09-26 03:45:10 UTC

Post by Richard Smith
Then I'm strongly opposed. It does not seem acceptable to silently change existing valid and well-defined code into having undefined behavior. I'm sure I'm not the only one who'll feel this way.

True, nobody gains anything from the possibility of non-diagnosis and a crash-on-run executable. But that's not in practice going to happen except as a result, as you put it, of the compiler doing nothing special in particular. If the implementation notices the out-of-thin-air result at all, that suggests the existence of some exception handling which provides an opportunity for diagnosis.

Perhaps there should be some specification like static UB diagnosis, where a particular expression is ill-formed/NDR but behavior is well-defined if it is neither diagnosed nor evaluated. The current wording of [dcl.init.ref] 8.5.3/1 actually comes pretty close:

A variable declared to be a T& or T&&, that is, "reference to type T" (8.3.2), shall be initialized by an object, or function, of type T or by an object that can be converted into a T.

This is a "shall be" requirement applied to a runtime occurrence. Although, it perhaps intends only to constrain the type of an initializer expression.

Does portability suffer from NDR? Sure. But the main suggestion here is to weed out nonsense with a new diagnosis. If some customer can't live with a hard error, issuing a warning and producing an executable also satisfies NDR.

Non-diagnosis and UB right at static initialization is the status quo for Clang and GCC, given this declaration sequence which portably specifies a defective product:

extern int & a;
int & b = a;
int & a = b;

The only other way I know to get circular reference initialization is in a constructor, between two member references, which Clang does diagnose by default as use of an uninitialized variable. GCC's -Wuninitialized is also an area of active development.

On the other hand, there are ways to use a reference in its own initializer which are not unreasonable:

std::function< void( int ) > && f
= [&] ( int cnt ) { if ( cnt ) f( -- cnt ); };

Stronger analysis might be better when it comes to checking initialization.
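
A runnable variant of the recursive-callable idea, sketched under stated assumptions: it declares f as an object rather than an rvalue reference, so the lambda captures an object whose storage already exists. The sketch does not rely on how a by-reference capture of a not-yet-bound reference behaves. The helper name count_calls and the calls counter are hypothetical.

```cpp
#include <functional>

// Hypothetical helper: returns how many times the recursive callable runs
// when started with the given count.
int count_calls(int start) {
    int calls = 0;
    // f is named inside its own initializer, but only captured by reference;
    // it is not invoked until after its initialization has completed.
    std::function<void(int)> f = [&f, &calls](int cnt) {
        ++calls;
        if (cnt) f(cnt - 1);
    };
    f(start);
    return calls;
}
```

Any wording that forbids every mention of a variable inside its own initializer would also have to account for patterns of this shape.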

Richard Smith

2014-09-26 19:02:02 UTC

Post by Richard Smith
Then I'm strongly opposed. It does not seem acceptable to silently change
existing valid and well-defined code into having undefined behavior. I'm
sure I'm not the only one who'll feel this way.

Post by David Krauss
True, nobody gains anything from the possibility of non-diagnosis and a
crash-on-run executable. But that's not in practice going to happen except
as a result, as you put it, of the compiler doing nothing special in
particular. If the implementation notices the out-of-thin-air result at
all, that suggests the existence of some exception handling which provides
an opportunity for diagnosis.
Perhaps there should be some specification like static UB diagnosis, where
a particular expression is ill-formed/NDR but behavior is well-defined if
it is neither diagnosed nor evaluated.

Yes, that would nicely address one half of my concern. Essentially we'd
introduce a new class of program that is well-formed, but which an
implementation is not required to translate (and can instead reject with a
diagnostic).

The other half of my concern is portability: it is hugely painful to some
audiences if a program is accepted by one compiler but rejected (or
interpreted differently) by another, and indeed, the very purpose of having
a standard is to minimize the occurrence of this problem. Making the
diagnosis optional exacerbates this.

With the tweak discussed above, I don't see that we gain much over the
status quo: either way, implementations can choose to detect this case, and
either way, they can choose to diagnose or not. The *only* difference is
that their "conforming" mode would be permitted to refuse to translate the
program (which I claim is actually harmful for some audiences).

Post by David Krauss
A variable declared to be a T& or T&&, that is, "reference to type T"
(8.3.2), shall be initialized by an object, or function, of type T or by an
object that can be converted into a T.
This is a "shall be" requirement applied to a runtime occurrence.
Although, it perhaps intends only to constrain the type of an initializer
expression.

Perhaps; this is definitely imprecisely worded. Either there's a mixture of
a compile-time and a runtime constraint here, or this really means "lvalue
of object or function type T". I suspect the latter.

Post by David Krauss
Does portability suffer from NDR? Sure. But the main suggestion here is to
weed out nonsense with a new diagnosis. If some customer can't live with a
hard error, issuing a warning and producing an executable also satisfies
NDR.
Non-diagnosis and UB right at static initialization is the status quo for
Clang and GCC, given this declaration sequence which portably specifies a
defective product:
extern int & a;
int & b = a;
int & a = b;
The only other way I know to get circular reference initialization is in a
constructor, between two member references, which Clang does diagnose by
default as use of an uninitialized variable. GCC's -Wuninitialized is also
an area of active development.
On the other hand, there are ways to use a reference in its own
initializer which are not unreasonable:
std::function< void( int ) > && f
= [&] ( int cnt ) { if ( cnt ) f( -- cnt ); };
Stronger analysis might be better when it comes to checking initialization.

Yes, whatever we do specify, this example should not be made ill-formed.

Gabriel Dos Reis

2014-09-26 19:53:11 UTC

In addition to "conditionally supported"?

-- Gaby


David Krauss

2014-09-27 00:02:18 UTC

Post by Gabriel Dos Reis
In addition to "conditionally supported"?

Interesting. Conditional support applies to features, but I essentially suggested the same semantics for *mis*features. Casual readers would probably be better off with a separate term for it, if we do eventually go the route of hard static analysis.

In general, yeah, hard static analysis would turn into a portability mess; the status quo is better. Perhaps the real problem is giving users the ability to increase tolerance on (third-party) library headers while still applying strict rules to their own development.

Jim Gimpel

2014-09-22 16:12:25 UTC

I tried this with my favorite static analyzer (FlexeLint) and it issued a
Warning.

Jim Gimpel

