Your code probably works; it's just relying on some (fragile) assumptions.
1. That reads and writes to size_t are atomic.
This is likely to be true on many platforms, usually depending on alignment. The assumption might be broken by moving this code to another platform, another context, another compiler, someone adding a #pragma pack to a header somewhere, etc. Compilers are allowed to align struct members and add padding between them as needed, but they are not required to align anything (unless explicitly told to do so). If you need alignment, you should specify it.
In C++11 you can guarantee it by using std::atomic<size_t> (C11 similarly offers atomic_size_t in <stdatomic.h>). Plain C also has sig_atomic_t, though that is only guaranteed atomic with respect to signal handlers, and on some compilers it might be signed or byte-sized, so it's probably not ideal here. Otherwise you can find various compiler-specific atomic types or other libraries for it.
The bottom line is that if you want atomic, you should specify atomic, because there isn't any protection against making incorrect assumptions about it. If you write code that compiles and runs on multiple platforms, these kinds of assumptions tend to get broken often.
For example, say alignment got broken by some such accident, and now the member i has to be written in two pieces. A read that interrupts a write isn't going to read the old value; it's going to read a completely incorrect value, half old and half new.
2. That volatile guarantees ordering.
Volatile only guarantees ordering in a very limited way. I believe the strict definition is that the operations on volatile objects themselves occur in the specified order, but other operations on non-volatile objects surrounding them might still float around. Some compilers, like MSVC, add other conditions to what volatile does.
The other thing that throws a big wrench into this is that many modern CPUs have deep pipelines, and may re-order memory operations within the pipeline for better cache behaviour, etc. In this case even if the compiler puts everything in the order you expect, the CPU might still re-order it when it gets executed.
Thus, where strict ordering is needed on modern CPUs, there are processor instructions specifically for flushing the pipeline and guaranteeing order. In general,
volatile will not do this, because volatile was only really intended for single-threaded ordering. As Sik mentioned above, you need a
Memory Barrier. (Again, there are C++11 constructs for this, such as std::atomic_thread_fence, and a plethora of compiler-specific ones.) In MSVC, this is literally just a line that says
MemoryBarrier();, and it directs the compiler not to optimize across that line, and issues the necessary CPU instructions.
In your program's case, it's possible that
data won't finish writing before
i is updated, and even making data volatile wouldn't really guarantee this when using multiple CPUs. The consequences of this problem probably aren't severe, though, and might not even be noticeable (i.e. in this case it's whether data gets written before it is read back, which it may have time to do even if the intended barrier condition on i has failed).
Anyhow, there is some good information at Wikipedia:
https://en.wikipedia.org/wiki/Volatile_(computer_programming)
And also:
https://en.wikipedia.org/wiki/Memory_barrier

Wikipedia: Volatile wrote:
In C and C++ volatile does not work in most threading scenarios, and that use is discouraged.
Wikipedia: Memory Barrier wrote:
In C and C++, the volatile keyword was intended to allow C and C++ programs to directly access memory-mapped I/O. Memory-mapped I/O generally requires that the reads and writes specified in source code happen in the exact order specified with no omissions. Omissions or reorderings of reads and writes by the compiler would break the communication between the program and the device accessed by memory-mapped I/O. A C or C++ compiler may not reorder reads from and writes to volatile memory locations, nor may it omit a read from or write to a volatile memory location. The keyword volatile does not guarantee a memory barrier to enforce cache-consistency. Therefore the use of "volatile" alone is not sufficient to use a variable for inter-thread communication on all systems and processors.