Sharing data between threads
Problems with sharing data between threads
race condition: the outcome depends on the relative ordering of execution of operations on two or more threads. Problematic race conditions typically occur when completing an operation requires two or more separate instructions, which opens a window for another thread to step in and break the data's invariant.
data race: a specific type of race condition that arises from concurrent, unsynchronized modification of a single object. Data races cause the dreaded undefined behavior.
Avoiding problematic race conditions
There are several approaches to dealing with problematic race conditions:
- Wrap the data structure with a protection mechanism to ensure that only the thread performing a modification can see the intermediate states where the invariants are broken
- Modify the design of the data structure and its invariants so that modifications are done as a series of indivisible changes, each of which preserves the invariants. This is generally referred to as lock-free programming (Chapter 7) and is difficult to get right. It requires a decent understanding of the memory model (Chapter 5).
- Record a series of reads and writes to shared memory in a log—called a transaction—and commit them in a single step. Before committing, we validate the memory the transaction accessed to see whether another thread has concurrently modified it. If it has, validation fails and the transaction is rolled back or aborted; otherwise, it is committed. This is termed software transactional memory, and it’s still an active research area.
Protecting shared data with mutexes
mutex
(mutual exclusion) is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads.
std::lock_guard
implements an RAII idiom for a mutex; it locks the supplied mutex on construction and unlocks it on destruction
Don’t pass pointers and references to protected data outside the scope of the lock, whether by returning them from a function, storing them in externally visible memory, or passing them as arguments to user-supplied functions. (Page 43)
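A minimal sketch of the basic pattern (names like add_to_list and shared_list are illustrative, not from the book):

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

std::mutex list_mutex;        // guards shared_list
std::vector<int> shared_list; // only accessed while holding list_mutex

void add_to_list(int value) {
    std::lock_guard<std::mutex> guard(list_mutex); // locked here, unlocked at scope exit
    shared_list.push_back(value);
}

// Demo: two threads each append 1000 elements; the mutex prevents lost updates.
std::size_t run_demo() {
    std::thread t1([] { for (int i = 0; i < 1000; ++i) add_to_list(i); });
    std::thread t2([] { for (int i = 0; i < 1000; ++i) add_to_list(i); });
    t1.join();
    t2.join();
    return shared_list.size();
}
```

Note that the mutex and the data it protects are grouped together, so every access site goes through the same lock.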
Some STL classes are not designed to be thread-safe; there’s a potential risk of race conditions when using these classes directly. For example, the following code is susceptible to a problematic race condition:
std::stack<std::vector<int>> data;
// ... add data
if (!data.empty()) {     // another thread may pop the last element right after this check
auto v = data.top();     // ... or modify the stack between top() and pop()
data.pop();
}
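The race can be closed by wrapping the stack and merging the empty()/top()/pop() sequence into a single locked operation. The sketch below uses std::optional (C++17) for the return value; the book's version uses a std::shared_ptr-returning interface instead:

```cpp
#include <mutex>
#include <optional>
#include <stack>
#include <utility>

template <typename T>
class threadsafe_stack {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(m_);
        data_.push(std::move(value));
    }
    // Combines the empty check, top(), and pop() under one lock,
    // so no other thread can interleave between them.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(m_);
        if (data_.empty()) return std::nullopt;
        T value = std::move(data_.top());
        data_.pop();
        return value;
    }
private:
    std::stack<T> data_;
    mutable std::mutex m_;
};
```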
Deadlock
Sometimes, a deadlock may not be obvious. Consider a class that defines a swap operation. To be thread-safe, it has to lock its own mutex as well as the other object’s mutex:
class X {
public:
void swap(X& other) {
if (&other == this) return; // locking the same mutex twice is undefined behavior
std::lock_guard lock1(m);
std::lock_guard lock2(other.m);
std::swap(v, other.v);
}
private:
std::vector<int> v;
std::mutex m;
};
Now if x1.swap(x2) and x2.swap(x1) are executed concurrently in two threads and each has locked its own mutex, we have a deadlock. To deal with such situations, we have to lock m and other.m at once, either using std::lock:
class X {
public:
void swap(X& other) {
if (&other == this) return;
std::lock(m, other.m); // locks both or neither, avoiding deadlock
std::lock_guard lock1(m, std::adopt_lock); // adopt the already-held locks
std::lock_guard lock2(other.m, std::adopt_lock);
std::swap(v, other.v);
}
private:
std::vector<int> v;
std::mutex m;
};
Or std::scoped_lock (C++17):
class X {
public:
void swap(X& other) {
if (&other == this) return;
std::scoped_lock lock(m, other.m); // relies on class template argument deduction
std::swap(v, other.v);
}
private:
std::vector<int> v;
std::mutex m;
};
Guidelines for avoiding deadlock
- Don’t wait for another thread if there’s a chance it’s waiting for you.
- Acquire locks in a fixed order. (Page 54)
- Use a lock hierarchy (Page 55)
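The lock-hierarchy idea can be sketched as a mutex wrapper that assigns each mutex a level and only allows a thread to lock a mutex with a lower level than the one it already holds, detecting violations at runtime. This is a simplified version of the class the book develops; names are illustrative:

```cpp
#include <climits>
#include <mutex>
#include <stdexcept>

class hierarchical_mutex {
public:
    explicit hierarchical_mutex(unsigned long level) : level_(level) {}
    void lock() {
        check_level();                        // throws if the hierarchy would be violated
        internal_.lock();
        previous_level_ = this_thread_level_; // remember the level we came from
        this_thread_level_ = level_;
    }
    void unlock() {
        this_thread_level_ = previous_level_; // restore on the way back up
        internal_.unlock();
    }
private:
    void check_level() const {
        if (this_thread_level_ <= level_)
            throw std::logic_error("mutex hierarchy violated");
    }
    std::mutex internal_;
    unsigned long const level_;
    unsigned long previous_level_ = 0;
    // Per-thread record of the lowest level currently held.
    static thread_local unsigned long this_thread_level_;
};
thread_local unsigned long hierarchical_mutex::this_thread_level_ = ULONG_MAX;
```

Because it provides lock() and unlock(), it satisfies BasicLockable and can be used with std::lock_guard.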
std::unique_lock
std::unique_lock is usually more costly than std::lock_guard because it has to track whether it currently owns its mutex. Use std::unique_lock when you need to defer locking or to transfer ownership of a lock from one scope to another. It allows you to acquire and release the lock whenever desirable without destroying the object, and it can even be passed as an argument to std::lock.
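A sketch of both uses; the function names and the guarded data are illustrative:

```cpp
#include <mutex>

std::mutex m1, m2;

// Deferred locking: construct the locks unlocked, then lock both atomically.
void locked_work() {
    std::unique_lock<std::mutex> lk1(m1, std::defer_lock);
    std::unique_lock<std::mutex> lk2(m2, std::defer_lock);
    std::lock(lk1, lk2); // locks both without risking deadlock
    // ... operate on data guarded by m1 and m2 ...
}                        // both unlocked here

// Ownership transfer: a function can prepare a lock and hand it to its caller.
std::unique_lock<std::mutex> get_lock() {
    std::unique_lock<std::mutex> lk(m1);
    // ... prepare guarded data ...
    return lk;           // moved out; the mutex stays locked
}

void process_data() {
    std::unique_lock<std::mutex> lk(get_lock()); // takes over ownership
    // ... use guarded data, still under the lock ...
}                                                // unlocked here
```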
Locking at an appropriate granularity
Lock only when necessary. In particular, don’t do any time-consuming activities such as file I/O while holding a lock.
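std::unique_lock makes this pattern easy: hold the lock only while touching the shared data, not during the expensive work in between. A sketch with illustrative names (process() stands in for slow work):

```cpp
#include <mutex>

std::mutex the_mutex;
int shared_value = 0; // guarded by the_mutex

int process(int input) { return input * 2; } // stand-in for a slow computation

int update() {
    std::unique_lock<std::mutex> lk(the_mutex);
    int local = shared_value; // copy out under the lock
    lk.unlock();              // don't hold the lock during slow processing
    int result = process(local);
    lk.lock();                // re-lock only to write the result back
    shared_value = result;
    return result;
}
```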
Alternative facilities for protecting shared data
Protecting shared data during initialization
Sometimes data only needs to be protected during initialization and is safe for concurrent access afterward. In that case, it’s burdensome and inefficient to use a mutex for protection, as every access would have to check whether initialization has completed. C++ provides std::call_once
to ensure a callable object is called only once. The necessary synchronization data is stored in a std::once_flag
instance, which incurs a lower overhead than using a mutex explicitly. (Page 66) The following code provides a concrete example
class X {
private:
connection_info connection_details;
connection_handle connection;
std::once_flag connection_init_flag;
void open_connection() { connection=connection_manager.open(connection_details); }
public:
X(connection_info const& connection_details_): connection_details(connection_details_) {}
void send_data(data_packet const& data) {
std::call_once(connection_init_flag, &X::open_connection, this); // lazy initialization
connection.send_data(data);
}
data_packet receive_data() {
std::call_once(connection_init_flag, &X::open_connection, this); // lazy initialization
return connection.receive_data();
}
};
It’s worth noting that std::mutex
and std::once_flag
instances can’t be copied or moved, so a class containing them (like the one above) can’t rely on the compiler-generated copy and move operations; you have to define the special member functions explicitly if you need them.
static
local data is guaranteed to be initialized only once (since C++11), and therefore its initialization is thread-safe.
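This is the basis of the function-local-static (Meyers singleton) pattern; a minimal sketch with an illustrative name:

```cpp
#include <string>

// Even if several threads call get_config() concurrently, the static is
// initialized exactly once; other threads block until it's ready.
const std::string& get_config() {
    static const std::string config = "loaded-once"; // thread-safe initialization
    return config;
}
```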
Protecting rarely updated data structures
When a data structure is rarely updated, std::mutex
is a pessimistic option for protecting it, as it prevents possible concurrency among readers even when the structure isn’t undergoing modification. C++ provides std::shared_mutex
and std::shared_timed_mutex
for such cases; the latter supports additional operations (section 4.3), so the former might offer a performance benefit on some platforms.
For update operations, std::lock_guard<std::shared_mutex>
or std::unique_lock<std::shared_mutex>
can be used for the locking to ensure exclusive access. Threads that don’t need to update the data structure can use std::shared_lock<std::shared_mutex>
to obtain shared access.
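A sketch in the spirit of the book's DNS-cache example (class and member names are illustrative): many readers take shared locks concurrently, while the occasional writer takes an exclusive lock.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

class dns_cache {
public:
    std::string find(const std::string& domain) const {
        std::shared_lock<std::shared_mutex> lock(m_); // shared: readers don't block each other
        auto it = entries_.find(domain);
        return it == entries_.end() ? std::string() : it->second;
    }
    void update(const std::string& domain, const std::string& address) {
        std::lock_guard<std::shared_mutex> lock(m_);  // exclusive: blocks all readers
        entries_[domain] = address;
    }
private:
    std::map<std::string, std::string> entries_;
    mutable std::shared_mutex m_;
};
```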
Recursive locking
If recursively locking a mutex within a single thread is required, std::recursive_mutex
should be used.
Most of the time, if you think you want to use a recursive mutex, you probably need to change your design instead.
References
Williams, Anthony. 2019. C++ Concurrency in Action, 2nd Edition.