Designing lock-free concurrent data structures
Definitions and consequences
Data structures and algorithms that use mutexes, condition variables, and futures to synchronize data are called blocking data structures and algorithms. When a thread is blocked, the OS suspends it completely and allows another thread to run first.
Data structures and algorithms that don't use blocking library functions are said to be nonblocking. In C++, nonblocking concurrency is achieved with atomic types and operations.
The most important atomic operation is compare/exchange. It matters because another thread might have modified the data in the meantime, in which case the code needs to redo part of its work (i.e., the part redone when compare/exchange returns false) before trying the compare/exchange again; a minimal sketch of this retry pattern is given below.
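The following is a minimal sketch of the canonical compare/exchange retry loop. The function and variable names are ours, not from Williams 2019; the pattern itself is the standard one.

#include <atomic>

// Multiply an atomic integer by a factor without locking. If another thread
// modifies `value` between our load and our compare/exchange, the
// compare/exchange fails, reloads the current value into `expected`, and the
// loop redoes the computation before trying again.
void atomic_multiply(std::atomic<int>& value, int factor) {
    int expected = value.load();
    while (!value.compare_exchange_weak(expected, expected * factor)) {
        // `expected` now holds the freshly observed value; just retry
    }
}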
Detecting nodes that can’t be reclaimed using hazard pointers
Hazard pointers are a technique discovered by Maged Michael. The idea is that if a thread is going to access an object that could be deleted by another thread, it first sets a hazard pointer to reference the object, informing the other thread that deleting the object would indeed be hazardous. Once the object is no longer needed, the hazard pointer is cleared. Because the hazard pointer is always set before reading, reading is always safe.
It's easy to see that the hazard-pointer scheme consists of two parts:
- The thread trying to access an object sets its hazard pointer first and then accesses the object.
- The thread trying to delete an object first checks whether any hazard pointer references the object. If there is one, it defers the reclamation; otherwise, it deletes the object.
We demonstrate hazard pointers with an implementation of the pop operation below. Note that a multi-threaded stack does not actually need such a complicated pop; one is free to delete old_head right after retrieving its data, as long as the stack does not define other operations that access the data.
std::shared_ptr<T> pop() {
    std::atomic<void*>& hp = get_hazard_pointer_for_current_thread();
    Node* old_head = head.load(); // head is a std::atomic<Node*>
    do {
        Node* tmp;
        // Loop until both old_head and hp refer to the current head: tmp is
        // what we stored in hp, and we only exit when a fresh load of head
        // still equals tmp. This eliminates the problem of other threads
        // changing head after the hazard pointer was set.
        do {
            tmp = old_head;
            hp.store(old_head);
            old_head = head.load();
        } while (old_head != tmp);
        // Swing head to the next node only after the hazard pointer is set.
        // compare_exchange further ensures that the hazard pointer and
        // old_head refer to the head we're about to unlink. The strong
        // version is used to avoid resetting hp because of spurious failures.
    } while (old_head && !head.compare_exchange_strong(old_head, old_head->next));
    hp.store(nullptr); // clear the hazard pointer: we've claimed old_head as ours
    std::shared_ptr<T> res;
    if (old_head) {
        res.swap(old_head->data); // safe to take the data: no other thread can pop this node
        if (outstanding_hazard_pointers_for(old_head))
            reclaim_later(old_head); // some reader still holds a hazard pointer to old_head
        else
            delete old_head;
        delete_nodes_with_no_hazard(); // delete nodes previously stored by reclaim_later
    }
    return res;
}
We don't elaborate on the functions used here; instead, we only outline the purpose of each one below (a rough sketch of the first two follows the list). Interested readers can refer to section 7.2.2 of Williams 2019 for the complete code.
- get_hazard_pointer_for_current_thread(): returns the hazard pointer for the current thread. This hazard pointer is a global std::atomic<void*> that is currently uniquely owned by the calling thread, which means other threads may check its value but are not allowed to write to it.
- outstanding_hazard_pointers_for(old_head): checks all hazard pointers and returns whether any of them points to old_head.
- reclaim_later(old_head): adds old_head to a reclaim_list so that we can reclaim it later.
- delete_nodes_with_no_hazard(): walks the reclaim_list and deletes every node that no hazard pointer references.
Detecting nodes in use with reference counting
Another technique to prevent dereferencing a deleted object is to check how many pointers currently point to the object before deleting it. This is essentially the same idea as shared_ptr. Since C++20 officially introduces std::atomic<std::shared_ptr<T>>, this method becomes almost trivial (for earlier C++ standards, the standalone free functions for atomic access to std::shared_ptr, deprecated as of C++20, may be used instead). We sketch the C++20 approach first, then briefly discuss the hand-rolled reference-counting data structure below.
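The following is a minimal sketch of a stack built directly on std::atomic<std::shared_ptr<>> (C++20). The class and member names are ours; note that whether this specialization is actually lock-free is implementation-defined, so it is worth checking head.is_lock_free() on your target platform.

#include <atomic>
#include <memory>

template <typename T>
class AtomicSharedPtrStack {
    struct Node {
        T data;
        std::shared_ptr<Node> next;
        explicit Node(T const& d) : data(d) {}
    };
    std::atomic<std::shared_ptr<Node>> head;
public:
    void push(T const& value) {
        auto new_node = std::make_shared<Node>(value);
        new_node->next = head.load();
        // On failure, new_node->next is refreshed with the current head, so
        // the loop simply retries with up-to-date data.
        while (!head.compare_exchange_weak(new_node->next, new_node)) {}
    }
    std::shared_ptr<T> pop() {
        std::shared_ptr<Node> old_head = head.load();
        // old_head keeps the node alive even if another thread pops it first,
        // so reading old_head->next is always safe; reclamation is automatic.
        while (old_head && !head.compare_exchange_weak(old_head, old_head->next)) {}
        return old_head ? std::make_shared<T>(std::move(old_head->data)) : nullptr;
    }
};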
// These members live inside a template <typename T> lock-free stack class
// (see section 7.2.4 of Williams 2019), which is why T appears below.
struct Node;
struct CountedNodePtr {
    int external_count; // number of readers that have announced an access via this pointer
    Node* ptr;
};
struct Node {
    std::shared_ptr<T> data;
    std::atomic<int> internal_count; // decremented as readers finish
    CountedNodePtr next;
};
// CountedNodePtr is kept small so the platform can operate on it atomically.
std::atomic<CountedNodePtr> head;
Here, we use CountedNodePtr instead of Node* as the atomic type; it carries an additional counter, external_count. external_count increases by 1 whenever a thread is about to read the pointer, and when a thread finishes its reading we decrease internal_count by 1. When external_count and internal_count sum to 0, we delete the node. The utility of separating external_count from internal_count is twofold: 1) external_count provides a way to inform other threads "I'm going to access ptr; don't delete it until I finish"; 2) internal_count counts how many of those accesses have finished. One may be tempted to remove internal_count and decrease external_count instead. However, this cannot be done without changing the data structure, because external_count lives in a local CountedNodePtr object and is discarded once that CountedNodePtr is destroyed. We refer interested readers to subsection 7.2.4 of Williams 2019 for the detailed operations of this method.
Guidelines for writing lock-free data structures
Several guidelines for writing lock-free data structures are listed below.
- Use std::memory_order_seq_cst for prototyping.
- Use a lock-free memory reclamation scheme.
- Watch out for the ABA problem; extra care must be taken when reusing nodes. For example, the compare/exchange may happen long after the expected value A was first read. During that period, the atomic may have been changed to some other value B and then changed back to the expected value A. In that case, the compare/exchange still succeeds, since both the atomic and expected hold the value A, but the two As are no longer the same A and in most cases should not be treated as if they were. A sketch of a common mitigation follows this list.
- Identify busy-wait loops and help the other thread (see subsection 7.2.6 for an example).
References
Williams, Anthony. 2019. C++ Concurrency in Action, 2nd Edition.