Introduction
In Java, the volatile keyword is used to mark variables, while memory barriers are the underlying mechanism that implements volatile. Both concepts are fundamental in Java, giving developers an effective way to guarantee data consistency and program correctness in a multi-threaded environment. Compared to alternatives such as synchronized or explicit locking, volatile is lightweight and simple, offering better performance and letting developers focus on the problem at hand rather than on lock management.
The combination of volatile and memory barriers gives developers an efficient way to address concurrency issues, helping them write correct and efficient code in complex multi-threaded scenarios. In the following sections, I will document my understanding of volatile and memory barriers based on the source code of JDK17u.
Visibility and Reordering
Visibility problems occur when one thread modifies the value of a shared variable, but other threads are unable to immediately see the updated value. This phenomenon happens mainly due to various cache levels within the computer system (e.g., L1, L2 caches) and compiler optimizations.
Each CPU has its own cache, and when a thread operates on a variable, it is actually being done within the CPU cache rather than directly on the main memory. Without appropriate synchronization measures, other threads may only see the old value of the variable in main memory, not the latest value in the CPU cache—leading to visibility problems.
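A minimal sketch of the problem (the class and field names are hypothetical): a reader thread may spin forever on a stale value of a plain boolean field, and declaring the field volatile rules this out.
public class VisibilityDemo {
    private static boolean running = true; // declaring this field volatile fixes the demo

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // without volatile, the JIT may hoist the read of `running`
                // out of the loop, so this thread may never terminate
            }
            System.out.println("worker stopped");
        });
        worker.start();
        Thread.sleep(100);
        running = false; // the worker may never observe this write
        worker.join();
    }
}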
Moreover, to increase execution efficiency, the Java Memory Model (JMM) allows the compiler to reorder instructions, which can also affect data visibility between threads. To solve visibility issues, Java provides several mechanisms, including the volatile keyword, the synchronized keyword, explicit locks, and the atomic classes in the java.util.concurrent package. These mechanisms help developers guarantee data visibility and ensure program correctness in concurrent code.
Memory Barriers
In most modern processor architectures, inserting a memory barrier (also called a memory fence) constrains the way the processor reads and writes data, ensuring that all memory operations issued before the barrier complete before any memory operations issued after it. This avoids problems caused by instruction reordering.
Memory barriers do not force the CPU to flush or reload variables on every read or write. In practice, when the processor encounters a memory barrier instruction, it ensures that all memory operations (reads and/or writes, depending on the barrier type) before the instruction have completed before any operations after the instruction begin. This guarantees that the effects of all memory operations prior to the barrier are visible to all subsequent memory operations.
In the Java language, the volatile keyword causes memory barriers to be inserted when the code is compiled into machine instructions (assembly instructions). For a write to a volatile variable, the JMM inserts write barriers, ensuring that all preceding writes are completed before the volatile write and that its result becomes visible to all threads. For a read of a volatile variable, the JMM inserts read barriers, ensuring that the read observes the latest published data and that subsequent operations cannot be reordered before it.
In practice, not all read/write operations need to cross memory barriers. Memory barriers are mainly used for volatile variable reads and writes, as well as for lock operations under specific circumstances. Where read and write barriers are placed is determined by the required semantics; the following list summarizes the placement:
- Insert a write memory barrier before every volatile write, so that all preceding ordinary writes are committed before the volatile write and published together with it.
- Insert a write memory barrier after every volatile write, so that the result of the volatile write is visible to all subsequent read and write operations.
- Insert a read memory barrier after every volatile read, so that subsequent ordinary reads cannot be reordered before the volatile read.
- Insert a second read memory barrier after every volatile read, so that subsequent ordinary writes cannot be reordered before the volatile read.
Summarizing, and cross-referencing the JMM specification (the JSR-133 cookbook), the barrier-insertion rules for volatile variables are as follows:
- A StoreStore barrier before every volatile write.
- A StoreLoad barrier after every volatile write.
- A LoadLoad barrier after every volatile read.
- A LoadStore barrier after every volatile read.
public class MemoryBarrierExample {
    private volatile boolean flag = false;

    public void write() {
        // ...
        // a StoreStore barrier is inserted before the volatile write
        flag = true; // write to a volatile variable
        // a StoreLoad barrier is inserted after the volatile write
    }

    public void read() {
        if (flag) { // read a volatile variable
            // LoadLoad and LoadStore barriers are inserted after the volatile read
            // ...
        }
    }
}
In this example, flag is a volatile variable. In the write() method, barriers accompany the line flag = true;: one before it and one after it. In the read() method, barriers are inserted after the volatile load performed by if (flag).
It is important to note that memory barriers are not part of Java source code. They are inserted implicitly by the Java Memory Model when the code is compiled into machine instructions, and they are invisible while we write Java code. The example above merely illustrates the relationship between volatile read/write operations and memory barriers.
Low-Level Implementation
In JDK17u, the parsing entry for volatile is located in the /src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp file, in the ARRAY_STOREFROM64 macro. You can use the javap -v command to inspect the bytecode of any code that touches a volatile field:
class Main {
  // more...
  public static void main(java.lang.String[]);
    // more...
    1: putstatic     #7  // Field counter:I
    4: getstatic     #13 // Field java/lang/System.out:Ljava/io/PrintStream;
    // more...
}
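For reference, source along the following lines (a hypothetical Main with a static volatile counter, matching the bytecode above) compiles to that putstatic/getstatic pair:
class Main {
    static volatile int counter;

    public static void main(String[] args) {
        counter = 1;                 // iconst_1 followed by putstatic on the volatile field
        System.out.println(counter); // getstatic of System.out, then getstatic of counter
    }
}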
It is worth noting the putfield and putstatic instructions. Tracing these two bytecode operations back into the JDK leads to the following code, which sets instance field values and static field values of objects respectively:
// set instance field values and static field values of an object
CASE(_putfield):
CASE(_putstatic): {
    // ...
    // store the result
    int field_offset = cache->f2_as_index();
    // the field is declared volatile
    if (cache->is_volatile()) {
        // switch...case...
        OrderAccess::storeload();
    } else {
        // more actions...
    }
}
The OrderAccess::storeload() call in this code is how the Java HotSpot VM inserts the memory barrier. It ensures that all memory writes before the barrier are considered to have occurred before any memory reads after the barrier. In other words, after a volatile write is performed, all subsequent memory reads will see the result of that write, which guarantees the visibility of the volatile field.
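To see why a StoreLoad barrier is needed here, consider the classic store-buffering pattern: without the barrier, each thread's write could be delayed past its read of the other variable, and both threads could observe 0. A minimal sketch (with hypothetical names) where volatile forbids that outcome:
public class StoreLoadDemo {
    // Because x and y are volatile, a StoreLoad barrier follows each write,
    // so the outcome r1 == 0 && r2 == 0 is impossible under the JMM.
    static volatile int x = 0, y = 0;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("r1=" + r1 + ", r2=" + r2); // never prints r1=0, r2=0
    }
}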
In JDK17u, the cache->is_volatile() method was moved to src/hotspot/share/oops/cpCache.hpp:
// Accessors ...
int field_index() const        { assert(is_field_entry(), ""); return (_flags & field_index_mask); }
int parameter_size() const     { assert(is_method_entry(), ""); return (_flags & parameter_size_mask); }
// Determines whether the field is declared volatile
bool is_volatile() const       { return (_flags & (1 << is_volatile_shift)) != 0; }
bool is_final() const          { return (_flags & (1 << is_final_shift)) != 0; }
bool is_forced_virtual() const { return (_flags & (1 << is_forced_virtual_shift)) != 0; }
bool is_vfinal() const         { return (_flags & (1 << is_vfinal_shift)) != 0; }
// Accessors ...
The remaining switch/case checks are delegated to the void oopDesc::release_byte_field_put(int offset, jbyte value) function in src/hotspot/share/oops/oop.cpp. From here, the source code of JDK17u diverges significantly from that of JDK 1.8:
- In JDK 1.8, the function OrderAccess::release_store in hotspot/src/share/vm/runtime/orderAccess.hpp is called.
- In JDK17u, the function HeapAccess<MO_RELEASE>::store_at in src/hotspot/share/oops/access.hpp is called.
Although the source code differs between the two versions, the goal is the same: to provide developers with the same memory-ordering guarantees.
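The release-store that both versions implement internally corresponds to a pattern Java code can also express directly since JDK 9, via VarHandle's setRelease/getAcquire. A minimal sketch with hypothetical names:
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleaseAcquireExample {
    private int payload;   // ordinary field, published safely via release/acquire
    private boolean ready; // accessed only through the VarHandle below

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireExample.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish() {
        payload = 42;                 // ordinary write
        READY.setRelease(this, true); // release store: the payload write cannot sink below it
    }

    Integer consume() {
        if ((boolean) READY.getAcquire(this)) { // acquire load: later reads cannot float above it
            return payload;                     // guaranteed to observe 42
        }
        return null;
    }
}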
OrderAccess
Although the implementation differs between versions, let's return to orderAccess (src/hotspot/share/runtime/orderAccess.hpp). Lines 31 to 235 document the Java HotSpot VM's memory-access ordering model. This block of documentation comments introduces the memory barrier operations used to enforce memory-access ordering in a multithreaded environment and to prevent reordering.
According to the official explanation, here are the four memory barrier operations with more detailed and rigorous definitions:
- LoadLoad: Ensures that Load1 completes before Load2 and all subsequent load operations. Load operations before Load1 cannot be moved past Load2 and all subsequent loads.
- StoreStore: Ensures that Store1 completes before Store2 and all subsequent store operations. Store operations before Store1 cannot be moved past Store2 and all subsequent stores.
- LoadStore: Ensures that Load1 completes before Store2 and all subsequent store operations. Load operations before Load1 cannot be moved past Store2 and all subsequent stores.
- StoreLoad: Ensures that Store1 completes before Load2 and all subsequent load operations. Store operations before Store1 cannot be moved past Load2 and all subsequent loads.
Two additional memory barrier operations are acquire and release. These barriers are commonly paired as release stores and load-acquire operations when handling data shared between threads.
The fence operation is a bidirectional memory barrier. It orders memory operations on both sides of the fence: memory accesses before the fence will not be reordered with memory accesses after it.
Below are the implementations of the memory barrier operations on several major architectures (x86, sparc TSO, ppc) and their relationship to C++ volatile semantics and compiler barriers, reconstructed from the table in orderAccess.hpp:

| Operation | Constraints | x86 | sparc TSO | ppc |
|---|---|---|---|---|
| fence | LoadStore, StoreStore, LoadLoad, StoreLoad | lock addl 0,(sp) | membar #StoreLoad | sync |
| release | LoadStore, StoreStore | | | lwsync |
| acquire | LoadLoad, LoadStore | | | lwsync |
| release_store | | <store> | <store> | lwsync; <store> |
| release_store_fence | | xchg | <store>; membar #StoreLoad | lwsync; <store>; sync |
| load_acquire | | <load> | <load> | <load>; lwsync |
This documentation also particularly emphasizes that the execution order of the constructors and destructors of mutex-related objects (MutexLocker) is critical to the proper operation of the entire VM. Specifically, it assumes that constructors execute in the order fence, lock, acquire, while destructors execute in the order release, unlock. If these implementations change, many parts of the code will break.
Finally, an instruction_fence operation is defined, which ensures that all instructions after the fence are fetched from cache or memory only after the fence has completed. In summary, this section of the source describes the importance and usage of memory barriers in multithreaded memory access and their concrete implementations across different hardware architectures.
Assembly Level
Many existing references mention "viewing the assembly implementation of volatile," but they tend to stop at the instruction lock addl $0x0, (%rsp). Readers interested in the deeper assembly-level principles are welcome to continue exploring with the author.
Returning to the Java HotSpot VM source code, and using the Intel x86 architecture as an example, the following two functions can be found in the src/hotspot/cpu/x86/assembler_x86.cpp file:
// Emits the `lock` prefix
void Assembler::lock() {
  // the `lock` prefix is encoded as 0xF0
  emit_int8((unsigned char)0xF0);
}

// Emits the `addl` instruction
void Assembler::addl(Address dst, int32_t imm32) {
  InstructionMark im(this);
  prefix(dst);
  // `add` with a 32-bit immediate uses opcode 0x81
  emit_arith_operand(0x81, rax, dst, imm32);
}
There are multiple overloads of Assembler::addl in the Java HotSpot VM; only one of them is shown here. Typically, the lock prefix modifies another instruction to make it atomic, and combining it with addl forms a LOCK ADDL instruction. Used together, lock and addl look like this:
void Assembler::membar(Membar_mask_bits order_constraint) {
  if (order_constraint & StoreLoad) {
    int offset = -VM_Version::L1_line_size();
    if (offset < -128) {
      offset = -128;
    }
    lock();
    addl(Address(rsp, offset), 0); // Assert the lock# signal here
  }
}
This generates a LOCK ADDL instruction. The LOCK prefix turns the ADDL into a single atomic read-modify-write that other processors cannot observe partially, and on x86 it also acts as a full memory barrier, in particular preventing store-load reordering.
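To observe this instruction in JIT-compiled code yourself, you can run a hot loop that writes a volatile field under the PrintAssembly diagnostic flag; a minimal sketch with hypothetical names (this assumes the hsdis disassembler plugin is installed):
// Run with: java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly VolatileAsmDemo
// (requires the hsdis plugin; look for "lock addl" following the store to v)
public class VolatileAsmDemo {
    private static volatile long v;

    public static void main(String[] args) {
        // a hot loop so the JIT compiles write()
        for (long i = 0; i < 1_000_000L; i++) {
            write(i);
        }
        System.out.println(v);
    }

    private static void write(long i) {
        v = i;
    }
}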
Other Memory Barrier Functions
If readers continue exploring the source code, they will find three other memory barrier-related functions: lfence, mfence, and sfence:
// Emit lfence instruction
void Assembler::lfence() {
  emit_int24(0x0F, (unsigned char)0xAE, (unsigned char)0xE8);
}

// Emit mfence instruction
void Assembler::mfence() {
  NOT_LP64(assert(VM_Version::supports_sse2(), "unsupported");)
  emit_int24(0x0F, (unsigned char)0xAE, (unsigned char)0xF0);
}

// Emit sfence instruction
void Assembler::sfence() {
  NOT_LP64(assert(VM_Version::supports_sse2(), "unsupported");)
  emit_int24(0x0F, (unsigned char)0xAE, (unsigned char)0xF8);
}
These three functions generate the x86 memory barrier instructions LFENCE, MFENCE, and SFENCE. Their specific functions are as follows:
- lfence: generates the LFENCE instruction. LFENCE is a load fence, a memory barrier that ensures all load operations before the LFENCE complete before the LFENCE instruction itself completes. In other words, it prevents loads before the LFENCE from being reordered to after it.
- mfence: generates the MFENCE instruction. MFENCE is a full memory fence, the strongest of the three, ensuring that all loads and stores before the MFENCE complete before the MFENCE instruction itself completes. In other words, it prevents both loads and stores before the MFENCE from being reordered to after it.
- sfence: generates the SFENCE instruction. SFENCE is a store fence that ensures all store operations before the SFENCE complete before the SFENCE instruction itself completes. In other words, it prevents stores before the SFENCE from being reordered to after it.
Readers should note that these three functions are not used to implement volatile and have no hidden connection with it. For volatile read operations, no additional memory barrier is required, because the x86 memory model already prohibits load-load and load-store reordering. For Java's volatile write operations, the HotSpot JVM typically uses a lock addl instruction as a StoreLoad barrier: the x86 memory model only allows store-load reordering, and the lock addl instruction is precisely what prevents it.
The lfence, mfence, and sfence instructions are used in more specialized scenarios on the x86 architecture, such as processor optimizations and particular memory-access patterns, for example when interacting with certain types of devices or when employing advanced concurrent-programming techniques. In Java's volatile semantics, however, these instructions are generally not used directly.