G1GC tuning

Setting G1RSetUpdatingPauseTimePercent to 10 (the default)

By default, G1 budgets 10 percent of the target pause time (MaxGCPauseMillis) for updating remembered sets (RSets). Our configuration, copied from upstream Cassandra, reduces this to 5 percent. This test compares the JVM default of 10 percent against our baseline of 5 percent.
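
As a rough illustration of what the two settings mean in absolute terms, the sketch below computes the RSet-update budget as a share of the pause target, using the MaxGCPauseMillis=300 from the configurations under test. The function name is made up for this example, and G1 treats the resulting figure as a goal rather than a hard limit.

```python
def rset_update_budget_ms(max_gc_pause_millis, rset_updating_pause_time_percent):
    """Time goal (ms) for updating RSets within one evacuation pause.

    Hypothetical helper: simply applies the percentage to the pause target.
    """
    return max_gc_pause_millis * rset_updating_pause_time_percent / 100.0


# MaxGCPauseMillis=300 in both configurations
baseline_ms = rset_update_budget_ms(300, 5)   # our baseline setting
default_ms = rset_update_budget_ms(300, 10)   # JVM default
print(baseline_ms, default_ms)  # 15.0 30.0
```

In other words, moving from 5 to 10 percent doubles the RSet-update goal from 15 ms to 30 ms per pause.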


Baseline

-Xms16g \
-Xmx16g \
-Xss256k \
-XX:+UseG1GC \
-XX:G1RSetUpdatingPauseTimePercent=5 \
-XX:MaxGCPauseMillis=300 \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintHeapAtGC \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintPromotionFailure \
-XX:PrintFLSStatistics=1 \
-Xloggc:/var/log/cassandra/gc.log \
-XX:+UseGCLogFileRotation \
-XX:NumberOfGCLogFiles=10 \
-XX:GCLogFileSize=10M \
-XX:+PrintAdaptiveSizePolicy
                

G1RSetUpdatingPauseTimePercent of 10

-Xms16g \
-Xmx16g \
-Xss256k \
-XX:+UseG1GC \
-XX:G1RSetUpdatingPauseTimePercent=10 \
-XX:MaxGCPauseMillis=300 \
-XX:+PrintGCDetails \
-XX:+PrintGCDateStamps \
-XX:+PrintHeapAtGC \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintPromotionFailure \
-XX:PrintFLSStatistics=1 \
-Xloggc:/var/log/cassandra/gc.log \
-XX:+UseGCLogFileRotation \
-XX:NumberOfGCLogFiles=10 \
-XX:GCLogFileSize=10M \
-XX:+PrintAdaptiveSizePolicy
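
The two option sets are meant to differ in a single flag. One way to confirm that before a run is to compare them as sets. This is a minimal sketch: the option lists are abbreviated copies of the ones above, and the helper names are hypothetical.

```python
def parse_flags(text):
    """Split a backslash-continued option list into a set of flags."""
    flags = set()
    for line in text.splitlines():
        flag = line.strip().rstrip("\\").strip()
        if flag:
            flags.add(flag)
    return flags


def flag_diff(baseline_text, candidate_text):
    """Return (flags only in baseline, flags only in candidate)."""
    baseline = parse_flags(baseline_text)
    candidate = parse_flags(candidate_text)
    return baseline - candidate, candidate - baseline


# Abbreviated for the example; paste the full lists in practice.
BASELINE = r"""
-Xms16g \
-Xmx16g \
-XX:+UseG1GC \
-XX:G1RSetUpdatingPauseTimePercent=5 \
-XX:MaxGCPauseMillis=300
"""

CANDIDATE = r"""
-Xms16g \
-Xmx16g \
-XX:+UseG1GC \
-XX:G1RSetUpdatingPauseTimePercent=10 \
-XX:MaxGCPauseMillis=300
"""

only_baseline, only_candidate = flag_diff(BASELINE, CANDIDATE)
print(only_baseline)   # {'-XX:G1RSetUpdatingPauseTimePercent=5'}
print(only_candidate)  # {'-XX:G1RSetUpdatingPauseTimePercent=10'}
```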
                
Graphs: Cassandra read rate; Cassandra write rate; Cassandra read latency (75p, 99p); Cassandra write latency (75p, 99p); Cassandra connection timeouts/sec; collection time (accumulated, 5 minute moving average); collections.

Observations

The most significant change was to average collection time, which fell by ~14%. However, there were 2 allocation failures, resulting in pauses of 1.48s and 4.98s respectively (unusual for a 16GB heap). More research is needed.
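
As a starting point for that research, long pauses can be pulled out of the GC log. The sketch below relies on the "Total time for which application threads were stopped" lines that -XX:+PrintGCApplicationStoppedTime writes. The 1-second threshold and the sample lines are assumptions made for illustration, not data from these runs. Stopped time also covers safepoints unrelated to GC, so each hit still needs checking against the surrounding log entries.

```python
import re

# Line written by -XX:+PrintGCApplicationStoppedTime (JDK 8 style logging).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Return stopped times (in seconds) at or above threshold_s."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_s:
            pauses.append(float(match.group(1)))
    return pauses


# Made-up sample lines for illustration only.
sample = [
    "12.345: Total time for which application threads were stopped: 0.0301234 seconds",
    "98.765: Total time for which application threads were stopped: 1.4800000 seconds",
    "99.999: [GC pause (G1 Evacuation Pause) (young), 0.0301000 secs]",
    "150.001: Total time for which application threads were stopped: 4.9800000 seconds",
]
print(long_pauses(sample))  # [1.48, 4.98]
```

Because log rotation is enabled, the log is split across several files. The function takes any iterable of lines, so each rotated file can be opened and passed in turn.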