Compact Strings in Java 9

One of the enhancements in Java 9 is Compact Strings, whose goal is to make the String class and related classes more space-efficient while maintaining performance in most scenarios.

Motivation for introducing Compact Strings in Java

Until Java 8, a String was stored internally as a char array, with each char taking 2 bytes of space because UTF-16 was used as the character encoding.

Data gathered from many different applications indicates that strings are a major component of heap usage. Moreover, most String objects contain only Latin-1 (also called ISO-8859-1) characters. Latin-1 is an 8-bit character set, so each character needs only 1 byte, i.e. 1 byte less than the 2 bytes UTF-16 uses per char. If such strings are stored using the Latin-1 encoding, their character data takes half the space, which brings a substantial reduction in the memory used by String objects. That is the motivation behind compact Strings in Java.
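To make the arithmetic concrete, here is a minimal sketch that counts the bytes needed for the same text in both encodings. The class and method names (EncodingSizeDemo, latin1Bytes, utf16Bytes) are made up for this illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written only to illustrate the size difference.
final class EncodingSizeDemo {

  // Bytes needed when every character is stored in 1 byte (Latin-1).
  static int latin1Bytes(String s) {
    return s.getBytes(StandardCharsets.ISO_8859_1).length;
  }

  // Bytes needed when every char is stored in 2 bytes (UTF-16, no byte-order mark).
  static int utf16Bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_16LE).length;
  }

  public static void main(String[] args) {
    String s = "compact";               // 7 characters, all within Latin-1
    System.out.println(latin1Bytes(s)); // prints 7
    System.out.println(utf16Bytes(s));  // prints 14
  }
}
```

These numbers cover only the character data. The object header and the other String fields are the same in both cases.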

Java 9 compact Strings

From Java 9 onward, this space optimization is built into the String class through a feature called compact Strings.

Instead of a char array, from Java 9 onward a String is stored internally as a byte array plus an encoding-flag field.

This new String class stores characters encoded as ISO-8859-1/Latin-1 (1 byte per character) if all the characters of the String can be stored using 1 byte each.

If any character of the String does not fit in 1 byte (that is, it lies outside Latin-1, above U+00FF), all the characters of the String are stored as UTF-16 (2 bytes per char).

Which encoding is in use, UTF-16 or Latin-1, is recorded in the encoding-flag field, named coder.
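The decision described above can be sketched with a small stand-alone model. This is not the JDK's code: CompactModel and fitsLatin1 are hypothetical names, and the UTF-16 bytes are stored big-endian here purely for simplicity (the real String class uses its own internal helpers and byte order).

```java
import java.nio.charset.StandardCharsets;

// Simplified model of compact storage; an illustration, not the JDK implementation.
final class CompactModel {
  static final byte LATIN1 = 0;
  static final byte UTF16 = 1;

  final byte[] value; // encoded characters
  final byte coder;   // which encoding value uses

  CompactModel(String s) {
    if (fitsLatin1(s)) {
      // Every char fits in one byte: store 1 byte per character.
      value = s.getBytes(StandardCharsets.ISO_8859_1);
      coder = LATIN1;
    } else {
      // At least one char needs two bytes: store the whole text as UTF-16.
      value = s.getBytes(StandardCharsets.UTF_16BE);
      coder = UTF16;
    }
  }

  // True when no char in s is above U+00FF, the upper limit of Latin-1.
  static boolean fitsLatin1(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) > 0xFF) {
        return false;
      }
    }
    return true;
  }
}
```

With this model, new CompactModel("caf\u00e9") ends up with coder LATIN1 and a 4-byte array, while new CompactModel("caf\u20ac") (containing the euro sign) switches to UTF16 and needs 8 bytes.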

So in the Java 8 String class, character storage was declared as:

/** The value is used for character storage. */
private final char value[];

From Java 9 onward this is changed to use a byte[]:

private final byte[] value;

A flag (a field named coder) identifying the encoding is also added:

/**
 * The identifier of the encoding used to encode the bytes in
 * {@code value}. The supported values in this implementation are
 *
 * LATIN1
 * UTF16
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 */
private final byte coder;

The coder field can hold either of the following two values:

@Native static final byte LATIN1 = 0;
@Native static final byte UTF16  = 1;

Changes in String methods for compact Strings

Methods in the String class are also changed to check whether the String is stored as Latin-1 or UTF-16 and to call the appropriate implementation. For example, here is the substring() method of the String class after the compact Strings changes:

public String substring(int beginIndex) {
  if (beginIndex < 0) {
    throw new StringIndexOutOfBoundsException(beginIndex);
  }
  int subLen = length() - beginIndex;
  if (subLen < 0) {
    throw new StringIndexOutOfBoundsException(subLen);
  }
  if (beginIndex == 0) {
    return this;
  }
  // Delegate to the helper class that matches the stored encoding.
  return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                    : StringUTF16.newString(value, beginIndex, subLen);
}

private boolean isLatin1() {
  return COMPACT_STRINGS && coder == LATIN1;
}
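The same "check the coder, then pick the implementation" pattern can be sketched outside the JDK. The class below is a hypothetical illustration (CoderDispatch is not a JDK class) that reads a character from a byte array under either encoding. It assumes the UTF-16 bytes are stored big-endian, which is a simplification made for this example only.

```java
// Hypothetical illustration of dispatching on the coder flag; not JDK code.
final class CoderDispatch {
  static final byte LATIN1 = 0;
  static final byte UTF16 = 1;

  // Number of chars: the array length for LATIN1 (shift by 0),
  // half the array length for UTF16 (shift by 1).
  static int length(byte[] value, byte coder) {
    return value.length >> coder;
  }

  static char charAt(byte[] value, byte coder, int index) {
    if (coder == LATIN1) {
      // One byte per character; mask to avoid sign extension.
      return (char) (value[index] & 0xFF);
    }
    // Two bytes per char, high byte first (assumption of this sketch).
    int hi = value[2 * index] & 0xFF;
    int lo = value[2 * index + 1] & 0xFF;
    return (char) ((hi << 8) | lo);
  }
}
```

Because LATIN1 is 0 and UTF16 is 1, the coder value doubles as a shift amount, so the length can be computed without a branch.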

Using -XX:-CompactStrings option

The compact Strings feature is enabled by default and can be disabled with the -XX:-CompactStrings VM option. You may want to disable it if your application mainly uses UTF-16 Strings, since such Strings get no space saving from the compact representation.
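To confirm which setting a running JVM is actually using, one possible approach is to query the flag through the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM of version 9 or later; the class name CompactStringsFlag is made up for the example. On other JVMs, or on older versions where the flag does not exist, it falls back to "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports the CompactStrings VM flag.
final class CompactStringsFlag {

  // Returns "true" or "false" as reported by the JVM, or "unknown"
  // when the flag cannot be read (non-HotSpot JVM or pre-Java 9).
  static String currentValue() {
    try {
      HotSpotDiagnosticMXBean bean =
          ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
      if (bean == null) {
        return "unknown";
      }
      return bean.getVMOption("CompactStrings").getValue();
    } catch (RuntimeException e) {
      // For example IllegalArgumentException when the flag is not known.
      return "unknown";
    }
  }

  public static void main(String[] args) {
    System.out.println("CompactStrings = " + currentValue());
  }
}
```

Running the class once normally and once with java -XX:-CompactStrings CompactStringsFlag should show the value change from true to false on a JVM that supports the flag.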

