One of the enhancements in Java 9 is Compact Strings, whose goal is to make the String class and related classes more space efficient while maintaining performance in most scenarios.
Motivation for introducing Compact String in Java
Up to Java 8, a String was stored internally as a char array, and because the UTF-16 encoding was used, each character took 2 bytes of space.
Data gathered from many different applications indicates that strings are a major component of heap usage. Moreover, most String objects contain only Latin-1 (also called ISO-8859-1) characters. Latin-1 is an 8-bit character set, meaning each character needs only 1 byte of space, 1 byte less than in UTF-16. Storing such strings using the Latin-1 encoding therefore roughly halves the memory taken by their character data. That is the motivation behind compact Strings in Java.
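To make the 1-byte versus 2-byte difference concrete, the small program below encodes the same text with the standard ISO_8859_1 and UTF_16LE charsets and counts the bytes. UTF_16LE is chosen (rather than UTF_16) so that no byte-order mark is added. The class and method names are illustrative only, and the counts measure the encoded character data, not the full size of a String object on the heap.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Bytes needed to store s in Latin-1 (1 byte per char).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed to store s in UTF-16 without a byte-order mark (2 bytes per char).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // "cafe" with an accented e; every char is within Latin-1
        System.out.println("Latin-1: " + latin1Bytes(s) + " bytes"); // Latin-1: 4 bytes
        System.out.println("UTF-16 : " + utf16Bytes(s) + " bytes");  // UTF-16 : 8 bytes
    }
}
```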
Java 9 compact Strings
From Java 9 onward, this space-efficiency optimization is brought to the String class through a feature called compact Strings.
Instead of a char array, from Java 9 onward a String is stored internally as a byte array plus an encoding-flag field.
The new String class stores characters encoded as ISO-8859-1/Latin-1 (1 byte per character) if every character of the String can be represented in 1 byte.
If any character of the String needs 2 bytes (that is, it falls outside the Latin-1 range U+0000 to U+00FF), all the characters of the String are stored as UTF-16 (2 bytes per character).
Which of the two encodings, Latin-1 or UTF-16, is used for a given String is recorded in the encoding-flag field named coder.
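The rule described above (Latin-1 only if every character fits in one byte, otherwise UTF-16 for the whole String) can be sketched as follows. This is an illustration, not the JDK's own code: the helper names fitsLatin1 and chooseCoder are hypothetical, and the constants simply mirror the two coder values the String class defines.

```java
public class Latin1Check {
    // Hypothetical constants mirroring the two coder values of java.lang.String.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical helper: true if every char fits in one Latin-1 byte (0x00 to 0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false; // one wide char forces UTF-16 for the whole String
            }
        }
        return true;
    }

    // Picks the coder as the text describes: Latin-1 if possible, else UTF-16.
    static byte chooseCoder(String s) {
        return fitsLatin1(s) ? LATIN1 : UTF16;
    }

    public static void main(String[] args) {
        System.out.println(chooseCoder("hello"));     // 0 (LATIN1)
        System.out.println(chooseCoder("caf\u00e9"));  // 0 (LATIN1, U+00E9 fits in one byte)
        System.out.println(chooseCoder("\u20ac100"));  // 1 (UTF16, U+20AC needs two bytes)
    }
}
```

Note that a single character outside Latin-1 switches the entire String to UTF-16; there is no per-character mixing of encodings.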
In the Java 8 String class, the characters were stored in this field-
/** The value is used for character storage. */
private final char value[];
From Java 9 onward, this field is changed to a byte[]-
@Stable
private final byte[] value;
A flag (a field named coder) that identifies the encoding is also added-
/**
 * The identifier of the encoding used to encode the bytes in
 * {@code value}. The supported values in this implementation are
 *
 * LATIN1
 * UTF16
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 */
private final byte coder;
The coder field can have one of the following two values-
@Native static final byte LATIN1 = 0;
@Native static final byte UTF16 = 1;
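To see how a byte[] value plus a coder flag can still answer questions such as length() and charAt(), here is a toy class built on the same layout. CompactText is hypothetical and much simpler than java.lang.String; it assumes the input has no unpaired surrogates, and it stores UTF-16 in big-endian order only for simplicity (the JDK's internal byte order may differ).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical toy class that mimics the byte[] + coder layout; it is not java.lang.String.
public class CompactText {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        // Assumption: s contains no unpaired surrogates.
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        this.coder = latin1 ? LATIN1 : UTF16;
        // Latin-1 uses 1 byte per char; UTF-16 (big-endian in this sketch) uses 2.
        this.value = s.getBytes(latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16BE);
    }

    int length() {
        // Shifting by coder halves the byte count only when coder == UTF16.
        return value.length >> coder;
    }

    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF); // mask to undo sign extension
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    int storageBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactText ascii = new CompactText("Java");
        CompactText euro = new CompactText("Java\u20ac");
        System.out.println(ascii.length() + " chars in " + ascii.storageBytes() + " bytes"); // 4 chars in 4 bytes
        System.out.println(euro.length() + " chars in " + euro.storageBytes() + " bytes");   // 5 chars in 10 bytes
        System.out.println(euro.charAt(4) == '\u20ac');                                     // true
    }
}
```

Encoding the flag as 0 or 1 lets the sketch compute the character count with a single shift instead of a branch.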
Changes in String methods for compact Strings
Methods of the String class are also changed to check whether the String is stored as Latin-1 or UTF-16 and to use the matching implementation. For example, this is the substring() method of the String class with compact Strings-
public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = length() - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    if (beginIndex == 0) {
        return this;
    }
    return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
                      : StringUTF16.newString(value, beginIndex, subLen);
}

private boolean isLatin1() {
    return COMPACT_STRINGS && coder == LATIN1;
}
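The isLatin1() dispatch above exists because the same character range maps to different byte ranges depending on the encoding. The sketch below shows that difference on raw byte arrays; ToySubstring and substringBytes are hypothetical names, and the method only illustrates the index arithmetic, not what StringLatin1.newString or StringUTF16.newString actually do internally. The UTF-16 sample bytes are written in big-endian order as an assumption of the sketch.

```java
import java.util.Arrays;

// Hypothetical toy mirroring the isLatin1() dispatch in substring(); not JDK code.
public class ToySubstring {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Copies the bytes for chars [begin, begin + len), using the char width implied by coder.
    static byte[] substringBytes(byte[] value, byte coder, int begin, int len) {
        if (coder == LATIN1) {
            // 1 byte per char: char index equals byte index
            return Arrays.copyOfRange(value, begin, begin + len);
        }
        // 2 bytes per char: scale char indexes to byte offsets
        return Arrays.copyOfRange(value, begin << 1, (begin + len) << 1);
    }

    public static void main(String[] args) {
        byte[] latin1 = {'J', 'a', 'v', 'a'};
        System.out.println(Arrays.toString(substringBytes(latin1, LATIN1, 2, 2))); // [118, 97]
        byte[] utf16 = {0, 'J', 0, 'a', 0x20, (byte) 0xAC}; // "Ja" followed by U+20AC, big-endian
        System.out.println(Arrays.toString(substringBytes(utf16, UTF16, 2, 1))); // [32, -84]
    }
}
```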
Using the -XX:-CompactStrings option
Compact Strings are enabled by default and can be disabled with the -XX:-CompactStrings VM option. You may want to disable them if your application mostly uses UTF-16 Strings.
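To confirm which setting a running JVM is using, one option is to query the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean for the CompactStrings flag. This is a sketch under the assumption that the program runs on a HotSpot-based JDK that exposes this MXBean; on other VMs the flag may not be visible, so the method falls back to "unknown". The class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class CompactStringsFlag {
    // Returns "true"/"false" for the HotSpot CompactStrings flag, or "unknown"
    // if the JVM does not expose the MXBean or the flag (e.g. a non-HotSpot VM).
    static String compactStringsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("CompactStrings").getValue();
        } catch (IllegalArgumentException e) {
            return "unknown"; // no such option in this VM
        }
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```

On HotSpot, running the class normally should report true, while launching it as `java -XX:-CompactStrings CompactStringsFlag` should report false.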
That’s all for the topic Compact Strings in Java 9. If something is missing or you have something to share about the topic please write a comment.