Difference between revisions of "Data types"

From wiki.vg
m (improved Encodes column)
(→‎Definitions: Describe what an Identifier is (could be, as always, done better, but I link to the page with the official documentation))
(25 intermediate revisions by 6 users not shown)
Line 1: Line 1:
All data sent over the network is [[wikipedia:Endianness#Big-endian|big-endian]], that is the bytes are sent from most significant byte to least significant byte. The majority of everyday computers are little-endian, therefore it may be necessary to change the endianness before sending data over the network.
+
<noinclude>This article defines the '''data types''' used in the [[protocol]]. </noinclude>All data sent over the network (except for VarInt and VarLong) is [[wikipedia:Endianness#Big-endian|big-endian]], that is the bytes are sent from most significant byte to least significant byte. The majority of everyday computers are little-endian, therefore it may be necessary to change the endianness before sending data over the network.
 
+
<noinclude>
 +
== Definitions ==
 +
</noinclude>
 
{| class="wikitable"
 
{| class="wikitable"
 
  |-
 
  |-
Line 45: Line 47:
 
  ! Float
 
  ! Float
 
  | 4
 
  | 4
  | A [[wikipedia:Single-precision floating-point format|single-precision 32-bit IEEE 754 floating point]]
+
  | A [[wikipedia:Single-precision floating-point format|single-precision 32-bit IEEE 754 floating point number]]
 
  |  
 
  |  
 
  |-
 
  |-
 
  ! Double
 
  ! Double
 
  | 8
 
  | 8
  | A [[wikipedia:Double-precision floating-point format|double-precision 64-bit IEEE 754 floating point]]
+
  | A [[wikipedia:Double-precision floating-point format|double-precision 64-bit IEEE 754 floating point number]]
 
  |  
 
  |  
 
  |-
 
  |-
  ! String
+
  ! String (n)
  | ≥ 1 <br />≤&nbsp;2147483652
+
  | ≥ 1 <br />≤ (n&times;4) + 3
 
  | A sequence of [[wikipedia:Unicode|Unicode]] [http://unicode.org/glossary/#unicode_scalar_value scalar values]
 
  | A sequence of [[wikipedia:Unicode|Unicode]] [http://unicode.org/glossary/#unicode_scalar_value scalar values]
  | [[wikipedia:UTF-8|UTF-8]] string prefixed with its size in bytes as a VarInt
+
  | [[wikipedia:UTF-8|UTF-8]] string prefixed with its size in bytes as a VarInt.  Maximum length of <code>n</code> characters, which varies by context; up to <code>n &times; 4</code> bytes can be used to encode <code>n</code> characters and both of those limits are checked.  Maximum <code>n</code> value is 32767.  The + 3 is due to the max size of a valid length VarInt.
 
  |-
 
  |-
 
  ! Chat
 
  ! Chat
  | ≥ 1 <br />≤&nbsp;2147483652
+
  | ≥ 1 <br />≤ (32767&times;4) + 3
 
  | See [[Chat]]
 
  | See [[Chat]]
  | Encoded as a String
+
  | Encoded as a String with max length of 32767.
 +
|-
 +
! Identifier
 +
| ≥ 1 <br />≤ (32767&times;4) + 3
 +
| See [[#Identifier|Identifier]] below
 +
| Encoded as a String with max length of 32767.
 
  |-  
 
  |-  
 
  ! VarInt
 
  ! VarInt
 
  | ≥ 1 <br />≤ 5
 
  | ≥ 1 <br />≤ 5
 
  | An integer between -2147483648 and 2147483647
 
  | An integer between -2147483648 and 2147483647
  | [http://developers.google.com/protocol-buffers/docs/encoding#varints Protocol Buffer Varint], encoding a two's complement signed 32-bit integer
+
  | Variable-length data encoding a two's complement signed 32-bit integer; more info in [[#VarInt and VarLong|their section]]
 
  |-
 
  |-
 
  ! VarLong
 
  ! VarLong
 
  | ≥ 1 <br />≤ 10
 
  | ≥ 1 <br />≤ 10
 
  | An integer between -9223372036854775808 and 9223372036854775807
 
  | An integer between -9223372036854775808 and 9223372036854775807
  | [http://developers.google.com/protocol-buffers/docs/encoding#varints Protocol Buffer Varint], encoding a two's complement signed 64-bit integer
+
  | Variable-length data encoding a two's complement signed 64-bit integer; more info in [[#VarInt and VarLong|their section]]
|-
 
! Chunk
 
| Varies
 
| A vertical chunk column
 
| See [[SMP Map Format#Data]]
 
 
  |-
 
  |-
  ! Metadata
+
  ! Entity Metadata
 
  | Varies
 
  | Varies
 +
| Miscellaneous information about an entity
 
  | See [[Entities#Entity Metadata Format]]
 
  | See [[Entities#Entity Metadata Format]]
|
 
 
  |-  
 
  |-  
 
  ! Slot
 
  ! Slot
 
  | Varies
 
  | Varies
 +
| An item stack in an inventory or container
 
  | See [[Slot Data]]
 
  | See [[Slot Data]]
|
 
|-
 
! Object Data
 
| 4 or 10
 
| See [[Object Data]]
 
|
 
 
  |-
 
  |-
 
  ! NBT Tag
 
  ! NBT Tag
 
  | Varies
 
  | Varies
 +
| Depends on context
 
  | See [[NBT]]
 
  | See [[NBT]]
|
 
 
  |-
 
  |-
 
  ! Position  
 
  ! Position  
Line 111: Line 108:
 
  | 16
 
  | 16
 
  | A [[wikipedia:Universally_unique_identifier|UUID]]
 
  | A [[wikipedia:Universally_unique_identifier|UUID]]
  | The vanilla Minecraft server internally sends this as two longs.
+
  | Encoded as an unsigned 128-bit integer (or two unsigned 64-bit integers: the most significant 64 bits and then the least significant 64 bits)
 
 
<pre>this.writeLong(uuid.getMostSignificantBits());
 
this.writeLong(uuid.getLeastSignificantBits());</pre>
 
 
  |-
 
  |-
 
  ! Optional X
 
  ! Optional X
Line 125: Line 119:
 
  | Zero or more fields of type X
 
  | Zero or more fields of type X
 
  | The count must be known from the context.
 
  | The count must be known from the context.
 +
|-
 +
! X Enum
 +
| size of X
 +
| A specific value from a given list
 +
| The list of possible values and how each is encoded as an X must be known from the context. An invalid value sent by either side will usually result in the client being disconnected with an error or even crashing.
 
  |-
 
  |-
 
  ! Byte Array
 
  ! Byte Array
Line 132: Line 131:
 
  |}
 
  |}
  
=== Position ===
+
<noinclude>== Identifier ==</noinclude><includeonly>=== Identifier ===</includeonly>
 +
 
 +
An Identifier is a namespaced location, in the form of <code>minecraft:thing</code>.  If the namespace is not provided, it defaults to <code>minecraft</code> (i.e. <code>thing</code> is <code>minecraft:thing</code>).  Custom content should always be in its own namespace, not the default one.  The namespace should only use the characters <code>0123456789abcdefghijklmnopqrstuvwxyz-_</code>; actual names may contain more symbols.  The naming convention is <code>lower_case_with_underscores</code>.  [https://minecraft.net/en-us/article/minecraft-snapshot-17w43a More information].
 +
 
 +
<noinclude>== VarInt and VarLong ==</noinclude><includeonly>=== VarInt and VarLong ===</includeonly>
 +
 
 +
Variable-length format such that smaller numbers use fewer bytes.  These are very similar to [http://developers.google.com/protocol-buffers/docs/encoding#varints Protocol Buffer Varints]: the 7 least significant bits are used to encode the value and the most significant bit indicates whether there's another byte after it for the next part of the number.  The least significant group is written first, followed by each of the more significant groups; thus, VarInts are effectively little endian (however, groups are 7 bits, not 8).
 +
 
 +
VarInts are never longer than 5 bytes, and VarLongs are never longer than 10 bytes.
 +
 
 +
Pseudocode to read and write VarInts and VarLongs:
 +
 
 +
<syntaxhighlight lang="java">
 +
public static int readVarInt() {
 +
    int numRead = 0;
 +
    int result = 0;
 +
    byte read;
 +
    do {
 +
        read = readByte();
 +
        int value = (read & 0b01111111);
 +
        result |= (value << (7 * numRead));
 +
 
 +
        numRead++;
 +
        if (numRead > 5) {
 +
            throw new RuntimeException("VarInt is too big");
 +
        }
 +
    } while ((read & 0b10000000) != 0);
 +
 
 +
    return result;
 +
}
 +
</syntaxhighlight>
 +
<syntaxhighlight lang="java">
 +
public static long readVarLong() {
 +
    int numRead = 0;
 +
    long result = 0;
 +
    byte read;
 +
    do {
 +
        read = readByte();
 +
        long value = (read & 0b01111111); // must be a long: an int shift wraps for shift counts of 32 or more
 +
        result |= (value << (7 * numRead));
 +
 
 +
        numRead++;
 +
        if (numRead > 10) {
 +
            throw new RuntimeException("VarLong is too big");
 +
        }
 +
    } while ((read & 0b10000000) != 0);
 +
 
 +
    return result;
 +
}
 +
</syntaxhighlight>
 +
<syntaxhighlight lang="java">
 +
public static void writeVarInt(int value) {
 +
    do {
 +
        byte temp = (byte)(value & 0b01111111);
 +
        // Note: >>> means that the sign bit is shifted with the rest of the number rather than being left alone
 +
        value >>>= 7;
 +
        if (value != 0) {
 +
            temp |= 0b10000000;
 +
        }
 +
        writeByte(temp);
 +
    } while (value != 0);
 +
}
 +
</syntaxhighlight>
 +
<syntaxhighlight lang="java">
 +
public static void writeVarLong(long value) {
 +
    do {
 +
        byte temp = (byte)(value & 0b01111111);
 +
        // Note: >>> means that the sign bit is shifted with the rest of the number rather than being left alone
 +
        value >>>= 7;
 +
        if (value != 0) {
 +
            temp |= 0b10000000;
 +
        }
 +
        writeByte(temp);
 +
    } while (value != 0);
 +
}
 +
</syntaxhighlight>
 +
 
 +
{{Warning2|Note that Minecraft's VarInts are not encoded using Protocol Buffers; it's just similar.  If you try to use Protocol Buffers Varints with Minecraft's VarInts, you'll get incorrect results in some cases.  The major differences:
 +
*Minecraft's VarInts are all signed, but do not use the ZigZag encoding.  Protocol buffers have 3 types of Varints: <code>uint32</code> (normal encoding, unsigned), <code>sint32</code> (ZigZag encoding, signed), and <code>int32</code> (normal encoding, signed).  Minecraft's are the <code>int32</code> variety.  Because Minecraft uses the normal encoding instead of ZigZag encoding, negative values always use the maximum number of bytes.
 +
*Minecraft's VarInts are never longer than 5 bytes and its VarLongs are never longer than 10 bytes, while Protocol Buffer Varints always use 10 bytes when encoding negative numbers, even for an <code>int32</code>.}}
 +
 
 +
Sample VarInts:
 +
 
 +
{| class="wikitable"
 +
! Value !! Hex bytes !! Decimal bytes
 +
|-
 +
| 0 || 0x00 || 0
 +
|-
 +
| 1 || 0x01 || 1
 +
|-
 +
| 2 || 0x02 || 2
 +
|-
 +
| 127 || 0x7f || 127
 +
|-
 +
| 128 || 0x80 0x01 || 128 1
 +
|-
 +
| 255 || 0xff 0x01 || 255 1
 +
|-
 +
| 2147483647 || 0xff 0xff 0xff 0xff 0x07 || 255 255 255 255 7
 +
|-
 +
| -1 || 0xff 0xff 0xff 0xff 0x0f || 255 255 255 255 15
 +
|-
 +
| -2147483648 || 0x80 0x80 0x80 0x80 0x08 || 128 128 128 128 8
 +
|}
 +
 
 +
Sample VarLongs:
 +
 
 +
{| class="wikitable"
 +
! Value !! Hex bytes !! Decimal bytes
 +
|-
 +
| 0 || 0x00 || 0
 +
|-
 +
| 1 || 0x01 || 1
 +
|-
 +
| 2 || 0x02 || 2
 +
|-
 +
| 127 || 0x7f || 127
 +
|-
 +
| 128 || 0x80 0x01 || 128 1
 +
|-
 +
| 255 || 0xff 0x01 || 255 1
 +
|-
 +
| 2147483647 || 0xff 0xff 0xff 0xff 0x07 || 255 255 255 255 7
 +
|-
 +
| 9223372036854775807 || 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x7f || 255 255 255 255 255 255 255 255 127
 +
|-
 +
| -1 || 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x01 || 255 255 255 255 255 255 255 255 255 1
 +
|-
 +
| -2147483648 || 0x80 0x80 0x80 0x80 0xf8 0xff 0xff 0xff 0xff 0x01 || 128 128 128 128 248 255 255 255 255 1
 +
|-
 +
| -9223372036854775808 || 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x01 || 128 128 128 128 128 128 128 128 128 1
 +
|}
 +
 
 +
<noinclude>== Position ==</noinclude><includeonly>=== Position ===</includeonly>
  
 
64-bit value split in to three parts
 
64-bit value split in to three parts
Line 157: Line 289:
 
  if z >= 2^25 { z -= 2^26 }
 
  if z >= 2^25 { z -= 2^26 }
  
=== Fixed-point numbers ===
+
<noinclude>== Fixed-point numbers ==</noinclude><includeonly>=== Fixed-point numbers ===</includeonly>
  
 
Some fields may be stored as [https://en.wikipedia.org/wiki/Fixed-point_arithmetic fixed-point numbers], where a certain number of bits represents the signed integer part (number to the left of the decimal point) and the rest represents the fractional part (to the right). Floating points (float and double), in contrast, keep the number itself (mantissa) in one chunk, while the location of the decimal point (exponent) is stored beside it.  
 
Some fields may be stored as [https://en.wikipedia.org/wiki/Fixed-point_arithmetic fixed-point numbers], where a certain number of bits represents the signed integer part (number to the left of the decimal point) and the rest represents the fractional part (to the right). Floating points (float and double), in contrast, keep the number itself (mantissa) in one chunk, while the location of the decimal point (exponent) is stored beside it.  
Line 166: Line 298:
  
 
Java lacks support for fractional integers directly, but you can represent them as integers. To convert from a double to this integer representation, use the following formulas:
 
Java lacks support for fractional integers directly, but you can represent them as integers. To convert from a double to this integer representation, use the following formulas:
   abs_int = (int)double * 32;
+
   abs_int = (int)(double * 32.0D);
 
And back again:
 
And back again:
  
   double = (double)abs_int / 32;
+
   double = (double)abs_int / 32.0D;
 
+
<noinclude>
 
[[Category:Protocol Details]]
 
[[Category:Protocol Details]]
 
[[Category:Minecraft Modern]]
 
[[Category:Minecraft Modern]]

Revision as of 17:31, 8 November 2017

This article defines the data types used in the protocol. All data sent over the network (except for VarInt and VarLong) is big-endian; that is, the bytes are sent from the most significant byte to the least significant byte. Most everyday computers are little-endian, so it may be necessary to convert the byte order before sending data over the network.
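As an illustration of the byte order described above, here is a minimal Java sketch. The class name `EndianDemo` and its helper methods are hypothetical and exist only for this example; `java.nio.ByteBuffer` is big-endian by default, and the order is set explicitly here only for clarity.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class for illustration only.
final class EndianDemo {
    // Encodes a 32-bit int as 4 bytes, most significant byte first.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }

    // Decodes 4 big-endian bytes back into an int.
    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // 0x12345678 is sent as the bytes 12 34 56 78, in that order.
        for (byte b : encodeInt(0x12345678)) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

In Java, `DataOutputStream` and `ByteBuffer` already write multi-byte values in big-endian order, so explicit swapping is mostly a concern in languages that expose the machine's native byte order.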

Definitions

{| class="wikitable"
! Name !! Size (bytes) !! Encodes !! Notes
|-
| Boolean || 1 || Either false or true || True is encoded as 0x01, false as 0x00.
|-
| Byte || 1 || An integer between -128 and 127 || Signed 8-bit integer, two's complement
|-
| Unsigned Byte || 1 || An integer between 0 and 255 || Unsigned 8-bit integer
|-
| Short || 2 || An integer between -32768 and 32767 || Signed 16-bit integer, two's complement
|-
| Unsigned Short || 2 || An integer between 0 and 65535 || Unsigned 16-bit integer
|-
| Int || 4 || An integer between -2147483648 and 2147483647 || Signed 32-bit integer, two's complement
|-
| Long || 8 || An integer between -9223372036854775808 and 9223372036854775807 || Signed 64-bit integer, two's complement
|-
| Float || 4 || A single-precision 32-bit IEEE 754 floating point number ||
|-
| Double || 8 || A double-precision 64-bit IEEE 754 floating point number ||
|-
| String (n) || ≥ 1 <br />≤ (n×4) + 3 || A sequence of Unicode scalar values || UTF-8 string prefixed with its size in bytes as a VarInt. Maximum length of n characters, which varies by context; up to n × 4 bytes can be used to encode n characters, and both of those limits are checked. The maximum value of n is 32767. The + 3 accounts for the maximum size of a valid length VarInt.
|-
| Chat || ≥ 1 <br />≤ (32767×4) + 3 || See [[Chat]] || Encoded as a String with max length of 32767.
|-
| Identifier || ≥ 1 <br />≤ (32767×4) + 3 || See [[#Identifier|Identifier]] below || Encoded as a String with max length of 32767.
|-
| VarInt || ≥ 1 <br />≤ 5 || An integer between -2147483648 and 2147483647 || Variable-length data encoding a two's complement signed 32-bit integer; more info in [[#VarInt and VarLong|their section]]
|-
| VarLong || ≥ 1 <br />≤ 10 || An integer between -9223372036854775808 and 9223372036854775807 || Variable-length data encoding a two's complement signed 64-bit integer; more info in [[#VarInt and VarLong|their section]]
|-
| Entity Metadata || Varies || Miscellaneous information about an entity || See [[Entities#Entity Metadata Format]]
|-
| Slot || Varies || An item stack in an inventory or container || See [[Slot Data]]
|-
| NBT Tag || Varies || Depends on context || See [[NBT]]
|-
| Position || 8 || An integer/block position: x (-33554432 to 33554431), y (-2048 to 2047), z (-33554432 to 33554431) || x as a 26-bit integer, followed by y as a 12-bit integer, followed by z as a 26-bit integer (all signed, two's complement). See also the section below.
|-
| Angle || 1 || A rotation angle in steps of 1/256 of a full turn || Whether or not this is signed does not matter, since the resulting angles are the same.
|-
| UUID || 16 || A UUID || Encoded as an unsigned 128-bit integer (or two unsigned 64-bit integers: the most significant 64 bits and then the least significant 64 bits)
|-
| Optional X || 0 or size of X || A field of type X, or nothing || Whether or not the field is present must be known from the context.
|-
| Array of X || count times size of X || Zero or more fields of type X || The count must be known from the context.
|-
| X Enum || size of X || A specific value from a given list || The list of possible values and how each is encoded as an X must be known from the context. An invalid value sent by either side will usually result in the client being disconnected with an error or even crashing.
|-
| Byte Array || Varies || Depends on context || A sequence of zero or more bytes; its meaning should be explained elsewhere, e.g. in the packet description. The length must also be known from the context.
|}
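The String row above can be sketched in Java roughly as follows. This is an illustrative sketch, not the vanilla implementation: the class `StringDemo` and its method names are hypothetical, and counting "characters" as Unicode code points is an assumption, since the table does not say exactly how the character limit is measured.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class StringDemo {
    // Writes a VarInt: 7 data bits per byte, high bit set when more bytes follow.
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        do {
            int temp = value & 0x7F;
            value >>>= 7;
            if (value != 0) {
                temp |= 0x80;
            }
            out.write(temp);
        } while (value != 0);
    }

    // Encodes a String (maxChars): VarInt byte length, then the UTF-8 bytes.
    static byte[] encodeString(String s, int maxChars) {
        // Assumption: the limit counts Unicode code points.
        if (s.codePointCount(0, s.length()) > maxChars) {
            throw new IllegalArgumentException("String is longer than " + maxChars + " characters");
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > maxChars * 4) {
            throw new IllegalArgumentException("String is longer than " + (maxChars * 4) + " bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "hello" is 5 UTF-8 bytes, so the prefix is the single byte 05.
        for (byte b : encodeString("hello", 32767)) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Note that the length prefix counts bytes, not characters, so a string containing non-ASCII characters has a prefix larger than its character count.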

Identifier

An Identifier is a namespaced location in the form minecraft:thing. If the namespace is not provided, it defaults to minecraft (i.e. thing is equivalent to minecraft:thing). Custom content should always be in its own namespace, not the default one. The namespace should only use the characters 0123456789abcdefghijklmnopqrstuvwxyz-_; actual names may contain more symbols. The naming convention is lower_case_with_underscores. More information: https://minecraft.net/en-us/article/minecraft-snapshot-17w43a
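A minimal Java sketch of the defaulting rule described above. The class `IdentifierDemo` and the `parse` method are hypothetical names for this example. Only the namespace is validated here, because the text does not list the full set of characters allowed in the name part.

```java
// Hypothetical helper class for illustration only.
final class IdentifierDemo {
    // Splits an identifier into { namespace, name }, defaulting the namespace to "minecraft".
    static String[] parse(String id) {
        int colon = id.indexOf(':');
        String namespace = colon < 0 ? "minecraft" : id.substring(0, colon);
        String name = colon < 0 ? id : id.substring(colon + 1);
        // Namespace characters as listed above: digits, lowercase letters, '-' and '_'.
        if (!namespace.matches("[0-9a-z_-]+")) {
            throw new IllegalArgumentException("Invalid namespace: " + namespace);
        }
        return new String[] { namespace, name };
    }

    public static void main(String[] args) {
        String[] a = parse("thing");
        String[] b = parse("mymod:thing");
        System.out.println(a[0] + ":" + a[1]); // minecraft:thing
        System.out.println(b[0] + ":" + b[1]); // mymod:thing
    }
}
```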

VarInt and VarLong

VarInt and VarLong are variable-length formats in which smaller numbers use fewer bytes. They are very similar to Protocol Buffer Varints: the 7 least significant bits of each byte encode part of the value, and the most significant bit indicates whether another byte follows with the next part of the number. The least significant group is written first, followed by each of the more significant groups; thus, VarInts are effectively little-endian (however, groups are 7 bits, not 8).

VarInts are never longer than 5 bytes, and VarLongs are never longer than 10 bytes.

Java-style pseudocode to read and write VarInts and VarLongs (readByte and writeByte are assumed to read from and write to the underlying stream):

public static int readVarInt() {
    int numRead = 0;
    int result = 0;
    byte read;
    do {
        read = readByte();
        int value = (read & 0b01111111);
        result |= (value << (7 * numRead));

        numRead++;
        if (numRead > 5) {
            throw new RuntimeException("VarInt is too big");
        }
    } while ((read & 0b10000000) != 0);

    return result;
}
public static long readVarLong() {
    int numRead = 0;
    long result = 0;
    byte read;
    do {
        read = readByte();
        long value = (read & 0b01111111); // must be a long: an int shift wraps for shift counts of 32 or more
        result |= (value << (7 * numRead));

        numRead++;
        if (numRead > 10) {
            throw new RuntimeException("VarLong is too big");
        }
    } while ((read & 0b10000000) != 0);

    return result;
}
public static void writeVarInt(int value) {
    do {
        byte temp = (byte)(value & 0b01111111);
        // Note: >>> means that the sign bit is shifted with the rest of the number rather than being left alone
        value >>>= 7;
        if (value != 0) {
            temp |= 0b10000000;
        }
        writeByte(temp);
    } while (value != 0);
}
public static void writeVarLong(long value) {
    do {
        byte temp = (byte)(value & 0b01111111);
        // Note: >>> means that the sign bit is shifted with the rest of the number rather than being left alone
        value >>>= 7;
        if (value != 0) {
            temp |= 0b10000000;
        }
        writeByte(temp);
    } while (value != 0);
}

Note that Minecraft's VarInts are not encoded using Protocol Buffers; the format is merely similar. If you try to use Protocol Buffers Varints with Minecraft's VarInts, you'll get incorrect results in some cases. The major differences:

  • Minecraft's VarInts are all signed, but do not use the ZigZag encoding. Protocol buffers have 3 types of Varints: uint32 (normal encoding, unsigned), sint32 (ZigZag encoding, signed), and int32 (normal encoding, signed). Minecraft's are the int32 variety. Because Minecraft uses the normal encoding instead of ZigZag encoding, negative values always use the maximum number of bytes.
  • Minecraft's VarInts are never longer than 5 bytes and its VarLongs are never longer than 10 bytes, while Protocol Buffer Varints always use 10 bytes when encoding negative numbers, even for an int32.
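To make the ZigZag difference concrete, the following Java sketch compares the encoded size of -1 with and without ZigZag mapping. The class `ZigZagDemo` and its methods are hypothetical names for this example. The ZigZag formula is the standard Protocol Buffers one for 32-bit values and is shown only for comparison; Minecraft does not use it.

```java
// Hypothetical helper class for illustration only.
final class ZigZagDemo {
    // Protocol Buffers sint32 mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Number of bytes needed to encode the 32-bit pattern in 7-bit groups.
    static int varIntSize(int value) {
        int size = 0;
        do {
            value >>>= 7;
            size++;
        } while (value != 0);
        return size;
    }

    public static void main(String[] args) {
        // Minecraft encodes -1 as its raw two's complement bits: 5 bytes.
        System.out.println(varIntSize(-1));
        // With ZigZag, -1 maps to 1 and would fit in a single byte.
        System.out.println(varIntSize(zigZag(-1)));
    }
}
```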

Sample VarInts:

{| class="wikitable"
! Value !! Hex bytes !! Decimal bytes
|-
| 0 || 0x00 || 0
|-
| 1 || 0x01 || 1
|-
| 2 || 0x02 || 2
|-
| 127 || 0x7f || 127
|-
| 128 || 0x80 0x01 || 128 1
|-
| 255 || 0xff 0x01 || 255 1
|-
| 2147483647 || 0xff 0xff 0xff 0xff 0x07 || 255 255 255 255 7
|-
| -1 || 0xff 0xff 0xff 0xff 0x0f || 255 255 255 255 15
|-
| -2147483648 || 0x80 0x80 0x80 0x80 0x08 || 128 128 128 128 8
|}
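The VarInt samples can be reproduced with a small self-contained Java sketch. The class `VarIntDemo` and its methods are hypothetical names for this example. It works on byte arrays rather than a network stream, which is an assumption made to keep the example runnable on its own.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper class for illustration only.
final class VarIntDemo {
    // Encodes a signed 32-bit int, least significant 7-bit group first.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int temp = value & 0x7F;
            value >>>= 7; // unsigned shift, so negative values terminate after 5 bytes
            if (value != 0) {
                temp |= 0x80; // continuation bit: more bytes follow
            }
            out.write(temp);
        } while (value != 0);
        return out.toByteArray();
    }

    // Decodes a VarInt from the start of the array.
    static int decode(byte[] bytes) {
        int result = 0;
        int numRead = 0;
        byte read;
        do {
            if (numRead >= 5) {
                throw new IllegalArgumentException("VarInt is too big");
            }
            read = bytes[numRead];
            result |= (read & 0x7F) << (7 * numRead);
            numRead++;
        } while ((read & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 127, 128, 255, 2147483647, -1, -2147483648 };
        for (int sample : samples) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(sample)) {
                hex.append(String.format("0x%02x ", b));
            }
            System.out.println(sample + " -> " + hex.toString().trim());
        }
    }
}
```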

Sample VarLongs:

{| class="wikitable"
! Value !! Hex bytes !! Decimal bytes
|-
| 0 || 0x00 || 0
|-
| 1 || 0x01 || 1
|-
| 2 || 0x02 || 2
|-
| 127 || 0x7f || 127
|-
| 128 || 0x80 0x01 || 128 1
|-
| 255 || 0xff 0x01 || 255 1
|-
| 2147483647 || 0xff 0xff 0xff 0xff 0x07 || 255 255 255 255 7
|-
| 9223372036854775807 || 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x7f || 255 255 255 255 255 255 255 255 127
|-
| -1 || 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x01 || 255 255 255 255 255 255 255 255 255 1
|-
| -2147483648 || 0x80 0x80 0x80 0x80 0xf8 0xff 0xff 0xff 0xff 0x01 || 128 128 128 128 248 255 255 255 255 1
|-
| -9223372036854775808 || 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x01 || 128 128 128 128 128 128 128 128 128 1
|}

Position

A 64-bit value split into three parts:

  • x: 26 MSBs
  • z: 26 LSBs
  • y: 12 bits between them

Encoded as follows:

((x & 0x3FFFFFF) << 38) | ((y & 0xFFF) << 26) | (z & 0x3FFFFFF)

And decoded as:

val = read_unsigned_long();
x = val >> 38;
y = (val >> 26) & 0xFFF;
z = val << 38 >> 38;

Note: The details of bit shifting are rather language dependent; the above may work in Java but probably won't in other languages without some tweaking. In particular, you will usually receive positive numbers even if the actual coordinates are negative. This can be fixed by adding something like the following:

if x >= 2^25 { x -= 2^26 }
if y >= 2^11 { y -= 2^12 }
if z >= 2^25 { z -= 2^26 }
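In Java specifically, the sign correction can be folded into the shifts, because `>>` on a `long` is an arithmetic (sign-extending) shift. The sketch below uses the 26/12/26 bit layout described above; the class `PositionDemo` and its methods are hypothetical names for this example.

```java
// Hypothetical helper class for illustration only.
final class PositionDemo {
    // Packs x (26 bits), y (12 bits) and z (26 bits) into one 64-bit value.
    static long encode(int x, int y, int z) {
        return ((long) (x & 0x3FFFFFF) << 38)
                | ((long) (y & 0xFFF) << 26)
                | (z & 0x3FFFFFF);
    }

    // Unpacks { x, y, z }; each field is moved to the top of the long and
    // shifted back down so that its own sign bit is extended.
    static int[] decode(long val) {
        int x = (int) (val >> 38);
        int y = (int) (val << 26 >> 52);
        int z = (int) (val << 38 >> 38);
        return new int[] { x, y, z };
    }

    public static void main(String[] args) {
        long packed = encode(-5, 100, -33554432);
        int[] pos = decode(packed);
        System.out.println(pos[0] + " " + pos[1] + " " + pos[2]); // -5 100 -33554432
    }
}
```

Languages without an arithmetic right shift on signed 64-bit values need the explicit subtraction shown above instead.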

Fixed-point numbers

Some fields may be stored as fixed-point numbers, where a certain number of bits represents the signed integer part (number to the left of the decimal point) and the rest represents the fractional part (to the right). Floating points (float and double), in contrast, keep the number itself (mantissa) in one chunk, while the location of the decimal point (exponent) is stored beside it.

Essentially, while fixed-point numbers have lower range than floating points, their fractional precision is greater for higher values. This makes them ideal for representing global coordinates of an entity in Minecraft, as it's more important to store the integer part accurately than position them more precisely within a single block (or meter).

Coordinates are often represented as a 32-bit integer, where 5 of the least-significant bits are dedicated to the fractional part, and the rest store the integer part.

Java has no built-in fixed-point type, but such values can be stored in ordinary integers. To convert from a double to this integer representation, use the following formula (the multiplication must happen before the cast, otherwise the fractional part is discarded first):

 abs_int = (int)(double * 32.0D);

And back again:

 double = (double)abs_int / 32.0D;
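The two formulas can be wrapped into a small runnable Java sketch for the 5-fractional-bit layout described above. The class `FixedPointDemo` and its method names are hypothetical. The `(int)` cast rounds toward zero, which follows the formula as written here; whether a particular implementation floors instead is not stated in this article.

```java
// Hypothetical helper class for illustration only.
final class FixedPointDemo {
    // 5 fractional bits: one unit of the integer representation is 1/32.
    static int toFixed(double value) {
        return (int) (value * 32.0D);
    }

    static double fromFixed(int fixed) {
        return (double) fixed / 32.0D;
    }

    public static void main(String[] args) {
        int fixed = toFixed(100.03125); // 100 + 1/32
        System.out.println(fixed);            // 3201
        System.out.println(fromFixed(fixed)); // 100.03125
    }
}
```

Any fraction finer than 1/32 is lost in the conversion, so a round trip only returns the original value when it is a multiple of 1/32.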