Difference between revisions of "Data types"
Drainedsoul (Elaborating on the vagaries of UTF-16)
Drainedsoul m (Fix typo)
Revision as of 19:49, 26 August 2013
All data sent over the network is big-endian; that is, multi-byte values are sent from the most significant byte to the least significant byte. Most everyday computers are little-endian, so it may be necessary to swap the byte order before sending data over the network and after receiving it.
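As an illustration of the byte-order rule above, here is a minimal Python sketch using the standard `struct` module. The `>` prefix selects big-endian order regardless of the host CPU, and `<` selects little-endian for comparison. The helper names `to_network_int` and `from_network_int` are hypothetical.

```python
import struct

def to_network_int(value: int) -> bytes:
    """Encode a signed 32-bit int in big-endian (network) byte order."""
    # '>' forces big-endian with standard sizes, independent of the host CPU
    return struct.pack(">i", value)

def from_network_int(data: bytes) -> int:
    """Decode a big-endian signed 32-bit int."""
    (value,) = struct.unpack(">i", data)
    return value

# 0x01020304 goes on the wire most significant byte first
print(to_network_int(0x01020304).hex())                   # 01020304
# The same value laid out little-endian, for comparison
print(struct.pack("<i", 0x01020304).hex())                # 04030201
# Two's complement: 0xFFFFFFFE read as a signed int
print(from_network_int(bytes([0xFF, 0xFF, 0xFF, 0xFE])))  # -2
```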
Apart from string and metadata, which are decoded with custom functions, these data formats are identical to those read and written by the Java classes DataInputStream and DataOutputStream.
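For readers not using Java, the fixed-size types in the table below can be mapped onto Python `struct` format codes with big-endian (`>`) byte order, which should produce the same bytes as the corresponding DataOutputStream methods. This is a sketch only; `FORMATS`, `write_value` and `read_value` are hypothetical names, and string and metadata are deliberately left out because they use custom encodings.

```python
import struct

# Hypothetical mapping from this page's type names to struct format codes.
# '>' = big-endian, standard sizes (matching DataOutputStream's layout).
FORMATS = {
    "bool": ">?",    # 1 byte, 0x00 or 0x01
    "byte": ">b",    # 1 byte, signed
    "short": ">h",   # 2 bytes, signed
    "int": ">i",     # 4 bytes, signed
    "long": ">q",    # 8 bytes, signed
    "float": ">f",   # 4 bytes, IEEE 754 single precision
    "double": ">d",  # 8 bytes, IEEE 754 double precision
}

def write_value(type_name: str, value) -> bytes:
    """Encode one fixed-size value."""
    return struct.pack(FORMATS[type_name], value)

def read_value(type_name: str, data: bytes, offset: int = 0):
    """Decode one fixed-size value at offset; return (value, new_offset)."""
    fmt = FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)
```

Values are read back in the order they were written, advancing the offset each time:

```python
buf = write_value("short", -2) + write_value("double", 1.5)
value, pos = read_value("short", buf)         # (-2, 2)
value, pos = read_value("double", buf, pos)   # (1.5, 10)
```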
Type | Size (bytes) | Range | Notes |
---|---|---|---|
bool | 1 | 0 or 1 | Value can be either true (0x01) or false (0x00) |
byte | 1 | -128 to 127 | Signed, two's complement |
short | 2 | -32768 to 32767 | Signed, two's complement |
int | 4 | -2147483648 to 2147483647 | Signed, two's complement |
long | 8 | -9223372036854775808 to 9223372036854775807 | Signed, two's complement |
float | 4 | As defined by IEEE 754 | Single-precision 32-bit IEEE 754 floating point |
double | 8 | As defined by IEEE 754 | Double-precision 64-bit IEEE 754 floating point |
string | ≥ 2, ≤ 240 | N/A | UTF-16 big-endian string prefixed by a short containing the length of the string in code units (see below) |
metadata | Varies | N/A | Custom format; see the metadata documentation |

UTF-16 is a variable-length encoding: a single Unicode code point (what most programmers think of as a "character") may be encoded by one or two 16-bit code units. Characters in the Basic Multilingual Plane (U+0000 through U+FFFF inclusive) are encoded as one code unit. Characters in the other Unicode planes (U+10000 through U+10FFFF inclusive) are encoded as two code units (a high/low "surrogate pair"). Therefore, the length of a string in bytes is the number of code units multiplied by two, and the number of code units is the length in bytes divided by two. However, because the encoding is variable-length, the number of code units does not in general equal the number of code points (i.e. "characters"); to count code points you have to decode the string. It was historically believed that the Minecraft protocol used the fixed-width encoding UCS-2; however, this has since been shown to be incorrect.
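The string format can be sketched in Python as a short length prefix counted in UTF-16 code units, followed by UTF-16BE data. The `utf-16-be` codec writes no byte order mark. The names `write_string`, `read_string` and `count_code_points` are hypothetical, and the sketch does not enforce the 240-byte size limit. `count_code_points` shows why the code unit count differs from the character count: in well-formed UTF-16, every code unit starts a new code point except a low surrogate (0xDC00 to 0xDFFF).

```python
import struct

def write_string(text: str) -> bytes:
    """Short length prefix (in UTF-16 code units) followed by UTF-16BE data."""
    data = text.encode("utf-16-be")       # explicit -be codec: no BOM
    code_units = len(data) // 2           # bytes / 2 = code units
    return struct.pack(">h", code_units) + data

def read_string(buf: bytes, offset: int = 0):
    """Decode one string at offset; return (text, new_offset)."""
    (code_units,) = struct.unpack_from(">h", buf, offset)
    start = offset + 2
    end = start + code_units * 2          # code units * 2 = bytes
    return buf[start:end].decode("utf-16-be"), end

def count_code_points(data: bytes) -> int:
    """Count code points in well-formed UTF-16BE data without a codec."""
    units = struct.unpack(f">{len(data) // 2}H", data)
    # A low surrogate continues the previous code point; anything else starts one.
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)

text = "A\U0001F600"          # 'A' plus U+1F600, which needs a surrogate pair
encoded = write_string(text)
print(encoded[:2].hex())      # 0003 -> 3 code units on the wire
print(count_code_points(encoded[2:]))  # 2 code points
```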
Some values are sent as an "absolute integer", a fixed-point number stored as an integer with 5 fractional bits. It is more precise than a plain integer (its resolution is 1/32) but less precise than a double. To convert a double to an absolute integer, multiply by 32 and then truncate:
abs_int = (int)(value * 32);
And back again:
value = (double)abs_int / 32;
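A minimal Python sketch of the same conversion, assuming the hypothetical helper names `to_abs_int` and `from_abs_int`. Python's `int()` truncates toward zero like the `(int)` cast above. Unlike Java's cast, it does not clamp values that fall outside the 32-bit range, so a real implementation would need its own range check.

```python
ABS_INT_SCALE = 32  # 2**5: five fractional bits, i.e. a resolution of 1/32

def to_abs_int(value: float) -> int:
    """Convert a double to an absolute integer: scale first, then truncate."""
    # int() truncates toward zero; no 32-bit range check is done here.
    return int(value * ABS_INT_SCALE)

def from_abs_int(abs_int: int) -> float:
    """Convert an absolute integer back to a double."""
    return abs_int / ABS_INT_SCALE

print(to_abs_int(10.5))                # 336
print(from_abs_int(to_abs_int(1.01)))  # 1.0 (0.01 is below the 1/32 resolution)
print(to_abs_int(-1.01))               # -32 (truncation is toward zero)
```

Scaling before truncating is what preserves the fractional bits. Casting first and then multiplying would discard the fraction and only ever produce multiples of 32.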