User:Pokechu22/Chunk Format

From wiki.vg
Revision as of 17:54, 22 March 2017 by Pokechu22 (talk | contribs) (→‎Sample implementations: ... I recommended isolating into separate methods, but didn't do so myself; change that.)

v2 of SMP Map Format, work in progress.

This article describes in additional detail the format of the Chunk Data packet.

Concepts

Chunk columns and chunk sections

You've probably heard the term "chunk" before. Minecraft uses chunks to store and transfer world data. However, there are actually 2 different concepts that are both called "chunks" in different contexts: chunk columns and chunk sections.


A chunk column is a 16×256×16 collection of blocks, and is what most players think of when they hear the term "chunk". However, these are not the smallest unit in which the game stores data; a chunk column is actually 16 chunk sections stacked vertically.

Chunk columns store biomes, block entities, entities, tick data, and an array of sections.


A chunk section is a 16×16×16 collection of blocks (chunk sections are cubic). This is the actual area that blocks are stored in, and is often the concept Mojang refers to via "chunk". Breaking columns into sections wouldn't be useful, except that you don't need to send all chunk sections in a column: if a section is empty[concept note 1], then it doesn't need to be sent. For the average world, this means you don't need to send approximately 60% of the world's data, since it's all air. A chunk section can contain at most 4096 (16×16×16, or 2^12) unique IDs, although such a section is highly unlikely to occur in normal circumstances.

Chunk sections store blocks and light data (both block light and sky light). Additionally, they can be associated with a section palette.

Chunk columns and chunk sections are both displayed when chunk border rendering is enabled (F3+G). Chunk column borders are indicated by the red vertical lines, while chunk section borders are indicated by the blue lines.

Global and section palettes

Illustration of an indexed palette (Source)

Minecraft also uses palettes. A palette maps numeric IDs to block states. The concept is more commonly used with colors in an image; Wikipedia's articles on color look-up tables, indexed colors, and palettes in general may be helpful for fully grokking it.

There are 2 palettes that are used in the game: the global palette and the section palette.


The global palette is the standard mapping of IDs to block states. Currently, it is a combination of block ID and metadata ((blockId << 4) | metadata). As a result, the global palette is not continuous[concept note 2]. Entries not defined within the global palette are treated as air (even if the block ID itself is known, the state is treated as air if the metadata is not). The global palette is currently represented by 13 bits per entry[concept note 3], with 9 bits for the block ID and 4 bits for the metadata.

The basic implementation looks like this:

long getGlobalPaletteIDFromState(BlockState state) {
	if (state.isValid()) {
		return (state.getId() << 4) | state.getMetadata();
	} else {
		return 0;
	}
}

BlockState getStateFromGlobalPaletteID(long id) {
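	int blockID = (int) (id >> 4);       // Upper bits: block ID (explicit narrowing cast from long)
	byte metadata = (byte) (id & 0x0F);  // Lowest 4 bits: metadata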
	BlockState state = new BlockState(blockID, metadata);
	if (state.isValid()) {
		return state;
	} else {
		return new BlockState(0, 0);  // Air
	}
}

Warning: Don't assume that the global palette will always be like this; keep it in a separate function. Mojang has stated that they plan to change the global palette to avoid increasing the total size. Likewise, do not hardcode the total size of the palette; keep it in a constant.


A section palette is used to map IDs within a chunk section to global palette IDs. Other than skipping empty sections, correct use of the section palette is the biggest place where data can be saved. Given that most sections contain only a few distinct blocks, using 13 bits per block to represent a chunk section that is only stone, gravel, and air would be extremely wasteful. Instead, a list of global palette IDs is sent (for instance, 0x10 0xD0 0x00), and blocks are encoded as indexes into that list (so stone would be sent as 0, gravel as 1, and air as 2)[concept note 4]. The number of bits per ID in the section palette varies from 4 to 8; if fewer than 4 bits would be needed it's increased to 4[concept note 5], and if more than 8 would be needed, the section palette is not used and global palette IDs are used instead[concept note 6].
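
As a rough illustration of the mapping described above, the following Java sketch builds a section palette from a flat array of global palette IDs and picks a bits per block value. The class and member names (SectionPaletteSketch, bitsFor, usesSectionPalette) are hypothetical, and the full-size value of 13 is simply taken from the text as a constant; this is not the Notchian implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SectionPaletteSketch {
    // Current full-size value according to the text; kept as a constant because it may change.
    static final int FULL_SIZE_BITS_PER_BLOCK = 13;

    final List<Integer> palette = new ArrayList<>(); // palette index -> global palette ID
    final int[] indices;                             // per-block index into the palette
    final int bitsPerBlock;

    SectionPaletteSketch(int[] globalIds) {
        Map<Integer, Integer> lookup = new HashMap<>(); // global palette ID -> palette index
        indices = new int[globalIds.length];
        for (int i = 0; i < globalIds.length; i++) {
            Integer index = lookup.get(globalIds[i]);
            if (index == null) {
                // First time this state is seen: append it, so the palette has no gaps.
                index = palette.size();
                lookup.put(globalIds[i], index);
                palette.add(globalIds[i]);
            }
            indices[i] = index;
        }
        bitsPerBlock = bitsFor(palette.size());
    }

    /** Bits needed to index a palette of the given size, clamped as the text describes. */
    static int bitsFor(int paletteSize) {
        // ceil(log2(paletteSize)), computed with integer arithmetic
        int bits = 32 - Integer.numberOfLeadingZeros(Math.max(paletteSize - 1, 0));
        if (bits < 4) {
            return 4;                        // fewer than 4 bits is raised to 4
        }
        if (bits > 8) {
            return FULL_SIZE_BITS_PER_BLOCK; // more than 8 bits: use the global palette
        }
        return bits;
    }

    boolean usesSectionPalette() {
        return bitsPerBlock <= 8;
    }
}
```

For a section containing only stone (0x10), gravel (0xD0), and air (0x00), this yields a three-entry palette and 4 bits per block. When usesSectionPalette() is false, the data array would carry the global palette IDs directly rather than these indices.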

Warning: Note that the Notchian client (and server) store their chunk data in the compacted, paletted format. Sending non-compacted data not only wastes bandwidth, but also leads to increased memory use clientside; while this is OK for an initial implementation, it is strongly encouraged that one compacts the block data as soon as possible.

Notes

  1. Empty is defined in vanilla as being composed of all air, but this can result in lighting issues (MC-80966). Custom servers should consider defining empty to mean something like "completely air and without lighting data" or "completely air and with no blocks in the neighboring sections that need to be lit by light from this section".
  2. The global palette is not continuous in more ways than 1. The more obvious manner is that not all blocks have metadata: for instance, dirt (ID 3) has only 3 states (dirt, coarse dirt, and podzol), so the palette surrounding it is 000000011 0000; 000000011 0001; 000000011 0010; 000000100 0000. The second way is that structure blocks have an ID of 255, even though there is currently no block with ID 254; thus, there is a large gap.
  3. The number of bits in the global palette is calculated via the ceil of the base-2 logarithm of the highest value in the palette.
  4. There is no requirement for IDs in a section palette to be monotonic; the order within the list is entirely arbitrary and often depends on how the palette is built (if it finds a stone block before an air block, stone can come first). Although the order can theoretically be optimized to ensure the maximum possible GZIP compression, this optimization offers little to no gain, so generally do not attempt it. However, there shouldn't be any gaps in the section palette, as gaps would increase the size of the section palette when it is sent.
  5. Most likely, sizes smaller than 4 are not used in the section palette because it would require the palette to be resized several times as it is built in the majority of cases; the processing cost would be higher than the data saved.
  6. Most likely, sizes larger than 8 use the global palette because otherwise, the amount of data used to transmit the palette would exceed the savings that the section palette would grant.

Packet structure

Packet ID: 0x20 | State: Play | Bound To: Client

Field Name | Field Type | Notes
Chunk X | Int | Chunk coordinate (block coordinate divided by 16, rounded down)
Chunk Z | Int | Chunk coordinate (block coordinate divided by 16, rounded down)
Ground-Up Continuous | Boolean | This is true if the packet represents all chunk sections in this vertical chunk column. If true, the chunk that was previously there should be replaced with this chunk. If false, this packet is instead modifying the given chunk sections, but leaves the other sections alone.
Primary Bit Mask | VarInt | Bitmask with bits set to 1 for every 16×16×16 chunk section whose data is included in Data. The least significant bit represents the chunk section at the bottom of the chunk column (from y=0 to y=15).
Size | VarInt | Size of Data in bytes
Data | Byte array | See data structure below
Number of block entities | VarInt | Length of the following array
Block entities | Array of NBT Tag | All block entities in the chunk. Use the x, y, and z tags in the NBT to determine their positions.
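
To make the Chunk X/Z and Primary Bit Mask fields more concrete, here is a minimal Java sketch. The class and method names (PrimaryBitMaskSketch, includedSections, chunkCoordinate) are hypothetical, and the 16 sections per column follow from the 256-block column height described earlier.

```java
import java.util.ArrayList;
import java.util.List;

final class PrimaryBitMaskSketch {
    static final int SECTIONS_PER_COLUMN = 16; // 256 blocks tall / 16 blocks per section

    /** Section Y indices (0 = bottom) whose data is present, in the order they appear in Data. */
    static List<Integer> includedSections(int primaryBitMask) {
        List<Integer> sections = new ArrayList<>();
        for (int sectionY = 0; sectionY < SECTIONS_PER_COLUMN; sectionY++) {
            if ((primaryBitMask & (1 << sectionY)) != 0) { // Is this section's bit set?
                sections.add(sectionY);
            }
        }
        return sections;
    }

    /** Chunk coordinate for a block coordinate: divided by 16 and rounded down. */
    static int chunkCoordinate(int blockCoordinate) {
        // An arithmetic shift rounds toward negative infinity;
        // plain integer division by 16 would round negative values toward zero instead.
        return blockCoordinate >> 4;
    }
}
```

A mask of 0b1011 therefore means that sections 0, 1, and 3 follow in Data, in that order, and a block at x = -1 belongs to chunk X = -1 rather than 0.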

Data structure

The data section of the packet contains most of the useful data for the chunk.

Field Name | Field Type | Notes
Data | Array of Chunk Section | The length of the array is equal to the number of bits set in Primary Bit Mask. Sections are sent bottom-to-top, i.e. the first section, if sent, extends from Y=0 to Y=15.
Biomes | Optional Byte Array | Only sent if Ground-Up Continuous is true; 256 bytes if present

Chunk Section

A Chunk Section is defined in terms of other data types. A Chunk Section consists of the following fields:

Field Name | Field Type | Notes
Bits Per Block | Unsigned Byte | Determines how many bits are used to encode a block. Note that not all numbers are valid here. This also changes whether the palette is present.
Palette Length | VarInt | Length of the following array. May be 0, in which case the following palette is not sent.
Palette | Optional Array of VarInt | Mapping of block state IDs in the global palette to indices of this array
Data Array Length | VarInt | Number of longs in the following array
Data Array | Array of Long | Compacted list of 4096 indices pointing to state IDs in the Palette
Block Light | Byte Array | Half byte per block
Sky Light | Optional Byte Array | Only if in the Overworld; half byte per block

In half-byte arrays, two values are packed into each byte. Even-indexed items are packed into the low bits, odd-indexed into the high bits.
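
The half-byte packing can be sketched as follows in Java. NibbleArraySketch and its methods are hypothetical names used only for illustration; the bit layout is the one stated above (even index in the low 4 bits, odd index in the high 4 bits).

```java
final class NibbleArraySketch {
    /** Reads the 4-bit value at the given item index from a packed half-byte array. */
    static int get(byte[] packed, int index) {
        int b = packed[index >> 1] & 0xFF;                // two items share each byte
        return (index & 1) == 0 ? (b & 0x0F) : (b >> 4);  // even: low bits, odd: high bits
    }

    /** Stores the low 4 bits of value at the given item index, leaving its neighbour intact. */
    static void set(byte[] packed, int index, int value) {
        int i = index >> 1;
        int b = packed[i] & 0xFF;
        if ((index & 1) == 0) {
            b = (b & 0xF0) | (value & 0x0F);
        } else {
            b = (b & 0x0F) | ((value & 0x0F) << 4);
        }
        packed[i] = (byte) b;
    }
}
```

Storing 0xA at index 0 and 0x5 at index 1 produces the single byte 0x5A.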

Data Array, Block Light, and Sky Light are given for each block with increasing x coordinates, within rows of increasing z coordinates, within layers of increasing y coordinates.

The Data Array, although varying in length, will never be padded due to the number of blocks being evenly divisible by 64, which is the number of bits in a long.
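
The ordering and the length claim can be written down as a small Java sketch. The names here (SectionLayoutSketch, blockIndex, dataArrayLength) are hypothetical; the arithmetic follows directly from the 16×16×16 section size and the 64 bits in a long.

```java
final class SectionLayoutSketch {
    static final int SECTION_WIDTH = 16;
    static final int BLOCKS_PER_SECTION = 16 * 16 * 16; // 4096

    /** Position of a block within the section's arrays: x varies fastest, then z, then y. */
    static int blockIndex(int x, int y, int z) {
        return ((y * SECTION_WIDTH) + z) * SECTION_WIDTH + x;
    }

    /** Number of longs in the Data Array; 4096 is divisible by 64, so this is always exact. */
    static int dataArrayLength(int bitsPerBlock) {
        return BLOCKS_PER_SECTION * bitsPerBlock / 64; // equivalently 64 * bitsPerBlock
    }

    /** Number of bytes in a light array: half a byte per block. */
    static int lightArrayLength() {
        return BLOCKS_PER_SECTION / 2;
    }
}
```

With 13 bits per block the Data Array holds 832 longs, and with 4 bits per block it holds 256.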

There are several values that can be used for the bits per block value. In most cases, invalid values will be interpreted as a different value when parsed by the Notchian client, meaning that chunk data will be parsed incorrectly if you use an invalid bits per block. Servers must make sure that the bits per block value is correct.

  • up to 4: Blocks are encoded as 4 bits. The palette is used and sent.
  • 5 to 8: Blocks are encoded with the given number of bits. The palette is used and sent.
  • 9 and above: The palette is not sent. Blocks are encoded by their whole ID in the global palette, with bits per block being set as the base 2 logarithm of the number of block states, rounded up. For the current vanilla release, this is 13 bits per block.

The global palette encodes a block as 13 bits. It uses the block ID for the first 9 bits, and the block damage value for the last 4 bits. For example, Diorite (block ID 1 for minecraft:stone with damage 3) would be encoded as 000000001 0011. If a block is not found in the global palette (either due to not having a valid damage value or due to not being a valid ID), it will be treated as air.

If Minecraft Forge is installed and a sufficiently large number of blocks are added, the bits per block value for the global palette will be increased to compensate for the increased ID count. This increase can go up to 16 bits per block (for a total of 4096 block IDs; when combined with the 16 damage values, there are 65536 total states). You can get the number of blocks with the "Number of ids" field found in the RegistryData packet in the Forge Handshake.

The data array stores several entries within a single long, and sometimes an entry is split across two longs. For a bits per block value of 13, the data is stored such that bits 1 through 13 are the first entry, 14 through 26 are the second, and so on. Note that bit 1 is the least significant bit in this case, not the most significant bit. The same behavior applies when a value stretches between two longs: for instance, block 5 would be bits 53 through 64 of the first long followed by bit 65, which is the least significant bit of the second long.

Example

13 bits per block, using the global palette.

The following two longs would represent...

1001880C0060020 = 0000000100000000000110001000000011000000000001100000000000100000
200D0068004C020 = 0000001000000000110100000000011010000000000001001100000000100000

9 blocks, with the start of a 10th (that would be finished in the next long).

  1. Grass, 2:0
  2. Dirt, 3:0
  3. Dirt, 3:0
  4. Coarse dirt, 3:1
  5. Stone, 1:0
  6. Stone, 1:0
  7. Diorite, 1:3
  8. Gravel, 13:0
  9. Gravel, 13:0
  10. Stone, 1:0 (or potentially emerald ore, 129:0)
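
The example above can be checked with a short Java sketch that reads one entry at a time from the packed longs. PackedArrayReadSketch and read are hypothetical names; the sketch assumes entries are at most 32 bits wide and stored lowest bits first, as described above.

```java
final class PackedArrayReadSketch {
    /** Reads entry number index, where every entry is bitsPerBlock bits wide. */
    static int read(long[] data, int index, int bitsPerBlock) {
        long mask = (1L << bitsPerBlock) - 1;
        int bitIndex = index * bitsPerBlock;
        int startLong = bitIndex >>> 6;                     // bitIndex / 64
        int startOffset = bitIndex & 63;                    // bitIndex % 64
        int endLong = (bitIndex + bitsPerBlock - 1) >>> 6;  // long holding the entry's last bit

        long value = data[startLong] >>> startOffset;       // unsigned shift: no sign extension
        if (endLong != startLong) {
            // The entry straddles two longs; its high bits are the low bits of the next long.
            value |= data[endLong] << (64 - startOffset);
        }
        return (int) (value & mask);
    }
}
```

Calling read on the two longs 0x01001880C0060020L and 0x0200D0068004C020L with 13 bits per block gives 32, 48, 48, 49, 16, 16, 19, 208, 208 for entries 0 through 8; splitting each into id >> 4 and id & 0xF matches blocks 1 through 9 in the list. The tenth entry needs the third long, so reading it from only these two would go out of bounds.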

Biomes

The biomes array is only present when ground-up continuous is set to true. Biomes cannot be changed unless a chunk is re-sent.

The structure is an array of 256 bytes, each representing a Biome ID (it is recommended that 127 for "Void" is used if there is no set biome). The array is indexed by z * 16 | x.
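
A minimal Java sketch of the biome array layout follows. The names (BiomeArraySketch, biomeIndex, allVoid) are hypothetical; x and z are assumed to be coordinates within the chunk column, in the range 0 to 15.

```java
import java.util.Arrays;

final class BiomeArraySketch {
    static final int VOID_BIOME_ID = 127; // recommended value when no biome is set

    /** Index into the 256-byte biomes array for a column-relative x and z (each 0..15). */
    static int biomeIndex(int x, int z) {
        return (z * 16) | x;
    }

    /** A biomes array for a server that does not track biomes. */
    static byte[] allVoid() {
        byte[] biomes = new byte[256];
        Arrays.fill(biomes, (byte) VOID_BIOME_ID);
        return biomes;
    }
}
```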

Tips

There are several things that can make it easier to implement this format.

  • The 13 value for full bits per block is likely to change in the future, so it should not be hardcoded (instead, it should either be calculated or left as a constant).
  • Servers do not need to implement the palette initially (instead always using 13 bits per block), although it is an important optimization later on.
  • The Notchian server implementation does not send values that are out of bounds for the palette. If such a value is received, the format is being parsed incorrectly.

Sample implementations

How the chunk format can be implemented varies largely by how you want to read/write it. It is often easier to read/write the data long-by-long instead of pre-creating the data to write; however, storing the chunk data arrays in their packed form can be far more efficient memory- and performance-wise. These implementations are simple versions that can work as a base (especially for dealing with the bit shifting), but are not ideal.

Deserializing

When deserializing, it is easy to read to a buffer (since length information is present). A basic example:

public Chunk ReadChunkDataPacket(Buffer data) {
    int x = ReadInt(data);
    int z = ReadInt(data);
    bool full = ReadBool(data);
    Chunk chunk;
    if (full) {
        chunk = new Chunk(x, z);
    } else {
        chunk = GetExistingChunk(x, z);
    }
    int mask = ReadVarInt(data);
    int size = ReadVarInt(data);
    ReadChunkColumn(chunk, full, mask, data.ReadByteArray(size));

    int blockEntityCount = ReadVarInt(data);
    for (int i = 0; i < blockEntityCount; i++) {
        CompoundTag tag = ReadCompoundTag(data);
        chunk.AddBlockEntity(tag.GetInt("x"), tag.GetInt("y"), tag.GetInt("z"), tag);
    }

    return chunk;
}

private void ReadChunkColumn(Chunk chunk, bool full, int mask, Buffer data) {
    for (int sectionY = 0; sectionY < CHUNK_HEIGHT / SECTION_HEIGHT; sectionY++) {
        if ((mask & (1 << sectionY)) != 0) {  // Is the given bit set in the mask?
            byte bitsPerBlock = ReadByte(data);

            // Excessively specific format that exactly matches the client logic
            // This extra checking makes sense on the server side, but client
            // side it only is needed when dealing with servers sending incorrect packets
            // (the notchian server will not send such packets)
            if (bitsPerBlock < 4) {
                bitsPerBlock = 4;
            }
            if (bitsPerBlock > 8) {
                bitsPerBlock = FULL_SIZE_BITS_PER_BLOCK;  // 13, currently, but liable to eventually change
            }

            bool usePalette = (bitsPerBlock <= 8);

            int[] palette = null;
            if (usePalette) {
                int numPaletteEntries = ReadVarInt(data);
                palette = new int[numPaletteEntries];
                for (int i = 0; i < numPaletteEntries; i++) {
                    palette[i] = ReadVarInt(data);
                }
            } else {
                ReadVarInt(data);  // Should always be 0
            }

            // A bitmask that contains bitsPerBlock set bits
            uint individualValueMask = (uint)((1 << bitsPerBlock) - 1);

            UInt64[] dataArray = ReadUInt64Array(data);  // Reads a VarInt length prefix and then that many UInt64

            ChunkSection section = new ChunkSection();

            for (int y = 0; y < SECTION_HEIGHT; y++) {
                for (int z = 0; z < SECTION_WIDTH; z++) {
                    for (int x = 0; x < SECTION_WIDTH; x++) {
                        int blockNumber = (((y * SECTION_HEIGHT) + z) * SECTION_WIDTH) + x;
                        int startLong = (blockNumber * bitsPerBlock) / 64;
                        int startOffset = (blockNumber * bitsPerBlock) % 64;
                        int endLong = ((blockNumber + 1) * bitsPerBlock - 1) / 64;

                        // Named paletteId so that it does not clash with the `data` buffer parameter
                        uint paletteId;
                        if (startLong == endLong) {
                            paletteId = (uint)(dataArray[startLong] >> startOffset);
                        } else {
                            // The value straddles two longs: take the high bits of the first and the low bits of the second
                            int endOffset = 64 - startOffset;
                            paletteId = (uint)(dataArray[startLong] >> startOffset | dataArray[endLong] << endOffset);
                        }
                        paletteId &= individualValueMask;

                        if (usePalette) {
                            // paletteId should always be within the palette length
                            // If you're reading a power of 2 minus one (15, 31, 63, 127, etc...) that's out of bounds,
                            // you're probably reading light data instead
                            paletteId = (uint)palette[paletteId];
                        }

                        BlockState state = GetStateFromGlobalPaletteID(paletteId);
                        section.SetState(x, y, z, state);
                    }
                }
            }

            for (int y = 0; y < SECTION_HEIGHT; y++) {
                for (int z = 0; z < SECTION_WIDTH; z++) {
                    for (int x = 0; x < SECTION_WIDTH; x += 2) {
                        // Note: x += 2 above; we read 2 values along x each time
                        byte value = ReadByte(data);

                        section.SetBlockLight(x, y, z, value & 0xF);
                        section.SetBlockLight(x + 1, y, z, (value >> 4) & 0xF);
                    }
                }
            }

            if (currentDimension.HasSkylight()) { // IE, current dimension is overworld / 0
                for (int y = 0; y < SECTION_HEIGHT; y++) {
                    for (int z = 0; z < SECTION_WIDTH; z++) {
                        for (int x = 0; x < SECTION_WIDTH; x += 2) {
                            // Note: x += 2 above; we read 2 values along x each time
                            byte value = ReadByte(data);

                            section.SetSkyLight(x, y, z, value & 0xF);
                            section.SetSkyLight(x + 1, y, z, (value >> 4) & 0xF);
                        }
                    }
                }
            }

            // May replace an existing section or a null one
            chunk.Sections[sectionY] = section;
        }
    }

    if (full) {
        // Biomes are only present when ground-up continuous is true
        for (int z = 0; z < SECTION_WIDTH; z++) {
            for (int x = 0; x < SECTION_WIDTH; x++) {
                chunk.SetBiome(x, z, ReadByte(data));
            }
        }
    }
}

// Value should already have gone through the section palette
private BlockState GetStateFromGlobalPaletteID(uint value) {
    // This method is subject to change in future MC versions

    byte metadata = (byte)(value & 0xF);  // Lowest 4 bits
    uint id = value >> 4;                 // Remaining upper bits

    return BlockState.ForIDAndMeta(id, metadata);
}

Serializing

Serializing the packet is more complicated, because of the palette. It is easy to implement with the full bits per block value; implementing it with a compacting palette is much harder since algorithms to generate and resize the palette must be written. As such, this example does not generate a palette. The palette is a good performance improvement (as it can significantly reduce the amount of data sent), but managing that is much harder and there are a variety of ways of implementing it.

Also note that this implementation doesn't handle situations where full is false (i.e., making a large change to one section); it's only good for serializing a full chunk.

public void WriteChunkDataPacket(Chunk chunk, Buffer data) {
    WriteInt(data, chunk.GetX());
    WriteInt(data, chunk.GetZ());
    WriteBool(data, true);  // Full

    int mask = 0;
    Buffer columnBuffer = new Buffer();
    for (int sectionY = 0; sectionY < CHUNK_HEIGHT / SECTION_HEIGHT; sectionY++) {
        if (!chunk.IsSectionEmpty(sectionY)) {
            mask |= (1 << sectionY);  // Set that bit to true in the mask
            WriteChunkSection(chunk.Sections[sectionY], columnBuffer);
        }
    }
    for (int z = 0; z < SECTION_WIDTH; z++) {
        for (int x = 0; x < SECTION_WIDTH; x++) {
            WriteByte(columnBuffer, chunk.GetBiome(x, z));  // Use 127 for 'void' if your server doesn't support biomes
        }
    }

    WriteVarInt(data, mask);
    WriteVarInt(data, columnBuffer.Size);
    WriteByteArray(data, columnBuffer);

    // If you don't support block entities yet, use 0
    // If you need to implement it by sending block entities later with the update block entity packet,
    // do it that way and send 0 as well.  (Note that 1.10.1 (not 1.10 or 1.10.2) will not accept that)

    WriteVarInt(data, chunk.BlockEntities.Length);
    foreach (CompoundTag tag in chunk.BlockEntities) {
        WriteCompoundTag(data, tag);
    }
}

private void WriteChunkSection(ChunkSection section, Buffer data) {
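    byte bitsPerBlock = FULL_SIZE_BITS_PER_BLOCK;  // 13
    WriteByte(data, bitsPerBlock);  // Bits Per Block comes first in a chunk section

    WriteVarInt(data, 0);  // Palette length is 0: the global palette is used

    // Data Array Length: 4096 blocks * bitsPerBlock bits / 64 bits per long
    int dataLength = (SECTION_HEIGHT * SECTION_WIDTH * SECTION_WIDTH) * bitsPerBlock / 64;
    WriteVarInt(data, dataLength);

    // A bitmask that contains bitsPerBlock set bits
    uint individualValueMask = (uint)((1 << bitsPerBlock) - 1);

    UInt64 workLong = 0;
    int currentLong = 0;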

    for (int y = 0; y < SECTION_HEIGHT; y++) {
        for (int z = 0; z < SECTION_WIDTH; z++) {
            for (int x = 0; x < SECTION_WIDTH; x++) {
                int blockNumber = (((y * SECTION_HEIGHT) + z) * SECTION_WIDTH) + x;
                int startLong = (blockNumber * bitsPerBlock) / 64;
                int startOffset = (blockNumber * bitsPerBlock) % 64;
                int endLong = ((blockNumber + 1) * bitsPerBlock - 1) / 64;

                if (startLong != currentLong) {
                    // We've finished one long at the border.  Write it and start another.
                    WriteUInt64(data, workLong);
                    workLong = 0;
                    currentLong = startLong;
                }

                BlockState state = section.GetState(x, y, z);

                uint value = GetGlobalPaletteIDFromState(state);
                value &= individualValueMask;

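                workLong |= ((UInt64)value << startOffset);

                if (startLong != endLong) {
                    // The value straddles two longs: write the finished long,
                    // then start the next one with the remaining high bits.
                    WriteUInt64(data, workLong);
                    currentLong = endLong;

                    workLong = ((UInt64)value >> (64 - startOffset));
                }
            }
        }
    }

    // The last block ends exactly on a long boundary, so the final long still has to be written.
    WriteUInt64(data, workLong);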

    for (int y = 0; y < SECTION_HEIGHT; y++) {
        for (int z = 0; z < SECTION_WIDTH; z++) {
            for (int x = 0; x < SECTION_WIDTH; x += 2) {
                // Note: x += 2 above; we write 2 values along x each time
                byte value = (byte)(section.GetBlockLight(x, y, z) | (section.GetBlockLight(x + 1, y, z) << 4));
                WriteByte(data, value);
            }
        }
    }

    if (currentDimension.HasSkylight()) { // IE, current dimension is overworld / 0
        for (int y = 0; y < SECTION_HEIGHT; y++) {
            for (int z = 0; z < SECTION_WIDTH; z++) {
                for (int x = 0; x < SECTION_WIDTH; x += 2) {
                    // Note: x += 2 above; we write 2 values along x each time
                    byte value = (byte)(section.GetSkyLight(x, y, z) | (section.GetSkyLight(x + 1, y, z) << 4));
                    WriteByte(data, value);
                }
            }
        }
    }
}

private uint GetGlobalPaletteIDFromState(BlockState state) {
    // NOTE: This method will probably change in new versions
    byte metadata = state.GetMetadata();
    uint id = state.GetId();  // Read the ID from the state itself; no section or coordinates are in scope here

    return (id << 4) | metadata;
}
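
For servers that do use a section palette, the same packing has to work for any bits per block value. The following Java sketch packs an array of values (palette indices or global palette IDs) into longs; PackedArrayWriteSketch and pack are hypothetical names, and the sketch fills a preallocated array rather than streaming longs, which is one possible design among several.

```java
final class PackedArrayWriteSketch {
    /** Packs values into longs, bitsPerBlock bits each, lowest bits first. */
    static long[] pack(int[] values, int bitsPerBlock) {
        long mask = (1L << bitsPerBlock) - 1;
        // Round up so that partial inputs (fewer than 4096 values) still fit.
        long[] data = new long[(values.length * bitsPerBlock + 63) / 64];

        for (int i = 0; i < values.length; i++) {
            long value = values[i] & mask;
            int bitIndex = i * bitsPerBlock;
            int startLong = bitIndex >>> 6;                     // bitIndex / 64
            int startOffset = bitIndex & 63;                    // bitIndex % 64
            int endLong = (bitIndex + bitsPerBlock - 1) >>> 6;  // long holding the value's last bit

            data[startLong] |= value << startOffset;
            if (endLong != startLong) {
                // The value straddles two longs; its remaining high bits start the next long.
                data[endLong] |= value >>> (64 - startOffset);
            }
        }
        return data;
    }
}
```

Packing the ten global palette IDs from the earlier example (32, 48, 48, 49, 16, 16, 19, 208, 208, 16) with 13 bits per block reproduces the two longs shown there, followed by a third long of 0. Addressing longs by index means there is no partially filled long to keep track of, at the cost of holding the whole array in memory before writing it.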

Full implementations

Old format

The following implement the previous (before 1.9) format: