Region Files

From wiki.vg
Revision as of 20:54, 13 August 2019 by KurtThiemann (fixed location table python example (again))

The latest version of the Minecraft game uses a new, more efficient (but still poor) format for storing a world's chunk data, based on work by Scaevolous.

Background & Reasoning

Prior to this system, each chunk (a 16x16x128 area of a level) was stored in its own file on disk, two folders down. The first folder was derived from the chunk's X coordinate, and the second from its Z coordinate. This (very obviously) had several issues associated with it, mainly:

  • Excessive filesystem activity,
  • Poor fragmentation,
  • Unneeded filesystem overhead (a 90-byte compressed chunk of mostly dirt and air will still consume at a minimum one sector, generally more),
  • On large worlds, generates directory trees large enough to crash Windows Explorer,
  • A large multiplayer world would eventually run out of file handles (by default, only 1024 handles can be opened by a single process), causing the server to stall, crash, or otherwise fail (the particular behaviour has changed between versions).

To improve performance, Scaevolous created an unofficial mod called McRegion. This mod groups chunks into regions, each of which contains a 32 by 32 group of chunks. In theory, this improves a few things:

  • Reduces filesystem activity & overhead considerably,
  • Improves fragmentation (not directly, but the OS or underlying drive can intelligently improve it),
  • Far, far fewer handles need to be open at any given time, allowing more of the level to be loaded at once,
  • Reduces the number of files created considerably (no more crashing Explorer).

To define some terminology:

Chunk: A single section of the level that is 16 by 16 blocks and 128 blocks high.
Region: A single grouping of chunks in a 32 by 32 area.
Level: A (realistically, but not technically) unlimited collection of chunks, stored in regions, that makes up a single playable world.
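Since a chunk spans 16 blocks along X and Z and a region spans 32 chunks, one region covers 512 by 512 blocks. A minimal sketch of the coordinate conversions, using Python's floor division so negative coordinates round down; the function names are hypothetical helpers, not part of any Minecraft API:

```python
from typing import Tuple


def block_to_chunk(bx: int, bz: int) -> Tuple[int, int]:
    # A chunk is 16 blocks wide along X and Z; // floors toward negative infinity.
    return bx // 16, bz // 16


def chunk_to_region(cx: int, cz: int) -> Tuple[int, int]:
    # A region is 32 chunks wide along X and Z.
    return cx // 32, cz // 32


def block_to_region(bx: int, bz: int) -> Tuple[int, int]:
    # 16 * 32 = 512 blocks per region along each axis.
    return bx // 512, bz // 512


print(block_to_chunk(1300, -20))   # (81, -2)
print(chunk_to_region(81, -2))     # (2, -1)
```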

Specification

File Names

In the official client and server, each region is stored as a file in the region folder inside the level folder. The naming scheme for region files is very simple. For example, given the filename r.8.20.mcr, r is a meaningless prefix found on all region files, 8 is the X coordinate of the region, and 20 is the Z coordinate of the region.
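Going the other way, a region filename can be parsed back into region coordinates. A minimal sketch, assuming every file follows the r.X.Z.mcr pattern described above (coordinates may be negative); parse_region_filename is a hypothetical helper name:

```python
import re
from typing import Tuple

# r.<region X>.<region Z>.mcr, where either coordinate may be negative.
REGION_NAME = re.compile(r"^r\.(-?\d+)\.(-?\d+)\.mcr$")


def parse_region_filename(name: str) -> Tuple[int, int]:
    """Return (region_x, region_z) for a name such as 'r.8.20.mcr'."""
    match = REGION_NAME.match(name)
    if match is None:
        raise ValueError(f"not a region filename: {name!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_region_filename("r.8.20.mcr"))   # (8, 20)
print(parse_region_filename("r.2.-2.mcr"))   # (2, -2)
```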

The region X and Z can be found simply by dividing the chunk's X or Z by 32, and then flooring the result. Here's an example in Python, where given a chunk at <81, -39>, the region filename can be found:

>>> # // is floor division, so negative coordinates round down
>>> region_xz = lambda x, z: (x // 32, z // 32)
>>> region_xz(81, -39)
(2, -2)

So, this chunk would reside in r.2.-2.mcr.
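Because 32 is a power of two, an arithmetic right shift by 5 bits gives the same result as floored division, and Python's >> on integers floors for negative values too. A small sketch that builds the filename directly; region_filename is a hypothetical helper:

```python
def region_filename(chunk_x: int, chunk_z: int) -> str:
    # x >> 5 == floor(x / 32) for Python ints, negative values included.
    return f"r.{chunk_x >> 5}.{chunk_z >> 5}.mcr"


print(region_filename(81, -39))   # r.2.-2.mcr
```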

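Here's an example in Java, where given a chunk at <-152, 15>:

int x = -152;
int z = 15;
// Math.floorDiv (Java 8+) rounds toward negative infinity; plain x / 32 truncates
// toward zero, which gives the wrong region for negative coordinates.
String regionFileName = "r." + Math.floorDiv(x, 32) + "." + Math.floorDiv(z, 32) + ".mcr";

So, this chunk would reside in r.-5.0.mcr.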

A simple-to-use tool for calculating region filenames, written in Java, is available for download here.

Structure

Every region file begins with two 4 KiB tables (each composed of 1024 4-byte big-endian integers): the first table contains the location of each chunk, and the second the last-modified timestamp of that chunk.
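Both tables can be read in one go with Python's struct module. A minimal sketch, assuming the integers are big-endian and that f is a region file opened in binary mode; read_header is a hypothetical helper name:

```python
import struct
from typing import BinaryIO, List, Tuple

SECTOR_SIZE = 4096


def read_header(f: BinaryIO) -> Tuple[List[int], List[int]]:
    """Return (location_table, timestamp_table) as lists of 1024 ints each."""
    f.seek(0)
    raw = f.read(2 * SECTOR_SIZE)
    if len(raw) != 2 * SECTOR_SIZE:
        raise ValueError("region file is shorter than its 8 KiB header")
    # ">1024I" = 1024 big-endian unsigned 32-bit integers.
    locations = list(struct.unpack(">1024I", raw[:SECTOR_SIZE]))
    timestamps = list(struct.unpack(">1024I", raw[SECTOR_SIZE:]))
    return locations, timestamps
```

Usage would look like `with open("region/r.0.0.mcr", "rb") as f: locations, timestamps = read_header(f)`, where the path is only an example.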

Location Table

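The location table is composed of 1024 entries, each 4 bytes long. The first three bytes give the offset, in 4 KiB sectors from the start of the file, at which the chunk's data begins, and the last byte gives the number of 4 KiB sectors the chunk occupies. Multiplying the offset by 4096 gives the exact start of the chunk in bytes, and multiplying the size by 4096 gives the space allocated to it in bytes; the allocation ends at byte (offset + size) * 4096, and any part of the last sector beyond the chunk's actual data is unused padding. If you know the X and Z of the chunk you're looking for, you can find the byte position of its location entry using the formula ((x % 32) + (z % 32) * 32) * 4, where x % 32 must yield a value from 0 to 31 even for negative coordinates (in languages whose % can return a negative result, use x & 31 instead). If the offset and size are both 0, then the chunk at that location hasn't been generated yet.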
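The entry-position formula in Python. Python's % already returns a non-negative result for a positive divisor, so negative chunk coordinates work as-is; the sketch uses & 31, the equivalent form that also behaves correctly in languages such as C or Java. location_entry_position is a hypothetical helper name:

```python
def location_entry_position(chunk_x: int, chunk_z: int) -> int:
    """Byte position of a chunk's 4-byte entry in the location table."""
    local_x = chunk_x & 31  # same as chunk_x % 32 in Python, even when negative
    local_z = chunk_z & 31
    return (local_x + local_z * 32) * 4


print(location_entry_position(0, 0))      # 0
print(location_entry_position(31, 31))    # 4092 (the last entry)
print(location_entry_position(81, -39))   # 3268
```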

As an example, this is the location entry for the first chunk in a region:

                  Offset (3 bytes)   Size (1 byte)
Decoded           2                  1
On disk (in hex)  00 00 02           01
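Decoding the four example bytes above straight from raw bytes, treating the first three as a big-endian offset and the last as the sector count. A small sketch with a hypothetical decode_location_bytes helper:

```python
from typing import Tuple


def decode_location_bytes(entry: bytes) -> Tuple[int, int]:
    """Return (offset_in_sectors, size_in_sectors) for a raw 4-byte entry."""
    if len(entry) != 4:
        raise ValueError("a location entry is exactly 4 bytes")
    offset = int.from_bytes(entry[:3], "big")
    size = entry[3]
    return offset, size


offset, size = decode_location_bytes(bytes.fromhex("00000201"))
print(offset, size)                # 2 1
print(offset * 4096, size * 4096)  # 8192 4096
```

An offset of 8192 bytes places this first chunk immediately after the two 4 KiB header tables.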

Here's an example in Python to decode a location entry:

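def chunk_location(l):
    """
    Returns the offset (in bytes) and size (in bytes) of the chunk for
    the given location entry, where l is the 4-byte entry read as a
    big-endian unsigned integer.
    """
    offset = (l >> 8) & 0xFFFFFF  # upper 3 bytes: offset in 4 KiB sectors
    size = l & 0xFF               # lowest byte: length in 4 KiB sectors

    return (offset * 4096, size * 4096)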
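The entry-position formula also runs in reverse: table index i belongs to the chunk at local position <i % 32, i // 32> within the region. A sketch that lists every generated chunk in a location table given as 1024 integers; generated_chunks is a hypothetical helper that inlines the same decoding as chunk_location:

```python
from typing import Iterator, List, Tuple


def generated_chunks(locations: List[int]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (local_x, local_z, byte_offset, byte_size) for each present chunk."""
    for index, entry in enumerate(locations):
        if entry == 0:
            continue  # offset and size both 0: chunk not generated
        local_x, local_z = index % 32, index // 32
        offset = (entry >> 8) & 0xFFFFFF
        size = entry & 0xFF
        yield local_x, local_z, offset * 4096, size * 4096


table = [0] * 1024
table[0] = 0x00000201    # chunk <0, 0>: starts at sector 2, 1 sector long
table[33] = 0x00000301   # chunk <1, 1>: starts at sector 3, 1 sector long
print(list(generated_chunks(table)))
# [(0, 0, 8192, 4096), (1, 1, 12288, 4096)]
```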

Timestamp Table

The timestamp table is composed of 1024 timestamps, each a 4-byte big-endian integer giving the time the chunk was last modified, in seconds since the Unix epoch. The entries are in the same order as the location table, so the chunk whose location is at location_table[15] has its timestamp at timestamp_table[15].
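Looking up a chunk's last-modified time uses the same index as the location table. A minimal sketch, assuming the value is seconds since the Unix epoch (UTC) and treating 0 as "no timestamp"; chunk_timestamp is a hypothetical helper that takes the timestamp table as a list of ints:

```python
from datetime import datetime, timezone
from typing import List, Optional


def chunk_timestamp(timestamps: List[int], chunk_x: int, chunk_z: int) -> Optional[datetime]:
    """Last-modified time of a chunk, or None if its entry is 0."""
    index = (chunk_x & 31) + (chunk_z & 31) * 32  # same index as the location table
    seconds = timestamps[index]
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


table = [0] * 1024
table[15] = 1_000_000_000
print(chunk_timestamp(table, 15, 0))   # 2001-09-09 01:46:40+00:00
```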

Chunk Header

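Each chunk's data begins with an additional 5-byte header, followed by the chunk itself, which is stored as compressed NBT. The length is the number of bytes that follow the length field, counting both the compression scheme byte and the compressed data.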

                  Length (in bytes)   Compression scheme
Decoded           528                 2
On disk (in hex)  00 00 02 10         02
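The example header above can be parsed with struct. A minimal sketch, assuming a big-endian 4-byte length followed by the 1-byte compression scheme; parse_chunk_header is a hypothetical helper name:

```python
import struct
from typing import Tuple


def parse_chunk_header(header: bytes) -> Tuple[int, int]:
    """Return (length, compression_scheme) from a 5-byte chunk header."""
    # ">IB" = big-endian unsigned 32-bit length, then one unsigned byte.
    length, scheme = struct.unpack(">IB", header[:5])
    return length, scheme


print(parse_chunk_header(bytes.fromhex("0000021002")))  # (528, 2)
```

Since the length counts the scheme byte, the compressed payload in this example is 527 bytes long.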

There are two possible values for the compression scheme. If it is a 1, the following chunk is compressed using gzip. If it's a 2, the following chunk is compressed with zlib. In practice, you will only ever encounter chunks compressed using zlib.
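Putting the pieces together, a minimal sketch that locates one chunk in an open region file, reads its 5-byte header, and decompresses the payload with the matching scheme. It returns the raw, uncompressed NBT bytes; parsing the NBT itself is not covered. read_chunk_nbt is a hypothetical name, and the code assumes the big-endian layout described above:

```python
import gzip
import struct
import zlib
from typing import BinaryIO, Optional

SECTOR_SIZE = 4096


def read_chunk_nbt(f: BinaryIO, chunk_x: int, chunk_z: int) -> Optional[bytes]:
    """Return the decompressed NBT bytes for a chunk, or None if absent."""
    # Location entry: 3-byte sector offset followed by a 1-byte sector count.
    f.seek(((chunk_x & 31) + (chunk_z & 31) * 32) * 4)
    (entry,) = struct.unpack(">I", f.read(4))
    offset, sectors = entry >> 8, entry & 0xFF
    if offset == 0 and sectors == 0:
        return None  # chunk not generated yet

    # Chunk header: 4-byte length (includes the scheme byte) + 1-byte scheme.
    f.seek(offset * SECTOR_SIZE)
    length, scheme = struct.unpack(">IB", f.read(5))
    payload = f.read(length - 1)

    if scheme == 1:
        return gzip.decompress(payload)
    if scheme == 2:
        return zlib.decompress(payload)
    raise ValueError(f"unknown compression scheme {scheme}")
```

Usage would look like `with open("region/r.0.0.mcr", "rb") as f: nbt = read_chunk_nbt(f, 0, 0)`, where the path is only an example.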