Region Files

The latest version of the Minecraft game uses a new, more efficient (but still poor) format for storing a world's chunk data, based on work by Scaevolous.

Background & Reasoning
Prior to this system, each chunk (a 16x16x128 area of a level) was contained within its own file on disk, two folders down. The first folder represented the x region, and the second the z region. This (very obviously) has several issues associated with it, mainly:

 * Excessive filesystem activity,
 * Heavy fragmentation,
 * Unneeded filesystem overhead (a 90-byte compressed chunk of mostly dirt and air will still consume at a minimum one sector, generally more),
 * Directory trees on large worlds that grow large enough to crash Windows Explorer,
 * A large multiplayer world would eventually run out of file handles (by default, only 1024 handles can be opened by a single process), causing the server to stall, crash, or otherwise fail (the particular behaviour has changed between versions).

To improve performance, Scaevolous created an unofficial mod called McRegion. This mod groups chunks into regions, each of which contains a 32 by 32 group of chunks. In theory, this improves a few things:
 * Reduces filesystem activity & overhead considerably,
 * Reduces fragmentation (not directly, but the OS or underlying drive can intelligently improve it),
 * Far, far fewer handles need to be open at any given time, allowing more of the level to be loaded at any one time,
 * Reduces the number of files created considerably (no more crashing Explorer).

File Names
In the official client and server, each region is stored within the "region" folder within the level folder. The naming scheme for region files is very simple: each file is named r.x.z.mcr, where r is a meaningless prefix found on all region files, x is the x coordinate of the region, and z is the z coordinate of the region. The region x and z can be found simply by dividing the chunk's x or z by 32, and then flooring the result. Here's an example in Python, where given a chunk at <81, -39>, the region filename can be found:

    >>> region_xz = lambda x, z: (x // 32, z // 32)
    >>> region_xz(81, -39)
    (2, -2)

So, this chunk would reside in r.2.-2.mcr.

Here's an example in Java, where given a chunk at <-152, 15>:

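    int x = -152;
    int z = 15;
    // Divide by 32.0: integer division would truncate toward zero before Math.floor runs.
    String regionFileName = "r." + (int) Math.floor(x / 32.0) + "." + (int) Math.floor(z / 32.0) + ".mcr";

So, this chunk would reside in r.-5.0.mcr.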
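
Both worked examples can be checked with one small Python helper. This is a minimal sketch: the name region_filename is made up for illustration, and it assumes the r.x.z.mcr pattern built by the Java snippet above.

```python
def region_filename(chunk_x, chunk_z):
    """Return the name of the region file holding the given chunk."""
    # Floor division rounds toward negative infinity, which is what the
    # format needs for negative chunk coordinates.
    region_x = chunk_x // 32
    region_z = chunk_z // 32
    return "r.%d.%d.mcr" % (region_x, region_z)


print(region_filename(81, -39))   # r.2.-2.mcr
print(region_filename(-152, 15))  # r.-5.0.mcr
```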

A simple-to-use Java tool for calculating region filenames is also available for download.

Structure
Every region file begins with two 4 KiB tables (each composed of 1024 4-byte integers), with the first table containing the location of each chunk, and the second table the last-modified timestamp of that chunk.
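
As a rough illustration of this layout, the header can be split into its two tables with Python's struct module. This is only a sketch: the function name read_header_tables is made up here, and it assumes the integers are stored big-endian, which is consistent with the offset occupying the first (high) three bytes of each location entry.

```python
import struct

SECTOR_SIZE = 4096  # each table fills exactly one 4 KiB sector


def read_header_tables(data):
    """Split the first 8 KiB of a region file into its two tables.

    `data` is the raw bytes of the file (or at least its first 8192 bytes).
    Returns (locations, timestamps), each a tuple of 1024 unsigned integers.
    """
    if len(data) < 2 * SECTOR_SIZE:
        raise ValueError("region header is truncated")
    # ">1024I" means 1024 big-endian unsigned 32-bit integers (assumed order).
    locations = struct.unpack(">1024I", data[:SECTOR_SIZE])
    timestamps = struct.unpack(">1024I", data[SECTOR_SIZE:2 * SECTOR_SIZE])
    return locations, timestamps
```

With a real file, one would pass in the first 8192 bytes, for instance the result of open(path, "rb").read(8192), where path is whatever region file is at hand.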

Location Table
The location table is composed of 1024 entries, each 4 bytes long. The first three bytes indicate the offset in the file where the chunk may be found, and the last byte is the size of the chunk. Both values are counted in 4096-byte sectors, so multiplying them by 4096 gives you the exact start of the chunk in bytes and its size in bytes, from which its end follows. If you know the x and z of the chunk you're looking for, you can find its location entry using the formula: 4 * ((x mod 32) + (z mod 32) * 32). If the offset and size are both 0, then the chunk at that location hasn't been generated yet.
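
The lookup of a location entry can be sketched in Python as follows. The function name is illustrative, and the formula used, 4 * ((x mod 32) + (z mod 32) * 32), is the commonly documented one for this format, so treat it as an assumption rather than something taken from the game's source.

```python
def location_entry_offset(chunk_x, chunk_z):
    """Byte offset of a chunk's entry inside the location table."""
    # % 32 maps any chunk coordinate, including negative ones, onto the
    # chunk's 0..31 position inside its region (Python's % never returns
    # a negative result for a positive modulus).
    local_x = chunk_x % 32
    local_z = chunk_z % 32
    # Entries are 4 bytes each, laid out row by row along x.
    return 4 * (local_x + local_z * 32)


print(location_entry_offset(0, 0))     # 0
print(location_entry_offset(81, -39))  # 3268
```

In languages where the remainder operator can return negative values (Java, C), a bitwise AND with 31 gives the same non-negative result.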

Here's an example in Python to decode a location entry:

    def chunk_location(l):
        """
        Returns the offset (in bytes) and size (in bytes) of the chunk
        for the given location entry.
        """
        offset = (l >> 8) & 0xFFFFFF
        size = l & 0xFF
        return (offset * 4096, size * 4096)

Timestamp Table
The timestamp table is composed of 1024 timestamps, each a 4-byte integer. Each is the time that the chunk was last modified, and they are in the same order as the location table. Thus, the chunk whose location entry is at byte offset n has its timestamp at byte offset n + 4096.
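
Because the two tables share the same order and the timestamp table directly follows the 4 KiB location table, a timestamp lookup is a fixed shift from the location entry. A minimal Python sketch, assuming big-endian integers and a hypothetical helper name timestamp_for:

```python
import struct

TABLE_SIZE = 4096  # size of the location table that precedes the timestamps


def timestamp_for(header, location_offset):
    """Return the raw timestamp paired with a location entry.

    `header` holds at least the first 8192 bytes of the region file, and
    `location_offset` is the byte offset of the chunk's location entry.
    """
    timestamp_offset = location_offset + TABLE_SIZE
    # ">I" reads one big-endian unsigned 32-bit integer (assumed byte order).
    (timestamp,) = struct.unpack_from(">I", header, timestamp_offset)
    return timestamp
```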

Chunk Header
Each chunk has an additional 5-byte header, followed by the actual chunk, which is stored as NBT. The first four bytes of the header give the length in bytes of the data that follows them, and the fifth byte is the compression scheme. There are two possible values for the compression scheme. If it is 1, the following chunk is compressed using gzip. If it's 2, the following chunk is compressed with zlib. In practice, you will only ever encounter chunks compressed using zlib.
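
Reading a chunk then comes down to parsing the header and handing the rest to the matching decompressor. The Python sketch below makes several assumptions that should be checked against real files: the header is a 4-byte big-endian length followed by a 1-byte compression scheme, the length counts the scheme byte, and the scheme values are 1 for gzip and 2 for zlib. The name read_chunk is made up for illustration.

```python
import gzip
import struct
import zlib

COMPRESSION_GZIP = 1  # assumed value for gzip
COMPRESSION_ZLIB = 2  # assumed value for zlib


def read_chunk(data, offset):
    """Return the decompressed NBT bytes of the chunk starting at `offset`.

    `data` is the whole region file as bytes; `offset` is the chunk's byte
    offset as computed from its location entry.
    """
    # 4-byte big-endian length, then a 1-byte compression scheme.
    length, scheme = struct.unpack_from(">IB", data, offset)
    # The length is assumed to count the scheme byte, so the compressed
    # payload is length - 1 bytes long.
    start = offset + 5
    payload = data[start:start + length - 1]
    if scheme == COMPRESSION_GZIP:
        return gzip.decompress(payload)
    if scheme == COMPRESSION_ZLIB:
        return zlib.decompress(payload)
    raise ValueError("unknown compression scheme: %d" % scheme)
```

The result is still raw NBT, so a separate NBT parser is needed to turn it into block data.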