Difference between revisions of "NBT"

From wiki.vg
m (Categorizing)
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''The following is a slightly modified mirror of the no longer existent NBT.txt '''
+
NBT (Named Binary Tag) is a tag-based binary format designed to carry binary data accompanied by additional data.
  
----
+
An NBT file consists of a single root TAG_Compound and is compressed with GZip.
  
NBT (Named Binary Tag) is a tag based binary format designed to carry large amounts of binary data with smaller amounts of additional data.
+
A single named tag is structured as follows:
An NBT file consists of a single GZIPped Named Tag of type TAG_Compound.
 
  
A Named Tag has the following format:
+
{| class="wikitable" style="margin: 1em auto 1em auto;"
 +
! scope="col" | Tag Type
 +
! scope="col" | Name Length
 +
! scope="col" | Name
 +
! scope="col" | Payload
 +
|-
 +
| 1 byte || 2 bytes, 16 bit integer, signed, big endian || UTF-8 Encoded String || Depends on type of tag
 +
|}
  
    byte tagType
+
There are 11 different types of tags, numbered 0 through 10:
    TAG_String name
 
    [payload]
 
   
 
The tagType is a single byte defining the contents of the payload of the tag.
 
  
The name is a descriptive name, and can be anything (eg "cat", "banana", "Hello World!"). It has nothing to do with the tagType.
+
{| class="wikitable" style="margin: 1em auto 1em auto;"
The purpose for this name is to name tags so parsing is easier and can be made to only look for certain recognized tag names.
+
! scope="col" | Tag Type Value
Exception: If tagType is TAG_End, the name is skipped and assumed to be "".
+
! scope="col" | Name
 +
! scope="col" | Description
 +
! scope="col" | Comments
 +
|-
 +
| 0 || TAG_End || Marks the end of a TAG_Compound. || This tag occurs to end a previously opened TAG_Compound.  This tag has no name, and no payload.
 +
|-
 +
| 1 || TAG_Byte || Payload is a single signed byte (8 bits). ||
 +
|-
 +
| 2 || TAG_Short || Payload is a signed 16 bit integer (big endian). ||
 +
|-
 +
| 3 || TAG_Int || Payload is a signed 32 bit integer (big endian). ||
 +
|-
 +
| 4 || TAG_Long || Payload is a signed 64 bit integer (big endian). ||
 +
|-
 +
| 5 || TAG_Float || Payload is a 32 bit floating point value (big endian, IEEE 754) ||
 +
|-
 +
| 6 || TAG_Double || Payload is a 64 bit floating point value (big endian, IEEE 754) ||
 +
|-
 +
| 7 || TAG_Byte_Array || An array of bytes. || Payload consists of four bytes which form a signed 32 bit integer (big endian) which specifies the length of the remainder of the payload.
 +
|-
 +
| 8 || TAG_String || A string. || Payload consists of two bytes which form a signed 16 bit integer (big endian) which specifies the length of the remainder of the payload.  The remainder of the payload is a UTF-8 encoded string.
 +
|-
 +
| 9 || TAG_List || A list of tags. || Payload consists of one byte which specifies the type of tags found in the list, followed by four bytes which form a signed 32 bit integer (big endian) which specifies the number of tags which form the remainder of the payload.  Tags in the list do not specify their type (i.e. they're missing the first byte) and do not have a name (i.e. they do not have two bytes for the length of their name, nor the bytes which makes up the name).
 +
|-
 +
| 10 || TAG_Compound || The root of nested tags. || Payload consists of sequential named tags.  This sequence of named tags ends when a TAG_End is encountered.  Note that TAG_Compounds can be nested within themselves, so the next TAG_End is not necessarily the end of this TAG_Compound.  Recursion advised.
 +
|}
  
The [payload] varies by tagType.
+
----
  
Note that ONLY Named Tags carry the name and tagType data. Explicitly identified Tags (such as TAG_String above) only contains the payload.
+
Note that none of the examples below are GZip'd
  
The tag types and respective payloads are:
+
----
 
 
    TYPE: 0  NAME: TAG_End
 
    Payload: None.
 
    Note:    This tag is used to mark the end of a list.
 
            Cannot be named! If type 0 appears where a Named Tag is expected, the name is assumed to be "".
 
            (In other words, this Tag is always just a single 0 byte when named, and nothing in all other cases)
 
 
 
    TYPE: 1  NAME: TAG_Byte
 
    Payload: A single signed byte (8 bits)
 
 
 
    TYPE: 2  NAME: TAG_Short
 
    Payload: A signed short (16 bits, big endian)
 
 
 
    TYPE: 3  NAME: TAG_Int
 
    Payload: A signed int (32 bits, big endian)
 
 
 
    TYPE: 4  NAME: TAG_Long
 
    Payload: A signed long (64 bits, big endian)
 
 
 
    TYPE: 5  NAME: TAG_Float
 
    Payload: A floating point value (32 bits, big endian, IEEE 754-2008, binary32)
 
 
 
    TYPE: 6  NAME: TAG_Double
 
    Payload: A floating point value (64 bits, big endian, IEEE 754-2008, binary64)
 
  
    TYPE: 7  NAME: TAG_Byte_Array
+
Decoding example: http://mc.kev009.com/nbt/test.nbt
    Payload: Int length (NOT TAGGED)
 
            An array of bytes of unspecified format. The length of this array is <length> bytes
 
  
    TYPE: 8  NAME: TAG_String
+
The first byte of this file is 10. This means that the first tag is a TAG_Compound (which is to be expected).
    Payload: Short length (NOT TAGGED)
 
            An array of bytes defining a string in UTF-8 format. The length of this array is <length> bytes
 
  
    TYPE: 9  NAME: TAG_List
+
We read two more bytes to get the length of the name of this tag. The next two bytes are 0 and 11, meaning the name is 11 bytes long.  On a little endian system be sure to reverse them before creating a 16 bit signed integer with them.
    Payload: Byte tagId (NOT TAGGED)
 
            Int length (NOT TAGGED)
 
            A sequential list of Tags (not Named Tags), of type <typeId>. The length of this array is <length> Tags
 
    Notes:  All tags share the same type.
 
  
    TYPE: 10 NAME: TAG_Compound
+
We read the next 11 bytes and decode them as per UTF-8. The resulting string is "hello world".
    Payload: A sequential list of Named Tags. This array keeps going until a TAG_End is found.
 
            TAG_End end
 
    Notes:  If there's a nested TAG_Compound within this tag, that one will also have a TAG_End, so simply reading until the next TAG_End will not work.
 
            The names of the named tags have to be unique within each TAG_Compound
 
            The order of the tags is not guaranteed.
 
           
 
           
 
  
 +
Next we move onto the payload, which, since this is a TAG_Compound, is going to be more tags until we reach the TAG_End which corresponds to our TAG_Compound.
  
 +
Therefore, we read the next byte to determine the type of the first tag in the TAG_Compound.  The next byte is an 8 -- TAG_String.
  
Decoding example:
+
The next two bytes tell us that the length of the name of this string is 4, and the next 4 bytes UTF-8 decode into "name".
(Use http://www.minecraft.net/docs/test.nbt to test your implementation)
 
  
 +
Next we read two more bytes to find the length of the string which is the payload of this tag; these two bytes are 0 and 9.  The next 9 bytes UTF-8 decode into "Bananrama".
  
First we start by reading a Named Tag.
+
We read the next byte to get the type of the next named tag, and find that it is 0 -- TAG_End.
After unzipping the stream, the first byte is a 10. That means the tag is a TAG_Compound (as expected by the specification).
 
  
The next two bytes are 0 and 11, meaning the name string consists of 11 UTF-8 characters. In this case, they happen to be "hello world".
+
Therefore, we are done.
That means our root tag is named "hello world". We can now move on to the payload.
 
  
From the specification, we see that TAG_Compound consists of a series of Named Tags, so we read another byte to find the tagType.
+
The result:
It happens to be an 8. The name is 4 letters long, and happens to be "name". Type 8 is TAG_String, meaning we read another two bytes to get the length,
 
then read that many bytes to get the contents. In this case, it's "Bananrama".
 
 
 
So now we know the TAG_Compound contains a TAG_String named "name" with the content "Bananrama"
 
 
 
We move on to reading the next Named Tag, and get a 0. This is TAG_End, which always has an implied name of "". That means that the list of entries
 
in the TAG_Compound is over, and indeed all of the NBT file.
 
 
 
So we ended up with this:
 
  
 
     TAG_Compound("hello world"): 1 entries
 
     TAG_Compound("hello world"): 1 entries
Line 99: Line 78:
 
     }
 
     }
  
 +
For a slightly longer test, use http://mc.kev009.com/nbt/bigtest.nbt
  
 
For a slightly longer test, download http://www.minecraft.net/docs/bigtest.nbt
 
 
You should end up with this:
 
You should end up with this:
  
Line 150: Line 128:
 
     }
 
     }
  
[[Category:Minecraft Alpha]]
+
[[Category:Minecraft Beta]]
 
[[Category:Minecraft Classic]]
 
[[Category:Minecraft Classic]]
 
[[Category:File Formats]]
 
[[Category:File Formats]]

Revision as of 22:48, 29 November 2011

NBT (Named Binary Tag) is a tag-based binary format designed to carry binary data accompanied by additional data.

An NBT file consists of a single root TAG_Compound and is compressed with GZip.
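As a rough illustration of the GZip layer, the sketch below decompresses a buffer only when it starts with the GZip magic number (bytes 0x1f 0x8b) and otherwise returns it unchanged. The helper name `load_nbt_bytes` and the four-byte sample document are assumptions made for this example and do not come from the original text.

```python
import gzip

# Hypothetical sample: a root TAG_Compound (type 10) with an empty name
# (length 0, 0) that is closed immediately by TAG_End (0).
raw = bytes([10, 0, 0, 0])
compressed = gzip.compress(raw)

def load_nbt_bytes(blob):
    """Return uncompressed NBT bytes; decompress only if the GZip magic is present."""
    if blob[:2] == b"\x1f\x8b":
        return gzip.decompress(blob)
    return blob

data = load_nbt_bytes(compressed)
print(data[0])  # first byte of the uncompressed stream is the root tag type
```

Checking for the magic number lets the same helper accept both compressed files and uncompressed buffers, since an uncompressed NBT stream begins with the root tag type (10) rather than 0x1f.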

A single named tag is structured as follows:

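{| class="wikitable" style="margin: 1em auto 1em auto;"
! scope="col" | Tag Type
! scope="col" | Name Length
! scope="col" | Name
! scope="col" | Payload
|-
| 1 byte || 2 bytes, 16 bit integer, signed, big endian || UTF-8 Encoded String || Depends on type of tag
|}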
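The layout above can be read with a few lines of code. This is a minimal sketch, assuming the data is available as a binary stream; the function name `read_named_tag_header` is hypothetical and the payload is left for the caller to read.

```python
import io
import struct

def read_named_tag_header(stream):
    """Read the tag type and name that precede a named tag's payload."""
    tag_type = stream.read(1)[0]               # 1 byte: tag type
    if tag_type == 0:                          # TAG_End has no name and no payload
        return tag_type, ""
    (name_length,) = struct.unpack(">h", stream.read(2))  # signed 16 bit, big endian
    name = stream.read(name_length).decode("utf-8")
    return tag_type, name

# Sample header: type 10, name length 11, then the 11 name bytes.
header = bytes([10, 0, 11]) + b"hello world"
tag_type, name = read_named_tag_header(io.BytesIO(header))
print(tag_type, name)  # 10 hello world
```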

There are 11 different types of tags, numbered 0 through 10:

{| class="wikitable" style="margin: 1em auto 1em auto;"
! scope="col" | Tag Type Value
! scope="col" | Name
! scope="col" | Description
! scope="col" | Comments
|-
| 0 || TAG_End || Marks the end of a TAG_Compound. || This tag occurs to end a previously opened TAG_Compound.  This tag has no name, and no payload.
|-
| 1 || TAG_Byte || Payload is a single signed byte (8 bits). ||
|-
| 2 || TAG_Short || Payload is a signed 16 bit integer (big endian). ||
|-
| 3 || TAG_Int || Payload is a signed 32 bit integer (big endian). ||
|-
| 4 || TAG_Long || Payload is a signed 64 bit integer (big endian). ||
|-
| 5 || TAG_Float || Payload is a 32 bit floating point value (big endian, IEEE 754). ||
|-
| 6 || TAG_Double || Payload is a 64 bit floating point value (big endian, IEEE 754). ||
|-
| 7 || TAG_Byte_Array || An array of bytes. || Payload consists of four bytes which form a signed 32 bit integer (big endian) which specifies the length of the remainder of the payload.
|-
| 8 || TAG_String || A string. || Payload consists of two bytes which form a signed 16 bit integer (big endian) which specifies the length of the remainder of the payload.  The remainder of the payload is a UTF-8 encoded string.
|-
| 9 || TAG_List || A list of tags. || Payload consists of one byte which specifies the type of tags found in the list, followed by four bytes which form a signed 32 bit integer (big endian) which specifies the number of tags which form the remainder of the payload.  Tags in the list do not specify their type (i.e. they're missing the first byte) and do not have a name (i.e. they do not have two bytes for the length of their name, nor the bytes which make up the name).
|-
| 10 || TAG_Compound || The root of nested tags. || Payload consists of sequential named tags.  This sequence of named tags ends when a TAG_End is encountered.  Note that TAG_Compounds can be nested within themselves, so the next TAG_End is not necessarily the end of this TAG_Compound.  Recursion advised.
|}
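The payload rules in the table map naturally onto one recursive function, as the TAG_Compound comment suggests. The following is a sketch rather than a reference implementation: the names `read_payload`, `read_fixed` and `FIXED` are made up for this example, and a compound is returned as a Python dict, which assumes names are unique within it (a repeated name would overwrite the earlier entry).

```python
import io
import struct

TAG_END, TAG_BYTE, TAG_SHORT, TAG_INT, TAG_LONG = 0, 1, 2, 3, 4
TAG_FLOAT, TAG_DOUBLE, TAG_BYTE_ARRAY, TAG_STRING = 5, 6, 7, 8
TAG_LIST, TAG_COMPOUND = 9, 10

# struct format for each fixed-size payload (">" selects big endian).
FIXED = {TAG_BYTE: ">b", TAG_SHORT: ">h", TAG_INT: ">i",
         TAG_LONG: ">q", TAG_FLOAT: ">f", TAG_DOUBLE: ">d"}

def read_fixed(stream, fmt):
    """Read one fixed-size big endian value described by a struct format."""
    return struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0]

def read_payload(stream, tag_type):
    """Read the payload of a tag whose type is already known."""
    if tag_type in FIXED:
        return read_fixed(stream, FIXED[tag_type])
    if tag_type == TAG_BYTE_ARRAY:
        length = read_fixed(stream, ">i")       # 4-byte length prefix
        return stream.read(length)
    if tag_type == TAG_STRING:
        length = read_fixed(stream, ">h")       # 2-byte length prefix
        return stream.read(length).decode("utf-8")
    if tag_type == TAG_LIST:
        item_type = read_fixed(stream, ">b")    # one type shared by all items
        count = read_fixed(stream, ">i")
        # List items carry neither a type byte nor a name.
        return [read_payload(stream, item_type) for _ in range(count)]
    if tag_type == TAG_COMPOUND:
        entries = {}
        while True:
            child_type = read_fixed(stream, ">b")
            if child_type == TAG_END:           # closes this compound only
                return entries
            name = read_payload(stream, TAG_STRING)
            entries[name] = read_payload(stream, child_type)
    raise ValueError(f"unknown tag type {tag_type}")

# A TAG_List payload holding two TAG_Short values, 1 and 2.
shorts = read_payload(io.BytesIO(bytes([2, 0, 0, 0, 2, 0, 1, 0, 2])), TAG_LIST)
print(shorts)  # [1, 2]
```

Because each nested TAG_Compound consumes its own TAG_End inside the recursive call, the outer loop never mistakes an inner terminator for its own.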

Note that none of the example files below are GZip'd, so they can be read directly without decompressing them first.


Decoding example: http://mc.kev009.com/nbt/test.nbt

The first byte of this file is 10. This means that the first tag is a TAG_Compound (which is to be expected).

We read two more bytes to get the length of the name of this tag. The next two bytes are 0 and 11, meaning the name is 11 bytes long. The length is stored big endian, so on a little endian system be sure to reverse the two bytes before interpreting them as a 16 bit signed integer.
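To make the byte-order point concrete, here is a small sketch that decodes the two length bytes in both orders. In Python the byte order is stated explicitly, so no manual reversal is needed; the variable names are illustrative only.

```python
import struct

raw = bytes([0, 11])  # the two name-length bytes from the example file

length_struct = struct.unpack(">h", raw)[0]                      # ">" means big endian
length_int = int.from_bytes(raw, byteorder="big", signed=True)   # same result
wrong = int.from_bytes(raw, byteorder="little", signed=True)     # bytes in the wrong order

print(length_struct, length_int, wrong)  # 11 11 2816
```

Reading the bytes as little endian yields 11 * 256 = 2816, which is the kind of implausibly large name length that usually signals a byte-order mistake.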

We read the next 11 bytes and decode them as per UTF-8. The resulting string is "hello world".

Next we move onto the payload, which, since this is a TAG_Compound, is going to be more tags until we reach the TAG_End which corresponds to our TAG_Compound.

Therefore, we read the next byte to determine the type of the first tag in the TAG_Compound. The next byte is an 8 -- TAG_String.

The next two bytes tell us that the length of the name of this string is 4, and the next 4 bytes UTF-8 decode into "name".

Next we read two more bytes to find the length of the string which is the payload of this tag; these two bytes are 0 and 9. The next 9 bytes UTF-8 decode into "Bananrama".

We read the next byte to get the type of the next named tag, and find that it is 0 -- TAG_End.

Therefore, we are done.

The result:

   TAG_Compound("hello world"): 1 entries
   {
       TAG_String("name"): Bananrama
   }
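The walk-through above can be replayed in code. In this sketch the 33 bytes are reconstructed from the description rather than downloaded from the URL, and the reader handles only the two tag types that appear in this file; `read_string` and `read_compound` are hypothetical helper names.

```python
import io
import struct

# Bytes rebuilt from the walk-through (an assumption, not the downloaded file).
data = (bytes([10]) + struct.pack(">h", 11) + b"hello world"   # root TAG_Compound
        + bytes([8]) + struct.pack(">h", 4) + b"name"          # TAG_String header
        + struct.pack(">h", 9) + b"Bananrama"                  # TAG_String payload
        + bytes([0]))                                          # TAG_End

def read_string(stream):
    """Read a 2-byte big endian length followed by that many UTF-8 bytes."""
    (length,) = struct.unpack(">h", stream.read(2))
    return stream.read(length).decode("utf-8")

def read_compound(stream):
    """Read named tags until TAG_End; only types 8 and 10 are handled here."""
    entries = []
    while True:
        tag_type = stream.read(1)[0]
        if tag_type == 0:
            return entries
        name = read_string(stream)
        if tag_type == 8:
            entries.append(("TAG_String", name, read_string(stream)))
        elif tag_type == 10:
            entries.append(("TAG_Compound", name, read_compound(stream)))
        else:
            raise ValueError(f"tag type {tag_type} not handled in this sketch")

stream = io.BytesIO(data)
root_type = stream.read(1)[0]        # 10, a TAG_Compound
root_name = read_string(stream)      # "hello world"
root_entries = read_compound(stream)
print(root_type, root_name, root_entries)
# 10 hello world [('TAG_String', 'name', 'Bananrama')]
```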

For a slightly longer test, use http://mc.kev009.com/nbt/bigtest.nbt

You should end up with this:

   TAG_Compound("Level"): 11 entries
   {
      TAG_Short("shortTest"): 32767
      TAG_Long("longTest"): 9223372036854775807
      TAG_Float("floatTest"): 0.49823147
      TAG_String("stringTest"): HELLO WORLD THIS IS A TEST STRING !
      TAG_Int("intTest"): 2147483647
      TAG_Compound("nested compound test"): 2 entries
      {
         TAG_Compound("ham"): 2 entries
         {
            TAG_String("name"): Hampus
            TAG_Float("value"): 0.75
         }
         TAG_Compound("egg"): 2 entries
         {
            TAG_String("name"): Eggbert
            TAG_Float("value"): 0.5
         }
      }
      TAG_List("listTest (long)"): 5 entries of type TAG_Long
      {
         TAG_Long: 11
         TAG_Long: 12
         TAG_Long: 13
         TAG_Long: 14
         TAG_Long: 15
      }
      TAG_Byte("byteTest"): 127
      TAG_List("listTest (compound)"): 2 entries of type TAG_Compound
      {
         TAG_Compound: 2 entries
         {
            TAG_String("name"): Compound tag #0
            TAG_Long("created-on"): 1264099775885
         }
         TAG_Compound: 2 entries
         {
            TAG_String("name"): Compound tag #1
            TAG_Long("created-on"): 1264099775885
         }
      }
      TAG_Byte_Array("byteArrayTest (the first 1000 values of (n*n*255+n*7)%100, starting with n=0 (0, 62, 34, 16, 8, ...))"): [1000 bytes]
      TAG_Double("doubleTest"): 0.4931287132182315
   }