Bit packets

From GPWiki

Files:GUITutorial_warn.gif The Game Programming Wiki has moved! Files:GUITutorial_warn.gif

The wiki is now hosted by GameDev.NET at wiki.gamedev.net. All gpwiki.org content has been moved to the new server.

However, the GPWiki forums are still active! Come say hello.

In networking, sometimes having your smallest unit as a byte does not quite cut it. A single boolean is going to take 8 bits unless you pack it into a byte, in which case you better have 8 constant booleans or you will still end up with wasted bits. For most cases, working on a bit level for packets is going overboard and you should check out binary packets instead. For the rest of you, I welcome you to the world of bit-level packets.

Contents

Before you start

Required knowledge

This article requires that you understand binary well enough to perform the basic binary operations (OR, XOR, AND, NOT) along with the ability to work with binary on at least a basic level.

Phrasing overview

Not every system works the same, nor does every programming language use the same naming convention. This article will, for the most part, deal with only naming conventions found in C#.

Byte
A byte will be assumed as an octet (consists of 8 bits).
Value types
An octet will be considered a byte, a 16-bit integer a short, a 32-bit integer a int, a 64-bit integer a long, a 32-bit floating-point integer a float, a 64-bit floating-point integer a double, a true/false statement a boolean where 0=false and 1=true, and a sequence of characters a string.
u- prefix
A u- prefix on a variable will stand for unsigned (positive values only). For example, a uint is a 32-bit unsigned integer, ranging on the values of 0 to (2^32)-1, inclusive.

Layout

Unused bits

Normally in binary packets, using a byte suffices since most variable types consist of a number of bits divisible by a byte. A 16-bit integer (short, integer, int16) consists of two bytes, 32-bit integers (int, long, int32) and floating-point integers (float, single) consist of 4 bytes, and so on. Strings can often be represented in a one or two byte header containing the length, followed by a byte for each character (another approach is using a predefined terminating byte or sequence of bytes). This works well and all but you will end up using more bits than you need. The left-over bits we call wasted bits. This can be commonly found in two forms:

  • Unused bits: These are simply bits that just never get used. You won't have any of these if you only use variable types that can fit into a byte.
  • Unused bit sequences: Your client and server both, for example, store a character's position as a ushort so you send a ushort even though you know the map is only 500x500, making values 501-65535 known by both sides to never be used. Some bits will never be used, while some bit sequences (in our example, those to make the values of 501-512) will simply never even be written.

Unused bits are the easiest to fix as they require just using less bits. Unused bit sequences, on the other hand, require more work both to remove and for the processor to calculate the alternative. Because of this, emphasis in this article will be on the unused bits and not sequences.

Our theoretical network

We are going to assume we have a single server with multiple clients connected. To make things easy, just assume its TCP so we don't have to worry about dropped or out of order packets. One thing TCP will require you to take into consideration is packet fragmentation and combining, but we will completely ignore that in this article since the same rules apply no matter what kind of information the packets have since in the end, packets will be rounded up to the nearest byte.

Our theoretical packet

We are not going to concentrate on anything but the packet content (data), but do keep in mind that there is a lot of overhead to networking that is not exposed to you most of the time. TCP has a 20 byte overhead, UDP has 8, IPv4 has 20, and IPv6 has 40. This means on packet on a TCP/IPv4 connection is going to take 28 bytes + packet data size. If you shrink your packets in half, you are not using half as much bandwidth, and sending multiple packets gets very expensive very fast.

Binary structure

Now that all of that is out of the way, lets get to the fun part. In a byte, there are 8 bits. When reading this out, we often read it in a big-endian format, which means "the big end is first". In other words, we should know that:

Byte1: 0000 0010 = 2
Byte2: 0010 0000 = 32

This is all simple and easy (at least I hope it is for you if you're reading this!), but things get a little messier with multiple bytes. If you have a ushort composed of the two bytes shown above, you can have either:

Assume big-endian:
ushort1: Byte1 Byte2 = 0000 0010 0010 0000 = 1024 + 32 = 1056
ushort2: Byte2 Byte1 = 0010 0000 0000 0010 = 1073741824 + 2 = 1073741826

All we did is change the order in which we view the bytes and the values are nowhere near the same. In fact, without specifying what kind of endianness we are working in, we can only assume the value of the two bytes together as a single value type.

What we are going to use is a big-endian format and treat all values as a whole. No longer will we look at a ushort being 2 bytes. Instead, we will look at it at being 16 bits. This is going to make things a lot less complicated later on when we get into packing variables with bit counts not equal to 8, and thats the whole point of this.

Bit I/O theory

First, lets look at how we will be reading and writing the bits. Since we know it is going to just be a big stream of 0's and 1's, and each value type is going to be treated as a whole and in big-endian format, this makes it look very easy on paper. Lets say we wanted to write 4 bits, "0110" to our output:

outputBits = 0110

Simple enough. Now what if we wanted to write 3 more bits, "111"? All we have to do is append it!

outputBits = 0110111

Alright, no problems there. Just like in English, if you want to add the word "cat" to a sentence, "i has a ", you just add the new value, "cat", giving you "i has a cat". But looking at our outputBits above, when we appended the 3 bits, it not only turned into a value that, as a decimal, means nothing to us, but we can not split it up into the original values without knowing the bit size. This is hardly going to be a problem. In the example used earlier where we know the position is going to be on a scale of (0,0) to (500,500), the client and server can both agree on the size of a position value being 9 bits (0 to 511, inclusive) instead of assuming the highest value allowed, which in our case was 16 bits.

When reading this, the concept is the same. For visual purposes, we will just erase the bits after reading them. In coding, you will probably want to use a pointer defining which bit you are currently on. We will also, when reading, reset the position of the reading pointer to the very first bit and read to the write. This way we will read in the exact same order as we write.

outputBits = 0110111
read 4: result = 0110
outputBits = 111
read 3: result = 111

Bit I/O programming

Unfortunately, programming this is the hard part. Not just programming it, but programming it to work efficiently. Theres no variable types that we can use to just store as many bits as we want. The closest we have would be a string, but that would end up slow and unnecessary. So instead, we are going to have to use a combination of pointers and arrays to get things done.

byte[] buffer
A byte array containing the resulting combination of all the bits. Index 0 will be the first used, progressing through the array each time the byte is full.
int bitPointer
Contains which bit we are currently on. This value can range between 7 (most significant bit, one farthest to the left) and 0 (least significant bit, one farthest to the right).
int bufferPointer
Contains which byte in the buffer we are currently on.

To go with how things were explained above, we will start at the most significant bit (7) and write down to the least significant bit (0), the move on to the next bit. To get things started, lets work with just a single bit. The only hard part here is keeping track where we are and making sure we have enough bits allocated.

Booleans

So far, we have this:

class Program
{
    static void Main(string[] args)
    {
        BitStream b = new BitStream();
        
        // Fill up the buffer with some random values
        for (int i = 0; i < 2; i++) b.Write(true);
        for (int i = 0; i < 3; i++) b.Write(false);
        for (int i = 0; i < 15; i++) b.Write(true);
        for (int i = 0; i < 2; i++) b.Write(false);
        for (int i = 0; i < 35; i++) b.Write(true);
        b.Write(false);
        
        // Reset the pointer so we can read from the beginning
        b.ResetPointer();
 
        // Read the values
        for (int i = 0; i < 58; i++)
            Console.WriteLine("{0}: {1}", i, b.ReadBool());
 
        Console.ReadLine();
    }
}
 
class BitStream
{
    const int AllocLength = 8; // Number of bits we will allocate at a time
    byte[] buffer; // Contains the bits we are working with
    int bitPointer; // Which bit we are currently on
    int bufferPointer; // Which byte we are currently on
 
    public BitStream()
    {
        buffer = new byte[AllocLength]; // Create the buffer with the initial size
        ResetPointer(); // Move to the first bit
    }
 
    public void ResetPointer()
    {
        // Reset the pointer position back to the first bit
        bitPointer = 7; // Start at the highest bit
        bufferPointer = 0; // Start at the first byte in the buffer
    }
 
    void MoveToNextByte()
    {
        bitPointer = 7; // Reset the bit position
        bufferPointer++; // Increase the buffer position
 
        // Resize the buffer if needed
        if (bufferPointer >= buffer.Length)
            Array.Resize<byte>(ref buffer, buffer.Length + AllocLength);
    }
 
    public void Write(bool value)
    {
        // Validate the pointer position, moving to the next byte if needed
        if (bitPointer == -1)
            MoveToNextByte();
 
        // Write a 1 for true or 0 for false
        int bit;
        if (value) bit = 1; else bit = 0; // Convert the bool to a number
        buffer[bufferPointer] |= (byte)(bit << bitPointer); // Set the bit
        bitPointer--; // Increase the bit pointer
    }
 
    public bool ReadBool()
    {
        // Validate the pointer position, moving to the next byte if needed
        if (bitPointer == -1)
            MoveToNextByte();
 
        // Read a 1 for true or 0 for false
        int bit;
        if ((buffer[bufferPointer] & (1 << bitPointer)) > 0) bit = 1; else bit = 0;
        bitPointer--; // Move the pointer to the next bit
        return (bit == 1); // Return true if bit is 1, else return false
    }
}

A lot of this should be quite basic so far. The two lines that might need emphasis are the reading and writing lines themselves.

buffer[bufferPointer] |= (byte)(bit << bitPointer); // Set the bit 

On the left side of the equation, we have our current byte that we are working on. On the right, we first take the bit value (will be either 0 or 1) and move it over to the same bit position we are currently on by shifting it over bitPointer bits. We must convert this to a byte to apply the value to the buffer. When we have these two values, we can OR them together as stated by the |= line. As you should know, this means that we will set the bit if it is not already set. A bit will never end up being set twice since we move the pointer over every operation, so we are basically saying "keep it as 0, or set it as 1". As you can see, actually setting the bit is useless when it is 0 since if you OR a 0 onto a value, the result isn't going to ever change.

Lets assume our buffer contains the bits "11000", which it will after the first two bit write loops in our example. The next value is going to be a 1, so how this will look is:

buffer[bufferPointer] = 11000 000 // These last 3 bits that I separated have yet to be assigned
bit = 1
bitPointer = 2
* 11000000 OR (byte)(1 << 2)
* 11000000 OR (byte)100
* 11000000 OR 00000100
* 11000100

Nothing too hard, right? Good! Now lets move on to the read.

if ((buffer[bufferPointer] & (1 << bitPointer)) > 0) bit = 1; else bit = 0;

The buffer[bufferPointer] part you should already know. You should also already know that (1 << bitPointer) is how we create a value with just the one bit set - the bit we are working with. Using the AND operator on this will basically have every bit but the one we are working with set as 0. This leaves us with just this one bit that can possibly be non-zero, the one bit we are working with. Anything non-zero will mean the bit was set, and a zero will mean the bit wasn't set.

Optimizations can easily be made from here. Along with that, you can easily add support for just about any number of bits. Unfortunately, it will be far from optimized. But lets give it a shot anyways.

Unsigned ints

To add support for some other value types, we will just have to specify the value and the number of bits of that value to use. When we specify the number of bits we want to use, we are just going to use the number of bits starting from the right-most bit and counting to the left. We will start with just unsigned values for now.

public void Write(uint value, int numBits)
    {
        // Loop through the bits of the value, writing them one by one
        for (int i = 0; i < numBits; i++)
            Write((value & (1 << i)) > 0);
    }
 
    public uint ReadUInt(int numBits)
    {
        // Loop through the bits of the value, reading them one by one and placing them in the return variable
        uint ret = 0;
        for (int i = 0; i < numBits; i++)
            if (ReadBool())
                ret |= (uint)(1 << i);
        return ret;
    }

This will write the first numBits number of bits from the variable value. It is horribly slow and inefficient, but it gives you the idea on how it is done.

Signed ints

To add a sign, all it takes is a designated bit stating the value of the sign.

public void Write(int value, int numBits)
    {
        // Apply the sign 
        if (value < 0)
        {
            Write(true);
            value *= -1; // Convert to a positive value
        }
        else
            Write(false);
 
        // Write the rest of the bits. Since we already used one for the sign, we write numBits - 1.
        Write((uint)value, numBits - 1);
    }
 
    public int ReadInt(int numBits)
    {
        // Read the sign
        int sign;
        if (ReadBool()) sign = -1; else sign = 1;
 
        // Read the value, then multiply by the sign to support negatives
        return (int)(ReadUInt(numBits - 1) * sign);
    }

Again, very simple... and very inefficient. The first bit is reserved for the sign, then the rest of the bits are for the value.

Expanding

Theres a lot more to go over still - batch reading and writing bits (more than one bit at a time), writing whole variables at once when possible, support for signed variables and floating-point variables, strings, etc. Hopefully this is enough to get you started until this article can be expanded. :)

Existing BitStreams

The following links contain BitStreams that you can use. It is not suggested that you create your own from scratch unless you really know what you are doing (in which case you probably wouldn't be reading this) and have a lot of time to dedicate towards it. No point in reinventing the wheel if you don't need to!

See Also