Bit packets
From GPWiki
The wiki is now hosted by GameDev.NET at wiki.gamedev.net. All gpwiki.org content has been moved to the new server. However, the GPWiki forums are still active! Come say hello. In networking, sometimes having your smallest unit as a byte does not quite cut it. A single boolean is going to take 8 bits unless you pack it into a byte, in which case you better have 8 constant booleans or you will still end up with wasted bits. For most cases, working on a bit level for packets is going overboard and you should check out binary packets instead. For the rest of you, I welcome you to the world of bit-level packets.
[edit] Before you start[edit] Required knowledgeThis article requires that you understand binary well enough to perform the basic binary operations (OR, XOR, AND, NOT) along with the ability to work with binary on at least a basic level. [edit] Phrasing overviewNot every system works the same, nor does every programming language use the same naming convention. This article will, for the most part, deal with only naming conventions found in C#.
[edit] Layout[edit] Unused bitsNormally in binary packets, using a byte suffices since most variable types consist of a number of bits divisible by a byte. A 16-bit integer (short, integer, int16) consists of two bytes, 32-bit integers (int, long, int32) and floating-point integers (float, single) consist of 4 bytes, and so on. Strings can often be represented in a one or two byte header containing the length, followed by a byte for each character (another approach is using a predefined terminating byte or sequence of bytes). This works well and all but you will end up using more bits than you need. The left-over bits we call wasted bits. This can be commonly found in two forms:
Unused bits are the easiest to fix as they require just using less bits. Unused bit sequences, on the other hand, require more work both to remove and for the processor to calculate the alternative. Because of this, emphasis in this article will be on the unused bits and not sequences. [edit] Our theoretical networkWe are going to assume we have a single server with multiple clients connected. To make things easy, just assume its TCP so we don't have to worry about dropped or out of order packets. One thing TCP will require you to take into consideration is packet fragmentation and combining, but we will completely ignore that in this article since the same rules apply no matter what kind of information the packets have since in the end, packets will be rounded up to the nearest byte. [edit] Our theoretical packetWe are not going to concentrate on anything but the packet content (data), but do keep in mind that there is a lot of overhead to networking that is not exposed to you most of the time. TCP has a 20 byte overhead, UDP has 8, IPv4 has 20, and IPv6 has 40. This means on packet on a TCP/IPv4 connection is going to take 28 bytes + packet data size. If you shrink your packets in half, you are not using half as much bandwidth, and sending multiple packets gets very expensive very fast. [edit] Binary structureNow that all of that is out of the way, lets get to the fun part. In a byte, there are 8 bits. When reading this out, we often read it in a big-endian format, which means "the big end is first". In other words, we should know that: Byte1: 0000 0010 = 2 Byte2: 0010 0000 = 32 This is all simple and easy (at least I hope it is for you if you're reading this!), but things get a little messier with multiple bytes. If you have a ushort composed of the two bytes shown above, you can have either: Assume big-endian: ushort1: Byte1 Byte2 = 0000 0010 0010 0000 = 1024 + 32 = 1056 ushort2: Byte2 Byte1 = 0010 0000 0000 0010 = 1073741824 + 2 = 1073741826 All we did is change the order in which we view the bytes and the values are nowhere near the same. In fact, without specifying what kind of endianness we are working in, we can only assume the value of the two bytes together as a single value type. What we are going to use is a big-endian format and treat all values as a whole. No longer will we look at a ushort being 2 bytes. Instead, we will look at it at being 16 bits. This is going to make things a lot less complicated later on when we get into packing variables with bit counts not equal to 8, and thats the whole point of this. [edit] Bit I/O theoryFirst, lets look at how we will be reading and writing the bits. Since we know it is going to just be a big stream of 0's and 1's, and each value type is going to be treated as a whole and in big-endian format, this makes it look very easy on paper. Lets say we wanted to write 4 bits, "0110" to our output: outputBits = 0110 Simple enough. Now what if we wanted to write 3 more bits, "111"? All we have to do is append it! outputBits = 0110111 Alright, no problems there. Just like in English, if you want to add the word "cat" to a sentence, "i has a ", you just add the new value, "cat", giving you "i has a cat". But looking at our outputBits above, when we appended the 3 bits, it not only turned into a value that, as a decimal, means nothing to us, but we can not split it up into the original values without knowing the bit size. This is hardly going to be a problem. In the example used earlier where we know the position is going to be on a scale of (0,0) to (500,500), the client and server can both agree on the size of a position value being 9 bits (0 to 511, inclusive) instead of assuming the highest value allowed, which in our case was 16 bits. When reading this, the concept is the same. For visual purposes, we will just erase the bits after reading them. In coding, you will probably want to use a pointer defining which bit you are currently on. We will also, when reading, reset the position of the reading pointer to the very first bit and read to the write. This way we will read in the exact same order as we write. outputBits = 0110111 read 4: result = 0110 outputBits = 111 read 3: result = 111 [edit] Bit I/O programmingUnfortunately, programming this is the hard part. Not just programming it, but programming it to work efficiently. Theres no variable types that we can use to just store as many bits as we want. The closest we have would be a string, but that would end up slow and unnecessary. So instead, we are going to have to use a combination of pointers and arrays to get things done.
To go with how things were explained above, we will start at the most significant bit (7) and write down to the least significant bit (0), the move on to the next bit. To get things started, lets work with just a single bit. The only hard part here is keeping track where we are and making sure we have enough bits allocated. [edit] BooleansSo far, we have this: class Program { static void Main(string[] args) { BitStream b = new BitStream(); // Fill up the buffer with some random values for (int i = 0; i < 2; i++) b.Write(true); for (int i = 0; i < 3; i++) b.Write(false); for (int i = 0; i < 15; i++) b.Write(true); for (int i = 0; i < 2; i++) b.Write(false); for (int i = 0; i < 35; i++) b.Write(true); b.Write(false); // Reset the pointer so we can read from the beginning b.ResetPointer(); // Read the values for (int i = 0; i < 58; i++) Console.WriteLine("{0}: {1}", i, b.ReadBool()); Console.ReadLine(); } } class BitStream { const int AllocLength = 8; // Number of bits we will allocate at a time byte[] buffer; // Contains the bits we are working with int bitPointer; // Which bit we are currently on int bufferPointer; // Which byte we are currently on public BitStream() { buffer = new byte[AllocLength]; // Create the buffer with the initial size ResetPointer(); // Move to the first bit } public void ResetPointer() { // Reset the pointer position back to the first bit bitPointer = 7; // Start at the highest bit bufferPointer = 0; // Start at the first byte in the buffer } void MoveToNextByte() { bitPointer = 7; // Reset the bit position bufferPointer++; // Increase the buffer position // Resize the buffer if needed if (bufferPointer >= buffer.Length) Array.Resize<byte>(ref buffer, buffer.Length + AllocLength); } public void Write(bool value) { // Validate the pointer position, moving to the next byte if needed if (bitPointer == -1) MoveToNextByte(); // Write a 1 for true or 0 for false int bit; if (value) bit = 1; else bit = 0; // Convert the bool to a number buffer[bufferPointer] |= (byte)(bit << bitPointer); // Set the bit bitPointer--; // Increase the bit pointer } public bool ReadBool() { // Validate the pointer position, moving to the next byte if needed if (bitPointer == -1) MoveToNextByte(); // Read a 1 for true or 0 for false int bit; if ((buffer[bufferPointer] & (1 << bitPointer)) > 0) bit = 1; else bit = 0; bitPointer--; // Move the pointer to the next bit return (bit == 1); // Return true if bit is 1, else return false } } A lot of this should be quite basic so far. The two lines that might need emphasis are the reading and writing lines themselves. buffer[bufferPointer] |= (byte)(bit << bitPointer); // Set the bit On the left side of the equation, we have our current byte that we are working on. On the right, we first take the bit value (will be either 0 or 1) and move it over to the same bit position we are currently on by shifting it over bitPointer bits. We must convert this to a byte to apply the value to the buffer. When we have these two values, we can OR them together as stated by the |= line. As you should know, this means that we will set the bit if it is not already set. A bit will never end up being set twice since we move the pointer over every operation, so we are basically saying "keep it as 0, or set it as 1". As you can see, actually setting the bit is useless when it is 0 since if you OR a 0 onto a value, the result isn't going to ever change. Lets assume our buffer contains the bits "11000", which it will after the first two bit write loops in our example. The next value is going to be a 1, so how this will look is: buffer[bufferPointer] = 11000 000 // These last 3 bits that I separated have yet to be assigned bit = 1 bitPointer = 2 * 11000000 OR (byte)(1 << 2) * 11000000 OR (byte)100 * 11000000 OR 00000100 * 11000100 Nothing too hard, right? Good! Now lets move on to the read. if ((buffer[bufferPointer] & (1 << bitPointer)) > 0) bit = 1; else bit = 0; The buffer[bufferPointer] part you should already know. You should also already know that (1 << bitPointer) is how we create a value with just the one bit set - the bit we are working with. Using the AND operator on this will basically have every bit but the one we are working with set as 0. This leaves us with just this one bit that can possibly be non-zero, the one bit we are working with. Anything non-zero will mean the bit was set, and a zero will mean the bit wasn't set. Optimizations can easily be made from here. Along with that, you can easily add support for just about any number of bits. Unfortunately, it will be far from optimized. But lets give it a shot anyways. [edit] Unsigned intsTo add support for some other value types, we will just have to specify the value and the number of bits of that value to use. When we specify the number of bits we want to use, we are just going to use the number of bits starting from the right-most bit and counting to the left. We will start with just unsigned values for now. public void Write(uint value, int numBits) { // Loop through the bits of the value, writing them one by one for (int i = 0; i < numBits; i++) Write((value & (1 << i)) > 0); } public uint ReadUInt(int numBits) { // Loop through the bits of the value, reading them one by one and placing them in the return variable uint ret = 0; for (int i = 0; i < numBits; i++) if (ReadBool()) ret |= (uint)(1 << i); return ret; } This will write the first numBits number of bits from the variable value. It is horribly slow and inefficient, but it gives you the idea on how it is done. [edit] Signed intsTo add a sign, all it takes is a designated bit stating the value of the sign. public void Write(int value, int numBits) { // Apply the sign if (value < 0) { Write(true); value *= -1; // Convert to a positive value } else Write(false); // Write the rest of the bits. Since we already used one for the sign, we write numBits - 1. Write((uint)value, numBits - 1); } public int ReadInt(int numBits) { // Read the sign int sign; if (ReadBool()) sign = -1; else sign = 1; // Read the value, then multiply by the sign to support negatives return (int)(ReadUInt(numBits - 1) * sign); } Again, very simple... and very inefficient. The first bit is reserved for the sign, then the rest of the bits are for the value. [edit] ExpandingTheres a lot more to go over still - batch reading and writing bits (more than one bit at a time), writing whole variables at once when possible, support for signed variables and floating-point variables, strings, etc. Hopefully this is enough to get you started until this article can be expanded. :) [edit] Existing BitStreamsThe following links contain BitStreams that you can use. It is not suggested that you create your own from scratch unless you really know what you are doing (in which case you probably wouldn't be reading this) and have a lot of time to dedicate towards it. No point in reinventing the wheel if you don't need to!
[edit] See Also |


