DirectX:Direct3D:Tutorials:Implementing a simple renderstate manager
From GPWiki
Introduction

This tutorial is all about the management of rendering states. Renderstates are very common in DirectX, and they change a lot during the rendering of a single frame. One of the things you might have read in the past is that renderstate changes should be kept to a minimum: calling the renderstate functions of DirectX costs time, and too many calls can limit your application's frame rate.

The DirectX9 SDK comes with an example showing how to do DirectX9-specific renderstate managing. Since a lot of people are still using DirectX7 or DirectX8, this tutorial focuses on a general renderstate manager, one that can be used in the last three DirectX versions. Although this tutorial is written in C++, everything applies to Visual Basic 6 or VB.NET as well. Porting the given example code should be very straightforward. Also note that the end of the tutorial has more in-depth information regarding the new DirectX9 functionality.

A note on pure devices: a pure device is a special kind of device in DirectX. When you tell DirectX to make the device pure (an extra creation flag, D3DCREATE_PUREDEVICE, next to D3DCREATE_HARDWARE_VERTEXPROCESSING), it is impossible to use functions like GetTransform. Most of the Get functions won't work anymore! Although I'm not completely sure, DirectX might already perform some caching when it is in non-pure mode. I haven't found real factual information regarding this subject though.

Renderstates

Renderstates are set using a number of methods on your Direct3D device, such as SetRenderState and SetTextureStageState.
Note: Some of the SetTextureStageState values were moved to SetSamplerState in DirectX9.
The renderstates control certain functions of your Direct3D device, for example whether alpha blending is enabled (D3DRS_ALPHABLENDENABLE) or which filter is used when a texture is sampled.

Pixelshaders and Vertexshaders

DirectX8 and DirectX9 support pixelshaders and vertexshaders. Changing these unnecessarily creates a performance penalty as well, according to the DirectX SDK documentation and various forums. We will take care of this as well.

The approach

Our state manager is going to be a C++ class, and will be used as a singleton. A singleton is a design pattern that allows only one single instance of a class. We only need a single instance of our class because it will take care of everything. In the unlikely event of having multiple Direct3D devices, you could remove the singleton code and give yourself the freedom to create and delete instances of the class.

The goal of our class is to mimic the Direct3D renderstate interfaces. This way, we only have to search and replace in our code to make use of the renderstate manager. A short example:

    m_pDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    m_pDevice->SetPixelShader(myPixelshaderhandle);
    m_pDevice->SetTextureStageState(0, D3DTSS_MINFILTER, D3DTEXF_LINEAR);

Simply becomes:

    StateManager->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    StateManager->SetPixelShader(myPixelshaderhandle);
    StateManager->SetTextureStageState(0, D3DTSS_MINFILTER, D3DTEXF_LINEAR);

As you can see, nothing that a simple search and replace can't handle.

What happens in our code

The code of our copied DirectX functions is fairly straightforward. We will use STL containers (or a VB collection) to store the renderstate changes that are being sent to the state manager. But before we change *anything*, we first perform two checks:

- Is this state already stored in the cache?
- Is the cached value equal to the value we are asked to set?
If either of those checks fails, we store the new value in our cache and call the same function again, but this time the real Direct3D function. We simply return any error code returned by the Direct3D function. This way, the function that actually called the state manager can still check for any errors that might have occurred when trying to change the renderstate.

Writing the StateManager code

This chapter deals with the actual code of the state manager. If you want to download the complete code, please download this file <file.zip>. (You can use this code freely wherever you want; for more information on the license see the file included in the zip file.) Some parts might look quite similar to the DirectX9 SDK example, which is logical, because this state manager was written after I found that example.

We start with a simple define. This define sets the maximum number of cached stages. There is no point in caching stages which aren't used, and since we need to allocate memory for every stage being cached, we had better set a limit. The default limit is 2, which means that stages 0 and 1 will be cached by this class.

    /// The number of stages to cache.
    /// Remaining stages are simply passed through with no redundancy filtering.
    /// For this sample, the first two stages are cached, while the remainder are passed through.
    #define CACHED_STAGES 2

Then we have a template. This is only a small part of it; the complete internals will be discussed at a later point:

    #include <map>

    template < typename _Kty, typename _Ty >
    class multicache
    {
    protected:
        std::map< _Kty, _Ty > cache;   // key -> last value sent to the device
    };

This template allows us to store any type we want in the map, while we can store the multicache object as a single entity in a different container. A 'container' is like a box where you can put objects in. The Standard Template Library (STL) for C++ has a number of container classes, each with their own properties. For more in-depth information regarding STL please proceed here.
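
As a small illustration of a container in use, the sketch below stores a value in a std::map and finds it again by key. The names DemoState, DemoValue, lookup and demo_map_lookup are hypothetical stand-ins made up for this example (they replace D3DRENDERSTATETYPE and DWORD so that no DirectX headers are needed); this is not code from the tutorial's zip file.

```cpp
#include <cassert>
#include <map>

// Hypothetical state IDs standing in for D3DRENDERSTATETYPE values.
enum DemoState { DEMO_ALPHABLEND = 1, DEMO_FOG = 2 };

// Stands in for DWORD so the sketch builds without the DirectX headers.
typedef unsigned long DemoValue;

// Returns true and writes the stored value to 'out' when 'key' is present.
bool lookup(const std::map<DemoState, DemoValue>& cache, DemoState key, DemoValue& out)
{
    std::map<DemoState, DemoValue>::const_iterator it = cache.find(key);
    if (it == cache.end())
        return false;          // nothing stored under this key
    out = it->second;          // 'second' is the value, 'first' is the key
    return true;
}

// Stores one entry and checks that only that entry can be found again.
void demo_map_lookup()
{
    std::map<DemoState, DemoValue> cache;
    cache[DEMO_ALPHABLEND] = 1;                            // key -> value
    DemoValue v = 0;
    assert(lookup(cache, DEMO_ALPHABLEND, v) && v == 1);   // found by key
    assert(!lookup(cache, DEMO_FOG, v));                   // never stored
}
```

Calling demo_map_lookup() from your own main() runs both checks. The find-then-compare-with-end() pattern used in lookup is the same one the cache relies on to decide whether a state has been seen before.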

The multicache class uses a map container. A map container takes two template arguments (plus optional ones we do not need here). The first argument is the type of the "key", which can be used to find stored items back by name or ID easily. The second argument is the type of the value. When you know the key of an object, you can look up its value quickly using a map container.

Our actual class is quite straightforward too. Note that the code below is not the complete class.

    class CEStateManager: CSingleton<CEStateManager>
    {
    public:
        CEStateManager() {}
        ~CEStateManager() {}

    protected:
        typedef multicache<D3DTEXTURESTAGESTATETYPE, DWORD> textureStateStageCache;
        typedef multicache<D3DSAMPLERSTATETYPE, DWORD> SamplerStateCache;

    protected:
        multicache<D3DRENDERSTATETYPE, DWORD> cacheRenderStates;     /// cached RenderStates
        std::vector<textureStateStageCache> vecCacheTextureStates;   /// cached TextureStage States
        std::vector<SamplerStateCache> vecCacheSamplerStates;        /// cached SamplerStage States
        /// [...]

        // Keep track of current shaders.
        LPDIRECT3DPIXELSHADER9 m_curPixelShader;
        LPDIRECT3DVERTEXSHADER9 m_curVertexShader;
        /// [...]

    public:
        void Init( LPDIRECT3DDEVICE9 pDevice );
        void DirtyCachedValues();
        void EndFrameStats();

        /// Set a renderstate
        HRESULT SetRenderState(D3DRENDERSTATETYPE d3dRenderState, DWORD dwValue );

        /// Set a texture stage state or a sampler state
        HRESULT SetTextureStageState(DWORD dwStage, D3DTEXTURESTAGESTATETYPE d3dTextureStageState, DWORD dwValue);
        HRESULT SetSamplerState(DWORD Sampler, D3DSAMPLERSTATETYPE Type, DWORD Value);

        /// Keep track of shaders.
        HRESULT SetPixelShader(LPDIRECT3DPIXELSHADER9 shader);
        HRESULT SetVertexShader(LPDIRECT3DVERTEXSHADER9 shader);
    };

Although the containers might be confusing, the idea is quite simple. First, I will summarize the items the class will cache:

- render states (SetRenderState)
- texture stage states (SetTextureStageState)
- sampler states (SetSamplerState)
- the current pixel shader (SetPixelShader)
- the current vertex shader (SetVertexShader)
We will have to store all the states set by the functions above! The first storage definition is:

    multicache<D3DRENDERSTATETYPE, DWORD> cacheRenderStates; /// cached RenderStates

This stores all the Device->SetRenderState() changes being made. Since the renderstates do not work with a texture index like the texture states do, we only need a one-dimensional lookup, which is exactly what our map container provides. Each entry in the map is a (key, value) pair, so the key can be D3DRS_ALPHABLENDENABLE and the value can be TRUE.

Although this is quite simple, things get a little bit harder with the texture stage states. These need three values, which makes things a little bit more complicated: the texture index, the option we want to change, and the new value. For example: SetTextureStageState(0, D3DTSS_MIPFILTER, D3DTEXF_LINEAR). We cannot do this with a single map container, since it only has room for one key. But there is a clever way of solving this: we keep a vector of multicache objects, one per texture stage. The stage index selects the element in the vector, and the option we want to change is the key inside that element's map.

The only thing left to add is the pixel- and vertexshader caching. Since there can only be one shader of each kind active at a time, we can just have two separate variables for this (m_curPixelShader and m_curVertexShader).

Now that we have a place to store our states and shaders, we can work on the actual code that will perform the checking.

Caching

    // Set device renderstate when needed.
    HRESULT CEStateManager::SetRenderState(D3DRENDERSTATETYPE d3dRenderState, DWORD dwValue )
    {
        // Update the render state cache.
        // If the return value is 'true', the command must be forwarded to the D3D Runtime.
        if( cacheRenderStates.set_val( d3dRenderState, dwValue ) )
            return m_pDevice->SetRenderState( d3dRenderState, dwValue );

        return S_OK;
    }

    HRESULT CEStateManager::SetTextureStageState(DWORD dwStage, D3DTEXTURESTAGESTATETYPE d3dTextureStageState, DWORD dwValue )
    {
        // If this dwStage is not cached, pass the value through and exit.
        // Otherwise, update the texture stage state cache and if the return value is 'true', the
        // command must be forwarded to the D3D Runtime.
        if( dwStage >= CACHED_STAGES || vecCacheTextureStates[dwStage].set_val( d3dTextureStageState, dwValue ) )
            return m_pDevice->SetTextureStageState( dwStage, d3dTextureStageState, dwValue );

        return S_OK;
    }

These two functions replace the standard DirectX functions with the same name. The contents of both functions are straightforward: check if the renderstate or texture state already has the value the program wants to set. If it is equal, do nothing. If it is different, change it in the cache and call the DirectX function as well. The functions which work with texture stages also check if the stage number is greater than or equal to the CACHED_STAGES constant. If so, the Direct3D function is called immediately, since we are not keeping track of that stage.

The pixel- and vertexshader checks are even simpler:

    HRESULT CEStateManager::SetPixelShader(LPDIRECT3DPIXELSHADER9 shader)
    {
        // Nothing to do, return
        if (shader == m_curPixelShader)
            return S_OK;

        m_curPixelShader = shader;
        return m_pDevice->SetPixelShader(shader);
    }

    HRESULT CEStateManager::SetVertexShader(LPDIRECT3DVERTEXSHADER9 shader)
    {
        // Nothing to do, return
        if (shader == m_curVertexShader)
            return S_OK;

        m_curVertexShader = shader;
        return m_pDevice->SetVertexShader(shader);
    }

These functions replace the Direct3D calls (again), so you have to replace all your Direct3DDevice->SetPixelShader() and Direct3DDevice->SetVertexShader() calls with these. Each function simply checks if the pointer is the same as the one currently active. If so, it returns immediately. When it is different, the cache variable is updated and the actual device call is made.

The comparison function

As promised, I will now discuss the inner workings of the state manager: the functions which are used above. The dirty and set_val functions are part of the multicache class.

    // Called to remove a single entry from the cache, so that the next
    // set_val for this key is always reported as a change.
    inline void dirty( _Kty key )
    {
        typename std::map< _Kty, _Ty >::iterator it = cache.find( key );
        if( cache.end() != it )
            cache.erase( it );
    }

    // Called to update the cache.
    // The return value indicates whether or not the update was a redundant change.
    // A value of 'true' indicates the new state was unique, and must be submitted
    // to the D3D Runtime.
    inline bool set_val( _Kty key, _Ty value )
    {
        typename std::map< _Kty, _Ty >::iterator it = cache.find( key );
        if( cache.end() == it )
        {
            cache.insert( typename std::map< _Kty, _Ty >::value_type(key, value) );
            return true;
        }
        if( it->second == value )
            return false;
        it->second = value;
        return true;
    }

Although the functions might look overwhelming (because of the template part), they are quite simple. The dirty function removes the item with the given key from the cache. The set_val function tries to find the given key in the container. If it can't be found, it makes a new entry in the map and returns true, which tells the calling state manager function, such as SetRenderState(), to forward the call to the device. If the key has been found and the value is equal to the cached one, it returns false: no update is needed. When the item has been found and the values are not equal, it stores the new value and returns true again, so that the value will be set.

But not everything is shiny...

There is one problem with the method described throughout the tutorial of replacing all the appropriate device calls. When you are using extra libraries (for GUI rendering, for example) which use the Direct3DDevice as well, the state manager will lose its synchronization with the device. The other library does not use our state manager at all, so the state manager cannot keep track of the changes. My experience is, though, that it does not really matter. It might result in some redundant renderstate calls, but not a lot every frame.

And finally, the results seem to vary a lot. Inside my own project the FPS does not really change that much (about 4-5fps); in the StateManager example the difference is quite a bit larger.
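
To make the synchronization problem concrete, here is a minimal, self-contained sketch. MockDevice, StateCache, MiniStateManager and external_gui_render are hypothetical names invented for this example, and the mock device only records calls instead of talking to Direct3D. The body of DirtyCachedValues() shown here is an assumption (the tutorial only declares that function): it simply forgets every cached value, so the next Set call for any state reaches the device again.

```cpp
#include <cassert>
#include <map>
#include <utility>

typedef unsigned long StateValue;   // stands in for DWORD

// Hypothetical stand-in for the Direct3D device: records states and counts calls.
struct MockDevice {
    std::map<int, StateValue> states;
    int calls;
    MockDevice() : calls(0) {}
    void SetRenderState(int state, StateValue value) { states[state] = value; ++calls; }
};

// Reduced version of the tutorial's cache: set_val plus a clear-everything helper.
class StateCache {
    std::map<int, StateValue> cache;
public:
    // Returns true when the value is new or changed and must be forwarded.
    bool set_val(int key, StateValue value) {
        std::map<int, StateValue>::iterator it = cache.find(key);
        if (it == cache.end()) { cache.insert(std::make_pair(key, value)); return true; }
        if (it->second == value) return false;
        it->second = value;
        return true;
    }
    void clear() { cache.clear(); }
};

struct MiniStateManager {
    MockDevice* device;
    StateCache cache;
    explicit MiniStateManager(MockDevice* d) : device(d) {}
    void SetRenderState(int state, StateValue value) {
        if (cache.set_val(state, value)) device->SetRenderState(state, value);
    }
    // Assumed behaviour: drop all cached values so nothing is filtered next time.
    void DirtyCachedValues() { cache.clear(); }
};

// A hypothetical third-party renderer that bypasses the manager.
void external_gui_render(MockDevice& device) { device.SetRenderState(1, 0); }

// Returns the number of calls that reached the device.
int demo_resync() {
    MockDevice device;
    MiniStateManager manager(&device);
    manager.SetRenderState(1, 1);    // forwarded: first time this state is seen
    manager.SetRenderState(1, 1);    // filtered: same value as cached
    external_gui_render(device);     // device now holds 0, the cache still holds 1
    manager.DirtyCachedValues();     // forget the stale cache
    manager.SetRenderState(1, 1);    // forwarded again, device is back in sync
    assert(device.states[1] == 1);
    return device.calls;
}
```

In this sketch, leaving out the DirtyCachedValues() call would make the manager filter the last request while the mock device still holds the value written by the other library, so resetting the cache after external rendering looks like the safer choice whenever such a library touches states you depend on.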

Still, my own opinion is that the number of renderstate calls should be kept to a minimum. Every video card seems to react differently to them, and therefore the impact might be more significant on older video cards.

[Screenshots: without state manager (55fps); with state manager (76fps)]

DirectX9

DirectX9 has a special feature available for effects: the ID3DXEffectStateManager interface. We can derive a class from it, implement its functions, and automatically let our caching happen whenever an effect file modifies a state! This should save the hassle of changing every SetRenderState call in your application. The code required to use this feature is quite a lot though. Therefore I will refer you to the “StateManager” example from the DirectX9 SDK, which can be found in the \Direct3D samples directory. After you have opened the solution, the files “EffectStateManager.h” and “EffectStateManager.cpp” are the ones that take care of this functionality. A really quick summary of the system:

- Derive a class from ID3DXEffectStateManager and implement its methods (SetRenderState, SetFVF, SetLight, LightEnable and so on).
- Hand an instance of that class to the effect with ID3DXEffect::SetStateManager.
- From then on, the effect routes its state changes through your class, where they can be cached, counted, or passed on to the device.

Example:

    STDMETHOD(SetFVF)(THIS_ DWORD dwFVF )
    {
        m_nTotalStateChanges++;
        return m_pDevice->SetFVF( dwFVF );
    }

    STDMETHOD(SetLight)(THIS_ DWORD Index, CONST D3DLIGHT9 *pLight )
    {
        m_nTotalStateChanges++;
        return m_pDevice->SetLight( Index, pLight );
    }

    STDMETHOD(LightEnable)(THIS_ DWORD Index, BOOL Enable )
    {
        m_nTotalStateChanges++;
        return m_pDevice->LightEnable( Index, Enable );
    }
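
The methods above only count calls. If you also want to see how many requests the cache filtered out in a frame, a small counter class along the following lines might help. StateChangeStats and all of its members are hypothetical names made up for this sketch; they are not taken from the SDK sample or from the tutorial's zip file, and the idea of feeding it the result of set_val is an assumption about how you would wire it in.

```cpp
#include <cassert>

// Hypothetical per-frame statistics for a state manager.
class StateChangeStats {
    unsigned total;       // every Set* request made this frame
    unsigned forwarded;   // requests that actually reached the device
public:
    StateChangeStats() : total(0), forwarded(0) {}

    // Call once per request; pass the result of the cache check
    // (true means the request was forwarded to the device).
    void record(bool wasForwarded) {
        ++total;
        if (wasForwarded) ++forwarded;
    }

    unsigned total_requests() const { return total; }
    unsigned filtered() const { return total - forwarded; }

    // Whole-number percentage of redundant requests (0 when nothing was recorded).
    unsigned filtered_percent() const {
        return total == 0 ? 0 : (100 * filtered()) / total;
    }

    // Reset the counters at the end of a frame.
    void end_frame() { total = 0; forwarded = 0; }
};

// Records two forwarded and two filtered requests, then resets.
void demo_stats() {
    StateChangeStats stats;
    stats.record(true);
    stats.record(false);
    stats.record(false);
    stats.record(true);
    assert(stats.total_requests() == 4);
    assert(stats.filtered() == 2);
    assert(stats.filtered_percent() == 50);
    stats.end_frame();
    assert(stats.total_requests() == 0);
}
```

A percentage close to zero would suggest that the cache is not saving many device calls in your application, which matches the observation that the gains differ a lot from project to project.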

Conclusion

This tutorial builds on top of the example that came with the DirectX9 SDK and shows how to implement your own simple state manager class, which reduces the number of state changes during every frame. The performance gains are not completely stable and are likely to be different on everyone's system. Still, the number of render state changes should be kept as low as possible to prevent extra overhead. The code has been ported to DirectX8, since the SDK itself comes with a DirectX9 version. Some extra functionality has been added as well. For example, the renderstate manager in this tutorial caches vertex shader changes too. (It looks like the sample in the October DX9 release caches the vertex and pixel shaders as well, and framerates seem to double when sorting is on.)

Source code

The code shown in this tutorial can be downloaded here: Tutorial_StateManager.zip. Please read the license file for more information regarding the license of the code. Also check out the readme.txt file for more information regarding the contents of the zip file.




