If you just want to see some screenshots and a video about the CES SoulCraft Tech Demo, go to the next blog post; this post is all technical and boring :D
Welcome to my super big blog post for the SoulCraft Tech Demo, more specifically the first iteration of the demo for the CES Nvidia press conference, which was today. Nvidia was showing their Tegra2 chip on different devices like the LG mobile phones and Acer tablets. Our SoulCraft Tech Demo was among the things they showed; my colleague Karsten was on stage today (20 minutes ago) and helped present it. Here is a photo from the event with our demo on screen.
Instead of blogging normally about the development process, I decided to do delayed blogging of the last 8 days of developing this first version of our tech demo. Delayed blogging means I wrote down what I did every day, but I did not publish it. Instead it is now all published at once. Reasons for this are:
- This was a Top Secret Project for Nvidia's CES press conference ^^
- I also did not want to piss anyone off by blogging about problems with tools or hardware.
- No one would have believed me that we could pull it off (just look at the first screenshot)
- Even internally everyone was very skeptical. If we had failed to bring the demo to the Nvidia Tegra device, I could have just waited until it finally worked. The CES was NOT a milestone for us. This demo was planned to be done by mid-February.
- Nice way to blog the daily process without having to think too much about making beautiful screenshots, just the end result counts :)
I am not going to talk a lot about what happened before (this blog post is already long enough). As mentioned in my previous blog post, we spent most of our time in the last months on the Delta Engine and optimizing 2D and 3D rendering for all platforms we support (iPhone, iPad, iPod Touch, Android, Windows Phone 7, PC, Xbox, etc.). These platforms are very different, and some of them, like PC and Android, have hundreds of possible configurations and performance levels. Some platforms do not support custom shaders at all (like the Windows Phone 7, ES 1.1 Android devices or older iPhones) or not very well (like slower Android devices or even the iPad, which just has too many pixels to fill for its slow GPU). These are all things we have to worry about when designing our engine and building shaders and 3D content.
Day 1 (day after Christmas)
Well, this is not really Day 1 of the SoulCraft Tech Demo. It has nothing to do with SoulCraft yet, and it was actually the whole Christmas week (20-26 Dec 2010). The result of all the hard work is still boring: all I worked on the whole week was getting a line on the screen. This might sound ridiculous, since everyone knows it should not take much more than a line of code to put a line of red pixels on the screen, but in this case this line was a crazy amount of work. The reason for all this work, which goes back to November, is our new rendering system, which is fully dynamic and can handle all kinds of vertex formats and data on all the very different platforms we support. When working on the Delta Engine it is always easier to focus on one specific task and optimize it like crazy, which is how the whole Delta Engine is built. We rarely have to go back to optimized parts and rethink them; they usually kick ass (like our module system, the rendering system, the input system, etc.).
We spent roughly 80% of our time in December getting all of the following to work (obviously not just for one single line, but for everything else to come; this line just proves it all works now):
- Dynamic shader creation with many different features that can all be combined and customized, fully optimized for each platform, most tweaking work goes into this.
- Fully flexible vertex formats (you can put whatever data into them and it works on all platforms). This makes our messy optimization code for iPhone, Android and WP7 much easier: we just use shorts, bytes or floats and can switch formats around with some flags without having to rewrite huge portions of code (it took many weeks to put those iPhone optimizations in place).
- All rendering is now fully abstracted away. Models have no knowledge of any vertex formats, shaders, geometry data, vertex buffers, etc. We just have to pass in the precomputed geometry (or calculate a new one on the fly, like for lines, effects, unit tests, etc., which can easily be done millions of times per second) and let the Delta Engine handle all the rendering and optimizations. This makes writing rendering code A LOT easier; namespaces like Model Rendering shrunk 90% and now render 10-100 times faster.
- Low level graphics code has become more complex and we spent a lot of time optimizing stuff there, but high level rendering is now very easy and straightforward. Changes are now easy to make and we can experiment a lot more.
- Also finished optimized 2D, 3D, dynamic and static geometry rendering (10k fps, without SwapBuffer up to 300k with lines). Implemented a much more efficient system that works great for now. Some improvements like merging geometries still have to be done, but that's easier to do now too.
- Tested and fixed font rendering and also improved 3D sphere generation, now UVs fit much better and there are no more wiggly UV errors!
- We also had a major holdup on day 1 because OpenGL would randomly crash with InvalidOperation when loading dds textures, which made testing shaders and rendering very hard. It actually took a whole day until this issue was fixed. The problem was not the loading itself, but instead what happened before that, some gl methods reported an error, but we did not catch it until the texture loading happened (which actually worked, but looked like it crashed).
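The deferred-error problem from the last point is easy to reproduce in miniature. OpenGL records an error and keeps it until someone queries it with glGetError, so a failure in one call can surface much later in unrelated code. The following is only a sketch of that behavior, not Delta Engine code: FakeGL, bind_bad_state, load_texture and checked are hypothetical names made up for the illustration.

```python
NO_ERROR = 0
INVALID_OPERATION = 0x0502  # numeric value of GL_INVALID_OPERATION


class FakeGL:
    """Hypothetical stand-in for a GL context with a sticky error flag."""

    def __init__(self):
        self._error = NO_ERROR

    def bind_bad_state(self):
        # Simulates a call that fails silently and only records an error.
        self._error = INVALID_OPERATION

    def load_texture(self, name):
        # Works fine, but does not clear an error left by an earlier call.
        return f"texture:{name}"

    def get_error(self):
        # Like glGetError: returns the recorded error and resets the flag.
        error, self._error = self._error, NO_ERROR
        return error


def checked(gl, call_name, *args):
    """Run one call and raise immediately if it left an error behind."""
    result = getattr(gl, call_name)(*args)
    error = gl.get_error()
    if error != NO_ERROR:
        raise RuntimeError(f"{call_name} failed with GL error 0x{error:04X}")
    return result
```

If the error is only queried after load_texture, the texture loading gets the blame. Routing every call through something like checked (at least in debug builds) points at the call that really failed.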
Day 2 is also very boring. I just extended the unit tests from yesterday and we can now draw some more complex models, still generated on the fly. We also had to make a lot of design decisions for the 3D content. It is the week between Christmas and New Year and 90% of our artists are not here, enjoying their holiday. We had to stick with what was already working and reduced the level a lot to make sure we could really pull it off.
We also worked on adding more shaders, testing the path camera to be used in the demo and prepared our FBX importing tools and validators for tomorrow to test importing all 3D models.
On day 3 we started testing content and optimizing it. Even the reduced level has over 150 single meshes in it; rendering them all out one by one would be way too slow (even on PC). One important thing the Delta Engine does automatically for you at the content generation stage is to pack textures together into atlas textures and handle all the rendering code for them. This works great for 2D content, but it was not implemented for 3D models yet. So today I remapped all 3D models to use the correct atlas texture coordinates. Later this can be used to merge all meshes that share the same atlas textures and shaders.
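Remapping a model to atlas texture coordinates boils down to scaling and offsetting each UV into the sub-rectangle its texture occupies in the atlas. A minimal sketch, assuming the region is stored as an offset and a scale in atlas UV space (the function names and the region layout are my own for this example, not the Delta Engine format):

```python
def remap_uv_to_atlas(uv, region):
    """Map a UV in [0, 1] on the original texture into its atlas region.

    region is (u_offset, v_offset, u_scale, v_scale) in atlas UV space.
    """
    u, v = uv
    u_offset, v_offset, u_scale, v_scale = region
    return (u_offset + u * u_scale, v_offset + v * v_scale)


def remap_mesh_uvs(uvs, region):
    """Remap every UV of a mesh that uses one texture of the atlas."""
    return [remap_uv_to_atlas(uv, region) for uv in uvs]
```

Note that this only works for UVs inside the 0 to 1 range. Models that rely on texture tiling would sample neighboring atlas entries and need extra handling.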
Another task today was to test all 3D models as many of them were not successfully imported with our new FBX importer. The code changes were pretty quick, but exporting and testing all 3D models one by one takes a lot of time. Good thing we did reduce the demo early this week from 50 different 3D models to 20 that worked. Today we reduced it to around 15, the rest was too hard to fix and would have taken too much time (only one artist left to help us out and clean up the mess from before, thanks Mathias for your hard work ^^).
Today I helped fine-tune and optimize shaders; a lot of things could be merged and simplified. For example, all of the 3D models we tested yesterday have the same ambient, diffuse and specular colors set, so we could optimize them all out and simplify some shaders by a few instructions.
Then we went crazy and added Fresnel, Specular Maps, Normal Mapping with Diffuse Maps, Light Maps and Diffuse Specular to the shaders used by the characters in the SoulCraft demo. This is our friend the Abomination. Actually he is just a big enemy to be slaughtered in the game. The shader complexity went up to 40-50 instructions, but we optimized it down to 4 texture reads (diffuse map, normal map, specular map, light map) and 23 instructions to do all of the above. The result is what you see on the screenshot above. It runs pretty well on our emulator, but we have to test it on the mobile devices soon and make sure that this will not be rendered fullscreen (pixel fill-rate will kill us). Please note that we will optimize all shaders more in the next month. The light map will not be used anymore (instead he will be animated and we might think about real time shadows) and the specular map could be derived from the diffuse map (e.g. inverted) or merged with the diffuse or normal map. Finally we have to test this on more platforms and check if our fall-backs look good enough.
Please also note that this Abomination guy has way too many polygons and vertices (28000 used vertices), but our artists had no time to reduce him more. Good thing we decided to skip animation and the level is pretty small, so our vertex shaders are fast enough. This won't be as easy once we enable animation again and make the level much more complex and bigger.
Originally we planned to deploy a first build today on Android, but a lot of our content still did not work well and shaders needed to be tested and optimized more. We also wanted to implement vertex compression to make sure we can save as much bandwidth as possible (that was the biggest performance improvement for the ZombieParty iPad version), but in the end that feature did not make it yet because we could not re-normalize the normals and tangents correctly.
We also started today with importing the level with all its meshes, loading all lightmap UVs from the level and merging them with the already imported 3D models without lightmaps. As you can see from the screenshot above, the lightmap and its UVs work fine after importing and look pretty cool; it gives the model ambient occlusion, especially when there are a lot of vertices (and yes, this guy has a lot of vertices and a lot of detail). Some 3D models were broken and had no tangents exported. The lightmap UVs from the level also did not fit our existing imported meshes. We hoped we could fix this issue by using the vertices from the level and remapping them to the 3D models imported earlier. On the screenshot above you see the result of merging these vertices, but we had major problems using the index buffers and had to disable them. What we did not know yet was that all merged vertices were totally messed up: only the position and lightmap UVs fitted, but the normal UVs were totally messed up (rotated, flipped and whatever else the stupid lightmap baker did) and all texture seams were also broken. But it got even worse: different 3D meshes from the level had different numbers of lightmap UVs and even different UV seams, aside from breaking all our geometry and UVs anyway. Our initial approach of rendering a list of meshes with an extra lightmap UV vertex stream could not work anymore, at least not that easily. In theory it is possible to merge all meshes used in the level that have the same vertex positions into one unified mesh that could work with all lightmap UVs, but this could even mean that index buffers could not be used and many more vertices would have to be processed. This is really complicated and I will probably write about this in the future when we attack this problem again.
This was basically a show-stopper. We could have said right there and then: Okay, we have a major roadblock here, we cannot continue. But everyone agreed that we should keep searching for alternatives. Maybe rendering out the whole level in merged meshes could work too. This would mean A LOT more vertices need to be processed. For example, some models like the "Wall" are placed 30-40 times in the level, as are some columns and trees. Now each of those would have to be rendered out individually instead of using instancing. This obviously also has advantages, as we can combine a lot of meshes into big meshes that all share the same materials and atlas textures. However, this would all have to be implemented by Monday (today was Friday, the last day of the year, and tomorrow is New Year's). Risky risky ..
Day 6 (New Year's)
Okay, after a short New Year's party (at least I did not crash my non-existing car like my brother did) and after some sleep (not enough) I started working on a solution for the lightmap UV stream problem from yesterday. First all mesh data had to be merged, which was actually quite easy due to our content processor system. Rendering the positions and UVs out with the merged lightmap UV stream was also quick and easy: I just forced a simple textured shader with 1 instruction on all geometry and rendered it out. But all UVs were completely messed up, rotated and just plain wrong. This happened because the vertices from the level and the ones from the mesh were different (same in number, but ordered differently). So I used the UVs from the level; since our instancing would not work anyway, this did not matter (each mesh, even if it was just a copy, had different UVs). To keep things simple all other features were disabled (no normals, no tangents, no index buffers, no vertex compression, no shaders except the textured shader). The following was the result: the level finally working with normal diffuse map UVs (yesterday we had just the lightmap UVs working).
So far so good. Next I tried to enable one feature after another. This is what happened when I tried to use the index buffers from the imported meshes: they totally did not fit anymore. There was also another issue on this screenshot: I forgot to add the vertex offset for each mesh to the index buffer. But even after fixing this the original index buffer could not be used and a completely new index buffer needed to be created, which was a lot bigger, to my surprise. After thinking about it, it made sense. The new vertex contains not only positions, UVs, normals and tangents, but now also the lightmap UVs, which have different texture seams and borders than the normal UVs for diffuse and normal mapping. Again, not good for performance and obviously our fault for not testing this out earlier, but what can you do if the level was only completed yesterday.
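Both index buffer issues can be shown with a small sketch. This is not the Delta Engine content processor, just an illustration with made-up function names: merge_meshes shows the vertex offset that has to be added when index buffers are concatenated, and build_index_buffer rebuilds an index buffer by deduplicating complete vertices.

```python
def merge_meshes(meshes):
    """Concatenate (vertices, indices) meshes into one vertex and index list."""
    merged_vertices, merged_indices = [], []
    for vertices, indices in meshes:
        vertex_offset = len(merged_vertices)
        merged_vertices.extend(vertices)
        # Indices are relative to their own mesh, so shift them by the
        # number of vertices that were already merged before this mesh.
        merged_indices.extend(index + vertex_offset for index in indices)
    return merged_vertices, merged_indices


def build_index_buffer(corner_vertices):
    """Deduplicate full vertex tuples and return (unique_vertices, indices)."""
    lookup, unique, indices = {}, [], []
    for vertex in corner_vertices:
        if vertex not in lookup:
            lookup[vertex] = len(unique)
            unique.append(vertex)
        indices.append(lookup[vertex])
    return unique, indices
```

Two vertices can only be shared if every component matches. For a quad made of two triangles, deduplicating (position, uv) corners gives 4 vertices. If the lightmap UVs have a seam along the shared edge, the same quad needs 6, which is why the vertex and index data grows once the lightmap UVs become part of the vertex.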
After going back and just merging the vertex positions, UVs, normals and tangents and displaying the level again with the new index buffers and different shaders (normal mapping, etc.), some models like the church in the background were inside out because they have mirrored world matrices. I needed to find an algorithm to detect inverted render matrices, but was unable to do so. Instead I hacked in a couple of simple checks that see if the X, Y and Z axes form a right-handed coordinate system (if Z is pointing in the other direction, then it must be inverted). If such a case is detected, all triangles of that mesh are flipped around, and then it all looks good again. Another thing that was already wrong in the first screenshot is the position of each mesh. The object matrix was already applied for all imported meshes, but the level stored the render matrix for each mesh in a way that applied the object matrix again. This had to be undone by applying the inverse pivot transformation. On this third screenshot the whole level was already rendering at 1200 FPS with over 250000 vertices. Not bad, but remember my PC is obviously faster than a mobile device and none of the complex shaders are used yet.
After fixing the mirrored matrices and the positioning of each mesh, this is what our little world looks like. Originally even the reduced level was filled with almost twice as many objects, but we needed to cut some because of the already mentioned tangent problems and because of performance. Basically we deleted everything that was not visible from the center of the level. For me this was when I realized that the demo would not look any better than this screenshot except for better shaders and fixing some other things. With lightmaps enabled this already looks pretty nifty, but it would be more amazing to have a fuller level and all the other currently missing features like animations, effects, etc. But you have to settle for what you have, and we made some quick decisions to skip all content that did not work yet to get things done in the next few days.
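One common way to write such a handedness check is to compare the cross product of the X and Y axes with the Z axis. The value cross(X, Y) . Z is the determinant of the 3x3 rotation/scale part of the matrix, and it is negative exactly when the matrix mirrors. The sketch below is my own illustration of that idea, not the checks that went into the demo; all function names are made up.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_mirrored(x_axis, y_axis, z_axis):
    """True if the three basis vectors of a matrix form a left-handed set."""
    return dot(cross(x_axis, y_axis), z_axis) < 0.0


def flip_winding(indices):
    """Swap two corners of every triangle to reverse its winding order."""
    flipped = list(indices)
    for i in range(0, len(flipped) - 2, 3):
        flipped[i + 1], flipped[i + 2] = flipped[i + 2], flipped[i + 1]
    return flipped
```

For every mesh where is_mirrored reports True for the axes of its world matrix, the index list would go through flip_winding once at content generation time, so nothing changes at render time.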
Uhh, just one day left until this all has to work on the Android Tegra device and we have not even put anything except some simple unit tests on the device. The problem was that the level was not even working on the PC yet and every little problem took many hours to fix. We wanted to save time by only having to deploy the project once on the Tegra and then do some final fixes there if we still ran into problems. But the idea was still to deploy a first version on Android by the end of the day. We had already missed our Day 5 deadline for the first build because of all the level and lightmap problems. Early this day I played around with different texture settings (mipmaps on or off, sharpened mipmaps or not, bilinear or trilinear filtering). When I deployed the following unit test on the Android Nvidia Tegra device it looked even worse: compression artifacts were clearly visible, and because of the 16-bit RGB565 framebuffer the bilinear filtering lines looked way worse. It made the whole demo look crappy. So for a minor performance cost (not detectable on PC, but on mobile devices trilinear filtering costs a bit) I enabled trilinear filtering, which made it look better, but the mipmaps were still crappy and blurry. Sharpening made it even worse; they did not fit together anymore. Again because of time constraints I left it the way it was before (mipmaps, but no sharpening, and trilinear filtering for big planes like the ground). In the end no one noticed anything bad about the ground, and we even reduced the shader from 18 instructions with NormalMapping, Specular-Diffuse calculation, our fake-HDR LightMaps and so on to just 5 instructions (2 texture reads, displaying the diffuse map and our fake HDR lightmap on top). This was one of the biggest performance gains besides reducing the resolution (see tomorrow).
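To get a feeling for why a 565 framebuffer makes smooth transitions look worse: it stores 5 bits for red, 6 for green and 5 for blue, so an 8-bit gradient with 256 steps collapses to 32 or 64 steps per channel. The following sketch only illustrates that precision loss (the helper names are made up, and real drivers may round or dither differently):

```python
def to_rgb565(r, g, b):
    """Pack 8-bit RGB into a 16-bit 5-6-5 value by dropping the low bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def from_rgb565(value):
    """Expand a 5-6-5 value back to 8-bit RGB using bit replication."""
    r5 = (value >> 11) & 0x1F
    g6 = (value >> 5) & 0x3F
    b5 = value & 0x1F
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def gray_ramp_levels():
    """Count how many distinct red and green steps a 0..255 gray ramp keeps."""
    reds = {to_rgb565(v, v, v) >> 11 for v in range(256)}
    greens = {(to_rgb565(v, v, v) >> 5) & 0x3F for v in range(256)}
    return len(reds), len(greens)
```

A mid gray of (100, 100, 100) comes back as (99, 101, 99) after a round trip, so neighboring gray values are not only merged into bands, they can also pick up a slight color tint.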
Next, we tried to enable vertex data compression again. Each vertex has 52 bytes (3 floats for position, 2 floats for UVs, 3 floats for normals, 3 floats for tangents and finally 2 more floats for the lightmap UVs, which is 13 floats = 13*4 bytes = 52 bytes), which is a crazy amount for mobile platforms. As mentioned on Day 1, our content system can happily change anything at the geometry and vertex level. For example, vertex compression can be enabled when generating content, and then each vertex goes down to 13 shorts = 26 bytes (32 bytes on platforms that require 4 byte alignment like XNA/DirectX), which can be twice as good for bandwidth and performance. As you can see from the following screenshot, it was not so easy because we had this 4 byte alignment still turned on, but only stored 26 bytes for each vertex. This messed up the whole level rendering. Even after fixing this we still had the problem that normals and tangents were totally messed up, and even re-normalizing them in the vertex shader did not help. We tried to fix this for a few hours, but again had to give up because there was no more time left. We would rather have a slow working demo than nothing.
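The byte counts above can be checked with a few lines. One assumption in this sketch: the 32 byte figure is reproduced by padding every vertex element (position, UV, normal, ...) to a multiple of 4 bytes on its own. Padding only the whole vertex would give 28 bytes, so the per-element rule is my guess at how the number comes about, and the names below are made up for the example.

```python
FLOAT_BYTES, SHORT_BYTES = 4, 2

# Component counts per vertex element, as listed in the text.
VERTEX_ELEMENTS = [
    ("position", 3),
    ("uv", 2),
    ("normal", 3),
    ("tangent", 3),
    ("lightmap_uv", 2),
]


def align(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def vertex_stride(component_bytes, element_alignment=1):
    """Sum the element sizes, padding each element to element_alignment."""
    return sum(align(count * component_bytes, element_alignment)
               for _name, count in VERTEX_ELEMENTS)
```

vertex_stride(FLOAT_BYTES) gives 52, vertex_stride(SHORT_BYTES) gives 26 and vertex_stride(SHORT_BYTES, 4) gives 32. The broken screenshot corresponds to a mismatch between the two short variants: the data was written with a 26 byte stride, but the renderer stepped through it with the aligned stride.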
So at the end of the day everything came together. We quickly put together a demo on the PC with the Delta Engine, added the path camera and displayed the level with all shaders. There were obviously issues, like the lighting being a little dark and sky cube mapping being broken, but most of it could be fixed easily or there would be more time tomorrow when fine-tuning. Sadly the Android deployment turned out to be a disaster! I started deploying in the evening when most of our level content issues were fixed and it at least somehow worked on the PC. Compiling it for Android with the Delta Engine was pretty easy and it ran right away on the Tegra, but we got a major problem with content loading: everything above 1MB crashes the Android AssetManager with a Java IO exception. I worked all night trying to find a solution to this problem and wrote a little re-packaging tool that used the aapt tool (Android Asset Packaging Tool) with the -0 "" flag to save all content uncompressed. This worked for a while with reduced content, but the full content with >48MB still crashed badly. In fact, after launching the application nothing happened anymore, no log, nothing, it just froze. Debugging also did not work (it has always been pretty broken anyway).
But this was not the only issue. When testing level meshes one by one, new problems quickly appeared. For example, we saw that the depth buffer was disabled and our attempt to turn it on was completely ignored. We would have to write our own framework methods to call the OpenGL device creation methods ourselves, as the MonoDroid framework is very broken. I reported a couple of bugs for the MonoDroid framework (which is still in preview btw, so you can't expect that everything works; we had plenty of other issues in the past, but always found a way around them).
The plane to the CES in Las Vegas was also leaving in a few hours for one of us and my plane would leave the following night. I already had to decide not to fly to the CES because this would take at least another day to fix and I badly needed sleep after working through so many nights (I had already been awake for 24 hours this day). This was pretty sad for me because I would have enjoyed some days of relaxing in Las Vegas and seeing the CES for the first time, but the flight would have been too stressful and the demo needed more time and fine-tuning than the few hours that were left.
Some of the issues like the Depth Buffer (without it the demo is unusable) and the 1MB issue were real deal breakers again. Others would have probably given up and again no one internally believed that the Android Tegra version could be fixed in the short amount of time left ^^ I worked all day on Day 8 and I got plenty of help from my fellow programmers, but again most time was lost in the night when I tried to reduce each content file to below 1MB to make the demo run on the device (the whole level needed to be restructured and I re-saved every single texture and 3D model again). Once I had something working, everything else broke. For example, I wanted to test if the resolution change worked and switched to another Android Tegra device, and suddenly all my drivers broke and I could not deploy anything anymore. Reinstalling drivers took hours and then the Android Tegra device crashed and did not boot up again. It had to be flashed again, and this kind of stuff obviously has to happen on the very last day when no time is left. Obviously the guys at Nvidia and my colleague, who was already sitting in a plane to Las Vegas, were pretty nervous that this all could fail. But in the end it all worked out and all the hard work paid off, as you can read in Day 8:
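Since every file above the limit was a potential crash, a small pre-deployment check for oversized content files would have saved some deploy cycles. This is a hypothetical helper, not the re-packaging tool mentioned above, and it assumes that "1MB" means 1024*1024 bytes:

```python
import os

ASSET_SIZE_LIMIT = 1024 * 1024  # assumption: "1MB" interpreted as 1 MiB


def find_oversized_assets(content_dir, limit=ASSET_SIZE_LIMIT):
    """Return (relative_path, size) for every file larger than limit."""
    oversized = []
    for root, _dirs, files in os.walk(content_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((os.path.relpath(path, content_dir), size))
    return sorted(oversized)
```

Running this over the content folder before packaging lists the textures and models that still have to be split up, shrunk or stored uncompressed.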
Screenshot 1 from the final result today (this looks 1:1 the same way on Android). Pretty good for just a few days of hard work :)
Screenshot 2 shows the Abomination, which runs a little below 30fps when looking at it this close on the Android Tegra device, but looks pretty amazing.
And Screenshot 3 shows our level and how everything fits together. Obviously the level is still small and boring, but it will be improved in the near future! See the video from the next blog post to get a feel for how this all looks together and on the device. Again, this is only the first iteration of the demo; we will make it more spectacular by next month.
Here you can see a few screenshots from the Android Tegra version that went through a lot more optimization and tweaking today. Since I spoke about optimizations and other Android issues yesterday (some of them were also fixed today), I will focus on the content in today's discussion, as most of the final touch and optimization was about content. As mentioned before, some content had to be removed because we had tangent export problems (they were missing from some models and we have no regenerate tangent method yet), other content was too shader complex, some shaders could be simplified, and other content had to be moved around to fill fewer pixels on the screen. There was also some fine-tuning like moving overlapping planes around to avoid z-fighting, repackaging some atlas textures, increasing textures near the camera, reducing some in the background, etc.
Our target for this SoulCraft Tech Demo is reaching at least 30fps in 1024x600 on the Android Tegra. Now that we also have a 1338x768 Tegra and played around with it, we might even want to build a version in that resolution (and later even full hd 1920x1080 version). Some of our unit tests run perfectly fine with 60fps in that resolution, but for now the resolution was reduced a bit because of our content, shaders are too complex right now and some models are too heavy too. We also could not enable all features and vertex compression yet because we had some last minute problems and no time to figure them out for this first iteration (like enabling AA and testing our optimized shaders with compressed vertices).
Due to our complex shaders and because most of our 3D content was created for PC/Console, we could only reach about 15-20fps with the version from yesterday and thus reduced the resolution a bit to 800x480 today (a bit more than 1/3 fewer pixels to render). I also did some extra optimizations (ground shader complexity reduced from 18 instructions to 4). Now it runs absolutely stable at 30fps on Android Tegras (we also tested this on other Android devices, but performance is not very good there; we need a much lower resolution and lower poly counts there). Performance is good now, but some content does not look as great as it could because of some optimizations (like the ground shader, but even the character shaders have some issues right now and will be improved in the future; you can see the difference if you look at the screenshots from the last days).
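The effect of the resolution change on the number of pixels that have to be shaded each frame is quick to work out. A tiny sketch with made-up helper names:

```python
def pixel_count(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


def pixel_reduction(old, new):
    """Fraction of pixels saved when going from resolution old to new."""
    return 1.0 - pixel_count(*new) / pixel_count(*old)
```

1024x600 is 614400 pixels and 800x480 is 384000 pixels, so pixel_reduction((1024, 600), (800, 480)) is 0.375. With pixel shaders of 20 and more instructions, shading 37.5% fewer pixels per frame is a big part of the jump from 15-20fps to a stable 30fps.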
Some pixel shaders use up to 30 instructions, e.g. the characters have Diffuse Maps, Normal Maps, Specular Maps, Fresnel calculation and even use Light Maps right now (to fit better into the level; we will have to remove light maps once animations are implemented). This is quite a lot for a mobile platform, especially if the resolution is high. On PC or consoles complex shaders with 50 or 100 instructions can be used, but even that is not common; often a simple shader with 10-20 instructions looks good enough (or 1-8 instructions for older hardware) and allows crazy high resolutions (like on my 30 inch screen with 2560x1600, I love it). We will optimize our shaders further and try to achieve good effects with them by doing some more tricks (merging textures, doing crazy pixel shader optimizations, etc.). Our target is 1-4 (2 on average) instructions for low mobile hardware, 2-12 (6 on average) instructions on medium mobile hardware and 4-24 (6-16 on average) for high quality mobile hardware like Tegra. There is also another profile for PC and consoles that does not have an upper limit (30, 50 or even 100 instructions are possible). The good thing is a shader can be very complex on the PC and will work fine on a fast GPU, but if the user has a slower PC we can easily switch to the high/medium/low mobile shader profiles and use the same optimizations for PCs or slower consoles.
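These instruction budgets can be written down as a small table and checked automatically. The sketch below uses the upper limits from the targets above; the profile names and functions are invented for this example and are not the Delta Engine API.

```python
# (profile name, maximum instruction count); None means no upper limit.
SHADER_PROFILES = [
    ("low mobile", 4),
    ("medium mobile", 12),
    ("high mobile", 24),
    ("pc/console", None),
]


def fits_profile(instruction_count, profile_name):
    """True if a shader with this many instructions is within the budget."""
    budget = dict(SHADER_PROFILES)[profile_name]
    return budget is None or instruction_count <= budget


def lowest_profile_for(instruction_count):
    """Return the first (cheapest) profile whose budget covers the shader."""
    for name, _budget in SHADER_PROFILES:
        if fits_profile(instruction_count, name):
            return name
    raise ValueError("no profile available")
```

With this table the 23 instruction Abomination shader from Day 4 just fits the high mobile profile, while a 30 instruction character shader only fits the PC/console profile and needs a simpler fall-back on Tegra.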
Textures were initially 2048x2048, but there was visually no difference with 1024x1024 textures, and many textures have gone down to 512x512 or even 256x256 to optimize bandwidth even more. Our level was initially more than twice as big and had about 50 models, but only 15 made it into this first demo because of time constraints. There is a lot more to come for this SoulCraft Tech Demo; the CES version is just the first iteration. It will not be easy for our artists, as many of them are working for the first time on 3D content for mobile platforms and we had many heated debates, but now with a working demo on the Tegra everyone will understand the vertex, polygon and texture issues a lot more easily.
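How much each step down in texture size saves is easy to estimate: halving the edge length quarters the data. The sketch below assumes uncompressed textures with 4 bytes per pixel just to keep the numbers simple (compressed formats are smaller, but scale the same way), and the function name is made up.

```python
def texture_bytes(size, bytes_per_pixel=4, mipmaps=True):
    """Memory for a square texture, optionally with a full mipmap chain."""
    total = 0
    while size >= 1:
        total += size * size * bytes_per_pixel
        if not mipmaps:
            break
        size //= 2  # next mip level has half the edge length
    return total
```

Without mipmaps a 2048x2048 texture takes 16 MB under this assumption, a 1024x1024 one 4 MB and a 256x256 one only 256 KB. A full mipmap chain adds roughly one third on top.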
Another example is the target specification for characters or complex buildings: max. 5000 vertices and max. 4000 polygons are allowed. We did have some 3D models with fewer polygons, but all buildings and characters for this demo are around 25000-30000 vertices with 8000-10000 polygons. That is way too much for a mobile platform. We still managed to render this out quite well because we optimized our vertex shaders and this demo has no animation yet (but it will be added soon). This content will be optimized better for the next iteration of this demo, and we will add a lot more content and make the world bigger, fuller and more exciting with animations and effects.
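A target specification like this is most useful when the content pipeline checks it automatically on import. A minimal sketch of such a check, using the limits from above (the function and the report format are made up for this example):

```python
MAX_VERTICES = 5000
MAX_POLYGONS = 4000


def check_model_budget(name, vertex_count, polygon_count):
    """Return a list of human-readable budget violations (empty if none)."""
    problems = []
    if vertex_count > MAX_VERTICES:
        problems.append(f"{name}: {vertex_count} vertices (max {MAX_VERTICES})")
    if polygon_count > MAX_POLYGONS:
        problems.append(f"{name}: {polygon_count} polygons (max {MAX_POLYGONS})")
    return problems
```

For the Abomination with its 28000 vertices this reports a vertex violation, and with a polygon count in the 8000-10000 range mentioned above also a polygon violation. A model that stays within both limits returns an empty list.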
Please also note that the SoulCraft Tech Demo is NOT just a showoff for the great Nvidia Tegra platform on Android, it runs perfectly fine on all of our supported platforms: iPhone, iPad, iPod Touch, Android phones, Android tablets, Windows Phone 7, PC, Xbox 360, Linux and MacOS. We currently focused content and optimizations for the Tegra, but the demo itself is written without any knowledge of the target platform (like ZombieParty and like every project with the Delta Engine). With a click of a button it compiles and runs on all the mentioned platforms, but we will obviously optimize our engine for all target platforms and make sure all content works on all platforms as well (that is actually most of the work left to do, creating a complex 3D game is no easy or quick task for Artists). If all goes well we should have a pretty good SoulCraft demo in a month that works on all platforms and shows all 3D features of the Delta Engine. Then we can focus on the game-play again and finish the game itself :)