Optimizing Android Game mTricks Looting Crown on the Intel® Atom™ Platform

6 Jul 2015
This article shows how to analyze and improve the performance of a mobile game and how to optimize graphic resources for a mobile platform, using mTricks Looting Crown as an example.

Intel® Developer Zone offers tools and how-to information for cross-platform app development, platform and technology information, code samples, and peer expertise to help developers innovate and succeed. Join our communities for Android, Internet of Things, Intel® RealSense™ Technology and Windows to download tools, access dev kits, share ideas with like-minded developers, and participate in hackathons, contests, roadshows, and local events.

Abstract

Games are the most popular category of smartphone and tablet apps on app stores. Early mobile devices had significant CPU and GPU constraints that limited performance, so most games had to be simple. Now that mobile CPU and GPU performance has increased, more high-end games are being produced. Nevertheless, a mobile processor still delivers less performance than a PC processor.

With the growth in the mobile market, many PC game developers are now making games for the mobile platform. However, traditional game design decisions and the graphic resources of a PC game are not a good fit for mobile processors and may not perform well. This article shows how to analyze and improve the performance of a mobile game and how to optimize graphic resources for a mobile platform, using mTricks Looting Crown as an example. The Intel Architecture (IA) version of Looting Crown is now available at the following link:

https://play.google.com/store/apps/details?id=com.barunsonena.looting

Figure 1. mTricks Looting Crown

1. Introduction

mTricks has significant experience in PC game development using a variety of commercial game engines. While planning its next project, mTricks forecasted that the mobile market was ready for a complex MMORPG, given the performance growth of mobile CPUs and GPUs. So it changed the game target platform for its new project from the PC to mobile.

mTricks first ported the PC codebase to Android*. However, the performance was less than expected on the target mobile platforms, including an Intel® Atom™ processor-based platform (code named Bay Trail).

mTricks encountered two problems that PC developers often face when they transition to mobile:

  1. The low processing power of the mobile processor means that traditional PC graphic resources and designs are unsuitable.
  2. Due to capability and performance variations among mobile CPUs and GPUs, game display and performance vary on different target platforms.

2. Executive summary

Looting Crown is an SNRPG (Social Network + RPG) style game that supports full 3D graphics and various multiplayer modes (PvP, PvE, and Clan vs. Clan). mTricks developed and optimized the game on a Bay Trail reference design, whose specification is listed in Table 1.

Table 1. Bay Trail reference design specification and 3DMark score

| | Bay Trail reference design 10" |
| --- | --- |
| CPU | Intel® Atom™ processor Quad Core 1.46 GHz |
| RAM | 2 GB |
| Resolution | 2560 x 1440 |
| 3DMark Ice Storm Unlimited score | 15,094 |
| Graphics score | 13,928 |
| Physics score | 21,348 |

mTricks used Intel® Graphics Performance Analyzers (Intel® GPA) to find CPU and GPU bottlenecks during development and used the analysis to solve issues of graphic resources and performance.

The baseline performance was 23 FPS. Figure 2 shows the GPU Busy and Target App CPU Load statistics during a 2-minute run: GPU Busy averaged about 91%, and Target App CPU Load averaged about 27%.

Figure 2. Comparing CPU and GPU load of the baseline version with Intel® GPA System Analyzer

3. Where is the bottleneck between CPU and GPU?

There are two ways to determine whether the bottleneck is in the CPU or the GPU. One is to use an override mode, and the other is to change the CPU frequency.

Intel GPA System Analyzer provides the "Disable Draw Calls" override mode to help developers locate the bottleneck. Run the game with and without this override mode, compare the two FPS results, and apply the following guidelines:

Table 2. How to analyze games with Disable Draw Calls override mode

| Performance change for "Disable Draw Calls" override mode | Bottleneck |
| --- | --- |
| If FPS doesn’t change much | The game is CPU bound; use the Intel® GPA Platform Analyzer or Intel® VTune™ Amplifier to determine which functions are taking the most time |
| If FPS improves | The game is GPU bound; use the Intel GPA Frame Analyzer to determine which draw calls are taking the most time |

Intel GPA System Analyzer can simulate the application performance with various CPU settings, which is useful for bottleneck analysis. To determine whether your application performance is CPU bound, do the following:

  1. Verify that your application is not Vertical Sync (Vsync) bound.
    Check the Vsync status. Vsync is enabled if you see the gray highlight in the Intel GPA System Analyzer Notification pane.
    • If Vsync is disabled, proceed to step 2.
    • If Vsync is enabled, review the frame rate in the top-right corner of the Intel GPA System Analyzer window. If the frame rate is around 60 FPS, your application is Vsync bound, and there is no opportunity to increase FPS. Otherwise, proceed to step 2.
  2. Force a different CPU frequency using the sliders in the Platform Settings pane (Figure 3) of the Intel GPA System Analyzer window. If the FPS value changes when you modify the CPU frequency, the application is likely to be CPU bound.
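
The two experiments can be combined into one decision procedure. The following is a minimal Python sketch of that logic, not part of Intel GPA: the function name, the `tolerance` threshold for "doesn't change much", and the `"mixed"` and `"inconclusive"` labels are assumptions made for illustration.

```python
def classify_bottleneck(baseline_fps, no_draw_calls_fps, max_cpu_freq_fps,
                        vsync_enabled=True, vsync_cap_fps=60.0, tolerance=0.05):
    """Combine the override and CPU-frequency experiments into one verdict.

    tolerance is a hypothetical threshold: an FPS change smaller than this
    fraction of the baseline counts as "doesn't change much".
    """
    # Step 1: an app that already sits at the Vsync cap cannot go faster.
    if vsync_enabled and baseline_fps >= vsync_cap_fps * (1.0 - tolerance):
        return "vsync-bound"

    def changed(fps):
        return abs(fps - baseline_fps) > tolerance * baseline_fps

    # "Disable Draw Calls" removes GPU work, so a GPU-bound game speeds up.
    draw_calls_matter = no_draw_calls_fps > baseline_fps and changed(no_draw_calls_fps)
    # Step 2: a CPU-bound game reacts to a different CPU frequency.
    cpu_freq_matters = changed(max_cpu_freq_fps)

    if draw_calls_matter and not cpu_freq_matters:
        return "gpu-bound"
    if cpu_freq_matters and not draw_calls_matter:
        return "cpu-bound"
    if draw_calls_matter and cpu_freq_matters:
        return "mixed"
    # Neither experiment moved the FPS: the two guidelines disagree.
    return "inconclusive"


print(classify_bottleneck(40, 60, 40))  # FPS values reported later in Table 9
print(classify_bottleneck(23, 23, 23))  # FPS values reported later in Table 3
```

With the measurements reported later in this article, the first call returns "gpu-bound" and the second returns "inconclusive", which is why the baseline needed additional evidence from the load statistics.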

Figure 3. Modify the CPU frequency in the Platform Settings pane

Table 3 shows the simulation results for Looting Crown. With the "Disable Draw Calls" override on, the FPS remained unchanged, which would normally indicate that the game was CPU bound. However, the "Highest CPU freq" override also left the FPS unchanged, which implies that Looting Crown was not CPU bound but GPU bound. To resolve this contradiction, we returned to the data in Figure 2, which showed that the GPU load was about 91% and the CPU load was about 27% on the Bay Trail device. The CPU could not be fully utilized because of the GPU bottleneck. We proceeded with the plan to optimize the GPU usage first and then retest.

Table 3. The FPS result of the baseline version with Disable Draw Calls and Highest CPU Frequency.

| Bay Trail device | FPS |
| --- | --- |
| Original | 23 |
| Disable Draw Calls | 23 |
| Highest CPU freq. | 23 |
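
When the FPS experiments disagree, as they do here, the load statistics can serve as a tie-breaker. The sketch below expresses that reasoning in Python. The function name and the 85% saturation threshold are assumptions chosen for illustration; they are not values taken from Intel GPA.

```python
def likely_bottleneck_from_load(gpu_busy_pct, cpu_load_pct, busy_threshold_pct=85.0):
    """Guess the bottleneck from average utilization figures.

    busy_threshold_pct is a hypothetical cut-off for "saturated".
    """
    gpu_saturated = gpu_busy_pct >= busy_threshold_pct
    cpu_saturated = cpu_load_pct >= busy_threshold_pct
    if gpu_saturated and not cpu_saturated:
        return "gpu-bound"
    if cpu_saturated and not gpu_saturated:
        return "cpu-bound"
    if gpu_saturated and cpu_saturated:
        return "both saturated"
    return "unclear"


# Baseline figures from Figure 2: GPU Busy about 91%, App CPU Load about 27%.
print(likely_bottleneck_from_load(91, 27))
```

Treat the result as a hint rather than proof: an aggregate CPU load can hide a single saturated thread on a multi-core processor, so a low CPU figure alone does not rule out a CPU-side limit.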

4. Identifying GPU bottlenecks

We found that the performance bottleneck was in the GPU. As a next step, we analyzed the cause of the GPU bottleneck with Intel GPA Frame Analyzer. Figure 4 shows the captured frame information of the baseline version.

Figure 4. Intel® GPA Frame Analyzer view of the baseline version

4.1 Decrease the number of draw calls by merging hundreds of static meshes into one static mesh and using a bigger texture

Tables 4 and 5 show the information captured by Intel GPA Frame Analyzer.

Table 4. The captured frame information of the baseline version

| Metric | Value |
| --- | --- |
| Total Ergs | 1,726 |
| Total Primitive Count | 122,204 |
| GPU Duration | 23 ms |
| Time to show frame | 48 ms |

Table 5. Draw call cost of the baseline version

| Type | Erg | Time (ms) | % |
| --- | --- | --- | --- |
| Clear | 0 | 0.2 | 0.5 |
| Ocean | 1 | 6 | 13.7 |
| Terrain | 2~977 | 20 | 41.9 |
| Grass | 19~977 | 18 | 39.0 |
| Character, building and effect | 978~1676 | 19 | 40.6 |
| UI | 1677~1725 | 1 | 3.4 |

The total time of "Terrain" is 20 ms, and "Grass", which is part of "Terrain", accounts for 18 ms of it, or about 90% of the "Terrain" processing time. So we analyzed further to see why "Grass" processing takes so long.
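
The arithmetic behind this observation is short enough to check directly. The snippet below is an illustrative Python calculation using the Table 5 timings and the 960 grass draw calls reported for this frame; the variable names are our own.

```python
terrain_ms = 20.0        # "Terrain" time from Table 5
grass_ms = 18.0          # "Grass" time from Table 5
grass_draw_calls = 960   # number of grass draw calls in the captured frame

grass_share_of_terrain = grass_ms / terrain_ms
avg_ms_per_grass_call = grass_ms / grass_draw_calls

print(f"Grass share of terrain time: {grass_share_of_terrain:.0%}")
print(f"Average cost per grass draw call: {avg_ms_per_grass_call * 1000:.2f} microseconds")
```

Each call costs under 20 microseconds on average, so no single draw is slow; the total comes from issuing hundreds of them.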

Figures 5 and 6 show the output of the ergs for "Terrain" and "Grass".

Figure 5. The terrain

Figure 6. Texture of "Grass"

Looting Crown drew the terrain by drawing a small grass quad repeatedly, so the number of draw calls in "Terrain" was 960. Drawing one small grass quad takes very little time; however, each draw call carries its own overhead, which makes the repeated calls expensive in total. We therefore recommended decreasing the number of draw calls by merging hundreds of static meshes into one static mesh and using a bigger texture. Table 6 shows the result of the change.

Table 6. Comparison of draw cost between small and big texture

| | Draw time | Number of ergs |
| --- | --- | --- |
| Small texture | 18 ms | 960 |
| Big texture | 6 ms | 1 |

Figure 7. The changed terrain

Although each tile was simple, the tile-based terrain required a lot of draw calls. By decreasing the number of draw calls, we saved 12 ms on drawing the "Grass".
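
The merging step can be illustrated outside any engine. The following Python sketch shows the general idea of combining many static quads into one vertex and index buffer; it is not the code used in Looting Crown. The `Mesh` class, the helper names, and the 32 x 30 grid (chosen only because it yields 960 tiles) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Mesh:
    vertices: list = field(default_factory=list)  # (x, y, z) positions
    uvs: list = field(default_factory=list)       # (u, v) texture coordinates
    indices: list = field(default_factory=list)   # 3 indices per triangle


def make_tile(col, row, cols, rows, size=1.0):
    """One grass quad at grid cell (col, row), built from two triangles.

    UVs are computed against the whole grid, so each tile samples its own
    region of one big texture instead of repeating a small one.
    """
    x0, z0 = col * size, row * size
    x1, z1 = x0 + size, z0 + size
    corners = [(x0, z0), (x1, z0), (x1, z1), (x0, z1)]
    extent_x, extent_z = cols * size, rows * size
    return Mesh(
        vertices=[(x, 0.0, z) for x, z in corners],
        uvs=[(x / extent_x, z / extent_z) for x, z in corners],
        indices=[0, 1, 2, 0, 2, 3],
    )


def merge_meshes(meshes):
    """Concatenate static meshes into one mesh that needs a single draw call."""
    merged = Mesh()
    for mesh in meshes:
        base = len(merged.vertices)  # offset this mesh's indices
        merged.vertices.extend(mesh.vertices)
        merged.uvs.extend(mesh.uvs)
        merged.indices.extend(base + i for i in mesh.indices)
    return merged


COLS, ROWS = 32, 30  # hypothetical grid that gives 960 tiles
tiles = [make_tile(c, r, COLS, ROWS) for r in range(ROWS) for c in range(COLS)]
terrain = merge_meshes(tiles)
print(len(tiles), "separate meshes ->", len(terrain.indices) // 3, "triangles in one mesh")
```

The triangle count is unchanged by the merge; only the number of submissions to the GPU drops, which is where the per-call overhead is saved.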

4.2 Optimizing graphics resources

Tables 7 and 8 show the new information captured by Intel GPA Frame Analyzer after applying the big texture for grass.

Table 7. The captured frame information of the 1st optimization version

| Metric | Value |
| --- | --- |
| Total Ergs | 179 |
| Total Primitive Count | 27,537 |
| GPU Duration | 24 ms |
| Time to show frame | 27 ms |

Table 8. Draw call cost of the 1st optimization version

| Type | Erg | Time (ms) | % |
| --- | --- | --- | --- |
| Clear | 0 | 2 | 10.4 |
| Ocean | 18 | 6 | 23.6 |
| Terrain | 1~17, 19, 23~96 | 14 | 54.3 |
| Grass | 19 | 6 | 23.2 |
| Character, building and effect | 20~22, 97~131 | 1 | 5.9 |
| UI | 132~178 | 1 | 5.7 |

We then checked whether the game was still GPU bound by repeating the same measurements with the "Disable Draw Calls" and "Highest CPU Frequency" simulations.

Table 9. The FPS result of 1st optimization version with "Disable Draw Calls" and "Highest CPU Frequency"

| Bay Trail device | FPS |
| --- | --- |
| Original | 40 |
| Disable Draw Calls | 60 |
| Highest CPU freq. | 40 |

In Table 9, the "Disable Draw Calls" simulation increased the FPS, while the "Highest CPU Frequency" simulation did not change it. This told us that Looting Crown was still GPU bound. We also checked CPU load and GPU Busy again.

Figure 8. CPU and GPU load of the 1st optimization version with Intel® GPA System Analyzer

Figure 8 shows that the GPU load is about 99% and the CPU load is about 13% on Bay Trail. Because of the GPU bottleneck, CPU-side changes still could not speed up the game on Bay Trail.
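
A quick way to relate the captured frame times to the measured frame rates is to convert between milliseconds per frame and FPS. The Python sketch below uses the "Time to show frame" values from Tables 4 and 7; the helper names are our own. A single captured frame will not match an averaged FPS counter exactly, so the results are only a rough cross-check.

```python
def fps_from_frame_time(frame_ms):
    """Frame rate implied by a per-frame time in milliseconds."""
    return 1000.0 / frame_ms


def frame_budget_ms(target_fps):
    """Time available per frame to sustain the target frame rate."""
    return 1000.0 / target_fps


baseline_fps = fps_from_frame_time(48.0)   # Table 4: 48 ms to show the frame
first_opt_fps = fps_from_frame_time(27.0)  # Table 7: 27 ms to show the frame
budget_60 = frame_budget_ms(60.0)

print(f"Baseline frame implies {baseline_fps:.1f} FPS")
print(f"1st optimization frame implies {first_opt_fps:.1f} FPS")
print(f"Budget for 60 FPS is {budget_60:.1f} ms per frame")
```

The 1st optimization frame still takes well over the 60 FPS budget, which is consistent with the game remaining GPU bound at this stage.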

Looting Crown was originally developed for PCs, so the existing graphic resources were not suitable for mobile devices, which have lower GPU and CPU processing power. We did several optimizations to the graphic resources as follows.

  1. Minimizing Draw Calls
    1. Reduced the number of materials: The number of object materials was reduced from 10 to 2.
    2. Reduced the number of particle layers.
  2. Minimizing the number of polygons
    1. Applied LOD (level of detail) for characters using the "Simplygon" tool.
      Figure 9. A character with progressively reduced LOD
    2. Minimized number of polygons used for terrain: First, we minimized the number of polygons for faraway mountains that did not require much detail. Second, we minimized the number of polygons for flat terrain that could be represented by two triangles.
  3. Using optimized light maps
    1. Removed the dynamic lights for "Time of Day".
    2. Minimized the light map size of each mesh: Reduced the number of light maps used for the background.
  4. Minimizing the changes of render states
    1. Reduced the number of materials, which also reduced render state changes and texture changes.
  5. Decoupling the animation part in static mesh
    1. The Havok engine did not support a partial update of an animated part of an object, so an object with only a small moving mesh was updated in full, including its static mesh part. We therefore separated the animated part (the smoke, shown in the red circle in Figure 10) from the rest of the object, dividing it into two separate object models.

Figure 10. Decoupled animation of the smoke from the static mesh
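
As an illustration of how pre-generated LOD meshes are typically used at run time, the Python sketch below picks a LOD index from the camera distance. It is not the selection logic used by Looting Crown or its engine; the switch distances and the function name are hypothetical.

```python
import bisect

# Hypothetical switch distances in world units.
# LOD 0 is the full-detail mesh; higher indices have fewer polygons.
LOD_SWITCH_DISTANCES = [10.0, 25.0, 60.0]


def select_lod(distance, switch_distances=LOD_SWITCH_DISTANCES):
    """Return the LOD index for a character at the given camera distance."""
    # bisect_right counts how many switch distances are <= distance.
    return bisect.bisect_right(switch_distances, distance)


for d in (5.0, 10.0, 30.0, 100.0):
    print(f"distance {d:5.1f} -> LOD {select_lod(d)}")
```

Characters near the camera keep the detailed mesh, while distant ones use reduced meshes, which lowers the polygon count per frame.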

4.3 Apply Z-culling efficiently

When a 3D graphics card renders an object, the three-dimensional data is projected into two-dimensional screen coordinates (x-y), and the Z-buffer, or depth buffer, stores the depth information (z coordinate) of each screen pixel. If two objects in the scene must be rendered at the same pixel, the GPU compares their depths and overwrites the current pixel only if the new object is closer to the observer. The Z-buffer therefore reproduces the usual depth perception correctly. Z-culling rejects pixels that fail this depth test before they are drawn, so it works best when the closest objects are drawn first and hide the farther ones. Z-culling improves performance by avoiding the rendering of hidden surfaces.

In Looting Crown, there were two kinds of terrain drawing: ocean drawing and grass drawing. Because large portions of the ocean were behind grass, much of the ocean area was hidden. However, the ocean was rendered before the grass, which prevented efficient Z-culling. Figures 11 and 12 show the GPU duration of drawing the ocean and the grass, respectively; erg 18 is the ocean and erg 19 is the grass. If the grass is rendered before the ocean, the depth test indicates that the hidden ocean pixels do not need to be drawn, which reduces the GPU duration of drawing the ocean. Figure 13 shows the ocean drawing cost after the second optimization: the GPU duration decreased from 6 ms to 0.3 ms.

Figure 11. Ocean drawing cost of 1st optimization

Figure 12. Grass drawing cost of 1st optimization

Figure 13. Ocean draw cost of 2nd optimization
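
The effect of draw order on the depth test can be reproduced with a tiny software depth buffer. The Python sketch below is a simplified model, not a description of how the Bay Trail GPU implements Z-culling. The 16 x 9 screen, the rectangle sizes, and the depth values are invented for illustration.

```python
import math


def shaded_pixels(draws, width, height):
    """Rasterize axis-aligned rectangles in order with a depth test.

    Each draw is (name, (x0, y0, x1, y1), depth); a smaller depth is closer.
    Returns how many pixels each draw actually had to shade.
    """
    depth_buffer = [[math.inf] * width for _ in range(height)]
    shaded = {}
    for name, (x0, y0, x1, y1), depth in draws:
        count = 0
        for y in range(y0, y1):
            for x in range(x0, x1):
                if depth < depth_buffer[y][x]:  # depth test passes
                    depth_buffer[y][x] = depth
                    count += 1                  # pixel gets shaded
        shaded[name] = count
    return shaded


WIDTH, HEIGHT = 16, 9
ocean = ("ocean", (0, 0, 16, 9), 10.0)  # far away, covers the whole screen
grass = ("grass", (0, 0, 14, 9), 5.0)   # closer, hides most of the ocean

ocean_first = shaded_pixels([ocean, grass], WIDTH, HEIGHT)
grass_first = shaded_pixels([grass, ocean], WIDTH, HEIGHT)
print("ocean drawn first:", ocean_first)
print("grass drawn first:", grass_first)
```

In this toy scene the ocean shades all 144 pixels when drawn first but only the 18 visible pixels when drawn after the grass, while the grass cost stays the same. That is the same kind of saving the reordering produced in the real frame.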

Results

By taking these steps, mTricks optimized all graphics resources for mobile devices without compromising graphics quality. The number of ergs decreased from 1,726 to 124, and the primitive count decreased from 122,204 to 9,525.

Figure 14. The change of graphics resource

Figure 15 and Table 10 show the outcome of all these optimizations. After optimizations, FPS changed from 23 FPS to 60 FPS on the Bay Trail device.

Figure 15. FPS Increase

Table 10. Changed FPS, GPU Busy, and App CPU Load

| | Baseline | 1st Optimization | 2nd Optimization |
| --- | --- | --- | --- |
| FPS | 23 | 45 | 60 |
| GPU Busy (%) | 91 | 99 | 71 |
| App CPU Load (%) | 27 | 13 | 22 |

After the first optimization, the game on Bay Trail was still GPU bound. In the second optimization, we reduced the GPU workload by optimizing the graphic resources and the Z-buffer usage. Finally, the Bay Trail device hit the maximum of 60 FPS. Because Android uses Vsync, 60 FPS is the maximum frame rate on the Android platform.
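
The overall gains can be summarized with a few lines of arithmetic. The Python snippet below computes the reductions and the speedup from the figures reported in this section; the helper name is our own.

```python
def reduction_pct(before, after):
    """Percentage reduction from before to after."""
    return (before - after) / before * 100.0


erg_reduction = reduction_pct(1726, 124)           # ergs: 1,726 -> 124
primitive_reduction = reduction_pct(122204, 9525)  # primitives: 122,204 -> 9,525
fps_speedup = 60 / 23                              # FPS: 23 -> 60

print(f"Ergs reduced by {erg_reduction:.1f}%")
print(f"Primitives reduced by {primitive_reduction:.1f}%")
print(f"Frame rate improved {fps_speedup:.2f}x")
```

Both the erg count and the primitive count dropped by more than 92%, and the frame rate improved by roughly 2.6 times.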

Conclusion

When you start to optimize a game, first determine where the application bottleneck is. Intel GPA can help you do this with its analysis tools. If your game is CPU bound, Intel VTune Amplifier is a helpful tool. If your game is GPU bound, you can find more detail using Intel GPA. To fix GPU bottlenecks, look for efficient ways to reduce draw calls, polygon count, and render state changes. You can also check the size of terrain textures, animated objects, and light maps, and the draw order used for Z-culling.

About the Authors

Tai Ha is an application engineer focusing on enabling online games in APAC region. He has been working for Intel since 2005 covering Intel® Architecture optimization on Healthcare, Server, Client, and Mobile platforms. Before joining Intel, Tai worked for biometric companies based in Santa Clara, USA as a security middleware architect since 1999. He received his BS in Computer Science from Hanyang University, Korea.

Jackie Lee is an Applications Engineer with Intel's Software Solutions Group, focused on performance tuning of applications on Intel® Atom™ platforms. Prior to Intel, Jackie Lee worked at LG in the electronics CTO department. He received his MS and BS in Computer Science and Engineering from The ChungAng University.

References

The Looting Crown IA version is now released on Google Play:

https://play.google.com/store/apps/details?id=com.barunsonena.looting

Intel® Graphics Performance Analyzers
https://software.intel.com/en-us/vcsource/tools/intel-gpa

Havok
http://www.havok.com

mTricks
https://www.facebook.com/mtricksgame

Intel, the Intel logo, and Atom are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2014 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.