In previous years I was part owner of a small web business. The business was built around dynamically creating Flash websites through a real-time GUI that was easy enough to use that even Grandpa Bert could design his own website in 30 minutes. We had a great front-end team with highly skilled individuals. Unfortunately, a business decision was made to outsource the backend to a team that charged a mere fraction of the cost of an in-house team, and this was the decision that killed our business. How so?
It’s not a bad thing to want a deal. We buy stuff on sale all the time (such as my brand new 55″ 3D TV at a huge discount which is great for Nintendo Land). So what is the difference between developers and the rates they charge? Why might my hourly rate be more than developer “Guy Foo”? And even though it might be, how will the client still save money in the end?
I recently joined a project about 2 months into it. The application was suffering from lag when scrolling a ListView component. The lag was more than just noticeable; it significantly interrupted the usability of the application. At this point 2 developers, 1 project manager, and 2 quality assurance people had been working for 2 weeks on this issue, among a slew of others. The following is a comparative analysis report I wrote for the client detailing the changes in performance from my refactoring. Let’s begin:
A Quantitative Analysis of a Refactored Project
While I could see the lag, the first thing I did was confirm my suspicions that the application was taking too long to render a frame. To measure the drawing I profiled scrolling the ListView by doing a system dump of the “gfxinfo” and mapped it to a bar graph. The bars in the diagram below represent how long each frame took to draw. Anything over 16 milliseconds is lag and becomes noticeable to the user.
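As a rough illustration of that measurement, here is a minimal Python sketch that totals the per-frame timings from a gfxinfo-style dump (such as the text produced by `adb shell dumpsys gfxinfo <package>`) and flags frames over the 16-millisecond threshold. The three-column Draw/Process/Execute layout is an assumption about the dump format, the sample numbers are made up for illustration, and the function names are hypothetical. It is not the script used for the report.

```python
FRAME_BUDGET_MS = 16.0  # threshold used in this report


def parse_frame_times(dump):
    """Return the total milliseconds per frame from a gfxinfo-style dump.

    Assumes a "Profile data in ms:" section whose rows are
    whitespace-separated numbers (for example Draw, Process, Execute).
    """
    totals = []
    in_profile = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith("Profile data in ms"):
            in_profile = True
            continue
        if not in_profile:
            continue
        try:
            values = [float(part) for part in stripped.split()]
        except ValueError:
            # Column header or window name, not a timing row.
            continue
        if values:
            totals.append(sum(values))
    return totals


def slow_frames(totals, budget_ms=FRAME_BUDGET_MS):
    """Return (frame index, total ms) for every frame over the budget."""
    return [(i, ms) for i, ms in enumerate(totals) if ms > budget_ms]


# Made-up sample data, only to show the shape of the input.
SAMPLE_DUMP = """\
Profile data in ms:

\tcom.example.app/com.example.app.MainActivity
\tDraw\tProcess\tExecute
\t2.10\t6.30\t1.20
\t3.00\t21.50\t2.40
\t1.80\t5.10\t0.90
"""

totals = parse_frame_times(SAMPLE_DUMP)
slow = slow_frames(totals)
for index, ms in slow:
    print(f"frame {index} took {ms:.1f} ms")
```

Plotting `totals` as a bar chart with a horizontal line at 16 milliseconds gives the kind of graph described here.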
There are two spikes in the graphic that occur when an image is added to the renderers in the ListView. A user sees this as jitter. Below is another graph but instead of profiling the scrolling on a ListView it profiles the opening layout of the application.
Notice the y-axis is out of 100 milliseconds. The first quarter of the graph is entirely above the 16-millisecond threshold, as is another part of the graph. 80 milliseconds of lag is very noticeable to the user.
The graph below is from after the refactoring. Notice the y-axis is out of 12 milliseconds. When the ListView is now scrolled there is no lag in drawing as it is under the 16-millisecond threshold.
The next graph is of the startup time. As before, each bar represents a single frame of execution. After the refactoring only one or two frames are above the 16-millisecond threshold during the opening of the application. This is a significant improvement. Had time permitted, the xml layouts could have been refactored further to reduce these times.
Must Go Deeper
While measuring how long each frame takes to render gives a very good idea of what is going on, it doesn’t tell the full story. There could be lag between the frames, which would cause rendering to be skipped. I checked this by running a “systrace” on the application. The graph below is from before the refactoring and was profiled while scrolling the ListView.
The green represents how long Android spent going through all the UI elements on the screen (traversing the hierarchy) and the blue represents how long Android spent measuring the layouts. The small bars of purple are how long the drawing took. Notice how large the first 4 blocks are. These large blocks measure around 183 milliseconds each. That is 183 milliseconds between drawn frames, which is 11 drawing cycles skipped. The UI lagged for 11 frames before the next update was made. You can see that below the large blocks there is no purple, which means the draw was skipped.
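The 11-cycle figure follows from dividing the gap by the 16-millisecond threshold used throughout this report. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
FRAME_BUDGET_MS = 16  # per-frame threshold used in this report


def skipped_draw_cycles(gap_ms, budget_ms=FRAME_BUDGET_MS):
    """Number of whole frame budgets that fit inside a gap between draws."""
    return int(gap_ms // budget_ms)


print(skipped_draw_cycles(183))  # 11
print(skipped_draw_cycles(5.5))  # 0
```

A gap shorter than one budget, such as 5.5 milliseconds, skips no draw cycles.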
Below is the graph after the refactoring. Both graphs have the same x-axis time scale, starting at 1 second and ending at 2 seconds (which means the widths of the green and blue bars can be compared). Notice the frames are much smaller. Here the layout and traverse take 5.5 milliseconds, which is below the 16-millisecond threshold, and a draw occurs.
Draw All The Things!
Drawing is an expensive operation. Android can draw the entire screen about 1.5 times before it lags. The below graphs are heat maps of how many layers Android had to draw to render the UI. The coloring means:
- No Color – no overdraw occurred
- Blue – 1 overdraw occurred
- Green – 2 overdraws
- Red – 3 overdraws
- Dark Red – 4 or more overdraws
An efficient application will be in the blue with a little bit of green and reds. A little red/dark red is ok.
Below is the Overdraw map before the refactoring. Notice it is very red.
And the image below is from after the refactoring.
It’s now mostly blue. The ListView is 2 steps of improvement, down from red and green to blue.
The graphs below show the CPU consumption from scrolling the ListView before and after refactoring, respectively. The green bar represents the time the CPU spent in processing the application while the red deals with the Linux kernel. The refactored version consumes about a third of the CPU from before.
Nesting Is For the Birds
While there are still more ways to profile an application, the final test I did was to look at the view hierarchy. The two images below show the hierarchy before and after refactoring. Notice it’s nearly the same. The hierarchy is rather deep, but due to time and money constraints I only refactored small pieces of the xml layout files. While not ideal, it’s okay, as the application performs well despite the deep hierarchy.
There Are Other Things
There were other things I could have done to gain even more performance such as limiting the “requestLayout”s of the ImageViews in the ListView. However, without refactoring the xml layouts most of the other performance increases couldn’t have been done. But even without this, the application is running great and each frame renders under the 16-millisecond threshold.
It’s All About the $$$
To get these performance increases, the changes I made in refactoring included:
- The splash page
- The interstitial ads
- The banner ads
- The Slide-to-Open Pull Menu
- The navigation
- The main content page and its 3 subpages
- And finally a popup dialog.
In addition to refactoring the application for performance increases, I was able to properly encapsulate these areas. All of these changes took 19 hours of my time.
That was the end of the report. As I mentioned above, before I refactored there were 2 developers, 1 project manager, and 2 QA people working on these issues for 2 full weeks. Had the decision been made to do this from the start, the difference would have been 381 saved man-hours and a better application from the initial launch. Additionally, this was just 1 area of the application. There were other areas where QA was taking longer than it should have. There are hidden costs of development.
Going back to my original story about my web business: after 18 months of working with the outsourcing team, which was an utter failure, the team was fired. We decided to throw away the entire code base and 18 months of work by 5 people. The other front-end developer and I rewrote the entire backend in 4 months and it was great. The company lost 18 months of salaries, plus an additional 4 months to rewrite the backend, plus 2 years of lost business revenue, among other things. In the end, it was just too much for the company to survive.
I’m a firm believer that when you hire, you should hire good people and hire well. You’ll be more likely to end up with a superior product that’s extensible and flexible to change and that will outperform what would have been the alternative. And in the end it will cost less to develop.
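As a rough check on the 381 man-hour figure from the refactoring project, the arithmetic can be sketched as follows. The 40-hour work week is an assumption on my part. It is not stated explicitly, but it is the value that reproduces the number.

```python
PEOPLE = 2 + 1 + 2       # developers + project manager + QA
WEEKS = 2
HOURS_PER_WEEK = 40      # assumption: a standard full-time week
REFACTOR_HOURS = 19      # time the refactoring took

team_hours = PEOPLE * WEEKS * HOURS_PER_WEEK
saved_hours = team_hours - REFACTOR_HOURS

print(team_hours)   # 400
print(saved_hours)  # 381
```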