Every morning I get up early, get on the highway, and drive into work. There may be a little snag through the Grandview triangle (it’s kind of like KC’s version of the Bermuda Triangle), but it’s usually just a 20-minute drive if I leave at 7:00 a.m. After 7:30 a.m., that drive turns into an hour as four times the commuters enter the triangle. Now, if there is an accident? Well, I’ll hope to see the office around lunch.
I shouldn’t complain. I have it EASY compared to other commuters (LA, Atlanta, Chicago—I don’t even want to know)!
Every browser is kind of like the Grandview triangle. It works well until something creates a snag. The higher the volume, the worse the bottleneck.
Stackify is a data company. We have lots of it, and all of that data changes at a rapid pace. The nature of what we offer demands both large amounts of aggregate data and detail data available at your fingertips, often in the same “page” of information. Combine lots of data with a high rate of change, and guess what…you’ve just turned your browser into rush hour on the Grandview triangle.
How do you keep things flowing and reduce snags? How do you keep performance up?
Like the highway, some of our data flows in haphazardly and in large, fast bursts. Our monitoring pages show lists of monitors whose statuses and values can change quickly. These monitors, when updated, trigger animations and DOM manipulation. The browser has no control over when these updates are transmitted and may receive updates for the same monitors milliseconds apart.
There are two main performance problems that we had to deal with for these monitors.
As any UI guy will tell you, changing the DOM can cause a lot of CPU cycles to be spent painting the page on the screen. There are many ways to handle this volume of data from a DOM perspective. React.js is designed to handle DOM thrashing internally, but if you’re using something like Knockout.js (as we do), you have to handle it yourself.
One solution that Knockout.js offers natively is the “throttle” extension. What this does is separate the changing of the observable from its dependencies taking action on the change. So if you throttle the observable for one second, you can make as many changes to the observable’s value as you want, and the dependencies will only act on the change once per second. The throttle extension uses a setTimeout call internally to handle this.
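As a rough sketch (the observable and computed here are made up for illustration, not our actual code), using the throttle extender looks something like this:

```javascript
// A minimal sketch: the throttle extender (milliseconds) decouples writes to the
// observable from the work its dependents do in response.
var monitorStatus = ko.observable("OK").extend({ throttle: 1000 });

// Bindings and computeds that depend on monitorStatus will not be
// re-evaluated for every individual write.
var statusCss = ko.computed(function () {
    return monitorStatus() === "OK" ? "monitor-good" : "monitor-alert";
});

// Rapid-fire updates arriving milliseconds apart...
monitorStatus("WARN");
monitorStatus("CRITICAL");
monitorStatus("OK");
// ...trigger far fewer re-evaluations (and repaints) than three.
```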
The throttle extension turned out to be the wrong choice for our monitoring pages. While it dramatically reduced the amount of DOM thrashing that could occur, it raised a different problem: setTimeout is resource-expensive. Overzealous use of throttle created many setTimeout instances, and those ate memory and CPU cycles quickly. We now limit throttle to direct UI user changes, e.g. changing a search filter or clicking a toggle.
While throttle was the right direction, doing it piecemeal was too expensive. That led us to a static interval. An interval lets us marshal the accumulation of changes to the monitor entities, updating the DOM on a set schedule and only for those entities that actually changed. The DOM now changes at a measured, predictable pace for the entire set of entities rather than thrashing as individual entities change. We also found that absolute real-time updates don’t add any user experience benefit over a relatively fast interval update.
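Here is a rough sketch of that interval pattern. The names (pendingUpdates, findMonitorById, the two-second cadence) are illustrative rather than our actual code:

```javascript
// Incoming updates are accumulated by monitor ID; last write wins, so
// several updates for the same monitor collapse into one DOM change.
var pendingUpdates = {};

function onMonitorUpdate(update) {
    pendingUpdates["key_" + update.id] = update;
}

// The DOM only changes on this fixed cadence, and only for entities
// that actually have a pending change.
setInterval(function () {
    var keys = Object.keys(pendingUpdates);
    if (keys.length === 0) { return; }

    keys.forEach(function (key) {
        var update = pendingUpdates[key];
        var monitor = findMonitorById(update.id); // seek strategy discussed below
        if (monitor) {
            monitor.status(update.status); // observables drive the bindings/animations
            monitor.value(update.value);
        }
    });

    pendingUpdates = {}; // start a fresh batch
}, 2000); // a "relatively fast" interval; the exact timing is a tuning choice
```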
The monitors that are updated are sorted, but not in a linear fashion. The ID of a monitor does not correspond at all to its order on the page. We also receive the data haphazardly, so seek performance in the JavaScript is paramount. Choosing the correct approach to find the entity to update is very important to avoid wasting CPU cycles.
If you have a small array, a simple linear search with find or findIndex might be performant enough. But when you get into medium, large, and really large arrays, find will perform poorly. If you have a large array or you’re doing many find operations, stay far away from a linear search. For the monitors, find was like a rolling roadblock: it brought everything to a halt regardless of the time of day.
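For reference, the linear version is the one-liner most of us reach for first (assuming each entity carries an id property, which is an illustrative detail here):

```javascript
// O(n): every lookup may walk the whole array.
var monitor = monitors.find(function (m) { return m.id === targetId; });
var position = monitors.findIndex(function (m) { return m.id === targetId; });
```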
If the array you are seeking against is sorted in some linear way, a binary search generally provides the best performance. Binary searches are very fast, and if you can use one, do so. In the case of the monitors, the entities were not sorted in a linear fashion, and the IDs of the entities did not correspond to an easy or contiguous index position. We would have loved to use this search, but it wasn’t in the cards.
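For comparison, here is a generic binary search sketch (assuming the array is sorted ascending by id, which ours was not):

```javascript
// O(log n): halves the search space on every comparison.
function binarySearchById(sortedMonitors, targetId) {
    var low = 0;
    var high = sortedMonitors.length - 1;
    while (low <= high) {
        var mid = (low + high) >> 1;
        var candidate = sortedMonitors[mid];
        if (candidate.id === targetId) { return candidate; }
        if (candidate.id < targetId) { low = mid + 1; } else { high = mid - 1; }
    }
    return null; // not found
}
```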
Since the data doesn’t work well with a linear or binary search, we chose a Dictionary approach. Using the ID of the entity, you can create a key (e.g. “key_1”) that is predictable, repeatable, and simple. Then an index operation can reference the entity stored in the Dictionary. This provides a consistent level of performance that doesn’t change much with the size of the dictionary.
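A sketch of the Dictionary approach (the monitorLookup name and findMonitorById helper are illustrative):

```javascript
// Build the lookup once when the monitor list loads.
var monitorLookup = {};
monitors.forEach(function (m) {
    monitorLookup["key_" + m.id] = m; // predictable, repeatable, simple key
});

// Each update is then a single property access, effectively constant time
// regardless of how many monitors are on the page.
function findMonitorById(id) {
    return monitorLookup["key_" + id];
}
```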
I tend to gravitate toward the Dictionary method because the nature of our data doesn’t match up well with sorted arrays as primary containers. However, this is also easy to benchmark. Here is a JSFiddle that runs performance tests comparing the different seek methods (mileage will vary depending on your native browser performance):
https://jsfiddle.net/ghmyzty4/show/
The Linear “Find” method had the worst performance of all, as you would expect; a linear search is Big O(n). The Binary search against the sorted array was the fastest method, with Big O(log n) search performance. The Dictionary method is very fast as well. At a glance, the results of the fiddle benchmark look like Big O(log (n+1)), but take that with a grain of salt.
Now go “test” someone’s browser with an infinite loop!
Dan Gidman
Sr. Developer / Stackify
If you would like to be a guest contributor to the Stackify blog please reach out to [email protected]