Imagine you’re driving a car, but there’s no windshield. You can’t see in front of you. On top of that, your car is full of friends looking out the side windows and yelling various things to you: “We should turn left!” “No, we should turn right!” “I’m pretty sure the next turn is in two miles.”
In this scenario, how likely are you to reach your destination instead of ending up in a ditch or careening off a bridge?
DevOps is a great mindset and can accelerate your team to new speeds. But it brings with it many difficulties. Application performance monitoring, or APM, is like the windshield of your car. With it, you can navigate such roadblocks. Without it, you may end up in a ditch.
In this article, you’ll learn what these roadblocks are and how DevOps APM can help you navigate them.
The definition of DevOps can get convoluted, but it essentially means your team owns your software now. You are in charge not only of building it but also of deploying, maintaining, and fixing it. And you do this across all environments in which the software lives. This ownership should give you a lot of freedom to tweak your system as you need without going through another team.
But that ownership comes with a price: The buck stops with your team. No one’s going to come save you if you get stuck in a ditch. There’s no separate ops team to ease the burden of production support. So, if you drive into DevOps blind, you may hit many obstacles for which you aren’t ready.
What are some key roadblocks associated with DevOps?
As a software developer, you probably trust your gut a lot. Much like the friends yelling directions mentioned earlier, you have strong opinions on how to design code. Yet if you have no data to back up your instincts, you may do more harm than good.
All architecture has tradeoffs. If you speed up one part of the application, then you’re likely to slow down others. If you can’t monitor where real performance bottlenecks are, then you may design the wrong patterns. In places where the users may need something fast, you may make it even slower. On an action where your users don’t care about speed, you may overengineer.
If DevOps isn’t handled correctly, then the work of two teams can collapse onto the shoulders of one team. In the past, you might have had an entire team of people dedicated to supporting the application in production, but that’s now your job. Even when your team handles the situation correctly, this new burden can catch you all off guard. The time it takes to triage and troubleshoot incidents, as well as communicate with customers, can easily pile up. Not only will this new work slow you down if it’s mismanaged, but it can also burn you out. This roadblock reminds me of a fantastic clip from the show I Love Lucy in which Lucy and Ethel struggle to keep up with wrapping all the candies coming down the assembly line.
It can be quite embarrassing when you have no idea something is wrong with your application until a customer tells you. I’ve even seen teams that didn’t know their system was completely down until a customer emailed them. The quicker you know there’s a problem, the faster you can react. And the faster you react, the less time there is for customers to get angry. When you’re running blind on what problems your app has, every fix takes longer, and customers feel each extra minute.
With strong application performance management, you can deftly swerve around these roadblocks. APM will give you much of the visibility you need to steer your team appropriately.
When people think about APM, they mostly think of it as a way of looking at performance metrics. Performance can mean a few things here.
Performance metrics give you the data you need to make wise architectural decisions. You can find where your real bottlenecks are by looking at your latency across your system. Then you can tweak the right areas of your app to give the appropriate performance to your users.
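To make the idea of finding real bottlenecks concrete, here is a minimal sketch in Python. It assumes you already have latency samples per endpoint; the endpoint names, the numbers, and the helper functions are all hypothetical, and a real APM tool would collect and aggregate this data for you.

```python
import math
from collections import defaultdict

# Hypothetical samples: (endpoint, latency in milliseconds).
SAMPLES = [
    ("/checkout", 120), ("/checkout", 950), ("/checkout", 1010),
    ("/search", 40), ("/search", 55), ("/search", 60),
    ("/login", 80), ("/login", 90), ("/login", 85),
]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def p95_by_endpoint(samples):
    """Group latencies by endpoint and return {endpoint: p95 latency in ms}."""
    grouped = defaultdict(list)
    for endpoint, latency_ms in samples:
        grouped[endpoint].append(latency_ms)
    return {endpoint: percentile(vals, 95) for endpoint, vals in grouped.items()}


def slowest_endpoints(samples, top=1):
    """Return the endpoints with the highest p95 latency, slowest first."""
    p95 = p95_by_endpoint(samples)
    return sorted(p95, key=p95.get, reverse=True)[:top]


print(slowest_endpoints(SAMPLES))
```

Looking at a high percentile rather than the average is a deliberate choice here: a few very slow requests can hide behind a healthy-looking mean.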
You can also use performance metrics to find out where things may be going wrong. For example, if you notice that your error rate has spiked by 30% since Thursday, it’s possible that Wednesday’s deploy introduced a bug. You can quickly react to that, even before customers find out. Advanced APM tools, such as Retrace, can even point out likely places of concern in your system proactively, preventing these issues from becoming large problems.
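As a rough illustration of the error-rate check described above, the following Python sketch compares a current error rate against a baseline. The function names, the request counts, and the 30% threshold are assumptions for the example, not how any particular APM product computes its alerts.

```python
def error_rate(errors, requests):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def spiked(baseline_rate, current_rate, threshold=0.30):
    """True if the current rate is more than `threshold` (relative) above the baseline."""
    if baseline_rate == 0:
        # Any errors at all count as a spike when the baseline was clean.
        return current_rate > 0
    return (current_rate - baseline_rate) / baseline_rate > threshold


# Hypothetical numbers: before and after a deploy.
before_deploy = error_rate(errors=20, requests=1000)
after_deploy = error_rate(errors=27, requests=1000)

if spiked(before_deploy, after_deploy):
    print("Error rate is up more than 30%; check the most recent deploy.")
```

A relative threshold is used so the same check works for both quiet and noisy services, though a real alert would usually also require a minimum amount of traffic.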
Error tracking lets you dive deep into what sort of error response your system is regularly producing. Performance metrics can give you a rough signal, but you’ll need to drill down and figure out exactly what’s going wrong. The sooner you do this, the fewer customers are likely to experience these errors. It’ll also ease your production support burden because you can get the information you need to resolve customer problems quickly.
An APM with error tracking makes it clear what sort of errors are occurring and when. It lets you see what events caused the errors. It also gives you a wide variety of diagnostic data related to the errors.
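Here is a small Python sketch of what error tracking does under the hood, assuming a hand-rolled `capture` helper (hypothetical, not a real APM API). It records the error type, where it was raised, and the context that triggered it, then groups similar errors together.

```python
import traceback
from collections import Counter

captured = []  # in a real tool, these events would be sent to the APM backend


def capture(exc, context):
    """Record an exception with the function that raised it and the triggering context."""
    innermost = traceback.extract_tb(exc.__traceback__)[-1]
    captured.append({
        "type": type(exc).__name__,
        "message": str(exc),
        "function": innermost.name,
        "context": context,
    })


def parse_quantity(raw):
    """Hypothetical application code that fails on bad input."""
    return int(raw)


for raw in ["3", "abc", "7", ""]:
    try:
        parse_quantity(raw)
    except ValueError as exc:
        capture(exc, {"input": raw})

# Group by (error type, function) to see which errors occur most often.
counts = Counter((event["type"], event["function"]) for event in captured)
print(counts.most_common(1))
```

Grouping by type and location is one simple way to turn a stream of individual failures into a short list of distinct problems to fix.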
Systems are more complex and distributed across more processes than ever before. Instead of one monolithic application running on one server, you have many coordinating services running across dozens of servers or more. You can no longer simply look at the log files of each of these and mentally stitch together what sorts of events are occurring. That would be like reading a book where every paragraph requires you to jump to a different volume, rotating through them until you finish a chapter.
You need a centralized place to gather these log events and look at them cohesively. APM often provides storage for these logs. It also lets you index this data so that you can easily search for the information you need. This lets you turn around customer incidents quickly, easing your production support burden.
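The following Python sketch shows the core idea of centralizing and indexing logs. It assumes every service tags its log events with a shared request ID; the service names, fields, and messages are hypothetical.

```python
from collections import defaultdict

# Hypothetical log events, as each service would have written them separately.
SERVICE_LOGS = {
    "gateway": [
        {"ts": 1, "request_id": "r-1", "msg": "received POST /orders"},
        {"ts": 6, "request_id": "r-1", "msg": "responded 500"},
    ],
    "orders": [
        {"ts": 2, "request_id": "r-1", "msg": "creating order"},
        {"ts": 3, "request_id": "r-2", "msg": "creating order"},
    ],
    "payments": [
        {"ts": 4, "request_id": "r-1", "msg": "card declined"},
    ],
}


def build_index(service_logs):
    """Merge events from all services into one index keyed by request ID, sorted by time."""
    index = defaultdict(list)
    for service, events in service_logs.items():
        for event in events:
            index[event["request_id"]].append({**event, "service": service})
    for events in index.values():
        events.sort(key=lambda e: e["ts"])
    return index


# One lookup now tells the whole story of a single customer request.
timeline = [(e["service"], e["msg"]) for e in build_index(SERVICE_LOGS)["r-1"]]
for service, msg in timeline:
    print(f"{service}: {msg}")
```

With the index in place, answering "what happened to request r-1?" is a single lookup instead of a search through three separate log files.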
Code profiling, which APM tools often call tracing, is the ability to follow a request from start to finish. In a similar vein to centralized logging, you can see a cohesive story. An effective APM tool can profile a single request from when the customer sent it to when the server sent a response, including database calls. Centralized logging and monitoring can show you what the system’s doing as a whole, but code profiling lets you pinpoint the cause of a specific issue.
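To show what following a single request looks like, here is a minimal Python sketch of request tracing. The `Trace` class and the span names are hypothetical, and `time.sleep` stands in for real database and rendering work; an APM agent would normally add this instrumentation for you.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects timed spans for one request as (name, depth, duration_seconds)."""

    def __init__(self):
        self.spans = []
        self._depth = 0

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            self.spans.append((name, self._depth, time.perf_counter() - start))


def handle_request(trace):
    """Hypothetical request handler with a slow database call."""
    with trace.span("handle_request"):
        with trace.span("db.query"):
            time.sleep(0.02)   # stand-in for a database call
        with trace.span("render"):
            time.sleep(0.005)  # stand-in for building the response


trace = Trace()
handle_request(trace)

# Among the direct children of the request, find where the time went.
slowest_child = max((s for s in trace.spans if s[1] == 1), key=lambda s: s[2])
print(f"slowest step: {slowest_child[0]} ({slowest_child[2] * 1000:.1f} ms)")
```

Because each span records its depth, the same data can be rendered as a nested timeline, which is how tracing views typically present a request.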
With performance metrics, error tracking, centralized logging, and code profiling, you can respond to production incidents much faster.
Driving on the road of DevOps is freeing but dangerous. Significant roadblocks can stand in your way. But with strong application performance management in place, you can steer clear of the debris and have a team based on speed and quality.