How 2wav Monitors Web Applications in the Cloud - Part 1

We've built a lot of web applications over the years. Regardless of the technologies we use, or their purpose, all the applications have one thing in common: they need monitoring. Our devops specialist André Brown was kind enough to write a rather comprehensive guide to how we do it. Here's part one!
Monitoring is essential to ensuring that your web application is serving users as expected. If your application is down, you want to notify your users in a timely manner, preferably before they start to email you or tweet about it. Deploying applications in the cloud requires multiple levels of monitoring to ensure that you’re keeping an eye on all the pieces that can go wrong.

In this series of blog posts we share with you how 2wav manages the following aspects of monitoring:

  • Monitoring resource utilisation
  • Monitoring the application stack
  • Ensuring that our sites are actually up
  • Monitoring our infrastructure provider
  • Managing notifications

This week we'll talk about the tools that we use, and how we monitor the basic resources required to run any web application.

Monitoring Tools

There is a plethora of monitoring tools available, but most offer the same set of features. Your choice will likely come down to personal preference and ease of use.

If you have little interest in setting up your own tool, and don’t mind paying a subscription fee for one, you can get basic monitoring up and running with little effort using a cloud-based monitoring tool. Here are a few that we have experimented with at 2wav:

AWS CloudWatch

https://aws.amazon.com/cloudwatch/
2wav relies on AWS for hosting our applications, as well as for hosting most of our clients’ applications. So it goes without saying that the first thing we looked at was AWS’s monitoring solution.

AWS offers basic monitoring through CloudWatch. CloudWatch isn’t very useful for server monitoring, lacking even the ability to monitor memory usage out of the box. Where it does come in useful is when you’re building more complex applications using multiple AWS services. For example, it is indispensable for monitoring your resource utilization to trigger scaling events.

DataDog

http://datadoghq.com
DataDog is a comprehensive monitoring service that offers server, infrastructure, application, and endpoint monitoring. It can be used to monitor on-server resource utilization, such as how much memory you’re consuming, or hard drive utilization. It can also be used to monitor your cloud provider’s infrastructure through integrations. For example, they integrate with Amazon Web Services to offer centralized monitoring of the AWS stack.

Monitor.us

http://monitor.us
Monitor.us offers website uptime monitoring, server health monitoring, network performance monitoring, and even monitoring for your mobile apps. Unlike other providers we tried, instead of just offering a free trial period, they offer a free tier which you can continue to use after your trial is over.

Sensu

http://sensuapp.org
If you are averse to paying a monthly subscription, or need more control over what you monitor, then you can roll your own monitoring using open source tools and frameworks. After reviewing a few, we settled on Sensu, a relative newcomer to the monitoring game.

Sensu offers a comprehensive monitoring framework that can be customized to monitor pretty much anything you need to keep an eye on, whether that’s software or hardware. It has a collection of open source plugins that can be used to extend core features. And if you’re not interested in putting together a collection of open source plugins, you can purchase the Enterprise version which has built-in plugins and integrations with enterprise tools.

During this series of blog posts, we will tell you how we use Sensu to keep an eye on the things that matter to us.

Resource Utilisation

Monitoring starts with keeping an eye on the basic building blocks of any web application: the resources available on your server. This is true whether you're running servers on dedicated hardware in your basement or in a Tier 3 datacenter, and it remains true for monitoring in the cloud.

CPU activity, memory usage, and storage consumption are some key resources that you’ll want to keep an eye on. Below, we talk about how to monitor each of these.

CPU Activity

Spikes in CPU utilization may indicate that your server is busy serving clients, or that some errant process has claimed the CPU as its servant. In either case, you will want to know about it.

At 2wav, we use the Sensu CPU checks plugins (https://github.com/sensu-plugins/sensu-plugins-cpu-checks) to monitor CPU activity. With the default settings, it warns us when CPU activity spikes to 80%, and throws a critical warning if it increases to 90%. In a later post, we will talk about how we avoid getting overwhelmed by notifications for normal activities that may cause CPU usage spikes.
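To give a feel for what a CPU check measures, here is a rough sketch in Python. It is not the Sensu plugin's implementation; it assumes a Linux host, and the function names are made up for illustration. It estimates utilization by comparing two samples of the aggregate "cpu" line in /proc/stat.

```python
# Illustrative sketch only: NOT the code of the Sensu CPU checks plugin.
# Assumes Linux, where the first line of /proc/stat holds cumulative
# CPU time counters. All function names here are hypothetical.
import time


def parse_cpu_line(line):
    """Return (busy, total) counters from an aggregate 'cpu ...' line."""
    fields = [int(value) for value in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal ...
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    # Only the first eight fields are summed, because guest time is
    # already included in user time.
    total = sum(fields[:8])
    return total - idle, total


def cpu_percent(before, after):
    """Percentage of time the CPU was busy between two samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    if total <= 0:
        return 0.0
    return 100.0 * busy / total


def sample_cpu_percent(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and compare them."""
    def read():
        with open(path) as handle:
            return parse_cpu_line(handle.readline())

    before = read()
    time.sleep(interval)
    return cpu_percent(before, read())
```

Calling sample_cpu_percent() on a Linux server returns a number between 0 and 100, which can then be compared against warning and critical thresholds such as the 80% and 90% mentioned above.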

Memory Usage

The Sensu community also offers a plugin to monitor memory usage. With this plugin, you can check how much RAM or swap is available on your server, which is great if you need to ensure that all servers are deployed with a minimum amount of memory. You can also check memory usage in megabytes or percentages, and warn when usage is high.

Most of our servers at 2wav are cloud servers that don’t use swap, so we primarily monitor memory usage as a percentage of available memory and trigger notifications when memory consumption is high.
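As a rough illustration of "memory usage as a percentage of available memory", the Python sketch below parses text in the format of Linux's /proc/meminfo. It is an assumption-laden example rather than the community plugin's code: the function names are hypothetical, and it relies on the MemAvailable field, which only exists on reasonably recent Linux kernels.

```python
# Illustrative sketch only: NOT the Sensu memory plugin's implementation.
# Assumes Linux /proc/meminfo with a MemAvailable field.
# Function names are hypothetical.


def parse_meminfo(text):
    """Turn 'Name:   12345 kB' lines into a {name: value} dictionary."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo):
    """Share of total memory that is not available to new workloads."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def read_memory_used_percent(path="/proc/meminfo"):
    """Read the live value from the running system."""
    with open(path) as handle:
        return memory_used_percent(parse_meminfo(handle.read()))
```

On a server without swap, this single percentage is usually the number worth alerting on.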

Storage Space

Storage consumption can be a pretty nasty gotcha with web application servers. Things like logs that aren’t being rotated, or application bugs, can result in your storage space being exhausted. To make things worse, you can end up unable to run commands to diagnose the issue because your operating system has no space to write temporary files!

On more than one occasion, storage monitoring has helped 2wav to identify application bugs that were consuming dangerous amounts of storage space on development and production servers. We warn on 80% disk usage, and investigate these warnings before they have a chance to create system lockups.
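A disk usage check can be sketched in a few lines of Python using the standard library's shutil.disk_usage. This is only an illustration, not the check that 2wav runs: the function names are hypothetical, and the 80% default simply mirrors the warning level mentioned above. The return codes follow the convention Sensu checks use, where an exit status of 0 means OK and 1 means warning.

```python
# Illustrative sketch only: a hand-rolled disk check, not a Sensu plugin.
# Function names and the default threshold are assumptions.
import shutil


def disk_used_percent(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def check_disk(path="/", warning=80.0):
    """Return (exit_code, message): 0 for OK, 1 for a warning."""
    used = disk_used_percent(path)
    if used >= warning:
        return 1, "WARNING: {} is {:.1f}% full".format(path, used)
    return 0, "OK: {} is {:.1f}% full".format(path, used)
```

A wrapper script would print the message and pass the code to sys.exit, so that the monitoring agent can pick up both the status and the explanation.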

Baselines and Thresholds

Once you’ve decided what to monitor, you’ll want to establish what normal application usage looks like. You can do this by profiling your application under load, or you can make an educated guess. Whichever way you go, you’ll want to set up alerts to notify you when you’re nearing your threshold, and when you’ve actually hit it. Sensu allows us to do both, as all checks have a warning threshold and a critical threshold.

For example, when monitoring storage space, you don’t want to find out only when your hard drive is full; you want to know ahead of time that you’re getting there. A warning when you hit 70% will give you enough time to clean up, or to plan an upgrade. A second warning at 90% should remind you if you forgot. Chances are, if you reach 100% you’ll get warnings from other sources, like customers tweeting to say that your site is down.
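The two-threshold idea can be sketched as a small Python function. The names are hypothetical and this is not Sensu's own code, but the numeric statuses match the exit codes Sensu checks report: 0 for OK, 1 for warning, and 2 for critical. The 70% and 90% values are taken from the storage example above.

```python
# Illustrative sketch only: maps a measured value onto the
# OK / WARNING / CRITICAL statuses used by Sensu check exit codes.
# The function name and signature are assumptions.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warning, critical):
    """Compare a measurement against a warning and a critical threshold."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical")
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# Storage example from the text: warn at 70%, critical at 90%.
statuses = [classify(used, warning=70, critical=90) for used in (65, 72, 95)]
```

Keeping the comparison separate from the measurement means the same function can be reused for CPU, memory, and storage, each with its own pair of thresholds.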

In our next installment in the series, we'll take monitoring a step further by monitoring the applications that keep your web app online.