Monitoring Kubernetes Clusters with Prometheus

The growing adoption of microservices and distributed applications gave rise to the container revolution. Running applications in containers created the need for orchestration tooling such as Kubernetes. But managing the availability, performance, and deployment of containers is not the only challenge: it is important not only to deploy and manage these distributed applications, but also to monitor them. An observability strategy needs to be in place to keep track of all the dynamic components in a containerized microservices ecosystem. Such a strategy lets you see whether your system is operating as expected and alerts you when it isn't. You can then drill down for troubleshooting and incident investigation, and view trends over time.

Kubernetes can simplify the management of your containerized applications and services across different cloud environments. It can be a double-edged sword, though: it also adds complexity to your system by introducing many new layers and abstractions, which means more components and services that need to be monitored. This makes observability even more critical.

There are many open-source tools that can help you monitor applications running in your Kubernetes cluster. In this post, we will talk about Prometheus and discuss how to configure it to monitor your Kubernetes applications and services at scale.

Prometheus
Prometheus is an open-source application for metrics-based monitoring and alerting. It pulls ("scrapes") real-time metrics from your applications over HTTP, then compresses and stores them in a time-series database. It offers a powerful data model and a query language (PromQL), and can provide detailed, actionable metrics. Like Kubernetes, the Prometheus project has reached the mature "graduated" stage with the CNCF.
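To make the pull model concrete, here is a minimal, standard-library-only Python sketch of both sides: an application that exposes counters on a `/metrics` endpoint in the Prometheus text exposition format, and a toy scraper that fetches and parses that endpoint the way a Prometheus server would on each scrape. The metric name `app_requests_total`, the counters, and all helper functions are hypothetical; a real application would normally use an official client library such as `prometheus_client` instead of hand-writing the format.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters for an example application.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}
LOCK = threading.Lock()


def render_metrics() -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled by the app.",
        "# TYPE app_requests_total counter",
    ]
    with LOCK:
        for method, count in sorted(REQUESTS_TOTAL.items()):
            lines.append(f'app_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves /metrics; the application does not push anything anywhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def scrape(url: str) -> dict:
    """Pull the endpoint once and parse sample lines, as a scraper would.

    Simplified parser: assumes no timestamps and no spaces in label values.
    """
    # Bypass any proxy environment variables for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def demo() -> dict:
    """Expose metrics on a free local port, scrape them once, then stop."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        with LOCK:  # simulate some traffic
            REQUESTS_TOTAL.update({"GET": 3, "POST": 1})
        port = server.server_address[1]
        return scrape(f"http://127.0.0.1:{port}/metrics")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the parsed samples keyed by series (metric name plus labels). The key design point is the direction of the request: the scraper initiates it, so the application only needs to keep its current values ready to be read.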

Prometheus Limitations and Challenges
While Prometheus is a great solution for your monitoring needs, it is purposely designed to be simple and versatile. It stores all of its compressed metrics on a single host, in its own time-series database on disk. Prometheus is not designed for long-term storage (you can keep data for a few weeks, but not for months or years), and its storage layer is not distributed (all the data is on one machine). Prometheus is great for alerting and short-term trends, but not for historical data needs, such as capacity planning or billing, where you typically want to keep data for a long time.
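The single-host storage limit becomes clearer with a rough sizing estimate. The Prometheus storage documentation gives the approximation `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, with roughly 1 to 2 bytes per sample after compression. The Python sketch below applies that formula; the 500,000 active series and the 30-second scrape interval are hypothetical inputs, 2 bytes per sample is the pessimistic end of the documented range, and the result ignores write-ahead-log and compaction overhead.

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series yields one sample per scrape."""
    return active_series / scrape_interval_s


def estimate_tsdb_bytes(retention_days: float,
                        ingested_samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample"""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * ingested_samples_per_second * bytes_per_sample


# Hypothetical cluster: 500,000 active series scraped every 30 seconds.
rate = samples_per_second(500_000, 30)
for days in (15, 365):
    gb = estimate_tsdb_bytes(days, rate) / 1e9
    print(f"{days:>3} days of retention: ~{gb:,.1f} GB")
```

With these inputs, a 15-day window (the Prometheus default) needs about 43 GB, while a full year needs roughly 1 TB, all on one machine. In Prometheus 2.x the retention window is set with the `--storage.tsdb.retention.time` flag.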

Prometheus does not provide multi-tenancy: it can scrape many targets, but it has no concept of different users, authorization, or keeping things "separate" between users accessing the metrics. Anyone with access to the query and web endpoints can see all the data. The same is true of capacity isolation: if one user or target sends too many metrics, it can break the Prometheus server for everyone. Together, these factors limit its scalability and can make running Prometheus in an enterprise environment challenging.
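The capacity-isolation problem is often a cardinality problem: every distinct combination of label values is a separate time series, so the number of series one metric can produce is bounded by the product of the number of distinct values of each label. This Python sketch, with hypothetical label names, counts, and limit, shows how adding a single high-cardinality label such as a user ID can multiply one target's series count. Prometheus's per-job `sample_limit` scrape setting can fail such scrapes outright, but it guards individual scrapes rather than isolating users from each other.

```python
from math import prod


def max_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric name.

    Each distinct combination of label values is its own series, so the
    worst case is the product of the number of values per label.
    A metric with no labels is exactly one series (prod of nothing is 1).
    """
    return prod(label_cardinalities.values())


# Hypothetical labels on a request-counter metric exposed by one target.
healthy = {"method": 5, "status_code": 10, "endpoint": 20}
# The same metric after someone adds a per-user label.
exploded = {**healthy, "user_id": 10_000}

# Hypothetical per-scrape budget, similar in spirit to `sample_limit`.
SAMPLE_LIMIT = 50_000

for name, labels in [("healthy", healthy), ("exploded", exploded)]:
    series = max_series(labels)
    verdict = "over" if series > SAMPLE_LIMIT else "within"
    print(f"{name}: up to {series:,} series ({verdict} the limit)")
```

One target going from about a thousand series to millions is exactly the "one user breaks it for everyone" case: every other team's queries and scrapes share the same memory and disk.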

Setting up Cluster Monitoring on Platform9

When you create a new Managed Kubernetes cluster, Platform9 automatically enrolls it in our SaaS-based monitoring of cluster health. This means that if there is a problem with the cluster or its nodes, we will notify you. In addition, you can deploy Prometheus and Grafana during cluster creation. Selecting the monitoring option deploys a managed monitoring stack that provides cluster-level metrics and alarms to supplement and enrich our SaaS-based monitoring.

Important: Monitoring is turned on by default for all clusters created with Managed Kubernetes version 4.1 and later. If you created a cluster with a version of Managed Kubernetes older than 4.1, you can turn on monitoring for it by following the steps below.

Follow the steps below to enable monitoring on an existing Managed Kubernetes cluster.

If a cluster is already running, you can enable monitoring from the Infrastructure Clusters dashboard by selecting the cluster and clicking the Enable Monitoring button. Once monitoring is enabled, Platform9 creates a dedicated namespace within the cluster and uses an Operator to set up and run Prometheus, Alertmanager, and Grafana. After the deployment completes, a Grafana dashboard link is available in both the Infrastructure Clusters view and the Cluster Details view.
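Once the deployment completes, one way to confirm that Prometheus is scraping its targets is to query its HTTP API (the `/api/v1/query` endpoint) for the built-in `up` metric, which is 1 when the last scrape of a target succeeded and 0 when it failed. The Python sketch below assumes Prometheus is reachable at `http://localhost:9090`, for example through `kubectl port-forward`; that address, the helper names, and the example job names are hypothetical, and the service name Platform9 uses inside its monitoring namespace is not covered here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical local address, e.g. after a `kubectl port-forward` to the
# Prometheus service; the actual service name is not specified here.
PROMETHEUS_URL = "http://localhost:9090"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_up(payload: str) -> dict[str, bool]:
    """Map 'job/instance' to whether the last scrape of that target succeeded."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    targets = {}
    for sample in body["data"]["result"]:
        labels = sample["metric"]
        key = f"{labels.get('job', '?')}/{labels.get('instance', '?')}"
        # Each vector sample's value is [unix_timestamp, "value as a string"].
        targets[key] = sample["value"][1] == "1"
    return targets


def check_targets(base: str = PROMETHEUS_URL) -> dict[str, bool]:
    """Query the live server (requires Prometheus to be reachable at `base`)."""
    with urllib.request.urlopen(query_url(base, "up"), timeout=5) as resp:
        return parse_up(resp.read().decode("utf-8"))


# Offline example: a canned response in the API's vector result shape.
# The job names and instance addresses are hypothetical.
EXAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node-exporter", "instance": "10.0.0.1:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "kubelet", "instance": "10.0.0.2:10250"},
             "value": [1700000000.0, "0"]},
        ],
    },
})
print(parse_up(EXAMPLE))
```

Against a live server, `check_targets()` performs the same parsing on the real response; any target mapped to `False` is one that Prometheus could not scrape on its last attempt.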

Some key links:
Sign up for Platform9 Free Tier: platform9.com/signup
Setting up monitoring for DIY Kubernetes: https://platform9.com/blog/kubernetes-monitoring-at-scale-with-prometheus-and-cortex/