A microservice-based architecture has many advantages over a monolith, especially in the scalability, extensibility, and maintainability of the application: instead of one big application, we deal with smaller pieces that are easier to manage and update.
But every good thing comes with some challenges, and in a microservice-based architecture, monitoring the application is one of them.
Earlier, you looked in one place for logs, server health, and so on to check for issues or status. But in a distributed system with tens or hundreds of microservices, it is difficult to monitor the status of each service individually. For example, say a service calls another service, which in turn calls yet another service, to fulfill a user’s request. Now if a request fails or responds very slowly, which service is the culprit? Which logs should be analyzed?
To solve this issue, we have a set of practices that help us build a robust and effective monitoring strategy.
Before getting into the strategy for monitoring microservices, let’s take a look at a few core concepts that one needs to be aware of: logs, metrics, and traceability.
Logs: Logs are the first place you will look if you see that your application is not behaving as expected. Your application emits logs to publish its current state. Logs are mostly categorized by severity level: Debug, Info, Warning, and Error.
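To make the log levels concrete, here is a minimal sketch using Python's standard `logging` module. The service name `order-service`, the helper `build_logger`, and the log messages are made up for illustration; a real service would typically write to stdout or a log shipper rather than an in-memory buffer.

```python
import io
import logging


def build_logger(service_name, stream, level=logging.INFO):
    """Return a logger that writes 'LEVEL service message' lines to `stream`."""
    logger = logging.getLogger(service_name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Hypothetical service emitting one message per level.
buffer = io.StringIO()
log = build_logger("order-service", buffer)
log.debug("cart contents loaded")        # dropped: below the INFO threshold
log.info("order 42 received")
log.warning("payment retry 1 of 3")
log.error("payment failed for order 42")

emitted = buffer.getvalue().splitlines()
```

Including the service name in every line is one simple way to tell apart logs that come from many services once they are collected in one place.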
Metrics: Metrics are time-series data published by applications to provide a quick view of an aspect that changes over time depending on external conditions such as request traffic. For example, latency metrics can show whether 95% of all calls respond in under 300 ms.
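As a rough illustration of the latency example, the sketch below computes a 95th-percentile latency from a list of samples using the nearest-rank method. The sample values and the `percentile` helper are invented for this example; in practice a metrics system usually computes percentiles for you over a time window.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in milliseconds for 20 requests.
latencies_ms = [120, 95, 180, 240, 110, 290, 130, 150, 160, 900,
                105, 140, 170, 125, 210, 135, 155, 115, 145, 165]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
meets_target = p95 < 300  # "95% of calls respond in under 300 ms"
```

Here the single 900 ms outlier does not break the target, because it falls in the slowest 5% of calls. This is why percentiles are often preferred over averages for latency.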
Traceability: Traceability is very important in distributed systems with multiple microservices. Say Service A calls Service B, which calls Service C, and so on. If you see requests failing or responding slowly, you need to track down which services are facing issues. Traceability helps you follow the journey of a request and monitor it at every step.
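The idea can be sketched with three in-process functions standing in for Services A, B, and C. Everything here is an assumption made for illustration: the `X-Trace-Id` header name, the in-memory `SPANS` list acting as a tracing backend, and the simulated failure in Service C. Real systems normally rely on a tracing library and a standard header format rather than hand-rolled code like this.

```python
import uuid

SPANS = []  # stand-in for a tracing backend that collects spans from all services


def record_span(trace_id, service, status):
    """Store one step of a request's journey under its trace ID."""
    SPANS.append({"trace_id": trace_id, "service": service, "status": status})


def service_c(headers):
    # Simulated failure: C is the real culprit in this example.
    record_span(headers["X-Trace-Id"], "C", "failed")
    return False


def service_b(headers):
    ok = service_c(headers)  # the same headers, and so the same trace ID, go downstream
    record_span(headers["X-Trace-Id"], "B", "ok" if ok else "upstream_failed")
    return ok


def service_a():
    headers = {"X-Trace-Id": uuid.uuid4().hex}  # the trace ID is created at the entry point
    ok = service_b(headers)
    record_span(headers["X-Trace-Id"], "A", "ok" if ok else "upstream_failed")
    return headers["X-Trace-Id"]


trace_id = service_a()
journey = [(s["service"], s["status"]) for s in SPANS if s["trace_id"] == trace_id]
culprit = next(service for service, status in journey if status == "failed")
```

Because every span carries the same trace ID, filtering on that ID reconstructs the full path of one request and points at Service C as the place where the failure started.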