Grafana Labs
Grafana Labs is a company that supports many open-source projects, including Grafana, Prometheus (monitoring system and time series database), Loki (log aggregation system), Tempo (tracing service), and Mimir (Prometheus long-term storage).

Grafana Itself
Grafana is a service that acts as a “Single Pane of Glass” for observability across all your systems. It works with many data sources without replicating their data; it queries those data sources directly through Grafana plugins. Inside Grafana, you can visualize, alert on, and correlate your systems’ data.

Problem (Operational Blindness)
In modern infrastructures, a single transaction passes through many services. Each of those services is usually owned by a specific team. So, when an outage occurs, all teams need to work together to find the breaking point. If the teams don’t have a shared space for investigating the issue, the collaboration will be ineffective, which can increase MTTR and leave RCAs unresolved.
Some SRE metrics require different kinds of information (e.g. the Four Golden Signals require information about latency, traffic, errors, and saturation, and not all of these can be extracted from logs alone). Some industry-standard observability applications don’t cover all these types of data, so your SRE practices might end up spread across several tools in an observability stack. That slows down onboarding (and might affect operations as well).
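
To make the point about the Four Golden Signals concrete, here is a minimal sketch in Python. The `Request` record, the `golden_signals` function, and the CPU figures are hypothetical names and values chosen for illustration; they are not part of Grafana or of any specific tool. Note that latency, traffic, and errors are derived from request records, while saturation needs a separate resource measurement.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    """Hypothetical request record, e.g. parsed from an access log."""
    duration_ms: float
    status: int


def golden_signals(requests, window_s, cpu_used, cpu_capacity):
    """Summarize the four golden signals for one time window.

    Latency, traffic, and errors come from the request records.
    Saturation comes from a resource measurement (here: CPU cores),
    which is assumed to be collected by a different system.
    """
    total = len(requests)
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_ms_median": median(r.duration_ms for r in requests) if total else None,
        "traffic_rps": total / window_s,
        "error_rate": failed / total if total else 0.0,
        "saturation": cpu_used / cpu_capacity,
    }


sample = [
    Request(100.0, 200),
    Request(200.0, 200),
    Request(300.0, 500),
    Request(400.0, 200),
]
signals = golden_signals(sample, window_s=2, cpu_used=3.0, cpu_capacity=4.0)
print(signals)
```

In this sketch the first three values could plausibly come from a log source, but the saturation value cannot, which is why a stack that only holds logs leaves part of the picture missing.
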
Solution
Grafana presents itself as a tool to integrate all these different data sources and create a single point of contact for everyone at a company. With Grafana, you can:
- Create personalized dashboards for different audiences.
- Correlate data from different data sources (e.g. Splunk, Dynatrace, Jira).
- Set alerts based on queries using different types of query mechanisms.
- Automate the generation of reports through its interface.
- Query the other data sources directly from Grafana (it can be used as a “Single Pane of Glass” for your organization).
- Use out-of-the-box machine learning features on your data.
- Do all of the above as code and run it through your existing quality assurance process (use Terraform or REST APIs).
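
As an illustration of the last point, the sketch below prepares a request for Grafana's HTTP API endpoint for creating or updating a dashboard (`POST /api/dashboards/db`). It is a minimal example under stated assumptions: the base URL, the token, the dashboard title, and the helper names `build_dashboard_payload` and `build_request` are placeholders invented here, and the dashboard has no panels. The request is only built, not sent, so nothing touches a real Grafana instance.

```python
import json
import urllib.request


def build_dashboard_payload(title, folder_uid=None, overwrite=False):
    """Build the JSON body for POST /api/dashboards/db.

    id and uid are None so that Grafana creates a new dashboard.
    """
    payload = {
        "dashboard": {
            "id": None,
            "uid": None,
            "title": title,
            "tags": ["managed-as-code"],  # hypothetical tag
            "timezone": "browser",
            "panels": [],  # left empty in this sketch
        },
        "overwrite": overwrite,
    }
    if folder_uid is not None:
        payload["folderUid"] = folder_uid
    return payload


def build_request(base_url, token, payload):
    """Prepare (but do not send) the HTTP request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own instance and service account token.
payload = build_dashboard_payload("Checkout Service Overview")
request = build_request("https://grafana.example.com", "YOUR_TOKEN", payload)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Because the payload is plain data produced by a function, it can be kept in version control, reviewed, and checked in a pipeline before it reaches Grafana. The same idea applies if you manage dashboards with Terraform instead of direct API calls.
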

Glossary
- RCA: Root-Cause Analysis
- MTTR: Mean Time to Recovery