Date:  Feb 20, 2026

Observability Engineer

Location: 

ID

Level: 
Employment Status:  Permanent
Department:  Group Digital Commercial
Description: 

About the Role

We are seeking an Observability Engineer to enhance the visibility, performance insight, and operational intelligence of our cloud-native and hybrid systems. This role focuses on designing and implementing observability strategies that provide deep insight into system health, performance, and user experience.

You will work closely with platform engineers, SREs, and application teams to instrument services, implement telemetry standards, and build actionable monitoring that enables proactive incident prevention and faster troubleshooting.

This role blends SRE fundamentals, telemetry engineering, and application-level instrumentation.

  

Key Responsibilities

🔹 Observability Strategy & Platform Ownership

  • Design and implement end-to-end observability architecture across hybrid and cloud environments.
  • Define telemetry standards for metrics, logs, and traces.
  • Ensure full service visibility across microservices and infrastructure layers.

  

🔹 Metrics, Monitoring & Alerting

  • Build and maintain monitoring solutions using tools such as Datadog, Prometheus, and Grafana.
  • Develop actionable alerting strategies that reduce noise and improve signal accuracy.
  • Tune alert thresholds and implement intelligent escalation logic.
  • Define service health indicators and golden signals.

🔹 Logging & Log Intelligence

  • Implement centralized logging using ELK Stack or Grafana Loki.
  • Build structured logging standards and log correlation strategies.
  • Enable log-driven troubleshooting and anomaly detection.

  

🔹 Distributed Tracing & Telemetry Instrumentation

  • Implement distributed tracing using OpenTelemetry.
  • Instrument applications and services to expose telemetry data.
  • Work with developers to integrate tracing and metrics into application code.
  • Ensure trace correlation between logs, metrics, and spans.

  

🔹 Application-Level Observability

  • Collaborate with development teams to embed observability into services.
  • Define telemetry instrumentation standards for microservices.
  • Support performance profiling and latency analysis.
  • Ensure end-to-end transaction visibility.

  

🔹 CI/CD & Observability Integration

  • Integrate observability checks into CI/CD pipelines.
  • Ensure deployments include telemetry validation and monitoring readiness.
  • Support reliability gates and observability-driven deployment validation.

 

🔹 Performance & Reliability Insights

  • Analyze system performance trends and detect anomalies.
  • Support capacity planning and performance optimization.
  • Provide insights to improve system reliability and user experience.

Required Qualifications

 Education

 Bachelor’s degree in Computer Science, Informatics, Information Systems, Electrical Engineering, Mathematics/Statistics, or a related field.

 

 Experience

  

  • 2–5 years of experience in Observability, SRE, DevOps, or Platform Engineering.
  • Experience supporting production systems and troubleshooting complex distributed systems.

  

Technical Skills

 

 Observability & Monitoring

  • Hands-on experience with Datadog, Prometheus, and Grafana.
  • Experience designing actionable alerting and reducing alert fatigue.
  • Understanding of golden signals and service health metrics.

 

 Telemetry & Tracing

  • Experience with OpenTelemetry instrumentation.
  • Strong understanding of distributed tracing concepts.
  • Knowledge of how to correlate metrics, logs, and traces.

 

 Logging & Analysis

  • Experience with ELK Stack or Loki.
  • Experience with structured logging and log parsing strategies.

  

Platform & Infrastructure

  • Familiarity with Kubernetes environments.
  • Understanding of microservices architecture.
  • Basic cloud platform knowledge (GCP preferred).

 

Programming & Automation

  • Experience with Bash, Python, or similar scripting languages.
  • Ability to instrument services and analyze telemetry data.

  

 What Makes You Successful in This Role

  

  • You can distinguish signal from noise in monitoring.
  • You think in telemetry, visibility, and system behavior, not just dashboards.
  • You collaborate with developers to improve observability inside applications.
  • You design monitoring that prevents incidents, not just reacts to them.