We are seeking a Site Reliability Engineer (SRE) with strong computer science fundamentals, hands-on development experience, and a solid understanding of system design and observability. In this role, you'll work closely with development teams to ensure system reliability, scalability, and performance, and take ownership of resolving complex production challenges.

- Strong knowledge of computer science fundamentals (data structures, algorithms, OS, networking).
- Software development experience with at least one modern language (e.g., Go, Python, Java, C#).
- Experience designing and supporting distributed systems and microservices.
- Proficiency in troubleshooting complex issues in production environments.
- Hands-on experience with observability tools (e.g., Prometheus, Grafana, OpenTelemetry).
- Working knowledge of Kubernetes and containerized infrastructure.
- Familiarity with cloud platforms (AWS, GCP, Azure).
- Solid understanding of Linux systems and scripting.
- Experience with CI/CD pipelines and DevOps practices.
- Strong problem-solving skills and a curious, analytical mindset.
- Effective communicator, able to clearly explain technical concepts to both engineers and non-engineers.
- Team player with a collaborative approach to working across engineering, product, and operations.
- Takes ownership and initiative.
- Comfortable working in a fast-paced, evolving environment.
- Attention to detail while keeping an eye on the bigger picture.
- Eager to learn continuously and stay up to date with emerging technologies and practices.

- Opportunities for professional growth and career advancement
- Competitive salary and bonuses
- Comprehensive insurance package
- Supportive and positive work environment
- Visa Premium salary card
- Corporate discounts and events
- Additional vacation days
- Discounted education and employee loan opportunities
- New and cozy studio office at Port Baku Tower 2
- Strong teamwork based on Agile principles, regular team-building activities, and themed events
- Multicultural environment with knowledge sharing from foreign colleagues

- Collaborate with development and infrastructure teams to design and maintain scalable, reliable systems.
- Write automation and tooling for infrastructure management, deployments, and operational tasks.
- Build and maintain observability stacks (metrics, logs, traces) to ensure visibility and fast issue resolution.
- Lead and participate in troubleshooting and debugging efforts across the full stack: application, platform, and infrastructure.
- Conduct post-incident analysis, drive root cause investigations, and implement long-term fixes.
- Define and monitor service level objectives (SLOs), indicators (SLIs), and error budgets.
- Contribute to CI/CD pipelines and infrastructure as code efforts.
- Continuously seek to improve system performance, resilience, and developer experience.