Engineering Manager - Site Reliability Engineering/SRE (f/m/d)

Parloa • Full-time • Berlin Office • 1w ago

YOUR MISSION:

As an Engineering Manager for Site Reliability Engineering & Developer Experience (f/m/d) at Parloa, you will nurture and support a collaborative team that ensures the reliability, scalability, and performance of our products while empowering engineers with thoughtful tools and workflows. Your mission is to cultivate and grow SRE practices that enable our architectural transformation and ensure our systems meet availability targets. You'll foster a caring culture where reliability is a shared responsibility and where automation reduces toil, freeing engineers to focus on meaningful work.

IN THIS ROLE YOU WILL:

Build and nurture a supportive team that harmonizes SRE excellence with developer experience
Collaborate to establish SRE practices: SLI/SLOs, error budgets, and trust-based postmortems using Datadog metrics
Create comprehensive observability strategies leveraging our monitoring stack
Support sustainable incident response, on-call processes, and automation using GitHub Actions and Terraform to improve MTTR
Partner with engineering teams to integrate reliability practices into CI/CD pipelines (ArgoCD, GitHub Actions) while supporting developer wellbeing
Foster adoption of reliability best practices across our Kubernetes-based infrastructure through mentorship and collaboration
Guide teams in leveraging our Azure cloud platform effectively while preparing for multi-cloud architectures
Support the thoughtful adoption of AI tools (Cursor, GitHub Copilot) for operational efficiency

WHAT YOU BRING TO THE TABLE:

Experience supporting SRE, DevOps, or platform teams with focus on reliability and collaboration
Understanding of SRE principles: SLI/SLOs, error budgets, and toil reduction
Hands-on experience with our observability stack (Datadog for metrics/APM, ELK for sensitive logs) and production systems at scale
Deep empathy for developer workflows and experience creating sustainable on-call processes that support work-life balance
Familiarity with Infrastructure as Code using Terraform and container orchestration with Kubernetes
Experience with CI/CD platforms (GitHub Actions, ArgoCD) and integrating reliability into deployment pipelines
Understanding of Azure cloud services and of how to prepare organizations for multi-cloud transformations
Warm communication skills for incident support, postmortem facilitation, and cross-team collaboration
Experience with authentication systems (e.g. Okta, Entra ID) and their role in system reliability
Commitment to building inclusive teams that balance operational care with innovation and wellbeing
Appreciation for AI-assisted tools in improving operational efficiency and reducing toil
Background with databases (MySQL, Redis, MongoDB) and their reliability considerations is valued
Experience with multi-cloud architectures and distributed systems is warmly welcomed

WHAT'S IN IT FOR YOU?

Join a diverse team of 40+ nationalities with flat hierarchies and a collaborative company culture.
Opportunity to build and scale your career at the intersection of customer-facing roles and engineering in a dynamic startup on its journey to become an international leader in SaaS platforms for Conversational AI.
Deutschlandticket, Urban Sports Club, JobRad, Nilo Health, and weekly sponsored office lunches.
Competitive compensation and equity package.
Flexible working hours, 28 vacation days and workation opportunities.
Access to a training and development budget for continuous professional growth.
Regular team events, game nights, and other social activities.
Hybrid work environment: we love to build real connections and welcome everyone to our beautiful Berlin office on certain days.