C3 AI is seeking a Site Reliability Engineer to join our team in London, England.
Responsibilities:
- Work with customers to design and implement customized installations of the C3 AI Platform that meet unique access and security requirements
- Maximize system uptime and availability, ensuring that functional and performance SLAs are met
- Establish end-to-end monitoring and alerting on all critical aspects of the system
- Solve complex problems for critical services and build automation to prevent problem recurrence
- Initiate and lead scripting and automation to streamline system updates and upgrades
- Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle
- Work cross-functionally with Services and Engineering teams
Qualifications:
- Bachelor’s degree in a Science, Technology, Engineering, or Mathematics (STEM) field, or a comparable area of study
- Demonstrated experience deploying, managing, and operating scalable, fault-tolerant Kubernetes-based infrastructure on AWS, GCP, and other public clouds
- Expertise in Linux operating systems, networking, and database concepts
- Expertise with cloud providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)
- Experience with Infrastructure-as-Code and configuration management tools such as Terraform, Ansible, or Puppet
- Experience using Ruby, Bash, or Python to automate and monitor systems
- Excellent problem-solving, critical thinking, and communication skills
- Experience providing support as a DevOps engineer or system administrator for commercial SaaS solutions; customer-facing experience is a plus
C3 AI provides a competitive compensation package and excellent benefits.