Site Reliability Engineer

Overview

Phaidra is looking for a driven Site Reliability Engineer to join our engineering team. You are bold and creative, and you have deep empathy for customers who may not be tech-savvy. You will work in the Infrastructure Engineering team to build and maintain world-class infrastructure. You will have the opportunity to make an immediate impact with your work and to guide the product and team as we grow.

Job Description

Phaidra is building the future of industrial automation. The company creates AI-powered control systems for the industrial sector, enabling industrial facilities to learn and improve automatically over time. Phaidra uses reinforcement learning algorithms to provide this intelligence, converting raw sensor data into high-value actions and decisions. It focuses on industrial applications, which tend to be well-sensorized and have measurable KPIs, making them well suited to reinforcement learning.

Responsibilities

- Build and maintain infrastructure for large-scale data ingestion and processing
- Enable distributed model training, evaluation, and inference
- Automate the end-to-end system for continuous improvement and deployment
- Develop and manage developer environments and build systems
- Deploy multi-cloud setups using AWS, Azure, and GCP
- Use cloud-native technologies such as Kubernetes, Prometheus, and gRPC
- Build CI/CD infrastructure and pipelines
- Apply SRE principles for observability, SLOs, automation, and change management
- Document and maintain tooling for infrastructure and processes
- Build cross-functional relationships with internal teams to drive initiatives

Required Skills

- 5+ years of work experience
- Bachelor's or Master's degree in Computer Science, or equivalent experience
- Proven experience automating cloud and networking infrastructure on AWS, GCP, or Azure
- Good understanding of Linux-based operating systems and of containerization and orchestration technologies such as Docker and Kubernetes
- Experience with Terraform or with configuration tools such as Jsonnet, Kapitan, Helm, or Kustomize
- Experience with monitoring stacks such as Prometheus, InfluxDB, Stackdriver, or Zabbix
- Programming experience, ideally with Python, Go or Bash scripting
- Experience with writing Kubernetes Operators
- Good understanding of DevOps and SRE principles and of platform engineering
- Share company values: curiosity, ownership, transparency & directness, outcome-based performance, and customer empathy

Benefits

- 100% remote company with a digital nomad policy
- Competitive compensation & equity
- Outsized responsibilities & professional development
- Training: functional, customer immersion, and development training
- Medical, dental, and vision insurance (exact benefits vary by region)
- Unlimited paid time off, with a required minimum of 20 days off per year
- Paid parental leave (exact benefits vary by region)
- Home office setup allowance, coworking space stipend, and company MacBook

About the company

Phaidra

Phaidra provides artificial intelligence controls to optimize mission-critical facilities. Our closed-loop AI control service helps your operations team deliver step-function improvements in plant stability, energy efficiency, and sustainability.