
Site Reliability Engineer
Overview
Float is hiring a Site Reliability Engineer to help automate smarter, improve visibility across engineering, and ensure reliability as they scale.
Job Description
Float is the leading resource management software for professional services teams. Since 2012, we’ve grown every year—independently, self-funded, and profitably. We’re rated #1 for resource management on G2 and trusted by 4,500+ customers worldwide. As a certified B Corporation, we’re committed to making a positive impact on our team, customers, the environment, and the remote community. Our 50+ person team works 100% remotely across the globe, with perks and benefits designed to support us in living our Best Work Life.
Responsibilities
- Maintain and validate the processes that keep our Kubernetes infrastructure up-to-date
- Remove noisy, unused, or misfiring boot alerts
- Partner with engineers to configure services within our clusters
- Review and optimise usage across Kubernetes services
- Lead exploration and implementation of service mesh options
- Define and roll out standardised playbooks for production incidents
- Build deep familiarity with our next-gen data layer (CDC)
- Help teams define, measure, and meet reliability goals
Required Skills
- Confident writing scripts in Bash and proficient in at least one go-to language (ideally PHP, NodeJS, or Python)
- Strong production experience managing and optimising Kubernetes clusters
- Solid understanding of infrastructure as code using Terraform
- Familiarity with Google Cloud Platform
- Iteration mindset
- Strong written communication skills
Benefits
- Global async remote company
- Diverse team
- Transparency in perks & benefits
- Significant deep work time with very few meetings
About the company
Rated the #1 Resource Management Software on G2. Trusted by 4,500+ professional services teams to plan projects and schedule work.