Site Reliability Engineer - Data Platform

Overview

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion.

Job Description

Kraken is a world-class team with crypto conviction, united by the desire to discover and unlock the potential of crypto and blockchain technology. We are a fully remote company, and Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space.

Responsibilities

- Design the data governance mechanisms that ensure our lakehouse is easy to interact with, secure and in compliance with all applicable regulations
- Implement the infrastructure we use to ingest our data, store it, catalog it with the right metadata and capture its lineage
- Provide a state-of-the-art suite of BI tools for multiple teams within the company
- Guarantee the availability, high performance, scalability and cost efficiency of our data platform
- Implement self-service data infrastructure solutions that support the needs of 10+ business units and over 100 engineers and data analysts
- Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
- Develop and maintain bash/shell scripts to automate operational tasks and deployments
- Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure
- Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues
- Manage and implement role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments
- Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium Change Data Capture (CDC)
- Utilize Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration
- Implement effective incident response procedures and participate in on-call rotations
- Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions
- Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement
- Support AI/ML teams with their infrastructure requests

Required Skills

- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
- Proven experience (5+ years) working as a Site Reliability Engineer, Infrastructure Engineer, or in a similar role, with a focus on data infrastructure and security
- Experience with real-time data processing technologies, such as Kafka and Debezium
- Working experience managing hybrid systems, particularly AWS (HashiCorp experience is a nice-to-have)
- Experience with Infrastructure as Code tools such as Terraform, Terragrunt and Atlantis
- Experience with containerization and orchestration tools, particularly Kubernetes and Docker
- Solid understanding of bash/shell scripting and proficiency in at least one programming language (preferably Python or Rust)
- Familiarity with CI/CD deployment pipelines and related tools
- Strong problem-solving skills and the ability to troubleshoot complex systems
- Experience with data-related technologies (databases, data lakes, Airflow, Spark) is a plus

Benefits

- Bonus program
- Equity program
- Wellness allowance
- Medical, dental, vision and 401(k) [US Only]

About the company

Kraken

Buy, sell, trade and learn about crypto on Kraken — the simple, powerful crypto platform that grows with you.