Site Reliability Engineer - Data Platform
Overview
Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion.
Job Description
Kraken is a world-class team with crypto conviction, united by the desire to discover and unlock the potential of crypto and blockchain technology. As a fully remote company, Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space.
Responsibilities
- Design the data governance mechanisms that ensure our lakehouse is easy to interact with, secure and in compliance with all applicable regulations
- Implement the infrastructure we use to ingest our data, store it, catalog it with the right metadata and capture its lineage
- Provide a state-of-the-art suite of BI tools for multiple teams within the company
- Guarantee the availability, high performance, scalability and cost efficiency of our data platform
- Implement self-service data infrastructure solutions that support the needs of 10+ business units and more than 100 engineers and data analysts
- Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
- Develop and maintain bash/shell scripts to automate operational tasks and deployments
- Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure
- Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues
- Implement and manage role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments
- Manage and maintain our real-time streaming data architecture using technologies like Kafka and Debezium Change Data Capture (CDC)
- Utilize Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration
- Implement effective incident response procedures and participate in on-call rotations
- Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions
- Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement
- Support AI/ML teams with their infrastructure requests
Required Skills
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
- Proven experience (5+ years) working as a Site Reliability Engineer, Infrastructure Engineer, or in a similar role, with a focus on data infrastructure and security
- Experience with real-time data processing technologies, such as Kafka and Debezium
- Working experience managing hybrid systems, particularly AWS (HashiCorp experience is a nice-to-have)
- Experience with Infrastructure as Code tools such as Terraform, Terragrunt and Atlantis
- Experience with containerization and orchestration tools, particularly Kubernetes and Docker
- Solid understanding of bash/shell scripting and proficiency in at least one programming language (preferably Python or Rust)
- Familiarity with CI/CD deployment pipelines and related tools
- Strong problem-solving skills and the ability to troubleshoot complex systems
- Experience with data-related technologies (databases, data lakes, Airflow, Spark) is a plus
Benefits
- Bonus program
- Equity program
- Wellness allowance
- Medical, dental, vision and 401(k) [US Only]
About the company
Buy, sell, trade and learn about crypto on Kraken — the simple, powerful crypto platform that grows with you.