Klaviyo - Senior Software Engineer, Site Reliability Engineering - Platform Services
Posted 2023-03-02 by Klaviyo
Job Description

Engineers come to Klaviyo with experience in a variety of languages and from a number of disciplines. All engineers are expected to become extremely proficient in the technologies we use (not exhaustive):

  • Python, Django, Celery
  • MySQL, Cassandra, RabbitMQ, Redis, Pulsar
  • React, HTML, JavaScript, Backbone.js
  • Amazon Web Services (EC2, RDS, Aurora, etc.), Kubernetes on EKS

The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and train their peers in topics such as debugging distributed systems, building self-healing applications and eking out every drop of performance possible.

Internally, we call this role Senior Site Reliability Engineer on the Platform Services team. As a Senior Site Reliability Engineer, you will own multiple foundational Klaviyo services and make a big impact on the productivity of our product engineering teams.

Mission and Vision of the Platform Services SRE Team

Vision: Remove the operational burden of commonly utilized application layer services so that application engineers can focus on implementing business logic in a safe and performant manner.

Mission: Provide services, tools, and processes that are enjoyable, reliable, performant, and seamless to use for common application operational tasks.

What You'll Be Working With

  • Building application-level services for product development teams, including database and routing tooling.
  • Biggest project in 2023 (chariot): building a highly scaled asynchronous processing framework. Most of Klaviyo's compute is asynchronous. We are building a framework on top of Apache Pulsar and pushing it further than most other companies, and we expect to see millions of events per second. This role is an opportunity to build and work on that framework at extremely high scale.

How You'll Make a Difference

  • Ship foundational services to enable Klaviyo engineering to move faster with confidence
  • Design and develop systems and processes that enable highly available and scalable systems
  • Design, build and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
  • Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
  • Leverage technology such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, RabbitMQ, Redis, Cassandra, and PostgreSQL to advance Klaviyo’s platform
  • Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
  • Contribute to the company as a subject matter expert in multiple areas, constantly pushing yourself to be a better engineer and to level up your peers within your team and across Klaviyo
  • Mentor and pair with other Klaviyo engineers to build better software by focusing on performance, self-healing systems, configuration as code, defensive programming, application security, etc.
  • Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences and minimizing alert fatigue
  • Prototype and advocate for architectural improvements to achieve breakthrough results in Klaviyo systems’ operational scalability and reliability
  • Work hand-in-hand with product-facing engineers to ship impactful code
  • Perform quantitative investigation to understand and scale Klaviyo systems and manage the cross-functional effort to resolve scalability issues
  • Produce and advocate for preventative, upstream solutions with internal stakeholders and external vendors and dependencies
  • Confidently make informed, data-driven choices in a fast-paced environment with competing priorities

Ideal Candidate Description

Who You Are

  • Knowledge of Linux operating systems and computer networking
  • Experience writing code in a programming language such as Python, Ruby, Go, etc.
  • Experience administering cloud-based infrastructure (e.g. AWS)
  • Ability to troubleshoot production issues related to computer infrastructure, configuration, monitoring, deployments, and continuous integration and delivery
  • Ability and willingness to learn
  • Ability to communicate clearly and mentor and coach others on a team
  • Ability to participate in an on-call rotation

Company Description

Klaviyo is a world-leading database analytics and marketing automation platform dedicated to accelerating revenue and customer connection for online businesses. Klaviyo makes it easy to store, access, analyze and use transactional and behavioral data to power highly targeted customer and prospect communications. The company's hybrid customer-data and marketing-platform model allows companies to grow by fostering direct relationships with customers, without giving up their valuable data to popular big-tech ad platforms. Over 265,000 innovative companies like Unilever, Custom Ink, Living Proof and Huckberry sell more with Klaviyo. Learn more at www.klaviyo.com.

Job Info
Seniority: Individual Contributor
Remote Policy: Hybrid/Remote Part Time
Company Info
Company Website: http://klaviyo.com