Senior Infra Engineer: Observability

Railway

Job Description

Our core mission at Railway is to make software engineers higher leverage. We believe that people should be given powerful tools so that they can spend less time setting up to do, and more time doing.

Many infrastructure platforms simply focus on how you deploy your singular application, and not how these applications function in concert. Questions like “How do you build systems for zero-downtime deployment?”, “How do you do service-to-service communication?”, etc. are usually left up to the engineers to define.

At Railway, our goal is to be an all-encompassing solution to all of these problems. As such, we take special care as we define our networking infrastructure.

Note: Networking falls under the platform engineering umbrella. If you’re specialized, we’d love to chat! That said, we’d also like it noted that you’re probably going to do a lot of non-networking + platform things.

“But the world would be a better place if more engineers, like me, hated technology. The stuff I design, if I’m successful, nobody will ever notice. Things will just work, and will be self-managing.”

– Radia Perlman

About The Role

For this role, you will:

  1. Build ingestion pipelines to consume 1M+ RPS streams of logs, metrics, and other telemetry
  2. Build scalable, fault-tolerant alerting engines for notifying users, in real time, of threshold breaches
  3. Craft rich backend observability APIs, working with product to build amazing experiences that let users instantly grok their applications
  4. Provide APIs to access real-time log/metrics streams to be consumed by the Dashboard and Product Teams
  5. Build Go/Rust gRPC services from scratch capable of supporting tens of thousands of users, and the million+ to come
  6. Define infrastructure that can be torn down, failed over, and reconstituted from scratch, following the principles of immutable infrastructure and using Terraform and Ansible
  7. Write Engineering Requirement Documents to take something from idea, to defined tasks, to implementation, to monitoring its success
  8. Interface with our TypeScript and GraphQL edge to expose your microservice APIs for both internal and potentially external consumption

This is a high-impact, high-agency role with a direct effect on company culture, trajectory, and outcome.

About You

  • A strong understanding of distributed systems. You enjoy building fault-tolerant, resilient, and scalable services
  • An interest in VictoriaMetrics, ClickHouse, and other systems for building observability stacks from the ground up
  • A solid intuition about how long your solutions will last. All systems age. In startups, we can hope for 2-3 orders of magnitude, or 12-18 months
  • The tact to implement your solution, create monitors for its error boundaries, and document any requirements for when you’re not around
  • A great sense of direction and prioritization when it comes to dealing with the ambiguity of an early-stage startup
  • A sense of grit to dive into a problem, implement a solution, scale that solution, and replace it when needed
  • A great set of communication skills for getting your point across, getting your solution implemented, and beyond

We value and love to work with diverse people from all backgrounds.

Things to know

For better or worse, we’re a startup; our team dynamics are different from companies of different sizes and stages.

  • We’re distributed ALL across the globe, and we’re only going to become more distributed. As a result, stuff is ALWAYS happening.
  • We do NOT expect you to work all the time, but you’ll have to be diligent about your boundaries because the end of your day may overlap with the start of someone else’s.
  • We’re a small team, with high ownership, who are not only passionate about what we do, but seek to be exceptional as well. At the time of writing we’re a team of 21, serving hundreds of thousands of users. There’s a lot of stuff going on, and a lot of ambiguity.
  • We want you to own it. We believe that ownership is a key to growth, and part of that growth is not only being able to make the choices, but owning the success, or failure, that comes with those choices.
