Senior Site Reliability Engineer
Company: 4IR
Location: Nashville
Posted on: February 13, 2026
Job Description:
About This Role

We deliver mission-critical IT/OT infrastructure, in cloud and on-prem, for industrial customers that can't afford downtime. Small team. Hard problems. Practical solutions. No bureaucracy. No blame. No egos. We ship it, own it, and make it better: blameless but accountable, shoulder to shoulder. We work hard. We stay human. We trust each other. We figure it out.

If you know what to do, delight in building it, and feel the ownership to support it, keep reading.

What You'll Do

Customer Delivery
- Design complex IT/OT architectures, in cloud and on-prem, that are secure, recoverable, and sized appropriately
- Work directly with customers to understand their environment and estimate effort
- Own customer solutions end-to-end: requirements, design, build, support
- Build or use reusable modules when it makes sense; build bespoke when it doesn't
- Deploy and manage Kubernetes-based infrastructure and stateful applications across diverse customer environments

Incident Response & Ownership
- Participate in on-call rotation alongside the rest of the team; everyone here supports what we ship
- Own incidents through resolution, then drive root cause analysis that eliminates the class of problem, not just the symptom
- Build the runbooks, alerts, and automation that make the next incident less likely or less painful

Infrastructure & Automation
- Work with Infrastructure-as-Code tools to provision and manage diverse customer environments
- Implement and maintain GitOps workflows for in-cluster deployments
- Ensure all infrastructure and application changes are declarative and version-controlled
- Automate self-healing and system updates: reduce manual intervention and keep environments current

Observability & Reliability
- Build and maintain monitoring, alerting, and dashboards using Prometheus, Loki, and Grafana
- Define SLIs and SLOs that reflect what actually matters to customers
- Surface real problems, reduce noise, and continually improve reliability and team efficiency

Shape the Future
- We don't have everything figured out. You'll help build, create, and shape how we operate
- Contribute to standards, patterns, and processes that make us better, not bureaucracy for its own sake
- Bring the SRE mindset: automate toil, prefer boring/stable systems, and relentlessly improve

What We're Looking For
- 5 years in SRE, DevOps, or Infrastructure Engineering
- Strong Kubernetes skills in production environments; you'll troubleshoot real clusters, not just tutorials
- Experience with GitOps tooling (ArgoCD, Rancher Fleet, FluxCD, or similar)
- Solid understanding of Infrastructure-as-Code concepts (Terraform, Pulumi, Crossplane, or similar)
- Real incident response experience: you've been on-call, stayed calm, and fixed things under pressure
- Comfort with heterogeneous environments: every customer site is a little different and you need to adapt
- Clear communication skills: you can write a useful runbook, gather requirements on a customer call, and document what you learned
- Ability to operate in ambiguity: we're building clarity, not waiting for it

Strong Plus
- Azure experience (our primary cloud)
- Experience with SUSE ecosystem (SLE Micro, RKE2, Rancher, Longhorn)
- Industrial, manufacturing, or OT environment experience
- Familiarity with Inductive Automation's Ignition platform and MQTT
- Experience in a startup or small-team environment where you wore many hats

The SRE Mindset
This matters here. We need someone who:
- Sees repetitive manual work as a problem to automate, not a fact of life
- Prefers stable, predictable, "boring" production over clever and fragile
- Supports what they create; no throwing things over the wall
- Treats incidents as opportunities for systemic improvement
- Works well on a small team where everyone carries weight
- Stays current with SRE practices, emerging technologies, and cloud/edge trends

A Few Honest Words
This is a startup. Hours can be demanding. Priorities shift. You won't have a team of 30 backing you up. What you will have: the autonomy to make real decisions, teammates who own their work, and customers who genuinely depend on what we build. We work hard because the work matters, and we have fun doing it.

If you want a structured 9-5, predictability, and a clear ladder, this probably isn't the right fit. If you want to build, learn, and be part of something that's actually going somewhere, let's talk.

What We Offer
- Comprehensive benefits (Medical, Dental, Vision, 401K)
- Fully remote; work from anywhere in the world
- A team where it's safe to be honest, learn from mistakes, and get better together

Keywords: 4IR, Bowling Green, Senior Site Reliability Engineer, IT / Software / Systems, Nashville, Kentucky