Lead Site Reliability Engineer
Minnetonka, Minnesota  |  Remote, Onsite
Contract to Hire Position
Job Id #57151 Posted June 21, 2022


The Lead Site Reliability Engineer is a technical Subject Matter Expert who proactively drives the technical stability and performance of the applications in the provider technology portfolio. They combine software and systems engineering to design solutions in physical, virtual, and cloud environments that automate fault detection, containment, and resolution without customer impact or human intervention. These solutions typically involve software development for metrics and event collection/correlation across distributed architectures, automation, monitoring, intelligent alerting, random fault injection, and self-healing. Focus areas include:

  • High Availability, Disaster Recovery, Sustained Resiliency, Chaos Engineering
  • Service and Operational Level Agreements
  • SRE - Standards and best practices
  • Performance Engineering
  • Application scalability/Capacity Management
  • Technical debt reduction
  • Logging, monitoring, intelligent alerting, self-healing
  • Security Vulnerabilities and Compliance
  • Solution design
  • Application Knowledge Support Artifacts, etc.

Primary Responsibilities:

  • Responsible for Site Reliability Engineering practices: improving availability / reliability, latency, performance, efficiency, monitoring, emergency response, and capacity planning / forecasting / management
  • Design self-healing and resiliency patterns
  • Responsible for running production systems - ensure applications are available per business SLAs.
  • Accountable for facilitation, communication, and resolution of high / critical business impact issues, and for driving blameless post-mortems and Root Cause Analysis.
  • Communicates system related problems and collaborates with other IT teams and managers on solutions, enhancements, and process improvements.
  • Responsible for production best practices, technical and operating standards, design and implementation of performance and operational enhancements.
  • Work with engineering teams across SDLC activities to implement best practices to make applications secure and reliable.
  • Integrate security/compliance tools into deployment pipelines. Apply leadership and teaming skills to coordinate and perform tool-based vulnerability assessments and to remediate vulnerabilities within established timeframes.
  • Responsible for the coordination, technical planning, and implementation of product life cycle upgrades, production maintenance, and technology debt reduction activities.
  • Drive technical features through intake, prioritization, creation, grooming, and implementation
  • Drive Chaos Engineering practices to test under real-world conditions
  • Provide input into architectural and design decisions
  • Design and implement end-to-end monitoring solutions for application and infrastructure components, based on cutting-edge SLO-based telemetry tools
  • Lead a team of talented software development engineers responsible for a hybrid of software engineering and operations, with an emphasis on reducing operational toil
  • Manage on-call rotations across continents, using a follow-the-sun model  

You will be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role, as well as providing development for other roles you may be interested in.

Required Qualifications:

  • BS or MS in Computer Science, a related field, or equivalent experience
  • 6+ years of experience in site reliability engineering practices
  • Experience in supporting and operating large-scale production systems
  • Experience programming in Java with Spring Boot and building APIs
  • Knowledge of the Unix/Linux shell, including the ability to write shell scripts, and an understanding of Linux internals
  • Experience with CI/CD and infrastructure automation tools - Jenkins, Terraform, etc.
  • Experience in infrastructure and application logging, monitoring and observability tools, intelligent alerting, and automated self-healing

Preferred Qualifications:

  • Experience in public cloud ecosystems – AWS
  • Experience with Elasticsearch
  • Experience with Kafka streaming
  • Experience with containers and container orchestration, such as Kubernetes
  • Experience with Chaos Engineering

Horizontal is proud to be an Equal Opportunity and Affirmative Action Employer. We seek to provide employment opportunities to talented, qualified candidates regardless of race, color, sex/gender including gender identity and/or expression, national origin, religion, sexual orientation, disability, marital status, citizenship status, veteran status, or any other protected classification under federal, state, or local law.

In addition, Horizontal will provide reasonable accommodations for qualified individuals with disabilities. If you need to request a reasonable accommodation in order to complete the application or interview process, please contact

All applicants applying must be legally authorized to work in the country of employment.



Horizontal is proud to be an Equal Employment Opportunity/Affirmative Action Employer providing a drug-free workplace.

