Our client, a leading federal defense contractor, is seeking a Site Reliability Engineer (SRE) responsible for maintaining the survivability and reliability of mission-critical resources.
The SRE will monitor high-priority systems and automate recovery mechanisms to ensure they remain operational for the warfighter.
Responsibilities:
- Ensure Uptime of Critical Systems (Incident Response / Triage)
- Monitor and Troubleshoot Enterprise Services (Prometheus, Grafana, Splunk)
- Configure Enterprise Services (Ansible, YAML, JSON)
Requirements:
- Must be a US citizen due to government requirements
- Must be able to obtain a TS/SCI clearance (active TS clearance preferred)
- Requires a Bachelor's degree in a STEM field and 5+ years of job-related experience, or a Master's degree plus 3 years of job-related experience.
- Experience monitoring large-scale systems and using automation to triage emerging issues
- Experience with Prometheus (preferred) and/or Grafana and Splunk
- Experience automating systems administration activities (Bash, Python, or Ansible preferred)
- Experience developing recovery procedures for large systems (Backup and Restore, Blue/Green Deployment)
- Linux experience
- Collaborative team player with experience working on teams with diverse engineering skills
- Mixed job experience involving software engineering, systems administration, and network engineering
Scottsdale, AZ
1
Monday, August 11, 2025
Contract
6-12 months with contract-to-hire (CTH) option
Monday, July 21, 2025
Know someone who would be a good fit? We pay for referrals!