Sr. Site Reliability Engineer
Harvard Research Computing is looking for a motivated, full-stack site-reliability engineer with experience in designing, configuring, and deploying advanced monitoring and alerting systems for mission-critical services.
This position will report to the Team Lead for Software/Cloud as Infrastructure, and will design monitoring and alerting solutions for a wide range of mission-critical services within FASRC’s infrastructure, as well as train technical staff on monitoring procedures and expectations for monitoring events. Applying best practices and providing recommendations to improve problem identification and response time are key to this position. A strong candidate will have a history of partnering with engineers, administrators, developers, and project managers to plan and implement complex technical monitoring solutions and architecture. A successful engineer will work well under pressure, take direction, and rely on their extensive experience and judgment to proactively implement and configure future monitoring solutions.
For more information, please visit: https://www.rc.fas.harvard.edu/about/employment/