Sr. Engineer – DevOps/Site Reliability
- Job Title
- Sr. Engineer – DevOps/Site Reliability
- Job ID
- Ann Arbor, MI 48106
- Other Location
2 Openings:
- Sr. SRE Engineer (automation improvement and container technology knowledge; understanding of, or experience working with, cloud platforms; Voldemort knowledge is a big plus)
- SRE Engineer (release-focused: automation, CI/CD, and scripting; container and cloud knowledge is a plus)
We are looking for a Site Reliability/DevOps Engineer who will be responsible for designing and developing fully automated deployment solutions as we head down the path to continuous deployment. This position will serve as an infrastructure and operations engineer within the Next Generation Store Systems Department. The role requires a mix of development, networking, security, and system administration skills, because the Site Reliability Engineer provides developer support, application systems administration, production support, and many other services. In addition to building full-scale environments and deploying full application solutions on demand, you will implement creative monitoring solutions and provide full visibility into all areas of our system. Continuous innovation and continuous improvement are key to succeeding in this role.
You’ll play an integral part in helping the Next Generation Pulse scale to meet our future growth. The specific focus for the Engineer is on establishing best practices around configuration, automation, and optimization of the development, test, and release processes for the Next Generation Pulse Platform. This role works collaboratively with the Agile Delivery Teams to deploy and operate our systems; automate and streamline our operations and processes; build and maintain tools for deployment, monitoring, and operations; and troubleshoot and resolve issues in our production and non-production environments.
(40%) System Administration
- Engineer extensive scripting and automation to enable applications to install and run in all environments with minimal manual intervention
- Evaluate, test, deploy and maintain both custom developed and third party software upgrades
- Maintain SDLC systems such as test environments, source control and automated build/test/deploy systems
- Provide developer support on an ongoing basis, frequently embedded in development teams to facilitate collaboration
- Create and maintain application architecture and troubleshooting documentation
(30%) Web Production Support
- Provide 24x7 production support as part of a team rotation, resolving or escalating issues as appropriate
- Maintain production services to highly demanding SLAs
- Take ownership of production issues, working closely with the infrastructure and development teams on issue resolution
- Support releases on a regularly scheduled basis, as well as emergency releases as needed
- Deploy application and data changes to all environments as needed
- Provide Level 2 technical support.
(30%) Planning, Design and Implementation
- Design and implement new environments, services and application architecture modifications.
- Experience with Infrastructure as Code and/or configuration management tools (Puppet, Ansible, Terraform).
- Design and implement build, deployment and configuration management.
- Manage CI and CD tools with team.
- Handle code deployments.
- Monitor metrics and develop ways to improve on them
- Brainstorm for new ideas and ways to improve development delivery.
- Research, evaluate and implement operational improvements, application packages and architectural modifications
- Participate in change control, release planning, and other operational planning
- Remain current on industry-leading solutions in both private and public cloud hosting (VMware, Xen, KVM, Amazon Web Services (AWS), Azure, Google App Engine, etc.)
- Remain current on modern open-source persistence technologies (Hazelcast, BDB, Project Voldemort, Cassandra, Memcached, etc.)
- Remain current on modern containerization technologies (Docker, vSphere Integrated Containers, Kubernetes)
Qualifications
- Bachelor’s degree in computer science or equivalent experience
- Release automation (e.g. Jenkins), system administration, system configuration, and system debugging experience.
- Experience with configuring and maintaining Jenkins and Jenkins Pipelines.
- Experience with Linux and Windows Administration.
- 5+ years production application support experience in a high uptime environment
- 5+ years UNIX/Linux administration experience including diagnosis of performance issues, package management, load estimation, kernel tuning, networking configuration, etc.
- 5+ years hosting experience in a large heavy-traffic environment
- 4+ years software engineering experience (Java, C, C++)
- Understanding of networking principles, especially TCP/IP
- Excellent troubleshooting and analytical skills
- Broad understanding of modern containerization technologies (Docker, VMware PKS Kubernetes).
- Knowledge of cloud infrastructure environments (e.g. AWS, Azure).