Job Type: Full Time
Site Reliability Engineering and Automation – Associate
JPMC are looking to develop a core set of data management capabilities to drive consistency across each line of business. This data platform will be deployed on premises and, longer term, in the public cloud. The initial focus is on sourcing, storing, enriching and making available information to support internal management reporting, external regulatory reporting, as well as machine learning and other data analysis applications.
We are seeking an experienced software engineering lead in our global Site Reliability Engineering (SRE) team supporting our Big Data platform. This individual will be expected to lead a team of software engineers who will grow into subject matter experts, work with functional application development teams, and partner with infrastructure engineers and production support analysts to determine requirements for designing and developing automation, SDLC and development environment testing & integration tools. The toolsets developed must pass the rigor of JPMC’s cyber security standards.
The SRE team runs, maintains and improves the Big Data Platform against established Service Level Objectives by applying software engineering practices. It is responsible for the availability, performance, change management, monitoring, and capacity management of its services, with special emphasis placed on automating the processes and workload that support them. The SRE team is also responsible for the operational support of the Big Data infrastructure, with emphasis placed on the ability to submit outage/issue/incident data into a design and SDLC feedback loop to ensure maximum automation and outage avoidance.
Key responsibilities of this role include:
- Develop, test and deliver the software to automate manual operational work and ensure application performance and resiliency
- Act as a key contributor to SRE and functional development teams throughout the life cycle to help create software for reliability and scale, ensuring minimal refactoring or changes
- Troubleshoot incidents, participate in blameless post-mortems and ensure permanent closure of the incidents
- Identify application patterns and analytics in support of better service level objectives
- Analyze self-healing and resiliency patterns and contribute to software which can use these outcomes
- Conduct performance tests and identify bottlenecks, opportunities for optimization and capacity demand
- Implement best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting
- Test and implement automated software and product upgrades, change management and release management solutions
- Engage with the Technology Controls organization to ensure tooling and the ecosystem meet the Firm’s rigorous cyber policies
- Contribute to Firm level SRE community via engineering projects
- Be part of the 24x7 support coverage as needed
This role requires a wide variety of strengths and capabilities, including:
- Bachelor’s Degree in Computer Science, Engineering or Business
- Prior experience in DevOps and/or application development teams
- Excellent debugging and troubleshooting skills
- Hands-on experience in large-scale software development, preferably in one of these languages: Java, Python, scripting languages
- Hands-on experience of Git, Bitbucket, Jenkins, Sonar, Splunk, Maven, AIM and/or Continuous Delivery tools
- Hands-on experience in Unix (Linux and Solaris), relational (Oracle, MS SQL DB, Sybase, etc.) and non-DB technologies
- Knowledge of load balancing, IP, DNS
- Exposure to new and emerging technologies such as cloud and virtualization, for example AWS (Lake Formation, Glue, Redshift, Athena)
- Exposure to messaging technologies, e.g. Kafka
- Exposure to Orchestration and configuration management tools for applications
- Familiarity with Agile Methodologies
- Hands on experience building out and maintaining data management platforms/workbenches either in house or as part of a commercial offering
- Experience with infrastructure components utilized in data warehousing or big data environments.
- Excellent communication skills, both written and oral, appropriately scaled for senior technical and senior business audiences
- Ability to work and effectively prioritize in a highly dynamic work environment that includes a global focus