Title: Application Support Engineer
Location: Alpharetta, GA
Interview type: L1 - Video and L2 - Face to Face
Position Overview:
The Wealth Management Production Management Site Reliability Engineer position
is a highly visible, critical role on a team of technical SMEs who manage the
stability and optimization of the Wealth Management systems. Scope includes,
but is not limited to, day-to-day support of the organization’s
technology-related outages and collaboration on technology projects focused on
stability, optimization, business impact analysis, and associated risk-related
methodologies. This role is responsible for the overall stability of the
Wealth Management Investment Management application platforms, participation in
key optimization initiatives, and collaboration with multiple technical teams
within the organization. Additionally, the role partners with WM business
units and various levels of management and staff to collect and analyze
information and make recommendations on optimizing the platform. This position
mainly performs DevOps/SRE duties across Java, Unix, and SQL technologies.
Responsibilities include:
- Incident Management: create and manage the necessary processes for handling
incidents
- Partner with Ops Control to ensure IT and/or End User
communications are handled appropriately
- Engage with the development team throughout the life cycle to
support building applications for reliability
- Develop software to automate manual operational work
- Run, maintain and improve the service against established Service
Level Objectives by applying software engineering principles
- Responsible for the availability, performance, change (CP)
management, monitoring, and capacity management of their services
- Troubleshoot priority incidents, conduct blameless post-mortems and
ensure permanent closure of the incidents
- Analyze patterns of production incidents, develop permanent
remediation plans, and implement automation to prevent future incidents
from occurring through software engineering
- Manage process-related functions around large-scale events such as
disaster recovery, and communicate closely with impacted groups to ensure all
events are properly managed
Primary Skills / Must have
- Site Reliability Engineer (SRE) role in which 80% of the work will be support
[React/Protect] and 10% will be in the DevOps [Enable] space
- Proven track record supporting large-scale, multi-tiered, cloud-based
applications
- Analyze ITSM activities of the platform and provide a feedback loop
to development teams on operational gaps or resiliency concerns
- Hands-on experience with Java, Angular, Spring, DB2, and Unix scripting,
plus experience with scheduler tools such as TWS and AutoSys
- L2-L3 production support, debugging, and problem-solving skills
- Experience working in an Agile Development environment
- Proven ability to understand and troubleshoot complex problems
under pressure
- Excellent communication skills (both written and oral), listening
skills, influencing and negotiation skills
- Experience with performance troubleshooting and remediation
- Experience with observability tools such as Splunk, Kibana,
Grafana, Prometheus
- Support the application CI/CD pipeline for promoting software into
higher environments through validation and operational gating, and lead
in DevOps automation and best practices.
Secondary Skills / Desired skills
- Strong expertise in Linux and shell scripting; must be very comfortable
working in Linux
- Grafana/Kibana dashboarding experience
- Good problem-solving skills
- Good communicator
- Good understanding of the brokerage business
- Job scheduling (Control-M/CBSS/cron) experience
- Bachelor’s/Master’s Degree in Computer Science, Information Systems
or related field