Greetings,
This is Rahul from Quantum World Technologies, where I work as a Senior Technical Recruiter. I have an onsite/hybrid job opportunity with one of our clients. If you are interested, please share your resume; the job details are given below:
Job Title: Site Reliability Engineer
Location: NYC, NY (Hybrid)
Strong experience in FIX and the Google SRE model
Job Summary
What You’ll Do:
● Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements
● Handle technical escalations, troubleshoot complex FIX and API connectivity issues, and actively participate in on-call rotations during non-traditional hours to ensure rapid response and resolution
● Adhere to and administer incident and change management policies
● Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability
● Work closely with the Lithuania office to ensure smooth operation and alignment of SRE practices across time zones
● Coordinate incident postmortems and root cause analysis (RCA)
● Design, implement, and maintain comprehensive monitoring, logging, and tracing solutions (observability stack) to provide deep insights into system performance and user experience
● Partner with product and engineering teams to define clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), managing error budgets to ensure service reliability meets business needs
Required Qualifications:
● 5+ years in a senior SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations
● Knowledge of FIX protocol and messages, ability to read FIX logs
● Familiarity with REST APIs and a strong understanding of API integration
● Proficient in Python and scripting for automation and system management, with a proven track record of developing and implementing automation solutions
● Expertise in SQL and transactional databases, including querying and troubleshooting
● Strong analytical and troubleshooting skills with a proven ability to identify and resolve technical issues through root cause analysis
● In-depth knowledge of core networking concepts, including TCP/IP, routing, and DNS
● Familiarity with maintaining and troubleshooting systems in both cloud (AWS) and co-location (colo) environments
● Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage
● Knowledge of change management processes and risk management
Preferred Qualifications:
● Experience in the brokerage or financial industry.
● Proficient with cloud services, particularly AWS, and knowledgeable about cloud architecture best practices, including IAM, EC2, S3, and DynamoDB.
● Experience maintaining and supporting containerized systems, and familiarity with orchestration tools.
● Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation.
● Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow.
● Advanced skills in managing containerized environments using Kubernetes and OpenShift.
● Practical experience with Confluent Cloud or Redpanda for event streaming architectures.
● Experience with API-based applications and a basic understanding of using the browser developer console for front-end debugging.
Thanks & Regards,
Rahul Pandey