JOB PURPOSE
As a Senior Site Reliability Engineer working on critical services, your mission will be to ensure our services are fast, highly available, scalable, and able to withstand unprecedented increases in load. You will be at the
heart of solving production problems, with a scope ranging from the kernel to the application.
JOB CONTEXT
The position requires the flexibility to take a holistic approach to troubleshooting and the ability to delve deeply
into technical details. The Site Reliability Engineer will co-locate with the various application development
teams. This ensures that he/she will acquire the necessary domain knowledge to troubleshoot and repair an
outage effectively. The team will build automation tools for system health and production acceptance tests to
validate production changes. The Site Reliability Engineer will ensure the system is well-instrumented and
highly fault tolerant. He/she will drive the automation of delivery pipelines for effective deployment and ultimately
handle the execution of AIOps principles and practices to automate operational processes.
KEY RESPONSIBILITIES
Engage with and influence development, operations, and product groups, and evangelize SRE practices to
align technology service and solution delivery.
Drive quality accountability within the team with well-defined processes, metrics, and goals for
process quality. This includes leading effective post-mortems and ensuring actions are followed up.
Manage the availability, latency, scalability, and efficiency of all platforms, infrastructure, and
applications by instilling reliability engineering into our development life cycle, with a focus on
fault-tolerant approaches.
Drive capacity planning, performance analysis, instrumentation, and other non-functional systems
requirements.
Define and report progress on platform-level and project-level tasks to all stakeholders, including
senior management, using communication approaches suited to each constituency.
Implement metrics-driven processes to ensure service quality targets are met.
Attend to emergency fixes and manage chaos.
KNOWLEDGE, SKILLS & EXPERIENCE
Key skills:
Knowledge of software development principles and practices.
Expert knowledge in all aspects of designing, developing, and managing large real-time systems.
Knowledge of Linux/Unix OS.
Coding experience beyond simple scripts; expected to code on the job and build automation.
Knowledge of shell, PHP, or Perl programming.
Demonstrated experience working in large, complex systems environments.
Deep understanding of internet and networking protocols.
A passion for performance excellence and robustness, and an engineering mindset.
Knowledge and Experience:
The successful candidate will possess an outstanding record of professional experience and will thrive
in an environment that demands accountability. He/she must possess significant technology
management and product development experience. He/she must also have strong planning,
organizational, and communication skills, and be a key driver in helping the team understand the big
picture.
Proven leader of technology solutions in a high-volume transaction environment.
Have excellent time management, communication, decision-making, presentation, and organizational
skills.
Maintain excellent written and verbal communications with clients, employees, and the management
chain, including status reports, project plans, presentations, etc.
Ability to lead across functions and motivate matrixed staff.
Qualifications:
Education: A university degree in Computer Science or a related discipline.
Professional Qualifications: ITIL.
Minimum of 7 years of IT experience, including 3 years of cognate experience in Fintech or Banking.
An MBA will be an added advantage.
Multilingual ability will be an added advantage.