Site Reliability Engineer (SRE)
Job Description
About the Role
We are looking for a highly skilled Site Reliability Engineer (SRE) to manage and scale our on-premises payments infrastructure. You will work in an on-site environment spanning virtual machines and containerized workloads on bare metal, ensuring high availability, security, and performance for mission-critical systems.
Key Responsibilities
- Operate and optimize virtualized environments (VMs) and containerized workloads (Docker on bare metal)
- Manage and scale middleware systems like:
  - Nginx (traffic routing, reverse proxy, load balancing)
  - Redis (caching, HA setup)
  - Kafka (streaming, partitioning, fault tolerance)
- Build and maintain CI/CD pipelines using Jenkins
- Manage infrastructure and application configurations using Git-based version control
- Ensure high availability, resilience, and performance tuning across systems
- Perform Linux system administration (RHEL/CentOS/Ubuntu)
- Implement and maintain automation frameworks using:
  - Ansible
  - Shell scripting
- Manage and troubleshoot networking components:
  - TCP/IP, DNS, load balancing
  - Firewalls, WAF policies
  - Akamai
- Handle security and compliance requirements
- Maintain accurate inventory and asset management systems
- Participate in incident response, RCA, and system reliability improvements
- Collaborate with application, security, and DevOps teams
Required Skills & Qualifications
Core Infrastructure
- Strong hands-on experience with Linux system administration
- Experience managing on-prem data center environments
- Solid understanding of:
  - Virtualization (VMware / KVM or similar)
  - Bare metal provisioning
Containers & Middleware
- Experience running Docker in production (non-Kubernetes setups preferred)
- Strong operational knowledge of:
  - Nginx
  - Redis
  - Kafka
  - RDBMS
  - Java
Observability, Alerting & Reliability
- Design and manage observability platforms:
  - Elastic Stack (ELK)
  - Grafana / Prometheus stack
- Build and maintain:
  - Metrics, logs, and tracing pipelines
  - Dashboards for system health and business KPIs
- Develop intelligent alerting strategies:
  - Reduce noise (alert fatigue)
  - Improve signal quality
- Build correlation mechanisms / alert aggregation systems to:
  - Reduce MTTD (Mean Time to Detect)
  - Reduce MTTR (Mean Time to Recover)
- Drive proactive monitoring and anomaly detection
- Lead incident response, debugging, and RCA with data-driven insights
CI/CD & Version Control
- Hands-on experience with:
  - Git (branching strategies, code reviews, infra-as-code workflows)
  - Jenkins (pipeline creation, build automation, deployment orchestration)
Networking & Security
- Good understanding of:
  - Networking fundamentals (L3/L4 concepts)
  - Firewalls and WAF (rule tuning, debugging)
- Experience handling secure production environments
Automation
- Hands-on experience with:
  - Ansible
  - Shell scripting (Bash)
Operations
- Experience with:
  - Monitoring, alerting, and logging systems
  - Incident management & RCA
  - Capacity planning
Preferred Qualifications (Good to Have)
- Experience in UPI / Payments domain
- Understanding of:
  - High TPS systems
  - Low latency architecture
- Exposure to:
  - Ceph / SAN / storage systems
  - HA/DR design patterns
- Knowledge of observability stacks (Prometheus, ELK, etc.)
- Experience working in regulated environments (PCI-DSS, RBI guidelines)
Pay: ₹600,000.00 - ₹1,700,000.00 per year
Work Location: In person
Company: Wits Innovation Lab