We are looking for motivated professionals to join our OpenShift Container Platform (OCP) Operations Team as first-level support. This team is essential to maintaining the stability and performance of the OpenShift clusters that power key business systems. Within a tiered support structure (L1, L2, L3), the role focuses on day-to-day operational tasks, continuous monitoring, initial incident handling, and support for ongoing maintenance activities.
Your work will directly impact the reliability of containerized services that are critical to enterprise functions, ensuring a secure, scalable, and highly available platform.
Responsibilities
- Diagnose and Resolve Platform Issues: Troubleshoot problems affecting workloads, Persistent Volume Claims (PVCs), ingress traffic, service endpoints, and image registries to ensure smooth operations.
- Apply Configuration Updates: Use YAML manifests, Helm, or Kustomize to implement changes across the platform in a consistent and reliable manner.
- Cluster Maintenance and Upgrades: Handle the upkeep of Operators, carry out OpenShift cluster upgrades, and perform post-update checks to confirm platform stability.
- Support CI/CD and DevOps Teams: Collaborate closely with development teams to identify and fix issues in build and deployment pipelines.
- Namespace and Access Management: Oversee and automate tasks such as namespace creation, role-based access control (RBAC) configuration, and the application of network security rules (NetworkPolicies).
- Monitor System Health: Manage logging, monitoring, and alerting systems such as Prometheus, EFK (Elasticsearch, Fluentd, Kibana), and Grafana to proactively identify issues.
- Plan and Participate in Maintenance Cycles: Contribute to change request (CR) planning and patching schedules, and coordinate downtime and recovery procedures when needed.
No. of Resources: 5
Role Focus: Advanced Troubleshooting, Change Management, Automation