We are looking for motivated professionals to join our OpenShift Container Platform (OCP) Operations Team as third-level (L3) support. This team is essential to maintaining the stability and performance of the OpenShift clusters that power key business systems. Within a tiered support structure (L1, L2, L3), the role focuses on platform architecture, lifecycle management, automation, and the handling of escalated and high-severity incidents, while guiding the L1 and L2 engineers who carry out day-to-day monitoring, initial incident handling, and routine maintenance.
Your work will directly impact the reliability of containerized services that are critical to enterprise functions, ensuring a secure, scalable, and highly available platform.
Key Responsibilities
- Oversee Full Platform Lifecycle: Take charge of the complete lifecycle of the OpenShift platform, including upgrades, patches, disaster recovery (DR) planning, and backup strategies.
- Automate Operational Tasks: Apply GitOps practices and use tools such as Ansible and Terraform to streamline and automate repetitive or complex platform operations.
- Lead Critical Incident Management: Act as the primary lead during high-severity (SEV1) incidents, conduct post-incident reviews, and drive root cause analysis (RCA) processes.
- Establish and Enforce Security Standards: Define and implement platform compliance policies, including role-based access control (RBAC), Security Context Constraints (SCCs), network segmentation, and CIS Benchmark hardening.
- Integrate with External Tools: Set up and manage integrations between OpenShift and key tools such as Argo CD, HashiCorp Vault, Harbor, and GitLab to extend platform capabilities.
- Enhance Platform Visibility and Performance: Lead efforts to improve observability, including metrics, logging, and alerting, while tuning the platform for optimal performance.
- Coach and Guide Team Members: Support and mentor junior (L1) and mid-level (L2) engineers by sharing knowledge, encouraging best practices, and leading by example.
No. of Positions: 4
Team Type: Platform Operations
Role Focus: Architecture, Lifecycle Management, Platform Governance