Responsibilities
Architecture and Design
- Develop and maintain DevOps architectural standards and best practices
- Develop and maintain infrastructure-as-code using tools such as Terraform and Helm, together with GitOps workflows
- Design and implement secure, automated, and scalable CI/CD pipelines
- Implement and manage AWS infrastructure components (e.g., VPC, EC2, S3, RDS, CloudFront) with a focus on security
Kubernetes Expertise
- Design and architect enterprise-grade Kubernetes clusters
- Configure and optimize Kubernetes add-ons (e.g., auto-scaling, load balancing, service meshes, and observability solutions) to extend platform capabilities and integrate securely with external systems
- Enforce RBAC, network policies, and other guardrails for cluster security in multi-tenant environments
- Provide expertise in troubleshooting and resolving issues with the Kubernetes platform and the applications it hosts
- Evangelize Kubernetes best practices and drive adoption within the organization
Security Integration
- Lead the integration of security practices into the DevOps process, including code scanning, vulnerability assessments, and security testing
- Implement and enforce security controls throughout the development lifecycle
Toolchain Management, Automation and Orchestration
- Evaluate, select, and implement DevOps tools and technologies that align with organizational goals
- Optimize existing toolchains for improved efficiency and security
- Implement automation scripts and tools for infrastructure provisioning, configuration management, and deployment
Monitoring and Performance Optimization
- Establish monitoring and alerting systems for both security and performance metrics
- Continuously optimize infrastructure and applications for improved efficiency and reliability
Qualifications
- 5+ years of experience in SRE/DevOps/Infrastructure engineering
- Mastery of containerization and orchestration platforms (Docker, Kubernetes)
- Strong expertise in AWS, including hands-on experience with key services (EC2, S3, RDS, Lambda, VPC, etc.)
- Experience with deploying and securing modern cloud infrastructure environments and adjacent tooling
- Strong expertise in CI/CD pipeline design and implementation (GitHub Actions, CircleCI, Jenkins, etc.)
- Experience with infrastructure-as-code tools (e.g., Terraform, Ansible)
- Hands-on configuration management experience in production environments
- Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack)
- Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., Redis, MongoDB, Cassandra)
- Strong motivation and proven experience applying automation to reduce operational toil for engineering teams