MASTAN BABU SHAIK

Waterloo

Summary

I am a DevOps/Cloud Engineer with hands-on experience building and managing scalable cloud infrastructure, data pipelines, and automation frameworks. I have worked extensively on Google Cloud Platform (GCP), using Kubernetes, Terraform, and Apache Airflow to support multi-tenant deployments and data migrations. My expertise includes designing CI/CD pipelines and automating environment provisioning.


I have played a key role in developing and maintaining a Live Data Migrator tool that enables zero-downtime migration of user data across cloud environments. This involved building fault-tolerant, secure, and performance-optimized pipelines, integrating them with Apache Airflow for orchestration, and ensuring data integrity through validation and rollback mechanisms.


In addition to my technical skills, I have leveraged Jira and Confluence to manage agile development workflows, track project milestones, and document technical architectures and operational procedures. My work is driven by a focus on reliability, automation, and operational excellence, ensuring both speed and stability in delivering cloud-based solutions.

Overview

3 years of professional experience
1 Certification

Work History

Software Engineer

Tech Mahindra
03.2022 - Current
  • Designed and deployed scalable Kubernetes (K8s) clusters in Google Cloud Platform (GCP) to support containerized microservices and application workloads.
  • Automated infrastructure provisioning and configuration using Terraform, enabling consistent and repeatable environment setups across dev, staging, and production.
  • Supported and debugged Terraform provisioning errors, ensuring stable infrastructure rollout across environments.
  • Reviewed and approved Terraform pull requests, enforcing IaC standards and avoiding resource drift.
  • Maintained and updated Terraform modules, incorporating best practices and new resource types as needed.
  • Handled node failures, pod crashes, and autoscaling issues, ensuring high availability of critical services across GKE clusters.
  • Implemented GCP IAM policies to manage access control and enforce security compliance across dev, staging, and prod environments.
  • Performed routine GCP infrastructure audits and cleanup, optimizing resource utilization and minimizing unnecessary billing.
  • Implemented monitoring and logging for K8s workloads, improving system observability and reducing mean time to recovery (MTTR).
  • Developed and maintained CI/CD pipelines for tenant-specific deployments, streamlining release processes and reducing deployment errors by 40%.
  • Investigated and resolved pipeline failures in tenant-specific deployments, including build, test, and release stages.
  • Provided on-call support during production deployments, coordinating with development and QA teams to minimize impact.
  • Added pipeline health checks and alerting to improve visibility into failed jobs or stalled deployments.
  • Built and orchestrated data workflows using Apache Airflow, including DAGs for ETL, monitoring, and automated reporting pipelines.
  • Monitored Airflow DAGs for job failures, task retries, and performance issues, ensuring timely data pipeline execution.
  • Debugged DAG issues such as task dependency failures, timeouts, and database connectivity problems.
  • Maintained DAG configuration files and environment variables for smooth deployment and operation in production.
  • Developed and maintained a Live Data Migrator tool to support seamless and near real-time migration of user data across GCP environments with zero downtime.
  • Managed end-to-end data migration projects, ensuring successful transfer of live data with minimal downtime.
  • Designed robust data ingestion and validation pipelines to ensure integrity and consistency of user data during migrations.
  • Implemented multi-tenant support, enabling parallel migrations across isolated environments while preserving data segregation and tenant-specific configurations.
  • Built automated monitoring and alerting for migration tasks using GCP tools and custom health checks, significantly reducing the time to detect and recover from failures.
  • Integrated the migrator with Airflow DAGs for scheduled and dependency-based execution of large-scale migration jobs.
  • Introduced checkpointing, retries, and rollback mechanisms to ensure fault tolerance and recovery for long-running data transfers.
  • Tuned performance by optimizing batch sizes, resource allocation, and concurrency, reducing migration runtime by 35% in high-volume use cases.
  • Provided hands-on support during live migration windows, monitoring job progress and ensuring data consistency.
  • Investigated data mismatches, incomplete loads, and integrity issues, and implemented fixes using rollback and retry mechanisms.
  • Maintained logs and metrics for all migration activities, facilitating post-mortem analysis and performance tuning.
  • Supported migration testing, dry runs, and validation scripts to verify data completeness before and after cutovers.
  • Collaborated cross-functionally with application, database, and infrastructure teams to ensure smooth coordination of live migration events.
  • Managed project tracking, task assignments, and sprint planning using Jira, ensuring on-time delivery of development milestones.
  • Handled ticket triaging for operational incidents, feature requests, and access control changes through Jira.
  • Participated in daily stand-ups and retrospectives to report and resolve recurring support issues.
  • Documented architectural decisions, deployment runbooks, and operational procedures in Confluence, improving team onboarding and knowledge sharing.
  • Created and maintained knowledge base articles and runbooks in Confluence to standardize support procedures.

Education

Post-Graduate Certificate - Computer Science

Conestoga College Institute of Technology And Advanced Learning
Kitchener
05-2021

Skills

  • Google Cloud Platform (GCP)
  • Terraform (Infrastructure as Code)
  • Kubernetes (K8s)
  • Helm Charts
  • CI/CD (GitLab CI, Jenkins, or similar)
  • Docker & Containerization
  • Apache Airflow (DAG Design, Scheduling, Monitoring)
  • Data Pipelines & ETL
  • Live Data Migration
  • Jira (Agile/Scrum, Sprint Planning, Ticket Management)
  • Confluence (Technical Documentation, Runbooks)
  • Git (Version Control)
  • Agile/Scrum Methodologies

Languages

English
Native or Bilingual

Certification

Associate Cloud Engineer


Microsoft Certified Azure Fundamentals


AWS Cloud Practitioner
