Parag Kale (SRE/DevSecOps)

Toronto, ON

Summary

Seasoned software engineer with 12+ years of experience in cloud and platform engineering, software development, and technical leadership. A strategic thinker who approaches tasks with professionalism and a positive attitude, fostering teamwork and collaboration.

Overview

14 years of professional experience
1 Certification

Work History

Senior Site Reliability Engineer

Walmart
Toronto
02.2023 - Current
  • Strategize to minimize service disruption across Walmart's cloud tech platforms through operational excellence, incident management, and change management; drive RCAs and their outcomes.
  • Build GenAI capabilities for SRE by curating data to train models.

Lead Site Reliability Engineer

Coupa
Toronto
03.2019 - 01.2023
  • Tech Lead for implementing SSO (PingFederate, Keycloak), CI/CD & Platform Operations
  • Program Lead for Cloud Cost Management
  • Built frameworks for cloud cost analysis & automation using Ruby & AWS, saving $1.5 million
  • Built Performance & Capacity planning frameworks for optimized resource utilization of SQL & Ruby on Rails applications on AWS EC2
  • Implemented cross-region disaster recovery for container-based microservices on AWS ECS
  • Led chaos engineering efforts that produced on-call playbooks and built trust in platform resiliency & reliability
  • Automated tech-debt cleanup using Python, Bash, Ruby
  • Executed Platform Migration & Adoption for SSO systems
  • Implemented SLO/SLI in Grafana/Prometheus/AWS CloudWatch for delivering 99.99% SLO
  • Tuned Platform Scalability for AWS ECS based services for 99.99% monthly availability.
  • Automated a zero-downtime release using Rundeck, Ruby, Python, AWS ECS
  • Executed zero-downtime blue-green releases for SSO systems with a zero-incident record.
  • Resolved application security threats, achieving the committed SLA of zero security vulnerabilities
  • Architected Jenkins infrastructure for highly available, fault-tolerant system deployments
  • 24x7 on call. Performed incident RCA
  • Led a team of 2 engineers for 3 years.

Senior Site Reliability Engineer

TomTom
Pune
07.2015 - 03.2019
  • Adoption and Migration from Datacenter to AWS (Lambda, EC2, RDS, S3, VPC, ALB etc.)
  • Implemented Infra as code using Terraform, Packer, Puppet, Python, Shell for AWS
  • Architected Network and Load Balancing for performance and high availability (HAProxy, API Gateway, Kong, AWS LB, VPN, VPC)
  • Architected Network Security (AWS WAF, Security Groups, AWS Shield, AWS GuardDuty, Security Monkey)
  • Contributed to System and Application Monitoring (Kibana, CloudWatch, ELK, Prometheus, Grafana)
  • Developed Configuration management using Puppet
  • Built Pipelines for CI/CD using Jenkins, Spinnaker
  • Executed Chaos Testing for system reliability (Bash and Python)
  • Operations for ELK platform, Jenkins, PostgreSQL, Java Apps
  • Automation using Python, Bash
  • Built automation for cloud cost management
  • 24x7 on call. Contributor to RCA.
  • Performed production releases for Java-based applications

Software Engineer

Symantec
Pune, India
06.2014 - 06.2015
  • Contributed as a Python/Django backend developer to web applications that handled MDM for platforms such as iOS, Android, and MS.

Software Engineer

Blue Jeans Networks
Bangalore
07.2010 - 05.2014
  • Created and managed a Microsoft Outlook scheduling add-in in C# and .NET, overseeing end-to-end development, testing, packaging, and release for 50+ customers.
  • Contributed to developing a highly available, fault-tolerant hybrid cloud for hosting audio/video conferencing solutions using C/C++/Python
  • Built and developed the Puppet/Chef platform and scripts
  • Hands-on Linux system administration
  • On Call and Incident Management

Education

Bachelor of Science - Computer Science

Pune Institute of Computer Technology
India
05.2010

Skills

  • AWS (Expert), Azure (Intermediate)
  • Terraform, CloudFormation (Infra as Code)
  • Docker, AWS ECS, Kubernetes
  • Grafana, ELK, Prometheus (Observability)
  • Python, Bash, Go (beginner)
  • Chef, Puppet
  • Jenkins, Rundeck (CI/CD)
  • SRE principles (Resiliency, High availability, Stability)
  • Cloud Architecture, Design, Operations
  • Performance and Chaos Engineering
  • Networking and Security
  • Incident, Change, Release & Cost Management
  • Team Leadership, Mentorship
  • Project Management

Certification

  • AWS Certified Solutions Architect

Timeline

Senior Site Reliability Engineer

Walmart
02.2023 - Current

Lead Site Reliability Engineer

Coupa
03.2019 - 01.2023

Senior Site Reliability Engineer

TomTom
07.2015 - 03.2019

Software Engineer

Symantec
06.2014 - 06.2015

Software Engineer

Blue Jeans Networks
07.2010 - 05.2014

Bachelor of Science - Computer Science

Pune Institute of Computer Technology