Sarim Ali

Site Reliability Engineering Manager
Toronto, Ontario

Summary

Skilled site reliability engineering manager who builds and motivates high-performing engineering teams. Over 10 years of experience in DevOps/SRE working on cloud-based projects. Committed to completing projects rapidly and efficiently by using team-based frameworks to make the best use of available engineering talent.

Overview

14 years of professional experience
3 Certifications

Work History

Site Reliability Engineering Manager

GroupBy Inc.
Toronto, ON
06.2021 - Current
  • Building & Leading a Global Team: Successfully built and scaled a global SRE team across Toronto, Pakistan, India, and Taiwan, fostering a collaborative and inclusive environment and hiring 100% of the current team
  • Automating Infrastructure Provisioning: Built a one-click onboarding system using Terraform and Python, automating Google Project and Kubernetes cluster creation across multiple regions, saving significant time and resources
  • Implemented best practices like shared VPCs and Private Kubernetes clusters, ensuring cost efficiency, security, and scalability
  • Enhancing System Resilience: Designed systems for global traffic failover and low-latency connections, utilizing workload identities, Cloudflare, Cloud Armor, and Google Secret Manager for robust security
  • Promoting Reliability & Collaboration: Introduced SLI, SLO, and SLA frameworks, establishing production readiness reviews with product managers, instilling SRE principles across the organization
  • Successfully implemented Scrum within SRE, balancing operational excellence with agile methodologies and significantly reducing toil
  • Facilitating Hybrid Cloud Success: Orchestrated a multi-cloud hybrid setup using Azure and GCP, connected via HA VPN, offering flexibility and access to the best technologies from both platforms (e.g., OpenAI, Cosmos DB, Spanner, Google Retail AI).
  • Managed and motivated employees to be productive and engaged in work.
  • Accomplished multiple tasks within established timeframes.

Senior Site Reliability Engineer

Loblaw Companies Limited
04.2020 - 06.2021
  • Worked with internal teams to help deliver SRE initiatives
  • Key initiatives included delivery of Cloud Functions-based applications and Kubernetes applications
  • Set SLIs, SLOs, and SLAs for all released apps, followed up with stakeholders on those metrics, and introduced and drove the conversation on observability
  • Delivered K8s applications for COVID-19 vaccines in record time, complete with SRE best practices.

DevOps Lead

VerticalScope Inc.
04.2019 - 04.2020
  • Developed and executed strategic plans for process improvement, serving as a key member driving the migration from one Kubernetes cluster to another (GKE)
  • Reduced costs by 80% with a single architectural change
  • Facilitated migrations from SoftLayer to Google Cloud using best practices (IaC, CI/CD, GKE, and Stackdriver)
  • Facilitated migrations from AWS to GCP using best practices
  • Facilitated email migrations from Mailgun to SendGrid
  • Developed and executed strategic plans for process improvement, driving system unification across the organization
  • Replaced hosted MySQL instances with Cloud SQL
  • Worked with Google Cloud services including Memorystore, Bigtable, BigQuery, and Pub/Sub, as well as Cloudflare CDN and the SendGrid mail platform.

Lead DevOps Engineer

FanXchange
08.2017 - 04.2019
  • Transitioned environments to Kubernetes
  • Worked on stabilizing the infrastructure
  • Completed a logging system overhaul, moving to an Ansible-managed ELK stack
  • Trained and led the engineering team leads and their reports in DevOps Kung Fu practices
  • Transitioned MySQL databases to autoscaling, reliable RDS databases with no downtime
  • Worked closely with the CTO and Engineering to re-architect Python applications for scale and reliability
  • Optimized build jobs and managed CI with a move to Jenkins 2.0, using integrations such as Jira, GitHub, and Jenkinsfiles
  • Implemented a comprehensive monitoring overhaul with Datadog, automated via Ansible
  • Spearheaded the office move and set up a cost-optimized Wi-Fi mesh network and firewall
  • Worked closely with the finance team to reduce cost
  • Made hardware purchases and architecture recommendations.

DevOps Engineer

FanXchange
08.2016 - 08.2017
  • Moved code deployments from manual to automated using Jenkins and Rundeck
  • Stabilized a messy attempt at automation
  • Empowered teams with fully automated, self-creating and self-destroying environments using Terraform, Ansible, and Jenkins
  • Empowered dev teams with local development environments using Vagrant shomi.

DevOps Engineer

FanXchange
06.2015 - 08.2016
  • Initial main duty was to get the physical data center ready for a major public launch
  • Ensured that the third-party payment page integration met high standards of security and compatibility
  • Used existing SOAP integration
  • Worked on payment overrides with big gateways like Chase
  • Became the go-to specialist for the payment system, someone others felt comfortable approaching for guidance
  • Used Chef to install logging (Sumo Logic) and app monitoring (New Relic)
  • Automated Windows deployments using only Chef, CodeDeploy, and TeamCity
  • Automated Node.js deployments using only Chef, CodeDeploy, and TeamCity.

System Administrator/IT Support

[NAMF, Eyereturn, FutureShop, Thinnox]
06.2010 - 06.2015
  • Included this to show start of career
  • Reduced downtime by proactively identifying and resolving potential issues through thorough system monitoring.
  • Established effective communication channels between IT support staff and end-users, leading to improved issue resolution times overall.


System Administrator

Eyereturn
03.2014 - 04.2015
  • Created an IDS using Puppet scripts and deployed it across existing systems
  • Assisted in a database migration project moving assets from one data center to another
  • Used an Amazon EC2 instance as an external interface to check existing systems for vulnerabilities and for external monitoring
  • Created detailed documentation for both new and existing environments
  • Troubleshot outages, including after hours and on weekends
  • Authored scripts to automate systems administration tasks, networking, monitoring, and application deployment
  • Leveraged and contributed to in-house Puppet scripts for the automation of system and application deployment
  • Collaborated with the team and assisted in a NY expansion project, helping from design through shipping
  • Analyzed and troubleshot Hadoop, HBase, and Cloudera implementations, increasing performance and manageability
  • Installed and staged new servers
  • Created a two-factor authenticated Wi-Fi system with commodity-grade hardware
  • Provided user support across all platforms
  • Helped quickly patch security vulnerabilities like Shellshock (Bash) and Heartbleed
  • Helped create a brand new DevOps team at Eyereturn by leveraging Vagrant and Puppet to create a dynamic UAT environment
  • Helped network and wire the Markham offices.

Robotics Instructor & System Administrator

The THINNOX Academy
10.2013 - 09.2014
  • Delivered professional and educational presentations daily to students aged 7-16, and sometimes to parents
  • Conducted basic and intermediate robotics classes using NXT Mindstorms 2.0 and the NXT Mindstorms IDE
  • Managed Active Directory and updated many users using batch scripts
  • Optimized and upgraded the server
  • Served as the only IT professional in the entire facility, which required being easily approachable
  • Maintained all hardware in the facility and upgraded it when necessary.

Mobile Audio - Product Expert

Future Shop
06.2013 - 09.2013
  • Delivered outstanding service to every customer
  • Troubleshot and resolved complex technical problems for customers
  • Resolved customer issues and complaints in a positive manner that built business and customer loyalty while adhering to company policies and procedures
  • Suggested additional products and services to enhance the customer's car audio experience
  • Kept up to date on car audio through websites, forums, peers, and formal training sessions
  • Boosted customer turnout by creating an HTML-based email blast during the annual VIP sale event, a first in this location's history.

Education

Bachelor of Science - Computer Science

Toronto Metropolitan University
Toronto, ON
05.2001 -

High School Diploma -

Notre Dame Catholic Secondary School
Brampton, ON
05.2001 -

Skills

Team Leadership

Operations Management

Strategic Planning

Performance Management

Technical Knowledge

Hiring

Languages

English
Native or Bilingual
Urdu
Professional Working

Certification

VMware Certified Associate: Data Center Virtualization

Current Job

Site Reliability Engineering Manager, GroupBy Inc., 06/2021, Present, 3 years 2 months, Toronto, Ontario, Canada

Software

IaC (Terraform, Ansible, Chef)

CI/CD (GitLab, GitHub Actions, CircleCI)

Python

Go

Linux Security

Hobbies

Ping Pong Enthusiast: Developed advanced skills and strategic gameplay through years of practice and competition, demonstrating precision and quick reflexes.

Woodworking Aficionado: Crafted a variety of custom furniture pieces and home decor, showcasing creativity, attention to detail, and hands-on problem-solving abilities.

DIY Projects Enthusiast: Tackled numerous home improvement projects, from minor repairs to major renovations, reflecting a strong aptitude for practical skills and innovative solutions.

Lawn Care Expert: Maintained and improved lawn health and aesthetics through consistent care, landscaping, and gardening, highlighting a commitment to quality and outdoor spaces.
