Chetan Singh

Toronto, Canada

Summary

Seasoned Solutions Architect specializing in data platforms, architecture, and integration technology solutions. Proven track record of architecting and implementing secure, scalable cloud infrastructures and integrating security practices throughout the Software Development Life Cycle (SDLC). Experienced across database management, application development, client interfaces, software layering, build processes, testing methodologies, and deployment strategies. Adept at creating end-to-end design solutions that focus on reliability, scalability, and availability for complex systems. Strong advocate for observability standards to ensure optimal system operation. Partners with product and engineering leadership and other stakeholders to define technical roadmaps and provide a platform-centric perspective on the product roadmap. Strong communication and interpersonal skills: articulates concepts to stakeholders at varying levels, facilitates productive discussions to build business consensus, and collaborates with cross-functional teams and senior leadership to communicate security initiatives and resolve challenges. Extensive experience in building strategies and in hiring, scaling, coaching, and mentoring teams while maintaining a technically strong engineering culture that supports the organization’s values and ambitions.

Overview

14 years of professional experience

Work History

Principal Software Engineer

Walmart Global Tech
Toronto
08.2023 - Current
  • Designed a resilient and reliable Payments Platform across multiple public clouds
  • Architected a next-generation, highly available, scalable, and secure multi-cloud payment platform (AWS and GCP) based on a well-architected cloud framework, utilizing Kubernetes, Cloud Spanner database, Cloud Monitoring, Prometheus, Grafana, and a Golang backend
  • Designed a cloud-based data analytics platform (SLO repository) using BigQuery for analytics and Golang for ETL jobs
  • Led product architecture blueprints and design specifications, enabling application/platform SLO monitoring, error budgeting, and stakeholder notifications
  • Built automated ETL pipelines that convert application metrics into meaningful data insights by whitelisting metrics and exporting them to public cloud storage buckets
  • Developed and maintained architecture frameworks/solutions (golden standards) to ensure efficiency, reliability, scalability, and security for reuse across the organization
  • Established an incident management framework to optimize incident response workflows, enhancing observability coverage and reducing customer-reported incidents, while also implementing blameless incident analysis and post-mortems
  • Provided mentorship and leadership to a diverse team of architects, designers, engineers, and business stakeholders, fostering a culture of continuous improvement and innovation and keeping the team current on emerging technologies and engineering trends
  • The new Payments Platform on public cloud achieved its 99.99% availability goal with an 18% reduction in MTTx.

Staff Software Engineer

Nuvalence
Toronto
07.2021 - 08.2023
  • Designed and built a commercial product in Azure cloud for a client venturing into an enterprise offering
  • Designed a fine-grained IAM access system using Azure Active Directory
  • Consolidated automation tooling: strengthened the infrastructure build and configuration management process using Terraform and Helm, and rolled out CI/CD pipelines using Azure DevOps
  • Implemented the Azure Well-Architected Framework, incorporating its five pillars: Reliability, Security, Cost Optimization, Operational Excellence, and Performance Efficiency
  • Strengthened observability stack to have better insight into the performance, availability, and behavior of applications and infrastructure
  • Promoted knowledge sharing and skill development within the organization – leading Tech Exchange, Lunch & Learn and Office Hours
  • Led a team of engineers from Nuvalence at the client side to deliver the statement of work while abiding by the security principles and well-architected framework
  • Fostered a platform-centric identity mindset and led the team to deliver key client initiatives
  • Containerized the applications using Azure App Service and Azure Kubernetes Service, making them highly available and resilient
  • Designed and built a fully functional DR site in AWS: global datastore (Aurora MySQL), global tables (DynamoDB), Redis (replication), EFS (replication)
  • Helped the client embrace their cloud journey following best practices, fostered an environment of change adoption, built an SRE practice from scratch, and established ownership
  • This initiative enabled the client to accelerate their cloud adoption rate by 1.5x.

Lead DevOps Engineer

Finaptic (Acquired by Bank of Canada)
Toronto
02.2021 - 07.2021
  • Built a Fintech ecosystem (B2B API) from the ground up for an early-stage start-up
  • Technologies: GCP stack – GKE, Cloud Build, Cloud Run for Anthos, Istio, Helm, Terraform, Cloud DNS, Cloud Spanner, Secret Manager, Binary Authorization, Google Cloud Monitoring, PagerDuty, IAP, Cloud Functions
  • Architected and built highly available, redundant, scalable, secure, automated, idempotent multi-tenant systems with out-of-the-box observability and self-healing capability
  • Successfully led a team of engineers to achieve key KPIs around security, reliability, and operational excellence
  • Established a culture of learning by initiating Office Hours and sharing success and failure stories organization-wide

Senior Site Reliability Engineer

WorkMarket
Toronto
02.2020 - 02.2021
  • Technologies: AWS stack – AWS IAM, Terraform, CloudWatch, CloudTrail, Datadog, EC2, Lambda, Route 53, CodeBuild, GitHub, Cloudflare, Ansible, Nomad, Consul, PagerDuty
  • Containerized the existing workloads using HashiCorp’s Nomad for orchestration, Consul for service discovery and service mesh, and Vault for secrets and certificate management across the AWS cloud environment
  • Rolled out Datadog across the environment along with PagerDuty for paging and incident management
  • This project saved ~$5k/month in infrastructure costs; reduced toil through network middleware automation; improved scalability and reliability through service health status and service discovery; and secured the environment by encrypting all traffic in the service mesh.

Site Reliability Engineer

TripStack
Toronto
02.2019 - 02.2020
  • Built hybrid cloud-based systems
  • Technologies: AWS stack – AWS IAM, Terraform, CloudWatch, CloudTrail, Prometheus, Grafana, GitLab CI, EC2, Lambda, Route 53, GitLab, Ansible, PagerDuty, Istio, VMware, EKS
  • Containerized workloads using Kubernetes and rolled out an EKS cluster with Istio integrated, provisioned through an automated build process (Terraform)
  • Enabled the organization to migrate their monolithic workloads to an ecosystem of microservices and achieve reliable, scalable, secure, and highly available systems.

Senior Linux Engineer

Bureau of Meteorology
Melbourne, Australia
01.2017 - 07.2019
  • Spearheaded data centre migration for the Australian weather department
  • Technologies: AWS stack – AWS IAM, CloudWatch, CloudTrail, Terraform, Ansible, Nagios, PagerDuty, EC2, Akamai, GitHub
  • Developed cloud-based solutions, migrated the existing infrastructure to Amazon Web Services, and containerized the application using Docker
  • Built, released, and managed large-scale, mission-critical, resilient, fault-tolerant, secure environments both on-premises and in the public cloud, and managed operating system and infrastructure component configuration
  • Facilitated knowledge-sharing sessions on industry best practices through training and workshops using knowledge gained from community-driven technical meetups and conferences
  • This initiative delivered robust and secure systems and helped the Bureau of Meteorology (BOM) preserve data integrity and run operations seamlessly.

Senior Engineer

IBM and HCL
India
08.2010 - 04.2016
  • Provided exceptional infrastructure support
  • Successfully led a team of engineers handling BAU tasks across various facets of engineering (Compute, Network, Database, and Security) for Australian telecom giants
  • Worked closely with cross-functional teams to implement intrusion detection systems, including OS-based (virtual) firewalls and vulnerability scanners
  • Performed root cause analysis and applied the principles of change, incident, and problem management.

Education

Bachelor of Technology (B.Tech), Computer Science

GNDU
05.2010

Skills

  • Solutions Architecture
  • Cloud Infrastructures
  • Security Practices
  • Database Management
  • Application Development
  • Client Interfaces
  • Build Processes
  • Testing Methodologies
  • Deployment Strategies
  • End-to-End Design Solutions
  • Reliability, Scalability, Availability
  • Observability Standards
  • Partnership Skills
  • Technical Roadmaps
  • Interpersonal Abilities
  • SAFe Agile Methodology
  • Public Cloud - AWS, Microsoft Azure, and Google Cloud
  • Orchestration - Kubernetes & HashiCorp Nomad
  • Microservices
  • Incident, Problem & Change Management
  • Tools Customization
  • Distributed Systems
  • Project Planning
  • Performance Tuning
  • Project Budgeting
  • Technical Leadership
  • Technical Documentation
  • Cross-Functional Collaboration

Accomplishments

  • Organized and executed a C-level offsite aimed at aligning strategies and building consensus for migrating payment workloads to the public cloud. Translated strategic decisions into actionable engineering initiatives, resulting in a 45% traffic migration to the public cloud, enhancing security and resilience while maintaining 99.99% availability.
  • Launched brown bag sessions to drive best practices, introduce tools, and foster knowledge sharing among platform teams at Walmart. Cultivated an SRE mindset to boost productivity and establish open feedback loops, aligning with architectural goals.
  • Collaborated industry-wide through meetups, vendor conferences, and summits.

References

References available upon request.

Timeline

Principal Software Engineer

Walmart Global Tech
08.2023 - Current

Staff Software Engineer

Nuvalence
07.2021 - 08.2023

Lead DevOps Engineer

Finaptic (Acquired by Bank of Canada)
02.2021 - 07.2021

Senior Site Reliability Engineer

WorkMarket
02.2020 - 02.2021

Site Reliability Engineer

TripStack
02.2019 - 02.2020

Senior Linux Engineer

Bureau of Meteorology
01.2017 - 07.2019

Senior Engineer

IBM and HCL
08.2010 - 04.2016

Bachelor of Technology (B.Tech), Computer Science

GNDU