Erol Blakely

Toronto, ON

Summary

I am a results-driven leader with a strong background in strategic management and organizational growth. I am skilled in developing and implementing effective business strategies, optimizing processes, and driving team performance, and I am known for adaptability, effective collaboration, and delivering measurable outcomes in dynamic environments. I bring strong communication and problem-solving abilities, coupled with a focus on fostering a productive and positive team culture.

Overview

26
years of professional experience

Work History

Director, Platform Engineering

1Password
10.2022 - Current
  • Planned and led initiatives to ensure 99.99% availability of platform and tier 1 services
  • Developed and executed, ahead of schedule, a 3-year infrastructure modernization roadmap which ensured the platform was both useful and used by 1Password developers.
  • Enhanced team collaboration through regular communication, goal setting, and performance evaluations.
  • Forecasted and managed a 20MM budget covering AWS, GCP, Datadog and other tooling to ensure optimal use of budget resources
  • Facilitated cross-functional collaboration for improved decision-making processes within Engineering leadership
  • Developed high-performing teams by providing mentorship, guidance, and opportunities for professional growth.

Director, Platform Engineering

CircleCI
09.2020 - 10.2022

I led globally distributed SRE, Infrastructure Engineering, Developer Tooling and API teams consisting of 40 ICs and managers at CircleCI. My teams focused on ensuring 99.99% availability of CircleCI's platform. This was done through:

  • Establishing SLOs that reflected customer usage of our platform
  • Implementing an effective incident management process which led to a 65% reduction in overall incidents
  • Building tooling and automation to have computers reason for us, which removed room for human error and allowed ICs to focus on more impactful work

Director, Site Reliability Engineering

Ecobee
05.2019 - 08.2020

My role at Ecobee was focused on two things: people and processes. I was responsible for staffing, managing and mentoring a team of 16 SREs. In addition, my role was to define and implement a strategic plan specific to the SRE team. My team and I focused on tasks such as:

  • Using SLIs to establish enforceable internal SLOs and external SLAs
  • Developing and maintaining on-call processes, as well as continuing a blameless culture for both on-call and post-mortems
  • Deploying and automating reliable, scalable and secure infrastructure
  • Building common tools for observability and system telemetry
  • Eliminating tech debt and operational toil
  • Defining and managing opex and capex budgets
  • Working with the SREs to mentor them and define career growth plans

Director, Infrastructure

Mercatus Technologies Inc.
09.2018 - 04.2019
  • My role at Mercatus was to bring operational maturity to the Infrastructure team:
  • Developed a security roadmap for both Infrastructure and Engineering
  • Implemented monitoring and analytics using Splunk and New Relic
  • Began work on migrating existing infrastructure to Docker and K8S
  • Furthered implementation of IaaS using AWS and Spinnaker

Manager, IT Operations

Points International
01.2017 - 09.2018
  • My role at Points was to manage a team of DBAs, DevOps engineers and network admins who were responsible for managing and maintaining a 24x7 infrastructure with 99.99% availability.
  • Deployed redundant DR site leveraging K8S, Docker and GitLab
  • Implemented on-site K8S installation utilizing Quobyte, Docker and GitLab
  • Developed incident response and security runbooks
  • Designed and managed a zero-downtime production network overhaul
  • Assisted in building a corporate threat risk assessment

Senior Systems Engineer

OANDA Corp
08.2015 - 12.2016
  • My duties at OANDA included designing and implementing security policies and procedures, as well as implementing the technical solutions needed to enforce them. I was also responsible for the staging, pre-production and production platforms, their associated networks, and the monitoring of all components.
  • Deployed DDoS solution and incident runbook
  • Deployed Cisco IDS in conjunction with F5 devices and associated incident runbook
  • Assisted in the upgrade from Solaris to RHEL7
  • Migrated datacenters and deployed DR offsite datacenter
  • Implemented smart load balancing using F5 devices
  • Helped plan and run a company-wide vulnerability and security scan, and implemented recommended fixes

Senior Systems Administrator

Tucows.com Inc
07.2014 - 08.2015
  • At Tucows I was part of a 10-person Operations team responsible for Tucows' custom-built Xen-based cluster, which processed millions of transactions daily at both the wholesale and retail level. My role included 24x7 monitoring of all production services.
  • Built out a new DNS cluster for both internal and customer-facing use
  • Mass-upgraded Xen-based hosts to newer kernels and Debian versions
  • Prototyped continuous delivery with Jenkins, Ansible and Docker
  • Planned and participated in regular customer-facing code rollouts (with zero downtime)
  • Participated in a 24x7 on-call rotation

Team Lead / Senior Systems Administrator

easyDNS Technologies Inc
10.2006 - 07.2014
  • At easyDNS I was responsible for most, if not all, aspects of technology in both operations and development. I was also responsible for managing the operations team and our global presence across 20 data centres.
  • Deployed a global anycast DNS network, which included automated updates
  • Implemented company DDoS mitigation strategy
  • Prototyped and built additional revenue-generating services (e.g. hosted email)
  • Handled 24x7 incident response
  • Deployed NetApp hardware to consolidate infrastructure and add fault tolerance
  • Built corporate IP network
  • Consolidated data centres and hardware for cost reduction
  • Managed relations with all technical vendors

Team Lead: Systems Administration

MCI Canada
09.2005 - 09.2006
  • My role at MCI was to manage and maintain the shared and dedicated hosting environments. I was also responsible for dealing with customers and ensuring systems were deployed to their required specs.
  • Deployed redundant monitoring system
  • Worked with customers to ensure systems were highly available and scalable
  • Maintained a 100% uptime status
  • Migrated both shared and dedicated customers to newer hardware platforms

Support Analyst / Systems Administrator

Tucows.com Inc
07.1999 - 09.2005
  • During my first tenure at Tucows I moved from a position in OpenSRS technical support to Systems Administration. In my support role I was the primary 24x7 contact for our top resellers and customers. When I joined the operations team I was the junior administrator responsible for managing the 200 nodes of the Tucows network.
  • Migrated data centres as part of a team of admins
  • Assisted with the overhaul of our IP network
  • Upgraded hardware chassis and OS
  • Implemented centralized authentication as well as centralized monitoring and paging

Skills

  • Organizational strategy development
  • Team leadership
  • Financial budget oversight
  • Strong interpersonal skills
  • Strategic decision-making
