
SURAJ KUMAR

Data Engineer | Solution Architect | Data Warehouse Expert | Crafting Scalable & Efficient Data Solutions For Optimal Business Insights
Brampton, ON

Summary

Dedicated and results-driven Senior Data Engineer / Solution Architect with 14 years of experience in designing, developing, and maintaining data pipelines and systems. Proficient in data warehousing, ETL processes, and data modelling. Adept at using a range of technologies to optimize data storage, retrieval, and analysis. Committed to delivering high-quality data solutions that drive business insights and decision-making. Fortunate to have worked across diverse geographical locations and cultures, gaining valuable insight into global business practices and collaboration dynamics.

Overview

14 years of professional experience
4 years of post-secondary education
1 language

Work History

Senior Data Engineer/Solution Architect

Citibank
07.2021 - Current
  • Led the end-to-end migration (design, integration, scripting, security, and maintainability) of a CRM-based data warehouse from the Microsoft suite to Talend for Citi capital markets institutional clients
  • Migrated databases from SQL Server to private cloud (DBaaS) MSaas
  • Built a Kafka consumer and integrated it with Talend to execute batches intelligently
  • Designed and built a Batch Manager framework using shell scripting to deploy, schedule, run, and log batches seamlessly
  • Designed and modelled data for new analytical systems to enhance the warehouse
  • Set up Hive Kerberos connectivity from scratch with one of the big data source systems.

Data Engineer

Scotiabank
04.2018 - 07.2021
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design
  • Led a retail banking risk management application with regulatory data reporting
  • Built an automated validation framework using generic jobs in Talend
  • Designed common data ingestion jobs in Talend, replacing a proprietary vendor framework
  • Built data replicators and other programs in Python to support data movement across the EDL
  • Completed an ingestion POC (to replace the existing ingestion) using Apache Spark for all international channels
  • Worked with structured and unstructured files, e.g. CSV, positional, JSON, XML, Parquet, ORC, and Avro formats.


Lead ETL Consultant

Citibank
10.2015 - 03.2018
  • Led and managed MDM for Citi capital markets / trading data for institutional clients, covering applications such as Volumes and Balance Sheet
  • Built an end-to-end framework to onboard data from a source cloud environment (MongoDB) using REST API calls and load it into the warehouse
  • Developed a Java program that parses the incoming complex JSON feed and flattens/normalizes it into a delimited output feed, which generic Talend jobs then load into the data warehouse
  • Designed and developed an end-to-end generic framework that was also adopted by other teams
  • Created a generic PL/SQL program that takes a table name and schema as input and creates and manages an audit/history table for that table
  • Used the SCD Type 2 concept to build the audit/history tables
  • Completed POCs on integrating Talend with big data technologies (Hadoop, Apache Spark), including extracting files from HDFS, processing the data, and pushing files from the local environment to HDFS and the Hive database.

Software Engineer

Kronos (now UKG)
09.2013 - 10.2015
  • Schedule Effectiveness - Provided schedule analytics reports to workforce managers so that management could take the correct action on employee shift scheduling
  • Attendance - Captured attendance-related events and patterns from the source system and computed the balance points attached to such exceptions, based on the policies and the employee's existing discipline level
  • Carried out cross-technology development to extend the product; one such module was ETL job estimation, which predicted the run time of upcoming jobs from past runs and generated HTML reports
  • Monitored and supported Talend job scheduling
  • Ad hoc Query Tool - A front-end application that accepts user-supplied queries and returns the records as HTML reports, which can be scheduled, managed in groups, and emailed to the required recipients
  • Used Java and Talend to build the application and integrated it with the UI, which was built in .NET.

Software Engineer

Mahindra Satyam
03.2010 - 09.2013
  • Project: Credit Reconciliation and Chargeback Management (CRCM). CRCM is a front-end application built to capture bookings made by credit card and chargebacks from various banks
  • The application was designed to be highly configurable and is completely automated, built in DataStage 8.5
  • It requires no manual intervention, lowering BAU costs for the client
  • Being configurable, it saves costs on enhancements
  • Project: PDW, FDW, and Teradata Upgrade. Payroll and Finance data marts were built to process pay-period payroll and financial GL, AP, and AR data from the operational Oracle ERP
  • Worked with the Teradata team on the Teradata upgrade across Qantas from version 12 to 13.

Education

Bachelor of Science - Electronics And Communications

RTM Nagpur University
India
07.2005 - 07.2009

Skills

Talend

Data warehousing

ETL Design and Development

Data Integration

Data Quality

Big Data

Spark

Kafka

SSIS

DataStage

Informatica

SQL, NoSQL

Technical Skills

Programming Languages: Java, Python, SQL, Unix shell scripting

Big Data Technologies: Hadoop, Spark, Kafka

Database Systems: Hive, SQL Server, Oracle, Teradata, PostgreSQL

ETL Tools: Talend, SSIS, DataStage, Informatica

Data Visualization: MicroStrategy, Matplotlib

Version Control Systems | CI/CD: Git, SVN

Orchestration: TAC, AutoSys, Cron, Control-M, Tidal

Others: System design, data modelling, data warehouse design
