
Subramanian Venkataraman

ON

Summary

Big Data professional with a strong focus on data architecture, data analytics, and ETL processes. Skilled in Hadoop, Spark, Scala, Kafka, Python, and NoSQL databases, with significant experience in designing and implementing scalable data solutions. Known for effective team collaboration, adaptability to changing project needs, and a results-driven approach that consistently drives project success.

Proactive and goal-oriented professional with excellent time management and problem-solving skills. Known for reliability and adaptability, with swift capacity to learn and apply new skills. Committed to leveraging these qualities to drive team success and contribute to organizational growth.

Overview

30 years of professional experience

Work History

Sr. Big Data Engineer

Citi Bank
10.2019 - 08.2024
  • Developed an ETL (Extract, Transform, Load) framework in Spark, Scala, Java, Kafka, Oracle, and Hive as a complete transformation solution for the Mortgage product, building reusable methods for reading data from and writing data to Oracle, reading and writing AVRO files, parsing input ZIP files, and loading data into the Hive staging and Hive target environments
  • Implemented a comprehensive data reconciliation framework that reads Kafka messages in JSON format, validates and parses them, and responds with COMPLETED or FAILED messages alongside the reconciliation outcomes, executing Spark services via the spark-submit command through a Livy URL (see the first sketch following this list)
  • Enhanced a multi-threading model built with Scala/Akka to trigger the spark-submit command with a set of input parameters and track the status of each execution; once an execution completed or failed, read the corresponding status and returned the results to the users
  • Developed Kafka producer and consumer modules to manage upstream message intake for the reconciliation framework and communicate execution results back to the original sender
  • Created a generic framework utility in Scala to read ZIP files within a regression framework, automating repetitive tasks, reducing manual intervention, and improving efficiency
  • Achieved significant effort savings by implementing the framework in the user acceptance testing (UAT) environment to run GLRS and FLME jobs, which involve extensive data validation of 18 and 50 products, respectively
  • GLRS jobs (about 30 minutes) and FLME jobs (about 3 hours) deal with substantial data volumes of up to 10 million and 40 million records; automating them via Kafka messaging significantly simplified these complex validation processes
  • Designed an ETL flow in Scala leveraging reusable methods to process mortgage, credit card, and personal loan data, enhancing the flexibility and scalability of data handling
  • Enhanced the existing reconciliation framework using Scala to read inputs from a JSON file, including schema names, table names, partition columns, partition values, and comparison requirements, to refine data comparison processes
  • Optimized performance for Scala applications in the UAT environment by tuning memory and resource utilization to manage heavy loads, resolving bottlenecks, reallocating memory parameters, and adjusting executors
  • Improved the reconciliation framework to generate and email comparison results in Excel files using Scala, facilitating easier interpretation and distribution of data insights
  • Developed a project integrating Scala, Oracle, and Hive to create dynamic SQL queries for data reconciliation between balance columns, domain columns, and date columns, storing the queries in Hive and Oracle tables for streamlined data comparison
  • Linked the dynamic queries to Arcadia BI Tool to execute based on filter conditions and display comparison results in the user interface, enhancing the utility of business intelligence tools in data analysis
  • Created a tool in Scala to convert AVRO-format data into Parquet format based on partition keys, leveraging the advantages of Parquet for efficient query execution (see the second sketch following this list)
  • Developed a data transfer tool in Python to move data between production, UAT, and SIT environments by reading input partition values from a CSV file, constructing 'distcp' commands, and generating log files to validate data transfer tasks, improving productivity and simplifying business deliveries
  • Developed a 'FileWatcher' tool in Python to monitor file drops in Unix folders, read, parse, and validate files, and trigger spark-submit programs, automating data workflow processes
  • Developed various Python utilities for running SQL queries in Hive, copying and transferring data, generating Spark logs, and sending files as email attachments
  • Created PySpark code for a logging module to log user-friendly messages, error messages, and validation messages, and enhanced proof-of-concept PySpark code for ETL requirements to process DAT files and load data into Hive tables
  • Coordinated with business groups on requirement analysis, design, planning, and delivery; provided demonstrations to Business Analysts; handled task allocation and JIRA status tracking; and sent daily/weekly status reports to clients
  • Environment: Spark, Scala, Java, Kafka, Hive, Impala, Cloudera, Hadoop, HDFS, Oracle SQL Server, Arcadia BI tool, Unix, shell scripting
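
A rough illustration of the reconciliation flow described above (Kafka JSON request in, Spark batch submitted through a Livy URL, COMPLETED/FAILED response out). This is a minimal sketch, not the production framework: the broker address, topic names, and Livy endpoint are all assumed, and the real framework tracked batch execution status rather than just Livy's acceptance of the job.

```scala
import java.net.{HttpURLConnection, URL}
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.jdk.CollectionConverters._

object ReconResponder {
  // Submit a Spark batch via Livy's REST API; Livy answers 201 Created when
  // it accepts the batch. Endpoint and payload shape are assumptions.
  def submitViaLivy(json: String): Boolean = {
    val conn = new URL("http://livy-host:8998/batches")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(json.getBytes("UTF-8"))
    conn.getResponseCode == 201
  }

  def main(args: Array[String]): Unit = {
    val consProps = new Properties()
    consProps.put("bootstrap.servers", "broker:9092")
    consProps.put("group.id", "recon")
    consProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](consProps)
    consumer.subscribe(java.util.Collections.singletonList("recon-requests")) // assumed topic

    val prodProps = new Properties()
    prodProps.put("bootstrap.servers", "broker:9092")
    prodProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    prodProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](prodProps)

    while (true) {
      // Each consumed record carries the reconciliation request as JSON.
      for (rec <- consumer.poll(Duration.ofSeconds(1)).asScala) {
        val status = if (submitViaLivy(rec.value())) "COMPLETED" else "FAILED"
        producer.send(new ProducerRecord("recon-results", rec.key(), status))
      }
    }
  }
}
```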
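
A minimal sketch of the AVRO-to-Parquet conversion tool mentioned above, assuming Spark with the spark-avro module on the classpath; the input/output paths and the partition column are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("AvroToParquet").getOrCreate()
    // Read Avro input and rewrite it as Parquet, partitioned on a key column.
    spark.read.format("avro").load("/data/in/avro")
      .write.partitionBy("business_date") // assumed partition key
      .parquet("/data/out/parquet")
    spark.stop()
  }
}
```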

Sr. Machine Learning Engineer

Capital One
01.2019 - 07.2019
  • Role: Lead
  • Project: Linear Regression Model for Auto Loan Default Prediction
  • Project Description: Developed scripts using Python, AWS Lambda, AWS S3, Scala, the Quantum Framework, ScalaTest, and Jenkins to insert customer records into the customer agreement table as part of the approval workflow; the AWS Lambda function performs the workflow process
  • Worked on a linear regression model that predicts vehicle loan outstanding, severity, and probability of default, using Python, AWS S3, Redshift, and AWS EC2 to optimize execution speed, reduce memory consumption, and improve the existing algorithm
  • Responsibilities:
  • Using Python, validated the presence of customers' car loan records in the source table, verified the approval status in the S3 bucket, and checked the number of records in the data file and in customer PDF files present in the S3 bucket
  • Enhanced Python code to report error messages as part of the loan rejection workflow
  • Developed test scripts in Scala to validate the presence of the Parquet file and the '_SUCCESS' file created by the model as part of the daily execution process (see the sketch following this list)
  • Developed a Jenkins pipeline for onboarding the application to production, with build preparation and validations as part of the deployment process
  • Performed analysis and source-data traceability and lineage for the Front Book Recovery model, which forecasts the schedule of payments, auction payments, and deficiency amounts
  • Developed flow charts to explain the logical flow of the process for front book recovery
  • Developed a lineage report explaining the flow of data through the input, intermediate, and output variables that get created, listing the variables needed for the final scoring equations
  • Executed the application and generated reports for memory usage and execution time for each phase of the process
  • Optimized memory usage by 50% by localizing the data read and processing data within functions in Python
  • Optimized execution speed for scoring module by 50% in Python
  • Introduced multi-threading in Python for reading CSV files, which reduced file-read time to a few seconds
  • Introduced data-table assignment statements to increase execution speed.
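
A minimal sketch, in the spirit of the Scala validations described above: confirm that the model's daily output directory contains a _SUCCESS marker and at least one Parquet part file, using the Hadoop FileSystem API. The output path is an assumption.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object OutputValidator {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val outDir = new Path("/model/output/daily") // hypothetical output path
    // The _SUCCESS marker is written by Spark/MapReduce on successful job completion.
    val hasSuccess = fs.exists(new Path(outDir, "_SUCCESS"))
    val hasParquet = fs.listStatus(outDir).exists(_.getPath.getName.endsWith(".parquet"))
    require(hasSuccess && hasParquet, s"Validation failed for $outDir")
    println(s"Validation passed for $outDir")
  }
}
```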

Scala Developer

Walmart
08.2018 - 01.2019
  • Worked on the Bookkeeping project for maintenance and enhancement of Kafka pipelines, where data originates from Point of Sale (POS) systems and ends in the General Ledger
  • Developed Scala/Akka microservices using Futures to send a success or error message after publishing the data through the producer and writing it into Cassandra, based on the outcome
  • Developed Scala/Akka microservices to retry the DB2 connection in case of failures, with the number of retry attempts configured via Typesafe Config (see the sketch following this list)
  • Participated in Scrum meetings for user and technical story creation and retrospective analysis
  • Environment: Scala, Play, microservices, Kafka, Cassandra, DB2.
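
A minimal sketch of the Future-based retry pattern described above, with the attempt limit read from Typesafe Config; the config key and the shape of the DB call are assumptions, not the production service.

```scala
import com.typesafe.config.ConfigFactory
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

object DbRetry {
  // Maximum attempts come from application.conf; the key name is hypothetical.
  private val maxAttempts = ConfigFactory.load().getInt("db2.max-retry-attempts")

  // Re-run a failing asynchronous operation until it succeeds or the
  // configured attempt budget is exhausted.
  def withRetry[T](attempt: Int = 1)(op: () => Future[T]): Future[T] =
    op().recoverWith {
      case _ if attempt < maxAttempts => withRetry(attempt + 1)(op)
    }

  // Usage: DbRetry.withRetry()(() => queryDb2(sql))
}
```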

Data Engineer

Hayward Industries
01.2018 - 06.2018
  • Worked for a swimming pool installation and maintenance company with approximately 10,000 clients in the United States; developed analytical reports to gain insight into the customer base across the US, study the demographics of clients' device installations, and find opportunities for business development
  • Developed dashboards displaying customers' device installations, various device types, and the pH status of swimming pool water using R, R Shiny dashboards, Kafka, Scala, and Cassandra
  • Developed code in Kafka and Scala to read and parse the input data files and load them into multiple Cassandra tables (see the sketch following this list)
  • Developed R libraries for geo-location-based analytical reports displaying customers' device installations and various device types
  • Analyzed and developed R code for time-series graphs and scatter plots of pH-level data for swimming pool water, along with deviations from recommended levels; these graphs support analysis at multiple state and city levels
  • Developed an analytical product for Warranty Management using Power BI, SQL Server, Azure, and Power BI Gateway
  • Generated analytical reports in Power BI for warranty costs incurred across parameters such as product types, product names, part numbers, warranty timelines, supplier locations, regions, states, and customer locations
  • Developed Python modules using NLTK (Natural Language Toolkit) part-of-speech (POS) tagging to capture failure types from problem descriptions, categorizing problem types and device types for root-cause analysis
  • Analyzed, designed, and developed R programs to decode warranty serial numbers and find the manufacturing date, helping clients determine after how many days, months, or years a product failed at the customer location
  • Performed requirement analysis to study the installation of devices such as heaters, pumps, and filters, water pH-level requirements, and log messages
  • Installed the SQL Server database and Power BI, and moved customer data provided in CSV format into the SQL Server database
  • Performed integration testing between the database and Power BI reports, verifying that reported values for the various product types, warranty periods, product names, customer names, and sub-contractor names match the database
  • Published PC- and mobile-compatible versions for production releases
  • Environment: R-Shiny Dashboard, R-Studio, R, Python, NLTK, Kafka, Scala, Cassandra, SQL Server, Microsoft Power BI
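
A minimal sketch of the file-to-Cassandra loading described above, assuming the DataStax Java driver 3.x; the keyspace, table, column names, and input file are all hypothetical, and the real pipeline also published through Kafka.

```scala
import com.datastax.driver.core.Cluster
import scala.io.Source

object DeviceLoader {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("pool_analytics") // assumed keyspace
    val insert = session.prepare(
      "INSERT INTO device_installations (customer_id, device_type, ph) VALUES (?, ?, ?)")

    // Parse each CSV row (customer_id,device_type,ph) and write it to Cassandra.
    for (line <- Source.fromFile("devices.csv").getLines().drop(1)) {
      val Array(id, dtype, ph) = line.split(",").map(_.trim)
      session.execute(insert.bind(id, dtype, java.lang.Double.valueOf(ph)))
    }
    cluster.close()
  }
}
```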

MS Analytics Student

Harrisburg University of Science and Technology
03.2016 - 10.2017
  • Project: Presidential Job Approval Rating Analysis through Social Media
  • Description: Presidential approval ratings help measure chances of re-election, predict the midterm performance of the party in power, and generally take stock of the public's approval of the administration's agenda and performance
  • The objective of the project was to capture public sentiment from Twitter and compare it against the results of major approval-rating organizations such as Gallup, Rasmussen Reports, Fox News, NBC News, and Investor's Business Daily (IBD/TIPP) to see whether there is any correlation between Twitter sentiment and scientific polling
  • Developed code in Spark/Scala to download tweets from Twitter and save them into the Cassandra DB
  • Developed training and test sets for performing sentiment analysis with a Naïve Bayes classifier
  • Developed code in Python implementing a Naïve Bayes classifier for sentiment analysis on tweets (a Spark ML rendering is sketched following this list)
  • Data from polling companies were downloaded manually, and correlation graphs were generated
  • Developed R code using an R Shiny dashboard to display the analytical graphs
  • The Twitter results were normalized between 1% and 100% to match the approval-rating limits
  • Environment: Spark/Scala, Python, Sentiment Analysis, Machine Learning, NLP, Naïve Bayes Classifier, PyCharm, TensorFlow/Keras, Scikit-Learn, PyTorch, DNNs, GANs, GNNs, Time Series
  • Project: Traffic Monitoring Analytix
  • Description: This project studies the impact and root causes of traffic congestion in 5 chosen US cities. It examines the reasons for high and low traffic in the selected cities by reviewing population, population density, average household income, and commute time to work, framing a set of theories and trying to prove or disprove them based on actual results
  • Downloaded tweets for the 5 chosen cities from Twitter to capture public sentiment
  • Developed R code for Twitter sentiment analysis and an R Shiny dashboard, loading the Twitter sentiment results into MongoDB
  • Generated correlation graphs, location-based graphs using R-Leaflet, and Twitter sentiment-analysis reports
  • Performed sentiment analysis on Twitter data about traffic and grouped tweets into various emotional categories
  • MongoDB was installed on mLab, which runs on an Amazon Web Services (AWS) S3 instance; created the DB and its relations
  • Developed an analytical application using R Shiny and connected it to the S3 instance, reading and displaying analytics from the ShinyApps.io server
  • Environment: R-Programming, R-Shiny Dashboard, Mongo DB, Sentiment Analysis, AWS
  • Project: Twitter Analytix
  • Description: This project performs real-time sentiment analysis for the US presidential nominees by reading tweets from Twitter in real time. It reads Twitter using Spark Streaming and Scala and performs analysis in the R language to generate the sentiment-analytics report. It uses Cassandra, MongoDB, and AWS for data maintenance, and the reports are displayed using an R Shiny dashboard
  • Developed a real-time sentiment-analysis tool displaying instant, continuous line graphs of the public's positive and negative sentiment for each second, based on data captured from US locations, using Spark/Scala, Cassandra, R, and MongoDB installed on mLab running under AWS
  • Installed MongoDB on mLab, running under the hood of AWS, and created the DB and relations
  • Developed R code to read the Twitter data from Cassandra, performed sentiment analysis using R libraries, aggregated the data per second, and loaded it into MongoDB
  • Developed an analytical application using R Shiny and connected it to the S3 instance, reading and displaying analytics from the ShinyApps.io server
  • Generated continuous instant line graphs of positive and negative sentiment counts, aggregated per second for the presidential candidates, from data downloaded by Spark Streaming into Cassandra, subsequently processed by R for sentiment analysis and uploaded into MongoDB in the cloud
  • Environment: Spark Streaming, Scala, R Programming, R Shiny Dashboard, Cassandra, MongoDB, Sentiment Analysis, Machine Learning, AWS, Maven, Eclipse
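
The Naïve Bayes sentiment classification above was done in Python; for consistency with the other sketches, here is a minimal Spark ML rendering in Scala. The two labelled examples and the column names are purely illustrative, and local mode is assumed.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object TweetSentiment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("TweetSentiment").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tiny hypothetical training set: 1.0 = positive, 0.0 = negative.
    val training = Seq(
      ("great job on the economy", 1.0),
      ("terrible policy decision", 0.0)
    ).toDF("text", "label")

    // Tokenize tweets, hash tokens into term-frequency vectors, then fit Naïve Bayes.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val tf = new HashingTF().setInputCol("words").setOutputCol("features")
    val model = new Pipeline()
      .setStages(Array(tokenizer, tf, new NaiveBayes()))
      .fit(training)

    model.transform(Seq("approve of the agenda").toDF("text"))
      .select("text", "prediction").show()
    spark.stop()
  }
}
```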

Data Engineer

Barclays
11.2016 - 07.2017
  • Worked in the CCAR group on the 9-quarter (9Q) projection for Capital Funding
  • The client required projections of capital funds for 9 quarters in series, in order to identify and mitigate risks in the capital funds being maintained
  • Responsibilities:
  • Worked on the report generation process in the Hive and Oracle databases based on various account codes
  • Developed Avro schema files to define the data structures
  • Created tables in Hive using Avro files and partition strategy
  • Developed ETL-Java code to load data into Hive tables
  • Developed code in Spark/Scala to load data from CSV files into Hive tables, ensuring the data is loaded as per the source files
  • Developed a Data Compare Tool in Scala using hash maps to compare records across sources such as CSV, Excel, and Hive tables (see the sketch following this list)
  • Developed tools in Scala to query the Oracle database and retrieve the results into CSV/Excel to analyze the impact on Moniker SQL, designed based on the business flow, as part of upcoming changes in data files
  • Developed tools in Scala to verify the reports generated from fixed-form reports against the expected target reports for each cell of the matrix, as it is complex to verify the output manually
  • Developed tools in R to generate SQL at runtime as part of functional reporting requirements, to query data from Hive/Oracle tables
  • Developed UDFs (User-Defined Functions) in Hive to query the Oracle DB from Hive and integrate data from Oracle based on users' requirements
  • Environment: Cloudera, Hadoop, Unix, Spark, Scala, Hive, Oracle, R, Python, Java, Eclipse, IntelliJ IDEA, Agile, Maven, SBT
  • Office Data Administration (ODA) Technical Environment: ODIN RDBMS (SQL, DB design), Sun Solaris Unix, Bashrc, shell scripting, UNIX utility commands
  • Responsibilities:
  • Participated in the analysis of feature specifications, designed, coded, tested and fixed bugs in the application as per the customer's requirements
  • Coordinated with clients in providing designs, suggestions, status reporting, product release and Configuration Management.
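
A minimal sketch of the hash-map record comparison described above, assuming both sources have been exported to CSV with the record key in the first column; file names and layout are illustrative, not the actual tool.

```scala
import scala.io.Source

object DataCompare {
  // Load a CSV (header skipped) into a map of key -> remaining columns.
  def toMap(path: String): Map[String, String] =
    Source.fromFile(path).getLines().drop(1)
      .map { line => val cols = line.split(",", 2); cols(0) -> cols(1) }
      .toMap

  def main(args: Array[String]): Unit = {
    val source = toMap("source_extract.csv") // e.g. CSV/Excel export
    val target = toMap("hive_extract.csv")   // e.g. Hive table export

    val missingInTarget = source.keySet -- target.keySet
    val missingInSource = target.keySet -- source.keySet
    val mismatched = (source.keySet & target.keySet).filter(k => source(k) != target(k))

    println(s"Missing in target: ${missingInTarget.size}")
    println(s"Missing in source: ${missingInSource.size}")
    println(s"Value mismatches: ${mismatched.size}")
  }
}
```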

Sr. Consultant

Navy Mutual
08.2015 - 12.2015
  • Technical Environment: Eclipse, Selenium, Jenkins, Sauce Labs, Java, Web services, TestNG, Maven, SVN

Sr. Consultant

Starwood Hotels
04.2015 - 08.2015
  • Technical Environment: Eclipse, Selenium, Java, BDD, Cucumber, Maven for iPhone Devices, Android Devices and Desktop Web Applications


Expert Systems Analyst, Test Automation Architect

Allscripts Healthcare LLC
12.2012 - 05.2014
  • Technical Environment: HP-QC using Excel Macros and Oracle SQL, HP-QTP 11, VB, VBA - Excel Macro, Hybrid Mode Test Automation Framework Design, Development and Implementation

Sr. Test Analyst

Auto Insurance Domain
10.2011 - 10.2012
  • Environment: ALM 11.0, QTP-11.0, BPT, Hybrid Model Test Automation Framework, Java

Senior Consultant

Capco
09.2010 - 09.2011
  • Technical Environment: Agile - Scrum, Quality Center 10.0, Frontier Applications (Recon, Recollector, and Admin)

Test Automation Engineer

Geeksoft LLC
07.2010 - 08.2010
  • Reviewed and provided solutions for the existing test automation framework used for testing a trading application
  • Prepared requirement studies, test plans, and test cases, executed tests, and reported defects for trading application testing and release.

Test Automation Consultant

Credit Suisse
06.2009 - 07.2010
  • Technical Environment: QTP 9.5, Descriptive Programming, QC 9.0, BO XI (Business Objects, Desktop Intelligence), SQL Query Analyzer, Embarcadero Rapid SQL 7.5.5, Test Automation, Framework Design and Development

Test Automation Engineer

JetBlue Airways
09.2008 - 05.2009
  • Technical Environment: .NET, ASP, QTP 9.5, QC 9.0, BizTalk Server, SQL Server 2000, SQL Query Analyzer, Test Automation Framework maintenance and enhancement

Test Engineer

Wachovia
05.2008 - 09.2008
  • Technical Environment: QTP 9.2, QC 9.0, Agile - Scrum, SQL Server 2000, SQL Query Analyzer, Windows XP, Java, J2EE, .NET, Sun Solaris

Assistant Vice President and Onshore Lead

Merrill Lynch India and US
07.2000 - 02.2008

Worked in India and USA for Merrill Lynch through various organizations.

Environment: .NET, ASP, Seagate Crystal Reports, Oracle Toad, SQL Server 2000, SQL Query Analyzer, QTP 9.1, QC 9.0, Rational Functional Tester, Test Manager, SQA Basic 7.3, Windows XP, SQL Server, Test Partner, JIRA administration, DB testing, Unix, shell scripting, Bourne shell, Unix utility commands


Senior Software Professional

India Comnet International (P) Ltd
06.1999 - 06.2000
  • AT&T, USA: Call Detail Data System. Technical Environment: Informix, SQL, ESQL-C, C, Sun Solaris


Member Technical Staff

India Comnet International (P) Ltd
04.1997 - 12.1998

Worked for Lucent Technologies for product development and release through various in-house development tools.

Systems Analyst

Railway Products (India) Ltd
06.1994 - 07.1996
  • Technical Environment: Unify RDBMS (SQL, RPT, and DB design), Sybase 10 (SQL, DB design), Unix, shell scripting, Bourne shell, C, Unix utility commands, Assembly language, MS-DOS
  • Responsibilities:
  • Managed development and maintenance of daily, weekly, monthly, and audit reports using SQL, Unify C, and report-processor programs for the Material Information System, Payroll Information System, Sales Order Processing, and Financial Information System
  • Developed the database design in Sybase 10 with tables, referential integrity, triggers, and stored procedures for migrating the Material Information System from the Unify database to Sybase.

Education

Master of Science - Analytics

Harrisburg University of Science And Technology
Harrisburg, PA, USA
10.2017

Master of Science - Computer Applications

Annamalai University
India
06.1992

Bachelor of Science - Physics

Barathidhasan University
India
06.1989

Skills

    Data Engineering

    Scala Programming

    Python Programming

    Java Programming

    Spark Development

    Apache Kafka

    Hadoop Ecosystem

    Cloudera Distribution

    ETL Development

    Real-time Processing

    NoSQL Databases

    RESTful APIs

    Data Migration

    Machine Learning

    Big Data Analytics

    RDBMS

Languages

English
Full Professional
Tamil
Native or Bilingual
