Software Engineer
Mukarram Ali


Dartmouth, NS

Summary

  • 8+ years of IT experience spanning analysis, design, coding, testing, implementation, and training in Big Data technologies, working with Apache Hadoop ecosystem components.
  • Extensive experience with major Hadoop ecosystem components, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, YARN, Spark, and Scala.
  • Expertise in data transformation and analysis using Spark and Hive.
  • Experience with ETL and big data query tools such as Pig Latin and HiveQL.
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, and Pig.
  • Developed Spark SQL programs to handle different data sets for better performance.
  • Experience presenting analytical findings, complex data analytics, and recommendations to clients.
  • Proficient in statistical modeling, with hands-on experience implementing machine learning techniques (linear and logistic regression, decision trees, random forests, SVM, k-nearest neighbors) for forecasting/predictive analytics, regression-based models, and hypothesis testing.
  • Proficient in machine learning algorithms such as linear regression, Ridge, Lasso, Elastic Net regression, decision trees, and random forests.
  • Strong record of model validation and tuning, including model selection, k-fold cross-validation, and hold-out schemes.
  • Knowledge of and experience with Git/GitHub version control tools.
  • Able to work independently or as part of a team to deliver assigned work on time.
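As an illustration of the k-fold cross-validation workflow mentioned above, here is a minimal pure-Python sketch (index splitting only; the model fitting step is omitted, and the function name is my own):

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k (train, validation) folds."""
    indices = list(range(n_samples))
    # First (n_samples % k) folds get one extra sample so every index is used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds

# Example: 10 samples, 3 folds -> validation sizes 4, 3, 3
for train, val in k_fold_indices(10, 3):
    assert len(train) + len(val) == 10
```

In practice a library routine (e.g. scikit-learn's `KFold`) would be used; the sketch only shows the splitting scheme.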

Overview

8 years of professional experience
1 certification

Work History

Capstone Project

Cape Breton University
Sydney, Nova Scotia
09.2022 - 01.2023
  • Responsible for analysing data to identify patterns and inconsistencies.
  • Supported the team in developing plans and strategies to build various regression and clustering models.
  • Applied machine learning tools and algorithms such as linear regression and logistic regression.
  • Performed EDA using Python, Pandas, and NumPy for data cleansing, data extraction, and feature selection.
  • Tracked project and team member performance closely to intervene quickly on mistakes or delays.
  • Maintained open communication by presenting regular project status updates to customers.

Senior Big Data Engineer

HTC Global Services
Hyderabad, India
12.2021 - 05.2022
  • Responsible for writing Spark code in Python to read Excel files with multiple sheets using Databricks.
  • Responsible for writing Python scripts to extract data from various data sources.
  • Converted the pivoted sheets of each file to extract specific data from the columns using Python.
  • Created new columns from existing columns as per the team's requirements.
  • Developed a final consolidated file by applying various PySpark transformations and excluded records from the final sheet based on the exclusion data.
  • Developed and maintained end-to-end ETL data pipelines and worked with large datasets in Azure Data Factory (ADF) using Python.
  • Implemented various ADF components, including Pipelines, Triggers, Data Flows, Datasets, Linked Services, and Control Flows.
  • Performed data validation, debugging, error handling, transformations, and data cleanup within large datasets using ADF.
  • Developed Python code in notebooks to create pipelines for data processing.
  • Used Python to extract data from data lakes for processing and transformation tasks.
  • Automated file generation to produce consolidated data reports weekly, bi-weekly, and monthly using Python.
  • Ensured data quality, reliability, and fault tolerance in streaming pipelines, implementing mechanisms for data validation, error handling, and fault recovery to keep processed data accurate and consistent.
  • Assisted with day-to-day operations, working efficiently and productively with all team members.
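The consolidate-and-exclude step described above can be sketched in plain Python (record keys and the exclusion structure are hypothetical; the production pipeline ran as PySpark on Databricks):

```python
def consolidate(sheets, exclusions):
    """Merge rows from multiple source sheets, dropping excluded keys.

    sheets: list of lists of dicts (one inner list per source sheet).
    exclusions: set of 'id' values to drop from the final output.
    """
    merged = [row for sheet in sheets for row in sheet]
    return [row for row in merged if row["id"] not in exclusions]

sheet_a = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
sheet_b = [{"id": 3, "qty": 7}]
final = consolidate([sheet_a, sheet_b], exclusions={2})
# final keeps ids 1 and 3; id 2 is dropped by the exclusion data
```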

Senior Big Data Engineer

Tech Mahindra
Hyderabad, India
03.2021 - 09.2021
  • Responsible for collecting, cleaning, and storing data for analysis with the Data Ingestion Framework (DIA) using PySpark.
  • Ingested large amounts of data from different data sources into HDFS using PySpark.
  • Performed data cleansing in PySpark by applying Spark transformations and actions written in Python.
  • Extracted voice biometric and signature data of MS customers from different sources and performed analysis using Spark APIs.
  • Wrote Spark Streaming applications to process and analyze real-time streaming data, applying transformations, aggregations, and complex analytics.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Integrated Spark Streaming with other data sources and systems, such as databases, messaging systems, and external APIs, including designing and implementing connectors and adapters for data ingestion and output.
  • Created Azure Blob Storage to store raw data sent from different sources.
  • Deployed Data Factory to create data pipelines orchestrating data in SQL Database.
  • Wrote ad hoc queries using Spark SQL and SQL to extract customer PII details from various data sources.
  • Worked with the BI team to provide curated and aggregated data for dashboard development.
  • Used JIRA for project tracking and participated in daily scrum meetings.
  • Applied effective time management techniques to complete all tasks within tight deadlines.
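An ad hoc extraction query of the kind described above can be sketched with SQLite standing in for Spark SQL (the table and column names are hypothetical stand-ins, not the actual customer schema):

```python
import sqlite3

# In-memory database playing the role of the source data store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "A", "east"), (2, "B", "west"), (3, "C", "east")])

# Ad hoc query: pull selected customer details for one region
rows = conn.execute(
    "SELECT id, name FROM customers WHERE region = ?", ("east",)
).fetchall()
# rows -> [(1, 'A'), (3, 'C')]
```

The same SELECT would run unchanged as `spark.sql(...)` against a registered Spark table.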

Big Data Engineer

Ford Motor Private Limited / One Magnify
Chennai, India
08.2020 - 01.2021
  • Responsible for independently creating Alteryx workflows by pulling data from various sources such as SQL, Teradata, and Hadoop.
  • Developed PySpark code to pull data from HDFS for processing in Alteryx.
  • Worked with various Alteryx Data Preparation and In-Database tools to cleanse data within the database and filter it based on requirements.
  • Used Alteryx to predict parts consumption by dealers and customers, and ranked each part's consumption based on customer inputs.
  • Designed and developed analytical data structures.
  • Coordinated with the manager to explain the flow of data between the various systems within the workflow.
  • Validated data collected from different sources and transformed it based on requirements.
  • Calculated the overall accuracy of the model, the returned cost of the product, and the Dealer Net for parts consumed and returned.
  • Wrote SQL queries to measure revenue generated by the organization's various campaign activities.
  • Developed multiple Spark jobs in PySpark for data cleaning and pre-processing.
  • Created workflows to generate weekly reports for new dealers added to the dealers list.
  • Created and delivered the workflow that calculated the Dealer Net for collision parts on a weekly, monthly, and quarterly basis.
  • Automated the workflow to generate weekly reports for the various dealers in the dealers data.
  • Responsible for maintaining, updating, and delivering recurring reports to the revenue team, used to provide incentives to dealers based on their weekly sales data.
  • Maintained good relations with the various teams for a smooth flow of the reports and information they needed to calculate dealer incentives.
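The part-consumption ranking described above amounts to sorting parts by units consumed and assigning ranks; a plain-Python sketch (part names and counts are made up, and the production logic ran inside Alteryx):

```python
def rank_parts(consumption):
    """Rank parts by units consumed, highest first (rank 1 = most consumed)."""
    ordered = sorted(consumption.items(), key=lambda kv: kv[1], reverse=True)
    return {part: rank for rank, (part, _) in enumerate(ordered, start=1)}

ranks = rank_parts({"brake-pad": 120, "filter": 340, "bulb": 45})
# ranks -> {"filter": 1, "brake-pad": 2, "bulb": 3}
```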

Data Engineer

Verizon / Techstar Group
Hyderabad, India
07.2019 - 09.2020
  • Ingested data from various sources such as text files, CSV files, and RDBMS sources using PySpark.
  • Created Hive internal and external tables with partitioning and bucketing for further analysis in Hive.
  • Designed and implemented scalable, efficient data pipelines for processing streaming data using Apache Spark Streaming.
  • Performed data cleansing in PySpark by applying transformations and actions.
  • Used Oozie workflows to automate and schedule jobs.
  • Strong experience in Big Data technologies including Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
  • Worked in Talend Open Studio to migrate data from Amazon Redshift into Teradata using the Redshift and Teradata connectors.
  • Worked on a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Performed extensive large-data reads and writes to and from CSV and Excel files using Pandas.
  • Participated in all phases of data acquisition, data cleaning, model development, validation, and visualization to deliver data science solutions.
  • Used Python data science packages such as Pandas, NumPy, Matplotlib, Seaborn, SciPy, and Scikit-Learn extensively for developing machine learning models.
  • Developed, implemented, and maintained data analytics protocols, standards, and documentation.
  • Used JIRA for project tracking and participated in daily scrum meetings.
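Hive bucketing, as used above, assigns each row to a fixed bucket by hashing its clustering key modulo the bucket count; a conceptual sketch (the hash here is a toy stand-in for illustration, not Hive's actual hash function, and the keys are made up):

```python
def bucket_for(key, num_buckets):
    """Assign a key to a bucket the way Hive CLUSTERED BY works conceptually:
    hash the key, then take the hash modulo the bucket count."""
    h = sum(ord(c) for c in str(key))  # simple stable hash, for illustration only
    return h % num_buckets

# Group some hypothetical customer ids into 4 buckets
buckets = {}
for customer_id in ["c100", "c101", "c102", "c103"]:
    buckets.setdefault(bucket_for(customer_id, 4), []).append(customer_id)
```

Because the bucket count is fixed at table-creation time, equal keys always land in the same bucket, which is what makes bucketed joins and sampling efficient.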

Big Data Hadoop Developer

VGsoft
Hyderabad, India
03.2015 - 05.2019
  • Responsible for collecting, cleaning, and storing data for analysis using Sqoop, Spark, and HDFS with PySpark.
  • Used Python and the Spark framework for real-time and batch data processing.
  • Ingested large amounts of data from different data sources into HDFS using PySpark.
  • Implemented Spark in Python and performed data cleansing by applying transformations and actions.
  • Processed and analyzed data stored in HBase and HDFS.
  • Developed Sqoop scripts for importing and exporting data to and from HDFS and Hive.
  • Created Hive internal and external tables with partitioning and bucketing for further analysis in Hive.
  • Inspected and analyzed existing Hadoop environments for proposed product launches, producing cost/benefit analyses for use of included legacy assets.
  • Developed highly maintainable Hadoop code and followed all coding best practices.

Education

Post-Baccalaureate Diploma - Business Analytics

Cape Breton University
Sydney
05.2023

Master of Science - Computer Applications

Osmania University
Hyderabad
06.2009

Skills

    PySpark

    Hadoop

    Hive

    Spark

    Azure Data Factory (ADF)

    Azure Databricks (ADB)

    Python

    GitHub

    Alteryx

    HDFS

    ADLS

    Pandas

    NumPy

    Teradata

Accomplishments

    Multiple Linear Regression Analysis of Canada’s Freight Transportation Framework

    Journal article in Logistics, May 2023 (co-author). DOI: 10.3390/logistics7020029. Contributors: Jamileh Yousefi, Sahand Ashtab, Amirali Yasaei, Allu George, Mukarram Ali, Satinderpal Singh Sandhu.

    URL: https://www.mdpi.com/2305-6290/7/2/29

Certification


  • Microsoft Azure AI Fundamentals (AI-900)
  • Microsoft Azure Fundamentals (AZ-900)

Additional Information


Academic Projects :

1. House Price Prediction using Business Analytics Tools and Excel

2. E-Commerce Database Management System using DBMS, MySQL

3. United States Gun Violence Analysis using Tableau

4. Healthcare Provider Fraud Detection using Logistic regression, Decision Trees and Random Forest Machine Learning Models



References:

1. Abdul Alali, MBA

Assistant to the Dean/Program Manager, Shannon School of Business

Cape Breton University

Tel: (902) 563-1447

2. Jamileh Yousefi, Ph.D.

Assistant Professor, Shannon School of Business, Cape Breton University

Adjunct Professor, Faculty of Computer Science, Dalhousie University

Tel: (902)563-1227

Timeline

Capstone Project

Cape Breton University
09.2022 - 01.2023

Senior Big Data Engineer

HTC Global Services
12.2021 - 05.2022

Senior Big Data Engineer

Tech Mahindra
03.2021 - 09.2021

Big Data Engineer

Ford Motor Private Limited / One Magnify
08.2020 - 01.2021

Data Engineer

Verizon / Techstar Group
07.2019 - 09.2020

Big Data Hadoop Developer

VGsoft
03.2015 - 05.2019

Post-Baccalaureate Diploma - Business Analytics

Cape Breton University

Master of Science - Computer Applications

Osmania University