Software Engineer
Mukarram Ali


Dartmouth, NS

Summary

  • 8+ years of IT experience spanning analysis, design, coding, testing, implementation, and training in Big Data technologies, working with Apache Hadoop ecosystem components.
  • Extensive experience with major Hadoop ecosystem components, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, YARN, Spark, and Scala.
  • Expertise in data transformation and analysis using Spark and Hive.
  • Experience with ETL and big data query tools such as Pig Latin and HiveQL.
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, and Pig.
  • Developed Spark SQL programs to handle different data sets for better performance.
  • Experience presenting analytical findings, complex data analytics, and recommendations to clients.
  • Proficient in statistical modeling, with hands-on experience implementing machine learning techniques (linear and logistic regression, decision trees, random forests, SVM, k-nearest neighbors) for forecasting/predictive analytics, regression-based models, and hypothesis testing.
  • Proficient in machine learning algorithms such as linear regression, Ridge, Lasso, Elastic Net regression, decision trees, and random forests.
  • Strong record of model validation and tuning, including model selection, k-fold cross-validation, and hold-out schemes.
  • Knowledge of and experience with Git/GitHub version control tools.
  • Able to work independently or as part of a team to deliver assigned work on time.
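As an illustration of the k-fold cross-validation workflow mentioned above, here is a minimal pure-Python sketch (index splitting only; the model fitting step is omitted, and the function name is my own):

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k (train, validation) folds."""
    indices = list(range(n_samples))
    # First (n_samples % k) folds get one extra sample so every index is used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds

# Example: 10 samples, 3 folds -> validation sizes 4, 3, 3
for train, val in k_fold_indices(10, 3):
    assert len(train) + len(val) == 10
```

In practice a library routine (e.g. scikit-learn's `KFold`) would be used; the sketch only shows the splitting scheme.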

Overview

8 years of professional experience
1 certification

Work History

Capstone Project

Cape Breton University
Sydney, Nova Scotia
09.2022 - 01.2023
  • Responsible for analysing data to identify patterns and inconsistencies.
  • Supported the team in developing plans and strategies to build various regression and clustering models.
  • Applied machine learning tools and algorithms such as linear regression and logistic regression.
  • Performed EDA using Python, Pandas, and NumPy for data cleansing, data extraction, and feature selection.
  • Tracked project and team member performance closely to intervene quickly on mistakes or delays.
  • Maintained open communication by presenting regular project status updates to customers.

Senior Big Data Engineer

HTC Global Services
Hyderabad, India
12.2021 - 05.2022
  • Responsible for writing Spark code in Python to read Excel files with multiple sheets using Databricks.
  • Responsible for writing Python scripts to extract data from various data sources.
  • Converted the pivoted sheets of each file to extract specific data from the columns using Python.
  • Created new columns from existing columns as per the team's requirements.
  • Developed a final consolidated file by applying various PySpark transformations and excluded records from the final sheet based on the exclusion data.
  • Developed and maintained end-to-end ETL data pipelines and worked with large datasets in Azure Data Factory (ADF) using Python.
  • Implemented various ADF components, including Pipelines, Triggers, Data Flows, Datasets, Linked Services, and Control Flows.
  • Performed data validation, debugging, error handling, transformations, and data cleanup within large datasets using ADF.
  • Developed Python code in notebooks to create pipelines for data processing.
  • Used Python to extract data from data lakes for processing and transformation tasks.
  • Automated file generation to produce consolidated data reports weekly, bi-weekly, and monthly using Python.
  • Ensured data quality, reliability, and fault tolerance in streaming pipelines, implementing mechanisms for data validation, error handling, and fault recovery to keep processed data accurate and consistent.
  • Assisted with day-to-day operations, working efficiently and productively with all team members.
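The consolidate-and-exclude step described above can be sketched in plain Python (record keys and the exclusion structure are hypothetical; the production pipeline ran as PySpark on Databricks):

```python
def consolidate(sheets, exclusions):
    """Merge rows from multiple source sheets, dropping excluded keys.

    sheets: list of lists of dicts (one inner list per source sheet).
    exclusions: set of 'id' values to drop from the final output.
    """
    merged = [row for sheet in sheets for row in sheet]
    return [row for row in merged if row["id"] not in exclusions]

sheet_a = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
sheet_b = [{"id": 3, "qty": 7}]
final = consolidate([sheet_a, sheet_b], exclusions={2})
# final keeps ids 1 and 3; id 2 is dropped by the exclusion data
```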

Senior Big Data Engineer

Tech Mahindra
Hyderabad, India
03.2021 - 09.2021
  • Responsible for collecting, cleaning, and storing data for analysis with the Data Ingestion Framework (DIA) using PySpark.
  • Ingested large amounts of data from different data sources into HDFS using PySpark.
  • Performed data cleansing in PySpark by applying Spark transformations and actions written in Python.
  • Extracted voice biometric and signature data of MS customers from different sources and performed analysis using Spark APIs.
  • Wrote Spark Streaming applications to process and analyze real-time streaming data, applying transformations, aggregations, and complex analytics.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
  • Integrated Spark Streaming with other data sources and systems, such as databases, messaging systems, and external APIs, including designing and implementing connectors and adapters for data ingestion and output.
  • Created Azure Blob Storage to store raw data sent from different sources.
  • Deployed Data Factory to create data pipelines orchestrating data in SQL Database.
  • Wrote ad hoc queries using Spark SQL and SQL to extract customer PII details from various data sources.
  • Worked with the BI team to provide curated and aggregated data for dashboard development.
  • Used JIRA for project tracking and participated in daily scrum meetings.
  • Applied effective time management techniques to complete all tasks within tight deadlines.
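An ad hoc extraction query of the kind described above can be sketched with SQLite standing in for Spark SQL (the table and column names are hypothetical stand-ins, not the actual customer schema):

```python
import sqlite3

# In-memory database playing the role of the source data store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "A", "east"), (2, "B", "west"), (3, "C", "east")])

# Ad hoc query: pull selected customer details for one region
rows = conn.execute(
    "SELECT id, name FROM customers WHERE region = ?", ("east",)
).fetchall()
# rows -> [(1, 'A'), (3, 'C')]
```

The same SELECT would run unchanged as `spark.sql(...)` against a registered Spark table.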

Big Data Engineer

Ford Motor Private Limited / One Magnify
Chennai, India
08.2020 - 01.2021
  • Responsible for independently creating Alteryx workflows by pulling data from various sources such as SQL, Teradata, and Hadoop.
  • Developed PySpark code to pull data from HDFS for processing in Alteryx.
  • Worked with various Alteryx Data Preparation and In-Database tools to cleanse data within the database and filter it based on requirements.
  • Used Alteryx to predict parts consumption by dealers and customers, and ranked each part's consumption based on customer inputs.
  • Designed and developed analytical data structures.
  • Coordinated with the manager to explain the flow of data between the various systems within the workflow.
  • Validated data collected from different sources and transformed it based on requirements.
  • Calculated the overall accuracy of the model, the returned cost of the product, and the Dealer Net for parts consumed and returned.
  • Wrote SQL queries to measure revenue generated by the organization's various campaign activities.
  • Developed multiple Spark jobs in PySpark for data cleaning and pre-processing.
  • Created workflows to generate weekly reports for new dealers added to the dealers list.
  • Created and delivered the workflow that calculated the Dealer Net for collision parts on a weekly, monthly, and quarterly basis.
  • Automated the workflow to generate weekly reports for the various dealers in the dealers data.
  • Responsible for maintaining, updating, and delivering recurring reports to the revenue team, used to provide incentives to dealers based on their weekly sales data.
  • Maintained good relations with the various teams for a smooth flow of the reports and information they needed to calculate dealer incentives.
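The part-consumption ranking described above amounts to sorting parts by units consumed and assigning ranks; a plain-Python sketch (part names and counts are made up, and the production logic ran inside Alteryx):

```python
def rank_parts(consumption):
    """Rank parts by units consumed, highest first (rank 1 = most consumed)."""
    ordered = sorted(consumption.items(), key=lambda kv: kv[1], reverse=True)
    return {part: rank for rank, (part, _) in enumerate(ordered, start=1)}

ranks = rank_parts({"brake-pad": 120, "filter": 340, "bulb": 45})
# ranks -> {"filter": 1, "brake-pad": 2, "bulb": 3}
```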

Data Engineer

Verizon / Techstar Group
Hyderabad, India
07.2019 - 09.2020
  • Ingested data from various sources such as text files, CSV files, and RDBMS sources using PySpark.
  • Created Hive internal and external tables with partitioning and bucketing for further analysis in Hive.
  • Designed and implemented scalable, efficient data pipelines for processing streaming data using Apache Spark Streaming.
  • Performed data cleansing in PySpark by applying transformations and actions.
  • Used Oozie workflows to automate and schedule jobs.
  • Strong experience in Big Data technologies including Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
  • Worked in Talend Open Studio to migrate data from Amazon Redshift into Teradata using the Redshift and Teradata connectors.
  • Worked on a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Performed extensive large-data reads and writes to and from CSV and Excel files using Pandas.
  • Participated in all phases of data acquisition, data cleaning, model development, validation, and visualization to deliver data science solutions.
  • Used Python data science packages such as Pandas, NumPy, Matplotlib, Seaborn, SciPy, and Scikit-Learn extensively for developing machine learning models.
  • Developed, implemented, and maintained data analytics protocols, standards, and documentation.
  • Used JIRA for project tracking and participated in daily scrum meetings.
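Hive bucketing, as used above, assigns each row to a fixed bucket by hashing its clustering key modulo the bucket count; a conceptual sketch (the hash here is a toy stand-in for illustration, not Hive's actual hash function, and the keys are made up):

```python
def bucket_for(key, num_buckets):
    """Assign a key to a bucket the way Hive CLUSTERED BY works conceptually:
    hash the key, then take the hash modulo the bucket count."""
    h = sum(ord(c) for c in str(key))  # simple stable hash, for illustration only
    return h % num_buckets

# Group some hypothetical customer ids into 4 buckets
buckets = {}
for customer_id in ["c100", "c101", "c102", "c103"]:
    buckets.setdefault(bucket_for(customer_id, 4), []).append(customer_id)
```

Because the bucket count is fixed at table-creation time, equal keys always land in the same bucket, which is what makes bucketed joins and sampling efficient.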

Big Data Hadoop Developer

VGsoft
Hyderabad, India
03.2015 - 05.2019
  • Responsible for collecting, cleaning, and storing data for analysis using Sqoop, Spark, and HDFS with PySpark.
  • Used Python and the Spark framework for real-time and batch data processing.
  • Ingested large amounts of data from different data sources into HDFS using PySpark.
  • Implemented Spark in Python and performed data cleansing by applying transformations and actions.
  • Processed and analyzed data stored in HBase and HDFS.
  • Developed Sqoop scripts for importing and exporting data to and from HDFS and Hive.
  • Created Hive internal and external tables with partitioning and bucketing for further analysis in Hive.
  • Inspected and analyzed existing Hadoop environments for proposed product launches, producing cost/benefit analyses for use of included legacy assets.
  • Developed highly maintainable Hadoop code and followed all coding best practices.

Education

Post-Baccalaureate Diploma - Business Analytics

Cape Breton University
Sydney
05.2023

Master of Science - Computer Applications

Osmania University
Hyderabad
06.2009

Skills

    PySpark

    Hadoop

    Hive

    Spark

    Azure Data Factory (ADF)

    Azure Databricks (ADB)

    Python

    GitHub

    Alteryx

    HDFS

    ADLS

    Pandas

    NumPy

    Teradata

Accomplishments

    Multiple Linear Regression Analysis of Canada’s Freight Transportation Framework

    Journal article in Logistics, May 2023 (co-author). DOI: 10.3390/logistics7020029. Contributors: Jamileh Yousefi, Sahand Ashtab, Amirali Yasaei, Allu George, Mukarram Ali, Satinderpal Singh Sandhu.

    URL: https://www.mdpi.com/2305-6290/7/2/29

Certification


  • Microsoft Azure AI Fundamentals (AI-900)
  • Microsoft Azure Fundamentals (AZ-900)

Additional Information


Academic Projects :

1. House Price Prediction using Business Analytics Tools and Excel

2. E-Commerce Database Management System using DBMS, MySQL

3. United States Gun Violence Analysis using Tableau

4. Healthcare Provider Fraud Detection using Logistic regression, Decision Trees and Random Forest Machine Learning Models



References:

1. Abdul Alali, MBA

Assistant to the Dean/Program Manager, Shannon School of Business

Cape Breton University

Tel: (902) 563-1447

2. Jamileh Yousefi, Ph.D.

Assistant Professor, Shannon School of Business, Cape Breton University

Adjunct Professor, Faculty of Computer Science, Dalhousie University

Tel: (902)563-1227

Timeline

Capstone Project

Cape Breton University
09.2022 - 01.2023

Senior Big Data Engineer

HTC Global Services
12.2021 - 05.2022

Senior Big Data Engineer

Tech Mahindra
03.2021 - 09.2021

Big Data Engineer

Ford Motor Private Limited / One Magnify
08.2020 - 01.2021

Data Engineer

Verizon / Techstar Group
07.2019 - 09.2020

Big Data Hadoop Developer

VGsoft
03.2015 - 05.2019

Post-Baccalaureate Diploma - Business Analytics

Cape Breton University

Master of Science - Computer Applications

Osmania University