Data Scientist with 10 years of experience gathering, cleaning, and organizing data for use by technical and non-technical personnel. Advanced understanding of statistical, algebraic, and other analytical techniques.
Overview
9 years of professional experience
1 Certification
Work History
Senior Data Scientist
360 EEC
10.2023 - 01.2024
Spearheaded data extraction from PDF documents on environmental safety assessments of oil wells using Natural Language Processing (NLP) techniques
Applied Hugging Face Transformers models (BERT) to sentiment analysis of the extracted data, contributing insights for environmental risk evaluation
Adapted AI technologies to industry-specific tasks, including Large Language Model (LLM) implementation, Generative Adversarial Networks (GANs), and data extraction for oil well safety assessments
Implemented generative chat functionality for question answering over PDFs using the OpenAI ChatGPT API
Applied LoRA and QLoRA fine-tuning to customize models to project requirements, optimizing performance and ensuring robust results
Structured the fine-tuning dataset for Falcon-7B and set up the training libraries, including Hugging Face Transformers, Datasets, and WandB for monitoring training progress
Selected and configured the Falcon-7B LLM, defined PEFT parameters for LoRA, and implemented quantization strategies that balanced memory efficiency against acceptable error rates
Defined training arguments, including batch size, optimizer, learning rate scheduler, and checkpoints, for the fine-tuning process
Ran fine-tuning with the Hugging Face Trainer and a PEFT configuration, tracked training progress in WandB, and guarded against overfitting by continuously monitoring both training and validation loss.
Practice Lead AI/ML
Infovision Inc.
06.2022 - 06.2023
Led a team of 5+ data scientists and data engineers in developing a time series forecasting model to predict product demand over time at Henry Schein's warehouse
Used Google Cloud Platform (BigQuery), SQL, and Python libraries such as Keras and TensorFlow to implement an efficient forecasting algorithm
Improved inventory management by providing accurate product demand forecasts that supported proactive decision-making
Conducted Pareto analysis on a dataset of 400K products, customers, and suppliers
Identified high-demand, high-profit items for targeted combinatorial optimization, improving overall warehouse efficiency
Investigated the factors influencing key variables across different time periods, informing the product time series forecasting model
Performed regression analysis to refine the predictive model, improving the precision and reliability of demand forecasts for 400K products across various customers and suppliers
Analyzed historical data and patterns to understand the underlying dynamics of product demand over time
Compared current demand with patterns derived from historical data, reconciling past trends with real-time information to keep forecasts accurate
Contributed to a deeper understanding of product demand, streamlined inventory operations, and refined business strategies
The project ultimately improved Henry Schein's overall operational efficiency and customer satisfaction through proactive inventory management.
Team Lead/Senior Data Scientist
HTC Global Services
04.2021 - 06.2022
Led two teams of data engineers and ML engineers in developing a customer segmentation algorithm using K-Means clustering on Google Cloud Platform (GCP)
Implemented advanced techniques within the K-Means clustering process to enhance accuracy and granularity in customer segmentation
Managed large-scale customer datasets, overseeing projects on Motor Insurance cross-selling, Customer Segmentation based on spending patterns, and Revenue Growth prediction based on customers' use of various bank products
Preprocessed data in Google BigQuery and Vertex AI, using Python libraries such as Pandas and NumPy for efficient data handling and manipulation
Developed and fine-tuned machine learning algorithms, incorporating regression models for Revenue Growth prediction
Applied statistical methods and feature engineering to optimize the accuracy and reliability of the models
Spearheaded data migration initiatives to streamline processes and enhance overall data efficiency
Designed and implemented an interactive Google Analytics dashboard, providing stakeholders with a visually intuitive platform for data exploration and insights
Employed advanced ML techniques to enhance Revenue Growth prediction models, leveraging insights derived from spending patterns and customer segmentation
Collaborated with business stakeholders and cross-functional teams to understand objectives and ensure ML algorithms aligned with strategic goals
Contributed to Hong Leong Bank Malaysia's data-driven decision-making process, enabling personalized customer engagement, targeted marketing efforts, and sustainable revenue growth.
Senior Data Scientist
Techno Brain Group
03.2020 - 04.2021
Developed Spark ML scripts that processed thousands of images for a prediction algorithm
Generated synthetic training data using Keras data augmentation to improve model training
Developed a computer vision algorithm for cattle recognition using Convolutional Neural Networks (CNNs) in TensorFlow and Keras, combined with the SIFT (Scale-Invariant Feature Transform) technique
Conducted experimental evaluations showing that the proposed algorithm achieved 93.3% identification accuracy within a reasonable processing time, outperforming traditional identification approaches, which achieved 84%
Developed a generative AI model using Generative Adversarial Networks (GANs), in which a generator trained against an adversarial discriminator learns to produce progressively more realistic natural images
Beyond generating high-quality synthetic data, the framework can be applied to image enhancement and to image-to-image translation from one domain to another.
Data Scientist
Avows Group
11.2019 - 03.2020
Developed a CNN project in Python to identify optimal tree crowns for paper-making at a paper mill
Automated the identification of ideal tree-crown cutting points using convolutional neural networks in TensorFlow, improving the efficiency of paper production
Built a Python-based machine learning recommender model using Artificial Neural Networks and a Random Forest algorithm, with TensorFlow and Keras, to predict KAPPA values for RGE Group Indonesia
Collected and cleansed data using SQL (ETL) and analyzed a decade of data to predict optimal KAPPA values from multiple parameters
Built a Tableau dashboard for data visualization and developed a deep learning model for KAPPA number prediction
Collaborated with business stakeholders and cross-functional teams to understand objectives and ensure ML algorithms aligned with strategic goals
Contributed to RGE Group Indonesia's data-driven decision-making process, enabling personalized customer engagement, targeted marketing efforts, and sustainable revenue growth.
Data Scientist
Automotive Robotics India Pvt Ltd
02.2017 - 11.2019
In the project "Caterpillar Engine Image Detection Using CNN" (Convolutional Neural Networks), my tasks included exploratory data analysis to establish the pipeline, choosing and experimenting with the network architecture, exploring pre-processing techniques to improve model performance, and using mind maps for process optimization and continuous improvement
Created synthetic engine images using Keras data augmentation to improve model training
The main achievement of the project was developing an algorithm to predict engine model numbers from provided images
For the same client, Caterpillar, I developed an additional machine learning algorithm for predictive maintenance of boat engines
This involved collecting data from the various Engine Control Units (ECUs) installed on the engines, employing Artificial Neural Networks to address this complex problem, and implementing predictive maintenance analytics with machine learning models built in Python.
Data Analyst
Infogem Web Solutions pvt ltd
11.2014 - 01.2017
As a data analyst on the "Demand Planning and Inventory Management" project at Siloam Hospitals, I extracted raw data and developed a Data Discrepancy report across multiple data sources, and migrated data from MySQL to Microsoft Excel sheets for further processing and analysis in Python
Applied NLP to demand forecasting by analyzing text data for indicators of future demand
Used the data discrepancy report to perform Pareto analysis, classifying SKUs into top 70%, mid 20%, and low 10% categories by value and profitability, using SQL, Python, Scikit-Learn, Pandas, Matplotlib, Seaborn, and Power BI.