Arshdeep Singh Chudhey

Toronto, ON

Summary

Experienced ML Engineer with over 2 years of expertise in developing and deploying NLP-based systems, including advanced chatbots and deep learning models, to enhance accuracy, reduce processing time, and boost customer engagement. Proven success in implementing RAG architecture, PEFT techniques, and deploying scalable, low-latency ML systems for finance, healthcare, and environmental applications. Highly proficient in cloud deployment, data preprocessing, and optimizing processes through advanced tuning techniques for real-time applications and large-scale data processing.

Overview

5 years of professional experience

Work History

Data Scientist

Oracle Canada INC
Canada
09.2023 - Current
  • Deployed and tuned multiple language translation models, such as M2M-100 and DeepL translator, to enhance a virtual meeting tool. Facilitated real-time translation for speakers of over 25 languages.
  • Developed and implemented an NLP system using Transformer-based models and topic modeling techniques such as LDA, improving content navigation and increasing user engagement by 20%.
  • Designed and implemented a highly efficient Retrieval-Augmented Generation (RAG) system that utilized large language models (LLMs) to accurately generate responses from over 100,000 words of meeting content.
  • Reduced users' post-meeting recap time by 75% by using prompt engineering to integrate LLMs such as GPT into virtual meeting environments.
  • Implemented FAISS to expedite the retrieval of meeting databases, resulting in a 60% reduction in query response time and increased overall user efficiency.
  • Architected an effective live streaming solution using AWS Kinesis Video Streams and created an audio channel in AWS Elemental Media Live for real-time audio retrieval from meetings.
  • Integrated LangChain to manage data flow and enhance the processing efficiency of language models for tasks such as summarization and question answering over meeting content.
  • Processed and merged data from four different sources into one comprehensive database, performing data cleaning and preparation tasks for over 50 company-wide datasets.
  • Prototyped and iterated on system design architectures to enhance virtual meetings, using Python and R to evaluate and select language models such as GPT-4 and BERT for improved project performance.
  • Presented research findings and authored comprehensive documents and reports to effectively communicate complex project results to peers, management, and the wider public, enhancing organizational understanding of the data.

Junior Data Scientist

Dastute Consulting
Canada
11.2022 - 08.2023
  • Implemented an internal Optical Character Recognition (OCR) model using convolutional and recurrent neural networks for feature extraction and sequence prediction, streamlining document processing and saving over 20 hours weekly for approximately 10 members of the finance team.
  • Implemented transfer learning using pre-trained models like ResNet and integrated attention mechanisms to enhance the accuracy and robustness of the OCR model.
  • Benchmarked the developed OCR model against existing models such as Tesseract and PaddleOCR, achieving 15% higher accuracy as measured by character error rate (CER) and word error rate (WER).
  • Designed and implemented an NLP-based system for extracting structured data from OCR model outputs, enhancing data quality and reducing processing time by 25%.
  • Implemented spaCy and regex-based NLP solutions to analyze text from diverse invoices and documents, boosting accuracy by 20% while reducing false positive rates by 15%.
  • Cleaned and organized large invoice datasets, employing noise reduction and skew correction to optimize pattern and trend recognition for over 50,000 unique documents.
  • Developed a streamlined AWS pipeline utilizing Amazon EC2 instances with NVIDIA GPUs, AWS Lambda, and Amazon S3, achieving processing of over 10,000 documents per hour while minimizing costs through efficient resource allocation and autoscaling.
  • Developed and implemented machine learning models to improve customer retention for a trucking client, identify at-risk customers, and tailor retention efforts, resulting in a 15% increase in customer retention.
  • Created and implemented predictive models to improve customer acquisition, which identified potential leads, resulting in a 10% increase in new customer acquisition.
  • Developed and implemented NLP models to enhance customer service interactions, automate support processes, and improve response times.

Data Analyst

Infosys
India
02.2021 - 08.2022
  • Achieved a 30% improvement in the integrity of large-scale transaction datasets through data cleaning and preprocessing, resulting in enhanced accuracy of fraud detection models for MUFG Union Bank.
  • Identified and engineered key features, including transaction frequency, average transaction amount, and deviations from typical behavior, resulting in a 25% increase in model precision and recall.
  • Developed and implemented advanced machine learning models, such as Random Forests, Gradient Boosting Machines (GBMs), and hybrid models combining supervised and unsupervised approaches such as Isolation Forests and Autoencoders, resulting in a 98% accuracy rate for detecting fraudulent transactions.
  • Deployed the fraud detection model in a real-time processing environment using AWS EC2, Lambda, and Kinesis, enabling efficient processing of 1 million daily transactions with sub-200 millisecond latency.
  • Continuously monitored and optimized model performance using live feedback loops and periodic retraining, resulting in a 20% decrease in false positives while adapting to changing fraud patterns.
  • Employed Python and Tableau to assess user behavior on the NPTEL platform and successfully reduced user dropouts by 30%, while evaluating the learning impact for over 20,000 users.
  • Utilized Python libraries to enhance user engagement by 15% and boost conversion rates by 10%, by refining user journey maps and prototypes based on A/B test insights.
  • Enhanced system performance by implementing rigorous RESTful API monitoring and optimization methods, resulting in a 35% improvement, while also reducing support tickets by utilizing user feedback.

Data Science Intern

AIG Hospitals
India
09.2020 - 01.2021
  • Conducted a study analyzing time saved and efficiency improvements with pneumatic chutes for sample transfer in hospitals. Highlighted a 15% reduction in sample transfer time, optimizing workflow and accelerating diagnostic processes.
  • Compiled and modeled discharge time delay data of patients, identifying trends and key factors contributing to delays. Utilized Tableau for data visualization, resulting in actionable insights that improved hospital discharge processes.
  • Directed field studies and data collection for a sophisticated analysis of heart disease patients suffering from Rheumatic Fever.
  • Employed machine learning models, such as Logistic Regression and Random Forest, to enhance early detection and treatment strategies by identifying critical predictors of disease progression.
  • Automated the scoring process for Liver-Steatosis Score data by applying transfer learning and deep learning techniques. Used the Inception-ResNet-v2 model to extract features, then applied Lasso regression to produce accurate steatosis scores, enhancing diagnostic precision and efficiency.

Data Science Intern

Defence Research and Development Organisation
India
02.2020 - 08.2020
  • Successfully extracted features such as natural vegetation, average rainfall, and average slope using remote sensing techniques from ArcGIS and QGIS. Analyzed over 500 region-specific maps across 10 different terrains, enhancing dataset richness by 35%.
  • Modeled environmental and geographical data to predict land suitability for agriculture, employing regression analysis and machine learning algorithms. This modeling effort provided actionable insights for land management and agricultural planning.
  • Implemented advanced hyperparameter tuning and data extraction techniques, resulting in a 15% enhancement in model accuracy and a 20% reduction in prediction errors.

Education

Master's in Applied Data Science

Carleton University
Ottawa, ON
03.2024

Bachelor's in Computer Science

Jaypee University of Information Technology
Solan, HP
05.2021

Skills

  • Programming Languages: Python, R, MATLAB, SAS, C#, JavaScript
  • Machine Learning & AI: Large Language Models (GPT-3, GPT-4, T5, Llama, TAPEX, BERT), Generative AI, Prompt Engineering, Neural Networks, Ensemble Models, Multimodal ML, Reinforcement Learning, Explainable AI (MuZero, CLIP, Wav2Vec)
  • NLP: Natural Language Processing (NLP), Computer Vision (OpenCV, Pillow), Speech Recognition (Whisper), Text Processing (NLTK, spaCy, Gensim)
  • Deep Learning Frameworks: PyTorch, TensorFlow, Keras, Hugging Face, LangChain
  • Data Science Tools: Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, SciPy, XGBoost, LightGBM
  • Cloud & Deployment: AWS (S3, EC2, Lambda, SageMaker), Azure (Azure ML, Databricks), Google Cloud Platform (BigQuery, Dataflow), Kubernetes, Amazon ECS, MLflow
  • Databases: MySQL, PostgreSQL, Cassandra, MongoDB
  • Data Visualization: Power BI, Tableau
  • Big Data & Distributed Computing: PySpark, Hadoop
  • DevOps & Project Management: Git, Bitbucket, Jira