
Machine Learning Engineer with a strong background in data engineering and cloud solutions. Proven ability to architect scalable ML pipelines and deliver advanced analytics, producing insights that drive decision-making across diverse business needs.
· Led the design and implementation of an enterprise-scale data lake on Microsoft Azure, consolidating structured and unstructured data into Azure Data Lake Storage Gen2 (ADLS) and Azure Synapse Analytics, establishing a single source of truth for analytics and AI workloads.
· Architected and productionized machine learning and NLP models using Python, TensorFlow, and Scikit-learn, deploying batch scoring pipelines and real-time inference APIs through Azure Machine Learning and containerized services.
· Built scalable data pipelines using Azure Data Factory and Azure Databricks (Spark) to support data ingestion, transformation, feature engineering, and model training across multiple AI initiatives.
· Implemented end-to-end MLOps pipelines leveraging Azure ML pipelines, Docker, and CI/CD workflows, enabling automated model training, validation, deployment, and continuous monitoring in production environments.
· Standardized JupyterLab-based ML experimentation workflows within Azure Databricks and Azure ML, improving reproducibility, collaboration, and experimentation velocity for data science teams.
· Managed secure cloud infrastructure using Azure RBAC, Managed Identities, and ARM/Terraform-based infrastructure-as-code, ensuring controlled access across development, staging, and production environments.
· Designed optimized data schemas, feature stores, and analytical models using Azure Synapse SQL pools and Databricks, reducing model inference latency and improving downstream analytics performance.
· Delivered executive-facing KPI, model performance, and risk analytics dashboards using Power BI, enabling data-driven decision-making for internal stakeholders and leadership teams.
· Developed end-to-end data pipelines on Microsoft Azure using Azure Data Factory and Azure Data Lake Storage Gen2, integrating policy, claims, and customer interaction data from 10+ source systems, reducing data availability latency by ~50%.
· Built and trained machine learning models in Python (logistic regression, random forest, gradient boosting) using Azure Machine Learning, improving policy lapse and risk propensity prediction accuracy by ~15%.
· Implemented feature engineering and data transformation logic on large-scale insurance datasets (5M+ records), enhancing model stability and reducing feature drift during retraining cycles.
· Designed and deployed model inference pipelines using REST-based endpoints, enabling real-time scoring for underwriting and customer retention use cases.
· Performed exploratory data analysis (EDA) and cohort analysis to identify behavioral patterns across customer segments, supporting actuarial and marketing teams with data-driven insights.
· Designed and implemented end-to-end ML and analytics pipelines on AWS, using Python, AWS Lambda, Glue, EC2, and Redshift, reducing manual reporting and data processing effort by 60%+ across multiple enterprise clients.
· Performed large-scale exploratory data analysis (EDA) and feature engineering on datasets exceeding 20 million records, uncovering behavioral patterns that improved targeted marketing effectiveness by ~25% for retail and banking clients.
· Built and optimized machine learning models using regression, classification, clustering (K-Means), Random Forest, and anomaly detection techniques, enabling early fraud detection and reducing revenue leakage by ~12% for a global financial client.
· Developed automated data ingestion and transformation workflows using advanced SQL, AWS RDS, Redshift, and AWS Glue, improving query performance and data refresh cycles by ~40% while maintaining high data integrity.
· Applied statistical analysis and hypothesis testing (A/B testing, regression analysis) to evaluate product and campaign performance, driving a 15% lift in campaign ROI through data-driven recommendations.
· Designed and deployed cloud-based dashboards using Amazon QuickSight, transforming legacy Excel reports into real-time analytics solutions and increasing business adoption by ~80%.
· Implemented data quality frameworks and automated validation checks, reducing data inconsistencies by 35% and maintaining 98%+ data accuracy in production ML pipelines.
· Ensured data governance and compliance by implementing access controls, validation rules, and audit-ready workflows, reducing audit preparation time by ~40%.
· Delivered 15+ analytics and ML initiatives on or ahead of schedule by coordinating with engineering, QA, and product teams under Agile delivery models.
· Collaborated with data warehouse and analytics teams to design dimensional data models (Star & Snowflake schemas) for enterprise reporting, supporting 10+ business subject areas and improving analytical query performance by 30–40%.
· Analyzed business requirements and stakeholder data needs, translating them into logical and physical data models by identifying key entities, attributes, and relationships across functional domains.
· Designed and maintained logical and physical data models using ERwin, performing forward and reverse engineering to restructure schemas and apply optimized DDLs to production databases.
· Built Azure-based ETL pipelines using Azure Data Factory, ingesting data from Oracle, SQL Server, and flat files, processing millions of records per day into curated analytics layers.
· Performed data profiling, validation, and quality checks, identifying anomalies and improving data accuracy by 25%, which increased reliability of downstream ML models and reports.
· Developed feature-ready datasets by applying transformations, aggregations, and joins, enabling data scientists to reduce model preparation time by 40%.
· Created and optimized materialized views and indexing strategies, reducing complex query execution times from minutes to seconds for BI and ML workloads.
· Assisted in data collection, cleaning, and preprocessing using Python (Pandas, NumPy) and SQL, preparing structured datasets for machine learning experiments.
· Supported the development of basic machine learning models (regression and classification) using Scikit-learn, helping evaluate model performance through standard metrics.
· Performed exploratory data analysis (EDA) to identify trends, patterns, and data quality issues, contributing insights for feature selection and model improvement.
· Worked with senior engineers to integrate ML outputs into backend workflows, gaining exposure to how models are consumed by applications and reports.
· Documented experiments, data assumptions, and results, following Agile practices and collaborating with cross-functional teams.
Programming & Query Languages: Python (NumPy, Pandas, Scikit-learn, TensorFlow, Keras), SQL, Scala, R, Bash, Shell Scripting, UNIX
Big Data & Distributed Processing: Apache Spark, PySpark, Spark Streaming, Apache Flink, Kafka, Kafka Streams
Data Engineering & Pipelines: ETL / ELT Pipelines, Batch & Streaming Processing, dbt, Data Profiling, Data Lineage, Schema Evolution
Cloud & DevOps: AWS, Azure & Azure DevOps (CI/CD), Docker
Data Architecture & Modeling: Lakehouse Architecture, Event-Driven Architecture, OLAP & OLTP Systems, Star & Snowflake Schemas, Dimensional Modeling, Partitioning & Compression, Materialized Views
Databases & Warehousing: Snowflake, Amazon Redshift, Azure Synapse, Oracle, MySQL, PostgreSQL, MongoDB, AWS RDS
File Formats & Storage: Parquet, Delta Lake, Avro, JSON, CSV, XML
Machine Learning & AI: Feature Engineering, Regression, Classification, Clustering (K-Means), Random Forest, SVM, Bayesian Models, Neural Networks, NLP, Model Training & Evaluation
ML Platforms & MLOps: TensorFlow, MLflow, Databricks Feature Store, Model Scoring Pipelines, Model Monitoring
Generative AI & LLMs: LangChain (RAG), Embeddings, Prompt Engineering, REST-based LLM Integration
Visualization & BI Tools: Tableau, Power BI, Amazon QuickSight, Looker, Mode Analytics, Athena, Matplotlib, Plotly