
Assane Niang

Montreal

Summary

Big Data Engineer with a track record of deploying complex data pipelines and AI solutions. Expert in PySpark, Scala, SQL, and data visualization, with strong analytical skills and experience optimizing data flows and machine learning model efficiency. Delivered improvements in data processing and business intelligence reporting.

Overview

4 years of professional experience
1 Certification

Work History

Big Data Engineer and AI Engineer

AgileDSS
09.2023 - Current

Creation of a RAG-based chatbot to find the candidates who most closely match a job description.

  • Scraping of job offer data
  • Retrieval of candidate CVs from SharePoint
  • Document creation and tokenization
  • Embedding documents with text-embedding-3-small
  • Vectorization and storage of vectors in Qdrant vector database
  • Use of RAG to retrieve the documents most similar to the offers

Migration of jobs from JARs to SQL notebooks in Databricks.

  • Using a medallion architecture for data pipelines
  • Converting Scala job specs to SQL
  • Defining unit tests
  • Deploying jobs with Databricks Asset Bundles
  • Creation of a service principal token and its Spark configuration
  • Creation of workflows and tasks associated with each pipeline
  • Orchestration of pipelines with Airflow
  • Defining the architecture diagram for each pipeline
  • Activating and using Unity Catalog for data governance
  • Metadata management during merge


Consultant Big Data Engineer

Amadeus IT Group
10.2021 - 09.2023

Implementation of data pipelines on large datasets to build performance indicators (KPIs) for travel agencies and airlines.

  • Creation of definition specifications in Confluence
  • Data transformation: Scala, Spark SQL
  • Deployment of jobs using MapR (on-premises)
  • Orchestration of workflows with Apache Oozie
  • Display of KPIs in Qlik Sense for visualization purposes
  • Storing data tables on Hive and MongoDB
  • Setting up dashboards in Prometheus and Grafana for alerting purposes
  • Pod management (worker and driver) in OpenShift for pipelines orchestrated with Airflow
  • Maintaining the CI/CD for versioning (Bitbucket, Artifactory, Jenkins)
  • Creation of unit tests and functional tests (ScalaTest)
  • Defining test plans for testing REST APIs with Postman collections
  • Knowledge transfer to newcomers

Migrating pipelines to the Azure cloud and developing new jobs and dashboards on large datasets in a Scrum/SAFe environment.

  • Participating in the collection and analysis of customer requirements
  • Development of data processing pipelines in batch and streaming mode
  • Industrialization (CI/CD, tests, monitoring)
  • Migration of KPI-building pipelines to an Azure cloud environment
  • Data mapping on large volumes of data
  • Building the architecture for data integration and cleansing in a big data platform
  • Testing connectivity between a Kafka cluster in Azure and Databricks (scope creation, ACL management with the Databricks CLI)
  • Deployment of configuration files under DBFS
  • Orchestration of jobs with Airflow (DatabricksRunNowOperator) and Databricks Scheduler
  • Job management with Databricks Workflows and the Databricks API 2.1
  • Creation of datasets and containers in Azure Data Lake Storage Gen2
  • Data management with Collibra
  • Dashboard development for data visualization and BI reporting
  • Maintaining the data management platform
  • Analysis of back-end applications

Data Scientist

Danone
03.2021 - 09.2021

Classification of customers according to their consumption of Danone products and their professional social profiles.

  • Study of clustering algorithms for streamed data on large datasets
  • Extracting and joining data from different databases
  • Mapping and development of data pipelines on large datasets
  • Analyzing and visualizing data for BI reporting
  • Defining an application scenario for machine learning models
  • Implementing algorithms for customer classification (StreamingKMeans, BIRCH, DenStream)
  • Testing the algorithms on simulated and real data

Data Scientist

ENGIE
04.2020 - 09.2020

Implementation of a machine learning model to monitor the performance of wind turbines.

  • Extraction of wind turbine data (ODBC)
  • Exploratory data analysis
  • Data transformation and cleansing
  • Data projection using PCA
  • Feature engineering
  • Testing different machine learning models (Random Forest, XGBoost, SVM, ...)
  • Selecting the best model based on metrics (RMSE, MAE, ...)
  • Model validation against business requirements
  • Building a dashboard to track model metrics during retraining (Shiny, D3.js)

Education

Master of Science - Statistics And Data Sciences

Université Grenoble-Alpes
Grenoble, France
09.2021

Bachelor of Science - Applied Mathematics And Computer Science

Université Montpellier
Montpellier, France
07.2019

Skills

  • Languages: PySpark, Scala, SQL, NoSQL, Python, R, JavaScript, Bash, Shell, XML, PHP, CSS
  • Big Data: Hadoop, HDFS, Hive, Spark, Hue, Oozie, Airflow, YARN, DBFS
  • Databases: RDBMS (MySQL, SQL Server, PostgreSQL), MongoDB
  • Cloud: Databricks, Azure, Microsoft Fabric, Azure Synapse, Azure Data Factory
  • DevOps & CI/CD: Jenkins, Maven, Git, Docker, Kubernetes, Azure DevOps, DABs, OpenShift, Terraform
  • Data Visualization: Qlik Sense, R Shiny, Grafana, Power BI
  • Web Services: REST (Flask, FastAPI), Postman
  • Data Governance: Collibra, Unity Catalog
  • Generative AI & LLM: OpenAI, Azure OpenAI, LangChain, Vector Store
  • MLOps: Feature Engineering, MLflow, Monitoring
  • Data Warehousing

Languages

English
Full Professional
French
Native or Bilingual

Certification

  • Microsoft Azure Certified: Fabric Analytics Engineer Associate – 2024
  • Databricks Certified: Data Engineer Associate Lakehouse – 2023
  • Microsoft Azure Certified: Data Engineer Associate – 2023
  • SAFe (Scaled Agile Framework) – 2022

Timeline

Big Data Engineer and AI Engineer

AgileDSS
09.2023 - Current

Consultant Big Data Engineer

Amadeus IT Group
10.2021 - 09.2023

Data Scientist

Danone
03.2021 - 09.2021

Data Scientist

ENGIE
04.2020 - 09.2020

Master of Science - Statistics And Data Sciences

Université Grenoble-Alpes

Bachelor of Science - Applied Mathematics And Computer Science

Université Montpellier