Manish Karri

London, Ontario

Summary

  • Over 5 years of experience in Data Engineering with expertise in complex SQL and Python data analysis.
  • Strong proficiency in ETL (Extract, Transform, Load) processes, ensuring efficient and accurate data movement and transformation.
  • Skilled in Ruby programming, used to automate data engineering tasks.
  • Proven record of designing, building, and optimizing large-scale data pipelines and processing frameworks.
  • Expertise in data modeling and warehousing concepts, working with tools such as Redshift, Snowflake, and BigQuery.
  • Adept at working with both SQL and NoSQL databases, including MySQL, MongoDB, and Cassandra.
  • Proficient in data visualization tools such as Tableau and Power BI, supported by data cleansing and transformation practices that preserve data integrity.
  • Experienced in statistical methods and machine learning algorithms for data analysis and interpretation.
  • Strong communicator, facilitating effective collaboration with cross-functional teams and stakeholders.
  • Proficient in Apache Airflow for orchestrating data workflows and pipeline scheduling (a minimal DAG sketch follows this list).
  • Active participant in code reviews, fostering a culture of collaboration among teams.
  • Knowledgeable in cloud-based data platforms like AWS, Azure, and GCP, with hands-on experience in services like S3, EC2, and EMR.
  • Proficient in Unix/Linux operating systems, efficiently navigating, administering, and optimizing data-related tasks.
  • Strong command-line expertise, enabling efficient data manipulation and system management.
  • Skilled in shell scripting using Bash and Python to automate data processing and ETL tasks.
  • In-depth knowledge of Unix-based tools and utilities like sed, awk, grep, and cron for effective data extraction, transformation, and scheduling.
  • Implemented data quality checks to ensure data accuracy and completeness and to detect anomalies.
  • Experience in machine learning model deployment using tools like TensorFlow, PyTorch, and MLflow.
  • Knowledge of Agile development methodologies, DevOps practices, and CI/CD pipelines for efficient and scalable data engineering solutions.
  • Proficient in conducting A/B testing and statistical analysis to assess the effectiveness of marketing campaigns.
  • Strong problem-solving skills and a creative approach to developing data solutions.
  • Familiarity with data security and privacy measures, ensuring regulatory compliance.
  • Ability to work independently and effectively manage multiple projects.
  • Skilled in using R and Excel for predictive modeling and forecasting.
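
A minimal Airflow sketch of the orchestration noted above; the DAG id, task names, schedule, and callables are illustrative assumptions rather than a specific production pipeline:

    # Minimal Airflow 2.x DAG sketch; all names here are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # Placeholder: pull rows from a source system.
        print("extracting source data")

    def load(**context):
        # Placeholder: write transformed rows to the warehouse.
        print("loading into warehouse")

    with DAG(
        dag_id="daily_etl_example",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task  # run extract before load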

Overview

5 years of professional experience

Work History

Senior Data Engineer

Royal Bank of Canada
01.2020 - Current
    • Designed and implemented data ingestion pipelines using Hadoop and Spark to extract, transform, and load data from various sources, including structured, semi-structured, and unstructured data
    • Developed and maintained data models and architectures using SQL and NoSQL databases such as MySQL, MongoDB, and Cassandra, and optimized query performance using indexing and partitioning techniques
    • Implemented data processing and streaming pipelines using Kafka to enable real-time data processing and analytics and designed and implemented custom Kafka connectors to integrate with other systems
    • Developed and maintained custom Python libraries and packages for data processing and machine learning, and deployed machine learning models to production using Python and Spark
    • Designed and developed custom Power BI and Tableau dashboards to provide real-time insights and visualizations and integrated them with the data processing pipelines using REST APIs and webhooks
    • Designed and implemented data warehousing systems using Snowflake, Redshift, and BigQuery, and optimized query performance using techniques such as star schema design and query optimization
    • Deployed data processing and analytics systems on AWS and Azure using services such as EMR, HDInsight, and Data Factory, and implemented security measures such as VPCs, NSGs, and RBAC
    • Conducted data modeling and schema design using ER diagrams and UML and implemented schema evolution and data partitioning to enable scalability and flexibility
    • Conducted data profiling and analysis using tools such as Pandas, Dask, and PySpark, and developed custom Python scripts and SQL queries to automate data analysis and reporting (a PySpark sketch follows this list)
    • Developed and maintained disaster recovery and business continuity plans for data processing and analytics systems and conducted regular disaster recovery tests and simulations to ensure system availability and readiness
    • Implemented change data capture (CDC) mechanisms to capture and process real-time data changes, improving data accuracy and timeliness
    • Developed and maintained a data lake architecture to store and process large volumes of data, enabling efficient data storage and processing
    • Implemented data partitioning techniques to improve data processing efficiency and reduce query response times
    • Developed and maintained data pipelines to integrate third-party data sources, improving data completeness and providing additional insights for business stakeholders
    • Developed and maintained a data visualization platform to provide real-time insights into key performance metrics, improving decision-making processes for business stakeholders
    • Conducted data profiling and analysis to identify and prevent fraudulent activities, resulting in a reduction in fraudulent transactions
    • Developed and maintained a disaster recovery plan to ensure data availability in case of system failure, ensuring business continuity
    • Conducted performance tuning and optimization of data pipelines, resulting in a reduction in processing times and improved overall system performance
    • Provided technical guidance and mentorship to junior data engineers, ensuring best practices and high-quality code standards are maintained
    • Worked with business stakeholders to identify and prioritize data needs, resulting in the development of data-driven solutions to address key business challenges.
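
A hedged PySpark sketch of the profiling and date-partitioning work described above; the input path, column names, and output location are illustrative assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("profiling_example").getOrCreate()

    # Hypothetical input path and schema, for illustration only.
    df = spark.read.parquet("s3://example-bucket/transactions/")

    # Profile: total rows and per-column null counts.
    print("rows:", df.count())
    df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
    ).show()

    # Partition output by event date so date-bounded queries scan less data.
    (df.withColumn("event_date", F.to_date("event_ts"))  # assumed timestamp column
       .write.partitionBy("event_date")
       .mode("overwrite")
       .parquet("s3://example-bucket/transactions_by_date/"))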

Data Engineer

Wells Fargo
08.2018 - 07.2019
    • Conducted data analysis using SQL and Excel to support business operations and developed and maintained dashboards and reports to provide insights to stakeholders
    • Conducted data quality assessments to identify and resolve data quality issues, resulting in a reduction in data errors
    • Developed and maintained predictive models using R and Excel to forecast trends and identify patterns
    • Conducted A/B testing and statistical analysis to evaluate the effectiveness of marketing campaigns, resulting in an increase in conversion rates (a minimal test sketch follows this list)
    • Developed and maintained data visualization dashboards using Tableau and Excel to provide real-time insights into key performance metrics, improving decision-making processes for business stakeholders
    • Developed and maintained data pipelines using SQL and Excel to extract, transform, and load data from various sources
    • Conducted ad-hoc data analysis and reporting to support business operations and identify areas for improvement
    • Worked with business stakeholders to identify key metrics and develop custom reports and dashboards to track business performance
    • Conducted data quality assessments and implemented data cleansing and transformation techniques to improve data accuracy and completeness
    • Worked with cross-functional teams to identify business requirements and develop data solutions to address those needs
    • Developed and maintained data dictionaries and data documentation to ensure consistency and accuracy across different data sources
    • Stayed up to date with the latest data analysis techniques and tools, and continually sought to improve data analysis and reporting processes
    • Developed and maintained automated reporting processes using VBA and other tools to streamline reporting workflows
    • Conducted root cause analysis to identify the underlying causes of business problems and developed recommendations to address those issues
    • Provided regular updates and presentations on data insights and trends to business stakeholders and communicated data findings to non-technical audiences
    • Conducted data segmentation and clustering to identify customer segments and develop targeted marketing campaigns
    • Analyzed and interpreted data to identify opportunities for process improvement, cost reduction, and revenue growth.
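
A minimal sketch of the kind of A/B conversion-rate comparison described above; the visitor and conversion counts are invented purely for illustration:

    from scipy import stats

    # Hypothetical campaign results: (conversions, visitors) per variant.
    conv_a, n_a = 120, 2400  # control
    conv_b, n_b = 150, 2380  # treatment

    # Two-proportion comparison via a chi-squared test on the 2x2 table.
    table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    chi2, p_value, dof, expected = stats.chi2_contingency(table)

    print(f"control rate:   {conv_a / n_a:.3%}")
    print(f"treatment rate: {conv_b / n_b:.3%}")
    print(f"p-value:        {p_value:.4f}")  # significant if below alpha, e.g. 0.05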

Education

BACHELOR OF ENGINEERING - COMPUTER SCIENCE

Jawaharlal Nehru Technological University
Hyderabad, India
06.2018

Skills

    Technical Skills:

    • Python
    • Java
    • Scala
    • Ruby
    • SQL (Structured Query Language)
    • NoSQL Databases (e.g., MongoDB, Cassandra)
    • Apache NiFi
    • Apache Spark
    • Talend
    • StreamSets
    • Amazon Redshift
    • Snowflake
    • Google BigQuery
    • Entity-Relationship Diagrams (ERD)
    • Star Schema Design
    • Hadoop
    • Apache Kafka
    • Apache Hive
    • Apache HBase
    • AWS (Amazon Web Services)
    • Azure
    • GCP (Google Cloud Platform)
    • Tableau
    • Power BI
    • Git
    • Apache Airflow

    Data Security and Compliance:

    • Understanding data privacy regulations (e.g., GDPR, HIPAA)
    • Implementing security measures

ACADEMIC PROJECTS

Predictive Analytics for Customer Churn: April 2018


- Analyzed customer data for a telecom company to identify customers who were likely to churn.

- Used techniques such as logistic regression, decision trees, and random forests to build a predictive model.

- Evaluated the model's performance using metrics such as accuracy, precision, recall, and F1 score.

- The project required data collection, cleaning, preprocessing, and analysis skills.

- It also required knowledge of machine learning and statistical techniques.

- Programming skills in languages such as Python, R, or Java were needed, as well as proficiency in data analysis and visualization tools (a minimal Python sketch follows).
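
A minimal Python sketch of this workflow, using synthetic data as a stand-in for the telecom dataset (which is not reproduced here) and scikit-learn's logistic regression:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the cleaned, preprocessed customer features.
    X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = model.predict(X_test)

    # The same evaluation metrics listed above.
    print("accuracy: ", accuracy_score(y_test, pred))
    print("precision:", precision_score(y_test, pred))
    print("recall:   ", recall_score(y_test, pred))
    print("f1:       ", f1_score(y_test, pred))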



Sentiment Analysis of Social Media Data Using NLP and Machine Learning: May 2017


- Collected data from social media platforms such as Twitter and analyzed it to determine the sentiment of users towards a particular topic.

- Used techniques such as natural language processing (NLP), machine learning, and text analytics to classify tweets or posts as positive, negative, or neutral.

- Visualized the sentiment distribution using tools such as Matplotlib or Tableau.

- The project required data collection, cleaning, preprocessing, and analysis skills.

- It also required knowledge of NLP, machine learning, and statistical techniques.

- Programming skills in languages such as Python, R, or Java were needed, as well as proficiency in data analysis and visualization tools (a minimal Python sketch follows).
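
A minimal Python sketch of the classification step, with invented example posts; NLTK's VADER analyzer stands in here for whichever NLP model was actually used:

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    # Invented example posts; real input came from the Twitter collection step.
    posts = [
        "Loving the new release, great work!",
        "This update broke everything, very frustrating.",
        "Release notes are out today.",
    ]

    for text in posts:
        score = sia.polarity_scores(text)["compound"]
        # Common convention: >= 0.05 positive, <= -0.05 negative, else neutral.
        label = "positive" if score >= 0.05 else "negative" if score <= -0.05 else "neutral"
        print(f"{label:8s} {score:+.2f}  {text}")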
