
Jingxuan Hou

Student
Santa Clara

Summary

Third-year student at Santa Clara University, majoring in Computer Science with a minor in Mathematics since freshman year, and planning to declare an emphasis in Data Science. Have developed a strong interest in programming, math, and working with data through years of systematic training as a computer science student, and bring the patience and enthusiasm to pursue these areas further in real-world contexts.

Proficient in Python, fluent in C++, familiar with Java, and experienced with MATLAB and Scala. Have broad programming experience in simple game design, web app development, and data analysis.

Looking forward to both strategy- and diagnosis-oriented tasks, and to working on projects that could build towards the data scientist profession.

Overview

3
years of post-secondary education

Work History

Speech Emotion Recognition (SER)

Class Project

Explores deep learning by using a Multilayer Perceptron model to study how 180 features extracted from audio files map to 8 categories of emotion. The data came from the RAVDESS database, which contains 1435 speech samples performed by 24 actors, each labeled numerically with a particular type of emotion. The features were extracted using the librosa library and fall into three groups of audio features: MFCC, chroma, and mel spectrogram, each contributing its own array of sub-features for a total of 180. Achieved an accuracy of 56%.

Anime Recommendation Engine

Class Project
2023.02 - 2023.03

Led a team project for the Web and Data Mining class that builds an anime recommendation engine with the aid of class association rule mining on user data.

  • Dataset retrieved from Kaggle, containing 1,048,576 rows of user rating history for different anime titles.
  • Proposed building the recommendation model on association rule mining, which relates each user to his/her highest-rated anime, an approach not yet explored in existing models.
  • Coded in Python, the project is divided into three main parts: data transformation (creating user-anime baskets), applying the mining algorithm (finding the most frequent user-anime combinations), and mapping (mapping retrieved anime ids to their titles based on the available dataset).
  • Included supporting visualizations to display the overall distribution of anime ratings across the user pool, with statistical tools applied for mathematical analysis.
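
A minimal Python sketch of the basket-and-rule idea described above, using toy data. The function names, user ids, anime ids, and titles are hypothetical and are not taken from the project code; only pair counting and rule confidence are shown, not a full mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that co-occur in at least `min_support` baskets."""
    pair_counts = Counter()
    for items in baskets.values():
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


def rule_confidence(baskets, antecedent, consequent):
    """confidence(A -> B) = (# baskets with A and B) / (# baskets with A)."""
    with_a = [items for items in baskets.values() if antecedent in items]
    if not with_a:
        return 0.0
    return sum(consequent in items for items in with_a) / len(with_a)


# Hypothetical baskets: user id -> ids of that user's highest-rated anime.
baskets = {
    "u1": {1, 2, 3},
    "u2": {1, 2},
    "u3": {2, 3},
    "u4": {1, 2, 4},
}
# Hypothetical id -> title lookup, standing in for the mapping step.
titles = {1: "Title A", 2: "Title B", 3: "Title C", 4: "Title D"}

pairs = frequent_pairs(baskets, min_support=2)
named_pairs = {(titles[a], titles[b]): n for (a, b), n in pairs.items()}
```

A rule such as "users who rate anime 1 highly also rate anime 2 highly" could then be scored with rule_confidence(baskets, 1, 2).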

Dynamics in the World Stock Exchange Market

Independent Project

Conducts PCA on international stock exchange data to discover dynamics within the world stock market. The database contains stock exchange data for 9 countries across 536 time points from January 5, 2009 to February 22, 2011.

  • Conducted EDA on extracted data matrix: visualized correlation and covariance matrices using matplotlib and created pairplot using seaborn to conduct model-free observation on covariance among variables.
  • Conducted SVD to see if the variance of the stock market could be accounted for by one dominant factor or if stocks in different countries functioned independently. The final scree plot indicated the presence of a dominant component with significantly greater value than the other components, showcasing the interactive nature of the world stock market.
  • Manually designed the PCA algorithm in consideration of the limitation of the sklearn PCA class in producing weights for variables in each component.
  • The two primary components were visualized in a time-series plot that reveals how they vary through time.
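
As a rough illustration of how a dominant component and its per-variable weights can be obtained, here is a plain-Python sketch. It swaps in power iteration on the covariance matrix instead of the SVD used in the project, and the return values below are invented toy numbers, not the project's data.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of `rows` (observations x variables)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dominant_component(cov, iterations=500):
    """Power iteration: returns (largest eigenvalue, unit-length weight vector)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


# Hypothetical daily returns for three exchanges (one column per exchange).
returns = [
    [0.010, 0.012, 0.008],
    [-0.020, -0.018, -0.022],
    [0.015, 0.017, 0.013],
    [-0.005, -0.004, -0.007],
    [0.000, 0.002, -0.001],
]
cov = covariance_matrix(returns)
eigenvalue, weights = dominant_component(cov)
# Share of total variance carried by the first component (one point of a scree plot).
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

The weights vector plays the role of the per-variable weights mentioned above: one entry per exchange for the first component.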

Link to database: https://archive.ics.uci.edu/ml/machine-learning-databases/00247/data_akbilgic.xlsx

Seoul Bike Sales Prediction

Independent Project

This project applies the general linear model (GLM) to a database on the sales of bikes in Seoul with the aim of predicting future sales based on values for selected features including seasons, rainfall, and temperature in the region. The highlights of the project can be summarized as follows:

  • From a theoretical perspective, the use of GLM in this project provides deeper insight into the application of econometric models in machine learning.
  • From an engineering perspective, a major part of the project is feature engineering: the contribution of each feature to the overall variance of the dependent variable was evaluated through the R-squared value, which underscores the importance of effective feature selection.
  • From the perspective of EDA, it provides a new understanding of the common visual representation of a design matrix as observations against regressors, which adds to data visualization skills.
  • Compares GLM and OLS through the statsmodels library.
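
To illustrate how R-squared can rank candidate features, the sketch below fits a one-regressor least-squares line per feature in plain Python. It is a simplification that does not use statsmodels, and the temperature, rainfall, and bike count values are made up for the example rather than drawn from the Seoul dataset.

```python
def r_squared(x, y):
    """R^2 of the least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical observations (not from the Seoul dataset).
temperature = [2, 8, 15, 21, 27, 30]
rainfall = [0, 5, 1, 0, 12, 2]
bike_count = [120, 260, 480, 640, 700, 880]

scores = {
    "temperature": r_squared(temperature, bike_count),
    "rainfall": r_squared(rainfall, bike_count),
}
# The feature with the largest share of explained variance in this toy data.
best_feature = max(scores, key=scores.get)
```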

Link to database: https://archive.ics.uci.edu/ml/machine-learning-databases/00560/SeoulBikeData.csv

Denoising: Stravinsky Portrait Image Compression

Independent Project

This project revolves around image extraction and seeks an optimal way of filtering a portrait of Stravinsky by Picasso through 2 techniques: image convolution and low-rank approximation.

  • The convolution algorithm was coded manually and improves upon the convolve2d function in scipy by allowing for an arbitrary number of layers in the input image. In this way it better approximates the multilayered reality of most image processing applications.
  • The kernel matrix used for the convolution was also created manually through an algorithm that allows for arbitrary size and width values; since a large size blurs the image and a small size leads to time-consuming calculations, the challenge here was to achieve a balance through an appropriate choice of argument values. The size and width of the kernel were in this application specified as 55 and 23, respectively (the numerical values were obtained empirically).
  • The low-rank approximation technique reconstructed the filtered image from the first 80 layers (rank-1 components) of the original image matrix, calculated through SVD.
  • The results of the two techniques were compared by displaying the two error matrices, in which each cell represents the squared error between the original and the filtered image. Convolution produced a significantly smaller error than the low-rank approximation.
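
A small plain-Python sketch of per-layer convolution with a size/width kernel. The kernel shape is an assumption: the text does not say which kernel was used, so a Gaussian is taken here for illustration, and the function names and the toy 5x5 two-layer image are hypothetical.

```python
import math


def gaussian_kernel(size, width):
    """size x size Gaussian weights with spread `width`, normalized to sum to 1."""
    half = (size - 1) / 2
    raw = [[math.exp(-((r - half) ** 2 + (c - half) ** 2) / (2 * width ** 2))
            for c in range(size)] for r in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def convolve_layer(layer, kernel):
    """Same-size 2D convolution of one layer; pixels outside the image count as 0."""
    h, w, k = len(layer), len(layer[0]), len(kernel)
    off = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r - i + off, c - j + off
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * layer[rr][cc]
            out[r][c] = acc
    return out


def convolve_image(layers, kernel):
    """Apply the same kernel to every layer (e.g. each color channel)."""
    return [convolve_layer(layer, kernel) for layer in layers]


# Toy image: two identical 5x5 layers with a single bright pixel in the center.
impulse = [[1.0 if (r, c) == (2, 2) else 0.0 for c in range(5)] for r in range(5)]
image = [impulse, impulse]
kernel = gaussian_kernel(size=3, width=1.0)
blurred = convolve_image(image, kernel)
```

Convolving a single bright pixel reproduces the kernel around that pixel, which makes the effect of the size and width arguments easy to inspect before applying them to a real image.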

The second part of the project evaluates the effectiveness of SVD in denoising (projecting the noise out of an image) and involves the following steps:

  • A noise matrix was created using the sine wave grating function with a frequency of 0.02 and a rotation by 30 degrees.
  • The contaminated image was constructed by adding the noise to the original matrix with appropriate scaling applied.
  • SVD was applied to the contaminated image to find the components that contribute to the noise.
  • The image was reconstructed by leaving out the noise components to achieve a relatively clean image. However, the imperfections remaining in the cleaned image show that SVD is limited in fully isolating the components that contribute to the noise of a matrix.

Source of image: https://upload.wikimedia.org/wikipedia/en/1/1c/Stravinsky_picasso.png

Web Application Development

Independent Project

Developed an interactive web application that simulates a pizzeria using Django and deployed it on Heroku. The website is modeled on the Uber Eats platform and allows users to create and edit their own pizza menus and enrich them by filling out details about the toppings for each pizza.

The main challenge of the project lies in its overall architectural design, which could be summarized as follows:

  • Initializing the project app by setting up a virtual environment and migrating the sqlite database; the virtual environment allows for a cross-platform environment in which to run the web app.
  • Laying out web pages, forms, and resources through various modules, including settings, apps, models, forms, and views.
  • Creating forms and handling the interaction generated through GET and POST requests.
  • Customizing the web design using the Bootstrap framework, which helped render the templates in HTML.
  • Enhancing web security by protecting user privacy and defending against hackers: raising 404 and 500 errors when appropriate and hiding debugging information.
  • Deploying the web app using Heroku and git commands; learned how git keeps track of snapshots of the deployment process.

Link to website: https://centaur-pizzeria.herokuapp.com/

Game: Alien Invasion

Independent Project

The Alien Invasion project is a game written in Python using pygame. The player controls a ship that shoots bullets at falling alien fleets. While using the classic Namco arcade game Galaxian as its benchmark, the project also introduced some improvements in the following respects:

  • Added to the gameplay and user experience by introducing intelligence elements, such as allowing for virtual communication among the alien agents.
  • Enhanced the dynamics of the gaming environment through additional interactive elements, such as falling balloons that can be shot to explode into bombs that destroy further alien objects.
  • Introduced a reward system through scoring.

The overall game design follows the OOP methodology.

Office Hour Simulation

Class Project

Data structures class project that develops a simulation program in C++ for office hour visits, based on assumptions about the order of student arrival, the duration of office hours as dependent on the presence of students, the arrival rate, and the service rate. The project may be summarized as follows:

  • Framed as an operations research problem modeled on the M/M/1 queuing system (students stored in a queue; professor as the server), randomly generating values for the arrival rate based on a Poisson distribution and for the service rate based on a normal distribution.
  • The simulation runs in a for loop with 100 iterations and outputs the average time a student spends waiting during an office hour visit.
  • Automatically generates a report stored as a map structure that associates each student with the topic he/she asks about, sorted alphabetically by student name using the insertion sort algorithm.
  • Improved abstract thinking by applying various data structures and algorithms as part of problem-solving skills.
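
The queue logic above can be sketched in a few lines. This is a Python illustration, not the C++ project code: the parameter values, the time unit (minutes), and the floor on service time are assumptions, and Poisson arrivals are modeled through exponential gaps between students.

```python
import random


def simulate_office_hour(num_students, arrival_rate, service_mean, service_sd, rng):
    """Average waiting time for one simulated office hour (one server, FIFO queue)."""
    clock = 0.0            # arrival time of the current student
    server_free_at = 0.0   # time the professor finishes with the previous student
    total_wait = 0.0
    for _ in range(num_students):
        # Poisson arrivals are equivalent to exponential gaps between arrivals.
        clock += rng.expovariate(arrival_rate)
        start = max(clock, server_free_at)
        total_wait += start - clock
        # Normally distributed service time, floored so it stays positive (assumption).
        service = max(0.1, rng.gauss(service_mean, service_sd))
        server_free_at = start + service
    return total_wait / num_students


rng = random.Random(42)  # fixed seed so the run is repeatable
runs = [
    simulate_office_hour(10, arrival_rate=0.2, service_mean=4.0, service_sd=1.0, rng=rng)
    for _ in range(100)  # 100 iterations, as in the project description
]
average_wait = sum(runs) / len(runs)
```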

Puzzle Solving

Class Project

Data structures class project that develops a C++ program that searches a given word puzzle for all words of length 6 or more contained in a given dictionary:

  • The puzzle contains a string of 120,000 lower-case letters that were extracted from a text file; the challenge of the process was to convert the extracted content into a multidimensional array with appropriate dimensions (set to be 40x3000) that optimizes the searching process.
  • The dictionary words were stored in an unordered_set.
  • Develops a searching algorithm for multidimensional arrays that uses recursion to locate all dictionary words with 6 or more letters that appear in the puzzle grid horizontally, vertically, or diagonally and either forwards or backwards (but no wrap-arounds).
  • The challenge of the algorithm design was to effectively translate an abstract thinking process into line-by-line code; it exercised the ability to develop a detailed line of thinking and to break a complex problem into multiple cases. Overall, a good exercise in a programmer's mindset.
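
A compact Python sketch of the eight-direction search described above. It walks each direction iteratively rather than recursively as the C++ project did, and the 6x6 toy grid and word list are invented for the example.

```python
# Row/column steps for the 8 directions: horizontal, vertical, and diagonal,
# each both forwards and backwards.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def to_grid(text, cols):
    """Cut a flat string of letters into rows of `cols` characters."""
    return [text[i:i + cols] for i in range(0, len(text), cols)]


def find_words(grid, dictionary, min_len=6):
    """Return dictionary words of at least `min_len` letters found in the grid."""
    rows, cols = len(grid), len(grid[0])
    max_len = max((len(w) for w in dictionary), default=0)
    found = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                letters = []
                rr, cc = r, c
                # Stop at the grid edge (no wrap-around) or at the longest word length.
                while 0 <= rr < rows and 0 <= cc < cols and len(letters) < max_len:
                    letters.append(grid[rr][cc])
                    if len(letters) >= min_len and "".join(letters) in dictionary:
                        found.add("".join(letters))
                    rr += dr
                    cc += dc
    return found


# Toy puzzle: "planet" across the top row, "python" down the main diagonal,
# and "gallop" reading upwards in the first column.
text = "planet" "oyqqqq" "lqtqqq" "lqqhqq" "aqqqoq" "gqqqqn"
grid = to_grid(text, cols=6)
dictionary = {"planet", "python", "gallop", "absent", "plan"}
found = find_words(grid, dictionary)
```

Keeping the dictionary in a hash-based set, as the project did with unordered_set, makes each membership check constant time on average.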

Doublet Puzzle Solving

Class Project

A data structures class project that develops a C++ program that solves doublet puzzles by representing the given information as a graph. A doublet puzzle is a pair of words of the same length. To solve the puzzle, one must produce a sequence of words such as hate, have, lave, love. In this sequence, the first and last words are the given words, and adjacent words differ at exactly one position (so all words must have the same length). The project was based on Donald Knuth's dictionary, which contains 4500 common 5-letter words. The problem was solved by representing the given dictionary as a graph with 4500 vertices (representing the words) whose edges connect two words differing at exactly one position. The algorithm performed a breadth-first search starting with the first word until the second word was found. The project demonstrates the importance of data structures for algorithm complexity: with the graph representation and BFS, the running time was reduced from factorial order to O(V+E).
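
The breadth-first search can be sketched as follows in Python. This is an illustration under assumptions: it generates neighbors on the fly instead of building the explicit graph the C++ project used, and the six-word dictionary is a toy stand-in for the real word list.

```python
from collections import deque

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def neighbors(word, words):
    """Yield dictionary words that differ from `word` at exactly one position."""
    for i in range(len(word)):
        for ch in ALPHABET:
            candidate = word[:i] + ch + word[i + 1:]
            if candidate != word and candidate in words:
                yield candidate


def solve_doublet(start, goal, words):
    """Return a shortest word sequence from start to goal, or None if none exists."""
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in neighbors(word, words):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


# Toy dictionary (hypothetical), built around the example from the text.
words = {"hate", "have", "lave", "love", "hive", "live"}
path = solve_doublet("hate", "love", words)
```

Because BFS explores words in order of distance from the start, the first time the goal is dequeued the recovered path is a shortest one.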

Tic Tac Toe: Computer Simulation

Independent Project

A Java program that simulates the tic-tac-toe game; when run, it automatically produces a sequence of XO tables that display each move of the two players. The challenge of the project lies in the following three aspects:

  • Dynamic handling of multidimensional arrays, the most extensively used structure throughout the program; this includes determining whether collinearity occurs horizontally, vertically, or diagonally.
  • Developing an adaptive algorithm that allows for an arbitrary size of the XO table (not limited to the traditional 3x3), specified through system variables at runtime. This raises the level of challenge of the game.
  • Developing three contest scenarios: human-to-human, human-to-computer, and computer-to-computer (in which the moves are randomized), as a way to create a user-defined game that allows for flexibility.
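
A short sketch of the collinearity check for an arbitrary board size, written in Python rather than the project's Java. It assumes a win means filling an entire row, column, or main diagonal of the n x n table; the source does not state the winning rule for larger boards, and the board below is a made-up position.

```python
def winner(board):
    """Return 'X' or 'O' if one player fills a whole row, column, or diagonal, else None."""
    n = len(board)
    lines = [list(row) for row in board]                                   # rows
    lines += [[board[r][c] for r in range(n)] for c in range(n)]           # columns
    lines.append([board[i][i] for i in range(n)])                          # main diagonal
    lines.append([board[i][n - 1 - i] for i in range(n)])                  # anti-diagonal
    for line in lines:
        if line[0] is not None and all(cell == line[0] for cell in line):
            return line[0]
    return None


# Hypothetical 4x4 position: 'O' occupies the whole anti-diagonal.
board = [
    ["X", "X", None, "O"],
    ["X", None, "O", None],
    [None, "O", "X", None],
    ["O", None, None, "X"],
]
result = winner(board)
```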

Education

Bachelor of Science - Computer Science And Mathematics

Santa Clara University
500 El Camino Real, Santa Clara, CA-95053
2020.09 - Current

Skills

Python/C++/Java/MATLAB/Scala

