Yelp Insights: Data Visualization

Providing Business Owners with a Better Way to Compare Yelp Information

This project aims to provide a visualization tool that illustrates the correlations between the ratings of a particular business and the secondary features that it offers. We plan to use data from Yelp to provide insight into what makes a business successful. The primary purpose is to inform prospective business owners about what their business requires to be "successful" in a particular city or area.

This project is an entry for the Yelp data contest and was created with the visualization software Tableau.

Info Visualization
HCED 511
9 weeks

My Roles
Data Visualisation
Interface Design
User Testing

Collaborators
Ying Zheng
Nikhil Venkatesh
Wei-Hung Hsieh


Research


Hypothesis 1: Different kinds of businesses have different secondary features that contribute towards higher star ratings.

Hypothesis 2: For a given business type, the secondary features contributing towards higher ratings vary between different cities / locations.

These two hypotheses are the core of our project. For this project, we plan to use the Yelp Academic Challenge dataset, which gives us enough JSON data to work with. It is formatted well enough to be easy to handle, but it still requires cleaning and pre-formatting before it can be used effectively.

The final product will give our users a quick, visual way to get a sense of the market in their location of interest. We believe this will be useful for making business decisions at an early planning stage and will cut the time and resources needed to conduct market analysis.





Key research questions:

1. Is there a correlation between the extra features that a business offers and the ratings it receives on Yelp? (e.g., free WiFi, coffee, quiet environments, easy parking)

2. What are these extra features that highly rated restaurants have in common?

3. Are these features universal or do they vary based on the type of location? Are some features more universal than others?

4. Are these features specifically tied to a certain type of restaurant or are they universal?
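
As a rough illustration of the first question, the relationship between a yes/no feature and star ratings can be summarized with a Pearson correlation coefficient. This is a minimal sketch and not the analysis we ran in Tableau: the `pearson` helper and the sample values are made up for illustration, and a correlation on its own does not show that a feature causes higher ratings.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample: 1 if the business offers free WiFi, 0 otherwise.
has_wifi = [1, 1, 1, 0, 0, 0]
stars = [4.5, 4.0, 3.5, 3.5, 3.0, 2.5]

r = pearson(has_wifi, stars)
print(f"correlation between WiFi and stars: {r:.2f}")
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.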




Data Profile

We analyzed the dataset to profile the data, identify what formatting was necessary, and anticipate the challenges we would face. The data was provided as a JSON file. After converting the file to CSV format, we identified the fields useful for our analysis.
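
The JSON-to-CSV conversion could be sketched as follows. This is an assumption-laden example rather than the script we used: it assumes the file holds one JSON object per line with nested attributes, and the field names in the sample records (`name`, `city`, `stars`, `attributes`, `categories`) are illustrative.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = "; ".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON line; columns are the union of all keys."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    columns = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns


# Two made-up records shaped like the assumed input format.
sample = [
    '{"name": "Cafe A", "city": "Las Vegas", "stars": 4.5, '
    '"attributes": {"Wi-Fi": "free", "Noise Level": "quiet"}}',
    '{"name": "Diner B", "city": "Urbana", "stars": 3.0, '
    '"attributes": {"Wi-Fi": "no"}, "categories": ["Diners", "Restaurants"]}',
]

buffer = io.StringIO()
columns = json_lines_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Taking the union of all keys as the column set means a business that lacks an attribute gets an empty cell rather than a misaligned row, which makes gaps in the data easy to spot during profiling.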

NOTE: The points outlined below were gathered from a sample set of the two cities we are focusing on (Urbana-Champaign and Las Vegas), approx. 24,400 rows.



Data Visualization


We listed all the variables available in the Yelp dataset and identified the key variables that could help answer our research questions. We then spent a few minutes brainstorming and sketching on paper. The ideas included a map showing how different types of restaurants cluster in the two cities, bar charts illustrating restaurants' average ratings by attribute, and a packed bubble chart displaying the importance of each attribute.
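
The numbers behind a bar chart of average ratings by attribute can be sketched in a few lines. This is only an illustration of the aggregation (Tableau did this for us in the actual project); the `average_stars_by` helper, the `wifi` field name, and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_stars_by(rows, attribute):
    """Group rows by an attribute's value and average the star ratings."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(attribute)
        if value is None:
            continue  # skip businesses that do not report this attribute
        groups[value].append(row["stars"])
    return {value: round(mean(stars), 2) for value, stars in groups.items()}


# Made-up sample rows.
restaurants = [
    {"name": "A", "stars": 4.5, "wifi": "free"},
    {"name": "B", "stars": 4.0, "wifi": "free"},
    {"name": "C", "stars": 3.0, "wifi": "no"},
    {"name": "D", "stars": 3.5, "wifi": "no"},
    {"name": "E", "stars": 5.0},  # attribute missing
]

averages = average_stars_by(restaurants, "wifi")
print(averages)
```

Each key in the result corresponds to one bar, and its value to the bar's height.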



We started by designing the basic layout of the visualization in wireframe.



Based on the feedback from the first user test, we used Tableau to create the second iteration of the interface. In this version, we added household income data from Zillow to the Tableau map to show more neighborhood information. We also removed the attributes that our users said were not necessary. To show the distribution of ratings, we added a boxplot layered on top of the individual restaurant ratings above the map view. We also used different densities of red to show the variance in star ratings.
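
For readers unfamiliar with boxplots, the five values a boxplot draws can be computed as in the sketch below. The sample ratings are made up, and the quartile method shown (Python's "inclusive" method) is an assumption that may differ slightly from the one Tableau applies.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the five-number summary that a boxplot is drawn from."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Made-up star ratings for restaurants in one neighborhood.
ratings = [2.5, 3.0, 3.5, 4.0, 4.5]
summary = box_plot_summary(ratings)
print(summary)
```

Layering the box over the individual ratings lets a viewer see both the summary and the points it was computed from.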















Our final prototype includes a map view of the distribution of restaurants in the city, a scatter plot of restaurants aggregated by cuisine type, a stacked horizontal bar chart of restaurants' WiFi availability, and a line graph illustrating the relationship between price range and star rating, moderated by noise level. We used color to represent the different cuisine types so that users can easily compare the distribution of two or more cuisine types on the map.
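
The data behind the line graph can be thought of as one series per noise level, with the mean star rating at each price range. The sketch below illustrates that aggregation only; the field names (`noise_level`, `price_range`, `stars`) and the sample rows are hypothetical, and the real chart was built in Tableau.

```python
from collections import defaultdict


def mean_stars_by_price_and_noise(rows):
    """Return {noise_level: [(price_range, mean_stars), ...]} sorted by price."""
    totals = defaultdict(lambda: [0.0, 0])  # (noise, price) -> [sum, count]
    for row in rows:
        key = (row["noise_level"], row["price_range"])
        totals[key][0] += row["stars"]
        totals[key][1] += 1

    series = defaultdict(list)
    for (noise, price), (total, count) in sorted(totals.items()):
        series[noise].append((price, total / count))
    return dict(series)


# Made-up sample rows.
restaurants = [
    {"noise_level": "quiet", "price_range": 1, "stars": 4.0},
    {"noise_level": "quiet", "price_range": 1, "stars": 3.0},
    {"noise_level": "quiet", "price_range": 2, "stars": 4.5},
    {"noise_level": "loud", "price_range": 1, "stars": 3.0},
    {"noise_level": "loud", "price_range": 2, "stars": 3.5},
    {"noise_level": "loud", "price_range": 2, "stars": 2.5},
]

series = mean_stars_by_price_and_noise(restaurants)
print(series)
```

Each noise level becomes one line on the graph, so a gap between the lines at a given price range suggests that noise level changes how price relates to rating.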






Final Thoughts


We followed a human-centered design process to create this visualization. In the ideation phase, we used descriptive analysis to examine the data profile, identified variables suitable for business decision making, and sketched paper prototypes. We conducted four interviews with our target users and three usability studies. Our prototypes evolved from paper drawings to a clickable InVision mock-up, and eventually to a high-fidelity prototype in Tableau. What follows is a detailed analysis of our visualization, its limitations, and possible directions for future iterations. This visualization is meant to serve as a starting point that market analysts can use as a guide and build on.