The Stickiness of Restaurant Lists



Eater has a great article on a Yale-educated pair, Jane and Michael Stern, who traveled the United States reviewing regional and authentic food before the Fieri era. It's an interesting read, and it talks a lot about how food is viewed in the United States and what we value. One of my favorite passages is at the end:

"We see the nation's diet very much like its language," Jane and Michael write in The Lexicon of Real American Food, their 2011 guide to the strange words and beautiful phrases that American cuisine has introduced to the world, from Biloxi bacon to hanky panky sausages. The culinary idioverse of the United States is, "sometimes vexing for its vulgarity and its disdain of high-minded principles, but endlessly, endearingly exuberant."

America is a land of y'alls and yousguyses, of fresh ingredients and canned staples sitting side-by-side. It is, as the Sterns have chronicled, dynamically messy, and there's nothing more singularly wonderful than this patchworked quilt of flavors and textures, threaded with the stories of the waves upon waves of immigrants that make up our dining horizon. For those who look for the glory in the edible chaos, Jane and Michael Stern stand as a reminder that a cuisine that's rough around the edges is nothing short of beautiful.

Their website, based on their books and travels, is an interesting list of restaurants known for their authenticity and for serving regional food. This definition of 'regional', though, has come at the expense of the 'immigrant America' described in the article: there are few ethnic food joints on the list, which favors the traditional over the new. I would even go as far as to say that some of the listed restaurants are vastly overrated and overpriced.

What I also find disappointing is how similar all of these lists are to each other. In Diners, Drive-ins, and Dives, Guy Fieri visits many of the same restaurants listed on roadfood.com. In a city as large as Los Angeles, it's baffling that the same few restaurants are selected every time. In my own experience, these places also aren't the best (subjectively speaking, of course).

Take, for example, Pie 'n Burger in Pasadena. A trip there is equivalent to paying quadruple the price for an In-N-Out burger (In-N-Out, surprisingly and thankfully, is also listed by the Sterns). Pie 'n Burger regularly appears on lists of LA's best burgers and has a rich history. As a Pasadena native, though, I find it incredibly disappointing, even from an objective standpoint, and I wonder if these lists are just sticky and nostalgic. Pie 'n Burger takes only cash. It has 3.5 stars on Yelp.

How many other places like Pie 'n Burger are there? I wanted to break down the roadfood list and compare the restaurants by how they perform on Yelp. I don't have time in this post for a detailed analysis of cuisine and how it varies, but the data is there, so that's a topic for another procrastination session. For this analysis, two criteria stood out: the rating and the number of reviews. Roadfood's list has 1,842 restaurants with an average rating of 3.65 stars and an average of 254.23 reviews per restaurant, for a total of 468,294 reviews. Below is arguably the most important descriptor, the rating. You can search by state to see the distribution for that state.
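The headline figures above are simple aggregates over the matched restaurants. Here is a minimal sketch of how they could be computed, assuming each restaurant is a dict with hypothetical `"rating"` and `"review_count"` keys; the sample rows are made up and are not the real roadfood/Yelp data.

```python
from statistics import mean


def summarize(restaurants):
    """Return (count, mean rating, mean reviews per restaurant, total reviews).

    Each restaurant is assumed to be a dict with hypothetical keys
    "rating" (Yelp stars) and "review_count" (number of Yelp reviews).
    """
    ratings = [r["rating"] for r in restaurants]
    reviews = [r["review_count"] for r in restaurants]
    return len(restaurants), mean(ratings), mean(reviews), sum(reviews)


# Made-up sample rows, not real roadfood/Yelp data.
sample = [
    {"name": "Restaurant A", "state": "CA", "rating": 3.5, "review_count": 400},
    {"name": "Restaurant B", "state": "CT", "rating": 4.0, "review_count": 120},
    {"name": "Restaurant C", "state": "ND", "rating": 4.5, "review_count": 8},
]

count, avg_rating, avg_reviews, total_reviews = summarize(sample)
print(count, round(avg_rating, 2), round(avg_reviews, 2), total_reviews)
```

The quoted figures are internally consistent: 468,294 total reviews divided by 1,842 restaurants is about 254.23 reviews per restaurant.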


The charts above don't make it easy to compare distributions across states, though, so I made the following graph, reminiscent of this New York Times graphic on 2016 candidates and their truthfulness.


I've sorted the states by the sum of their 4.5- and 5-star restaurants. This gives one perspective; another ordering would compare only the five-star restaurants, or the number below three stars. Some states immediately stand out with odd-looking distributions, because a few states have very few data points. That makes the number of restaurants per state an important metric, and comparing it with each state's population gives the number of restaurants per person. Below are the states ordered by how many restaurants they have; hover over them to see the exact figure.
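The ordering described above can be sketched as a sort key. The post doesn't say whether it sums counts or percentages; this sketch assumes shares (fractions of each state's restaurants), since the graph compares distributions, and it keeps the restaurant count alongside so states with few data points are easy to spot. The `"state"`/`"rating"` keys and the sample rows are hypothetical.

```python
from collections import defaultdict


def sort_states_by_top_share(restaurants, threshold=4.5):
    """Order states by the share of their restaurants rated >= threshold.

    Each restaurant is assumed to be a dict with hypothetical "state" and
    "rating" keys. Returns (state, share, n_restaurants) tuples, highest
    share first.
    """
    totals = defaultdict(int)
    top = defaultdict(int)
    for r in restaurants:
        totals[r["state"]] += 1
        if r["rating"] >= threshold:
            top[r["state"]] += 1
    rows = [(state, top[state] / n, n) for state, n in totals.items()]
    # Highest share first; among ties, states with more data points first.
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows


# Made-up ratings for illustration only.
sample = [
    {"state": "HI", "rating": 4.5},
    {"state": "HI", "rating": 4.0},
    {"state": "CT", "rating": 5.0},
    {"state": "CT", "rating": 4.5},
    {"state": "CT", "rating": 3.5},
    {"state": "AK", "rating": 2.5},
]

for state, share, n in sort_states_by_top_share(sample):
    print(f"{state}: {share:.0%} of {n} restaurants rated 4.5+")
```

Passing `threshold=5.0` gives the five-star-only ordering mentioned as an alternative.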


And these are the states ordered by population:


It's difficult to draw conclusions from those two figures alone, so let's combine them. There are about 322,363,000 people in the United States and 1,842 roadfood restaurants, which works out to one restaurant per 175,007 people. Below is a graph of the number of people per restaurant in each state, with the national average as the dividing line. You can hover over each bar to learn which state it represents.


Kansas barely registers, but it's there: it is the closest to the national average. Among the states with the most restaurants per person, many are in New England (Connecticut, Maine, and Vermont are the top three). Why might that be?

The likely explanation is that the Sterns are from the Nutmeg State (Connecticut), and it's easier to visit the restaurants closest to home. The larger states, such as California, New York, and Texas, still finish strong in the total number of restaurants.
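The people-per-restaurant comparison reduces to one division per state plus a split around the national figure. This is a sketch with hypothetical input dicts keyed by state code; the state numbers are invented for illustration, while the national total uses the figures quoted above. Fewer people per restaurant means a denser state.

```python
def people_per_restaurant(population, restaurant_counts):
    """Map each state to residents per roadfood restaurant.

    Both arguments are hypothetical dicts keyed by state code; states with
    no restaurants (or no population figure) are skipped.
    """
    return {
        state: population[state] / count
        for state, count in restaurant_counts.items()
        if count > 0 and state in population
    }


def split_by_national_average(per_state, national):
    """Return (denser, sparser): states with fewer people per restaurant
    than the national average, and states with at least as many."""
    denser = sorted(s for s, v in per_state.items() if v < national)
    sparser = sorted(s for s, v in per_state.items() if v >= national)
    return denser, sparser


national_average = 322_363_000 / 1_842
print(round(national_average))  # 175007

# Invented figures, not the post's data.
population = {"CT": 3_600_000, "CA": 39_000_000}
restaurant_counts = {"CT": 60, "CA": 150}
per_state = people_per_restaurant(population, restaurant_counts)
print(split_by_national_average(per_state, national_average))
```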

Let's get back to the original aim of figuring out the ratings of these restaurants. The state with the highest average rating is Hawaii, with an average rating of about 4.14 stars, and the state with the lowest average is Alaska, with only 2.25 stars. We can see this in the map below (hover over each state for the average rating). Most states have a pretty decent average, which is why I didn't separate the states by quantiles this time.

We can also look at the number of reviews each restaurant has. A large part of a restaurant's authenticity is how hole-in-the-wall it is. Of course, as a restaurant gains media traction, that reputation fades. And becoming popular isn't necessarily a bad thing either! The line graph below represents all 1,842 restaurants. If you mouse over the chart, it's divided by state, so you can see the relative number of restaurants in each state (as well as the average number of reviews).

The states with the highest average number of reviews are Washington, D.C., with 1,227 reviews (D.C. is ignored for most of the rest of the analysis), Hawaii with 1,004, and California with 994. North Dakota, in contrast, had an average of only 7 Yelp reviews per restaurant.
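Both per-state rankings in this section (average rating, with Hawaii highest and Alaska lowest, and average review count) come from the same group-and-average step. A minimal sketch, assuming dicts with hypothetical `"state"`, `"rating"`, and `"review_count"` keys; the `exclude` parameter is one way to drop D.C. as described above. The sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def state_averages(restaurants, field="rating", exclude=()):
    """Return {state: mean of `field`}, skipping state codes in `exclude`.

    Restaurants are assumed to be dicts with a hypothetical "state" key plus
    the numeric field being averaged (e.g. "rating" or "review_count").
    """
    by_state = defaultdict(list)
    for r in restaurants:
        if r["state"] not in exclude:
            by_state[r["state"]].append(r[field])
    return {state: mean(values) for state, values in by_state.items()}


def extremes(averages):
    """Return ((highest_state, value), (lowest_state, value))."""
    highest = max(averages.items(), key=lambda kv: kv[1])
    lowest = min(averages.items(), key=lambda kv: kv[1])
    return highest, lowest


# Made-up rows for illustration only.
sample = [
    {"state": "HI", "rating": 4.5, "review_count": 900},
    {"state": "HI", "rating": 4.0, "review_count": 1100},
    {"state": "AK", "rating": 2.0, "review_count": 30},
    {"state": "AK", "rating": 2.5, "review_count": 50},
    {"state": "DC", "rating": 4.0, "review_count": 1500},
]

print(extremes(state_averages(sample)))
print(extremes(state_averages(sample, field="review_count", exclude=("DC",))))
```

Averages over states with only a handful of restaurants are noisy, so it can help to report each state's restaurant count next to its average.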

Combining these two metrics gives a scatter plot. I wanted to investigate whether they were correlated: whether a hole-in-the-wall restaurant (fewer reviews) tended to have a higher rating, and whether an overrated restaurant (many reviews) tended to have a lower one. Drawing each restaurant with low opacity and comparing the darker and lighter patches was supposed to reveal the answer.



There actually isn't much of a correlation, but it's awesome that Bi-Rite Creamery has so many reviews and still maintains a 4.5-star rating.

I also think roadfood.com ignored some otherwise highly recommended places, like Chez Panisse. I personally haven't eaten at Chez, but it's known for starting California cuisine and is as important a regional restaurant as any. Another snub is Cheeseboard, a Berkeley and Bay Area trendsetter. The point of roadfood.com isn't simply to mimic the Michelin list, but I can't seem to figure out their system.

All of this really just brings me to my own recommendations for Berkeley. These were chosen because they had interesting flavors and a certain charm. There are a lot of great restaurants in my rotation, and a bunch of places excellent for lunch, but the ones selected below are for visitors in the area. I could list more, but here are my current top six (all vegetarian-friendly!):

  1. Easy Creole — An excellent Creole place, and parent-approved! I'd make sure to try all of their hot sauces while you're there. You really can't go wrong with any of the options.
  2. Rangoon Superstar — Burmese food is nice because it's a mix of Indian, Chinese, and Thai. That's not to say that Burma (Myanmar) doesn't have its own cuisine! My last two experiences here have been lackluster, but I still return for the chance to repeat my first taste of their samosa soup.
  3. Cheeseboard / Sliver — I'm sure somebody would be upset that I lumped these two together. Both are pizza shops with a special daily vegetarian pizza.
  4. Makris Cafe — Now we get to the hole-in-the-wall places. I enjoy Makris because it's a Korean-breakfast fusion joint. Their bulgogi burrito is probably the best in the world, but probably only because it's the only one that exists.
  5. Namaste Pizza — Indian pizza; my personal favorite is the aloo gobi.
  6. Ann's (now Bleeker Bistro) — This will forever be Ann's to me. The home fries here are the best I've ever had.

You can also check out the Yelp rating of any of the restaurants in the roadfood list below.



Notes on this Analysis

Compared to my previous projects, this one required a bit more work on the programming side. I started with the list of restaurants that roadfood recommends (they provide it here, conveniently, in .csv format). Then I used the Yelp API to find the Yelp rating of each one. Usually the API found the restaurant, but sometimes there were hiccups, and I had to handle a few cases manually.

The roadfood list came with a few details, albeit in an awkward format, so I used some regex to clean it up. Once I did, I had the latitude/longitude, the phone number, and the address of each restaurant, in addition to its name. I queried the Yelp API using the name and street address, and accepted a result as a match if it had the same phone number or the same name. I also had to tune the program a little, based on the first hundred or so results. One tweak was replacing every instance of "Bar-B-Q" with "barbeque." Another was comparing names with their spaces removed (in one case the two sources had "R D's drive-in" and "RD's drive-in"). At this point about 10% of lookups were imperfect matches, but most of those were actually false negatives. I even found a restaurant whose two phone numbers differed by a typo, and I have no idea which one was accurate.
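The matching rules above (phone or name agreement, "Bar-B-Q" rewritten as "barbeque", spaces ignored in names) can be sketched as a few small functions. This is not the author's script: the record fields `"name"` and `"phone"` are hypothetical, the Yelp query itself is left out, and keeping the last 10 digits of a phone number is an assumption that only suits US/Canada numbers.

```python
import re


def normalize_name(name):
    """Canonicalize a restaurant name before comparing.

    Lowercases, rewrites "bar-b-q" as "barbeque", and removes all whitespace
    so that "R D's drive-in" and "RD's drive-in" compare equal.
    """
    name = name.lower().replace("bar-b-q", "barbeque")
    return re.sub(r"\s+", "", name)


def normalize_phone(phone):
    """Keep only digits and take the last 10 (assumes US/Canada numbers)."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:]


def is_match(roadfood, yelp):
    """Accept a Yelp result if the phone numbers or normalized names agree."""
    phone_a = normalize_phone(roadfood.get("phone"))
    phone_b = normalize_phone(yelp.get("phone"))
    if phone_a and phone_a == phone_b:
        return True
    return normalize_name(roadfood["name"]) == normalize_name(yelp["name"])


print(is_match({"name": "R D's Drive-In"}, {"name": "RD's Drive-In"}))  # True
```

Stripping punctuation as well as spaces would catch more variants, at the cost of more false positives.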

It would have been a lot of work to correct each mistake and hope that the fix also solved future problems, so I implemented a command-line prompt that asked me to confirm manually whenever the Yelp API didn't return an exact match. This increased the amount of work for me, but it just meant sitting at the terminal for a bit (this part was actually a lot more manual than I would have liked, and took longer than expected). I didn't look up the restaurants on Yelp by hand, though; that would have been too much work. Through this manual process, I excluded 73 'bad' restaurants, meaning restaurants that didn't have a match on Yelp. Unfortunately, I also had to exclude Canadian restaurants. I manually approved about 80 restaurants, so in total my Python script found matches automatically about 92% of the time.
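A confirmation prompt like the one described can be quite small. In this sketch, `confirm_match` and `resolve_uncertain` are hypothetical names, not the author's code. The `ask` parameter defaults to the built-in `input()` and exists only so the prompt can be scripted or tested. Pairs with no Yelp candidate at all are excluded without prompting.

```python
def confirm_match(roadfood, candidate, ask=input):
    """Ask at the terminal whether a non-exact Yelp result is the same place.

    `ask` defaults to the built-in input(); it is a parameter only so the
    prompt can be scripted or tested. Re-asks until it gets y/yes or n/no.
    """
    prompt = (
        f"roadfood: {roadfood['name']} ({roadfood.get('address', '?')})\n"
        f"Yelp:     {candidate['name']} ({candidate.get('address', '?')})\n"
        "Same restaurant? [y/n] "
    )
    while True:
        answer = ask(prompt).strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False


def resolve_uncertain(pairs, ask=input):
    """Split (roadfood, candidate_or_None) pairs into approved and excluded.

    A pair with no Yelp candidate at all is excluded without prompting.
    """
    approved, excluded = [], []
    for roadfood, candidate in pairs:
        if candidate is not None and confirm_match(roadfood, candidate, ask):
            approved.append((roadfood, candidate))
        else:
            excluded.append(roadfood)
    return approved, excluded
```

One reading of the numbers above that reproduces the 92% figure (an assumption about how it was computed): (1,842 − 80) automatic matches out of 1,842 + 73 attempted US restaurants, or 1,762 / 1,915 ≈ 0.92.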

Once I had the data, I used d3.js to create the charts, as usual, along with a few useful new libraries. One was typeahead.js, a suggestion engine for my input boxes. Another was underscore.js, which was useful for finding specific items and sorting. Throwing a page into import.io often speeds up the process of scraping a website, though this only works for very simple pages, and I wouldn't recommend their desktop application. If you're looking for a good d3.js tutorial, I used Scott Murray's for reference throughout this project.

This project was mainly a way to practice my d3.js skills, so I realize that a few of the visualizations are useless or inelegant. Next time, I'll spend more time on design instead of building any and all visualizations. I wanted to add even more to this, actually, but I felt it was time to move on. If you're interested in the dataset, please let me know. Some suggestions for further work: a K-S test, plots of individual restaurants (the latitude and longitude are included), and box-and-whisker plots added to the scatter plot.
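For the suggested K-S test, the two-sample statistic is the largest vertical gap between two empirical CDFs, for example the ratings of two states. Here is a minimal sketch of the statistic only, with no p-value. `scipy.stats.ks_2samp(data1, data2)` returns both if SciPy is available. Yelp's half-star ratings are discrete, while the test's standard p-values assume continuous data, so treat those p-values with some caution.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs of the two samples. The ECDFs only
    change at observed values, so evaluating them at every value that
    appears in either sample is enough to find the maximum gap.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in set(a) | set(b):
        f_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


print(ks_statistic([3.0, 4.0], [4.0, 5.0]))  # 0.5
```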

This website looks best on the screen I used to code it (Chrome + Yosemite + 13"). Please send all comments / questions / critiques / suggestions to jmahabal@berkeley.edu. You can check out my Python code here. Thanks for reading!