In this notebook we analyse the neighbourhoods of Rome (Italy) to find the best places to open a new Pizza Shop.
As suggested by many resources about this subject, three of the most important things for choosing a new restaurant location are:
So, considering the information about Rome's neighbourhoods that we can retrieve from online sources, we need to find insights that answer the following question:
Where in Rome can we find a place that is easy to access, attracts many visitors, and does not have too much competition for a Pizza Shop?
To simplify the analysis, we treat tube stops as the only places that satisfy points 1 and 2 above. The question to answer therefore becomes:
Where in Rome can we find a place that is near a tube stop and does not have too much competition for a Pizza Shop?
The Foursquare API will be used to explore the neighbourhoods of Rome. Its explore function returns the most common venue categories in each neighbourhood, and this feature is then used to group the neighbourhoods into clusters.
The k-means clustering algorithm will be used to create the clusters of neighbourhoods, and the Folium library to visualize the neighbourhoods and their clusters.
Neighbourhood information will be collected from Wikipedia:
Tube stop information will be collected from Wikipedia:
Scraping the above pages with the BeautifulSoup library, and converting addresses into latitude and longitude with the geopy library, will provide the information needed to use the Foursquare API.
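The scraping step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes the names sit in the first column of a table with the `wikitable` class, and both the `parse_names` helper and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_names(html, table_class="wikitable"):
    """Return the first-column text of every data row in the first matching table.

    Assumption: the page lists one item per row, with the name in the first cell.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return []
    names = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # header rows contain only <th> cells, so they are skipped
            names.append(cells[0].get_text(strip=True))
    return names


# Hypothetical stand-in for a downloaded page.
sample_html = """
<table class="wikitable">
  <tr><th>Name</th></tr>
  <tr><td> Monti </td></tr>
  <tr><td>Trevi</td></tr>
</table>
"""
print(parse_names(sample_html))
```

Each extracted name can then be passed to a geocoder, such as geopy's `Nominatim`, to obtain its latitude and longitude.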
After creating the clusters of neighbourhoods we will be able to get the needed insights.
For both the neighbourhood and the tube stop datasets, the “Latitude” and “Longitude” columns need to be converted to a numerical type (this is required by the Folium visualisation library).
Using the Folium library we visualize the two datasets above on a map of Rome (the blue circles are the neighbourhood locations and the red ones are the tube stop locations):
We start by getting information about food-related neighbourhood venues (restaurants, cafes, pizza places, etc.) using the Foursquare API:
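A minimal sketch of this conversion with pandas is shown below. The two-row table and its coordinates are illustrative placeholders rather than the notebook's data.

```python
import pandas as pd

# Toy stand-in for a scraped table: the coordinates arrive as strings.
# The values are illustrative, not taken from the notebook.
neighbourhoods = pd.DataFrame({
    "Neighbourhood": ["Monti", "Trevi"],
    "Latitude": ["41.8950", "41.9009"],
    "Longitude": ["12.4930", "12.4833"],
})

for column in ("Latitude", "Longitude"):
    # errors="coerce" turns unparseable values into NaN instead of raising.
    neighbourhoods[column] = pd.to_numeric(neighbourhoods[column], errors="coerce")

print(neighbourhoods.dtypes)
```

The same loop would apply unchanged to the tube stop dataset.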
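The request and the parsing of its response might look like the sketch below. It assumes the legacy Foursquare v2 `venues/explore` endpoint with `section="food"` and the 500-meter radius mentioned in this report. The notebook's actual call is not shown, so the helper names, the version date and the sample payload are all hypothetical.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build the query URL for food venues around one location."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date (YYYYMMDD); value is an assumption
        "ll": f"{lat},{lng}",
        "radius": radius,       # meters
        "limit": limit,
        "section": "food",      # restrict results to food-related venues
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_venues(payload):
    """Flatten an explore response into one dict per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "name": venue["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
                "category": categories[0]["name"] if categories else None,
            })
    return rows


# Hypothetical payload in the shape the v2 endpoint returned.
sample = {"response": {"groups": [{"items": [{"venue": {
    "name": "Example Pizzeria",
    "location": {"lat": 41.9, "lng": 12.5},
    "categories": [{"name": "Pizza Place"}],
}}]}]}}
print(extract_venues(sample))
```

A neighbourhood for which `extract_venues` returns an empty list has no food venues within the search radius.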
Grouping the results per neighbourhood, we notice that Foursquare returns no venue information for three Rome neighbourhoods (for the executed search of food-related venues within a 500-meter radius of each neighbourhood location):
So we exclude the three neighbourhoods listed above from the rest of the data analysis and results.
After that we analyse the frequency of venue categories per neighbourhood, finding the most common food-related venue category for each neighbourhood:
We need to find similarities among the neighbourhoods based on food venue categories, so we cluster them (that is, elements in the same cluster are more similar to each other than to elements in different clusters) using the k-means clustering algorithm applied to the following dataset (obtained from the last table in the previous point):
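One common way to compute these frequencies, sketched here with pandas on made-up data, is to one-hot encode the venue categories and average them per neighbourhood. The neighbourhood labels and venue rows are placeholders, and the notebook's own implementation may differ.

```python
import pandas as pd

# Toy venue list: one row per venue returned for a neighbourhood.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "B"],
    "Category": ["Pizza Place", "Pizza Place", "Café",
                 "Italian Restaurant", "Italian Restaurant", "Café"],
})

# One column per category, 1.0 where the venue belongs to it.
one_hot = pd.get_dummies(venues["Category"], dtype=float)
one_hot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of the indicator columns = share of each category per neighbourhood.
profile = one_hot.groupby("Neighbourhood").mean()

# Most frequent category for each neighbourhood.
top_category = profile.idxmax(axis=1)
print(top_category)
```

The `profile` table (one row per neighbourhood, one column per category) is the kind of input a clustering algorithm can work with directly.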
The k-means clustering results are the following clusters (shown in the map below):
Cluster 1 (Red):
Cluster 2 (Purple):
Cluster 3 (Blue):
Cluster 4 (Light Blue):
Cluster 5 (Green):
Cluster 6 (Light Green):
Cluster 7 (Light Yellow):
Cluster 8 (Orange):
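The clustering step can be sketched with scikit-learn as follows. The frequency table here is a toy example with two obvious groups, so it uses `n_clusters=2`, whereas the analysis above produced eight clusters. The `cluster_neighbourhoods` helper and the seed are assumptions, not the notebook's code.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighbourhoods(profile, n_clusters, seed=0):
    """Assign each row of a category-frequency table to a k-means cluster."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(profile.to_numpy())
    return pd.Series(labels, index=profile.index, name="Cluster")


# Toy frequency table: A and B are pizza-heavy, C and D are café-heavy.
profile = pd.DataFrame(
    {"Café": [0.1, 0.2, 0.8, 0.7], "Pizza Place": [0.9, 0.8, 0.2, 0.3]},
    index=["A", "B", "C", "D"],
)
clusters = cluster_neighbourhoods(profile, n_clusters=2)
print(clusters)
```

The numeric labels themselves are arbitrary. Only the grouping they induce is meaningful.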
Using the Folium library we can visualize the above clusters on a map:
Now we use the haversine formula (from https://kanoki.org/2019/02/14/how-to-find-distance-between-two-points-based-on-latitude-and-longitude-using-python-and-sql/) to calculate the distance between each neighbourhood (among those included in the clusters) and the tube stop locations. For each neighbourhood we consider only the tube stops within a radius of 500 meters (the same radius used for the venue search with the Foursquare API).
The following tables are the result of merging the neighbourhood clusters by venue with the nearby tube stops:
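A minimal, dependency-free sketch of the distance computation and the 500-meter filter is given below. It assumes a mean Earth radius of 6,371 km. The function names and the sample coordinates are illustrative and not taken from the notebook or the linked article.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def stops_within(neighbourhood, stops, radius_m=500):
    """Return the names of the stops no farther than radius_m from the neighbourhood."""
    lat, lon = neighbourhood
    return [name for name, (s_lat, s_lon) in stops.items()
            if haversine_m(lat, lon, s_lat, s_lon) <= radius_m]


# Illustrative coordinates: one stop about 220 m away, one several km away.
stops = {"near stop": (41.902, 12.5), "far stop": (41.95, 12.5)}
print(stops_within((41.9, 12.5), stops))
```

Neighbourhoods for which `stops_within` returns an empty list have no tube stop inside the search radius.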
Cluster 1 (Red):
Cluster 2 (Purple):
Cluster 3 (Blue):
Cluster 4 (Light Blue):
Cluster 5 (Green):
Cluster 6 (Light Green):
Cluster 7 (Light Yellow):
Cluster 8 (Orange):
An analysis of food-related venues and tube stops in Rome's neighbourhoods was carried out to identify the best places for opening a Pizza Shop.
Cluster 3 (blue) is composed of neighbourhoods with Pizza Place as the most common food-related venue category.
With respect to the question
"Where in Rome can we find a place that is near a tube stop and does not have too much competition for a Pizza Shop?"
it was found that some neighbourhoods satisfy these requirements (see the last table in the "Results" section above).
These results should be interpreted with caution for the following reason:
The reliability of the results could be improved by removing one or more of the limits listed above in further research.
The "Introduction" section of this report hypothesises that the best place for a Pizza Shop is one with few competitors and many people passing nearby. After analysing the data (see the "Data" section above), the best choices for a Pizza Shop (see the last table in the "Results" section above) seem to be the following neighbourhoods:
As stated in the "Discussion" section, strong limits were placed on the hypothesis to simplify the data analysis. I recommend removing these limits in further research.