How to Improve Holiday Itinerary with Machine Learning
In October 2019, my friends and I are planning our awesome holiday in Greece and Amalfi. The way we plan our perfect holiday itinerary is by using Google Map. We search for the places to visit and pin markers on Google Map for attractions that might interest us.
For me, to optimise our vacation to the fullest, as to cover as much ground as possible; I have three questions:
- How many days should I spend at one location?
- Which attractions/locations should I visit each day?
- What is the most optimal route from a place to another?
Let me show you the itinerary my algorithms have proposed for us. It is recommending the best sequence of places to visit, day by day.
- Start from Athens, visit the Acropolis and other historic archaeological sites. That’s a great start!
- Travel to Santorini, a beautiful island with whitewashed homes and picture-perfect sunsets.
- Sail to Crete, for a road trip adventure of a lifetime, by exploring hidden corners and tasting the best of Greek cuisine.
- Fly to Italy and travel to Amalfi and Ravello where the terraces are blending along the cliffs with the most spectacular coastal scenery.
- Travel to Sorrento and Naples, indulge in the best pasta, pizza, limoncello, and gelato.
Great holiday itinerary isn’t it?! This sequence of locations is fully generated by the algorithm. Would you like to try this for your next holiday? I have the codes ready for you.
In this article, I am going to show you exactly how you can do it too for your next vacation!
View of Atrani, while hiking to Ravello [photo by Jingles]
Preparation
Pin locations on Google Maps. Visit My Maps and create a new map for your next holiday destination. Look for places you want to visit. Attractions? Theme parks? Restaurants? Fill up your map with markers by searching for these places and “Add to map”. Keep doing this until you have all the places you want to visit on the map. I will wait for you.
Export map and upload into Colab. After all the hard work of researching and pinning places to visit, we are ready to export the map. Select “Export to KML/KMZ”, and remember to check “Export as KML”. You will download a KML file.
Export KML file from Google Maps and upload it in Colab
Next, go to Colab, an excellent Notebook environment maintained by Google. Open the File drawer on the left and upload the KML file which you have downloaded. We will be loading this KML file with BeautifulSoup later.
Get the Google API key. We will be plotting markers on Google Maps, so we need an API key. You can get it from the developers API page. Follow the instructions and you should get a key that looks like this:
ZIzkSyDwhp5B0_tH1$Is@fAkeAp1keY3eLYmPLY
Psst: this is a fake API key 🙃
Amalfi, Italy [photo by Tan Ying Ying]
Code walkthrough on Colab
You can get the codes and run it on Colab.
Define the parameters. Let’s set the API key, KML filename, and the desired number of days for your long-awaited holiday.
The API key is for plotting interactive Google Maps on Colab. The KML file contains the places of interests you have pinned on Google Maps. And lastly, the algorithm will determine where you should visit for each day based on the number of days you have set.
Load the data. KML files are XML, and we can parse it with BeautifulSoup. Our places of interests are inside the “Placemark” tag, so we will extract the “name”, and “coordinates” from each “Placemark”.
places = []
with open(kml_filename, "r") as file:
content = file.readlines()
content = "".join(content)
bs_content = BeautifulSoup(content, "xml")
placemarks = bs_content.findAll('Placemark')
for placemark in placemarks:
coordinates = placemark.find('coordinates').text.strip()
long = coordinates.split(',')[0]
lat = coordinates.split(',')[1]
places.append({
'name': placemark.find('name').text.strip(),
'lat': lat,
'long': long
})
places = pd.DataFrame(places)
places['lat'] = places['lat'].astype(float)
places['long'] = places['long'].astype(float)
mean_lat = places['lat'].mean()
mean_long = places['long'].mean()
places
Let’s see if the DataFrame contains places we are planning to visit.
Name | Latitude | Longitude |
---|---|---|
Athens International Airport | 37.9356467 | 23.9484156 |
Ancient Agora of Athens | 37.9746507 | 23.7219716 |
Tzistarakis Mosque | 37.9759204 | 23 |
Roman Forum | 37.9743749 | 23.7255435 |
Theatre of Dionysus | 37.9703658 | 23.7278553 |
Parthenon | 37.9715285 | 23.7267166 |
Acropolis Museum | 37.9684499 | 23.7285227 |
Temple of Olympian Zeus | 37.9693 | 23 |
Plaka | 37.9725529 | 23.7303363999999 |
Temple of Zeus | 37.9692838 | 23.7331012 |
... | ... | ... |
Group locations by proximity. With the coordinates for each location in the DataFrame, we can group them into clusters. If two places are near each other, they will be in the same cluster. There are a few methods for clustering, I will introduce K-Means, spectral and mean-shift clustering.
K-Means clustering aims to partition data points into a specified number of clusters. In which each data point belongs to the cluster that it is nearest to.
Spectral clustering is useful when the structure of the individual clusters is highly non-convex. It performs dimensionality reduction before clustering in fewer dimensions.
Mean shift clustering is a centroid-based algorithm which aims to discover blobs in a smooth density of data points. It works by updating candidates for centroids to be the mean of the points within a region.
## KMeans
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=int(num_days), random_state=0).fit(places_lat_long)
group = list(kmeans.labels_)
## MeanShift
from sklearn.cluster import MeanShift, estimate_bandwidth
bandwidth = estimate_bandwidth(places_lat_long, quantile=0.2)
clustering = MeanShift(bandwidth=bandwidth).fit(places_lat_long)
group = clustering.labels_
## SpectralClustering
from sklearn.cluster import SpectralClustering
clustering = SpectralClustering(n_clusters=9,
assign_labels="discretize",
random_state=0).fit(places_lat_long)
group = clustering.labels_
Which one did I end up using? K-Means. Because it is straight forward, and the most common one.
Plot it on Google Maps! Now, we are ready to plot it on Google Maps to visualise the 12 main areas (because we are planning a holiday for 12 days). If you are running this on Colab, this map is interactive.
Athens. Santorini and Crete. Amalfi and Naples.
The clustering algorithm did a great job grouping locations by proximity. Next, I am interested in finding out the most optimal route from a region to another region.
Find the middle of each cluster. We need to find the middle point of each cluster. We do this with Pandas groupby
function:
places.groupby(‘cluster’).mean()
That will give us a DataFrame, where each row represents a cluster of locations, and it’s latitude and longitude.
Cluster | Latitude | Longitude |
---|---|---|
0 | 40.846395 | 14.281423 |
1 | 36.411881 | 25.418702 |
2 | 37.958535 | 23.747617 |
3 | 35.359073 | 24.500997 |
4 | 35.449291 | 23.570038 |
5 | 40.645347 | 14.610990 |
6 | 35.325403 | 25.158296 |
7 | 35.517232 | 24.019294 |
8 | 40.627345 | 14.400411 |
9 | 35.051455 | 24.814546 |
10 | 35.204336 | 25.454937 |
11 | 35.220067 | 24.019934 |
Find the distance between cluster. We need to find the distance between each cluster. We use the middle point of each cluster and calculate the distance between clusters with Scipy cdist
function.
from scipy.spatial.distance import cdist
distance_matrix = cdist(
mean_lat_long_by_group.values,
mean_lat_long_by_group.values
)
df_distance_matrix = pd.DataFrame(distance_matrix)
df_distance_matrix
That will give us a DataFrame, with the distance of each cluster to every other cluster. We will call this matrix the distance matrix. Each value represents the distance between two clusters. A smaller value means the cluster is closer to each other.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.000000 | 11.987656 | 9.896897 | 11.599586 | 10.742770 | 0.386050 | 12.197858 | 11.100726 | 0.249281 | 12.021981 | 12.517198 | 11.246962 |
1 | 11.987656 | 0.000000 | 2.276986 | 1.396634 | 2.084259 | 11.607277 | 1.117249 | 1.660945 | 11.797155 | 1.488543 | 1.208088 | 1.837654 |
2 | 9.896897 | 2.276986 | 0.000000 | 2.706433 | 2.515520 | 9.523493 | 2.987206 | 2.456373 | 9.720741 | 3.096684 | 3.240456 | 2.751974 |
3 | 11.599586 | 1.396634 | 2.706433 | 0.000000 | 0.935321 | 11.214140 | 0.658161 | 0.507003 | 11.391950 | 0.439252 | 0.966409 | 0.500744 |
4 | 10.742770 | 2.084259 | 2.515520 | 0.935321 | 0.000000 | 10.356811 | 1.593083 | 0.454365 | 10.530636 | 1.306551 | 1.900750 | 0.504926 |
5 | 0.386050 | 11.607277 | 9.523493 | 11.214140 | 10.356811 | 0.000000 | 11.813022 | 10.715118 | 0.211347 | 11.636331 | 12.132428 | 10.861026 |
6 | 12.197858 | 1.117249 | 2.987206 | 0.658161 | 1.593083 | 11.813022 | 0.000000 | 1.155043 | 11.993443 | 0.439558 | 0.320395 | 1.143226 |
7 | 11.100726 | 1.660945 | 2.456373 | 0.507003 | 0.454365 | 10.715118 | 1.155043 | 0.000000 | 10.892023 | 0.921615 | 1.469345 | 0.297166 |
8 | 0.249281 | 11.797155 | 9.720741 | 11.391950 | 10.530636 | 0.211347 | 11.993443 | 10.892023 | 0.000000 | 11.812906 | 12.313065 | 11.035120 |
9 | 12.021981 | 1.488543 | 3.096684 | 0.439252 | 1.306551 | 11.636331 | 0.439558 | 0.921615 | 11.812906 | 0.000000 | 0.658387 | 0.812305 |
10 | 12.517198 | 1.208088 | 3.240456 | 0.966409 | 1.900750 | 12.132428 | 0.320395 | 1.469345 | 12.313065 | 0.658387 | 0.000000 | 1.435090 |
11 | 11.246962 | 1.837654 | 2.751974 | 0.500744 | 0.504926 | 10.861026 | 1.143226 | 0.297166 | 11.035120 | 0.812305 | 1.435090 | 0.000000 |
Find the shortest route. With that distance matrix, we are ready to find the shortest path. Here is the code to compute the shortest path.
starting_point = 2 #@param {type:"integer"}
cur_index = starting_point
seq = [cur_index]
while len(seq) < len(list(df_distance_matrix.keys())):
nearest_clusters = list(df_distance_matrix[cur_index].sort_values().index)
for cluster_id in nearest_clusters:
if cluster_id != cur_index and cluster_id not in seq:
seq.append(cluster_id)
cur_index = cluster_id
break
replace_group_to_day = {}
for i in range(0, len(seq)):
replace_group_to_day[seq[i]] = i
print(' -> '.join(str(x) for x in seq))
places['days'] = places['cluster']
places['days'].replace(replace_group_to_day, inplace=True)
places['days'] += 1
Initially, I was using the travelling salesman algorithm, but I find that the code’s a little overkill and overcomplicated, so I wrote my own instead. This is the result based on the shortest path algorithm, a sequence from cluster to cluster.
2 -> 1 -> 6 -> 10 -> 9 -> 3 -> 11 -> 7 -> 4 -> 5 -> 8 -> 0
Show the recommended itinerary. Now we are ready to plot our markers on Google Map. It has recommended us to go from Athens, Santorini, and Crete. Then to Amalfi, Ravello, Sorrento and Naples. If you are running this on Colab, this map is interactive.
Start with Athens, Santorini, and then Crete. To Amalfi, Ravello, Sorrento and Naples.
Start with Athens, Santorini, and then Crete. To Amalfi, Ravello, Sorrento and Naples.
If you prefer to see it in tabular form, we can display it with Pandas.
pd.set_option(‘display.max_rows’, None)
places.sort_values(by=[‘days’])
Oia, Santorini, Greece [photo by Clement Lim]
What can be improved?
This algorithm does not take the number of attractions in one region into consideration. For instance, we absolutely didn’t want to explore everything in Athens in one day! Neither is just one day at Santorini enough.
We need a type of clustering algorithm that constraint on the maximum number of points in a cluster. Such that, every cluster should have about the same number of locations. If a cluster is becoming too dense, it will split the cluster into two or more clusters.
But if you are travelling in one city, the result can be promising. This is the generated itinerary for New York City. Feel free to download my itinerary and try it for yourself.
New York City [photo by Jingles]
Resources
Do you want to try this for your next vacation? Here’s the code. You’re welcome. 😉
Not sure where to visit in Greece or Amalfi? Go ahead and use our Greece x Amalfi itinerary.
Not sure where to visit in New York City?
Thanks for reading. 😃
Hope you enjoyed reading this article as much as I did preparing it.