How to Improve Holiday Itinerary with Machine Learning

Perfect your travel plans with this strategy - use machine learning to optimize your holiday itinerary!

In October 2019, my friends and I are planning our awesome holiday in Greece and Amalfi. The way we plan our perfect holiday itinerary is by using Google Map. We search for the places to visit and pin markers on Google Map for attractions that might interest us.

For me, to optimise our vacation to the fullest, as to cover as much ground as possible; I have three questions:

How many days should I spend at one location?
Which attractions/locations should I visit each day?
What is the most optimal route from a place to another?

Let me show you the itinerary my algorithms have proposed for us. It is recommending the best sequence of places to visit, day by day.

Start from Athens, visit the Acropolis and other historic archaeological sites. That’s a great start!
Travel to Santorini, a beautiful island with whitewashed homes and picture-perfect sunsets.
Sail to Crete, for a road trip adventure of a lifetime, by exploring hidden corners and tasting the best of Greek cuisine.
Fly to Italy and travel to Amalfi and Ravello where the terraces are blending along the cliffs with the most spectacular coastal scenery.
Travel to Sorrento and Naples, indulge in the best pasta, pizza, limoncello, and gelato.

Great holiday itinerary isn’t it?! This sequence of locations is fully generated by the algorithm. Would you like to try this for your next holiday? I have the codes ready for you.

In this article, I am going to show you exactly how you can do it too for your next vacation!

View of Atrani, while hiking to Ravello [photo by Jingles]

Preparation

Pin locations on Google Maps. Visit My Maps and create a new map for your next holiday destination. Look for places you want to visit. Attractions? Theme parks? Restaurants? Fill up your map with markers by searching for these places and “Add to map”. Keep doing this until you have all the places you want to visit on the map. I will wait for you.

Export map and upload into Colab. After all the hard work of researching and pinning places to visit, we are ready to export the map. Select “Export to KML/KMZ”, and remember to check “Export as KML”. You will download a KML file.

Export KML file from Google Maps and upload it in Colab

Next, go to Colab, an excellent Notebook environment maintained by Google. Open the File drawer on the left and upload the KML file which you have downloaded. We will be loading this KML file with BeautifulSoup later.

Get the Google API key. We will be plotting markers on Google Maps, so we need an API key. You can get it from the developers API page. Follow the instructions and you should get a key that looks like this:

ZIzkSyDwhp5B0_tH1$Is@fAkeAp1keY3eLYmPLY

Psst: this is a fake API key 🙃

Amalfi, Italy [photo by Tan Ying Ying]

Code walkthrough on Colab

You can get the codes and run it on Colab.

Define the parameters. Let’s set the API key, KML filename, and the desired number of days for your long-awaited holiday.

The API key is for plotting interactive Google Maps on Colab. The KML file contains the places of interests you have pinned on Google Maps. And lastly, the algorithm will determine where you should visit for each day based on the number of days you have set.

Load the data. KML files are XML, and we can parse it with BeautifulSoup. Our places of interests are inside the “Placemark” tag, so we will extract the “name”, and “coordinates” from each “Placemark”.

places = []

with open(kml_filename, "r") as file:
    content = file.readlines()
    content = "".join(content)
    bs_content = BeautifulSoup(content, "xml")

    placemarks = bs_content.findAll('Placemark')
    for placemark in placemarks:

        coordinates = placemark.find('coordinates').text.strip()
        long = coordinates.split(',')[0]
        lat = coordinates.split(',')[1]

        places.append({
            'name': placemark.find('name').text.strip(),
            'lat': lat,
            'long': long
        })

places = pd.DataFrame(places)
places['lat'] = places['lat'].astype(float)
places['long'] = places['long'].astype(float)

mean_lat = places['lat'].mean()
mean_long = places['long'].mean()

places

Let’s see if the DataFrame contains places we are planning to visit.

Name	Latitude	Longitude
Athens International Airport	37.9356467	23.9484156
Ancient Agora of Athens	37.9746507	23.7219716
Tzistarakis Mosque	37.9759204	23
Roman Forum	37.9743749	23.7255435
Theatre of Dionysus	37.9703658	23.7278553
Parthenon	37.9715285	23.7267166
Acropolis Museum	37.9684499	23.7285227
Temple of Olympian Zeus	37.9693	23
Plaka	37.9725529	23.7303363999999
Temple of Zeus	37.9692838	23.7331012
...	...	...

Group locations by proximity. With the coordinates for each location in the DataFrame, we can group them into clusters. If two places are near each other, they will be in the same cluster. There are a few methods for clustering, I will introduce K-Means, spectral and mean-shift clustering.

K-Means clustering aims to partition data points into a specified number of clusters. In which each data point belongs to the cluster that it is nearest to.

Spectral clustering is useful when the structure of the individual clusters is highly non-convex. It performs dimensionality reduction before clustering in fewer dimensions.

Mean shift clustering is a centroid-based algorithm which aims to discover blobs in a smooth density of data points. It works by updating candidates for centroids to be the mean of the points within a region.

## KMeans
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=int(num_days), random_state=0).fit(places_lat_long)
group = list(kmeans.labels_)

## MeanShift
from sklearn.cluster import MeanShift, estimate_bandwidth
bandwidth = estimate_bandwidth(places_lat_long, quantile=0.2)
clustering = MeanShift(bandwidth=bandwidth).fit(places_lat_long)
group = clustering.labels_

## SpectralClustering
from sklearn.cluster import SpectralClustering
clustering = SpectralClustering(n_clusters=9,
         assign_labels="discretize",
         random_state=0).fit(places_lat_long)
group = clustering.labels_

Which one did I end up using? K-Means. Because it is straight forward, and the most common one.

Plot it on Google Maps! Now, we are ready to plot it on Google Maps to visualise the 12 main areas (because we are planning a holiday for 12 days). If you are running this on Colab, this map is interactive.

Athens. Santorini and Crete. Amalfi and Naples.

Athens. Santorini and Crete. Amalfi and Naples.

The clustering algorithm did a great job grouping locations by proximity. Next, I am interested in finding out the most optimal route from a region to another region.

Find the middle of each cluster. We need to find the middle point of each cluster. We do this with Pandas groupby function:

places.groupby(‘cluster’).mean()

That will give us a DataFrame, where each row represents a cluster of locations, and it’s latitude and longitude.

Cluster	Latitude	Longitude
0	40.846395	14.281423
1	36.411881	25.418702
2	37.958535	23.747617
3	35.359073	24.500997
4	35.449291	23.570038
5	40.645347	14.610990
6	35.325403	25.158296
7	35.517232	24.019294
8	40.627345	14.400411
9	35.051455	24.814546
10	35.204336	25.454937
11	35.220067	24.019934

Find the distance between cluster. We need to find the distance between each cluster. We use the middle point of each cluster and calculate the distance between clusters with Scipy cdist function.

from scipy.spatial.distance import cdist

distance_matrix = cdist(
    mean_lat_long_by_group.values,
    mean_lat_long_by_group.values
)

df_distance_matrix = pd.DataFrame(distance_matrix)
df_distance_matrix

That will give us a DataFrame, with the distance of each cluster to every other cluster. We will call this matrix the distance matrix. Each value represents the distance between two clusters. A smaller value means the cluster is closer to each other.

	0	1	2	3	4	5	6	7	8	9	10	11
0	0.000000	11.987656	9.896897	11.599586	10.742770	0.386050	12.197858	11.100726	0.249281	12.021981	12.517198	11.246962
1	11.987656	0.000000	2.276986	1.396634	2.084259	11.607277	1.117249	1.660945	11.797155	1.488543	1.208088	1.837654
2	9.896897	2.276986	0.000000	2.706433	2.515520	9.523493	2.987206	2.456373	9.720741	3.096684	3.240456	2.751974
3	11.599586	1.396634	2.706433	0.000000	0.935321	11.214140	0.658161	0.507003	11.391950	0.439252	0.966409	0.500744
4	10.742770	2.084259	2.515520	0.935321	0.000000	10.356811	1.593083	0.454365	10.530636	1.306551	1.900750	0.504926
5	0.386050	11.607277	9.523493	11.214140	10.356811	0.000000	11.813022	10.715118	0.211347	11.636331	12.132428	10.861026
6	12.197858	1.117249	2.987206	0.658161	1.593083	11.813022	0.000000	1.155043	11.993443	0.439558	0.320395	1.143226
7	11.100726	1.660945	2.456373	0.507003	0.454365	10.715118	1.155043	0.000000	10.892023	0.921615	1.469345	0.297166
8	0.249281	11.797155	9.720741	11.391950	10.530636	0.211347	11.993443	10.892023	0.000000	11.812906	12.313065	11.035120
9	12.021981	1.488543	3.096684	0.439252	1.306551	11.636331	0.439558	0.921615	11.812906	0.000000	0.658387	0.812305
10	12.517198	1.208088	3.240456	0.966409	1.900750	12.132428	0.320395	1.469345	12.313065	0.658387	0.000000	1.435090
11	11.246962	1.837654	2.751974	0.500744	0.504926	10.861026	1.143226	0.297166	11.035120	0.812305	1.435090	0.000000

Find the shortest route. With that distance matrix, we are ready to find the shortest path. Here is the code to compute the shortest path.

starting_point = 2 #@param {type:"integer"}
cur_index = starting_point

seq = [cur_index]
while len(seq) < len(list(df_distance_matrix.keys())):
    nearest_clusters = list(df_distance_matrix[cur_index].sort_values().index)
    for cluster_id in nearest_clusters:
        if cluster_id != cur_index and cluster_id not in seq:
            seq.append(cluster_id)
            cur_index = cluster_id
            break

replace_group_to_day = {}

for i in range(0, len(seq)):
    replace_group_to_day[seq[i]] = i

print(' -> '.join(str(x) for x in seq))

places['days'] = places['cluster']
places['days'].replace(replace_group_to_day, inplace=True)
places['days'] += 1

Initially, I was using the travelling salesman algorithm, but I find that the code’s a little overkill and overcomplicated, so I wrote my own instead. This is the result based on the shortest path algorithm, a sequence from cluster to cluster.

2 -> 1 -> 6 -> 10 -> 9 -> 3 -> 11 -> 7 -> 4 -> 5 -> 8 -> 0

Show the recommended itinerary. Now we are ready to plot our markers on Google Map. It has recommended us to go from Athens, Santorini, and Crete. Then to Amalfi, Ravello, Sorrento and Naples. If you are running this on Colab, this map is interactive.

Start with Athens, Santorini, and then Crete. To Amalfi, Ravello, Sorrento and
Naples.

Start with Athens, Santorini, and then Crete. To Amalfi, Ravello, Sorrento and Naples.

If you prefer to see it in tabular form, we can display it with Pandas.

pd.set_option(‘display.max_rows’, None)
places.sort_values(by=[‘days’])

Oia, Santorini, Greece [photo by Clement Lim]

What can be improved?

This algorithm does not take the number of attractions in one region into consideration. For instance, we absolutely didn’t want to explore everything in Athens in one day! Neither is just one day at Santorini enough.

We need a type of clustering algorithm that constraint on the maximum number of points in a cluster. Such that, every cluster should have about the same number of locations. If a cluster is becoming too dense, it will split the cluster into two or more clusters.

But if you are travelling in one city, the result can be promising. This is the generated itinerary for New York City. Feel free to download my itinerary and try it for yourself.

New York City

New York City [photo by Jingles]

Resources

Do you want to try this for your next vacation? Here’s the code. You’re welcome. 😉

Not sure where to visit in Greece or Amalfi? Go ahead and use our Greece x Amalfi itinerary.

Not sure where to visit in New York City?

Thanks for reading. 😃
Hope you enjoyed reading this article as much as I did preparing it.