Wenyan Deng

Ph.D. Candidate

Massachusetts Institute of Technology

GeoPandas: Static Maps and Spatial Join

July 20, 2017

The case study

My first major mapping project at the University of Chicago is a choropleth map of Sri Lankan Army (SLA) officer deaths between 1981 and 1999. The data is collected painstakingly by my colleague, Winston Berg. I am deeply grateful to Professor Paul Staniland, who made the project possible by hiring me. Any mistakes remain my own. You can see a copy of the codes here and the final products on my GitHub repository. The final product is shown in Figure 1.

The data

The data table includes the names of army officers who died as well as their place of death, which may be a village or city. In order to assign each place of death a larger geographical category — province, district, or division — we must attach a longitude and a latitude to each place in the dataset. Thankfully, GeoPandas has just the function.

import geopandas as gpd

import pandas as pd

df1=gpd.tools.geocode(['Ampara', 'Anuradhapura', 'Badulla', 'Batticaloa', 'Colombo', '...'])

df1['geometry'] = df1['geometry'].astype(str)

df2 = df1['geometry'].str.strip('POINT ( )').str.split(' ', expand=True).rename(columns={0:'Longitude',1:'Latitude'})

df3=df1.merge(df2, left_index=True, right_index=True) df3.to_csv("out.csv")

"out.csv" describes deaths data in which every place name is matched with a longitude and latitude.

Figure 1: Static map with GeoPandas

Basemap Shapefile

Go to http://www.diva-gis.org/gdata and download the shapefile for Sri Lanka. Province level, district level, and administrative division levels are available. A later blog will explain how to make shapefiles based on existing ones.

Get Mapping

Now that we have both the data and the shapefile base map, we can start mapping! In this example, I use Sri Lankan districts.

Open a jupyter notebook in your working directory and import the following:

%matplotlib inline

import geopandas as gpd

import pandas as pd

import matplotlib.pyplot as plt

Read your shapefile into a geo-dataframe:

geo_df = gpd.read_file("data1/LKA_adm1.shp")

geo_df.set_index(geo_df["ID_1"].astype(int), inplace = True)

geo_df.head()

Plot it:

geo_df.plot()

Make a new column called coordinates, which is based on the "geometry" column:

geo_df['coords'] = geo_df['geometry'].apply(lambda x: x.representative_point().coords[:])

geo_df['coords'] = [coords[0] for coords in geo_df['coords']]

Label the districts:

print(geo_df.crs)


#try 2163 (albers), 3857 (web), 4269 (plate)

ax = geo_df.to_crs(epsg=4269).plot()

ax.set_axis_off()

for idx, row in geo_df.iterrows():

plt.annotate(s=row['NAME_1'], xy=row['coords'],

horizontalalignment='center')

Spatial Join

To make a heat/choropleth map, we have to count how many deaths there are per district. For this, we use GeoPanda's spatial join function. The spatial join function matches points in the deaths dataset to the polygons in the geo-dataframe. Here, I use "within". You can also use "contains " or "intersects".

from shapely.geometry import Point


deaths_df = pd.read_excel("SLA_RoH_Deng.xlsx", usecols = [8, 9])

deaths_df.dropna(inplace = True)


geometry = [Point(xy) for xy in zip(deaths_df.Longitude, deaths_df.Latitude)]

deaths_coords = gpd.GeoDataFrame(deaths_df, crs = geo_df.crs, geometry=geometry)


located_deaths = gpd.sjoin(deaths_coords, geo_df, how = 'left', op = 'within')

located_deaths = located_deaths.dropna(axis=1, how='all')

Plot it:

located_deaths.plot()

Rename the spatially joined table so the column heads make sense. Next, count and sum up the deaths in each district. Save it as an Excel sheet and check for formatting and errors:

located_deaths.rename(columns = {"NAME_1" : "Districts", "index_right" : "deaths_per_district"}, inplace = True)


deaths_geo_count = located_deaths.groupby("Districts").count()[["deaths_per_district"]] deaths_geo_count.to_excel("deaths_geo_count.xlsx")

Import the Excel sheet and draw a bar graph with Pandas:

deaths_geo_count = pd.read_excel("deaths_geo_count.xlsx")

deaths_geo_count.set_index("Districts")["deaths_per_district"].sort_values(ascending = False).plot(kind = "bar", figsize = (15, 3))

Rename the column in the geo-dataframe so it matches with the deaths dataframe:

geo_df.rename(columns = {"NAME_1" : "Districts"}, inplace = True)

Merge the geo-dataframe with the deaths dataframe:

geo_merge = geo_df.merge(deaths_geo_count, on='Districts', how='left')

geo_merge col_name =geo_merge.columns[0]

geo_merge.head()

Figure 2: SLA officer deaths by year

Now we can produce the map. The default for "scheme" is quantiles. You can also use "equal_interval" or "fisher_jenks". Fisher_jenks sets categories by minimizing in-group variance and maximizing inter-group variance.

ft = "deaths_per_district"

plate = geo_merge.to_crs(epsg=4269)


ax = plate.plot(column = ft, scheme = "quantiles", k = 9, cmap = "Reds", legend = True, alpha = 0.7, linewidth = 0.5, figsize = (50, 25))

ax.set_title("Sri Lankan Army Fatalities by District", fontsize = 30)

ax.set_axis_off()


for idx, row in geo_df.iterrows():

plt.annotate(s=row['Districts'], xy=row['coords'], horizontalalignment='center')

Your final product would look like Figure 1. "Alpha" in the codes above changes the shade of colors. Of course, you can also count the deaths by year and create a series of maps like Figure 2 shown on the left.


Happy mapping!