Wenyan Deng

Ph.D. Candidate

Massachusetts Institute of Technology

Pandas and Plotly: Interactive Bubble Plots

July 22, 2017

The case study

If you have more than one variable, you might want a scatterplot of two of them that changes over time. For instance, how might Sri Lanka's electoral turnout relate to the Sri Lankan Army's fatalities by province? In this post, I demonstrate how to make a bubble plot that reflects the number of registered voters, death count, and voter turnout. The layout of this site may mangle the loop indentation in the code, so make sure your indentation is correct! I've also posted a copy of the code in .html format (with correct loop indentation) on my GitHub repository.

The final product looks something like this: https://plot.ly/~wdeng1/132.embed.

Getting started

Create an account (free or otherwise) with plotly. Note your username and write down your API key somewhere. Then open a Jupyter Notebook and import the following:

import plotly.plotly as py
from plotly.grid_objs import Grid, Column
from plotly.tools import FigureFactory as figure_factory

import pandas as pd
import time

import plotly
import json
import requests
from requests.auth import HTTPBasicAuth

username = '...'  # Replace with your username
api_key = '...'  # Replace with your API key
auth = HTTPBasicAuth(username, api_key)
headers = {'Plotly-Client-Platform': 'python'}
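If you would rather not paste your API key into a notebook you might share, one option is to read the credentials from environment variables. The sketch below is only an illustration: the function load_plotly_credentials and the variable names PLOTLY_USERNAME and PLOTLY_API_KEY are hypothetical names chosen for this example, not something plotly defines.

```python
import os


def load_plotly_credentials(env=os.environ):
    """Return (username, api_key) read from environment variables.

    PLOTLY_USERNAME and PLOTLY_API_KEY are hypothetical names picked for
    this sketch; plotly itself does not look them up.
    """
    username = env.get('PLOTLY_USERNAME')
    api_key = env.get('PLOTLY_API_KEY')
    if not username or not api_key:
        raise RuntimeError('Set PLOTLY_USERNAME and PLOTLY_API_KEY first')
    return username, api_key
```

You could then write username, api_key = load_plotly_credentials() in place of the two hard-coded assignments.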

The Data

Read your data file. Here, I combined my SLA deaths file, mentioned previously, with data on electoral turnout and number of registered voters, by electoral district. I use "gapminder" in the file and grid names because the result looks like a fake Gapminder plot.

plotly.tools.set_credentials_file(username=username, api_key=api_key)

dataset = pd.read_excel("SLA_electoral.xls")
dataset.head()

table = figure_factory.create_table(dataset.head(10))
py.iplot(table, filename='animations-gapminder-data-preview')

Your dataset should contain columns for year, deaths, district, province, registered voters (regvoter in the code below), and turnout.
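Because the plotting code later looks columns up by name, it can help to check the headers before going further. This is a minimal sketch, assuming the column names are exactly the ones the code in this post uses (year, deaths, district, province, regvoter, turnout); the helper missing_columns is a hypothetical name introduced here.

```python
# Column names referenced by the plotting code in this post.
REQUIRED_COLUMNS = ['year', 'deaths', 'district', 'province', 'regvoter', 'turnout']


def missing_columns(column_names, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in required order."""
    present = set(column_names)
    return [name for name in required if name not in present]
```

With the DataFrame loaded above, missing_columns(dataset.columns) should return an empty list if every expected header is present.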

Plotting

Sort by year:

years_from_col = set(dataset['year'])
years_ints = sorted(list(years_from_col))
years = [str(year) for year in years_ints]

Make a list of provinces:

provinces = []
for province in dataset['province']:
    if province not in provinces:
        provinces.append(province)
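The two steps above (sorted unique years, and provinces in order of first appearance) can also be written compactly. The sketch below uses plain Python lists in place of the DataFrame columns so that it runs on its own; the sample values are made up for illustration, and unique_in_order is a hypothetical helper name.

```python
def unique_in_order(values):
    """Drop duplicates while keeping first-appearance order (Python 3.7+)."""
    return list(dict.fromkeys(values))


# Illustrative stand-ins for dataset['province'] and dataset['year'].
province_column = ['Northern', 'Eastern', 'Northern', 'Western', 'Eastern']
year_column = [1994, 1988, 1994, 1989]

provinces = unique_in_order(province_column)
years = [str(year) for year in sorted(set(year_column))]
```

The result is the same as the loop above, just without the repeated membership test on a growing list.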

Make the plotly grid:

columns = []
for year in years:
    for province in provinces:
        dataset_by_year = dataset[dataset['year'] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['province'] == province]
        for col_name in dataset_by_year_and_cont:
            column_name = '{year}_{province}_{header}_gapminder_grid'.format(
                year=year, province=province, header=col_name
            )
            a_column = Column(list(dataset_by_year_and_cont[col_name]), column_name)
            columns.append(a_column)
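To see what the naming scheme produces, here is a small stand-alone sketch that groups rows into named columns the same way, using plain dictionaries instead of plotly's Column objects. The function build_grid_columns is hypothetical and the numbers in the sample rows are invented. Unlike the loop above, it only creates columns for (year, province) pairs that actually occur in the data.

```python
COL_NAME_TEMPLATE = '{year}_{province}_{header}_gapminder_grid'


def build_grid_columns(rows, headers):
    """Map each '<year>_<province>_<header>_gapminder_grid' name to its values."""
    columns = {}
    for row in rows:
        for header in headers:
            name = COL_NAME_TEMPLATE.format(
                year=row['year'], province=row['province'], header=header
            )
            columns.setdefault(name, []).append(row[header])
    return columns


# Invented sample rows, for illustration only.
rows = [
    {'year': 1988, 'province': 'Northern', 'district': 'Jaffna', 'deaths': 12},
    {'year': 1988, 'province': 'Northern', 'district': 'Vanni', 'deaths': 3},
    {'year': 1989, 'province': 'Western', 'district': 'Colombo', 'deaths': 1},
]
grid_columns = build_grid_columns(rows, ['district', 'deaths'])
```

Each key in grid_columns corresponds to one trace attribute (x, y, text, or size) for one province in one year, which is why the later code can look columns up with the same template.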

Upload the grid:

grid = Grid(columns)
url = py.grid_ops.upload(grid, 'gapminder_grid'+str(time.time()), auto_open=False)

Make the figure:

figure = { 'data': [], 'layout': {}, 'frames': [], 'config': {'scrollzoom': True} }

Fill in the layout:

figure['layout']['xaxis'] = {'range': [-10, 150], 'title': 'SLA Fatalities', 'gridcolor': '#FFFFFF'}
figure['layout']['yaxis'] = {'range': [-10, 100], 'title': 'Electoral Turnout (%)', 'gridcolor': '#FFFFFF'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['plot_bgcolor'] = 'rgb(223, 232, 243)'
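The axis ranges above are typed in by hand, and they stay fixed across frames so the bubbles do not jump as the year changes. If your data has a different spread, one way to pick a fixed range is to pad the overall minimum and maximum a little. This is a sketch only: padded_range and its 10% default padding are assumptions made for this example.

```python
def padded_range(values, pad_fraction=0.1):
    """Return [low, high] covering values with some padding on both sides."""
    low, high = min(values), max(values)
    # Fall back to a padding of 1.0 when all values are identical.
    pad = (high - low) * pad_fraction or 1.0
    return [low - pad, high + pad]
```

For example, you might pass padded_range(list(dataset['deaths'])) as the x-axis range, computed once over all years.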

Make the slider and set values for the slider:

figure['layout']['slider'] = {
    'args': [
        'slider.value', {
            'duration': 400,
            'ease': 'cubic-in-out'
        }
    ],
    'initialValue': '1988',
    'plotlycommand': 'animate',
    'values': years,
    'visible': True
}


figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                                'fromcurrent': True,
                                'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False},
                                  'mode': 'immediate',
                                  'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]


# Slider settings; the steps are filled in by the frame loop further down,
# and the finished dict is assigned to figure['layout']['sliders'] there.
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

Set colors for the legend and define size reference for bubbles:

custom_colors = {
    'Eastern': 'rgb(51, 153, 255)',
    'Central': 'rgb(255, 51, 255)',
    'Northern': 'rgb(153, 51, 255)',
    'North Central': 'rgb(102, 178, 255)',
    'Western': 'rgb(204, 153, 255)',
    'Sabaragamuwa': 'rgb(255, 153, 255)',
    'North Western': 'rgb(255, 102, 255)',
    'Southern': 'rgb(178, 102, 255)',
    'Uva': 'rgb(153, 204, 255)'
}


col_name_template = '{year}_{province}_{header}_gapminder_grid'
year = 1988
for province in provinces:
    data_dict = {
        'xsrc': grid.get_column_reference(col_name_template.format(
            year=year, province=province, header='deaths')),
        'ysrc': grid.get_column_reference(col_name_template.format(
            year=year, province=province, header='turnout')),
        'mode': 'markers',
        'textsrc': grid.get_column_reference(col_name_template.format(
            year=year, province=province, header='district')),
        'marker': {
            'sizemode': 'area',
            'sizeref': 1.5,
            'sizesrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='regvoter')),
            'color': custom_colors[province]
        },
        'name': province
    }
    figure['data'].append(data_dict)
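The sizeref value of 1.5 above is a fixed choice that suits this dataset. As an alternative, plotly's documentation suggests deriving it from the data when sizemode is 'area': sizeref = 2 * max(sizes) / (desired maximum marker size in pixels) ** 2. The sketch below applies that formula; the helper name area_sizeref and the 60-pixel default are assumptions made for this example.

```python
def area_sizeref(sizes, max_marker_px=60):
    """Compute sizeref for sizemode='area' so the largest bubble is
    about max_marker_px pixels across."""
    return 2.0 * max(sizes) / (max_marker_px ** 2)
```

You could compute it once from the full regvoter column, for example area_sizeref(list(dataset['regvoter'])), and use the same value in every trace and frame so that bubble sizes stay comparable across years.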

Plot:

for year in years:
    # One frame per year, with one trace per province.
    frame = {'data': [], 'name': str(year)}
    for province in provinces:
        data_dict = {
            'xsrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='deaths')),
            'ysrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='turnout')),
            'mode': 'markers',
            'textsrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='district')),
            'marker': {
                'sizemode': 'area',
                'sizeref': 1.5,
                'sizesrc': grid.get_column_reference(col_name_template.format(
                    year=year, province=province, header='regvoter')),
                'color': custom_colors[province]
            },
            'name': province
        }
        frame['data'].append(data_dict)

    figure['frames'].append(frame)

    # Slider step that jumps to this year's frame.
    slider_step = {
        'args': [
            [year],
            {'frame': {'duration': 300, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': 300}}
        ],
        'label': year,
        'method': 'animate'
    }
    sliders_dict['steps'].append(slider_step)

figure['layout']['sliders'] = [sliders_dict]

graph = py.icreate_animations(figure, 'SLA_fatalities_turnout'+str(time.time()))
graph
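Each slider step finds its frame by name, so a step whose target does not match any frame name will silently do nothing. The sketch below builds a step the same way as the loop above and checks a figure dictionary for unmatched steps before uploading. Both function names (make_slider_step, unmatched_steps) are hypothetical helpers introduced for this example, and demo_figure is a stripped-down stand-in for the real figure.

```python
def make_slider_step(year, duration=300):
    """Build one slider step that animates to the frame named str(year)."""
    label = str(year)
    return {
        'args': [
            [label],
            {'frame': {'duration': duration, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': duration}}
        ],
        'label': label,
        'method': 'animate',
    }


def unmatched_steps(figure):
    """Return slider step targets that have no frame with the same name."""
    frame_names = {frame['name'] for frame in figure['frames']}
    missing = []
    for slider in figure['layout'].get('sliders', []):
        for step in slider['steps']:
            target = step['args'][0][0]
            if target not in frame_names:
                missing.append(target)
    return missing


# Stand-in figure: two frames, but three slider steps.
demo_figure = {
    'frames': [{'data': [], 'name': '1988'}, {'data': [], 'name': '1989'}],
    'layout': {'sliders': [
        {'steps': [make_slider_step(y) for y in (1988, 1989, 1994)]}
    ]},
}
```

Here unmatched_steps(demo_figure) reports the 1994 step, because no frame with that name exists. Running the same check on the real figure before calling py.icreate_animations can catch a mismatch between integer and string year labels.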

We're done! See the completed plot below. Hover over a bubble to see its label. Drag on the chart area to zoom in, and double-click to zoom out. Drag the slider to see different years, or click Play. Click on the legend to isolate provinces. Happy plotting!