Practical 1

Jumping Rivers

Setup

To begin with, load the plotly.express module and the course package

import plotly.express as px
import jrpyvisualisation

Then we will load the gapminder data set

gapminder = jrpyvisualisation.datasets.load_gapminder()

When loading in data, it’s always a good idea to carry out a sanity check. I tend to use commands like

gapminder.shape
gapminder.head()
gapminder.columns

Scatter plots

Scatter plots can be created using the plotly.express.scatter function. Let’s start with a basic scatter plot

fig = px.scatter(gapminder, x='gdpPercap', y='lifeExp')

To view this plot, we can call the .show() method on the Figure object

fig.show()

The arguments x and y map variable names in the pandas.DataFrame object to visual elements of the chart. You can also map variables to color, symbol and size amongst others.

fig = px.scatter(
  gapminder,
  x='gdpPercap', y='lifeExp',
  color='continent'
)

or

fig = px.scatter(
  gapminder,
  x='gdpPercap', y='lifeExp',
  size='pop'
)

In plotly.express, some aesthetic properties require a numeric variable (for example size), some only make sense for a discrete variable (for example symbol) and some can be used with either (for example color).

Box plots

A box plot can be created using plotly.express.box

fig = px.box(
  gapminder,
  x='year', y='gdpPercap'
)

Similar to scatter plots, we can add other visual elements mapped to variables

fig = px.box(
  gapminder,
  x='year', y='gdpPercap',
  color='continent'
)

Bar charts

Most of the plotly.express functions share the same arguments, but some arguments are specific to certain chart types. For example, bar charts and box plots have an orientation argument that lets us lay the plot out horizontally or vertically. We could create a horizontal bar chart of average life expectancy for each continent in 2007 using the code below

# Mean of each numeric column per continent in 2007, sorted by life expectancy
# numeric_only=True skips any non-numeric columns, which newer pandas cannot average
sub = gapminder.query('year == 2007').\
  groupby('continent').mean(numeric_only=True).\
  reset_index().\
  sort_values('lifeExp')

fig = px.bar(sub, 
  y='continent', x='lifeExp',
  orientation='h'
)

A neat trick in notebooks

Jupyter notebooks are a great way to explore data. One neat trick that you might like when exploring plots of subsets of data in notebooks is ipywidgets.interact. Try the following code in a Jupyter notebook cell

import plotly.express as px
import jrpyvisualisation
from ipywidgets import interact

gapminder = jrpyvisualisation.datasets.load_gapminder()

@interact
def plot(year = gapminder['year'].unique()):
    df = gapminder.query('year == @year')
    px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
              log_x=True, template='plotly_dark', size_max=75).show()

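There are a couple of other things going on here that we haven’t looked at yet: log_x puts the x axis on a log scale, template sets the overall theme of the plot and size_max limits the size of the largest marker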

Changing the default labels

By default, plotly.express chooses the labels of the axes, legend entries and hover text based on the names of the variables in the DataFrame. These can be overridden with the labels argument in each plotting function. The labels argument takes a dictionary whose keys are the variable names and whose values are the desired labels. We can also add a title with title=<string>

labels = {
  'pop': 'Population',
  'gdpPercap': 'GDP per Capita',
  'year': 'Year',
  'lifeExp': 'Life Expectancy',
  'continent': 'Continent'
}

@interact
def plot(year = gapminder['year'].unique()):
    df = gapminder.query('year == @year')
    title = 'Life Expectancy in ' + str(year)
    px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
              log_x=True, template='plotly_dark', size_max=75,
              title=title, labels=labels).show()