To begin with, load the plotly.express module and the course package
import plotly.express as px
import jrpyvisualisation
Then we will load the gapminder data set
gapminder = jrpyvisualisation.datasets.load_gapminder()
When loading in data, it’s always a good idea to carry out a sanity check. I tend to use commands like
gapminder.shape
gapminder.head()
gapminder.columns
Scatter plots can be created using the plotly.express.scatter function. Let’s start with a basic scatter plot
fig = px.scatter(gapminder, x='gdpPercap', y='lifeExp')
To view this plot, we can call the .show() method on the Figure object
fig.show()
The arguments x and y map variable names in the pandas.DataFrame object to visual elements of the chart. You can also map variables to color, symbol and size, amongst others.
fig = px.scatter(
    gapminder,
    x='gdpPercap', y='lifeExp',
    color='continent'
)
or
fig = px.scatter(
    gapminder,
    x='gdpPercap', y='lifeExp',
    size='pop'
)
In plotly.express, some aesthetic properties must be mapped to a numeric variable, some only make sense for a discrete variable, and some can be used with either.
A box plot can be created using plotly.express.box
fig = px.box(
    gapminder,
    x='year', y='gdpPercap'
)
Similar to scatter plots, we can add other visual elements mapped to variables
fig = px.box(
    gapminder,
    x='year', y='gdpPercap',
    color='continent'
)
Most of the plotly.express functions share the same arguments, but some arguments are specific to particular functions. For example, bar charts and box plots have an orientation argument, which lets us lay the plot out horizontally or vertically. We could create a horizontal bar chart of average life expectancy for the different continents in 2007 using the code below
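# numeric_only=True averages only the numeric columns; recent pandas
# versions raise an error if text columns are included in the mean
sub = gapminder.query('year == 2007').\
    groupby('continent').mean(numeric_only=True).\
    reset_index().\
    sort_values('lifeExp')
fig = px.bar(
    sub,
    y='continent', x='lifeExp',
    orientation='h'
)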
barmode argument
Jupyter notebooks are a great way to explore data. One neat trick that you might like when exploring plots of subsets of data in notebooks is ipywidgets.interact. Try the following code in a Jupyter notebook cell
import plotly.express as px
import jrpyvisualisation
from ipywidgets import interact
gapminder = jrpyvisualisation.datasets.load_gapminder()
@interact
def plot(year = gapminder['year'].unique()):
    df = gapminder.query('year == @year')
    px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
               log_x=True, template='plotly_dark', size_max=75).show()
There are a couple of other things going on here that we haven’t looked at yet
log_x=True - set the x-axis to be on the log scale. There is a corresponding log_y
template='plotly_dark' - set the overall template theme for the plot. See the documentation for other possible values
size_max=75 - set the maximum marker size, defaults to 20
Create an interactive series of box plots of the populations of countries in the different continents over time with populations on the log scale. Make continent the variable for which you get a dropdown option
By default, plotly.express chooses the labels of the axes, legend entries and hover text based on the names of variables in the DataFrame. These can be overridden with the labels argument in each plotting function. The labels argument takes a dictionary whose keys are the names of the variables and whose values give the desired labels. We can also add a title with title=<string>
labels = {
'pop': 'Population',
'gdpPercap': 'GDP per Capita',
'year': 'Year',
'lifeExp': 'Life Expectancy',
'continent': 'Continent'
}
@interact
def plot(year = gapminder['year'].unique()):
    df = gapminder.query('year == @year')
    title = 'Life Expectancy in ' + str(year)
    px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
               log_x=True, template='plotly_dark', size_max=75,
               title=title, labels=labels).show()