This data set consists of seven observations on cotton aphid counts on twenty randomly chosen leaves in each plot, for twenty-seven treatment-block combinations. The data were recorded in July 2004 in Lamesa, Texas. The treatments consisted of three nitrogen levels (blanket, variable and none), three irrigation levels (low, medium and high) and three blocks, each being a distinct area. Irrigation treatments were randomly assigned within each block as whole plots. Nitrogen treatments were randomly assigned within each whole plot as split plots.
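The design arithmetic in this description can be checked with a short sketch. The level names below are taken from the description and the block numbers are hypothetical; the labels stored in the actual data set may be spelled differently. The code only enumerates the combinations and does not load the data.

```python
from itertools import product

# Level names follow the description; block numbers are hypothetical labels.
blocks = [1, 2, 3]
irrigation = ["low", "medium", "high"]
nitrogen = ["blanket", "variable", "none"]

# Every block contains every irrigation level (whole plots), and every
# whole plot contains every nitrogen level (split plots).
plots = list(product(blocks, irrigation, nitrogen))
n_observations = 7  # sampling occasions per plot

print(len(plots))                   # 27 treatment-block combinations
print(len(plots) * n_observations)  # 189 rows in total, if none are missing
```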
See if you can recreate the plot below. Some hints:
Use .astype('category') to convert a column to a categorical type.
The category_orders argument can be used to specify the order of the facets.
Use update_traces(mode='lines+markers') to join the markers with lines.
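To see why the categorical conversion matters, here is a minimal sketch on a made-up data frame (the values are invented, not the aphids data). While Block is stored as integers, plotly express treats it as a continuous variable and maps it to a colour scale; once it is categorical, it is treated as discrete and each block gets its own colour.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Block": [1, 2, 3, 1, 2, 3],
    "Aphids": [10, 4, 7, 12, 6, 9],
})

print(toy["Block"].dtype)                  # an integer dtype
toy["Block"] = toy["Block"].astype("category")
print(toy["Block"].dtype)                  # category
print(list(toy["Block"].cat.categories))   # [1, 2, 3]
```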
import jrpyvisualisation as jr
import plotly.express as px
aphids = jr.datasets.load_aphids()
aphids['Block'] = aphids['Block'].astype('category')
fig = px.scatter(
    aphids, x='Time', y='Aphids',
    color='Block', facet_col='Water',
    facet_row='Nitrogen',
    template='plotly_white',
    category_orders={
        'Water': ['Low', 'Medium', 'High'],
        # assumes the blanket nitrogen level is labelled 'Blanket' in the data
        'Nitrogen': ['Blanket', 'Variable', 'Zero']
    }
)
fig = fig.update_traces(mode='lines+markers')
We can start by loading the plotly module and the movies data:
import jrpyvisualisation as jr
import plotly.express as px
movies = jr.datasets.load_movies()
We will also get rid of all rows that contain missing data. This gives us a much smaller data set to work with, but ensures that every row has all variables recorded.
movies = movies.dropna()
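As a small illustration of what dropna() does, the sketch below uses an invented four-row data frame (not the movies data). Any row with at least one missing value is removed, which is why the cleaned data set can be much smaller than the original.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "length": [90, 120, None, 101],
    "budget": [None, 5e6, 2e6, 1e7],
})

complete = toy.dropna()  # drops every row with at least one missing value
print(len(toy), len(complete))  # 4 2
```

If only some columns matter for a plot, dropna(subset=['length', 'budget']) restricts the check to those columns and may keep more rows.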
fig = px.scatter(
    movies,
    x='length', y='budget'
)
Use the trendline argument to add a line of best fit.
fig = px.scatter(
    movies,
    x='length', y='budget',
    trendline='ols'
)
comment = """
Positive relationship between the length of movies and their budgets.
Line predicts negative budgets for 'outlier' films so we should be
careful with interpretation.
"""
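The point about negative budgets can be reproduced by hand. The sketch below fits an ordinary least squares line to a few made-up (length, budget) pairs, not the movies data, using the usual one-predictor formulas. A straight line is unbounded, so a positive slope combined with a negative intercept gives fitted values below zero for short films.

```python
# Made-up (length in minutes, budget in $ millions) pairs.
lengths = [80, 100, 120, 140, 160]
budgets = [5, 20, 40, 55, 80]

n = len(lengths)
mean_x = sum(lengths) / n
mean_y = sum(budgets) / n

# Ordinary least squares with one predictor:
# slope = S_xy / S_xx, intercept = mean_y - slope * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, budgets))
s_xx = sum((x - mean_x) ** 2 for x in lengths)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(slope, intercept)        # positive slope, negative intercept
print(intercept + slope * 60)  # fitted budget for a 60-minute film is below zero
```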
fig = px.scatter(
    movies,
    x='length', y='budget',
    trendline='ols',
    marginal_x='histogram',
    marginal_y='histogram'
)
Hint: df['column_name'].str.len() can be used to count the number of characters in each string of a column.
movies['title_length'] = movies['title'].str.len()
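As a quick check of what str.len() returns, here is a minimal sketch on a made-up Series (the titles are illustrative, not drawn from the movies data). Note that spaces count as characters.

```python
import pandas as pd

# Illustrative titles only.
titles = pd.Series(["Jaws", "Toy Story", "Up"])

# .str.len() is applied element-wise and returns one length per row.
print(titles.str.len().tolist())  # [4, 9, 2]
```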
fig = px.scatter(
    movies,
    x='title_length', y='length'
)
fig = px.scatter(
    movies,
    x='title_length', y='length',
    labels={
        'length': 'Length of movie (mins)',
        'title_length': 'Characters in title of movie.'
    }
)
fig = px.scatter(
    movies,
    x='title_length', y='length',
    labels={
        'length': 'Length of movie (mins)',
        'title_length': 'Characters in title of movie.'
    },
    marginal_x='box',
    marginal_y='box'
)
fig = px.scatter(
    movies,
    x='title_length', y='length',
    labels={
        'length': 'Length of movie (mins)',
        'title_length': 'Characters in title of movie.'
    },
    marginal_x='box',
    marginal_y='box',
    color='mpaa',
    trendline='ols'
)
comment = """
Most groups do not show much of a relationship between the length
of a movie in minutes and the number of characters in the title.
However, there does appear to be a fairly strong positive relationship
for films rated NC-17, but there are few films with this rating, so
we should be careful about any conclusions we draw.
"""
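Before reading much into a per-group trend line, it helps to check how many rows each group contains. The sketch below does this with value_counts() on an invented column of ratings (not the movies data); a group with only a handful of rows can produce a steep line by chance.

```python
import pandas as pd

# Invented ratings, for illustration only.
toy = pd.DataFrame({
    "mpaa": ["R", "PG-13", "R", "NC-17", "PG", "R", "PG-13"],
})

# Number of rows per rating, largest group first.
counts = toy["mpaa"].value_counts()
print(counts)
```

On the real data, the same call would be movies['mpaa'].value_counts().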
fig = px.scatter(
    movies,
    x='title_length', y='length',
    labels={
        'length': 'Length of movie (mins)',
        'title_length': 'Characters in title of movie.'
    },
    marginal_x='box',
    marginal_y='box',
    color='mpaa',
    trendline='ols',
    hover_name='title'
)