This data set consists of seven observations on cotton aphid counts on twenty randomly chosen leaves in each plot, for twenty-seven treatment-block combinations. The data were recorded in July 2004 in Lamesa, Texas. The treatments consisted of three nitrogen levels (blanket, variable and none), three irrigation levels (low, medium and high) and three blocks, each being a distinct area. Irrigation treatments were randomly assigned within each block as whole plots. Nitrogen treatments were randomly assigned within each whole plot as split plots.
See if you can recreate the plot below. Some hints:
.astype('category') converts a column to a categorical type.
The category_orders argument can be used to specify the order of the facets.
update_traces(mode='lines+markers') draws each trace with both lines and markers.
We can start by loading the plotly modules and the movies data:
import jrpyvisualisation as jr
import plotly.express as px
movies = jr.datasets.load_movies()
We will also get rid of all rows that contain missing data. This gives us a much smaller data set to work with, but ensures that every row has all variables recorded.
movies = movies.dropna()
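To see what dropna() does, here is a minimal sketch on a made-up data frame (the columns and values are invented for illustration, not taken from the movies data). Any row with at least one missing value is removed.

```python
import pandas as pd

# Made-up data: rows "B" and "C" each have one missing value.
toy_films = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [1.0e6, None, 2.5e6],
    "length": [90, 120, None],
})

# Keep only the rows where every column is recorded.
complete = toy_films.dropna()

print(len(toy_films), len(complete))  # prints: 3 1
```

Comparing the number of rows before and after is a quick way to check how much data has been discarded.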
Produce a plot of the budget for films against their length.
Try using the trendline argument to add a line of best fit.
Comment on the line of best fit
Add marginal histograms to the x and y axes.
Produce a scatter plot of the length of a movie against the length of its title. (Hint: df['column_name'].str.len() can be used to count the number of characters in each string of a column.)
Give the axes more informative labels
Add marginal box plots to the axes
Mark each point with a different colour, depending on its mpaa rating. Also add a line of best fit for each rating.
Comment on the lines of best fit added
Fix your plot such that as you hover over points you also get shown the title of the film.