Practical 2

Jumping Rivers

Copy cat

This data set consists of seven observations on cotton aphid counts on twenty randomly chosen leaves in each plot, for twenty-seven treatment-block combinations. The data were recorded in July 2004 in Lamesa, Texas. The treatments consisted of three nitrogen levels (blanket, variable and none), three irrigation levels (low, medium and high) and three blocks, each being a distinct area. Irrigation treatments were randomly assigned within each block as whole plots. Nitrogen treatments were randomly assigned within each whole block as split plots.

See if you can recreate the plot below. Some hints

Movies data

We can start by loading the plotly module and the movies data:

import jrpyvisualisation as jr
import plotly.express as px

movies = jr.datasets.load_movies()

We will also get rid of all rows that contain missing data. This gives us a much smaller data set to work with, but ensures that every row has all variables recorded.

movies = movies.dropna()
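To see what this step does, here is a minimal sketch on a small made-up data frame. The data frame `toy` and its column names are assumptions for illustration only and are not taken from the movies data. With its default arguments, `dropna()` removes every row that has at least one missing value.

```python
import pandas as pd

# Toy stand-in for the movies data; the values and column names are made up.
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "length": [90, 120, float("nan"), 105],
        "budget": [1e6, float("nan"), 5e5, 2e6],
    }
)

# With default arguments, dropna() drops any row containing a missing value.
complete = toy.dropna()

print(len(toy), "rows before,", len(complete), "rows after")
print(complete)
```

Rows B and C each have one missing value, so only rows A and D survive. Comparing `len(movies)` before and after the call shows how much of the real data set is lost in the same way.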

    • Produce a plot of the budget for films against their length.

    • Try using the trendline argument to add a line of best fit

    • Comment on the line of best fit

    • Add marginal histograms to the x and y axis

    • Produce a scatter plot of the length of a movie against the length of its title. (Hint: df['column_name'].str.len() counts the number of characters in each string of a column.)

    • Give the axes more informative labels

    • Add marginal box plots to the axes

    • Mark each point with a different colour, depending on its mpaa rating. Also add a line of best fit for each rating.

    • Comment on the lines of best fit added

    • Adjust your plot so that hovering over a point also shows the title of the film.