matminer.figrecipes package
Submodules
matminer.figrecipes.plot module
class matminer.figrecipes.plot.PlotlyFig(df=None, mode='offline', title=None, x_title=None, y_title=None, colorbar_title='auto', x_scale='linear', y_scale='linear', ticksize=25, fontscale=1, fontsize=25, fontfamily='Courier', bgcolor='white', fontcolor=None, colorscale='Viridis', height=None, width=None, resolution_scale=None, margins=100, pad=0, username=None, api_key=None, filename='temp-plot', show_offline_plot=True, hovermode='closest', hoverinfo='x+y+text', hovercolor=None)
Bases: object
__init__(df=None, mode='offline', title=None, x_title=None, y_title=None, colorbar_title='auto', x_scale='linear', y_scale='linear', ticksize=25, fontscale=1, fontsize=25, fontfamily='Courier', bgcolor='white', fontcolor=None, colorscale='Viridis', height=None, width=None, resolution_scale=None, margins=100, pad=0, username=None, api_key=None, filename='temp-plot', show_offline_plot=True, hovermode='closest', hoverinfo='x+y+text', hovercolor=None)
Class for making Plotly plots.
Args:
    Data:
        df (DataFrame): A pandas dataframe object which can be used to generate several plots.
        mode (str): One of:
            (i) ‘offline’: creates and saves plots on the local disk,
            (ii) ‘notebook’: embeds plots in an IPython/Jupyter notebook,
            (iii) ‘online’: saves the plot in your online Plotly account,
            (iv) ‘static’: saves a static image of the plot locally.
            NOTE: Both ‘online’ and ‘static’ modes require either ‘username’ and ‘api_key’ or a Plotly credentials file.
    Axes:
        title (str): title of plot
        x_title (str): title of x-axis
        y_title (str): title of y-axis
        colorbar_title (str or None): the colorbar (z) title. If set to “auto”, the name of the third column (if pd.Series) is chosen.
        x_scale (str): Sets the x axis scaling type. Select from ‘linear’, ‘log’, ‘date’, ‘category’.
        y_scale (str): Sets the y axis scaling type. Select from ‘linear’, ‘log’, ‘date’, ‘category’.
        ticksize (int): size of ticks in px
    Fonts:
        fontscale (int/float): The relative scale of the font to the rest of the plot
        fontsize (int): size of text of plot title and axis titles
        fontfamily (str): The HTML font family to use in the browser, for example “Arial” or “Times New Roman”. If multiple are passed, the list is an order of preference in case fonts are not found on the system.
    Colors:
        bgcolor (str): Sets the background color. For example, “grey”.
        fontcolor (str): Sets all font colors. For example, “black”.
        colorscale (str/list): Sets the colorscale (colormap). See https://plot.ly/python/colorscales/ for details on what data types are acceptable for color maps. String names of colormaps can also be used, e.g., ‘Jet’ or ‘Viridis’. A useful list of Plotly builtins is: Greys, YlGnBu, Greens, YlOrRd, Bluered, RdBu, Reds, Blues, Picnic, Rainbow, Portland, Jet, Hot, Blackbody, Earth, Electric, Viridis.
    Formatting:
        height (float): output height (in pixels)
        width (float): output width (in pixels)
        resolution_scale (float): Increase the resolution of the image by this scale amount, e.g. 3. Only valid for PNG and JPEG.
        margins (float or [float]): Specify the margin (in px) with a list [top, bottom, right, left], or a number which will set all margins.
        pad (float): Sets the amount of padding (in px) between the plotting area and the axis lines
    Plotly:
        username (str): plotly account username
        api_key (str): plotly account API key
    Offline:
        filename (str): name/filepath of plot file
        show_offline_plot (bool): automatically opens the plot offline
    Interactivity:
        hovermode (str): determines the mode of hover interactions. Can be ‘x’/‘y’/‘closest’/False
        hoverinfo (str): Determines displayed information on mouseover. Any combination of “x”, “y”, “z”, “text”, “name” joined with a “+”, OR “all” or “none” or “skip”. Examples: “x”, “y”, “x+y”, “x+y+z”, “all”
        hovercolor (str): The color to set for the hover background. If None, uses the trace color.
Returns: None
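To illustrate the two accepted forms of the margins argument, the sketch below expands either a single number or a [top, bottom, right, left] list into the `t`/`b`/`r`/`l`/`pad` keys used by a Plotly layout `margin` dict. The helper name `normalize_margins` is hypothetical and is not part of matminer; it only mirrors the behavior described above and may differ from what PlotlyFig does internally.

```python
def normalize_margins(margins, pad=0):
    """Expand a number or [top, bottom, right, left] list into a Plotly-style margin dict.

    Hypothetical helper for illustration only; not part of matminer.
    """
    if isinstance(margins, (int, float)):
        # A single number sets all four margins.
        margins = [margins] * 4
    if len(margins) != 4:
        raise ValueError("margins must be a number or [top, bottom, right, left]")
    top, bottom, right, left = margins
    return {"t": top, "b": bottom, "r": right, "l": left, "pad": pad}


print(normalize_margins(100))
print(normalize_margins([80, 60, 40, 120], pad=5))
```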
Attributes: These are either fields that Plotly’s ‘layout’ cannot work with directly or are managerial values PlotlyFig uses separate from PlotlyDict.
    df (DataFrame): The dataframe which can be used to generate multiple plots.
    mode (str): The plot mode, specified above in the argument.
    show_offline_plot (bool): If True, opens up plot offline.
    username (str): The Plotly username
    api_key (str): The Plotly api key
    resolution_scale (int/float): Scale up the resolution of static images proportionally using this parameter.
    layout (dict): The dictionary passed to Plotly which specifies the PlotlyDict ‘layout’ value.
    font_style (dict): The general font style, in Plotly syntax.
    plot_counter (int): The number appended onto generated offline plots
    colorbar_title (str): The title of the colorbar
    colorscale (str): See argument documentation above.
    hoverinfo (str): See argument documentation above.
    ticksize (int): See argument documentation above.
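The hoverinfo format described for the constructor (flags joined with “+”, or one of the special values) can be checked with a few lines of plain Python. This is only a sketch: `is_valid_hoverinfo` is a hypothetical name, and the source does not say whether PlotlyFig validates the string itself.

```python
HOVER_FLAGS = {"x", "y", "z", "text", "name"}
HOVER_SPECIAL = {"all", "none", "skip"}


def is_valid_hoverinfo(hoverinfo):
    """Return True if hoverinfo follows the format described for the constructor.

    Hypothetical helper for illustration only; not part of matminer.
    """
    if hoverinfo in HOVER_SPECIAL:
        return True
    # Otherwise every "+"-separated part must be one of the known flags.
    return all(part in HOVER_FLAGS for part in hoverinfo.split("+"))


for value in ["x+y+text", "all", "x+all", ""]:
    print(repr(value), is_valid_hoverinfo(value))
```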
bar(data=None, cols=None, x=None, y=None, labels=None, barmode='group', colors=None, bargap=None, return_plot=False)
Create a bar chart using Plotly.
Can be used with x and y arguments or with a dataframe (passed as ‘data’ or taken from the constructor).
Args:
    data (DataFrame): The column names will become the ‘x’ axis. The rows will become sets of bars (e.g., 3 rows = 3 sets of bars for each x point).
    cols ([str]): A list of strings specifying columns of a DataFrame passed into the constructor to be used as data. Should not be used with ‘data’.
    x (list or [list]): A list containing ‘x’ axis values. Can be a list of lists if there is more than one set of bars.
    y (list or [list]): A list containing ‘y’ values. Can be a list of lists if there is more than one set of bars (more than one set of data for each ‘x’ axis value).
    labels (str or [str]): Defines the label for each set of bars. If str, defines the column of the DataFrame to use for labelling. The column’s entry for a row will be the label for that row. If it is a list of strings, it should be used with x and y, and defines the label for each set of bars.
    barmode: Defines how sets of bars are displayed. Can be set to “group” or “stack”.
    colors ([str]): The list of colors to use for each set of bars. The length of this list should be equal to the number of rows (sets of bars) present in your data.
    bargap (int/float): Separation between bars.
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: A Plotly bar chart object.
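The relationship between x, y, labels, and colors described above can be sketched as plain dictionaries in Plotly's figure format, with one bar trace per set of bars. The helper `bar_traces` is hypothetical and not part of matminer; it assumes the x/y calling style (lists rather than a dataframe) and shows only how sets of bars map onto traces.

```python
def bar_traces(x, y, labels=None, colors=None):
    """Build one Plotly-style bar trace dict per set of bars.

    Hypothetical helper for illustration only; not part of matminer.
    """
    if not isinstance(y[0], (list, tuple)):
        # A flat list of y values is a single set of bars.
        x, y = [x], [y]
    elif not isinstance(x[0], (list, tuple)):
        # Reuse the same x values for every set of bars.
        x = [x] * len(y)
    traces = []
    for i, (xs, ys) in enumerate(zip(x, y)):
        trace = {"type": "bar", "x": list(xs), "y": list(ys)}
        if labels is not None:
            trace["name"] = labels[i]  # legend entry for this set of bars
        if colors is not None:
            trace["marker"] = {"color": colors[i]}
        traces.append(trace)
    return traces


# Two sets of bars sharing the same x values, displayed side by side.
fig = {
    "data": bar_traces(["Fe", "Cu"], [[1, 2], [3, 4]], labels=["2019", "2020"]),
    "layout": {"barmode": "group"},
}
print(fig)
```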
create_plot(fig, return_plot=False)
Creates a plotly plot based on its dictionary representation. The modes of plotting are:
    offline: Makes an offline html.
    notebook: Embeds in Jupyter notebook
    online: Send to Plotly, requires credentials
    static: Creates a static image of the plot
    return: Returns the dictionary representation of the plot.
Args:
    fig (dictionary): contains data and layout information
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: A Plotly Figure object (if return_plot = True)
data_from_col(col, data=None)
Try to get data based on a column name in the dataframe and return an informative error if that fails.
Args:
    col (str): column name to look for
    data (pandas.DataFrame): if a dataframe is given, try to get the col column from it
Returns: pd.Series or col itself
heatmap_basic(data=None, x_labels=None, y_labels=None, colorscale=None, colorscale_range=None, annotations_text=None, annotations_font_size=20, annotations_color='white', return_plot=False)
Make a heatmap plot, either using 2D arrays of values, or a dataframe.
Args:
    data (array): an array of arrays. For example, in case of a pandas dataframe ‘df’, data=df.values.tolist(). If None, uses the dataframe passed into the constructor.
    x_labels (array): an array of strings to label the heatmap columns
    y_labels (array): an array of strings to label the heatmap rows
    colorscale (str/array): See colorscale in __init__.
    colorscale_range (array): Sets the minimum (first array item) and maximum value (second array item) of the colorscale.
    annotations_text (array): an array of arrays, with each value being a string annotation to the corresponding value in ‘data’
    annotations_font_size (int): size of annotation text
    annotations_color (str/array): color of annotation text; accepts similar formats as other color variables
Returns: A Plotly heatmap plot Figure object.
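As a rough picture of how these arguments relate to a Plotly heatmap trace, the sketch below maps data, x_labels, y_labels, colorscale, and colorscale_range onto the trace keys `z`, `x`, `y`, `colorscale`, `zmin`, and `zmax`. The helper `heatmap_trace` is hypothetical and not part of matminer, and annotations are left out for brevity.

```python
def heatmap_trace(data, x_labels=None, y_labels=None,
                  colorscale="Viridis", colorscale_range=None):
    """Build a Plotly-style heatmap trace dict from an array of arrays.

    Hypothetical helper for illustration only; not part of matminer.
    """
    trace = {
        "type": "heatmap",
        "z": [list(row) for row in data],  # one inner list per heatmap row
        "colorscale": colorscale,
    }
    if x_labels is not None:
        trace["x"] = list(x_labels)  # one label per column
    if y_labels is not None:
        trace["y"] = list(y_labels)  # one label per row
    if colorscale_range is not None:
        # First item is the minimum, second item is the maximum.
        trace["zmin"], trace["zmax"] = colorscale_range
    return trace


trace = heatmap_trace([[1, 2, 3], [4, 5, 6]],
                      x_labels=["a", "b", "c"], y_labels=["r1", "r2"],
                      colorscale_range=[0, 10])
print(trace)
```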
heatmap_df(data=None, cols=None, x_labels=None, x_nqs=6, y_labels=None, y_nqs=4, precision=1, annotation='count', annotation_color='black', colorscale=None, return_plot=False)
A heatmap which can accept a dataframe as input directly.
Args:
    data (DataFrame): only the first 3 numerical columns are considered
    cols ([str]): A list of strings specifying the columns of the dataframe (either data or self.df) to use. Currently, only 3 columns are supported. Note that the order in cols matters: the first is considered x, the second y and the third z (color).
    x_labels ([str]): labels for the categories in x data (first column)
    x_nqs (int or None): if the number of unique values in the x data is more than this, the x data is divided into x_nqs quantiles for better presentation. If x_labels is set, x_nqs is ignored (i.e. x_nqs = len(x_labels)).
    y_labels ([str]): similar to x_labels but for the 2nd column in data
    y_nqs (int or None): similar to x_nqs but for the 2nd column in data
    precision (int): number of decimal places used for binning/display
    annotation (str or None): mode of annotation. Options are:
        None: no annotations
        “count”: the number of data points available in each cell is displayed
        “value”: the actual value of the cell is displayed in addition to the colorbar
    annotation_color (str): the color of annotation (text inside cells)
    colorscale: see the __init__ doc for colorscale
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: A Plotly heatmap plot Figure object.
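The quantile binning behind x_nqs and y_nqs, together with the “count” annotation, can be pictured with the standard library alone. The sketch below is a stand-in, not the method's actual implementation: the function names are hypothetical, and the choice of `statistics.quantiles` with right-closed bins is an assumption made for illustration.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def quantile_bins(values, nqs):
    """Assign each value to one of nqs quantile bins, numbered 0 .. nqs - 1.

    Hypothetical helper for illustration only; not part of matminer.
    """
    cuts = quantiles(values, n=nqs, method="inclusive")  # nqs - 1 cut points
    # A value equal to a cut point goes into the lower bin (right-closed bins).
    return [bisect_left(cuts, v) for v in values]


def cell_counts(xs, ys, x_nqs, y_nqs):
    """Count how many (x, y) points fall into each cell of the binned grid."""
    return Counter(zip(quantile_bins(xs, x_nqs), quantile_bins(ys, y_nqs)))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [8, 7, 6, 5, 4, 3, 2, 1]
print(cell_counts(xs, ys, x_nqs=2, y_nqs=2))
```

Each key of the resulting counter is an (x bin, y bin) cell, and its value is the number a “count” annotation would show for that cell.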
histogram(data=None, cols=None, orientation='vertical', histnorm='count', n_bins=None, bins=None, colors=None, bargap=0, return_plot=False)
Creates a Plotly histogram. If multiple series of data are available, will create an overlaid histogram.
For n_bins, start, end, size, colors, and bargap, all defaults are Plotly defaults.
Args:
    data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
    cols ([str]): A list of strings specifying the columns of the dataframe to use. Each column will be represented with its own histogram in the overlay.
    orientation (str): Determines whether the histogram is oriented horizontally or vertically. Use “vertical” or “horizontal”.
    histnorm: The technique for creating the plot. Can be “probability density”, “probability”, “density”, or “” (count).
    n_bins (int or [int]): The number of bins to include on each plot. If only one number is specified, all histograms will have the same number of bins.
    bins (dict or [dict]): specifications of the bins including start, end and size. If n_bins is set, size cannot be set in bins. Also, size is ignored if start or end is not specified. Examples:
        1) bins=None, n_bins=25
        2) bins={‘start’: 0, ‘end’: 50, ‘size’: 2.0}, n_bins=None
    colors (str or list): The list of colors for each histogram (if overlaid). If only one series of data is present or all series should have the same value, a single str determines the color of the bins.
    bargap (float or list): The gaps between bars for all histograms shown.
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: Plotly histogram figure.
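The interaction between n_bins and bins stated above can be written out as a small check. This is a sketch under two assumptions: `resolve_bins` is a hypothetical helper that is not part of matminer, and “size is ignored if start or end is not specified” is read as “size is kept only when both start and end are given”. The output uses the Plotly histogram trace keys `xbins` and `nbinsx`, which apply to a vertical histogram.

```python
def resolve_bins(bins=None, n_bins=None):
    """Return Plotly-style histogram bin settings following the rules above.

    Hypothetical helper for illustration only; not part of matminer.
    """
    bins = dict(bins or {})  # copy so the caller's dict is not modified
    if n_bins is not None and "size" in bins:
        raise ValueError("size cannot be set in bins when n_bins is set")
    if "size" in bins and not ("start" in bins and "end" in bins):
        # Assumed reading: size only applies when both start and end are given.
        del bins["size"]
    settings = {}
    if bins:
        settings["xbins"] = bins
    if n_bins is not None:
        settings["nbinsx"] = n_bins
    return settings


print(resolve_bins(n_bins=25))
print(resolve_bins({"start": 0, "end": 50, "size": 2.0}))
```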
parallel_coordinates(data=None, cols=None, line=None, precision=2, colors=None, return_plot=False)
Create a Plotly Parcoords plot from dataframes.
Args:
    data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
    cols ([str]): A list of strings specifying the columns of the dataframe to use.
    colors (str): The name of the column to use for the color bar.
    line (dict): plotly line dict with keys such as “color” or “width”
    precision (int): the number of decimal places for columns with float data type (2 is recommended for a nice visualization)
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: a Plotly parallel coordinates plot.
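To show what the precision argument does to float columns, the sketch below builds the `dimensions` list of a Plotly parcoords trace from a plain mapping of column names to values. The helper `parcoords_dimensions` is hypothetical and not part of matminer; it uses a dict instead of a dataframe so that it runs without pandas.

```python
def parcoords_dimensions(columns, precision=2):
    """Build Plotly-style parcoords dimensions from a {label: values} mapping.

    Hypothetical helper for illustration only; not part of matminer.
    """
    dimensions = []
    for label, values in columns.items():
        # Round only float entries; integers are passed through unchanged.
        rounded = [round(v, precision) if isinstance(v, float) else v
                   for v in values]
        dimensions.append({"label": label, "values": rounded})
    return dimensions


dims = parcoords_dimensions({"K_VRH": [1.23456, 2.0], "nsites": [1, 2]})
print(dims)
```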
scatter_matrix(data=None, cols=None, colors=None, marker=None, labels=None, marker_scale=1.0, return_plot=False, default_color='#98AFC7', **kwargs)
Create a Plotly scatter matrix plot from dataframes.
Args:
    data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
    cols ([str]): A list of strings specifying the columns of the dataframe to use.
    colors (str): name of the column used for colorbar
    marker (dict): if size is set, it will override the automatic size
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
    labels: see the PlotlyFig.xy documentation
    default_color (str): default marker color. Ignored if colors is set. Histogram color is always set by this default_color.
    **kwargs: keyword arguments of scatterplot. Forbidden args are ‘size’, ‘color’ and ‘colorscale’ in ‘marker’. See example below.
Returns: a Plotly scatter matrix plot
Example:
    # Example for more control over markers:
    from matminer.figrecipes.plot import PlotlyFig
    from matminer.datasets.dataframe_loader import load_elastic_tensor
    df = load_elastic_tensor()
    pf = PlotlyFig()
    pf.scatter_matrix(df[['volume', 'G_VRH', 'K_VRH', 'poisson_ratio']],
                      colors='poisson_ratio', text=df['material_id'],
                      marker={'symbol': 'diamond', 'size': 8,
                              'line': {'width': 1, 'color': 'black'}},
                      colormap='Viridis',
                      title='Elastic Properties Scatter Matrix')
set_arguments(**kwargs)
Method to modify some of the layout and PlotlyFig arguments after instantiation.
Allowed arguments: title, x_title, y_title, colorbar_title, filename, mode, api_key, username, show_offline_plot
Args:
    **kwargs: the variables that may be changed are listed above.
Returns: None
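The allowed-argument list above amounts to a whitelist of keyword names. A minimal sketch of such a check follows; `check_arguments` is a hypothetical helper that is not part of matminer, and raising ValueError for other names is an assumption, since the source does not say how set_arguments treats them.

```python
ALLOWED = {"title", "x_title", "y_title", "colorbar_title", "filename",
           "mode", "api_key", "username", "show_offline_plot"}


def check_arguments(**kwargs):
    """Return kwargs unchanged if every name is in the allowed list.

    Hypothetical helper for illustration only; not part of matminer.
    Raising on unknown names is an assumed behavior.
    """
    unknown = sorted(set(kwargs) - ALLOWED)
    if unknown:
        raise ValueError("cannot change: " + ", ".join(unknown))
    return kwargs


print(check_arguments(title="Bulk modulus", mode="notebook"))
```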
violin(data=None, cols=None, use_colorscale=False, rugplot=False, group_col=None, groups=None, colorscale=None, return_plot=False)
Create a violin plot using Plotly.
Args:
    data (DataFrame/list): A dataframe containing at least one numerical column. Also accepts lists/arrays of numerical values, using columns as separate variables (distributions are down rows). If None, uses the dataframe passed into the constructor.
    cols ([str]): The labels for the columns of the dataframe to be included in the plot. If data is passed as a list/array, pass a list of cols to be used as labels for the violins.
    rugplot (bool): If True, plots the distribution of the data next to the violin with a ‘rugplot’.
    group_col (str): Name of the column containing the group for each row, if it exists. Used only if there is one entry in cols.
    groups ([str]): All group names to be included in the violin plot. Used only if there is one entry in cols.
    colorscale (str/tuple/list/dict): either a plotly scale name (Greys, YlGnBu, Greens, etc.), an rgb or hex color, a color tuple, or a list/dict of colors. The color is representative of the median value of the violin.
    use_colorscale (bool): Only applicable if grouping by another variable. Will implement a colorscale based on the first 2 colors of colorscale. This means colorscale must be a list with at least 2 colors in it (Plotly colorscales are accepted since they map to a list of two rgb colors).
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: A Plotly violin plot Figure object.
xy(xy_pairs, colors=None, color_range=None, labels=None, names=None, sizes=None, modes='markers', markers=None, marker_scale=1.0, lines=None, colorscale=None, showlegends=None, error_bars=None, normalize_size=True, return_plot=False)
Make an XY scatter plot, either using arrays of values, or a dataframe.
Args:
    xy_pairs (tuple or [tuple]): x & y columns of scatter plots, with possibly different lengths, are extracted from this arg.
        example: ([1, 2], [3, 4])
        example: [(df[‘x1’], df[‘y1’]), (df[‘x2’], df[‘y2’])]
        example: [(‘x1’, ‘y1’), (‘x2’, ‘y2’)]
    colors (list or np.ndarray or pd.Series): set the colorscale for the colorbar (list of numbers); overwrites marker[‘color’]
    color_range ([min, max]): the range of numbers included in the colorbar. If any number is outside of this range, it will be forced to the nearest end of the range. Note that if color_range is set, the colorbar ticks will be updated to reflect -min or max+ at the two ends.
    labels (list or [list]): annotations for individual scatter points; either the same for all traces or set separately for each trace
    names (str or [str]): list of trace names used for the legend. By default the column name (or trace if NA) is used if a pd.Series is passed.
    sizes (str, float, [float], [list]). Options:
        str: column name in data with list of numbers used for marker size
        float: a single size used for all traces in xy_pairs
        [float]: list of fixed sizes used for traces (length==len(xy_pairs))
        [list]: list of list of sizes for each trace in xy_pairs
    modes (str or [str]): trace style; can be ‘markers’, ‘lines’ or ‘lines+markers’.
    markers (dict or [dict]): gives the ability to fine tune the marker of each scatter plot individually if a list of dicts is passed. Note that the key “size” is forbidden in markers. Use the sizes arg instead.
    lines (dict or [dict]): similar to markers though only if mode==‘lines’
    colorscale (str): see the colorscale doc in __init__
    showlegends (bool or [bool]): indicating whether to show the legend for each trace (or simply turn it on/off for all if not a list)
    error_bars ([str or list]): numbers used for error bars in the y direction. String input is interpreted as a dataframe column name.
    normalize_size (bool): if True, normalize the size list.
    return_plot (bool): Returns the dictionary representation of the figure if True. If False, renders the plot according to self.mode (set with mode in __init__).
Returns: A Plotly Scatter plot Figure object.
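Two behaviors described for xy lend themselves to a short sketch: accepting either one (x, y) tuple or a list of them, and forcing color values into color_range. Both helpers below are hypothetical and not part of matminer; they illustrate the documented behavior with plain lists and do not reproduce the method's internals.

```python
def as_pair_list(xy_pairs):
    """Wrap a single (x, y) tuple in a list so callers can always iterate over pairs.

    Hypothetical helper for illustration only; not part of matminer.
    """
    return [xy_pairs] if isinstance(xy_pairs, tuple) else list(xy_pairs)


def clip_colors(colors, color_range):
    """Force every color value into [min, max], as described for color_range."""
    lo, hi = color_range
    return [min(max(c, lo), hi) for c in colors]


print(as_pair_list(([1, 2], [3, 4])))
print(as_pair_list([("x1", "y1"), ("x2", "y2")]))
print(clip_colors([-5, 0.5, 3, 12], [0, 10]))
```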