File I/O & Processing (mpes.fprocessing)

Custom methods to handle ARPES data I/O and standard data processing methods (filtering, dewarping, etc.)

@author: R. Patrick Xian, L. Rettig

mpes.fprocessing._arraysum(array_a, array_b)

Calculate the sum of two arrays.

mpes.fprocessing._hist1d_numba_seq(sample, bins, ranges)

1D Binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32 bit integers

mpes.fprocessing._hist2d_numba_seq(sample, bins, ranges)

2D Binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32 bit integers

mpes.fprocessing._hist3d_numba_seq(sample, bins, ranges)

3D Binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32 bit integers

mpes.fprocessing._hist4d_numba_seq(sample, bins, ranges)

4D Binning function, pre-compiled by Numba for performance. Behaves much like numpy.histogramdd, but calculates and returns unsigned 32 bit integers
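The fixed-range binning these helpers perform can be illustrated in pure Python. The sketch below is not the Numba implementation: `hist1d_sketch` is a hypothetical name, and both the dropping of out-of-range values and the inclusion of the upper edge in the last bin (as numpy.histogramdd does) are assumptions.

```python
def hist1d_sketch(sample, nbins, value_range):
    """Count values into equally spaced bins over a fixed range (illustration only)."""
    lo, hi = value_range
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in sample:
        if x < lo or x > hi:
            continue  # assumption: values outside the range are dropped
        idx = int((x - lo) / width)
        if idx == nbins:  # x == hi is counted in the last bin
            idx = nbins - 1
        counts[idx] += 1
    return counts


counts = hist1d_sketch([0.0, 0.4, 1.2, 2.0, 3.5], nbins=2, value_range=(0.0, 2.0))
print(counts)
```

The higher-dimensional variants follow the same idea, computing one bin index per axis.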

mpes.fprocessing.applyJitter(df, amp, col, type)

Add jittering to a dataframe column.

Parameters

df: dataframe

Dataframe to add noise/jittering to.

amp: numeric

Amplitude scaling for the jittering noise.

col: str

Name of the column to add jittering to.

Return

Uniformly distributed noise vector with specified amplitude and size.
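As a rough illustration of uniform jittering, the sketch below draws a noise vector and adds it to a column. It uses only the standard library; `uniform_jitter` is a hypothetical helper, and the interval [-amp, amp] is an assumption about how the amplitude scales the noise, not a statement about applyJitter itself.

```python
import random


def uniform_jitter(size, amp, seed=None):
    """Return `size` uniformly distributed noise values in [-amp, amp]."""
    rng = random.Random(seed)
    return [rng.uniform(-amp, amp) for _ in range(size)]


column = [10.0, 11.0, 12.0]
noise = uniform_jitter(len(column), amp=0.5, seed=0)
# Add the noise element-wise to the column values.
jittered = [value + n for value, n in zip(column, noise)]
```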

mpes.fprocessing.binDataframe(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', **kwds)

Calculate multidimensional histogram from columns of a dask dataframe. Prof. Yves Acremann’s method.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

binDict: dict | None

Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it overrides the specifications from the other arguments.

pbar: bool | True

Option to display a progress bar.

pbenv: str | ‘classic’

Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).

jittered: bool | True

Option to add histogram jittering during binning.

**kwds: keyword arguments

See keyword arguments in mpes.fprocessing.hdf5Processor.localBinning().

Return

histdict: dict

Dictionary containing binned data and the axes values (if ret = True).
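The precedence of binDict over the individual arguments can be sketched as follows. `resolve_binning` is a hypothetical helper, and the dictionary keys 'axes', 'nbins' and 'ranges' are assumed to mirror the argument names.

```python
def resolve_binning(axes=None, nbins=None, ranges=None, binDict=None):
    """Return the binning specification, letting binDict take precedence."""
    if binDict is not None:
        # Assumption: the dictionary keys mirror the argument names.
        return binDict['axes'], binDict['nbins'], binDict['ranges']
    return axes, nbins, ranges


spec = resolve_binning(
    axes=['X'], nbins=[10], ranges=[(0, 100)],
    binDict={'axes': ['X', 'Y'], 'nbins': [80, 80],
             'ranges': [(0, 1800), (0, 1800)]},
)
```

Here `spec` holds the values from binDict, and the individually passed arguments are ignored.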

mpes.fprocessing.binDataframe_fast(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', jpart=True, **kwds)

Calculate multidimensional histogram from columns of a dask dataframe.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

binDict: dict | None

Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it overrides the specifications from the other arguments.

pbar: bool | True

Option to display a progress bar.

pbenv: str | ‘classic’

Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).

jittered: bool | True

Option to add histogram jittering during binning.

**kwds: keyword arguments

See keyword arguments in mpes.fprocessing.hdf5Processor.localBinning().

Return

histdict: dict

Dictionary containing binned data and the axes values (if ret = True).

mpes.fprocessing.binDataframe_lean(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', **kwds)

Calculate multidimensional histogram from columns of a dask dataframe.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

binDict: dict | None

Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it overrides the specifications from the other arguments.

pbar: bool | True

Option to display a progress bar.

pbenv: str | ‘classic’

Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).

jittered: bool | True

Option to add histogram jittering during binning.

**kwds: keyword arguments

See keyword arguments in mpes.fprocessing.hdf5Processor.localBinning().

Return

histdict: dict

Dictionary containing binned data and the axes values (if ret = True).

mpes.fprocessing.binDataframe_numba(df, ncores=4, axes=None, nbins=None, ranges=None, binDict=None, pbar=True, jittered=True, pbenv='classic', jpart=True, **kwds)

Calculate multidimensional histogram from columns of a dask dataframe.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

binDict: dict | None

Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it overrides the specifications from the other arguments.

pbar: bool | True

Option to display a progress bar.

pbenv: str | ‘classic’

Progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).

jittered: bool | True

Option to add histogram jittering during binning.

**kwds: keyword arguments

See keyword arguments in mpes.fprocessing.hdf5Processor.localBinning().

Return

histdict: dict

Dictionary containing binned data and the axes values (if ret = True).

mpes.fprocessing.binPartition(partition, binaxes, nbins, binranges, jittered=False, jitter_params={})

Bin the data within a file partition (e.g. dask dataframe).

Parameters

partition: dataframe partition

Partition of a dataframe.

binaxes: list

List of axes to bin.

nbins: list

Number of bins for each binning axis.

binranges: list

The range of each axis to bin.

jittered: bool | False

Option to include jittering in binning.

jitter_params: dict | {}

Parameters used to set jittering.

Return

hist_partition: ndarray

Histogram from the binning process.
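The idea of binning each partition separately and then summing the partial histograms can be sketched in pure Python. Both function names are hypothetical, the partitions are plain lists rather than dask dataframe partitions, and dropping out-of-range values is an assumption.

```python
def bin_partition_sketch(values, nbins, value_range):
    """Histogram one partition over a fixed range (illustration only)."""
    lo, hi = value_range
    counts = [0] * nbins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / (hi - lo) * nbins), nbins - 1)
            counts[idx] += 1
    return counts


def array_sum(a, b):
    """Element-wise sum of two equally sized histograms."""
    return [x + y for x, y in zip(a, b)]


partitions = [[0.1, 0.6, 0.9], [0.2, 0.3], [0.8]]
total = [0, 0]
for part in partitions:
    total = array_sum(total, bin_partition_sketch(part, 2, (0.0, 1.0)))
```

Because every partition is binned over the same axes and ranges, the partial histograms have identical shapes and can simply be added.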

mpes.fprocessing.binPartition_numba(partition, binaxes, nbins, binranges, jittered=False, jitter_params={})

Bin the data within a file partition (e.g. dask dataframe).

Parameters

partition: dataframe partition

Partition of a dataframe.

binaxes: list

List of axes to bin.

nbins: list

Number of bins for each binning axis.

binranges: list

The range of each axis to bin.

jittered: bool | False

Option to include jittering in binning.

jitter_params: dict | {}

Parameters used to set jittering.

Return

hist_partition: ndarray

Histogram from the binning process.

class mpes.fprocessing.dataframeProcessor(datafolder, paramfolder='', datafiles=[], ncores=None)

Process the parquet file converted from single-event data.

_addBinners(axes=None, nbins=None, ranges=None, binDict=None)

Construct the binning parameters within an instance.

appendColumn(colnames, colvals)

Append columns to dataframe.

Parameters

colnames: list/tuple

New column names.

colvals: numpy array/list

Entries of the new columns.

appendEAxis(E0, **kwds)

Calculate and append the E axis to the events dataframe. This method can be reused.

Parameter

E0: numeric

Time-of-flight offset.

appendKAxis(x0, y0, X='X', Y='Y', newX='kx', newY='ky', **kwds)

Calculate and append the k axis coordinates (kx, ky) to the events dataframe. This method can be reused.

appendMarker(source_name='ADC', mapping=<function multithresh>, marker_name='Marker', lower_bounds=[], upper_bounds=[], thresholds=[], update='append', **kwds)

Append markers to specific ranges in a source column. The mapping of the marker is usually a piecewise defined function. This enables binning in nonequivalent steps as the next step.

appendRow(folder=None, df=None, ftype='parquet', **kwds)

Append rows read from other files to existing dataframe.

Parameters

folder: str | None

Folder directory for the files to append to the existing dataframe (i.e. when appending parquet files).

df: dataframe | None

Dataframe to append to the existing dataframe.

ftype: str | ‘parquet’

File type (‘parquet’, ‘dataframe’)

**kwds: keyword arguments

Additional arguments to submit to dask.dataframe.append().

applyECorrection(type, **kwds)

Apply correction to the time-of-flight (TOF) axis of single-event data.

Parameters

type: str

Type of correction to apply to the TOF axis.

**kwds: keyword arguments

Additional parameters to use for the correction:

corraxis: str | ‘t’

String name of the axis to correct.

center: list/tuple | (650, 650)

Image center pixel positions in (row, column) format.

amplitude: numeric | -1

Amplitude of the time-of-flight correction term (a negative sign means subtracting the curved wavefront).

d: numeric | 0.9

Field-free drift distance.

t0: numeric | 0.06

Time zero position corresponding to the tip of the valence band.

gam: numeric

Linewidth value for correction using a 2D Lorentz profile.

sig: numeric

Standard deviation for correction using a 2D Gaussian profile.

gam2: numeric

Linewidth value for correction using an asymmetric 2D Lorentz profile, X-direction.

amplitude2: numeric

Amplitude value for correction using an asymmetric 2D Lorentz profile, X-direction.

applyFilter(colname, lb=- inf, ub=inf, update='replace', ret=False)

Application of bound filters to a specified column (can be used consecutively).

Parameters

colname: str

Name of the column to filter.

lb, ub: numeric, numeric | -infinity, infinity

The lower and upper bounds used in the filtering.

update: str | ‘replace’

Update option for the filtered dataframe.

ret: bool | False

Return option for the filtered dataframe.
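Consecutive bound filtering can be pictured with the following sketch, which works on a plain list of dictionaries rather than a dask dataframe. `apply_bound_filter` is a hypothetical helper, and the use of strict inequalities at the bounds is an assumption.

```python
import math


def apply_bound_filter(rows, colname, lb=-math.inf, ub=math.inf):
    """Keep only rows whose value in `colname` lies between lb and ub."""
    # Assumption: the bounds themselves are excluded.
    return [row for row in rows if lb < row[colname] < ub]


events = [
    {'t': 67000, 'X': 100},
    {'t': 70000, 'X': 900},
    {'t': 75000, 'X': 1500},
]
# Filters can be chained: first on the 't' column, then on 'X'.
step1 = apply_bound_filter(events, 't', lb=68000, ub=74000)
step2 = apply_bound_filter(step1, 'X', lb=0, ub=1000)
```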

applyKCorrection(X='X', Y='Y', newX='Xm', newY='Ym', type='mattrans', **kwds)

Calculate and replace the X and Y values with their distortion-corrected version. This method can be reused.

Parameters

X, Y: str, str | ‘X’, ‘Y’

Labels of the columns before momentum distortion correction.

newX, newY: str, str | ‘Xm’, ‘Ym’

Labels of the columns after momentum distortion correction.

columnApply(mapping, rescolname, **kwds)

Apply a user-defined function (e.g. partial function) to an existing column.

Parameters

mapping: function

Function to apply to the column.

rescolname: str

Name of the resulting column.

**kwds: keyword arguments

Keyword arguments of the user-input mapping function.

convert(form='parquet', save_addr=None, namestr='/data', pq_append=False, **kwds)

Update or convert to other file formats.

Parameters

form: str | ‘parquet’

File format to convert into.

save_addr: str | None

Path of the folder to save the converted files to.

namestr: ‘/data’

Extra namestring attached to the filename.

pq_append: bool | False

Option to append to the existing parquet file (if True) in the specified folder, otherwise the existing parquet files will be deleted before writing new files in.

**kwds: keyword arguments

See extra keyword arguments in dask.dataframe.to_parquet() for parquet conversion, or in dask.dataframe.to_hdf() for HDF5 conversion.

deleteColumn(colnames)

Delete columns.

Parameters

colnames: str/list/tuple

List of column names to be dropped.

distributedBinning(axes, nbins, ranges, binDict=None, pbar=True, binmethod='numba', ret=False, **kwds)

Binning the dataframe to a multidimensional histogram.

Parameters

axes, nbins, ranges, binDict, pbar:

See mpes.fprocessing.binDataframe().

binmethod: str | ‘numba’

Dataframe binning method (‘original’, ‘lean’, ‘fast’ and ‘numba’).

ret: bool | False

Option to return binning results as a dictionary.

**kwds: keyword arguments

See mpes.fprocessing.binDataframe() or mpes.fprocessing.binDataframe_lean()

getCountRate(fids='all', plot=False)

Create count rate data for the files in the data frame processor specified in fids.

Parameters

fids: str/list | ‘all’

File IDs to include (‘all’ or a list of file IDs).

See arguments in parallelHDF5Processor.subset() and hdf5Processor.getCountRate().

getElapsedTime(fids='all')

Return the elapsed time in the file from the msMarkers wave.

Return

The length of the file in seconds.

mapColumn(mapping, *args, **kwds)

Apply a dataframe-partition based mapping function to an existing column.

Parameters

oldcolname: str

The name of the column to use for computation.

mapping: function

Functional map to apply to the values of the old column. Takes the data frame as first argument. Further arguments are passed by **kwds

newcolname: str | ‘Transformed’

New column name to be added to the dataframe.

args: tuple | ()

Additional arguments of the functional map.

update: str | ‘append’

Updating option. ‘append’ = append to the current dask dataframe as a new column with the new column name. ‘replace’ = replace the values of the old column.

**kwds: keyword arguments

Additional arguments for the dask.dataframe.apply() function.

property ncol

Number of columns in the distributed dataframe.

property nrow

Number of rows in the distributed dataframe.

read(source='folder', ftype='parquet', fids=[], update='', timeStamps=False, **kwds)

Read into distributed dataframe.

Parameters

source: str | ‘folder’

Source of the file readout: ‘folder’ reads from the provided data folder; ‘files’ reads from the provided list of file addresses.

ftype: str | ‘parquet’

Type of file to read into dataframe (‘h5’ or ‘hdf5’, ‘parquet’, ‘json’, ‘csv’).

fids: list | []

IDs of the files to be selected (see mpes.base.FileCollection.select()). Specify ‘all’ to read all files of the given file type.

update: str | ‘’

File selection update option (see mpes.base.FileCollection.select()).

**kwds: keyword arguments

See keyword arguments in mpes.readDataframe().

saveHistogram(form, save_addr, dictname='histdict', **kwds)

Export binned histogram in other formats.

Parameters

See mpes.fprocessing.saveDict().

toBandStructure()

Convert to the xarray data structure from existing binned data.

Return

An instance of BandStructure() or MPESDataset() from the mpes.bandstructure module.

transformColumn(oldcolname, mapping, newcolname='Transformed', args=(), update='append', **kwds)

Apply a simple function to an existing column.

Parameters

oldcolname: str

The name of the column to use for computation.

mapping: function

Functional map to apply to the values of the old column.

newcolname: str | ‘Transformed’

New column name to be added to the dataframe.

args: tuple | ()

Additional arguments of the functional map.

update: str | ‘append’

Updating option. ‘append’ = append to the current dask dataframe as a new column with the new column name. ‘replace’ = replace the values of the old column.

**kwds: keyword arguments

Additional arguments for the dask.dataframe.apply() function.
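The difference between the ‘append’ and ‘replace’ update options can be sketched with a dictionary of columns standing in for the dask dataframe. `transform_column` is a hypothetical stand-in, not the method itself.

```python
def transform_column(table, oldcolname, mapping, newcolname='Transformed',
                     args=(), update='append'):
    """Apply `mapping` to one column and append or replace the result."""
    new_values = [mapping(v, *args) for v in table[oldcolname]]
    result = dict(table)  # leave the input table untouched
    if update == 'append':
        result[newcolname] = new_values
    elif update == 'replace':
        result[oldcolname] = new_values
    else:
        raise ValueError("update must be 'append' or 'replace'")
    return result


def scale(value, factor):
    return value * factor


table = {'t': [1.0, 2.0, 3.0]}
appended = transform_column(table, 't', scale, newcolname='t_scaled', args=(2.0,))
replaced = transform_column(table, 't', scale, args=(2.0,), update='replace')
```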

transformColumn2D(map2D, X, Y, **kwds)

Apply a mapping simultaneously to two dimensions.

Parameters

map2D: function

2D mapping function.

X, Y: series, series

The two columns of the dataframe to apply mapping to.

**kwds: keyword arguments

Additional arguments for the 2D mapping function.

viewEventHistogram(dfpid, ncol, axes=['X', 'Y', 't', 'ADC'], bins=[80, 80, 80, 80], ranges=[0, 1800, 0, 1800, 68000, 74000, 0, 500], backend='bokeh', legend=True, histkwds={}, legkwds={}, **kwds)

Plot individual histograms of specified dimensions (axes) from a substituent dataframe partition.

Parameters

dfpid: int

Number of the data frame partition to look at.

ncol: int

Number of columns in the plot grid.

axes: list/tuple

Name of the axes to view.

bins: list/tuple

Bin values of all specified axes.

ranges: list

Value ranges of all specified axes.

backend: str | ‘bokeh’

Backend of the plotting library (‘matplotlib’ or ‘bokeh’).

legend: bool | True

Option to include a legend in the histogram plots.

histkwds, legkwds, **kwds: dict, dict, keyword arguments

Extra keyword arguments passed to mpes.visualization.grid_histogram().

mpes.fprocessing.extractEDC(folder=None, files=[], axes=['t'], bins=[1000], ranges=[65000, 100000], binning_kwds={'jittered': True}, ret=True, **kwds)

Extract EDCs from a list of bias scan files.

class mpes.fprocessing.hdf5Processor(f_addr, **kwds)

Class for generating multidimensional histogram from hdf5 files.

_addBinners(axes=None, nbins=None, ranges=None, binDict=None, irregular_bins=False)

Construct the binning parameters within an instance.

getCountRate(plot=False)

Create count rate trace from the msMarker field in the hdf5 file.

Parameters

plot: bool | False

No function yet.

Return

countRate: numeric

The count rate in Hz.

secs: numeric

The seconds into the scan.

getElapsedTime()

Return the elapsed time in the file from the msMarkers wave.

Return

The length of the file in seconds.

loadMapping(energy, momentum)

Load the mapping parameters

localBinning(axes=None, nbins=None, ranges=None, binDict=None, jittered=False, histcoord='midpoint', ret='dict', **kwds)

Compute the photoelectron intensity histogram locally after loading all data into RAM.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

binDict: dict | None

Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it overrides the specifications from the other arguments.

jittered: bool | False

Determines whether to add jitter to the data to avoid rebinning artefact.

histcoord: string | ‘midpoint’

The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).

ret: bool | True

Return option. True returns the dictionary containing the binned data explicitly. False gives no explicit return of the binned data; the dictionary generated in the binning is still retained as an instance attribute.

**kwds: keyword argument

Return

histdict: dict

Dictionary containing binned data and the axes values (if ret = True).
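The two histcoord options differ only in which coordinates are reported for each axis. The sketch below computes both for equally spaced bins; `axis_coordinates` is a hypothetical helper for illustration.

```python
def axis_coordinates(nbins, value_range, histcoord='midpoint'):
    """Return bin edges (nbins + 1 values) or bin midpoints (nbins values)."""
    lo, hi = value_range
    width = (hi - lo) / nbins
    edges = [lo + i * width for i in range(nbins + 1)]
    if histcoord == 'edge':
        return edges
    if histcoord == 'midpoint':
        return [(edges[i] + edges[i + 1]) / 2 for i in range(nbins)]
    raise ValueError("histcoord must be 'edge' or 'midpoint'")


edges = axis_coordinates(4, (0.0, 8.0), histcoord='edge')
mids = axis_coordinates(4, (0.0, 8.0), histcoord='midpoint')
```

With four bins, `edges` has five entries while `mids` has four, matching the length of the histogram along that axis.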

localBinning_numba(axes=None, nbins=None, ranges=None, binDict=None, jittered=False, histcoord='midpoint', ret='dict', **kwds)

Compute the photoelectron intensity histogram locally after loading all data into RAM.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

binDict: dict | None

Dictionary with specifications of axes, nbins and ranges. If binDict is not None, it overrides the specifications from the other arguments.

jittered: bool | False

Determines whether to add jitter to the data to avoid rebinning artefact.

histcoord: string | ‘midpoint’

The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).

ret: bool | True

Return option. True returns the dictionary containing the binned data explicitly. False gives no explicit return of the binned data; the dictionary generated in the binning is still retained as an instance attribute.

**kwds: keyword arguments

amin: numeric/None | None

Minimum value of electron sequence.

amax: numeric/None | None

Maximum value of electron sequence.

jitter_axes: list | axes

List of axes to jitter.

jitter_bins: list | nbins

List of the number of bins.

jitter_amplitude: numeric/array | 0.5

Jitter amplitude (a single number is applied to all).

jitter_ranges: list | ranges

List of the binning ranges.

Return

histdict: dict

Dictionary containing binned data and the axes values (if ret = True).

saveHistogram(dictname='histdict', form='h5', save_addr='./histogram', **kwds)

Save binned histogram and the axes. See mpes.fprocessing.saveDict().

saveParameters(form='h5', save_addr='./binning')

Save all the attributes of the binning instance for later use (e.g. binning axes, ranges, etc).

Parameters

form: str | ‘h5’

File format for saving the parameters (‘h5’/‘hdf5’, ‘mat’).

save_addr: str | ‘./binning’

Address of the file to be saved.

toBandStructure()

Convert to an instance of BandStructure.

toSplitter()

Convert to an instance of hdf5Splitter.

updateHistogram(axes=None, sliceranges=None, ret=False)

Update the dimensional sizes of the binning results.

viewEventHistogram(ncol, axes=['X', 'Y', 't', 'ADC'], bins=[80, 80, 80, 80], ranges=[0, 1800, 0, 1800, 68000, 74000, 0, 500], axes_name_type='alias', backend='bokeh', legend=True, histkwds={}, legkwds={}, **kwds)

Plot individual histograms of specified dimensions (axes).

Parameters

ncol: int

Number of columns in the plot grid.

axes: list/tuple

Name of the axes to view.

bins: list/tuple

Bin values of all specified axes.

ranges: list

Value ranges of all specified axes.

axes_name_type: str | ‘alias’

Type of specified axes names. ‘alias’ refers to the human-comprehensible aliases of the datasets from the hdf5 file (e.g. ‘X’, ‘ADC’, etc.); ‘original’ refers to the original names of the datasets from the hdf5 file (e.g. ‘Stream0’, etc.).

backend: str | ‘bokeh’

Backend of the plotting library (‘matplotlib’ or ‘bokeh’).

legend: bool | True

Option to include a legend in the histogram plots.

histkwds, legkwds, **kwds: dict, dict, keyword arguments

Extra keyword arguments passed to mpes.visualization.grid_histogram().

class mpes.fprocessing.hdf5Reader(f_addr, ncores=None, **kwds)

HDF5 reader class.

_assembleGroups(gnames, amin=None, amax=None, use_alias=True, dtyp='float32', timeStamps=False, ret='array')

Assemble the content values of the selected groups.

Parameters

gnames: list

List of group names.

amin, amax: numeric, numeric | None, None

Index selection range for all groups.

use_alias: bool | True

See hdf5Reader.getGroupNames().

dtyp: str | ‘float32’

Data type string.

ret: str | ‘array’

Return type specification (‘array’ or ‘dict’).

convert(form, save_addr='./summary', pq_append=False, **kwds)

Format conversion from hdf5 to mat (for Matlab/Python) or ibw (for Igor).

Parameters

form: str

The format of the data to convert into.

save_addr: str | ‘./summary’

File address to save to.

pq_append: bool | False

Option to append to parquet files. If True, append to the existing parquet files; if False, the existing parquet files are deleted before new files are created.

getAttributeNames(wexpr=None, woexpr=None)

Retrieve attribute names from the loaded hdf5 file with string filtering.

Parameters

wexpr: str | None

Expression in a name to leave in the attribute name list (w = with).

woexpr: str | None

Expression in a name to leave out of the attribute name list (wo = without).

Return

filteredAttrbuteNames: list

List of filtered attribute names.

getGroupNames(wexpr=None, woexpr=None, use_alias=False)

Retrieve group names from the loaded hdf5 file with string filtering.

Parameters

wexpr: str | None

Expression in a name to leave in the group name list (w = with).

woexpr: str | None

Expression in a name to leave out of the group name list (wo = without).

use_alias: bool | False

Specification on the use of alias to replace the variable name.

Return

filteredGroupNames: list

List of filtered group names.
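The with/without filtering of names can be sketched as below. `filter_names` is a hypothetical helper, the example names are illustrative, and plain substring matching is an assumption (the library may match differently).

```python
def filter_names(names, wexpr=None, woexpr=None):
    """Keep names containing wexpr and drop names containing woexpr."""
    if wexpr is not None:
        names = [n for n in names if wexpr in n]
    if woexpr is not None:
        names = [n for n in names if woexpr not in n]
    return list(names)


groups = ['Stream_0', 'Stream_1', 'Stream_10', 'msMarkers']
with_stream = filter_names(groups, wexpr='Stream')
without_markers = filter_names(groups, woexpr='msMarkers')
both = filter_names(groups, wexpr='Stream', woexpr='_1')
```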

name2alias(names_to_convert)

Find corresponding aliases of the named groups.

Parameter

names_to_convert: list/tuple

Names to convert to aliases.

Return

aliases: list/tuple

Aliases corresponding to the names.

static readAttribute(element, *attribute, nullval='None')

Retrieve the content of the attribute(s) in the loaded hdf5 file.

Parameter

attribute: list/tuple

Collection of attribute names.

nullval: str | ‘None’

Null value to retrieve as a replacement of NoneType.

Return

attributeContent: list/tuple

Collection of values of the corresponding attributes.

static readGroup(element, *group, amin=None, amax=None, sliced=True)

Retrieve the content of the group(s) in the loaded hdf5 file.

Parameter

group: list/tuple

Collection of group names.

amin, amax: numeric, numeric | None, None

Minimum and maximum indices to select from the group (dataset).

sliced: bool | True

Perform slicing on the group (dataset), if True.

Return

groupContent: list/tuple

Collection of values of the corresponding groups.

summarize(form='text', use_alias=True, timeStamps=False, ret=False, **kwds)

Summarize the content of the hdf5 file (names of the groups, attributes and the selected contents). Output in various user-specified formats.

Parameters

form: str | ‘text’

Format to summarize the content of the file into. ‘dataframe’: HDF5 content summarized into a dask dataframe. ‘dict’: HDF5 content (both data and metadata) summarized into a dictionary. ‘metadict’: HDF5 metadata summarized into a dictionary. ‘text’: descriptive text summarizing the HDF5 content.

use_alias: bool | True

Specify whether to use the alias to rename the groups.

ret: bool | False

Specify whether function return is sought.

**kwds: keyword arguments

Return

hdfdict: dict

Dictionary including both the attributes and the groups, using their names as the keys.

edf: dataframe

Dataframe (edf = electron dataframe) constructed using only the group values, and the column names are the corresponding group names (or aliases).

class mpes.fprocessing.hdf5Splitter(f_addr, **kwds)

Class to split large hdf5 files.

split(nsplit, save_addr='./', namestr='split_', split_group='Stream_0', pbar=False)

Split and save an hdf5 file.

Parameters

nsplit: int

Number of split files.

save_addr: str | ‘./’

Directory to store the split files.

namestr: str | ‘split_’

Additional namestring attached to the front of the filename.

split_group: str | ‘Stream_0’

Name of the example group to split for file length reference.

pbar: bool | False

Enable (when True)/Disable (when False) the progress bar.
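One way to picture the splitting is as a division of the reference group's length into nsplit contiguous index ranges. The sketch below is an assumption about the general idea only: `split_bounds` is a hypothetical helper, and how the library distributes a remainder among the split files is not stated in this reference.

```python
def split_bounds(length, nsplit):
    """Divide `length` entries into nsplit contiguous (start, stop) ranges."""
    base, extra = divmod(length, nsplit)
    bounds = []
    start = 0
    for i in range(nsplit):
        # Assumption: the first `extra` ranges receive one additional entry.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


bounds = split_bounds(10, 3)
```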

subset(file_id)

Spawn an instance of hdf5Processor from a specified split file.

toProcessor()

Change to an hdf5Processor instance.

mpes.fprocessing.im2mat(fdir)

Convert image to numpy ndarray.

mpes.fprocessing.mat2im(datamat, dtype='uint8', scaling=['normal'], savename=None)

Convert data matrix to image.

mpes.fprocessing.metaReadHDF5(hfile, attributes=[], groups=[])

Parse the attribute (i.e. metadata) tree in the input HDF5 file and construct a dictionary of attributes.

Parameters

hfile: HDF5 file instance

Instance of the h5py.File class.

attributes, groups: list, list | [], []

List of strings representing the names of the specified attribute/group names. When specified as None, the components (all attributes or all groups) are ignored. When specified as [], all components (attributes/groups) are included. When specified as a list of strings, only the attribute/group names matching the strings are retrieved.
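The three ways of specifying attributes or groups (None, an empty list, or a list of names) can be sketched as follows. `select_components` is a hypothetical helper, the component names are illustrative, and exact name matching is an assumption.

```python
def select_components(available, spec):
    """Select names according to the None / [] / list-of-strings convention."""
    if spec is None:
        return []  # ignore this kind of component entirely
    if len(spec) == 0:
        return list(available)  # an empty list means "include everything"
    # Assumption: names are matched exactly.
    return [name for name in available if name in spec]


available = ['Stream_0', 'Stream_1', 'msMarkers']
ignored = select_components(available, None)
everything = select_components(available, [])
chosen = select_components(available, ['msMarkers'])
```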

mpes.fprocessing.numba_histogramdd(sample, bins, ranges)

Wrapper for the Numba pre-compiled binning functions. Overall, it behaves much like numpy.histogramdd but returns uint32 arrays. This type was chosen because it gives a significant performance improvement over uint64 for large binning volumes. Be aware that this can cause overflows for very large sample sets exceeding about 4.3E9 (2^32 - 1) counts in a single bin. This should never happen in a realistic photoemission experiment with useful bin sizes.
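The overflow behaviour of an unsigned 32-bit counter can be demonstrated with the standard library. The largest value a uint32 can hold is 2**32 - 1 = 4294967295; adding one more count wraps around to zero. `add_uint32` is a hypothetical helper used only to emulate this arithmetic.

```python
import ctypes

UINT32_MAX = 2**32 - 1  # 4294967295


def add_uint32(count, increment):
    """Add two numbers the way a uint32 counter would, wrapping on overflow."""
    return ctypes.c_uint32(count + increment).value


safe = add_uint32(1000, 1)
wrapped = add_uint32(UINT32_MAX, 1)
```

Here `safe` is an ordinary sum, while `wrapped` shows a full bin silently resetting to zero.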

class mpes.fprocessing.parallelHDF5Processor(files=[], file_sorting=True, folder=None, ncores=None)

Class for parallel processing of hdf5 files.

_parse_metadata(attributes, groups)

Parse the metadata from all HDF5 files.

Parameters

attributes, groups: list, list

See mpes.fprocessing.metaReadHDF5().

combineResults(ret=True)

Combine the results from all segments (only when self.results is non-empty).

Parameters

ret: bool | True

Return option. True returns the dictionary containing the binned data explicitly. False gives no explicit return of the binned data; the dictionary generated in the binning is still retained as an instance attribute.

Return

combinedresult: dict

Return combined result dictionary (if ret == True).

convert(form='parquet', save_addr='./summary', append_to_folder=False, pbar=True, pbenv='classic', **kwds)

Convert files to another format (e.g. parquet).

Parameters

form: str | ‘parquet’

File format to convert into.

save_addr: str | ‘./summary’

Path of the folder for saving parquet files.

append_to_folder: bool | False

Option to append to the existing parquet files in the specified folder, otherwise the existing parquet files will be deleted first. The HDF5 files in the same folder are kept intact.

pbar: bool | True

Option to display progress bar.

pbenv: str | ‘classic’

Specification of the progress bar environment (‘classic’ for generic version and ‘notebook’ for notebook compatible version).

**kwds: keyword arguments

See mpes.fprocessing.hdf5Processor.convert().

getCountRate(fids='all', plot=False)

Create count rate data for the files in the parallel hdf5 processor specified in ‘fids’

Parameters

fids: str/list | ‘all’

File IDs to include (‘all’ or a list of file IDs).

See arguments in parallelHDF5Processor.subset() and hdf5Processor.getCountRate().

getElapsedTime(fids='all')

Return the elapsed time in the file from the msMarkers wave.

Return

secs: numeric

The length of the file in seconds.

parallelBinning(axes, nbins, ranges, scheduler='threads', combine=True, histcoord='midpoint', pbar=True, binning_kwds={}, compute_kwds={}, pbenv='classic', ret=False)

Parallel computation of the multidimensional histogram from file segments. Version with serialized loop over processor threads and parallel recombination to save memory.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

scheduler: str | ‘threads’

Type of distributed scheduler (‘threads’, ‘processes’, ‘synchronous’)

histcoord: string | ‘midpoint’

The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).

pbar: bool | True

Option to display the progress bar.

binning_kwds: dict | {}

Keyword arguments to be included in mpes.fprocessing.hdf5Processor.localBinning().

compute_kwds: dict | {}

Keyword arguments to specify in dask.compute().

parallelBinning_old(axes, nbins, ranges, scheduler='threads', combine=True, histcoord='midpoint', pbar=True, binning_kwds={}, compute_kwds={}, ret=False)

Parallel computation of the multidimensional histogram from file segments. Old version with completely parallel binning with unlimited memory consumption. Crashes for very large data sets.

Parameters

axes: (list of) strings | None

Names the axes to bin.

nbins: (list of) int | None

Number of bins along each axis.

ranges: (list of) tuples | None

Ranges of binning along every axis.

scheduler: str | ‘threads’

Type of distributed scheduler (‘threads’, ‘processes’, ‘synchronous’)

combine: bool | True

Option to combine the results obtained from distributed binning.

histcoord: string | ‘midpoint’

The coordinates of the histogram. Specify ‘edge’ to get the bar edges (every dimension has one value more), specify ‘midpoint’ to get the midpoint of the bars (same length as the histogram dimensions).

pbar: bool | True

Option to display the progress bar.

binning_kwds: dict | {}

Keyword arguments to be included in mpes.fprocessing.hdf5Processor.localBinning().

compute_kwds: dict | {}

Keyword arguments to specify in dask.compute().

saveHistogram(dictname='combinedresult', form='h5', save_addr='./histogram', **kwds)

Save binned histogram and the axes.

Parameters

See mpes.fprocessing.saveDict().

saveParameters(form='h5', save_addr='./binning')

Save all the attributes of the binning instance for later use (e.g. binning axes, ranges, etc).

Parameters

form: str | ‘h5’

File format for saving the parameters (‘h5’/‘hdf5’, ‘mat’).

save_addr: str | ‘./binning’

Address of the file to be saved.

subset(file_id)

Spawn an instance of mpes.fprocessing.hdf5Processor from a specified constituent file.

Parameter

file_id: int

Integer-numbered file ID (any integer from 0 to self.nfiles - 1).

summarize(form='dataframe', ret=False, **kwds)

Summarize the measurement information from all HDF5 files.

Parameters

form: str | ‘dataframe’

Format in which to summarize the file information.

ret: bool | False

Option to return the summary.

**kwds: keyword arguments

See keyword arguments in mpes.fprocessing.readDataframe().

updateHistogram(axes=None, sliceranges=None, ret=False)

Update the dimensional sizes of the binning results.

Parameters

axes: tuple/list | None

Collection of the names of axes for size change.

sliceranges: tuple/list/array | None

Collection of ranges, e.g. (start_position, stop_position) pairs, for each axis to be updated.

ret: bool | False

Option to return updated histogram.
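
As a rough illustration of resizing a binning result by slice ranges, the sketch below crops a small 2D histogram and its axis coordinates using (start_position, stop_position) index pairs. The dictionary layout, the axis names 'X' and 'Y', the 'binned' key, and the helper crop_histogram are all assumptions made for this example, not the internals of the class.

```python
def crop_histogram(result, axes, sliceranges):
    """Crop a 2D binning result along named axes (hypothetical helper)."""
    order = ["X", "Y"]  # assumed axis order of result["binned"]
    slices = {name: slice(None) for name in order}
    for name, (start, stop) in zip(axes, sliceranges):
        slices[name] = slice(start, stop)
    # Crop the axis coordinates, then the histogram rows (X) and columns (Y).
    cropped = {name: result[name][slices[name]] for name in order}
    cropped["binned"] = [row[slices["Y"]] for row in result["binned"][slices["X"]]]
    return cropped


result = {
    "X": [0, 1, 2, 3],
    "Y": [10, 20, 30],
    "binned": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
}
cropped = crop_histogram(result, axes=["X"], sliceranges=[(1, 3)])
print(cropped["X"], cropped["binned"])
```

Axes that are not listed keep their full extent, so only the named dimensions change size.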

viewEventHistogram(fid, ncol, **kwds)

Plot individual histograms of specified dimensions (axes) from a constituent file.

Parameters

See arguments in parallelHDF5Processor.subset() and hdf5Processor.viewEventHistogram().

class mpes.fprocessing.parquetProcessor(folder, files=[], source='folder', ftype='parquet', fids=[], update='', ncores=None, **kwds)

Legacy version of the mpes.fprocessing.dataframeProcessor class.

mpes.fprocessing.readARPEStxt(fdir, withCoords=True)

Read and convert Igor-generated ARPES .txt files into numpy arrays. The withCoords option specifies whether the energy and angle information is given.

mpes.fprocessing.readBinnedhdf5(fpath, combined=True, typ='float32')

Read binned hdf5 file (3D/4D data) into a dictionary.

Parameters

fpath: str

File path.

combined: bool | True

Specify if the volume slices are combined.

typ: str | ‘float32’

Data type of the numerical values in the output dictionary.

Return

out: dict

Dictionary with keys being the axes and the volume (slices).

mpes.fprocessing.readDataframe(folder=None, files=None, ftype='parquet', timeStamps=False, **kwds)

Read stored files from a folder into a dataframe.

Parameters

folder, files: str, list/tuple | None, None

Folder path of the files or a list of file paths. The folder path takes priority: if it is specified, the files argument is ignored.

ftype: str | ‘parquet’

File type to read (‘h5’ or ‘hdf5’, ‘parquet’, ‘json’, ‘csv’, etc). If a folder path is given, all files of the specified type are read into the dataframe in the reading order.

**kwds: keyword arguments

See the keyword arguments for the specific file parser in dask.dataframe module.

Return

Dask dataframe read from specified files.
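
The priority rule between folder and files can be sketched with the standard library alone. The helper gather_files is hypothetical and is not the mpes implementation; sorting by file name is an assumption standing in for the reading order, and no dataframe is actually built here.

```python
import glob
import os
import tempfile


def gather_files(folder=None, files=None, ftype="parquet"):
    """Pick the files to read (hypothetical helper, not the mpes implementation)."""
    if folder is not None:
        # A folder takes priority: collect every file of the given type.
        # Assumption: files are ordered by name.
        return sorted(glob.glob(os.path.join(folder, "*." + ftype)))
    return list(files or [])


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.parquet", "a.parquet", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    paths = gather_files(folder=tmp, files=["ignored.parquet"])
    found = [os.path.basename(p) for p in paths]

print(found)
```

Here the explicit files list is ignored because a folder was supplied, and only files with the requested extension are selected.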

mpes.fprocessing.readIgorBinFile(fdir, **kwds)

Read Igor binary formats (pxp and ibw).

mpes.fprocessing.readimg(f_addr)

Read images (jpg, png, 2D/3D tiff)

mpes.fprocessing.readtsv(fdir, header=None, dtype='float', **kwds)

Read a tsv file from a hemispherical detector.

Parameters

fdir: str

Path to the tsv file.

header: int | None

Number of header lines.

dtype: str | ‘float’

Data type of the returned numpy.ndarray.

**kwds: keyword arguments

Other keyword arguments for pandas.read_table().

Return

data: numpy ndarray

Data read from the file and converted to the specified type.
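
The read-then-convert pattern described above can be sketched without pandas. The function below is a standard-library stand-in based on the csv module, not the documented implementation; the name read_tsv_rows is hypothetical, and it assumes that header counts the leading lines to skip.

```python
import csv
import io


def read_tsv_rows(stream, header=None, dtype=float):
    """Parse tab-separated text into a list of typed rows (hypothetical stand-in)."""
    reader = csv.reader(stream, delimiter="\t")
    rows = [row for row in reader if row]  # drop empty lines
    # Assumption: `header` is the number of leading lines to skip (None means zero).
    body = rows[header or 0:]
    return [[dtype(value) for value in row] for row in body]


text = "energy\tcounts\n0.5\t12\n1.0\t30\n"
data = read_tsv_rows(io.StringIO(text), header=1)
print(data)
```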

mpes.fprocessing.rot2d(th, angle_unit)

Construct 2D rotation matrix.
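
For reference, a 2D rotation matrix can be built as in the following sketch. The helper rotation_matrix_2d is hypothetical; the accepted unit strings ('deg', 'rad') and the counter-clockwise sign convention are assumptions, since neither is documented here.

```python
import math


def rotation_matrix_2d(th, angle_unit="deg"):
    """Build a 2x2 counter-clockwise rotation matrix (hypothetical helper)."""
    if angle_unit == "deg":
        th = math.radians(th)
    elif angle_unit != "rad":
        raise ValueError("angle_unit must be 'deg' or 'rad'")
    c, s = math.cos(th), math.sin(th)
    return [[c, -s], [s, c]]


R = rotation_matrix_2d(90, "deg")
# Rotate the unit vector (1, 0); under the assumed convention it ends up
# along (0, 1), up to floating-point rounding.
x = R[0][0] * 1 + R[0][1] * 0
y = R[1][0] * 1 + R[1][1] * 0
```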

mpes.fprocessing.saveDict(dct={}, processor=None, dictname='', form='h5', save_addr='./histogram', **kwds)

Save the binning result dictionary, including the histogram and the axes values (edges or midpoints).

Parameters

dct: dict | {}

A dictionary containing the binned data and axes values to be exported.

processor: class | None

Class including all attributes.

dictname: str | ‘’

Namestring of the dictionary to save (such as the attribute name in a class).

form: str | ‘h5’

Save format, supporting ‘mat’, ‘h5’/‘hdf5’, ‘tiff’ (requires tifffile) or ‘png’ (requires imageio).

save_addr: str | ‘./histogram’

File path to save the binning result.

**kwds: keyword arguments

mpes.fprocessing.sgfltr2d(datamat, span, order, axis=0)

Savitzky-Golay filter for two-dimensional data, applied line by line along one axis. Returns the filtered data.

mpes.fprocessing.sortNamesBy(namelist, pattern, gp=0, slicerange=(None, None))

Sort a list of names according to a particular sequence of numbers (specified by a regular expression search pattern).

Parameters

namelist: list of str

List of name strings.

pattern: str

Regular expression of the pattern.

gp: int

Grouping number.

Returns

orderedseq: array

Ordered sequence from sorting.

sortednamelist: list of str

Sorted list of name strings.
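
The sorting idea can be sketched with the standard re module. The helper sort_names_by_number is hypothetical and only approximates the documented behaviour: it omits slicerange, and it assumes that gp is passed to the match object's group() method and that the extracted text is numeric.

```python
import re


def sort_names_by_number(namelist, pattern, gp=0):
    """Sort names by a number extracted with a regular expression (hypothetical helper)."""
    keyed = []
    for name in namelist:
        match = re.search(pattern, name)
        if match is None:
            raise ValueError(f"pattern not found in {name!r}")
        # group(gp) selects the whole match (0) or a capture group (1, 2, ...).
        keyed.append((float(match.group(gp)), name))
    keyed.sort(key=lambda pair: pair[0])
    orderedseq = [number for number, _ in keyed]
    sortednamelist = [name for _, name in keyed]
    return orderedseq, sortednamelist


names = ["scan_10.h5", "scan_2.h5", "scan_1.h5"]
orderedseq, sortednamelist = sort_names_by_number(names, r"scan_(\d+)", gp=1)
print(sortednamelist)
```

Sorting on the extracted number rather than on the raw string places scan_2 before scan_10, which plain alphabetical sorting would not.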

mpes.fprocessing.txtlocate(ffolder, keytext)

Locate specific txt files containing experimental parameters.