mdfreader module documentation¶
Measured Data Format file reader main module
Platform and python version¶
Runs on Unix and Windows with Python 2.6+ and 3.2+
Author: Aymeric Rateau
Created on Sun Oct 10 12:57:28 2010
Dependencies¶
- Python >2.6, >3.2 <http://www.python.org>
- Numpy >1.6 <http://numpy.scipy.org>
- Sympy to convert channels with formula
- bitarray to parse data that is not byte aligned
- Matplotlib >1.0 <http://matplotlib.sourceforge.net>
- NetCDF
- h5py for the HDF5 export
- xlwt for the excel export (not available for Python 3)
- openpyxl for the excel 2007 export
- scipy for the Matlab file conversion
- zlib to uncompress data block if needed
Attributes¶
- PythonVersion : float
- Python version currently running, needed for compatibility of both python 2.6+ and 3.2+
mdfreader module¶
-
class
mdfreader.mdfreader.
mdf
(fileName=None, channelList=None, convertAfterRead=True, filterChannelNames=False, noDataLoading=False, compression=False, convertTables=False, metadata=2)¶ Bases:
mdfreader.mdf3reader.mdf3
,mdfreader.mdf4reader.mdf4
mdf class
Notes
The mdf class is a nested dict. The channel name is the primary dict key of the mdf class. At the next level, each channel includes the following keys:
- ‘data’ : containing vector of data (numpy)
- ‘unit’ : unit (string)
- ‘master’ : master channel of channel (time, crank angle, etc.)
- ‘description’ : Description of channel
- ‘conversion’ : mdfinfo nested dict for the CCBlock. Exists only if the channel has not been converted; used by the getChannelData method to convert it.
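As a rough illustration of the layout described above, the sketch below mimics the nested dict with a plain Python dict. The channel names, values and the `describe` helper are hypothetical, and plain lists stand in for the numpy vectors that mdfreader stores under ‘data’.

```python
# Plain-dict stand-in for an mdf object; channel names and values are made up.
fake_mdf = {
    "time": {"data": [0.0, 0.1, 0.2], "unit": "s", "master": "time",
             "description": "master time channel"},
    "speed": {"data": [10.0, 12.5, 15.0], "unit": "km/h", "master": "time",
              "description": "vehicle speed"},
}


def describe(mdf_like):
    """Return one 'name [unit] (master: ...)' string per channel, sorted by name."""
    lines = []
    for name in sorted(mdf_like):
        channel = mdf_like[name]
        lines.append("%s [%s] (master: %s)"
                     % (name, channel["unit"], channel["master"]))
    return lines


summary = describe(fake_mdf)
```

With a real file, the same two-level access (`yop['channelName']['unit']`) should apply, although ‘data’ may still hold raw values if the channel was not converted.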
Examples
>>> import mdfreader
>>> yop = mdfreader.mdf('NameOfFile')
>>> yop.keys()  # list channel names
>>> yop.masterChannelList  # list channels grouped by raster or master channel
>>> yop.plot('channelName')  # or yop.plot({'channel1','channel2'})
>>> yop.resample(0.1)  # or yop.resample(masterChannel='master3')
>>> yop.exportToCSV(sampling=0.01)
>>> yop.exportToNetCDF()
>>> yop.exportToHDF5()
>>> yop.exportToMatlab()
>>> yop.exportToExcel()
>>> yop.exportToXlsx()
>>> yop.convertToPandas()  # converts data groups into pandas dataframes
>>> yop.write()  # writes mdf file
>>> yop.keepChannels({'channel1','channel2','channel3'})  # drops all channels except the ones in argument
>>> yop.getChannelData('channelName')  # returns channel numpy array
Attributes
- fileName (str) : file name
- MDFVersionNumber (int) : mdf file version number
- masterChannelList (dict) : represents the data structure, with one key per master channel whose value is the list of channels attached to it. Each key (master channel) thus represents a data group whose channels share the same sampling interval.
- multiProc (bool) : flag to request multiprocessed channel conversion for better performance. One thread per data group.
- file_metadata (dict) : file metadata with minimum keys: author, organisation, project, subject, comment, time, date
Methods
- read( fileName = None, multiProc = False, channelList=None, convertAfterRead=True, filterChannelNames=False, noDataLoading=False, compression=False ) : reads mdf file version 3.x and 4.x
- write( fileName=None ) : writes simple mdf file
- getChannelData( channelName ) : returns channel numpy array
- convertAllChannel() : converts all channel data according to CCBlock information
- getChannelUnit( channelName ) : returns channel unit
- plot( channels ) : plots channels with Matplotlib
- resample( samplingTime = None, masterChannel=None ) : resamples all data groups
- exportToCSV( filename = None, sampling = None ) : exports mdf data into CSV file
- exportToNetCDF( filename = None, sampling = None ) : exports mdf data into netcdf file
- exportToHDF5( filename = None, sampling = None ) : exports mdf class data structure into hdf5 file
- exportToMatlab( filename = None ) : exports mdf class data structure into Matlab file
- exportToExcel( filename = None ) : exports mdf data into excel 95 to 2003 file
- exportToXlsx( filename=None ) : exports mdf data into excel 2007 and 2010 file
- convertToPandas( sampling=None ) : converts mdf data structure into pandas dataframe(s)
- keepChannels( channelList ) : keeps only the listed channels and removes the others
- mergeMdf( mdfClass ) : merges data of 2 mdf classes
-
allPlot
()¶
-
convertAllChannel
()¶ Converts all channels from raw data to converted data according to CCBlock information. Converted data will take more memory.
-
convertToPandas
(sampling=None)¶ converts mdf data structure into pandas dataframe(s)
Parameters: sampling : float, optional
resampling interval
Notes
One pandas dataframe is created per data group. Not yet adapted to mdf4, as it considers only time master channels.
-
copy
()¶ Makes a shallow copy of an mdf class instance
-
cut
(begin=None, end=None)¶ Cut data
Parameters: begin : float
beginning value in master channel from which to start cutting in all channels
end : float
ending value in master channel at which to stop cutting in all channels
Notes
Use this method only if all data in the mdf use the same physical quantity or type of master channel (for instance time).
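The cutting behaviour described above can be sketched on plain lists. This is not mdfreader's implementation: the helper names are hypothetical, and treating both bounds as inclusive is an assumption made for the sketch.

```python
def cut_indices(master, begin=None, end=None):
    """Return indices of samples whose master value lies in [begin, end].

    Inclusive bounds are an assumption of this sketch; None means unbounded.
    """
    lo = float("-inf") if begin is None else begin
    hi = float("inf") if end is None else end
    return [i for i, value in enumerate(master) if lo <= value <= hi]


def cut_channels(channels, master_name, begin=None, end=None):
    """Apply the same index selection to every channel of one data group."""
    keep = cut_indices(channels[master_name], begin, end)
    return {name: [values[i] for i in keep]
            for name, values in channels.items()}


# Hypothetical data group: one time master and one signal.
group = {"time": [0.0, 1.0, 2.0, 3.0, 4.0],
         "rpm": [800, 900, 1500, 2100, 1900]}
trimmed = cut_channels(group, "time", begin=1.0, end=3.0)
```

Because one pair of bounds is applied to every data group, the sketch only makes sense when all master channels measure the same quantity, which is the restriction stated in the note.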
-
exportToCSV
(filename=None, sampling=None)¶ Exports mdf data into CSV file
Parameters: filename : str, optional
file name. If no name defined, it will use original mdf name and path
sampling : float, optional
sampling interval. None by default
Notes
Data saved in the CSV file will be automatically resampled, as it is difficult to save data that do not share the same master channel in this format. Warning: this can be slow for big data, as CSV is a text format after all.
-
exportToExcel
(filename=None)¶ Exports mdf data into excel 95 to 2003 file
Parameters: filename : str, optional
file name. If no name defined, it will use original mdf name and path
Notes
xlwt is not fast even for small files; consider other binary formats like HDF5 or Matlab. If there are more than 256 channels, data will be saved over several worksheets. Excel 2003 is also becoming rare these days, so prefer exportToXlsx.
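The 256-channel limit matches the column limit of the .xls format. How mdfreader distributes channels over worksheets is not shown here; the sketch below, with a hypothetical `split_for_worksheets` helper, only illustrates the general idea of chunking channel names into groups that fit one sheet.

```python
def split_for_worksheets(channel_names, max_columns=256):
    """Split channel names into consecutive groups of at most max_columns."""
    return [channel_names[i:i + max_columns]
            for i in range(0, len(channel_names), max_columns)]


# 600 made-up channel names need three worksheets.
names = ["channel%d" % i for i in range(600)]
sheets = split_for_worksheets(names)
sheet_sizes = [len(sheet) for sheet in sheets]
```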
-
exportToHDF5
(filename=None, sampling=None, compression=None, compression_opts=None)¶ Exports mdf class data structure into hdf5 file
Parameters: filename : str, optional
file name. If no name defined, it will use original mdf name and path
sampling : float, optional
sampling interval.
compression : str, optional
HDF5 compression algorithm. Valid options are ‘gzip’, ‘lzf’. gzip compression recommended for portability. szip compression not supported due to legal reasons.
compression_opts : int, optional
HDF5 gzip compression level, 0-9. Only valid if gzip compression is used. Level 4 (default) recommended for best balance between compression and time.
Notes
As many attributes as possible will be stored. The data structure will be similar to the one in the masterChannelList attribute.
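A small wrapper can keep the compression settings consistent with the parameter description above. The `export_gzip_hdf5` helper and the recording stand-in class are hypothetical; only the `exportToHDF5` keyword names come from the signature documented here.

```python
def export_gzip_hdf5(mdf_obj, filename=None, sampling=None, level=4):
    """Export with gzip compression after checking the level is in 0-9."""
    if not 0 <= level <= 9:
        raise ValueError("gzip level must be between 0 and 9")
    mdf_obj.exportToHDF5(filename=filename, sampling=sampling,
                         compression="gzip", compression_opts=level)


class RecordingMdf(object):
    """Stand-in for an mdf object so the sketch runs without a data file."""

    def __init__(self):
        self.calls = []

    def exportToHDF5(self, **kwargs):
        # Only record what would have been passed to the real method.
        self.calls.append(kwargs)


recorder = RecordingMdf()
export_gzip_hdf5(recorder, filename="out.hdf", level=6)
```

With the library installed, the same helper would presumably be called with a real instance, for example `export_gzip_hdf5(mdfreader.mdf('NameOfFile'))`.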
-
exportToMatlab
(filename=None)¶ Export mdf data into Matlab file format 5, tentatively compressed
Parameters: filename : str, optional
file name. If no name defined, it will use original mdf name and path
Notes
This method will dump all data into a Matlab file, but you will lose the following information:
- units and descriptions of channels
- data structure, i.e. which master channel corresponds to a channel.
Channels might then have different lengths.
-
exportToNetCDF
(filename=None, sampling=None)¶ Exports mdf data into netcdf file
Parameters: filename : str, optional
file name. If no name defined, it will use original mdf name and path
sampling : float, optional
sampling interval.
-
exportToXlsx
(filename=None)¶ Exports mdf data into excel 2007 and 2010 file
Parameters: filename : str, optional
file name. If no name defined, it will use original mdf name and path
Notes
For performance reasons, it is recommended to export resampled data.
-
getChannelData
(channelName, raw_data=False)¶ Return channel numpy array
Parameters: channelName : str
channel name
raw_data: bool
flag to return non converted data
Notes
This method is the safest way to get channel data, as the numpy array stored under the ‘data’ dict key might contain raw data.
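The difference between the ‘data’ key and getChannelData can be illustrated with a stand-in object. `FakeMdf` is hypothetical: it stores a made-up linear conversion as a (factor, offset) tuple, whereas mdfreader keeps an mdfinfo nested dict under ‘conversion’. Only the `getChannelData(channelName, raw_data=False)` signature is taken from the documentation above.

```python
class FakeMdf(dict):
    """Hypothetical stand-in holding raw values plus a linear conversion."""

    def getChannelData(self, channelName, raw_data=False):
        channel = self[channelName]
        if raw_data or "conversion" not in channel:
            return channel["data"]
        factor, offset = channel["conversion"]
        return [factor * value + offset for value in channel["data"]]


def physical_values(mdf_obj, channel_names):
    """Fetch data through getChannelData rather than through the 'data' key."""
    return {name: mdf_obj.getChannelData(name) for name in channel_names}


fake = FakeMdf(temp={"data": [0, 10, 20], "conversion": (0.5, -40.0)})
converted = physical_values(fake, ["temp"])
raw = fake["temp"]["data"]  # still the unconverted values
```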
-
keepChannels
(channelList)¶ keeps only list of channels and removes the other channels
Parameters: channelList : list of str
list of channel names
-
mergeMdf
(mdfClass)¶ Merges data of 2 mdf classes
Parameters: mdfClass : mdf
mdf class instance to be merged with self
Notes
Both classes must have been resampled; otherwise it is impossible to know which master channel to match. The method creates the union of both channel lists and fills unknown sections of channels with NaN.
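The union-and-fill behaviour can be sketched on plain dicts of lists. This is only one reading of the note: the `merge_channels` helper is hypothetical, and appending the second recording after the first is an assumption, not something the documentation states.

```python
NAN = float("nan")


def merge_channels(first, second):
    """Concatenate two resampled recordings, padding missing channels with NaN.

    Each argument maps channel name -> list of samples; within one mapping all
    lists are assumed to have the same length (as after resampling).
    """
    len_first = len(next(iter(first.values()))) if first else 0
    len_second = len(next(iter(second.values()))) if second else 0
    merged = {}
    for name in sorted(set(first) | set(second)):
        head = first.get(name, [NAN] * len_first)
        tail = second.get(name, [NAN] * len_second)
        merged[name] = list(head) + list(tail)
    return merged


# Two made-up recordings sharing only the 'time' channel.
a = {"time": [0.0, 0.1], "speed": [1.0, 2.0]}
b = {"time": [0.2, 0.3, 0.4], "torque": [5.0, 6.0, 7.0]}
merged = merge_channels(a, b)
```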
-
plot
(channel_name_list_of_list)¶ Plot channels with Matplotlib
Parameters: channel_name_list_of_list : str or list of str or list of list of str
channel name, list of channel names, or list of lists of channel names. A list of lists will create multiple plots.
Notes
Channel description and unit will be tentatively displayed with axis labels
-
read
(fileName=None, multiProc=False, channelList=None, convertAfterRead=True, filterChannelNames=False, noDataLoading=False, compression=False, metadata=2)¶ reads mdf file version 3.x and 4.x
Parameters: fileName : str, optional
file name
multiProc : bool
flag to activate multiprocessing of channel data conversion
channelList : list of str, optional
list of channel names to be read. If you use channelList, reading might be much slower, but it will save memory. It can be used to read big files.
convertAfterRead : bool, optional
flag to convert channels after reading, True by default. If you set convertAfterRead to False, all channel data will be kept raw, with no conversion applied. If the file stores many floats, this can reduce the memory footprint by a factor of 3 to 4. To calculate converted values for a channel, you can then use the method .getChannelData()
filterChannelNames : bool, optional
flag to filter long channel names from its module names separated by ‘.’
noDataLoading : bool, optional
Flag to read only file info but no data to have minimum memory use
compression : bool or str, optional
Compresses data in memory using blosc or bcolz, at the cost of CPU time. If compression is an int (1 to 9), bcolz is used for compression; if compression = ‘blosc’, blosc is used. Both are offered because efficiency depends on the data.
metadata: int, optional, default = 2
Reading metadata has an impact on performance, especially for mdf 4.x, which uses xml. 2: minimal metadata reading (mostly channel blocks); 1: used for noDataLoading; 0: all metadata reading, including Source Information, Attachment, etc.
Notes
- If you keep convertAfterRead set to True, you can set the attribute mdf.multiProc to activate channel conversion in multiprocessing. The gain in reading time can be around 30% if the file is big and uses a lot of float channels.
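The memory-saving options described in the parameters above can be combined in one call. In this sketch the `read_raw_subset` helper and the injectable `reader` argument are hypothetical conveniences; the keyword names `channelList` and `convertAfterRead` come from the documented signature, and a stand-in reader is used so the example runs without mdfreader or a measurement file.

```python
def read_raw_subset(file_name, channels, reader=None):
    """Read only `channels` and keep their data raw to limit memory use."""
    if reader is None:
        import mdfreader  # only needed when no reader is injected
        reader = mdfreader.mdf
    return reader(file_name, channelList=channels, convertAfterRead=False)


def fake_reader(file_name, **options):
    """Stand-in reader that just reports what it was asked to do."""
    return {"fileName": file_name, "options": options}


result = read_raw_subset("NameOfFile", ["channel1", "channel2"],
                         reader=fake_reader)
```

With the real class, converted values for a channel would then be obtained on demand through getChannelData.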
-
resample
(samplingTime=None, masterChannel=None)¶ Resamples all data groups into one data group having defined sampling interval or sharing same master channel
Parameters: samplingTime : float, optional
resampling interval, None by default. If None, all data groups are merged into a single data group using the highest sampling rate found among them
**or**
masterChannel : str, optional
master channel name to be used for all channels
Notes
1. Resampling is relatively safe for mdf3, as it contains only time series. However, mdf4 can also contain distance, angle, etc. It might not make sense to apply one resampling to several data groups that do not share the same kind of master channel (like time resampling applied to distance or angle data groups). If several kinds of data groups are used, it is better to use pandas to resample.
2. Resampling will convert all your channels, so be careful with big files and memory consumption.
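Resampling onto a fixed interval amounts to interpolating each channel on a new, uniform master channel. The pure-Python sketch below uses linear interpolation as an assumption; it is not mdfreader's implementation, and both helper names are hypothetical.

```python
def uniform_grid(start, stop, step):
    """Time stamps from start to stop, including stop when it is on the grid."""
    count = int((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(count)]


def interpolate(new_times, times, values):
    """Linear interpolation of (times, values) at increasing new_times.

    Requires at least two samples and new_times within [times[0], times[-1]].
    """
    result = []
    j = 0
    for t in new_times:
        # Advance to the segment [times[j], times[j + 1]] containing t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        result.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return result


# Made-up, unevenly sampled channel resampled every 0.5 s.
times = [0.0, 1.0, 3.0]
values = [0.0, 10.0, 30.0]
grid = uniform_grid(0.0, 3.0, 0.5)
resampled = interpolate(grid, times, values)
```

The same interpolation applied to an angle or distance master with a time step would be meaningless, which is why the first note advises against mixing kinds of master channels.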
-
write
(fileName=None, compression=False)¶ Writes simple mdf file, same format as originally read, default is 4.x
Parameters: fileName : str, optional
Name of the file. If no file name is given, the written file takes the name of the file that was read, with the string ‘_new’ appended before the extension
compression : bool
Flag to store data compressed (from mdf version 4.1). If activated, the file is written in version 4.1 even if the original file is in version 3.x
Notes
All channels will be converted, so the written file might be bigger than the original file
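The default naming rule described for fileName can be reproduced with the standard library. How mdfreader builds the name internally is not shown here, so the `default_output_name` helper below is a hypothetical equivalent that simply mirrors the description.

```python
import os.path


def default_output_name(file_name):
    """Insert '_new' between the base name and the extension."""
    root, extension = os.path.splitext(file_name)
    return root + "_new" + extension


renamed = default_output_name("measurement.dat")
nested = default_output_name("dir/run.mf4")
```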
-
class
mdfreader.mdfreader.
mdfinfo
()¶ Bases:
dict
Methods
- clear()
- copy()
- fromkeys($type, iterable[, value]) : Returns a new dict with keys from iterable and values equal to value.
- get(k[,d])
- items()
- keys()
- listChannels([fileName]) : Read MDF file blocks and returns a list of contained channels
- pop(k[,d]) : If key is not found, d is returned if given, otherwise KeyError is raised
- popitem() : 2-tuple; but raise KeyError if D is empty.
- readinfo([fileName, fid, minimal]) : Reads MDF file and extracts its complete structure
- setdefault(k[,d])
- update([E, ]**F) : If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k]
- values()
-
fid
¶
-
fileName
¶
-
filterChannelNames
¶
-
listChannels
(fileName=None)¶ Read MDF file blocks and returns a list of contained channels
Parameters: fileName : string
file name
Returns: nameList : list of string
list of channel names
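Listing channels first makes it possible to select a subset before loading any data. In the sketch below, the `channels_with_prefix` helper, the stand-in `FakeInfo` class and its channel names are hypothetical; only the `listChannels(fileName)` call comes from the documentation above.

```python
def channels_with_prefix(info, file_name, prefix):
    """Return the channel names of file_name that start with prefix."""
    return [name for name in info.listChannels(file_name)
            if name.startswith(prefix)]


class FakeInfo(object):
    """Stand-in for mdfinfo returning a fixed, made-up channel list."""

    def listChannels(self, fileName=None):
        return ["engine.speed", "engine.torque", "gearbox.ratio"]


selected = channels_with_prefix(FakeInfo(), "NameOfFile", "engine.")
```

The resulting list could then be passed as the channelList argument of mdf.read to load only those channels.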
-
mdfversion
¶
-
readinfo
(fileName=None, fid=None, minimal=0)¶ Reads MDF file and extracts its complete structure
Parameters: fileName : str, optional
file name. If not input, uses fileName attribute
fid : file identifier, optional
minimal : int
0 will load all metadata; 1 will load DG, CG, CN and CC; 2 will load only DG
-
zipfile
¶
-