pyqstrat package¶
Submodules¶
pyqstrat.pq_utils module¶
-
class
pyqstrat.pq_utils.
ReasonCode
[source]¶ Bases:
object
A class containing constants for predefined order reason codes. Prefer these predefined reason codes if they suit the reason you are creating your order. Otherwise, use your own string.
-
BACKTEST_END
= 'backtest end'¶
-
ENTER_LONG
= 'enter long'¶
-
ENTER_SHORT
= 'enter short'¶
-
EXIT_LONG
= 'exit long'¶
-
EXIT_SHORT
= 'exit short'¶
-
MARKER_PROPERTIES
= {'backtest end': {'color': 'green', 'size': 50, 'symbol': '*'}, 'enter long': {'color': 'blue', 'size': 50, 'symbol': 'P'}, 'enter short': {'color': 'red', 'size': 50, 'symbol': 'P'}, 'exit long': {'color': 'blue', 'size': 50, 'symbol': 'X'}, 'exit short': {'color': 'red', 'size': 50, 'symbol': 'X'}, 'none': {'color': 'green', 'size': 50, 'symbol': 'o'}, 'roll future': {'color': 'green', 'size': 50, 'symbol': '>'}}¶
-
NONE
= 'none'¶
-
ROLL_FUTURE
= 'roll future'¶
-
-
pyqstrat.pq_utils.
date_2_num
(d)[source]¶ Adapted from matplotlib.dates.date2num so that we don’t have to add a dependency on matplotlib here
-
pyqstrat.pq_utils.
decode_future_code
(future_code, as_str=True)[source]¶ Given a future code such as “X”, return either the month number (from 1 - 12) or the month abbreviation, such as “nov”
Parameters: - future_code (str) – the one letter future code
- as_str (bool, optional) – If set, return the month abbreviation; otherwise return the month number
>>> decode_future_code('X', as_str = False)
11
>>> decode_future_code('X')
'nov'
-
pyqstrat.pq_utils.
get_empty_np_value
(np_dtype)[source]¶ Get empty value for a given numpy datatype
>>> a = np.array(['2018-01-01', '2018-01-03'], dtype = 'M8[D]')
>>> get_empty_np_value(a.dtype)
numpy.datetime64('NaT')
-
pyqstrat.pq_utils.
get_fut_code
(month)[source]¶ Given a month number such as 3 for March, return the future code for it, e.g. H
>>> get_fut_code(3)
'H'
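The mapping behind decode_future_code and get_fut_code can be sketched with the standard futures month codes. This is a minimal illustration, not pyqstrat's implementation; the names FUTURE_CODES, MONTH_ABBREVS, month_to_code and code_to_month are hypothetical.

```python
# Standard futures month codes, January through December.
FUTURE_CODES = "FGHJKMNQUVXZ"
MONTH_ABBREVS = ["jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec"]

def month_to_code(month: int) -> str:
    """Return the one-letter futures code for a month number (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return FUTURE_CODES[month - 1]

def code_to_month(code: str, as_str: bool = True):
    """Return the month abbreviation, or the month number if as_str is False."""
    index = FUTURE_CODES.index(code.upper())  # raises ValueError for unknown codes
    return MONTH_ABBREVS[index] if as_str else index + 1

march_code = month_to_code(3)
november = code_to_month("X")
november_number = code_to_month("X", as_str=False)
```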
-
pyqstrat.pq_utils.
has_display
()[source]¶ If we are running in unit test mode or on a server, then don’t try to draw graphs, etc.
-
pyqstrat.pq_utils.
infer_compression
(input_filename)[source]¶ Infers compression for a file from its suffix. For example, given “/tmp/hello.gz”, this will return “gzip”
>>> infer_compression("/tmp/hello.gz")
'gzip'
>>> infer_compression("/tmp/abc.txt") is None
True
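Suffix-based inference of this kind can be sketched as a dictionary lookup. Only the ".gz" to "gzip" pair and the None fallback come from the description above; the other suffixes and the name guess_compression are assumptions made for illustration.

```python
import os

# Suffix -> compression name. Only ".gz" -> "gzip" is documented above;
# the remaining entries are illustrative assumptions.
_SUFFIX_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}

def guess_compression(input_filename: str):
    """Return a compression name based on the file suffix, or None if unrecognized."""
    _, suffix = os.path.splitext(input_filename)
    return _SUFFIX_TO_COMPRESSION.get(suffix.lower())

gz_result = guess_compression("/tmp/hello.gz")
txt_result = guess_compression("/tmp/abc.txt")
```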
-
pyqstrat.pq_utils.
infer_frequency
(dates)[source]¶ Returns most common frequency of date differences as a fraction of days
Parameters: dates – A numpy array of monotonically increasing datetime64
>>> dates = np.array(['2018-01-01 11:00:00', '2018-01-01 11:15:00', '2018-01-01 11:30:00', '2018-01-01 11:35:00'], dtype = 'M8[ns]')
>>> print(round(infer_frequency(dates), 8))
0.01041667
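The idea of "most common frequency as a fraction of days" can be worked through with the standard library alone. The sketch below is not the pyqstrat code; most_common_frequency is a hypothetical name, and it uses plain datetime objects rather than a numpy array.

```python
from collections import Counter
from datetime import datetime, timedelta

def most_common_frequency(dates):
    """Return the most common gap between consecutive datetimes, as a fraction of a day."""
    gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]
    if not gaps:
        raise ValueError("need at least two dates")
    most_common_gap, _count = Counter(gaps).most_common(1)[0]
    return most_common_gap / timedelta(days=1)

dates = [
    datetime(2018, 1, 1, 11, 0),
    datetime(2018, 1, 1, 11, 15),
    datetime(2018, 1, 1, 11, 30),
    datetime(2018, 1, 1, 11, 35),
]
# Gaps are 15 min, 15 min and 5 min, so 15 minutes wins: 900 s / 86400 s.
frequency = most_common_frequency(dates)
```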
-
pyqstrat.pq_utils.
is_newer
(filename, ref_filename)[source]¶ Whether the ctime (modification time) of filename is newer than that of ref_filename, or either file does not exist
>>> import time
>>> import tempfile
>>> temp_dir = tempfile.gettempdir()
>>> touch(f'{temp_dir}/x.txt')
>>> time.sleep(0.1)
>>> touch(f'{temp_dir}/y.txt')
>>> is_newer(f'{temp_dir}/y.txt', f'{temp_dir}/x.txt')
True
>>> touch(f'{temp_dir}/y.txt')
>>> time.sleep(0.1)
>>> touch(f'{temp_dir}/x.txt')
>>> is_newer(f'{temp_dir}/y.txt', f'{temp_dir}/x.txt')
False
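A check of this kind might look like the following sketch. It assumes the comparison is on modification time as reported by os.path.getmtime, which may differ from what pyqstrat actually reads; is_newer_sketch is a hypothetical name. The usage sets timestamps explicitly with os.utime instead of sleeping between writes.

```python
import os
import tempfile

def is_newer_sketch(filename, ref_filename):
    """True if either file is missing, or filename was modified after ref_filename."""
    if not os.path.isfile(filename) or not os.path.isfile(ref_filename):
        return True
    return os.path.getmtime(filename) > os.path.getmtime(ref_filename)

with tempfile.TemporaryDirectory() as temp_dir:
    older = os.path.join(temp_dir, "x.txt")
    newer = os.path.join(temp_dir, "y.txt")
    for path in (older, newer):
        with open(path, "w"):
            pass  # create an empty file
    # Force known modification times, 100 seconds apart.
    os.utime(older, (1_000_000_000, 1_000_000_000))
    os.utime(newer, (1_000_000_100, 1_000_000_100))
    newer_vs_older = is_newer_sketch(newer, older)
    older_vs_newer = is_newer_sketch(older, newer)
    missing_vs_older = is_newer_sketch(os.path.join(temp_dir, "missing.txt"), older)
```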
-
pyqstrat.pq_utils.
millis_since_epoch
(dt)[source]¶ Given a python datetime, return the number of milliseconds between the unix epoch and the datetime. Returns a float since it can contain fractions of milliseconds as well
>>> millis_since_epoch(datetime.datetime(2018, 1, 1))
1514764800000.0
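The arithmetic can be sketched by dividing a timedelta by one millisecond. This assumes a naive datetime that is interpreted as UTC; millis_since_epoch_sketch and EPOCH are hypothetical names, not the library's.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def millis_since_epoch_sketch(dt: datetime.datetime) -> float:
    """Milliseconds between the unix epoch and a naive datetime, treating dt as UTC."""
    return (dt - EPOCH) / datetime.timedelta(milliseconds=1)

whole = millis_since_epoch_sketch(datetime.datetime(2018, 1, 1))
# 500 microseconds past midnight contributes half a millisecond.
fractional = millis_since_epoch_sketch(datetime.datetime(2018, 1, 1, 0, 0, 0, 500))
```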
-
pyqstrat.pq_utils.
monotonically_increasing
(array)[source]¶ Returns True if the array is monotonically increasing, False otherwise
>>> monotonically_increasing(np.array(['2018-01-02', '2018-01-03'], dtype = 'M8[D]'))
True
>>> monotonically_increasing(np.array(['2018-01-02', '2018-01-02'], dtype = 'M8[D]'))
False
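Note that the second doctest returns False for two equal dates, so the check appears to be strict. A minimal sketch of a strict check on a plain sequence (strictly_increasing is a hypothetical name):

```python
import datetime

def strictly_increasing(values) -> bool:
    """True if every element is strictly greater than the one before it."""
    return all(earlier < later for earlier, later in zip(values, values[1:]))

rising = strictly_increasing([datetime.date(2018, 1, 2), datetime.date(2018, 1, 3)])
repeated = strictly_increasing([datetime.date(2018, 1, 2), datetime.date(2018, 1, 2)])
```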
-
pyqstrat.pq_utils.
np_get_index
(array, value)[source]¶ Get index of a value in a numpy array. Returns -1 if the value does not exist.
-
pyqstrat.pq_utils.
resample_ohlc
(df, sampling_frequency, resample_funcs=None)[source]¶ Downsample OHLCV data using sampling frequency
Parameters: - df (pd.DataFrame) – Must contain an index of numpy datetime64 type which is monotonically increasing
- sampling_frequency (str) – See pandas frequency strings
- resample_funcs (dict, optional) – A dictionary of column name -> resampling function for any columns that are custom defined. Default None. If there is no entry for a custom column, defaults to ‘last’ for that column
Returns: Resampled dataframe
Return type: pd.DataFrame
>>> df = pd.DataFrame({'date' : np.array(['2018-01-08 15:00:00', '2018-01-09 13:30:00', '2018-01-09 15:00:00', '2018-01-11 15:00:00'], dtype = 'M8[ns]'),
...                    'o' : np.array([8.9, 9.1, 9.3, 8.6]),
...                    'h' : np.array([9.0, 9.3, 9.4, 8.7]),
...                    'l' : np.array([8.8, 9.0, 9.2, 8.4]),
...                    'c' : np.array([8.95, 9.2, 9.35, 8.5]),
...                    'v' : np.array([200, 100, 150, 300]),
...                    'x' : np.array([300, 200, 100, 400])
...                   })
>>> df['vwap'] = 0.5 * (df.l + df.h)
>>> df.set_index('date', inplace = True)
>>> resample_ohlc(df, sampling_frequency = 'D', resample_funcs={'x' : lambda df, sampling_frequency : df.x.resample(sampling_frequency).agg(np.mean)})
         date    o    h    l     c    v    x  vwap
0  2018-01-08  8.9    9  8.8  8.95  200  300   8.9
1  2018-01-09  9.1  9.4    9  9.35  250  150  9.24
2  2018-01-10  nan  nan  nan   nan    0  nan   nan
3  2018-01-11  8.6  8.7  8.4   8.5  300  400  8.55
-
pyqstrat.pq_utils.
resample_ts
(dates, values, sampling_frequency)[source]¶ Downsample a pair of dates and values using sampling frequency, using the last value if it does not exist at bin edge. See pandas.Series.resample
Parameters: - dates – a numpy datetime64 array
- values – a numpy array
- sampling_frequency – See pandas frequency strings
-
pyqstrat.pq_utils.
resample_vwap
(df, sampling_frequency)[source]¶ Compute weighted average of vwap given higher frequency vwap and volume
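The weighting itself is simple arithmetic: within each bin, the resampled vwap is the sum of vwap times volume divided by total volume. The sketch below works through the 2018-01-09 bin from the resample_ohlc example (vwaps 9.15 and 9.3 with volumes 100 and 150). The function name and the choice to return nan for a zero-volume bin are assumptions for illustration.

```python
import math

def volume_weighted_average(vwaps, volumes):
    """Combine higher-frequency vwaps into one value, weighting each by its volume."""
    total_volume = sum(volumes)
    if total_volume == 0:
        return math.nan  # assumption: an empty bin has no meaningful vwap
    return sum(price * volume for price, volume in zip(vwaps, volumes)) / total_volume

# (9.15 * 100 + 9.3 * 150) / 250 = 2310 / 250
daily_vwap = volume_weighted_average([9.15, 9.3], [100, 150])
empty_bin = volume_weighted_average([], [])
```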
-
pyqstrat.pq_utils.
series_to_array
(series)[source]¶ Convert a pandas series to a numpy array. If the object is not a pandas Series return it back unchanged
-
pyqstrat.pq_utils.
set_defaults
(df_float_sf=4, df_display_max_rows=200, df_display_max_columns=99, np_seterr='raise', plot_style='ggplot', mpl_figsize=(8, 6))[source]¶ Set some display defaults to make it easier to view dataframes and graphs.
Parameters: - df_float_sf – Number of significant figures to show in dataframes (default 4). Set to None to use pandas defaults
- df_display_max_rows – Number of rows to display for pandas dataframes when you print them (default 200). Set to None to use pandas defaults
- df_display_max_columns – Number of columns to display for pandas dataframes when you print them (default 99). Set to None to use pandas defaults
- np_seterr – Error mode for numpy warnings. See numpy seterr function for details. Set to None to use numpy defaults
- plot_style – Style for matplotlib plots. Set to None to use default plot style.
- mpl_figsize – Default figure size to use when displaying matplotlib plots (default 8,6). Set to None to use defaults
-
pyqstrat.pq_utils.
shift_np
(array, n, fill_value=None)[source]¶ Similar to pandas.Series.shift but works on numpy arrays.
Parameters: - array – The numpy array to shift
- n – Number of places to shift, can be positive or negative
- fill_value – After shifting, there will be empty slots left in the array. If set, fill these with fill_value. If fill_value is set to None (default), we will fill these with False for boolean arrays, np.nan for floats
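A shift of this kind can be sketched as follows. This is an illustration rather than the library's code; shift_sketch is a hypothetical name, and it assumes numpy is installed. Integer arrays cannot hold np.nan, so an explicit fill_value would be needed for them in this sketch.

```python
import numpy as np

def shift_sketch(array, n, fill_value=None):
    """Shift array by n places (positive = towards the end), filling vacated slots."""
    if fill_value is None:
        # Mirror the documented defaults: False for booleans, nan otherwise.
        fill_value = False if array.dtype == np.bool_ else np.nan
    result = np.empty_like(array)
    if n == 0:
        result[:] = array
    elif n > 0:
        result[:n] = fill_value
        result[n:] = array[:-n]
    else:
        result[n:] = fill_value
        result[:n] = array[-n:]
    return result

shifted_floats = shift_sketch(np.array([1.0, 2.0, 3.0]), 1)
shifted_bools = shift_sketch(np.array([True, False, True]), -1)
```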
-
pyqstrat.pq_utils.
str2date
(s)[source]¶ Converts a string like “2008-01-15 15:00:00” to a numpy datetime64. If s is not a string, return s back
-
pyqstrat.pq_utils.
strtup2date
(tup)[source]¶ Converts a string tuple like (“2008-01-15”, “2009-01-16”) to a numpy datetime64 tuple. If the tuple does not contain strings, return it back unchanged
-
pyqstrat.pq_utils.
to_csv
(df, file_name, index=False, compress=False, *args, **kwargs)[source]¶ Creates a temporary file then renames to the permanent file so we don’t have half written files. Also optionally compresses using the xz algorithm
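The write-then-rename pattern described here can be sketched with the standard library. This is not the pyqstrat implementation: it writes plain rows with the csv module instead of a dataframe, skips compression, and write_csv_atomically is a hypothetical name.

```python
import csv
import os
import tempfile

def write_csv_atomically(rows, file_name):
    """Write rows to a temporary file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(file_name))
    fd, temp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as temp_file:
            csv.writer(temp_file).writerows(rows)
        # Readers see either the old file or the complete new one, never a partial write.
        os.replace(temp_name, file_name)
    except BaseException:
        if os.path.exists(temp_name):
            os.remove(temp_name)
        raise

with tempfile.TemporaryDirectory() as temp_dir:
    target = os.path.join(temp_dir, "out.csv")
    write_csv_atomically([["date", "c"], ["2018-01-01", "8.1"]], target)
    with open(target, newline="") as written:
        rows_read_back = list(csv.reader(written))
    files_left = os.listdir(temp_dir)
```

Creating the temporary file in the same directory as the target keeps both paths on one filesystem, which is what allows the rename to replace the file in a single step.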
pyqstrat.pq_types module¶
-
class
pyqstrat.pq_types.
Contract
(symbol, multiplier=1.0)[source]¶ Bases:
object
A Contract can be a real or virtual instrument. For example, for futures you may wish to create a single continuous contract instead of a contract for each future series
-
class
pyqstrat.pq_types.
Trade
(symbol, date, qty, price, fee=0.0, commission=0.0, order=None)[source]¶ Bases:
object
-
__init__
(symbol, date, qty, price, fee=0.0, commission=0.0, order=None)[source]¶ Parameters: - symbol – A string
- date – Trade execution datetime
- qty – Number of contracts or shares filled
- price – Trade price
- fee – Fees paid to brokers or others. Default 0
- commission – Commission paid to brokers or others. Default 0
- order – A reference to the order that created this trade. Default None
-
pyqstrat.holiday_calendars module¶
-
class
pyqstrat.holiday_calendars.
Calendar
(holidays)[source]¶ Bases:
object
-
EUREX
= 'eurex'¶
-
NYSE
= 'nyse'¶
-
__init__
(holidays)[source]¶ Do not use this function directly. Use Calendar.get_calendar instead.
Parameters: holidays (np.array of datetime64[D]) – holidays for this calendar, excluding weekends
-
add_calendar
(exchange_name, holidays)[source]¶ Add a trading calendar to the class level calendars dict
Parameters: - exchange_name (str) – Name of the exchange.
- holidays (np.array of datetime64[D]) – holidays for this exchange, excluding weekends
-
add_trading_days
(start, num_days, roll='raise')[source]¶ Adds trading days to a start date
Parameters: - start – np.datetime64 or str or datetime
- num_days (int) – number of trading days to add
- roll (str, optional) – one of ‘raise’, ‘nat’, ‘forward’, ‘following’, ‘backward’, ‘preceding’, ‘modifiedfollowing’, ‘modifiedpreceding’. From the numpy documentation: how to treat dates that do not fall on a valid day. The default is ‘raise’. ‘raise’ means to raise an exception for an invalid day. ‘nat’ means to return a NaT (not-a-time) for an invalid day. ‘forward’ and ‘following’ mean to take the first valid day later in time. ‘backward’ and ‘preceding’ mean to take the first valid day earlier in time. ‘modifiedfollowing’ means to take the first valid day later in time unless it is across a month boundary, in which case to take the first valid day earlier in time. ‘modifiedpreceding’ means to take the first valid day earlier in time unless it is across a month boundary, in which case to take the first valid day later in time.
Returns: The date num_days trading days after start
Return type: np.datetime64[D]
>>> calendar = Calendar.get_calendar(Calendar.NYSE)
>>> calendar.add_trading_days(datetime.date(2015, 12, 24), 1)
numpy.datetime64('2015-12-28')
>>> calendar.add_trading_days(np.datetime64('2017-04-15'), 0, roll = 'preceding') # 4/14/2017 is a Friday and a holiday
numpy.datetime64('2017-04-13')
>>> calendar.add_trading_days(np.datetime64('2017-04-08'), 0, roll = 'preceding') # 4/7/2017 is a Friday and not a holiday
numpy.datetime64('2017-04-07')
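The roll options are quoted from the numpy documentation, and the same results can be reproduced directly with numpy.busday_offset. Whether pyqstrat calls this function internally is an assumption, and the two-entry holiday list below is a hypothetical stand-in for a full exchange calendar.

```python
import numpy as np

# Hypothetical holiday list for illustration; a real calendar holds many more dates.
holidays = np.array(['2015-12-25', '2017-04-14'], dtype='M8[D]')

# Thursday 2015-12-24 plus one trading day skips the Friday holiday and the weekend.
next_day = np.busday_offset(np.datetime64('2015-12-24'), 1, roll='raise', holidays=holidays)

# Saturday 2017-04-15 rolled to the preceding valid day skips the Friday holiday.
rolled_back = np.busday_offset(np.datetime64('2017-04-15'), 0, roll='preceding', holidays=holidays)

# Saturday 2017-04-08 rolls back to Friday 2017-04-07, which is not a holiday.
rolled_to_friday = np.busday_offset(np.datetime64('2017-04-08'), 0, roll='preceding', holidays=holidays)
```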
-
get_calendar
(exchange_name)[source]¶ Get a calendar object for the given exchange. If you want to add a new calendar, use the add_calendar class level function.
Parameters: exchange_name (str) – The exchange for which you want a calendar. Calendar.NYSE, Calendar.EUREX are predefined.
Returns: The calendar object
Return type: Calendar
-
get_trading_days
(start, end, include_first=False, include_last=True)[source]¶ Get back a list of numpy dates that are trading days between the start and end
>>> nyse = Calendar.get_calendar(Calendar.NYSE)
>>> nyse.get_trading_days('2005-01-01', '2005-01-08')
array(['2005-01-03', '2005-01-04', '2005-01-05', '2005-01-06', '2005-01-07'], dtype='datetime64[D]')
>>> nyse.get_trading_days(datetime.date(2005, 1, 1), datetime.date(2005, 2, 1))
array(['2005-01-03', '2005-01-04', '2005-01-05', '2005-01-06', '2005-01-07', '2005-01-10', '2005-01-11', '2005-01-12', '2005-01-13', '2005-01-14', '2005-01-18', '2005-01-19', '2005-01-20', '2005-01-21', '2005-01-24', '2005-01-25', '2005-01-26', '2005-01-27', '2005-01-28', '2005-01-31', '2005-02-01'], dtype='datetime64[D]')
>>> nyse.get_trading_days(datetime.date(2016, 1, 5), datetime.date(2016, 1, 29), include_last = False)
array(['2016-01-06', '2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12', '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-19', '2016-01-20', '2016-01-21', '2016-01-22', '2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28'], dtype='datetime64[D]')
>>> nyse.get_trading_days('2017-07-04', '2017-07-08', include_first = False)
array(['2017-07-05', '2017-07-06', '2017-07-07'], dtype='datetime64[D]')
>>> nyse.get_trading_days(np.datetime64('2017-07-04'), np.datetime64('2017-07-08'), include_first = False)
array(['2017-07-05', '2017-07-06', '2017-07-07'], dtype='datetime64[D]')
-
is_trading_day
(dates)[source]¶ Returns whether the date is not a holiday or a weekend
Parameters: dates – str or datetime.datetime or np.datetime64[D] or numpy array of np.datetime64[D]
Returns: Whether this date is a trading day
Return type: bool
>>> import datetime
>>> eurex = Calendar.get_calendar(Calendar.EUREX)
>>> eurex.is_trading_day('2016-12-25')
False
>>> eurex.is_trading_day(datetime.date(2016, 12, 22))
True
>>> nyse = Calendar.get_calendar(Calendar.NYSE)
>>> nyse.is_trading_day('2017-04-01') # Weekend
False
>>> nyse.is_trading_day(np.arange('2017-04-01', '2017-04-09', dtype = np.datetime64)) # doctest:+ELLIPSIS
array([False, False, True, True, True, True, True, False]...)
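The rule "not a holiday and not a weekend" can be sketched for a single date using only the standard library. The function name and the one-entry holiday set are hypothetical; the real calendars carry full holiday lists and also accept arrays.

```python
import datetime

def is_trading_day_sketch(day: datetime.date, holidays: frozenset) -> bool:
    """True if day is a weekday (Monday to Friday) and not a listed holiday."""
    return day.weekday() < 5 and day not in holidays

# Hypothetical holiday set for illustration only.
holidays = frozenset({datetime.date(2016, 12, 26)})

sunday = is_trading_day_sketch(datetime.date(2016, 12, 25), holidays)
thursday = is_trading_day_sketch(datetime.date(2016, 12, 22), holidays)
listed_holiday = is_trading_day_sketch(datetime.date(2016, 12, 26), holidays)
```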
-
num_trading_days
(start, end, include_first=False, include_last=True)[source]¶ Count the number of trading days between two date series including those two dates. You can pass in a string like ‘2009-01-01’, a python date, or a pandas series for start and end
>>> eurex = Calendar.get_calendar(Calendar.EUREX)
>>> eurex.num_trading_days('2009-01-01', '2011-12-31')
772.0
>>> dates = pd.date_range('20130101',periods=8)
>>> increments = np.array([5, 0, 3, 9, 4, 10, 15, 29])
>>> dates2 = dates + increments * 1000000000000000
>>> df = pd.DataFrame({'x': dates, 'y' : dates2})
>>> df.iloc[4]['x'] = np.nan
>>> df.iloc[6]['y'] = np.nan
>>> nyse = Calendar.get_calendar(Calendar.NYSE)
>>> np.set_printoptions(formatter = {'float' : lambda x : f'{x:.1f}'}) # After numpy 1.13 positive floats don't have a leading space for sign
>>> print(nyse.num_trading_days(df.x, df.y))
[39.0 0.0 23.0 71.0 nan 80.0 nan 232.0]
-
pyqstrat.marketdata module¶
-
class
pyqstrat.marketdata.
MarketData
(dates, c, o=None, h=None, l=None, v=None, vwap=None, additional_arrays=None, resample_funcs=None, fill_values=None)[source]¶ Bases:
object
- Used to store OHLCV bars, and any additional time series data you want to use to simulate orders and executions.
- You must at least supply dates and close prices. All other fields are optional.
-
dates
¶ A numpy datetime array with the datetime for each bar. Must be monotonically increasing.
-
c
¶ A numpy float array with close prices for the bar.
-
o
¶ A numpy float array with open prices. Default None
-
h
¶ A numpy float array with high prices. Default None
-
l
¶ A numpy float array with low prices. Default None
-
v
¶ A numpy integer array with volume for the bar. Default None
-
vwap
¶ A numpy float array with the volume weighted average price for the bar. Default None
-
additional_arrays
¶ A dictionary of name -> numpy array you want to add. Default None
-
resample_funcs
¶ A dictionary of functions for resampling each additional array. Default None.
-
fill_values
¶ A dictionary of column name -> fill value used for filling empty rows when we add dates. Default None.
-
__init__
(dates, c, o=None, h=None, l=None, v=None, vwap=None, additional_arrays=None, resample_funcs=None, fill_values=None)[source]¶ Zeroes in o, h, l, c are set to nan
-
add_dates
(dates)[source]¶ Adds new dates to a market data object. If fill_values was specified we use that to fill in values for any columns for new dates that are not the same as the old dates.
Parameters: dates (np.array of np.datetime64) – New dates to add. Does not have to be sorted or unique
>>> dates = np.array(['2018-01-05', '2018-01-09', '2018-01-10'], dtype = 'M8[ns]')
>>> c = np.array([8.1, 8.2, 8.3])
>>> o = np.array([9, 10, 11])
>>> additional_arrays = {'x' : np.array([5.1, 5.3, 5.5])}
>>> fill_values = {'x' : 0}
>>> md = MarketData(dates, c, o, additional_arrays = additional_arrays, fill_values = fill_values)
>>> new_dates = np.array(['2018-01-07', '2018-01-09'], dtype = 'M8[ns]')
>>> md.add_dates(new_dates)
>>> print(md.dates)
['2018-01-05T00:00:00.000000000' '2018-01-07T00:00:00.000000000'
 '2018-01-09T00:00:00.000000000' '2018-01-10T00:00:00.000000000']
>>> np.set_printoptions(formatter = {'float' : lambda x : f'{x:.4f}'}) # After numpy 1.13 positive floats don't have a leading space for sign
>>> print(md.o, md.c, md.x)
[9.0000 nan 10.0000 11.0000] [8.1000 nan 8.2000 8.3000] [5.1000 0.0000 5.3000 5.5000]
-
describe
(warn_std=10, time_distribution_frequency='15 min', print_time_distribution=False)[source]¶ Describe the bars. Shows an overview, errors and warnings for the bar data. This is a good function to use before running any backtests on a set of bar data.
Parameters: - warn_std – See the warnings function
- time_distribution_frequency – See time_distribution function
- print_time_distribution – Whether to print the time distribution in addition to plotting it.
-
errors
(display=True)[source]¶ Returns a dataframe indicating any highs that are lower than opens, closes or lows, or any lows that are higher than other columns. Also includes any ohlcv values that are negative.
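The consistency checks described here can be sketched on plain dictionaries, one per bar. This is an illustration, not the library's code: find_bar_errors is a hypothetical name, and it returns a list of tuples rather than a dataframe.

```python
def find_bar_errors(bars):
    """Return (index, reason) tuples for bars that violate basic OHLCV consistency."""
    problems = []
    for i, bar in enumerate(bars):
        if bar["h"] < max(bar["o"], bar["l"], bar["c"]):
            problems.append((i, "high below open, low or close"))
        if bar["l"] > min(bar["o"], bar["h"], bar["c"]):
            problems.append((i, "low above open, high or close"))
        if any(bar[key] < 0 for key in ("o", "h", "l", "c", "v")):
            problems.append((i, "negative value"))
    return problems

bars = [
    {"o": 8.9, "h": 9.0, "l": 8.8, "c": 8.95, "v": 200},  # consistent bar
    {"o": 9.1, "h": 9.0, "l": 8.9, "c": 9.05, "v": 100},  # high is below the open
    {"o": 9.0, "h": 9.2, "l": 8.9, "c": 9.1, "v": -5},    # negative volume
]
errors_found = find_bar_errors(bars)
```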
-
overview
(display=True)[source]¶ Returns a dataframe showing basic information about the data, including count, number and percent missing, min, max
Parameters: display – Whether to print out the overview dataframe as well as returning it
-
plot
(figsize=(15, 8), date_range=None, sampling_frequency=None, title='Price / Volume')[source]¶ Plot a candlestick or line plot depending on whether we have ohlc data or just close prices
Parameters: - figsize – Size of the figure (default (15,8))
- date_range – A tuple of strings or numpy datetimes for plotting a smaller sample of the data, e.g. (“2018-01-01”, “2018-01-06”)
- sampling_frequency – Downsample before plotting. See pandas frequency strings for possible values.
- title – Title of the graph, default “Price / Volume”
-
resample
(sampling_frequency, inplace=False)[source]¶ Downsample the OHLCV data into a new bar frequency
Parameters: - sampling_frequency – See sampling frequency in pandas
- inplace – If set to False, don’t modify this object, return a new object instead.
-
time_distribution
(frequency='15 minutes', display=True, plot=True, figsize=None)[source]¶ Return a dataframe with the time distribution of the bars
Parameters: - frequency – The width of each bin (default “15 minutes”). You can use hours or days as well.
- display – Whether to display the data in addition to returning it.
- plot – Whether to plot the data in addition to returning it.
- figsize – If plot is set, optional figure size for the plot (default (20,8))
-
warnings
(warn_std=10, display=True)[source]¶ Returns a dataframe indicating any values where the bar over bar change is more than warn_std standard deviations.
Parameters: - warn_std – Number of standard deviations to use as a threshold (default 10)
- display – Whether to print out the warning dataframe as well as returning it
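One way to read "bar over bar change is more than warn_std standard deviations" is sketched below with the statistics module. The exact definition used by pyqstrat is not stated here, so this assumes the change is the simple difference between consecutive values and that it is compared with the mean and sample standard deviation of all changes; flag_large_changes is a hypothetical name.

```python
import statistics

def flag_large_changes(values, warn_std=10):
    """Return indices of bars whose change from the previous bar is an outlier."""
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    if len(changes) < 2:
        return []
    mean_change = statistics.mean(changes)
    std_change = statistics.stdev(changes)
    if std_change == 0:
        return []
    # changes[i] is the move into bar i + 1, so report that bar's index.
    return [i + 1 for i, change in enumerate(changes)
            if abs(change - mean_change) > warn_std * std_change]

closes = [10.0, 10.1, 10.0, 10.1, 10.0, 10.1, 15.0, 15.1, 15.0, 15.1]
# A low threshold is used because a short series cannot produce a 10 sigma move.
flagged = flag_large_changes(closes, warn_std=2)
```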
-
class
pyqstrat.marketdata.
MarketDataCollection
(symbols=None, marketdata_list=None)[source]¶ Bases:
object
Used to store a set of market data linking symbol -> MarketData
-
pyqstrat.marketdata.
roll_futures
(md, date_func, condition_func, expiries=None, return_full_df=False)[source]¶ Construct a continuous futures dataframe with one row per datetime given rolling logic
Parameters: - md – A dataframe containing the columns ‘date’, ‘series’, and any other market data, for example, ohlcv data. Date can contain time for sub-daily bars. The series column must contain a different string name for each futures series, e.g. SEP2018, DEC2018, etc.
- date_func – A function that takes the market data object as an input and returns a numpy array of booleans. True indicates that the future should be rolled on this date if the condition specified in condition_func is met. This function can assume that we have all the columns in the original market data object plus the same columns suffixed with _next for the potential series to roll over to.
- condition_func – A function that takes the market data object as input and returns a numpy array of booleans. True indicates that we should try to roll the future at that row.
- expiries – An optional dataframe with 2 columns, ‘series’ and ‘expiry’. This should have one row per future series indicating that future’s expiry date. If you don’t pass this in, the function will assume that the expiry column is present in the original dataframe.
- return_full_df – If set, will return the dataframe without removing extra dates so you can use your own logic for rolling, including the _next columns and the roll flag
Returns: A pandas DataFrame with one row per date, which contains the columns in the original md DataFrame and the same columns suffixed with _next representing the series we want to roll to. There is also a column called roll_flag which is set to True whenever the date and roll condition functions are met.
>>> md = pd.DataFrame({'date' : np.concatenate((np.arange(np.datetime64('2018-03-11'), np.datetime64('2018-03-16')),
...                                             np.arange(np.datetime64('2018-03-11'), np.datetime64('2018-03-16')))),
...                    'c' : [10, 10.1, 10.2, 10.3, 10.4] + [10.35, 10.45, 10.55, 10.65, 10.75],
...                    'v' : [200, 200, 150, 100, 100] + [100, 50, 200, 250, 300],
...                    'series' : ['MAR2018'] * 5 + ['JUN2018'] * 5})[['date','series', 'c', 'v']]
>>> expiries = pd.Series(np.array(['2018-03-15', '2018-06-15'], dtype = 'M8[D]'), index = ['MAR2018', 'JUN2018'], name = "expiry")
>>> date_func = lambda md : md.expiry - md.date <= np.timedelta64(3, 'D')
>>> condition_func = lambda md : md.v_next > md.v
>>> df = roll_futures(md, date_func, condition_func, expiries)
>>> df[df.series == 'MAR2018'].date.max() == np.datetime64('2018-03-14')
True
>>> df[df.series == 'JUN2018'].date.max() == np.datetime64('2018-03-15')
True
pyqstrat.account module¶
-
class
pyqstrat.account.
Account
(contracts, marketdata_collection, starting_equity=1000000.0, calc_frequency='D')[source]¶ Bases:
object
An Account calculates pnl for a set of contracts
-
__init__
(contracts, marketdata_collection, starting_equity=1000000.0, calc_frequency='D')[source]¶ Parameters: - contracts (list of Contract) – Contracts that we want to compute PNL for
- marketdata_collection (MarketDataCollection) – MarketData corresponding to contracts
- starting_equity (float, optional) – Starting equity in account currency. Default 1.e6
- calc_frequency (str, optional) – Account will calculate pnl at this frequency. Default ‘D’ for daily
>>> from pyqstrat.marketdata import MarketData, MarketDataCollection
>>> from pyqstrat.pq_types import Contract
>>> dates = np.array(['2018-01-01', '2018-01-02'], dtype = 'M8[D]')
>>> account = Account([Contract("IBM")], MarketDataCollection(["IBM"], [MarketData(dates, [8.1, 8.2])]))
>>> np.set_printoptions(formatter = {'float' : lambda x : f'{x:.4f}'}) # After numpy 1.13 positive floats don't have a leading space for sign
>>> print(account.marketdata['IBM'].c)
[8.1000 8.2000]
-
calc
(i)[source]¶ Computes P&L and stores it internally for all contracts.
Parameters: i – Index to compute P&L at. Account remembers the last index it computed P&L up to and will compute P&L between these two indices
-
df_pnl
(symbol=None)[source]¶ Returns a dataframe with P&L columns. If symbol is set to None (default), sums up P&L across symbols
-
df_trades
(symbol=None, start_date=None, end_date=None)[source]¶ Returns a dataframe with data from trades with the given symbol and with trade date between (and including) start date and end date if they are specified. If symbol is None, trades for all symbols are returned
-
equity
(date)[source]¶ Returns equity in this account in Account currency. Will cause calculation if Account has not previously calculated up to this date
-
position
(symbol, date)[source]¶ Returns position for a symbol at a given date in number of contracts or shares. Will cause calculation if Account has not previously calculated up to this date
-
-
class
pyqstrat.account.
ContractPNL
(contract, marketdata)[source]¶ Bases:
object
Computes pnl for a single contract over time given trades and market data
-
calc
(prev_i, i)[source]¶ Compute pnl and store it internally
Parameters: - prev_i – Start index to compute pnl from
- i – End index to compute pnl to
-
trades
(start_date=None, end_date=None)[source]¶ Get a list of trades
Parameters: - start_date – A string or numpy datetime64. Trades with trade dates >= start_date will be returned. Default None
- end_date – A string or numpy datetime64. Trades with trade dates <= end_date will be returned. Default None
-
pyqstrat.orders module¶
-
class
pyqstrat.orders.
LimitOrder
(symbol, date, qty, limit_price, reason_code='none', status='open')[source]¶ Bases:
object
-
__init__
(symbol, date, qty, limit_price, reason_code='none', status='open')[source]¶ Parameters: - symbol – A string
- date – A numpy datetime indicating the time the order was placed
- qty – Number of contracts or shares. Use a negative quantity for sell orders
- limit_price – Limit price (float)
- reason_code – A string representing the reason this order was created. Prefer a predefined constant from the ReasonCode class if it matches your reason for creating this order.
- status – Status of the order, “open”, “filled”, etc. (default “open”)
-
-
class
pyqstrat.orders.
MarketOrder
(symbol, date, qty, reason_code='none', status='open')[source]¶ Bases:
object
-
__init__
(symbol, date, qty, reason_code='none', status='open')[source]¶ Parameters: - symbol – A string
- date – A numpy datetime indicating the time the order was placed
- qty – Number of contracts or shares. Use a negative quantity for sell orders
- reason_code – A string representing the reason this order was created. Prefer a predefined constant from the ReasonCode class if it matches your reason for creating this order.
- status – Status of the order, “open”, “filled”, etc. (default “open”)
-
-
class
pyqstrat.orders.
RollOrder
(symbol, date, close_qty, reopen_qty, reason_code='roll future', status='open')[source]¶ Bases:
object
A roll order is used to roll a future from one series to the next. It represents a sell of one future and the buying of another future.
-
__init__
(symbol, date, close_qty, reopen_qty, reason_code='roll future', status='open')[source]¶ Parameters: - symbol – A string
- date – A numpy datetime indicating the time the order was placed
- close_qty – Quantity of the future you are rolling
- reopen_qty – Quantity of the future you are rolling to
- reason_code – A string representing the reason this order was created. Prefer a predefined constant from the ReasonCode class if it matches your reason for creating this order.
- status – Status of the order, “open”, “filled”, etc. (default “open”)
-
-
class
pyqstrat.orders.
StopLimitOrder
(symbol, date, qty, trigger_price, limit_price=nan, reason_code='none', status='open')[source]¶ Bases:
object
Used for stop loss or stop limit orders. The order is triggered when price goes above or below trigger price, depending on whether this is a short or long order. Becomes either a market or limit order at that point, depending on whether you set the limit price or not.
-
__init__
(symbol, date, qty, trigger_price, limit_price=nan, reason_code='none', status='open')[source]¶ Parameters: - symbol – A string
- date – A numpy datetime indicating the time the order was placed
- qty – Number of contracts or shares. Use a negative value for sell orders
- trigger_price – Order becomes a market or limit order if price crosses trigger_price.
- limit_price – If not set (default), order becomes a market order when price crosses trigger price. Otherwise it becomes a limit order
- reason_code – A string representing the reason this order was created. Prefer a predefined constant from the ReasonCode class if it matches your reason for creating this order.
- status – Status of the order, “open”, “filled”, etc. (default “open”)
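The trigger behaviour described for StopLimitOrder can be sketched as a small function. The direction convention is an assumption: a buy stop (positive qty) is taken to trigger when price rises to or above trigger_price, and a sell stop (negative qty) when price falls to or below it. resolve_stop_order is a hypothetical helper, not part of pyqstrat.

```python
import math

def resolve_stop_order(qty, trigger_price, price, limit_price=math.nan):
    """Return None while untriggered, else 'market' or 'limit'.

    Assumption: buy stops (qty > 0) trigger at price >= trigger_price,
    sell stops (qty < 0) trigger at price <= trigger_price.
    """
    triggered = price >= trigger_price if qty > 0 else price <= trigger_price
    if not triggered:
        return None
    # With no limit price set (nan), the order becomes a market order.
    return "market" if math.isnan(limit_price) else "limit"

untriggered = resolve_stop_order(-100, trigger_price=95.0, price=96.0)
stop_loss = resolve_stop_order(-100, trigger_price=95.0, price=94.5)
stop_limit = resolve_stop_order(100, trigger_price=105.0, price=105.5, limit_price=106.0)
```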
-
pyqstrat.strategy module¶
-
class
pyqstrat.strategy.
Strategy
(contracts, marketdata_collection, starting_equity=1000000.0, calc_frequency='D', additional_order_dates=None, additional_trade_dates=None)[source]¶ Bases:
object
-
__init__
(contracts, marketdata_collection, starting_equity=1000000.0, calc_frequency='D', additional_order_dates=None, additional_trade_dates=None)[source]¶ Parameters: - contracts (list of Contract) – The contracts we will potentially trade
- starting_equity (float, optional) – Starting equity in Strategy currency. Default 1.e6
- calc_frequency (str, optional) – How often P&L is calculated. Default is ‘D’ for daily
- additional_order_dates (np.array of np.datetime64, optional) – If present, we check for orders on these dates. Default None
- additional_trade_dates (np.array of np.datetime64, optional) – If present, we check for trades on these dates. Default None
-
add_indicator
(name, indicator_function)[source]¶ Parameters: - name – Name of the indicator
- indicator_function – A function taking a MarketData object and returning a numpy array containing indicator values. The return array must have the same length as the MarketData object
-
add_market_sim
(market_sim_function, symbols=None)[source]¶ Add a market simulator. A market simulator takes a list of Orders as input and returns a list of Trade objects.
Parameters: - market_sim_function – A function that takes a list of Orders and MarketData as input and returns a list of Trade objects
- symbols – A list of the symbols that this market_sim_function applies to. If None (default) it will apply to all symbols
-
add_rule
(name, rule_function, signal_name, sig_true_values=None)[source]¶ Add a trading rule
Parameters: - name (str) – Name of the trading rule
- rule_function (function) – A trading rule function that returns a list of Orders
- signal_name (str) – The strategy will call the trading rule function when the signal with this name matches sig_true_values
- sig_true_values (numpy array, optional) – If the signal value at a bar is equal to one of these values, the Strategy will call the trading rule function. Default [True]
-
add_signal
(name, signal_function)[source]¶ Parameters: - name – Name of the signal
- signal_function – A function taking a MarketData object and a dictionary of indicator value arrays as input and returning a numpy array containing signal values. The return array must have the same length as the MarketData object
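The shapes of an indicator function and a signal function, as described for add_indicator and add_signal, can be sketched as below. A SimpleNamespace with a c array stands in for a MarketData object, and the assumption that the indicators dictionary is keyed by the name passed to add_indicator is not confirmed here. sma_indicator and above_sma_signal are hypothetical names, and numpy is assumed to be installed.

```python
from types import SimpleNamespace

import numpy as np

def sma_indicator(md, window=3):
    """Simple moving average of close prices; nan until the window is full."""
    closes = np.asarray(md.c, dtype=float)
    result = np.full(len(closes), np.nan)
    for i in range(window - 1, len(closes)):
        result[i] = closes[i - window + 1 : i + 1].mean()
    return result

def above_sma_signal(md, indicators):
    """True where the close is above the moving average, False elsewhere (including nan)."""
    with np.errstate(invalid="ignore"):
        return np.asarray(md.c, dtype=float) > indicators["sma"]

# Stand-in for a MarketData object: only the close array is needed here.
md = SimpleNamespace(c=np.array([10.0, 10.2, 10.4, 10.1, 10.9]))
indicators = {"sma": sma_indicator(md)}
signal = above_sma_signal(md, indicators)
```

Both return arrays have the same length as the close array, which matches the requirement stated for the return values.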
-
df_data
(symbols=None, add_pnl=True, start_date=None, end_date=None)[source]¶ Add indicators and signals to end of market data and return as a pandas dataframe.
Parameters: - symbols – list of symbols to include. All if set to None (default)
- add_pnl – If True (default), include P&L columns in dataframe
- start_date – string or numpy datetime64. Default None
- end_date – string or numpy datetime64: Default None
-
df_orders
(symbol=None, start_date=None, end_date=None)[source]¶ Returns a dataframe with data from orders with the given symbol and with order date between (and including) start date and end date if they are specified. If symbol is None, orders for all symbols are returned
-
df_pnl
(symbol=None)[source]¶ Returns a dataframe with P&L columns. If symbol is set to None (default), sums up P&L across symbols
-
df_returns
(symbol=None, sampling_frequency='D')[source]¶ Return a dataframe of returns and equity indexed by date.
Parameters: - symbol – The symbol to get returns for. If set to None (default), this returns the sum of PNL for all symbols
- sampling_frequency – Downsampling frequency. Default is ‘D’ (daily). See pandas frequency strings for possible values
-
df_trades
(symbol=None, start_date=None, end_date=None)[source]¶ Returns a dataframe with data from trades with the given symbol and with trade date between (and including) start date and end date if they are specified. If symbol is None, trades for all symbols are returned
-
evaluate_returns
(symbol=None, plot=True, float_precision=4)[source]¶ Returns a dictionary of common return metrics.
Parameters: - symbol (str) – Symbol to evaluate returns for. Default None, in which case all symbols are used
- plot (bool) – If set to True, display plots of equity, drawdowns and returns. Default True
- float_precision (int, optional) – Number of significant figures to show in returns. Default 4
-
orders
(symbol=None, start_date=None, end_date=None)[source]¶ Returns a list of orders with the given symbol and with order date between (and including) start date and end date if they are specified. If symbol is None orders for all symbols are returned
-
plot
(symbols=None, md_columns='c', pnl_columns='equity', title=None, figsize=(20, 15), date_range=None, date_format=None, sampling_frequency=None, trade_marker_properties=None, hspace=0.15)[source]¶ Plot indicators, signals, trades, position, pnl
Parameters: - symbols – List of symbols or None (default) for all symbols
- md_columns – List of columns of market data to plot. Default is ‘c’ for close price. You can set this to ‘ohlcv’ if you want to plot a candlestick of OHLCV data
- pnl_columns – List of P&L columns to plot. Default is ‘equity’
- title – Title of plot (None)
- figsize – Figure size. Default is (20, 15)
- date_range – Tuple of strings or datetime64, e.g. (“2018-01-01”, “2018-04-18 15:00”) to restrict the graph. Default None
- date_format – Date format for tick labels on x axis. If set to None (default), will be selected based on date range. See matplotlib date format strings
- sampling_frequency – Downsampling frequency. Default is None. The graph may get too busy if you have too many bars of data, in which case you may want to downsample before plotting. See pandas frequency strings for possible values
- trade_marker_properties – A dictionary of order reason code -> marker shape, marker size, marker color for plotting trades with different reason codes. Default is None in which case the dictionary from the ReasonCode class is used
- hspace – Height (vertical) space between subplots. Default is 0.15
-
plot_returns
(symbol=None)[source]¶ Display plots of equity, drawdowns and returns for the given symbol or for all symbols if symbol is None (default)
-
run_indicators
(indicator_names=None, symbols=None)[source]¶ Calculate values of the indicators specified and store them.
Parameters: - indicator_names – List of indicator names. If None (default) run all indicators
- symbols – List of symbols to run these indicators for. If None (default) use all symbols
-
run_rules
(rule_names=None, symbols=None, start_date=None, end_date=None)[source]¶ Run trading rules.
Parameters: - rule_names – List of rule names. If None (default) run all rules
- symbols – List of symbols to run these rules for. If None (default) use all symbols
- start_date – Run rules starting from this date. Default None
- end_date – Don’t run rules after this date. Default None
-
pyqstrat.portfolio module¶
-
class
pyqstrat.portfolio.
Portfolio
(name='main')[source]¶ Bases:
object
A portfolio contains one or more strategies that run concurrently so you can test running strategies that are uncorrelated together.
-
add_strategy
(name, strategy)[source]¶ Parameters: - name – Name of the strategy
- strategy – Strategy object
-
df_returns
(sampling_frequency='D', strategy_names=None)[source]¶ Return dataframe containing equity and returns with a date index. Equity and returns are combined from all strategies passed in.
Parameters: - sampling_frequency – Date frequency for rows. Default ‘D’ for daily so we will have one row per day
- strategy_names – A list of strategy names. By default this is set to None and we use all strategies.
-
evaluate_returns
(sampling_frequency='D', strategy_names=None, plot=True, float_precision=4)[source]¶ Returns a dictionary of common return metrics.
Parameters: - sampling_frequency – Date frequency. Default ‘D’ for daily so we downsample to daily returns before computing metrics
- strategy_names – A list of strategy names. By default this is set to None and we use all strategies.
- plot – If set to True, display plots of equity, drawdowns and returns. Default True
- float_precision – Number of significant figures to show in returns. Default 4
-
plot
(sampling_frequency='D', strategy_names=None)[source]¶ Display plots of equity, drawdowns and returns
Parameters: - sampling_frequency – Date frequency. Default ‘D’ for daily so we downsample to daily returns before computing metrics
- strategy_names – A list of strategy names. By default this is set to None and we use all strategies.
-
run
(strategy_names=None, start_date=None, end_date=None)[source]¶ Run indicators, signals and rules.
Parameters: - strategy_names – A list of strategy names. By default this is set to None and we use all strategies.
- start_date – Run rules starting from this date. Sometimes we have a few strategies in a portfolio that need different lead times before they are ready to trade so you can set this so they are all ready by this date. Default None
- end_date – Don’t run rules after this date. Default None
-
run_indicators
(strategy_names=None)[source]¶ Compute indicators for the strategies specified
Parameters: strategy_names – A list of strategy names. By default this is set to None and we use all strategies.
-
pyqstrat.optimize module¶
-
class
pyqstrat.optimize.
Experiment
(suggestion, cost, other_costs)[source]¶ Bases:
object
An Experiment stores a suggestion and its result
-
suggestion
¶ A dictionary of variable name -> value
-
cost
¶ A float representing output of the function we are testing with this suggestion as input.
-
other_costs
¶ A dictionary of other results we want to store and look at later.
-
-
class
pyqstrat.optimize.
Optimizer
(name, generator, cost_func, max_processes=None)[source]¶ Bases:
object
Optimizer is used to optimize parameters for a strategy.
-
__init__
(name, generator, cost_func, max_processes=None)[source]¶ Parameters: - name – string used to display title in plotting, etc.
- generator – A generator (see Python Generators) that takes no inputs and yields a list of dictionaries with parameter name -> parameter value.
- cost_func – A function that takes a dictionary of parameter name -> parameter value as input and outputs cost for that set of parameters.
- max_processes – If not set, the Optimizer will look at the number of CPU cores on your machine to figure out how many processes to run.
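The shapes described for generator and cost_func can be sketched without the Optimizer itself. This is a minimal illustration only: the names param_generator, lookback and threshold are hypothetical, and returning a bare float from cost_func is an assumption, since the real Optimizer may expect a different return shape.

```python
import itertools

def param_generator():
    """Yield one list of suggestions; each suggestion maps parameter name -> value."""
    lookbacks = [5, 10, 20]        # hypothetical parameter values
    thresholds = [0.5, 1.0]        # hypothetical parameter values
    yield [{"lookback": lb, "threshold": th}
           for lb, th in itertools.product(lookbacks, thresholds)]

def cost_func(suggestion):
    """Toy cost: distance from a made-up optimum at lookback=10, threshold=1.0."""
    return abs(suggestion["lookback"] - 10) + abs(suggestion["threshold"] - 1.0)

# Evaluate every suggestion by hand and keep the one with the lowest cost
experiments = [(s, cost_func(s)) for batch in param_generator() for s in batch]
best_suggestion, best_cost = min(experiments, key=lambda e: e[1])
```

Here the loop over suggestions stands in for whatever the Optimizer does across processes; only the input and output shapes are meant to carry over.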
-
df_experiments
(sort_column='cost', ascending=True)[source]¶ Returns a dataframe containing experiment data, sorted by sort_column (default “cost”)
-
experiment_list
(sort_order='lowest_cost')[source]¶ Returns the list of experiments we have run
Parameters: sort_order – Can be set to lowest_cost, highest_cost or sequence. If set to sequence, experiments are returned in the sequence in which they were run
-
plot_2d
(x, y='all', plot_type='line', figsize=(15, 8), marker='X', marker_size=50, marker_color='r', xlim=None, hspace=None)[source]¶ Creates a 2D plot of the optimization output for plotting 1 parameter and costs.
Parameters: - x – Name of the parameter to plot on the x axis, corresponding to the same name in the generator.
- y – Can be one of: “cost”; the name of another cost variable corresponding to the output from the cost function; or “all”, which creates a subplot for cost plus all other costs
- plot_type – line or scatter (default line)
- figsize – Figure size
- marker – Adds a marker to each point in x, y to show the actual data used for interpolation. You can set this to None to turn markers off.
- hspace – Vertical space between subplots
-
plot_3d
(x, y, z='all', plot_type='surface', figsize=(15, 15), interpolation='linear', cmap='viridis', marker='X', marker_size=50, marker_color='r', xlim=None, ylim=None, hspace=None)[source]¶ Creates a 3D plot of the optimization output for plotting 2 parameters and costs.
Parameters: - x – Name of the parameter to plot on the x axis, corresponding to the same name in the generator.
- y – Name of the parameter to plot on the y axis, corresponding to the same name in the generator.
- z – Can be one of: “cost”; the name of another cost variable corresponding to the output from the cost function; or “all”, which creates a subplot for cost plus all other costs
- plot_type – surface or contour (default surface)
- figsize – Figure size
- interpolation – Can be ‘linear’, ‘nearest’ or ‘cubic’ for plotting z points between the ones passed in. See scipy.interpolate.griddata for details
- cmap – Colormap to use (default viridis). See matplotlib colormap for details
- marker – Adds a marker to each point in x, y, z to show the actual data used for interpolation. You can set this to None to turn markers off.
- hspace – Vertical space between subplots
-
pyqstrat.plot module¶
-
class
pyqstrat.plot.
BucketedValues
(name, bucket_names, bucket_values, proportional_widths=True, show_means=True, show_all=True, show_outliers=False, notched=False)[source]¶ Bases:
object
Data in a subplot where the x axis is categorical and each bucket summarizes the properties of a numpy array, for example by drawing a boxplot with percentiles.
-
__init__
(name, bucket_names, bucket_values, proportional_widths=True, show_means=True, show_all=True, show_outliers=False, notched=False)[source]¶ Parameters: - name – name used for this data in a plot legend
- bucket_names – list of strings used on x axis labels
- bucket_values – list of numpy arrays that are summarized in this plot
- proportional_widths – if set to True, the width of each box in the boxplot will be proportional to the number of items in its corresponding array
- show_means – Whether to display a marker where the mean is for each array
- show_outliers – Whether to show markers for outliers that are outside the whiskers. Box is at Q1 = 25%, Q3 = 75% quantiles, whiskers are at Q1 - 1.5 * (Q3 - Q1), Q3 + 1.5 * (Q3 - Q1)
- notched – Whether to show notches indicating the confidence interval around the median
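The whisker rule quoted for show_outliers can be worked through with the standard library. This is a sketch under assumptions: the helper whisker_bounds is hypothetical, and the quartile interpolation method used here (statistics.quantiles with method="inclusive") may differ from the one the plotting code uses.

```python
import statistics

def whisker_bounds(values):
    """Return (lower, upper) whisker limits using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
lower, upper = whisker_bounds(values)
# Points outside the whisker limits are the ones show_outliers would mark
outliers = [v for v in values if v < lower or v > upper]
```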
-
-
class
pyqstrat.plot.
DateFormatter
(dates, fmt)[source]¶ Bases:
matplotlib.ticker.Formatter
Formats dates on plot axes. See matplotlib Formatter
-
class
pyqstrat.plot.
DateLine
(date, name=None, line_type='dashed', color=None)[source]¶ Bases:
object
Draw a vertical line on a plot with a datetime x-axis
-
class
pyqstrat.plot.
HorizontalLine
(y, name=None, line_type='dashed', color=None)[source]¶ Bases:
object
Draws a horizontal line on a subplot
-
class
pyqstrat.plot.
OHLC
(name, dates, o, h, l, c, v=None, vwap=None, colorup='darkgreen', colordown='#F2583E')[source]¶ Bases:
object
Data in a subplot that contains open, high, low, close, volume bars. volume is optional.
-
class
pyqstrat.plot.
Plot
(subplot_list, title=None, figsize=(15, 8), date_range=None, date_format=None, sampling_frequency=None, show_grid=True, show_date_gaps=True, hspace=0.15)[source]¶ Bases:
object
Top level plot containing a list of subplots to draw
-
__init__
(subplot_list, title=None, figsize=(15, 8), date_range=None, date_format=None, sampling_frequency=None, show_grid=True, show_date_gaps=True, hspace=0.15)[source]¶ Parameters: - subplot_list – List of Subplot objects to draw
- title – Title for this plot. Default None
- figsize – Figure size. Default (15, 8)
- date_range – Tuple of strings or numpy datetime64 limiting dates to draw. e.g. (“2018-01-01 14:00”, “2018-01-05”). Default None
- date_format – Date format to use for x-axis
- sampling_frequency – Set this to downsample subplots that have a datetime x axis. For example, if you have minute bar data, you might want to subsample to hours if the plot is too crowded. See pandas time frequency strings for possible values. Default None
- show_grid – If set to True, show a grid on the subplots. Default True
- show_date_gaps – If set to True, then when there is a gap between dates will draw a dashed vertical line. For example, you may have minute bars and a gap between end of trading day and beginning of next day. Even if set to True, this will turn itself off if there are too many gaps to avoid clutter. Default True
- hspace – Height (vertical) space between subplots. Default 0.15
-
-
class
pyqstrat.plot.
Subplot
(data_list, title=None, xlabel=None, ylabel=None, zlabel=None, date_lines=None, horizontal_lines=None, vertical_lines=None, xlim=None, ylim=None, height_ratio=1.0, display_legend=True, legend_loc='best', log_y=False, y_tick_format=None)[source]¶ Bases:
object
A top level plot contains a list of subplots, each of which contains a list of data objects to draw
-
__init__
(data_list, title=None, xlabel=None, ylabel=None, zlabel=None, date_lines=None, horizontal_lines=None, vertical_lines=None, xlim=None, ylim=None, height_ratio=1.0, display_legend=True, legend_loc='best', log_y=False, y_tick_format=None)[source]¶ Parameters: - data_list – A list of objects to draw. Each element can contain XYData, XYZData, TimeSeries, OHLC, BucketedValues or TradeSet
- title – Title to show for this subplot. Default None
- zlabel – Only applicable to 3d subplots. Default None
- date_lines – A list of DateLine objects to draw as vertical lines. Only applicable when x axis is datetime. Default None
- horizontal_lines – A list of HorizontalLine objects to draw on the plot. Default None
- vertical_lines – A list of VerticalLine objects to draw on the plot
- xlim – x limits for the plot as a tuple of numpy datetime objects when x-axis is datetime, or tuple of floats. Default None
- ylim – y limits for the plot. Tuple of floats. Default None
- height_ratio – If you have more than one subplot on a plot, use height ratio to determine how high each subplot should be. For example, if you set height_ratio = 0.75 for the first subplot and 0.25 for the second, the first will be 3 times taller than the second one. Default 1.0
- display_legend – Whether to show a legend on the plot. Default True
- legend_loc – Location for the legend. Default ‘best’
- log_y – whether the y axis should be logarithmic. Default False
- y_tick_format – Format string to use for y axis labels. For example, you can decide to use fixed notation instead of scientific notation or change number of decimal places shown. Default None
-
-
class
pyqstrat.plot.
TimeSeries
(name, dates, values, plot_type='line', line_type='solid', line_width=None, color=None, marker=None, marker_size=50, marker_color='red')[source]¶ Bases:
object
Data in a subplot where x is an array of numpy datetimes and y is a numpy array of floats
-
__init__
(name, dates, values, plot_type='line', line_type='solid', line_width=None, color=None, marker=None, marker_size=50, marker_color='red')[source]¶ Parameters: - name – Name to show in plot legend
- dates – pandas Series or numpy array of datetime64
- values – pandas Series or numpy array of floats
- plot_type – ‘line’ or ‘scatter’
- marker – If set, show a marker at each value in values. See matplotlib marker types
-
-
class
pyqstrat.plot.
TradeSet
(name, trades, marker='P', marker_color=None, marker_size=50)[source]¶ Bases:
object
Data for subplot that contains a set of trades along with marker properties for these trades
-
class
pyqstrat.plot.
VerticalLine
(x, name=None, line_type='dashed', color=None)[source]¶ Bases:
object
Draws a vertical line on a subplot where x axis is not a date-time axis
-
class
pyqstrat.plot.
XYData
(name, x, y, plot_type='line', line_type='solid', line_width=None, color=None, marker=None, marker_size=50, marker_color='red')[source]¶ Bases:
object
Data in a subplot that has x and y values that are both arrays of floats
-
class
pyqstrat.plot.
XYZData
(name, x, y, z, plot_type='surface', marker='X', marker_size=50, marker_color='red', interpolation='linear', cmap='viridis')[source]¶ Bases:
object
Data in a subplot that has x, y and z values that are all floats
-
__init__
(name, x, y, z, plot_type='surface', marker='X', marker_size=50, marker_color='red', interpolation='linear', cmap='viridis')[source]¶ Parameters: - x – pandas series or numpy array of floats
- y – pandas series or numpy array of floats
- z – pandas series or numpy array of floats
- plot_type – surface or contour (default surface)
- marker – Adds a marker to each point in x, y, z to show the actual data used for interpolation. You can set this to None to turn markers off.
- interpolation – Can be ‘linear’, ‘nearest’ or ‘cubic’ for plotting z points between the ones passed in. See scipy.interpolate.griddata for details
- cmap – Colormap to use (default viridis). See matplotlib colormap for details
-
-
pyqstrat.plot.
draw_3d_plot
(ax, x, y, z, plot_type, marker='X', marker_size=50, marker_color='red', interpolation='linear', cmap='viridis')[source]¶ Draw a 3d plot. See XYZData class for explanation of arguments
>>> points = np.random.rand(1000, 2)
>>> x = np.random.rand(10)
>>> y = np.random.rand(10)
>>> z = x ** 2 + y ** 2
>>> if has_display():
...     fig, ax = plt.subplots()
...     draw_3d_plot(ax, x = x, y = y, z = z, plot_type = 'contour', interpolation = 'linear')
-
pyqstrat.plot.
draw_boxplot
(ax, names, values, proportional_widths=True, notched=False, show_outliers=True, show_means=True, show_all=True)[source]¶ Draw a boxplot. See BucketedValues class for explanation of arguments
-
pyqstrat.plot.
draw_candlestick
(ax, index, o, h, l, c, v, vwap, colorup='darkgreen', colordown='#F2583E')[source]¶ Draw candlesticks given parallel numpy arrays of o, h, l, c, v values. v is optional. See OHLC class __init__ for argument descriptions.
-
pyqstrat.plot.
draw_date_line
(ax, plot_dates, date, linestyle, color)[source]¶ Draw vertical line on a subplot with datetime x axis
-
pyqstrat.plot.
draw_horizontal_line
(ax, y, linestyle, color)[source]¶ Draw horizontal line on a subplot
-
pyqstrat.plot.
draw_poly
(ax, left, bottom, top, right, facecolor, edgecolor, zorder)[source]¶ Draw a set of polygons given parallel numpy arrays of left, bottom, top, right points
-
pyqstrat.plot.
get_date_formatter
(plot_dates, date_format)[source]¶ Create an appropriate DateFormatter for x axis labels. If date_format is set to None, figures out an appropriate date format based on the range of dates passed in
-
pyqstrat.plot.
trade_sets_by_reason_code
(trades, marker_props={'backtest end': {'color': 'green', 'size': 50, 'symbol': '*'}, 'enter long': {'color': 'blue', 'size': 50, 'symbol': 'P'}, 'enter short': {'color': 'red', 'size': 50, 'symbol': 'P'}, 'exit long': {'color': 'blue', 'size': 50, 'symbol': 'X'}, 'exit short': {'color': 'red', 'size': 50, 'symbol': 'X'}, 'none': {'color': 'green', 'size': 50, 'symbol': 'o'}, 'roll future': {'color': 'green', 'size': 50, 'symbol': '>'}})[source]¶ Returns a list of TradeSet objects. Each TradeSet contains trades with a different reason code. The markers for each TradeSet are set by looking up marker properties for each reason code using the marker_props argument:
Parameters: - trades – List of Trade objects, each containing an order attribute which in turn contains a reason_code attribute
- marker_props – Dictionary from reason code string -> dictionary of marker properties. See ReasonCode.MARKER_PROPERTIES for example. Default ReasonCode.MARKER_PROPERTIES
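The grouping described above can be sketched with plain Python objects. Everything here is illustrative: group_by_reason_code and make_trade are hypothetical helpers, SimpleNamespace stands in for real Trade and Order objects, and the result is a plain dictionary rather than a list of TradeSet objects.

```python
from collections import defaultdict
from types import SimpleNamespace

# Subset of the marker properties listed for ReasonCode.MARKER_PROPERTIES
MARKER_PROPS = {
    "enter long": {"color": "blue", "size": 50, "symbol": "P"},
    "exit long": {"color": "blue", "size": 50, "symbol": "X"},
}

def group_by_reason_code(trades, marker_props):
    """Map reason code -> (trades with that code, marker properties or None)."""
    grouped = defaultdict(list)
    for trade in trades:
        grouped[trade.order.reason_code].append(trade)
    return {code: (items, marker_props.get(code)) for code, items in grouped.items()}

def make_trade(trade_id, reason_code):
    # Hypothetical stand-in for a Trade: only order.reason_code matters here
    return SimpleNamespace(id=trade_id, order=SimpleNamespace(reason_code=reason_code))

trades = [make_trade(1, "enter long"), make_trade(2, "exit long"), make_trade(3, "enter long")]
groups = group_by_reason_code(trades, MARKER_PROPS)
```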
pyqstrat.evaluator module¶
-
class
pyqstrat.evaluator.
Evaluator
(initial_metrics)[source]¶ Bases:
object
You add functions to the evaluator that are dependent on the outputs of other functions. The evaluator will call these functions in the right order so dependencies are computed first before the functions that need their output. You can retrieve the output of a metric using the metric member function
>>> evaluator = Evaluator(initial_metrics={'x' : np.array([1, 2, 3]), 'y' : np.array([3, 4, 5])})
>>> evaluator.add_metric('z', lambda x, y: sum(x, y), dependencies=['x', 'y'])
>>> evaluator.compute()
>>> evaluator.metric('z')
array([ 9, 10, 11])
-
__init__
(initial_metrics)[source]¶ Inits Evaluator with a dictionary of initial metrics that are used to compute subsequent metrics
Parameters: initial_metrics – a dictionary of string name -> metric. metric can be any object including a scalar, an array or a tuple
-
compute
(metric_names=None)[source]¶ Compute metrics using the internal dependency graph
Parameters: metric_names – an array of metric names. If not passed in, evaluator will compute and store all metrics
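The idea of computing dependencies before the metrics that need them can be illustrated with a small stand-in class. MiniEvaluator is hypothetical and is not the library's implementation: it resolves dependencies lazily by recursion, caches results, and does no cycle detection.

```python
class MiniEvaluator:
    """Toy dependency-ordered evaluator, for illustration only."""

    def __init__(self, initial_metrics):
        self._values = dict(initial_metrics)   # metric name -> computed value
        self._funcs = {}                       # metric name -> (function, dependency names)

    def add_metric(self, name, func, dependencies):
        self._funcs[name] = (func, list(dependencies))

    def metric(self, name):
        # Compute each dependency first, then the metric itself, caching the result
        if name not in self._values:
            func, deps = self._funcs[name]
            self._values[name] = func(*[self.metric(d) for d in deps])
        return self._values[name]

ev = MiniEvaluator({"x": [1, 2, 3], "y": [3, 4, 5]})
# 'total' is registered before the metric it depends on; order of registration does not matter
ev.add_metric("total", lambda z: sum(z), ["z"])
ev.add_metric("z", lambda x, y: [a + b for a, b in zip(x, y)], ["x", "y"])
total = ev.metric("total")
```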
-
-
pyqstrat.evaluator.
compute_amean
(returns)[source]¶ Computes arithmetic mean of a return array, ignoring NaNs
Parameters: returns – a numpy array of floats representing returns at any frequency
Returns: a float
>>> compute_amean(np.array([3, 4, np.nan]))
3.5
-
pyqstrat.evaluator.
compute_annual_returns
(dates, returns, periods_per_year)[source]¶ Takes the output of compute_bucketed_returns and returns geometric mean of returns by year
Returns: A tuple with the first element being an array of years (integer) and the second element an array of annualized returns for those years
-
pyqstrat.evaluator.
compute_bucketed_returns
(dates, returns)[source]¶ Bucket returns by year
Returns: A tuple with the first element being a list of years and the second a list of numpy arrays containing returns for each corresponding year
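Bucketing by year can be sketched in pure Python. This is an assumption-laden illustration: bucket_returns_by_year is a hypothetical helper that takes datetime.date objects and lists, whereas the documented function works on numpy arrays.

```python
from datetime import date

def bucket_returns_by_year(dates, returns):
    """Return (years, returns_per_year) with years in ascending order."""
    buckets = {}
    for d, r in zip(dates, returns):
        buckets.setdefault(d.year, []).append(r)
    years = sorted(buckets)
    return years, [buckets[y] for y in years]

dates = [date(2017, 12, 29), date(2018, 1, 2), date(2018, 1, 3)]
returns = [0.01, -0.02, 0.005]
years, by_year = bucket_returns_by_year(dates, returns)
```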
-
pyqstrat.evaluator.
compute_calmar
(returns_3yr, periods_per_year, mdd_pct_3yr)[source]¶ Compute Calmar ratio, which is the annualized return divided by max drawdown over the last 3 years
-
pyqstrat.evaluator.
compute_dates_3yr
(dates)[source]¶ Given an array of numpy datetimes, return those that are within 3 years of the last date in the array
-
pyqstrat.evaluator.
compute_equity
(dates, starting_equity, returns)[source]¶ Given starting equity, dates and returns, create a numpy array of equity at each date
-
pyqstrat.evaluator.
compute_gmean
(returns, periods_per_year)[source]¶ Computes geometric mean of an array of returns
Parameters: - returns – a numpy array of returns
- periods_per_year – number of trading periods per year
Returns: a float
>>> round(compute_gmean(np.array([0.001, 0.002, 0.003]), 252.), 6)
0.654358
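One formula that reproduces the doctest value above is the per-period geometric mean compounded over periods_per_year. Whether the library computes it this way is an assumption; annualized_gmean is a hypothetical helper and, unlike the library functions, does not handle NaNs.

```python
import math

def annualized_gmean(returns, periods_per_year):
    """Annualized geometric mean return of a sequence of per-period returns."""
    log_growth = sum(math.log1p(r) for r in returns)   # log of total growth
    per_period = log_growth / len(returns)             # average log growth per period
    return math.expm1(per_period * periods_per_year)   # compound to one year

result = annualized_gmean([0.001, 0.002, 0.003], 252.0)
```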
-
pyqstrat.evaluator.
compute_mar
(returns, periods_per_year, mdd_pct)[source]¶ Compute MAR ratio, which is annualized return divided by biggest drawdown since inception.
-
pyqstrat.evaluator.
compute_maxdd_date
(rolling_dd_dates, rolling_dd)[source]¶ Compute date of max drawdown given numpy array of dates, and corresponding rolling dd percentages
-
pyqstrat.evaluator.
compute_maxdd_date_3yr
(rolling_dd_3yr_dates, rolling_dd_3yr)[source]¶ Compute max drawdown date over the last 3 years
-
pyqstrat.evaluator.
compute_maxdd_pct
(rolling_dd)[source]¶ Compute max drawdown percentage given a numpy array of rolling drawdowns, ignoring NaNs
-
pyqstrat.evaluator.
compute_maxdd_pct_3yr
(rolling_dd_3yr)[source]¶ Compute max drawdown percentage over the last 3 years
-
pyqstrat.evaluator.
compute_maxdd_start
(rolling_dd_dates, rolling_dd, mdd_date)[source]¶ Compute date when max drawdown starts, given numpy array of dates, corresponding rolling dd percentages and the date of the max drawdown
-
pyqstrat.evaluator.
compute_maxdd_start_3yr
(rolling_dd_3yr_dates, rolling_dd_3yr, mdd_date_3yr)[source]¶ Compute max drawdown start date over the last 3 years
-
pyqstrat.evaluator.
compute_periods_per_year
(dates)[source]¶ Computes trading periods per year for an array of numpy datetime64’s. E.g. if most of the dates are separated by 1 day, will return 252.
Parameters: dates – a numpy array of datetime64’s
Returns: a float
>>> compute_periods_per_year(np.array(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-09'], dtype = 'M8[D]'))
252.0
-
pyqstrat.evaluator.
compute_return_metrics
(dates, rets, starting_equity)[source]¶ Compute a set of common metrics using returns (for example, of an instrument or a portfolio)
Parameters: - dates – a numpy datetime array with one date per return
- rets – a numpy float array of returns
- starting_equity – starting equity value in your portfolio
Returns: An Evaluator object containing computed metrics off the returns passed in. If needed, you can add your own metrics to this object based on the values of existing metrics and recompute the Evaluator. Otherwise, you can just use the output of the evaluator using the metrics function.
-
pyqstrat.evaluator.
compute_returns_3yr
(dates, returns)[source]¶ Given an array of numpy datetimes and an array of returns, return those that are within 3 years of the last date in the datetime array
-
pyqstrat.evaluator.
compute_rolling_dd
(dates, equity)[source]¶ Compute numpy array of rolling drawdown percentage
Parameters: - dates – numpy array of datetime64
- equity – numpy array of equity
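A common way to define rolling drawdown is the fractional drop of equity from its running peak. The sketch below uses that definition as an assumption; rolling_drawdown is a hypothetical helper, it expects positive equity values, and the sign convention (negative numbers for drawdowns) may not match the library's.

```python
def rolling_drawdown(equity):
    """Fractional drop of each equity value from the running peak (0.0 at a new high)."""
    drawdowns = []
    peak = float("-inf")
    for value in equity:
        peak = max(peak, value)              # highest equity seen so far
        drawdowns.append(value / peak - 1.0)
    return drawdowns

equity = [100.0, 110.0, 99.0, 120.0, 90.0]
dd = rolling_drawdown(equity)
max_dd = min(dd)   # most negative value is the deepest drawdown
```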
-
pyqstrat.evaluator.
compute_rolling_dd_3yr
(dates, equity)[source]¶ Compute rolling drawdowns over the last 3 years
-
pyqstrat.evaluator.
compute_sharpe
(returns, amean, periods_per_year)[source]¶ Note that this does not take risk free returns into account, so it’s really a sharpe0, i.e. it assumes risk free returns are 0
Parameters: - returns – a numpy array of returns
- amean – arithmetic mean of returns
- periods_per_year – number of trading periods per year
>>> round(compute_sharpe(np.array([0.001, -0.001, 0.002]), 0.001, 252), 6)
12.727922
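The doctest value above is consistent with dividing the arithmetic mean by the population standard deviation of returns and scaling by the square root of periods_per_year. That the library uses exactly this formula is an assumption; sharpe0 is a hypothetical helper with no NaN handling.

```python
import math
import statistics

def sharpe0(returns, amean, periods_per_year):
    """Annualized Sharpe ratio with a zero risk free rate."""
    return amean / statistics.pstdev(returns) * math.sqrt(periods_per_year)

value = sharpe0([0.001, -0.001, 0.002], 0.001, 252)
```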
-
pyqstrat.evaluator.
compute_sortino
(returns, amean, periods_per_year)[source]¶ Note that this assumes target return is 0.
Parameters: - returns – a numpy array of returns
- amean – arithmetic mean of returns
- periods_per_year – number of trading periods per year
>>> print(round(compute_sortino(np.array([0.001, -0.001, 0.002]), 0.001, 252), 6))
33.674916
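The doctest value above matches a formula in which positive returns are clipped to zero and the population standard deviation of the clipped series is used as the downside risk. This is inferred from the numbers, not confirmed by the source; sortino0 is a hypothetical helper and raises ZeroDivisionError when there are no negative returns.

```python
import math
import statistics

def sortino0(returns, amean, periods_per_year):
    """Annualized Sortino ratio with a target return of zero."""
    downside = [min(r, 0.0) for r in returns]   # keep only the below-target part
    return amean / statistics.pstdev(downside) * math.sqrt(periods_per_year)

value = sortino0([0.001, -0.001, 0.002], 0.001, 252)
```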
-
pyqstrat.evaluator.
compute_std
(returns)[source]¶ Computes standard deviation of an array of returns, ignoring nans
-
pyqstrat.evaluator.
display_return_metrics
(metrics, float_precision=3)[source]¶ Creates a dataframe making it convenient to view the output of the metrics obtained using the compute_return_metrics function.
Parameters: float_precision – Change if you want to display floats with more or less significant figures than the default, 3 significant figures.
Returns: A one row dataframe with formatted metrics.
pyqstrat.pyqstrat_cpp module¶
-
class
pyqstrat.pyqstrat_cpp.
Aggregator
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
AllOpenInterestAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Writes out all open interest records
-
__call__
()¶ Add an open interest record to be written to disk at some point
Parameters: - oi (OpenInterestRecord) –
- line_number (int) – The line number of the source file that this record came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object implementing the Writer interface
- output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- batch_size (int, optional) – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
- timestamp_unit (Schema.Type, optional) – Whether timestamps are measured as milliseconds or microseconds since the unix epoch. Defaults to Schema.TIMESTAMP_MILLI
-
-
class
pyqstrat.pyqstrat_cpp.
AllOtherAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Writes out any records that are not trades, quotes or open interest
-
__call__
()¶ Add a record to be written to disk at some point
Parameters: - other (OtherRecord) –
- line_number (int) – The line number of the source file that this record came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object implementing the Writer interface
- output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- batch_size (int, optional) – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
- timestamp_unit (Schema.Type, optional) – Whether timestamps are measured as milliseconds or microseconds since the unix epoch. Defaults to Schema.TIMESTAMP_MILLI
-
-
class
pyqstrat.pyqstrat_cpp.
AllQuoteAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Writes out every quote we see
-
__call__
()¶ Add a quote record to be written to disk at some point
Parameters: - quote (QuoteRecord) –
- line_number (int) – The line number of the source file that this record came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object implementing the Writer interface
- output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- batch_size – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
-
-
class
pyqstrat.pyqstrat_cpp.
AllQuotePairAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Writes out every quote pair we find
-
__call__
()¶ Add a quote pair record to be written to disk at some point
Parameters: - quote_pair (QuoteRecord) –
- line_number (int) – The line number of the source file that this record came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object implementing the Writer interface
- output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- batch_size (int, optional) – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
- timestamp_unit (Schema.Type, optional) – Whether timestamps are measured as milliseconds or microseconds since the unix epoch. Defaults to Schema.TIMESTAMP_MILLI
-
-
class
pyqstrat.pyqstrat_cpp.
AllTradeAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Writes out every trade we see
-
__call__
()¶ Add a trade record to be written to disk at some point
Parameters: - trade (TradeRecord) –
- line_number (int) – The line number of the source file that this trade came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object implementing the Writer interface
- output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- batch_size (int, optional) – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
- timestamp_unit (Schema.Type, optional) – Whether timestamps are measured as milliseconds or microseconds since the unix epoch. Defaults to Schema.TIMESTAMP_MILLI
-
-
class
pyqstrat.pyqstrat_cpp.
ArrowWriter
¶ Bases:
pyqstrat.pyqstrat_cpp.Writer
A subclass of Writer that writes batches of records to a disk file in the Apache Arrow format. See Apache Arrow for details
-
__init__
()¶ Parameters: - output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- schema (
Schema
) – A schema object containing the names and datatypes of each field we want to save in a record - create_batch_id_file (bool, optional) – Whether to create a corresponding file that contains a map from batch id -> batch number so we can easily lookup a batch number and then retrieve it from disk. Defaults to False
- max_batch_size (int, optional) – If set, when we get this many records, we write out a batch of records to disk. May be necessary when we are creating large output files, to avoid running out of memory when reading and writing. Defaults to -1
-
add_record
()¶ Add a record that will be written to disk at some point.
Parameters: - line_number (int) – The line number of the source file that this trade came from. Used for debugging
- tuple (tuple) – Must correspond to the schema defined in the constructor. For example, if the schema has a bool and a float, the tuple could be (False, 0.5)
-
close
()¶ Close the writer and flush any remaining data to disk
Parameters: success (bool, optional) – If set to False, we had some kind of exception and are cleaning up. Tells the function to not indicate the file was written successfully, for example by renaming a temp file to the actual filename. Defaults to True
-
write_batch
()¶ Write a batch of records to disk. The batch can have an optional string id so we can later retrieve just this batch of records without reading the whole file.
Parameters: batch_id (str, optional) – An identifier which can later be used to retrieve this batch from disk. Defaults to “”
-
-
class
pyqstrat.pyqstrat_cpp.
ArrowWriterCreator
¶ Bases:
pyqstrat.pyqstrat_cpp.WriterCreator
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
BadLineHandler
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
CheckFields
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
FileProcessor
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
FixedWidthTimeParser
¶ Bases:
pyqstrat.pyqstrat_cpp.TimestampParser
A helper class that takes a string formatted as HH:MM:SS.xxx and parses it into number of milliseconds or micros since the beginning of the day
-
__call__
()¶ Parameters: time (str) – A string like “08:35:22.132” Returns: Millis since beginning of day Return type: int
-
__init__
()¶ Parameters: - micros (bool, optional) – Whether to return timestamp in millisecs or microsecs since 1970. Default false
- hours_start (int, optional) – index where the hour starts in the timestamp string. Default -1
- hours_size (int, optional) – number of characters used for the hour
- minutes_start (int, optional) –
- minutes_size (int, optional) –
- seconds_start (int, optional) –
- seconds_size (int, optional) –
- millis_start (int, optional) –
- millis_size (int, optional) –
- micros_start (int, optional) –
- micros_size (int, optional) –
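The fixed-width approach can be illustrated in plain Python. This is only a sketch of the idea, not the C++ implementation: the function name is hypothetical, and the default start positions are assumptions chosen to fit a string laid out as “HH:MM:SS.xxx” (the documented default for hours_start is -1).

```python
def parse_fixed_width_time(time_str,
                           hours_start=0, hours_size=2,
                           minutes_start=3, minutes_size=2,
                           seconds_start=6, seconds_size=2,
                           millis_start=9, millis_size=3):
    """Return milliseconds since the beginning of the day.

    Hypothetical helper: slices each component out of a fixed position
    instead of interpreting a strftime-style format string.
    """
    hours = int(time_str[hours_start:hours_start + hours_size])
    minutes = int(time_str[minutes_start:minutes_start + minutes_size])
    seconds = int(time_str[seconds_start:seconds_start + seconds_size])
    millis = int(time_str[millis_start:millis_start + millis_size])
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


millis_since_midnight = parse_fixed_width_time("08:35:22.132")
```

Slicing at known offsets avoids any format parsing, which is presumably why a fixed-width parser is faster than a strftime-based one.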
-
-
class
pyqstrat.pyqstrat_cpp.
FormatTimestampParser
¶ Bases:
pyqstrat.pyqstrat_cpp.TimestampParser
Helper class that parses timestamps according to the strftime format string passed in. strftime is slow, so use
FixedWidthTimeParser
if your timestamp has a fixed format such as “HH:MM:SS….”
-
__call__
()¶ Parameters: time (str) – The timestamp to parse Returns: Number of millis or micros since epoch Return type: int
-
__init__
()¶ Parameters: - base_date (int) – Sometimes the timestamps in a file contain time only and the name of a file contains the date. In these cases, pass in the date as number of millis or micros from the epoch to the start of that date. If the timestamp has date also, pass in 0 here.
- time_format (str, optional) – strftime format string for parsing the timestamp. Defaults to “%H:%M:%S”
- micros (bool, optional) – If this is set, we will parse and store microseconds. Otherwise we will parse and store milliseconds. Defaults to True
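How base_date combines with a time-only timestamp can be sketched with the standard library. The function below is a hypothetical stand-in, not the library's code, and it covers only the case where the file holds a time of day and the date is supplied separately through base_date.

```python
from datetime import datetime, timedelta


def parse_time_of_day(time_str, base_date, time_format="%H:%M:%S", micros=True):
    """Return base_date plus the time of day, in micros or millis.

    Hypothetical helper. base_date must already be expressed in the same
    unit (micros or millis since the epoch) as the requested result.
    """
    parsed = datetime.strptime(time_str, time_format)
    midnight = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = parsed - midnight
    unit = timedelta(microseconds=1) if micros else timedelta(milliseconds=1)
    # Dividing one timedelta by another with // yields an integer count of units.
    return base_date + elapsed // unit


millis_example = parse_time_of_day("08:35:22", 0, micros=False)
micros_example = parse_time_of_day("08:35:22.132", 1000, "%H:%M:%S.%f")
```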
-
class
pyqstrat.pyqstrat_cpp.
IsFieldInList
¶ Bases:
pyqstrat.pyqstrat_cpp.CheckFields
Simple utility class to check whether the value of fields[flag_idx] is in any of flag_values
-
__call__
()¶ Parameters: flag_values – a vector of strings containing possible values for the field Returns: a boolean
-
__init__
()¶ Parameters: - fields – a vector of strings
- flag_idx – the index of fields to check
-
-
class
pyqstrat.pyqstrat_cpp.
LineFilter
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
MissingDataHandler
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
OpenInterestRecord
¶ Bases:
pyqstrat.pyqstrat_cpp.Record
Open interest for a future or option. Usually one record per instrument at the beginning of the day
-
id
¶ Represents a symbol or instrument id, for example, for an option you may concatenate underlying symbol, expiration, strike, put or call to uniquely identify the instrument
Type: str
-
timestamp
¶ Trade time, in milliseconds or microseconds since 1/1/1970
Type: int
-
qty
¶ Trade quantity
Type: float
-
metadata
¶ A string representing any extra information you want to save, such as exchange, or special trade conditions
Type: str
-
__init__
()¶
-
id
-
metadata
-
qty
-
timestamp
-
-
class
pyqstrat.pyqstrat_cpp.
OtherRecord
¶ Bases:
pyqstrat.pyqstrat_cpp.Record
Any other data you want to store from market data besides trades, quotes and open interest. You can capture any important fields in the metadata attribute
-
id
¶ Represents a symbol or instrument id, for example, for an option you may concatenate underlying symbol, expiration, strike, put or call to uniquely identify the instrument
Type: str
-
timestamp
¶ trade time, in milliseconds or microseconds since 1/1/1970
Type: int
-
metadata
¶ a string representing any extra information you want to save, such as exchange, or special trade conditions
Type: str
-
__init__
()¶
-
id
-
metadata
-
timestamp
-
-
class
pyqstrat.pyqstrat_cpp.
PriceQtyMissingDataHandler
¶ Bases:
pyqstrat.pyqstrat_cpp.MissingDataHandler
A helper class that takes a Record as an input, checks whether it is a trade, a quote or an open interest record, and if any of the prices or quantities are 0, sets them to NaN
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
PrintBadLineHandler
¶ Bases:
pyqstrat.pyqstrat_cpp.BadLineHandler
A helper class that takes in lines we cannot parse and either prints them and continues or raises an Exception
-
__call__
()¶ Parameters: - line_number (int) – Line number of the input file that corresponds to this line (for debugging)
- line (str) – The actual line that failed to parse
- exception (Exception) – The exception that caused us to fail to parse this line.
-
__init__
()¶ Parameters: raise (bool, optional) – Whether to raise an exception every time this is called or just print debugging info. Defaults to False
-
-
class
pyqstrat.pyqstrat_cpp.
QuoteRecord
¶ Bases:
pyqstrat.pyqstrat_cpp.Record
A parsed quote record that we can save to disk
-
id
¶ Represents a symbol or instrument id, for example, for an option you may concatenate underlying symbol, expiration, strike, put or call to uniquely identify the instrument
Type: str
-
timestamp
¶ Trade time, in milliseconds or microseconds since 1/1/1970
Type: int
-
bid
¶ If True, this is a bid quote, otherwise it is an offer
Type: bool
-
qty
¶ Trade quantity
Type: float
-
price
¶ Trade price
Type: float
-
metadata
¶ A string representing any extra information you want to save, such as exchange, or special trade conditions
Type: str
-
__init__
()¶
-
bid
-
id
-
metadata
-
price
-
qty
-
timestamp
-
-
class
pyqstrat.pyqstrat_cpp.
QuoteTOBAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Aggregate top of book quotes to top of book records. If you specify a frequency such as “5m”, we will calculate a record every 5 minutes which has the top of book at the end of that bar. If no frequency is specified, we will create a top of book every time a quote comes in. We assume that the quotes are all top of book quotes and are written in pairs, so we have a bid quote followed by an offer quote with the same timestamp, or vice versa
-
__call__
()¶ Add a quote record to be written to disk at some point
Parameters: - quote (
QuoteRecord
) – - line_number (int) – The line number of the source file that this trade came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object
implementing the
Writer
interface - output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- frequency (str, optional) – A string like “5m” indicating the bar size is 5 mins. Units can be s,m,h or d for second, minute, hour or day. Defaults to “5m”. If you set this to “”, each tick will be recorded.
- batch_by_id (bool, optional) – If set, we will create one batch for each id. This will allow us to retrieve all records for a single instrument by reading a single batch. Defaults to True.
- batch_size (int, optional) – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
- timestamp_unit (Schema.Type, optional) – Whether timestamps are measured as milliseconds or microseconds since the unix epoch. Defaults to Schema.TIMESTAMP_MILLI
-
close
()¶ Flush all unwritten records to the Writer, which writes them to disk when its close function is called
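The pairing assumption (a bid and an offer arriving with the same timestamp, in either order) can be sketched as follows for the tick-by-tick case where no frequency is set. This is an illustrative pure-Python version under that assumption; the function name, the tuple layout and the output field names are all hypothetical and are not taken from the library.

```python
def top_of_book(quotes):
    """Pair bid and offer quotes into top of book records.

    quotes: iterable of (id, timestamp, is_bid, qty, price) tuples.
    A record is emitted once both sides for the same (id, timestamp)
    have been seen, whichever side arrives first.
    """
    pending = {}
    records = []
    for quote_id, timestamp, is_bid, qty, price in quotes:
        key = (quote_id, timestamp)
        other = pending.pop(key, None)
        if other is None or other[0] == is_bid:
            # First quote of the pair (or a repeat of the same side): wait.
            pending[key] = (is_bid, qty, price)
            continue
        _, other_qty, other_price = other
        bid = (price, qty) if is_bid else (other_price, other_qty)
        ask = (other_price, other_qty) if is_bid else (price, qty)
        records.append({
            "id": quote_id, "timestamp": timestamp,
            "bid": bid[0], "bid_size": bid[1],
            "ask": ask[0], "ask_size": ask[1],
        })
    return records


tob_records = top_of_book([
    ("ES", 1000, True, 5.0, 4500.00),
    ("ES", 1000, False, 7.0, 4500.25),
    ("ES", 2000, False, 2.0, 4500.50),
    ("ES", 2000, True, 3.0, 4500.25),
])
```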
-
-
class
pyqstrat.pyqstrat_cpp.
Record
¶ Bases:
pybind11_builtins.pybind11_object
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
pyqstrat.pyqstrat_cpp.
RecordFieldParser
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
RecordFilter
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
RecordGenerator
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
RecordParser
¶ Bases:
pybind11_builtins.pybind11_object
-
__init__
()¶
-
add_line
()¶ Parameters: line (str) – The line we need to parse
-
-
class
pyqstrat.pyqstrat_cpp.
RegExLineFilter
¶ Bases:
pyqstrat.pyqstrat_cpp.LineFilter
A helper class that filters lines from the input file based on a regular expression. Note that regular expressions are slow, so if you just need to match specific strings, use a string matching filter instead.
-
__call__
()¶ Parameters: line (str) – The string that the regular expression should match. Returns: Whether the regex matched Return type: bool
-
__init__
()¶ Parameters: pattern (str) – The regex pattern to match. This follows C++ std::regex pattern matching rules as opposed to python
-
-
class
pyqstrat.pyqstrat_cpp.
Schema
¶ Bases:
pybind11_builtins.pybind11_object
Describes a list of field names and data types for writing records to disk
-
types
¶ A list of (str, type) tuples describing a record with the name of each field and its datatype
-
BOOL
= Type.BOOL¶
-
FLOAT32
= Type.FLOAT32¶
-
FLOAT64
= Type.FLOAT64¶
-
INT32
= Type.INT32¶
-
INT64
= Type.INT64¶
-
STRING
= Type.STRING¶
-
TIMESTAMP_MICRO
= Type.TIMESTAMP_MICRO¶
-
TIMESTAMP_MILLI
= Type.TIMESTAMP_MILLI¶
-
class
Type
¶ Bases:
pybind11_builtins.pybind11_object
-
BOOL
= Type.BOOL¶
-
FLOAT32
= Type.FLOAT32¶
-
FLOAT64
= Type.FLOAT64¶
-
INT32
= Type.INT32¶
-
INT64
= Type.INT64¶
-
STRING
= Type.STRING¶
-
TIMESTAMP_MICRO
= Type.TIMESTAMP_MICRO¶
-
TIMESTAMP_MILLI
= Type.TIMESTAMP_MILLI¶
-
__init__
()¶
-
-
__init__
()¶
-
types
-
-
class
pyqstrat.pyqstrat_cpp.
SubStringLineFilter
¶ Bases:
pyqstrat.pyqstrat_cpp.LineFilter
A helper class that will check if a line matches any of a set of strings
-
__call__
()¶ Parameters: line (str) – We check if any of the patterns are present in this string Returns: Whether any of the patterns were present Return type: bool
-
__init__
()¶ Parameters: patterns (list of str) – The list of strings to match against
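The behaviour described here amounts to a substring test against each pattern. A minimal Python equivalent, using a hypothetical factory function rather than the C++ class, might look like this:

```python
def make_substring_filter(patterns):
    """Return a callable that reports whether a line contains any pattern.

    Hypothetical stand-in for the filter described above; plain substring
    checks avoid the cost of compiling and running a regular expression.
    """
    def line_filter(line):
        return any(pattern in line for pattern in patterns)
    return line_filter


keep_line = make_substring_filter(["ESH8", "ESM8"])
kept = [line for line in ["ESH8,100.5,3", "NQH8,7000,1", "ESM8,101,2"] if keep_line(line)]
```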
-
-
class
pyqstrat.pyqstrat_cpp.
TextFileDecompressor
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordGenerator
A helper function that takes a filename and its compression type, and returns a function that we can use to iterate over lines in that file
-
__call__
()¶ Parameters: - filename (str) – The file to read
- compression (str) – One of “” for uncompressed files, “gzip”, “bz2” or “lzip”
Returns: A function that takes an empty string as input, and fills in that string. The function should return False if EOF has been reached, True otherwise
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
TextFileProcessor
¶ Bases:
pyqstrat.pyqstrat_cpp.FileProcessor
A helper class that takes text based market data files and creates parsed and aggregated quote, trade, open interest, and other files from them.
-
__call__
()¶ Parameters: - input_filename (str) – The file to read
- compression (str) – One of “” for uncompressed files, “gzip”, “bz2” or “lzip”
Returns: Number of lines processed
Return type: int
-
__init__
()¶ Parameters: - record_generator – A function that takes a filename and its compression type, and returns a function that we can use to iterate over lines in that file
- line_filter – A function that takes a line (str) as input and returns whether we should parse it or discard it
- record_parser – A function that takes a line (str) as input and returns a
Record
object - bad_line_handler – A function that takes a line that failed to parse and returns a
Record
object or None - record_filter – A function that takes a parsed Record object and returns whether we should keep it or discard it
- missing_data_handler – A function that takes a parsed Record object and deals with missing data, for example, by converting 0’s to NANs
- aggregators – A vector of functions that each take a parsed Record object and aggregate it.
- skip_rows (int, optional) – Number of rows to skip in the file before starting to read it. Defaults to 1 to ignore a header line
-
-
class
pyqstrat.pyqstrat_cpp.
TextOpenInterestParser
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordFieldParser
Helper class that parses an open interest record from a list of fields (strings)
-
__call__
()¶ Parameters: fields (list of str) – A list of fields representing the record Returns: or None if this record is not an open interest record Return type: OpenInterestRecord
-
__init__
()¶ Parameters: - is_open_interest – A function that takes a list of strings as input and returns a bool if the fields represent an open interest record
- base_date (int) – If the timestamp in the files does not have a date component, pass in the date as number of millis or micros since the epoch
- timestamp_idx (int) – Index of the timestamp field within the record
- qty_idx (int) – Index of the quote size field
- id_field_indices (list of str) – Indices of the fields identifying an instrument. For example, for a future this could be symbol and expiry. These fields will be concatenated with a separator and placed in the id field in the record
- meta_field_indices (list of str) – Indices of additional fields you want to store. For example, the exchange.
- timestamp_parser – A function that takes a timestamp as a string and returns number of millis or micros since the epoch
- strip_id (bool, optional) – If we want to strip any whitespace from the id fields before concatenating them. Defaults to True
- strip_meta (bool, optional) – If we want to strip any whitespace from the meta fields before concatenating them. Defaults to True
-
-
class
pyqstrat.pyqstrat_cpp.
TextOtherParser
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordFieldParser
Helper class that parses a record that contains information other than a quote, trade or open interest record
-
__call__
()¶ Parameters: fields (list of str) – a list of fields representing the record Returns: Return type: OtherRecord
-
__init__
()¶ Parameters: - is_other – A function that takes a list of strings as input and returns a bool if we want to parse this record
- base_date (int) – If the timestamp in the files does not have a date component, pass in the date as number of millis or micros since the epoch
- timestamp_idx (int) – Index of the timestamp field within the record
- id_field_indices (list of str) – Indices of the fields identifying an instrument. For example, for a future this could be symbol and expiry. These fields will be concatenated with a separator and placed in the id field in the record
- meta_field_indices (list of str) – Indices of additional fields you want to store. For example, the exchange.
- timestamp_parser – A function that takes a timestamp as a string and returns number of millis or micros since the epoch
- strip_id (bool, optional) – If we want to strip any whitespace from the id fields before concatenating them. Defaults to True
- strip_meta (bool, optional) – If we want to strip any whitespace from the meta fields before concatenating them. Defaults to True
-
-
class
pyqstrat.pyqstrat_cpp.
TextQuotePairParser
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordFieldParser
Helper class that parses a quote containing bid / ask in the same record from a list of fields (strings)
-
__call__
()¶ Parameters: fields (list of str) – A list of fields representing the record Returns: Or None if this field is not a quote pair Return type: QuotePairRecord
-
__init__
()¶ Parameters: - is_quote_pair – a function that takes a list of strings as input and returns a bool if the fields represent a quote pair
- base_date (int) – if the timestamp in the files does not have a date component, pass in the date as number of millis or micros since the epoch
- timestamp_idx (int) – index of the timestamp field within the record
- bid_price_idx (int) – index of the field that contains the bid price
- bid_qty_idx (int) – index of the field that contains the bid quantity
- ask_price_idx (int) – index of the field that contains the ask price
- ask_qty_idx (int) – index of the field that contains the ask quantity
- id_field_indices (list of str) – indices of the fields identifying an instrument. For example, for a future this could be symbol and expiry. These fields will be concatenated with a separator and placed in the id field in the record.
- meta_field_indices (list of str) – indices of additional fields you want to store. For example, the exchange.
- timestamp_parser – a function that takes a timestamp as a string and returns number of millis or micros since the epoch
- price_multiplier (float, optional) – sometimes the price in a file could be in hundredths of cents, and we divide by this to get dollars. Defaults to 1.0
- strip_id (bool, optional) – if we want to strip any whitespace from the id fields before concatenating them. Defaults to True
- strip_meta (bool, optional) – if we want to strip any whitespace from the meta fields before concatenating them. Defaults to True
-
-
class
pyqstrat.pyqstrat_cpp.
TextQuoteParser
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordFieldParser
Helper class that parses a quote from a list of fields (strings)
-
__call__
()¶ Parameters: fields (list of str) – A list of fields representing the record Returns: Or None if this field is not a quote Return type: QuoteRecord
-
__init__
()¶ Parameters: - is_quote – a function that takes a list of strings as input and returns a bool if the fields represent a quote
- base_date (int) – if the timestamp in the files does not have a date component, pass in the date as number of millis or micros since the epoch
- timestamp_idx (int) – index of the timestamp field within the record
- bid_offer_idx (int) – index of the field that contains whether this is a bid or offer quote
- price_idx (int) – index of the price field
- qty_idx (int) – index of the quote size field
- id_field_indices (list of str) – indices of the fields identifying an instrument. For example, for a future this could be symbol and expiry. These fields will be concatenated with a separator and placed in the id field in the record
- meta_field_indices (list of str) – indices of additional fields you want to store. For example, the exchange.
- timestamp_parser – a function that takes a timestamp as a string and returns number of millis or micros since the epoch
- bid_str (str) – if the field indicated in bid_offer_idx matches this string, we consider this quote to be a bid
- offer_str (str) – if the field indicated in bid_offer_idx matches this string, we consider this quote to be an offer
- price_multiplier (float, optional) – sometimes the price in a file could be in hundredths of cents, and we divide by this to get dollars. Defaults to 1.0
- strip_id (bool, optional) – if we want to strip any whitespace from the id fields before concatenating them. Defaults to True
- strip_meta (bool, optional) – if we want to strip any whitespace from the meta fields before concatenating them. Defaults to True
-
-
class
pyqstrat.pyqstrat_cpp.
TextRecordParser
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordParser
A helper class that takes in a text line, separates it into a list of fields based on a delimiter, and then uses the parsers passed in to try to parse the line as a quote, trade, open interest record or any other type of record
-
__init__
()¶ Parameters: - parsers – A vector of functions that each take a list of strings as input and returns a subclass of
Record
or None - exclusive (bool, optional) – Set this when each line can contain only one type of record; once the first parser returns a non-None object, we will not call the other parsers. Default false
- separator (str, optional) – A single character string. This is the delimiter we use to separate fields from the text passed in. Default ,
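The interaction between parsers, exclusive and separator can be sketched in a few lines of Python. The function and the two toy parsers below are hypothetical; they only mirror the behaviour described in the parameter list and say nothing about how the C++ class stores or returns its records.

```python
def parse_line(line, parsers, exclusive=False, separator=","):
    """Split a line into fields and offer the fields to each parser in turn."""
    fields = line.split(separator)
    records = []
    for parser in parsers:
        record = parser(fields)
        if record is None:
            continue  # this parser did not recognise the line
        records.append(record)
        if exclusive:
            break  # one record type per line: stop at the first match
    return records


def parse_trade(fields):
    # Toy parser: treats lines whose first field is "T" as trades.
    return {"type": "trade", "price": float(fields[1])} if fields[0] == "T" else None


def parse_any(fields):
    # Toy parser: accepts every line.
    return {"type": "other", "raw": fields}


exclusive_result = parse_line("T,101.5", [parse_trade, parse_any], exclusive=True)
shared_result = parse_line("T,101.5", [parse_trade, parse_any])
piped_result = parse_line("Q|1", [parse_trade, parse_any], separator="|")
```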
-
-
class
pyqstrat.pyqstrat_cpp.
TextTradeParser
¶ Bases:
pyqstrat.pyqstrat_cpp.RecordFieldParser
Helper class that parses a trade from a list of fields (strings)
-
__call__
()¶ Parameters: fields (list of str) – A list of fields representing the record Returns: or None if this record is not a trade Return type: TradeRecord
-
__init__
()¶ Parameters: - is_trade – A function that takes a list of strings as input and returns a bool if the fields represent a trade
- base_date (int) – If the timestamp in the files does not have a date component, pass in the date as number of millis or micros since the epoch
- timestamp_idx (int) – Index of the timestamp field within the record
- price_idx (int) – Index of the price field
- qty_idx (int) – Index of the quote size field
- id_field_indices (list of str) – Indices of the fields identifying an instrument. For example, for a future this could be symbol and expiry. These fields will be concatenated with a separator and placed in the id field in the record
- meta_field_indices (list of str) – Indices of additional fields you want to store. For example, the exchange.
- timestamp_parser – A function that takes a timestamp as a string and returns number of millis or micros since the epoch
- price_multiplier (float, optional) – Sometimes the price in a file could be in hundredths of cents, and we divide by this to get dollars. Defaults to 1.0
- strip_id (bool, optional) – If we want to strip any whitespace from the id fields before concatenating them. Defaults to True
- strip_meta (bool, optional) – If we want to strip any whitespace from the meta fields before concatenating them. Defaults to True
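To make the index-based parameters concrete, here is a pure-Python sketch of how a trade could be assembled from a list of fields. Everything in it is illustrative: the Trade tuple, the function name and the "|" separator are assumptions (the documentation does not say which separator is used), and the sample line layout is invented.

```python
from collections import namedtuple

Trade = namedtuple("Trade", ["id", "timestamp", "qty", "price", "metadata"])


def parse_trade_fields(fields, is_trade, timestamp_idx, price_idx, qty_idx,
                       id_field_indices, meta_field_indices, timestamp_parser,
                       base_date=0, price_multiplier=1.0,
                       strip_id=True, strip_meta=True, separator="|"):
    """Build a Trade from fields, or return None if this is not a trade."""
    if not is_trade(fields):
        return None

    def join(indices, strip):
        parts = [fields[i] for i in indices]
        if strip:
            parts = [part.strip() for part in parts]
        return separator.join(parts)

    return Trade(
        id=join(id_field_indices, strip_id),
        timestamp=base_date + timestamp_parser(fields[timestamp_idx]),
        qty=float(fields[qty_idx]),
        # The file price is divided by the multiplier, as described above.
        price=float(fields[price_idx]) / price_multiplier,
        metadata=join(meta_field_indices, strip_meta),
    )


sample_fields = ["T", "ES ", "202403", "30922132", "45012500", "3", " CME"]
trade = parse_trade_fields(
    sample_fields, is_trade=lambda f: f[0] == "T",
    timestamp_idx=3, price_idx=4, qty_idx=5,
    id_field_indices=[1, 2], meta_field_indices=[6],
    timestamp_parser=int, price_multiplier=10000.0)
not_a_trade = parse_trade_fields(
    ["Q"] + sample_fields[1:], is_trade=lambda f: f[0] == "T",
    timestamp_idx=3, price_idx=4, qty_idx=5,
    id_field_indices=[1, 2], meta_field_indices=[6], timestamp_parser=int)
```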
-
-
class
pyqstrat.pyqstrat_cpp.
TimestampParser
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
class
pyqstrat.pyqstrat_cpp.
TradeBarAggregator
¶ Bases:
pyqstrat.pyqstrat_cpp.Aggregator
Aggregate trade records to create trade bars, given a frequency. Calculates open, high, low, close, volume, vwap as well as last_update_time, which is the timestamp of the last trade that we processed before the bar ended.
-
__call__
()¶ Add a trade record to be written to disk at some point
Parameters: - trade (
TradeRecord
) – - line_number (int) – The line number of the source file that this trade came from. Used for debugging
-
__init__
()¶ Parameters: - writer_creator – A function that takes an output_file_prefix, schema, create_batch_id and max_batch_size and returns an object
implementing the
Writer
interface - output_file_prefix (str) – Path of the output file to create. The writer and aggregator may add suffixes to this to indicate the kind of data and format the file creates. E.g. “/tmp/output_file_1”
- frequency (str, optional) – A string like “5m” indicating the bar size is 5 mins. Units can be s,m,h or d for second, minute, hour or day. Defaults to “5m”
- batch_by_id (bool, optional) – If set, we will create one batch for each id. This will allow us to retrieve all records for a single instrument by reading a single batch. Defaults to True.
- batch_size (int, optional) – If set, we will write a batch to disk every time we have this many records queued up. Defaults to 2.1 billion
- timestamp_unit (Schema.Type, optional) – Whether timestamps are measured as milliseconds or microseconds since the unix epoch. Defaults to Schema.TIMESTAMP_MILLI
-
close
()¶ Flush all unwritten records to the Writer, which writes them to disk when its close function is called
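The bar calculation can be sketched in pure Python. This is an illustration of the aggregation described for this class, not its implementation: the function names and tuple layout are hypothetical, timestamps are assumed to be in milliseconds, and keying each bar by its start time is an assumption (the library may label bars differently).

```python
def frequency_to_millis(frequency):
    """Convert a string such as "5m" into a bar length in milliseconds."""
    units = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
    return int(frequency[:-1]) * units[frequency[-1]]


def aggregate_trade_bars(trades, frequency="5m"):
    """trades: iterable of (id, timestamp_millis, qty, price), in time order."""
    bar_millis = frequency_to_millis(frequency)
    bars = {}
    for trade_id, timestamp, qty, price in trades:
        key = (trade_id, timestamp - timestamp % bar_millis)
        bar = bars.get(key)
        if bar is None:
            bars[key] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": qty,
                         "notional": price * qty,
                         "last_update_time": timestamp}
            continue
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += qty
        bar["notional"] += price * qty
        bar["last_update_time"] = timestamp
    for bar in bars.values():
        notional = bar.pop("notional")
        # vwap is the volume-weighted average price; undefined for zero volume.
        bar["vwap"] = notional / bar["volume"] if bar["volume"] else float("nan")
    return bars


bars = aggregate_trade_bars([
    ("ES", 0, 1.0, 100.0),
    ("ES", 60_000, 3.0, 102.0),
    ("ES", 300_000, 2.0, 101.0),
])
```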
-
class
pyqstrat.pyqstrat_cpp.
TradeRecord
¶ Bases:
pyqstrat.pyqstrat_cpp.Record
A parsed trade record that we can save to disk
-
id
¶ A unique string representing a symbol or instrument id
Type: str
-
timestamp
¶ Trade time, in milliseconds or microseconds since 1/1/1970
Type: int
-
qty
¶ Trade quantity
Type: float
-
price
¶ Trade price
Type: float
-
metadata
¶ a string representing any extra information you want to save, such as exchange, or special trade conditions
Type: str
-
__init__
()¶
-
id
-
metadata
-
price
-
qty
-
timestamp
-
-
class
pyqstrat.pyqstrat_cpp.
Writer
¶ Bases:
pybind11_builtins.pybind11_object
An abstract class that you subclass to provide an object that can write to disk
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
close
()¶ Close the writer and flush any remaining data to disk
Parameters: success (bool, optional) – If set to False, we had some kind of exception and are cleaning up. Tells the function to not indicate the file was written successfully, for example by renaming a temp file to the actual filename. Defaults to True
-
write_batch
()¶ Write a batch of records to disk. The batch can have an optional string id so we can later retrieve just this batch of records without reading the whole file
Parameters: batch_id (str, optional) – An identifier which can later be used to retrieve this batch from disk. Defaults to “”
-
-
class
pyqstrat.pyqstrat_cpp.
WriterCreator
¶ Bases:
pybind11_builtins.pybind11_object
-
__call__
()¶
-
__init__
()¶
-
-
pyqstrat.pyqstrat_cpp.
black_scholes_price
(call: numpy.ndarray[bool], S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ Compute European option price Args: call (bool): True for a call option, False for a put S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Option price
-
pyqstrat.pyqstrat_cpp.
cdf
(x: numpy.ndarray[float64]) → object¶ Cumulative distribution function of the standard normal distribution Args: x (float): random variable Returns: float: cdf of the random variable
-
pyqstrat.pyqstrat_cpp.
d1
(S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ d1 from Black Scholes Args: S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float:
-
pyqstrat.pyqstrat_cpp.
d2
(S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ d2 from Black Scholes Args: S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float:
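For reference, the textbook Black-Scholes-Merton formulas with a continuous dividend yield q are sketched below in scalar Python. This assumes the library follows the standard definitions of d1, d2 and the price; the function names are hypothetical, and unlike the documented functions this sketch does not accept numpy arrays.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_d1(S, K, t, r, sigma, q):
    return (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))


def bs_d2(S, K, t, r, sigma, q):
    return bs_d1(S, K, t, r, sigma, q) - sigma * math.sqrt(t)


def bs_price(call, S, K, t, r, sigma, q):
    """European option price under Black-Scholes-Merton."""
    d1 = bs_d1(S, K, t, r, sigma, q)
    d2 = bs_d2(S, K, t, r, sigma, q)
    discounted_spot = S * math.exp(-q * t)
    discounted_strike = K * math.exp(-r * t)
    if call:
        return discounted_spot * normal_cdf(d1) - discounted_strike * normal_cdf(d2)
    return discounted_strike * normal_cdf(-d2) - discounted_spot * normal_cdf(-d1)


call_price = bs_price(True, 100.0, 100.0, 1.0, 0.05, 0.2, 0.0)
put_price = bs_price(False, 100.0, 100.0, 1.0, 0.05, 0.2, 0.0)
```

A quick sanity check on any implementation is put-call parity: the call price minus the put price should equal S·exp(-qt) - K·exp(-rt).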
-
pyqstrat.pyqstrat_cpp.
delta
(call: numpy.ndarray[bool], S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ Compute European option delta Args: call (bool): True for a call option, False for a put S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Option delta
-
pyqstrat.pyqstrat_cpp.
gamma
(S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ Compute European option gamma. Args: S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Option gamma
-
pyqstrat.pyqstrat_cpp.
implied_vol
(call: numpy.ndarray[bool], price: numpy.ndarray[float64], S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ Compute implied volatility for a European option. Args: call (bool): True for a call option, False for a put price (float): The option premium S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Implied volatility. For 1% we return 0.01
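The documentation does not say which root-finding method implied_vol uses. As an illustration of the idea only, the sketch below inverts the standard Black-Scholes-Merton price by bisection, which works because the price increases monotonically with volatility. All names and the search bounds are assumptions.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(call, S, K, t, r, sigma, q):
    """Textbook Black-Scholes-Merton price (scalar inputs)."""
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    if call:
        return S * math.exp(-q * t) * normal_cdf(d1) - K * math.exp(-r * t) * normal_cdf(d2)
    return K * math.exp(-r * t) * normal_cdf(-d2) - S * math.exp(-q * t) * normal_cdf(-d1)


def implied_vol_bisect(call, price, S, K, t, r, q,
                       low=1e-6, high=5.0, tol=1e-10, max_iter=200):
    """Find sigma such that bs_price(...) matches price, by bisection."""
    for _ in range(max_iter):
        mid = 0.5 * (low + high)
        if bs_price(call, S, K, t, r, mid, q) > price:
            high = mid  # model price too high: volatility must be lower
        else:
            low = mid
        if high - low < tol:
            break
    return 0.5 * (low + high)


target_price = bs_price(True, 100.0, 100.0, 1.0, 0.05, 0.2, 0.0)
recovered_vol = implied_vol_bisect(True, target_price, 100.0, 100.0, 1.0, 0.05, 0.0)
```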
-
class
pyqstrat.pyqstrat_cpp.
ostream_redirect
¶ Bases:
pybind11_builtins.pybind11_object
-
__init__
(self: pyqstrat.pyqstrat_cpp.ostream_redirect, stdout: bool=True, stderr: bool=True) → None¶
-
-
pyqstrat.pyqstrat_cpp.
rho
(call: numpy.ndarray[bool], S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ Compute European option rho. This is Black Scholes formula rho divided by 100 so we get rho per 1% change in interest rate Args: call (bool): True for a European call option, False for a put S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Option rho
-
pyqstrat.pyqstrat_cpp.
theta
(call: numpy.ndarray[bool], S: numpy.ndarray[float64], K: numpy.ndarray[float64], t: numpy.ndarray[float64], r: numpy.ndarray[float64], sigma: numpy.ndarray[float64], q: numpy.ndarray[float64]) → object¶ Compute European option theta per day. This is Black Scholes formula theta divided by 365 to give us the customary theta per day Args: call (bool): True for a call option, False for a put S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Option theta
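The per-day scaling can be sketched in pure Python using the textbook Black-Scholes theta with a continuous dividend yield. The function name `theta_per_day` is hypothetical and the code works on scalars; it is a sketch of the formula described above, not the compiled implementation.

```python
import math

def _norm_cdf(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _norm_pdf(x):
    # Standard normal probability density function
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def theta_per_day(call, S, K, t, r, sigma, q):
    sqrt_t = math.sqrt(t)
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    # Time decay term shared by calls and puts
    decay = -S * math.exp(-q * t) * _norm_pdf(d1) * sigma / (2.0 * sqrt_t)
    if call:
        annual = (decay
                  - r * K * math.exp(-r * t) * _norm_cdf(d2)
                  + q * S * math.exp(-q * t) * _norm_cdf(d1))
    else:
        annual = (decay
                  + r * K * math.exp(-r * t) * _norm_cdf(-d2)
                  - q * S * math.exp(-q * t) * _norm_cdf(-d1))
    # Annual theta divided by 365 gives theta per calendar day
    return annual / 365.0

call_theta = theta_per_day(True, 100.0, 100.0, 1.0, 0.01, 0.2, 0.0)
```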
-
pyqstrat.pyqstrat_cpp.
vega
(S: float, K: float, t: float, r: float, sigma: float, q: float) → float¶ Compute European option vega. This is Black Scholes formula vega divided by 100 so we get vega per 1% change in volatility Args: S (float): Spot price. For a future discount the future price using exp(-rt) K (float): Strike t (float): Time to maturity in years r (float): Continuously compounded interest rate. Use 0.01 for 1% sigma (float): Annualized volatility. Use 0.01 for 1% q (float): Annualized dividend yield. Use 0.01 for 1% Returns: float: Option vega
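For reference, the textbook Black-Scholes vega scaled to a 1% change in volatility can be sketched as follows. The name `vega_per_pct` is hypothetical, and the compiled function may differ in numerical details.

```python
import math

def vega_per_pct(S, K, t, r, sigma, q):
    sqrt_t = math.sqrt(t)
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * sqrt_t)
    # Standard normal density evaluated at d1
    pdf_d1 = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    # Raw vega is the price change per unit (100%) change in volatility;
    # dividing by 100 gives the change per 1% move in volatility
    return S * math.exp(-q * t) * pdf_d1 * sqrt_t / 100.0

sample_vega = vega_per_pct(100.0, 100.0, 1.0, 0.01, 0.2, 0.0)
```

Vega is the same for calls and puts, which is why this function takes no `call` argument.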
pyqstrat.marketdata_processor module¶
-
class
pyqstrat.marketdata_processor.
PathFileNameProvider
(path, include_pattern=None, exclude_pattern=None)[source]¶ Bases:
object
A helper class that, given a pattern such as “/tmp/abc*.gz” and optional include and exclude patterns, returns the names of all files that match
-
__init__
(path, include_pattern=None, exclude_pattern=None)[source]¶ Parameters: - path (str) – A pattern such as “/tmp/abc*.gz”
- include_pattern (str) – Given a pattern such as “xyz”, will return only filenames that contain xyz
- exclude_pattern (str) – Given a pattern such as “_tmp”, will exclude all filenames containing _tmp
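The filtering described by these parameters can be sketched with the standard library. The function `matching_files` is hypothetical, and it assumes that the include and exclude patterns are plain substrings matched against the file name; the class itself may match differently.

```python
import glob
import os

def matching_files(path, include_pattern=None, exclude_pattern=None):
    # Expand the glob pattern, e.g. "/tmp/abc*.gz"
    names = sorted(glob.glob(path))
    if include_pattern is not None:
        # Keep only files whose name contains the include pattern
        names = [n for n in names if include_pattern in os.path.basename(n)]
    if exclude_pattern is not None:
        # Drop files whose name contains the exclude pattern
        names = [n for n in names if exclude_pattern not in os.path.basename(n)]
    return names
```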
-
-
class
pyqstrat.marketdata_processor.
SingleDirectoryFileNameMapper
(output_dir)[source]¶ Bases:
object
A helper class that provides a mapping from input filenames to their corresponding output filenames in an output directory.
-
__call__
(input_filepath)[source]¶ Parameters: input_filepath (str) – The input file that we are creating an output file for, e.g. “/home/xzy.gz” Returns: Output file path for that input. We take the filename from the input filepath, strip out any extension, and prepend the output directory name
Return type: str
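A minimal sketch of this mapping, assuming that "any extension" means everything after the first dot in the file name (the class may treat multi-part extensions such as ".csv.gz" differently). The function name `map_to_output_dir` is hypothetical.

```python
import os

def map_to_output_dir(input_filepath, output_dir):
    # Take the file name from the input path
    base = os.path.basename(input_filepath)
    # Strip the extension (assumption: everything after the first dot)
    stem = base.split(".", 1)[0]
    # Prepend the output directory
    return os.path.join(output_dir, stem)

mapped = map_to_output_dir("/home/xzy.gz", "/tmp/out")
```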
-
-
class
pyqstrat.marketdata_processor.
TextHeaderParser
(record_generator, skip_rows=0, separator=', ', make_lowercase=True)[source]¶ Bases:
object
Parses column headers from a text file containing market data
-
__call__
(input_filename, compression)[source]¶ Parameters: - input_filename (str) – The file to read
- compression (str) – Compression type, e.g. “gzip”, or None if the file is not compressed
Returns: column headers Return type: list of str
-
__init__
(record_generator, skip_rows=0, separator=', ', make_lowercase=True)[source]¶ Parameters: - record_generator – A function that takes a filename and its compression type and returns an object that we can use to iterate through lines in that file
- skip_rows (int, optional) – Number of rows to skip before starting to read the file. Default is 0
- separator (str, optional) – Separator for headers. Defaults to ,
- make_lowercase (bool, optional) – Whether to convert headers to lowercase before returning them
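How these options interact can be sketched as below. The function `parse_headers` is hypothetical and takes an iterable of lines directly, standing in for whatever the record generator returns; trimming whitespace around each header is also an assumption.

```python
from itertools import islice

def parse_headers(lines, skip_rows=0, separator=",", make_lowercase=True):
    # Skip the requested number of rows, then take the next line as the header
    header_line = next(islice(iter(lines), skip_rows, None))
    # Split on the separator and trim surrounding whitespace and newlines
    headers = [h.strip() for h in header_line.rstrip("\r\n").split(separator)]
    if make_lowercase:
        headers = [h.lower() for h in headers]
    return headers

example_headers = parse_headers(["# vendor comment\n", "Symbol,Price,Qty\n"], skip_rows=1)
```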
-
-
pyqstrat.marketdata_processor.
base_date_filename_mapper
(input_file_path)[source]¶ A helper function that parses out the date from a filename. For example, given a file such as “/tmp/spx_2018-08-09”, this parses out the date part of the filename and returns milliseconds (no fractions) since the epoch to that date.
Parameters: input_file_path (str) – Full path to the input file Returns: Milliseconds since unix epoch to the date implied by that file Return type: int
>>> base_date_filename_mapper("/tmp/spy_1970-1-2_quotes.gz")
86400000
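A custom base date mapper with the same contract could be sketched as follows. The name `date_millis_from_filename`, the regular expression, and the choice of UTC midnight are assumptions for illustration, not the library's implementation.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def date_millis_from_filename(input_file_path):
    # Look for a year-month-day date, allowing one- or two-digit month and day
    match = re.search(r"(\d{4})-(\d{1,2})-(\d{1,2})", input_file_path)
    if match is None:
        raise ValueError(f"no date found in {input_file_path}")
    year, month, day = (int(part) for part in match.groups())
    date = datetime(year, month, day, tzinfo=timezone.utc)
    # Whole milliseconds between the epoch and midnight UTC on that date
    return (date - EPOCH) // timedelta(milliseconds=1)

millis = date_millis_from_filename("/tmp/spy_1970-1-2_quotes.gz")
```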
-
pyqstrat.marketdata_processor.
create_text_file_processor
(record_generator, line_filter, record_parser, bad_line_handler, record_filter, missing_data_handler, aggregators, skip_rows=1)[source]¶
-
pyqstrat.marketdata_processor.
get_field_indices
(field_names, headers)[source]¶ Helper function to get indices of field names in a list of headers
Parameters: - field_names (list of str) – The fields we want indices of
- headers (list of str) – All headers
Returns: indices of each field name in the headers list
Return type: list of int
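The lookup amounts to the following sketch. The name `field_indices` is hypothetical, and raising `ValueError` for a field that is not in the headers is this sketch's behavior, not necessarily the library's.

```python
def field_indices(field_names, headers):
    # Position of each requested field within the header list;
    # list.index raises ValueError if a field is missing
    return [headers.index(name) for name in field_names]

indices = field_indices(["price", "symbol"], ["symbol", "timestamp", "price"])
```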
-
pyqstrat.marketdata_processor.
process_marketdata
(input_filename_provider, file_processor, num_processes=None, raise_on_error=True)[source]¶ Top level function to process a set of market data files
Parameters: - input_filename_provider – A function that returns a list of filenames (incl path) we need to process.
- file_processor – A function that takes an input filename and processes it, returning number of lines processed.
- num_processes (int, optional) – The number of processes to run to parse these files. If set to None, we use the number of cores present on your machine. Defaults to None
- raise_on_error (bool, optional) – If set, we raise an exception when there is a problem with parsing a file, so we can see a stack trace and diagnose the problem. If not set, we print the error and continue. Defaults to True
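The control flow these parameters describe can be sketched with the standard library. The function `process_all`, its return value (total lines processed), and the in-process path for `num_processes=1` are assumptions for illustration; the library may dispatch work differently.

```python
from concurrent.futures import ProcessPoolExecutor

def process_all(input_filename_provider, file_processor,
                num_processes=None, raise_on_error=True):
    filenames = list(input_filename_provider())
    if num_processes == 1:
        # Run in the current process, which is convenient for debugging
        pending = [(name, lambda n=name: file_processor(n)) for name in filenames]
        return _collect(pending, raise_on_error)
    # max_workers=None lets the pool use the number of cores on the machine
    with ProcessPoolExecutor(max_workers=num_processes) as pool:
        pending = [(name, pool.submit(file_processor, name).result)
                   for name in filenames]
        return _collect(pending, raise_on_error)

def _collect(pending, raise_on_error):
    total_lines = 0
    for name, get_result in pending:
        try:
            total_lines += get_result()
        except Exception as exc:
            if raise_on_error:
                # Surface the stack trace so the problem can be diagnosed
                raise
            print(f"error processing {name}: {exc}")
    return total_lines
```

When using the process pool, `file_processor` must be picklable, so it should be a module-level function or a picklable object rather than a lambda.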
-
pyqstrat.marketdata_processor.
process_marketdata_file
(input_filename, output_file_prefix_mapper, record_parser_creator, aggregator_creator, line_filter=None, compression=None, base_date_mapper=<function base_date_filename_mapper>, file_processor_creator=<function create_text_file_processor>, header_parser_creator=<function <lambda>>, header_record_generator=<function text_file_record_generator>, record_generator=<pyqstrat.pyqstrat_cpp.TextFileDecompressor object>, bad_line_handler=<pyqstrat.pyqstrat_cpp.PrintBadLineHandler object>, record_filter=None, missing_data_handler=<pyqstrat.pyqstrat_cpp.PriceQtyMissingDataHandler object>, writer_creator=<pyqstrat.pyqstrat_cpp.ArrowWriterCreator object>)[source]¶ Processes a single market data file
Parameters: - input_filename (str) –
- output_file_prefix_mapper – A function that takes an input filename and returns the corresponding output filename we want
- record_parser_creator – A function that takes a date and a list of column names and returns a function that can take a list of fields and return a subclass of Record
- line_filter (optional) – A function that takes a line and decides whether we want to keep it or discard it. Defaults to None
- compression (str, optional) – Compression type for the input file. Defaults to None
- base_date_mapper (optional) – A function that takes an input filename and returns the date implied by the filename,
represented as millis since epoch. Defaults to helper
function base_date_filename_mapper
- file_processor_creator (optional) – A function that returns an object that we can use to iterate through lines in a file. Defaults to
helper function
create_text_file_processor
- bad_line_handler (optional) – A function that takes a line that we could not parse, and either parses it or does something else
like recording debugging info, or stopping the processing by raising an exception. Defaults to helper function
PrintBadLineHandler
- record_filter (optional) – A function that takes a parsed TradeRecord, QuoteRecord, OpenInterestRecord or OtherRecord and decides whether we want to keep it or discard it. Defaults to None
- missing_data_handler (optional) – A function that takes a parsed TradeRecord, QuoteRecord, OpenInterestRecord or OtherRecord and
deals with any data that is missing in those records. For example, 0 for bid could be replaced by NAN. Defaults to helper function:
price_qty_missing_data_handler
- writer_creator (optional) – A function that takes an output_file_prefix, schema, whether to create a batch id file, and batch_size
and returns a subclass of
Writer
. Defaults to helper function: arrow_writer_creator
-
pyqstrat.marketdata_processor.
text_file_record_generator
(filename, compression)[source]¶ A helper function that returns an object that we can use to iterate through lines in the input file
Parameters: - filename (str) – The input filename
- compression (str) – The compression type of the input file, or None if it's not compressed
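A record generator with this signature could be sketched as follows. The function `open_text_lines` is hypothetical and handles only uncompressed and “gzip” input; which compression types the library's own helper accepts is not stated here.

```python
import gzip

def open_text_lines(filename, compression=None):
    # Return a file object in text mode; iterating over it yields one line at a time
    if compression is None:
        return open(filename, "rt")
    if compression == "gzip":
        return gzip.open(filename, "rt")
    raise ValueError(f"unsupported compression: {compression}")
```

The returned object can be used in a `with` block so the file is closed once all lines have been read.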