tensortrade.environments.trading_environment module

class tensortrade.environments.trading_environment.TradingEnvironment(exchange, action_strategy, reward_strategy, feature_pipeline=None, **kwargs)[source]

Bases: gym.core.Env

A trading environment made for use with Gym-compatible reinforcement learning algorithms.

__init__(exchange, action_strategy, reward_strategy, feature_pipeline=None, **kwargs)[source]
Parameters
  • exchange (Union[InstrumentExchange, str]) – The InstrumentExchange that supplies observation data and executes the environment's trades.

  • action_strategy (Union[ActionStrategy, str]) – The strategy for transforming an action into a Trade at each timestep.

  • reward_strategy (Union[RewardStrategy, str]) – The strategy for determining the reward at each timestep.

  • feature_pipeline (optional) – The pipeline of features to pass the observations through.

  • kwargs (optional) – Additional arguments for tuning the environment, logging, etc.
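Since exchange, action_strategy, and reward_strategy each accept either an instance or a string identifier, a string presumably resolves to a registered implementation. Below is a minimal sketch of that resolution pattern; the registry, the 'discrete' identifier, and DiscreteActionStrategy are hypothetical stand-ins, not TensorTrade's actual internals:

```python
# Hypothetical sketch of how a Union[ActionStrategy, str] parameter
# might be resolved: a string identifier looks up a registered class,
# while an instance is passed through unchanged.
class ActionStrategy:
    pass

class DiscreteActionStrategy(ActionStrategy):
    pass

# Illustrative registry mapping string identifiers to strategy classes.
_ACTION_STRATEGIES = {"discrete": DiscreteActionStrategy}

def resolve_action_strategy(strategy):
    if isinstance(strategy, str):
        return _ACTION_STRATEGIES[strategy]()  # instantiate by identifier
    return strategy  # already an ActionStrategy instance

assert isinstance(resolve_action_strategy("discrete"), DiscreteActionStrategy)
```

The same pattern would apply to the exchange and reward_strategy parameters.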

property action_strategy

The strategy for transforming an action into a Trade at each timestep.

Return type

ActionStrategy

property exchange

The InstrumentExchange that supplies observation data and executes the environment's trades.

Return type

InstrumentExchange

property feature_pipeline

The feature pipeline to pass the observations through.

Return type

FeaturePipeline

render(mode='none')[source]

Renders the environment.

reset()[source]

Resets the state of the environment and returns an initial observation.

Return type

DataFrame

Returns

observation (pandas.DataFrame) – The initial observation of the environment.

property reward_strategy

The strategy for determining the reward at each timestep.

Return type

RewardStrategy

step(action)[source]

Run one timestep within the environment based on the specified action.

Parameters

action – The trade action provided by the agent for this timestep.

Return type

Tuple[DataFrame, float, bool, dict]

Returns

  • observation (pandas.DataFrame) – Provided by the environment’s exchange, often OHLCV or tick trade history data points.

  • reward (float) – An amount corresponding to the benefit earned by the action taken this timestep.

  • done (bool) – If True, the environment is complete and should be restarted.

  • info (dict) – Any auxiliary, diagnostic, or debugging information to output.
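Taken together, reset() and step() follow the standard Gym interaction loop. The sketch below uses a minimal, dependency-free stand-in for the documented interface; the stub class, its constant reward, and the dict observations are illustrative assumptions, whereas the real environment returns pandas.DataFrame observations and computes rewards via its RewardStrategy:

```python
# Illustrative stand-in following the documented TradingEnvironment
# contract: reset() returns an initial observation, and step(action)
# returns (observation, reward, done, info). A dict stands in for the
# pandas.DataFrame the real environment would return.
class StubTradingEnvironment:
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._current_step = 0

    def reset(self):
        self._current_step = 0
        return {"close": 100.0}  # initial observation

    def step(self, action):
        self._current_step += 1
        observation = {"close": 100.0 + self._current_step}
        reward = 1.0  # placeholder; the real reward comes from the RewardStrategy
        done = self._current_step >= self.max_steps
        info = {"step": self._current_step}
        return observation, reward, done, info

# The standard Gym interaction loop against this interface:
env = StubTradingEnvironment()
observation = env.reset()
total_reward, done = 0.0, False
while not done:
    observation, reward, done, info = env.step(action=0)
    total_reward += reward
print(total_reward)  # 5.0
```

An agent would replace the fixed action with one chosen from the observation, and the ActionStrategy translates that action into a Trade on the exchange.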