Hi all,
I'm attempting to run a very simple algorithm on a set of custom datapoints I imported through the data parameter for run_algorithm().
The data is of 1-minute resolution. The backtest runs over about a month's worth of data (~40k bars) and takes between one and two minutes.
I have a feeling this is slower than it should be. I'm attaching the pstats profile of the run; hopefully someone can take a quick look and spot the major bottleneck.
I can see a lot of pandas functions taking a significant share of the time, but I'm not sure why, or how I can optimise that.
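In case it helps anyone opening the attachment, this is roughly how I'm generating the summary I'm looking at (plain stdlib, nothing zipline-specific; profile_top is just a helper I wrote for this):

```python
import cProfile
import io
import pstats

def profile_top(func, n=15):
    """Run func under cProfile and return the n biggest cumulative-time rows."""
    pr = cProfile.Profile()
    pr.enable()
    func()
    pr.disable()
    out = io.StringIO()
    pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats(n)
    return out.getvalue()

# example: profile a cheap stand-in rather than the whole backtest
report = profile_top(lambda: sorted(range(100_000), key=str))
```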
My code for the data import is below:
def csv_mt4_ohlcv_to_panel(filename: str, symbol: str, debug: bool) -> pd.Panel:
    """
    Imports data from a trimmed CSV file into a pandas Panel.
    :param filename: Path to the CSV file
    :param symbol: Symbol to name the panel item by
    :param debug: Whether to print the head of the frame
    :return: The panel in question
    """
    df = pd.read_csv(filename,
                     parse_dates=['date_time'],
                     compression='infer')
    df.set_index('date_time', inplace=True)
    # tz_localize returns a new object; the result has to be assigned back
    df = df.tz_localize(pytz.timezone('EET'))
    if debug:
        print(df.head())
    od = OrderedDict()
    od[symbol] = df
    panel = pd.Panel(od)
    panel.minor_axis = ['open', 'high', 'low', 'close', 'volume']
    if debug:
        print(panel)
    return panel
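For completeness, a minimal standalone check of the read/localise step (synthetic two-row CSV, no zipline involved; note that tz_localize returns a new frame rather than modifying in place, which tripped me up at first):

```python
import io
import pandas as pd

# tiny stand-in for the trimmed MT4 export
csv_text = (
    "date_time,open,high,low,close,volume\n"
    "2016-02-01 00:00:00,1.0833,1.0835,1.0832,1.0834,120\n"
    "2016-02-01 00:01:00,1.0834,1.0836,1.0833,1.0835,98\n"
)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date_time'])
df.set_index('date_time', inplace=True)
df = df.tz_localize('EET')  # returns a new object; must be assigned back
```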
And my code for the actual strategy is below:
def initialize(context):
    context.i = 0
    context.security = symbol('EURUSD')

def handle_data(context, data):
    context.i += 1
    # wait until enough bars exist for the long lookback window
    if context.i < 250:
        return
    ma_long = history(250, '1m', 'price').mean()
    ma_short = history(50, '1m', 'price').mean()
    if ma_short[0] > ma_long[0]:
        order_target(context.security, 1)
    elif ma_short[0] < ma_long[0]:
        order_target(context.security, -1)
    record(eurusd=data[context.security].price,
           short_mavg=ma_short[0],
           long_mavg=ma_long[0])
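One thing I suspect: calling history() twice on every bar recomputes the lookback windows ~40k times. For comparison, computing the same two moving averages in one vectorised pass over the whole series is near-instant (pure pandas/numpy sketch, no zipline; the price series here is synthetic):

```python
import numpy as np
import pandas as pd

# synthetic 1-minute price series standing in for the real EURUSD data
idx = pd.date_range('2016-02-01', periods=40_000, freq='min', tz='EET')
rng = np.random.default_rng(0)
price = pd.Series(1.10 + rng.standard_normal(40_000).cumsum() * 1e-4, index=idx)

# one vectorised pass over the series instead of two history() calls per bar
ma_short = price.rolling(50).mean()
ma_long = price.rolling(250).mean()

# desired position on each bar: +1 long, -1 short, NaN during warm-up
signal = np.sign(ma_short - ma_long)
```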
algo = run_algorithm(
    # pd.Timestamp localises correctly; passing a pytz timezone straight into
    # a datetime constructor can silently pick the wrong (LMT) offset
    start=pd.Timestamp('2016-02-01', tz='EET'),
    end=pd.Timestamp('2016-03-31', tz='EET'),
    initialize=initialize,
    handle_data=handle_data,
    data=dataset,
    data_frequency='minute',
    capital_base=1e6)
Looking forward to any insight! Thank you in advance.
I'm considering writing an ingest function and building my own data bundle, but I'm still getting to grips with that API. Would it significantly improve performance?