Hi. I have created my own gym-based env for intraday FX trading.
I have tried stable-baselines' PPO2 and DQN so far, with MLP and LSTM policies. The best results on the training dataset so far are with PPO2 and LSTM. However, it seems to overfit, because results on the test set are bad (almost random returns). I will test your library as well. My actions are buy and hold, so long only. I tried training the same agent on various currencies, but nothing has worked yet. I haven't done any serious hyperparameter tuning, but results on the training set are amazing (hence overfitting).
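To make the train/test comparison concrete, here is a minimal sketch of a chronological split and a long-only return calculation. It is not my actual environment code: the helper names `chronological_split` and `long_only_return`, the 0.8 split ratio, and the action encoding (1 = be long over the next step, 0 = stay flat) are all assumptions for illustration.

```python
def chronological_split(prices, train_frac=0.8):
    """Split a price series into train and test parts without shuffling.

    Hypothetical helper: the test part always comes strictly after the
    train part in time, so no future prices leak into training.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    cut = int(len(prices) * train_frac)
    return prices[:cut], prices[cut:]


def long_only_return(prices, actions):
    """Cumulative return of a long-only agent over a price series.

    Assumed encoding: actions[t] == 1 means the agent is long from step t
    to step t + 1, actions[t] == 0 means it is flat. Transaction costs and
    spread are ignored in this sketch.
    """
    if len(actions) != len(prices) - 1:
        raise ValueError("need exactly one action per price step")
    equity = 1.0
    for t, action in enumerate(actions):
        if action == 1:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0
```

Running the trained policy over both parts and comparing the two `long_only_return` values gives the train/test gap I am describing: very high on the first part, close to random on the second.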
Any ideas?