Hi Eric,
I don't see any problem with using additional data for training. In fact, some of the most successful author identification approaches, such as those based on the impostor method, do so on a regular basis. Everything is allowed, as long as it is not cheating, unethical, or any other form of scientific misconduct. One additional limitation, however, is that we cannot grant access to any web service while your software is executed, lest the test data leak.
So, if you wish to use external data, you can do so only by downloading it up front and using it offline. The important thing is that you explain in your notebook paper what you did, how you did it, and why you believe it works.
Best,
Martin