FTP alternative

Louise Elmose Hedegaard

Feb 8, 2017, 6:02:22 AM
to Google App Engine
Hi,

I need to connect to an FTP server to fetch files for my app.
There is no alternative, as the files come from another vendor, whose only option is to place the files on an FTP server.
I tried using the Java JSch library for an SFTP connection, but the library creates a new thread when connecting the session, so this is not an option, as GAE does not support starting new threads either.

Does anybody have suggestions for how I can work around the fact that GAE does not support FTP connections?

Thanks,
-Louise

George (Cloud Platform Support)

Feb 8, 2017, 10:49:08 AM
to google-a...@googlegroups.com
The Google Cloud Platform, considered in its entirety, does in fact support the use of the FTP protocol. You are right in saying that App Engine in particular does not support FTP. This is to be expected, given that it is meant to function as a scalable HTTP server. The other main component of the Cloud Platform, namely Compute Engine, does support FTP. You are encouraged to take advantage of this feature.

You may consider creating a dedicated Compute Engine instance and using it to regularly fetch new files from the FTP server of your choice, then upload them to one of the modern file storage services, such as Cloud Storage. In this way, the files become readily available to your app running in App Engine. It is in fact possible to commit entities to Cloud Datastore from any device capable of handling REST APIs. You can import the data you transferred to Cloud Storage (or similar) directly into Cloud Datastore or Cloud SQL.
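
As a rough sketch of the fetch-then-upload step described above (not an official recipe), the job on the Compute Engine instance could be structured like this. All names here (`FtpMirror`, `FileSource`, `FileSink` and the in-memory stand-ins) are hypothetical. In a real deployment `FileSource` would presumably wrap an FTP client library and `FileSink` a Cloud Storage client; neither is shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical job: copy files the destination does not hold yet. */
final class FtpMirror {
    private FtpMirror() {}

    /** Copies every file missing from the sink and returns the names copied. */
    static List<String> syncNewFiles(FileSource source, FileSink sink) {
        List<String> copied = new ArrayList<>();
        for (String name : source.listNames()) {
            if (sink.exists(name)) {
                continue; // already transferred on an earlier run
            }
            try (InputStream in = source.open(name)) {
                sink.write(name, in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            copied.add(name);
        }
        return copied;
    }

    public static void main(String[] args) {
        FileSource vendor = new InMemorySource(
                Map.of("orders.csv", "id,qty\n".getBytes(StandardCharsets.UTF_8)));
        InMemorySink bucket = new InMemorySink();
        System.out.println(syncNewFiles(vendor, bucket)); // prints [orders.csv]
        System.out.println(syncNewFiles(vendor, bucket)); // prints []
    }
}

/** Assumed abstraction over the vendor's FTP server. */
interface FileSource {
    List<String> listNames();
    InputStream open(String name);
}

/** Assumed abstraction over the destination, e.g. a storage bucket. */
interface FileSink {
    boolean exists(String name);
    void write(String name, InputStream data);
}

/** In-memory stand-in so the sketch runs without any server. */
final class InMemorySource implements FileSource {
    private final Map<String, byte[]> files;

    InMemorySource(Map<String, byte[]> files) {
        this.files = new TreeMap<>(files); // sorted for a stable listing order
    }

    public List<String> listNames() {
        return new ArrayList<>(files.keySet());
    }

    public InputStream open(String name) {
        return new ByteArrayInputStream(files.get(name));
    }
}

/** In-memory stand-in for the destination. */
final class InMemorySink implements FileSink {
    final Map<String, byte[]> stored = new TreeMap<>();

    public boolean exists(String name) {
        return stored.containsKey(name);
    }

    public void write(String name, InputStream data) {
        try {
            stored.put(name, data.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the FTP side and the storage side behind two small interfaces means the job can be run repeatedly on a schedule without re-copying files, and the FTP library can be swapped without touching the copy logic.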

More future-oriented people might resent having to use an FTP server; you may consider filing a feature request with your third-party vendor, asking for a future-proof file-storage service other than FTP. A suggested alternative to FTP would be, in our context, a REST API. For reasons of reliability, scalability and security, FTP support is not a growth area worldwide. It is expected that vendors should gradually adopt sustainable technologies.

Nick

Feb 8, 2017, 4:47:06 PM
to Google App Engine
I have been able to get ftp working on appengine using software libraries, you'll need to pick the write combo though, and it is a bit sensitive to the server on the other end (i.e. Passive mode etc)

If it's critical I suspect you'd be better off bringing another component into your stack to deal with this from GCE or GKE.

The source below will help you. It uses the Apache FTP client, which from memory uses just sockets, so as long as you have billing enabled with a budget you should be ok. I had to 'override' a bunch of the implementation for GAE - unfortunately there were a decent number of blacklisted calls in code without extension points or declared final.

https://github.com/atomicleopard/thundr-contrib-gae-ftp

Jim

Feb 9, 2017, 2:22:23 PM
to Google App Engine
"More future-oriented people might resent having to use an FTP server"  Ha, well said.

We have a few regularly scheduled FTP pulls that we need to get, so we have a VM instance that starts up on a schedule, executes the FTP downloads, pushes the data to our GAE app, and then shuts itself down.

Emanuele Ziglioli

Feb 9, 2017, 4:50:35 PM
to Google App Engine
Thanks Nick

I've checked out your code and it looks similar to work I've done myself before, adapting Apache Commons so that it would use GAE's low-level support for sockets.
It did work for a while but then it stopped. When I complained about it, they simply told me that GAE doesn't support FTP, full stop.

Things like "More future-oriented people might resent having to use an FTP server" make me furious. What about actually listening to customers (us)?
Future-oriented people might resent using GAE standard.
I could add a lot more but I guess our monthly bill is nothing so I'll shut up

Emanuele

Nick

Feb 9, 2017, 6:39:51 PM
to Google App Engine
Sockets and Threads are both first class features of GAE; they should be totally reliable. Historically appengine does suffer from network configuration errors that don't get picked up unless the affected paths are heavily used (i.e. geographical issues and unusual ports) - it seems monitoring is based on overall error rates. But reporting them usually gets an investigation and a fix.

Socket usage is much more heavily affected by quota limitations - for example using Apache HttpClient instead of URLFetch will net you problems as you scale. But for your usage you should not expect to see any issues.

Emanuele Ziglioli

Feb 9, 2017, 6:58:08 PM
to Google App Engine
Interesting, so you're saying that an error like:
System error: errno: 113, detail:no route to host
java.net.SocketException: System error: errno: 113, detail:no route to host
would be due to a network configuration error on Google's part?
How should I report it then?

Amaury Gauthier

Feb 10, 2017, 5:22:35 AM
to Google App Engine
FTP on AppEngine Standard is really not an option. It *can* work *sometimes*.

From my experience, the underlying socket infrastructure does not always give you the same IP for each socket created (which makes sense when we consider the AppEngine automatic scalability features and how massively distributed the AppEngine infrastructure must be to offer such a promise). Given this, the FTP control and data connections won't always have the same source IP, and thus will confuse the FTP server a lot.
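
To make the two-connection point concrete, here is a minimal sketch of how a client handles passive mode, using only the JDK. The class name `PasvReply` is hypothetical; the reply format (`227` followed by `h1,h2,h3,h4,p1,p2`) is the one defined for PASV in RFC 959. After sending `PASV` on the control connection, the client has to open a second socket to the address in the reply, and that second socket is the one that may leave from a different source IP.

```java
import java.net.InetSocketAddress;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses an FTP "227 Entering Passive Mode" reply. */
final class PasvReply {
    // Six decimal numbers: h1,h2,h3,h4 form the IPv4 address, p1,p2 the port bytes.
    private static final Pattern SIX_NUMBERS = Pattern.compile(
            "(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3})");

    private PasvReply() {}

    /** Returns the address the client must open its separate data connection to. */
    static InetSocketAddress parse(String reply) {
        if (!reply.startsWith("227")) {
            throw new IllegalArgumentException("Not a 227 reply: " + reply);
        }
        Matcher m = SIX_NUMBERS.matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("No address in reply: " + reply);
        }
        int[] n = new int[6];
        for (int i = 0; i < 6; i++) {
            n[i] = Integer.parseInt(m.group(i + 1));
            if (n[i] > 255) {
                throw new IllegalArgumentException("Value out of range: " + n[i]);
            }
        }
        String host = n[0] + "." + n[1] + "." + n[2] + "." + n[3];
        int port = n[4] * 256 + n[5]; // port = p1 * 256 + p2
        // Unresolved on purpose: parsing should not trigger a DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress data = parse("227 Entering Passive Mode (192,0,2,10,195,80)");
        System.out.println(data.getHostString() + ":" + data.getPort()); // prints 192.0.2.10:50000
    }
}
```

Many FTP servers can be configured to reject a data connection whose source address differs from that of the control connection, which would be consistent with the intermittent failures described above.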

Nick

Feb 10, 2017, 6:49:40 PM
to Google App Engine
Maybe someone from google can suggest how to report it, but I wouldn't hold your breath on a resolution - the ability to replicate and diagnose might be too hard on their end.

I would also make sure you've eliminated any possibility of it being a config issue on your end.

But I kind of agree with the above - if it's critical probably just do something basic on GCE or GKE. GKE will cost a minimum of about $15 a month for 3 shared vCPUs - probably not worth any more of your time if you can just build a lightweight app there.

Louise Elmose Hedegaard

Feb 13, 2017, 1:34:18 PM
to Google App Engine
Hi all,

Thank you very much for your input.

I cannot see how I can answer individually, so instead:

@George: I do consider myself future-oriented, so I could try to suggest that they provide the information in another way. The vendor is a big international firm though, and I need the FTP files for a feature in my little app, which so far will have only 3 customers, so I do not think I have a big (read:any) say in their solutions.
@Nick: Ok, so your library can be used in the GAE app itself without any need for e.g. a compute engine instance etc.? I'm not sure what you mean by "you'll need to pick the write combo though"?
@Jim: the application on the VM is not a GAE app then?
@George/Nick: what would the most basic setup on GCE/GKE be then? Adding an application to a compute engine instance that fetches files from the FTP server, and uploads to e.g. Cloud storage? But what about the app on the compute engine instance, that would not be a GAE app then?

I really appreciate your input, but I can't shake the feeling that what should have been a small, simple part of my GAE app - i.e. reading files on an FTP server (unscalable as it is), exploded to a big solution which requires a compute engine instance, uploading to cloud storage etc. - and it sounds like it will take me the same amount of time to develop as the other part of the app took...

- Louise

Jim

Feb 13, 2017, 2:30:10 PM
to Google App Engine
Louise,

@Jim: the application on the VM is not a GAE app then?

Correct, in our setup it is running on an Amazon EC2 instance;  we put it together back before the days of GCE VMs.  

The app itself is Java using Apache Commons FTPClient. Yes, it is a lot of work for something as simple as a file transfer, which gets us back to the original commentary about FTP being archaic. Understanding how FTP handles ports, it is quite reasonable that a platform such as GAE cannot support it while at the same time providing all the other stuff we love about it. Also, understanding that some companies can't/won't move away from FTP leads us to fun little hacking exercises such as this.

Distributed data integration can be a dirty business.


Nick

Feb 13, 2017, 7:45:39 PM
to Google App Engine
Hi Louise - I meant 'right' combo - I totally hoped no one would notice that 😔

All I mean is that different FTP client implementations make different assumptions - particularly around threading and security. Finding one that works is not easy - the example I linked required hacking the Apache FTP client.

Regarding GCE or GKE - You may have luck finding an open source application that slurps ftp files down and pushes them to gcs/s3 that you can containerise fairly trivially, otherwise you'll have to write something yourself.

Or consider a third party service (if such a thing exists)