How to configure useragent in curl php google app engine production environment?


Vivekanand Ilango

Nov 30, 2016, 9:09:52 AM
to Google App Engine

I am trying to extract data from the nseindia site. It works fine in the development environment, but after I deploy to the production cloud I get a permission denied error.

Output if run in local development environment:
$ php helloworld.php
UNDERLYING                          ,SYMBOL    ,DEC-16     ,JAN-17     ,FEB-17     ,MAR-17     ,JUN-17     ,SEP-17     ,NOV-16     ,DEC-17     ,JUN-18     ,DEC-18     ,JUN-19     ,DEC-19     ,JUN-20     ,DEC-20     ,JUN-21     
NIFTY BANK                          ,BANKNIFTY ,40         ,40         ,40         ,           ,           ,           ,           ,           ,           ,           ,           ,           ,           ,           ,           
NIFTY 50                            ,NIFTY     ,75         ,75         ,75         ,75         ,75         ,75         ,           ,75         ,75         ,75         ,75         ,75         ,75         ,75         ,75         

Output when run in production: see the attached "ProductionError.jpg"

helloworld.php

<?php
try {
        //curl get
        function curl_get( $curl, $url, $cookiefile) {
                curl_setopt($curl, CURLOPT_URL, $url);
                curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
                curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
                curl_setopt($curl, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" );
                //curl_setopt($curl, CURLOPT_COOKIEJAR, $cookiefile);
                curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiefile);
                curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
                $data = curl_exec($curl);
                return $data;
        }

        //curl post
        function curl_post( $curl, $url, $cookiefile, $post) {
                curl_setopt($curl, CURLOPT_URL, $url);
                curl_setopt($curl, CURLOPT_POST, 1);
                curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
                curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
                curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
                curl_setopt($curl, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" );
                //curl_setopt($curl, CURLOPT_COOKIEJAR, $cookiefile);
                curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiefile);
                curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
                $data = curl_exec($curl);
                return $data;
        }

        //cookie file
        $cookiefile = "cookie.txt";
        $output = system("curl -A ,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13' --url https://www.nseindia.com/content/fo/fo_mktlots.csv");

        $curl = curl_init( );

        $data = curl_get( $curl, "https://www.nseindia.com/content/fo/fo_mktlots.csv", $cookiefile);
        echo $data;

        $data = curl_post( $curl, "https://www.nseindia.com/content/fo/fo_mktlots.csv", $cookiefile, "");
        echo $data;
} catch (Exception $e) {
        echo 'Caught exception: ',  $e->getMessage(), "\n";
}
ProdcutionError.jpg

Jordan (Cloud Platform Support)

Nov 30, 2016, 4:45:35 PM
to Google App Engine
Hello Vivekanand,

I was able to reproduce your described issue from the Google Cloud Shell. The 403 you are seeing is due to the actual configuration of https://www.nseindia.com blocking the Google Cloud IP range. 
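
The posted script echoes the response body without looking at the status code, so one way to confirm a 403 from inside the app is to read the status after each request. A minimal sketch, not a definitive implementation: `describe_status` and `fetch_with_status` are hypothetical helper names introduced here, and the hint messages are my own wording.

```php
<?php
// Hypothetical helper: turn an HTTP status code into a short diagnostic hint.
function describe_status($code) {
    if ($code === 0) {
        return "no HTTP response (connection or TLS failure)";
    }
    if ($code === 403) {
        return "403 Forbidden: the remote server refused the request";
    }
    if ($code >= 200 && $code < 300) {
        return "success";
    }
    return "unexpected HTTP status " . $code;
}

// Hypothetical helper: fetch a URL and return the body, the HTTP status,
// and any cURL-level error message.
function fetch_with_status($url, $userAgent) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    $body = curl_exec($curl);
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $error = curl_error($curl);
    return array('body' => $body, 'status' => $status, 'error' => $error);
}
```

Calling `fetch_with_status("https://www.nseindia.com/content/fo/fo_mktlots.csv", $ua)` and passing the returned `status` to `describe_status` should show whether the remote server is answering with a 403 or the request is failing before any HTTP response arrives.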

There are multiple ways the https://www.nseindia.com site could be doing this. One possibility is that their .htaccess file specifically blocks certain IP addresses. In that case they need to ensure that App Engine IPs are not blocked. You can find the IPs that App Engine uses by following the instructions under Static IP Addresses and App Engine apps.
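
As far as I know, those instructions rely on DNS TXT lookups that start at `_cloud-netblocks.googleusercontent.com` and return SPF-style records. That starting name is an assumption taken from Google's documentation, not something stated in this thread. A rough sketch of parsing one such record in PHP follows; `parse_netblocks` is a hypothetical helper, and the sample record uses made-up names and documentation-only address ranges.

```php
<?php
// Hypothetical helper: split an SPF-style TXT record into the nested
// "include:" names and the literal "ip4:" / "ip6:" ranges it lists.
function parse_netblocks($txt) {
    $result = array('include' => array(), 'ip' => array());
    foreach (preg_split('/\s+/', trim($txt)) as $token) {
        if (strpos($token, 'include:') === 0) {
            $result['include'][] = substr($token, strlen('include:'));
        } elseif (strpos($token, 'ip4:') === 0 || strpos($token, 'ip6:') === 0) {
            // Both prefixes are four characters long.
            $result['ip'][] = substr($token, 4);
        }
    }
    return $result;
}

// Made-up sample record, not real Google ranges.
$sample = 'v=spf1 include:_netblocks1.example.com ip4:192.0.2.0/24 ip6:2001:db8::/32 ?all';
$parsed = parse_netblocks($sample);
```

In a live check the record text would come from `dns_get_record($name, DNS_TXT)`, with each `include:` entry looked up in turn until only IP ranges remain.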

I would also recommend ensuring that the specific directory '/content/fo/' is browsable by adding Options +Indexes to the .htaccess file. Additionally, I would recommend checking the permissions on the actual CSV file, since tricks such as the one mentioned here could prevent it from being accessed. A review of the complete site configuration is needed here.

Note: As this is not an actual Google Cloud issue, I am not able to confirm the actual cause or solution. You will need to consult with the owners of https://www.nseindia.com to further investigate. 

Jordan (Cloud Platform Support)

Nov 30, 2016, 4:59:43 PM
to Google App Engine
As an additional thought, the https://www.nseindia.com server could also be blocking cURL requests. So if the above does not turn out to be the solution, I would recommend looking into this as well.
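
To test whether the User-Agent is the deciding factor, the same URL can be requested with several User-Agent strings and the status codes compared. A minimal sketch under stated assumptions: `build_curl_options` and `compare_user_agents` are hypothetical helpers, and the extra `Accept` headers are only a guess at what a browser-like request might send.

```php
<?php
// Hypothetical helper: build a cURL option array for a given User-Agent.
function build_curl_options($url, $userAgent) {
    return array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false,
        CURLOPT_USERAGENT => $userAgent,
        // Assumed browser-like headers; adjust or remove as needed.
        CURLOPT_HTTPHEADER => array('Accept: */*', 'Accept-Language: en-US,en;q=0.5'),
    );
}

// Request the URL once per User-Agent and return the HTTP status for each label.
function compare_user_agents($url, array $userAgents) {
    $statuses = array();
    foreach ($userAgents as $label => $ua) {
        $curl = curl_init();
        curl_setopt_array($curl, build_curl_options($url, $ua));
        curl_exec($curl);
        $statuses[$label] = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    }
    return $statuses;
}
```

If every User-Agent gets the same 403 from production but succeeds locally, the block is more likely based on the source IP than on the User-Agent.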