Interact with a website using Haxe?


Dlean Jeans

Aug 6, 2017, 8:50:23 AM
to Haxe
Hi,
I'm inspired by all the Python scripts people write to do sweet stuff like finding things or downloading what they need from the Internet. I think it would be handy, and also cool, if I could do the same in Haxe.
Googling brought up no results. All I found is the API page for haxe.Http, which I thought I could do something with, but all I managed was to get a website's HTML; I don't know how to send a request to it.

It would be great if you can give me some pointers! Thanks in advance!

Norbert Melzer

Aug 6, 2017, 9:52:47 AM
to Haxe

If you got some HTML from the server, you have already sent a request. So please go into a little more detail about what you want to do.
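
For reference, here is a minimal sketch of sending a request with a query parameter through haxe.Http. The URL is a hypothetical placeholder and `SearchRequest`/`buildRequest` are names made up for this example. On sys targets, HTTPS support depends on the target and the Haxe version, so treat this as a starting point rather than a definitive recipe.

```haxe
import haxe.Http;

class SearchRequest {
    // Hypothetical helper: prepares a request for a placeholder endpoint.
    public static function buildRequest(query:String):Http {
        // Placeholder URL, used only for illustration.
        var http = new Http("https://example.com/search");
        // Parameters are URL-encoded; for a GET request they end up in the query string.
        http.setParameter("q", query);
        return http;
    }

    static function main() {
        var http = buildRequest("puppy");
        http.onData = function(body:String) {
            trace('received ${body.length} characters');
        };
        http.onError = function(msg:String) {
            trace('request failed: $msg');
        };
        // false sends a GET request, true would send a POST.
        http.request(false);
    }
}
```

Fetching a page's HTML and "sending a request" are the same call; the difference is only in the parameters and in whether you pass `true` to `request` to make it a POST.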



Dlean Jeans

Aug 7, 2017, 2:39:15 AM
to Haxe
For simplicity, say I want to get a picture of a puppy from Google: how would you submit 'puppy' to Google Images and download the first image?

Norbert Melzer

Aug 7, 2017, 4:06:08 AM
to Haxe
Google's Image Search API was deprecated a long time ago, but the Google Custom Search API is mentioned as its replacement.

Just read through its documentation, then send requests and parse the responses as described there.
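
A rough sketch of what that could look like in Haxe, assuming the Custom Search JSON API endpoint and its `key`, `cx`, `q` and `searchType` parameters as I understand them from the documentation. `YOUR_API_KEY` and `YOUR_SEARCH_ENGINE_ID` are placeholders you would have to replace with your own values, and `ImageSearch`/`firstImageLink` are names made up for this example.

```haxe
import haxe.Http;
import haxe.Json;

class ImageSearch {
    // Placeholders: supply your own credentials from the Google developer console.
    static inline var API_KEY = "YOUR_API_KEY";
    static inline var ENGINE_ID = "YOUR_SEARCH_ENGINE_ID";

    // Returns the `link` field of the first entry in `items`, or null if there is none.
    public static function firstImageLink(json:String):Null<String> {
        var data:Dynamic = Json.parse(json);
        var items:Array<Dynamic> = data.items;
        if (items == null || items.length == 0) return null;
        return items[0].link;
    }

    static function main() {
        var http = new Http("https://www.googleapis.com/customsearch/v1");
        http.setParameter("key", API_KEY);
        http.setParameter("cx", ENGINE_ID);
        http.setParameter("q", "puppy");
        // Ask for image results instead of web pages.
        http.setParameter("searchType", "image");
        http.onData = function(body:String) {
            var link = firstImageLink(body);
            trace(link == null ? "no results" : 'first image: $link');
        };
        http.onError = function(msg:String) {
            trace('request failed: $msg');
        };
        http.request(false);
    }
}
```

The parsing is kept in its own function so it can be tried on a saved response without hitting the API each time.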


Marcelo de Moraes Serpa

Aug 7, 2017, 2:23:53 PM
to haxe...@googlegroups.com
If no JavaScript needs to be interpreted, then a simple HTTP request will do. You get the response, parse the HTML using an XML library (if you're using Node.js, you can use the jsdom npm package), and then search for the image elements to fetch their `src`. If you need to interpret JS code from the page, read/send cookies and keep a session, then a headless browser might be a better choice. PhantomJS is a good one if you're on Node.
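
As a quick illustration of the "search for the image elements" step, here is a small sketch that pulls `src` values out of an HTML string. Note that it uses a regular expression (Haxe's `EReg`) rather than a real HTML/XML parser, so it will miss unquoted attributes and other edge cases; `ImgScan` and `imageSources` are names made up for this example.

```haxe
class ImgScan {
    // Collects the src attribute of every <img> tag that has a quoted src.
    public static function imageSources(html:String):Array<String> {
        var pattern = ~/<img[^>]*\ssrc=["']([^"']+)["']/i;
        var found = [];
        var rest = html;
        // Match repeatedly, each time continuing with the text after the last match.
        while (pattern.match(rest)) {
            found.push(pattern.matched(1));
            rest = pattern.matchedRight();
        }
        return found;
    }

    static function main() {
        var html = '<p><img src="a.png"> text <img alt="x" src="b.jpg"></p>';
        for (src in imageSources(html)) {
            trace(src);
        }
    }
}
```

For anything beyond a quick script, a proper parser (or a headless browser, as mentioned above) is likely the more robust choice.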

Norbert Melzer

Aug 7, 2017, 4:05:56 PM
to haxe...@googlegroups.com
As I read this sentence, Google forbids scraping in its TOS (https://www.google.com/intl/en/policies/terms/):

> don't […] try to access [the service] using a method other than the interface and the instructions that we provide.

The interface and instructions they provide are either a graphical browser or the API.

In general, I'd never scrape a site without asking its owner first. Scripted access to a site can increase load and data transfer on the server, which in the end produces higher costs.

Dlean Jeans

Aug 9, 2017, 1:43:05 AM
to Haxe

Norbert Melzer

Aug 9, 2017, 3:12:04 AM
to haxe...@googlegroups.com
Dlean Jeans <dlean...@gmail.com> wrote on Wed, Aug 9, 2017 at 07:39:
> I don't really think it's forbidden or something: https://stackoverflow.com/questions/20716842/python-download-images-from-google-image-search.

That question tries to use the deprecated Image Search API and is NOT scraping. Most of the answers suggest the new Custom Search API, as I did a couple of posts earlier.

Also, I am by no means alone in my understanding: this SO answer goes into a bit more detail and even gives numbers for when you risk being detected when scraping the result page directly. Those may be outdated, though:
