Webhooks


Adrián

Jun 21, 2021, 4:35:13 PM
to Web Scraping
Hi guys.

I have multiple projects and I want to run them all in an automated fashion.

I believe I need webhooks for that, but I'm unsure how to set them up and use them.

What I want:
I want to run Project A and when that finishes I want the CSV file to be saved in a folder, automatically, and then run Project B and so on until everything is done and I have all my CSV files.

If I understand correctly, I need to set up an account with a web service that can receive the parsehub_callback I set in the project's webhook field, and then, in that app, tell it to run the next project, and so on. Is that correct?
  • By "tell it to run" I mean either writing a file that does something (PHP is my preferred language) or configuring whatever built-in options the app might have.
  • I read ParseHub's webhook info but don't understand the "local" part of it.
  • I found Pipedream as an app that might achieve this, but I'm overwhelmed by all its options and functions, and I can't tell whether it has what I need.
    • Following these last two points, I would need to write this in the first project's webhook field: https://my_unique_pipedream_url:5000/parsehub_callback, correct?
    • Should I specify, in this app's file or built-in options, that I want the run saved as a CSV file and then move on to the next one? If so, how do I tell it to go for the next?
Thank you,
Adrián.

Andrew11

Jun 21, 2021, 5:17:04 PM
to Web Scraping
You might consider Parabola, although it isn't free.
I'm not sure even Ben can help you with the homegrown way of setting this up (he's the ParseHub QA engineer who shows up here).

Andrew11

Jun 21, 2021, 5:20:03 PM
to Web Scraping
See also https://parabola.io/product-overview/updating-and-running-your-flow


Adrián

Jun 23, 2021, 11:11:27 AM
to Web Scraping
Hi Andrew.
I do intend to keep things in a free version for now.

I can also consider using the API, a custom .php file and databases too.

How would you go on about it?

Thanks.

Andrew11

Jun 23, 2021, 11:26:23 AM
to Web Scraping
I've never needed to automate my scrapes this much, so I've never used webhooks or the API. Maybe you can change the IMPORTDATA URL in this tutorial to kick off your scrapes from a Google Sheets macro?

b...@parsehub.com

Jun 23, 2021, 11:30:08 AM
to Web Scraping
Hi Adrián,
You will want to set up a web service to receive the webhook, and use this API call to trigger the next run when the webhook payload is received: https://www.parsehub.com/docs/ref/api/v2/?php#run-a-project
You can specify that you want your data delivered in CSV format by using the format=csv parameter in your API call to retrieve the run data after the run is complete (https://www.parsehub.com/docs/ref/api/v2/?php#get-data-for-a-run).
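A minimal sketch of those two calls in plain PHP with curl, following the linked docs. The tokens are placeholders and this isn't tested against a live account; the gzip note follows what the docs say about the data endpoint:

```php
<?php
// Build a ParseHub API v2 URL with query parameters appended.
function parsehub_url(string $path, array $params): string {
    return 'https://www.parsehub.com/api/v2' . $path . '?' . http_build_query($params);
}

// Trigger a run of a project and return the decoded run object.
function start_run(string $project_token, string $api_key): array {
    $ch = curl_init("https://www.parsehub.com/api/v2/projects/$project_token/run");
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(['api_key' => $api_key]));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $run = json_decode(curl_exec($ch), true);
    curl_close($ch);
    return is_array($run) ? $run : [];
}

// Once a run is complete, download its data as CSV and save it to a file.
function save_run_csv(string $run_token, string $api_key, string $file): void {
    $url = parsehub_url("/runs/$run_token/data",
                        ['api_key' => $api_key, 'format' => 'csv']);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // per the docs, this response is gzipped
    file_put_contents($file, curl_exec($ch));
    curl_close($ch);
}
```

Usage would look like `$run = start_run('PROJECT_TOKEN', 'YOUR_API_KEY');`, then, once the run is finished, `save_run_csv($run['run_token'], 'YOUR_API_KEY', 'project_a.csv');`.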

Adrián

Jun 23, 2021, 1:12:35 PM
to Web Scraping
Alright.
I'll try it out, thank you very much.

Andrew11

Jun 23, 2021, 1:15:08 PM
to Web Scraping
Just out of curiosity, what's the simplest platform for a web service? Do you use AWS somehow?

Adrián

Jun 23, 2021, 1:24:25 PM
to Web Scraping
I'll jump on the wagon too. Which free service could I use to achieve what I want? What should I look for in that app?

Andrew11

Jun 24, 2021, 8:05:03 PM
to Web Scraping
Without knowing more, I would guess it might be as simple as posting a PHP page at a known URL and plugging that address in as your webhook. Inside, the PHP page should call the API to start the next scrape.

Adrián

Jun 29, 2021, 2:21:48 PM
to Web Scraping
Damn, that does sound like a much simpler solution. I'll try it out.
Thank you Andrew11.

Andrew11

Jun 29, 2021, 2:35:47 PM
to Web Scraping
Sure! I think I found confirmation it should work: https://stripe.com/docs/webhooks/build

Adrián

Jun 30, 2021, 2:22:39 PM
to Web Scraping
Hi Andrew,
I'm trying to keep things as free as possible, since I believe I need only a tiny bit of resources. Stripe seems to charge, although I couldn't tell whether I'd be charged once or only when I make a sale. They ask a bunch of business-related questions and for documents I do not have.

I think I'm quite close to achieving what I need with Pipedream, but I'm missing something: I don't get any values (just "undefined") when trying out a test, plus lines of errors I don't understand.
You can see my post about it here.

I've also opened an account with a free hosting service and uploaded my local Composer setup and vendor directories for ParseHub's PHP library (msankhala/parsehub-php, nategood/httpful and monolog), but when I try something like:

// ParseHub API implementation.
// Include the autoloader so we can dynamically include the rest of the classes.
require_once(dirname(__FILE__) . '/autoloader.php');

$api_key = 'YOUR_API_KEY'; // placeholder

$parsehub = new parser\Parsehub($api_key);

$projectList = $parsehub->getProjectList();

echo $projectList;

I get errors when trying to instantiate the Parsehub class.
I believe it has to do with Composer's autoloader configuration getting messed up when uploading directly, as if some configuration breaks on upload.
I couldn't work around it by using an absolute namespace path to call the class, or by using a custom autoloader with the same approach.

I thought this was what you meant in your previous suggestion, but I don't get how it should work:
I ran a project, put "myfreeHostURL:5000/parsehub_callback" in the webhook field, and had a var_dump($_POST); in my index.php to see what I got, but I'm missing something here; I don't think I'm doing it right.



Andrew11

Jun 30, 2021, 2:27:29 PM
to Web Scraping
Oh, I didn't know Stripe would actually be useful to you in itself; I just thought it was interesting that they said a PHP page on your server will work to receive the webhook. This is way out of my depth, but maybe you could try logging the "err" object to the console, as well as "jobs", as seen in your Pipedream post?

Adrián

Jun 30, 2021, 2:28:18 PM
to Web Scraping
By the way, as I understand it I can only use Node.js for coding in Pipedream, so there I need to use that other package.
Both the PHP and Node.js libraries are quite out of date, I believe. For example, I get a "job token" error in Pipedream but found no such thing in ParseHub's docs. I only have 1 worker, since I'm using a free ParseHub account.

I've also tried using Postman; I followed some tutorials and got answers back, but it also has a Node.js integration for coding, so I felt stuck in the same place as with Pipedream. Perhaps some of you have more experience with it and can point me in the right direction.

What I want:
After each ParseHub Run finishes I would save the data gathered (google docs or csv export) and then trigger then next run, save that too and go for the next and so on till the last.
It would be a one button thing whenever I need that data so no cron jobs needed.

Andrew11

Jun 30, 2021, 2:31:16 PM
to Web Scraping
Hopefully Ben has some input... I've actually never seen anyone else post an answer here ; (

Adrián

Jun 30, 2021, 2:32:43 PM
to Web Scraping
For "jobs" I always get "undefined"; for "err" I get these lines, which I don't know how to read:
at Request.null (/opt/ee/node_modules/parsehub/parsehub.js:42:20)
at Request.self.callback (/opt/ee/node_modules/request/request.js:199:22)
at Request.emit (events.js:376:20)
at Request.null (/opt/ee/node_modules/request/request.js:1160:14)
at Request.emit (events.js:388:22)
at IncomingMessage.null (/opt/ee/node_modules/request/request.js:1111:12)
at IncomingMessage.emit (events.js:388:22)
at null.endReadableNT (internal/streams/readable.js:1336:12)
at process.processTicksAndRejections (internal/process/task_queues.js:82:21)

Adrián

Jun 30, 2021, 2:34:00 PM
to Web Scraping
I hope so too :)

Andrew11

Jun 30, 2021, 2:40:57 PM
to Web Scraping
Maybe you could ditch the open-source ParseHub wrapper and just call the APIs through phphttpclient like it does? The wrapper doesn't seem to be very well known anyway.

Adrián

Jun 30, 2021, 2:44:51 PM
to Web Scraping
I'll give that a try in the meantime.

Adrián

Jun 30, 2021, 2:47:53 PM
to Web Scraping
I want to make sure I understand the steps:
1- I run a project on my ParseHub client.
2- ParseHub will send a POST request to my webhook URL whenever any of the project's runs' status or data_ready fields change (from the docs).
3- At my webhook URL I should be able to interact with these POSTs: I can catch the POST body, which will have the run object. Will I be able to catch this data with a regular $_POST in the backend and do my stuff? Will it work the moment the project sends the POST, or do I need another kind of service, like Pipedream/Postman, with HTTP?
4- Do what I want with the data, i.e. write it to a Google Doc, download the CSV file, or upload it to a database.
5- When step 4 is finished, and before exiting the script, I need code at my webhook URL to start the next run, which should point to the same webhook URL to repeat the save process.
6- Repeat steps 3 to 5 until all runs are done.

Is this correct?
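In code, I imagine steps 3 to 6 living in one callback script, something like this rough, untested sketch (the project tokens, the payload handling, and the field names are my assumptions from the docs):

```php
<?php
// parsehub_callback.php -- sketch of steps 3-6.

// Hypothetical ordered list of project tokens to chain through.
$projects = ['PROJECT_TOKEN_A', 'PROJECT_TOKEN_B', 'PROJECT_TOKEN_C'];

// ParseHub POSTs the run object to this URL. Depending on the content type
// it may arrive as a JSON body rather than form fields, so fall back from
// $_POST to decoding the raw request body.
function decode_run_payload(array $post, string $raw_body): array {
    if (!empty($post)) {
        return $post;
    }
    $decoded = json_decode($raw_body, true);
    return is_array($decoded) ? $decoded : [];
}

$raw = (PHP_SAPI === 'cli') ? '' : file_get_contents('php://input');
$run = decode_run_payload($_POST, $raw);

// The webhook also fires on mere status changes, so only act once the
// run is complete and its data is ready.
if (($run['status'] ?? '') === 'complete' && !empty($run['data_ready'])) {
    // Step 4: fetch and save the CSV via the get-data API call.
    // ...
    // Step 5: start the next project in the chain, if there is one,
    // with a POST to /api/v2/projects/{next_token}/run. That run's
    // webhook points back at this same URL, repeating the cycle.
    $pos = array_search($run['project_token'] ?? '', $projects, true);
    if ($pos !== false && isset($projects[$pos + 1])) {
        $next = $projects[$pos + 1];
        // trigger $next here
    }
}
```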


Andrew11

Jun 30, 2021, 2:54:43 PM
to Web Scraping
I think Postman is just used to build your API requests and look at the responses in a program/debug cycle. I'm not a PHP guy, though.

Adrián

Jun 30, 2021, 5:21:53 PM
to Web Scraping
I've made some progress in trying to get a project, but I'm failing to authenticate properly.
This is my code:
// Point to where you downloaded the phar.
include('./httpful.phar');

$response = \Httpful\Request::get($uri)
    ->addHeaders(array(
        'Content-Type' => 'application/x-www-form-urlencoded',
        'charset' => 'utf-8',
        'api_key', ' api_key ',
    ))
    ->send();
var_dump($response);

And this is my response:
"401 Unauthorized
This server could not verify that you are authorized to access the document you requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not understand how to supply the credentials required."

I tried using ->authenticateWith('api_key', 'api_key') instead of the header, to no avail.
I tried changing to POST (\Httpful\Request::post($uri)) but I get a "405 Method Not Allowed. The method POST is not allowed for this resource."
My guess is that either A) this library's authentication method is sending a username and password over to the ParseHub API when I need to send an api_key instead, or B) I'm not sending the api_key the way that I should. There's little explanation and few examples for this library as well -_- Any other one I should switch to?

Most of all, I would appreciate clarification on my step list, as I don't know whether I'll be able to achieve what I want with this static URL I'm trying out now.

Thank you for your attention thus far too.
Cheers!



Adrián

Jun 30, 2021, 5:25:02 PM
to Web Scraping
I don't understand your assessment of Postman. Are you saying that Postman will be able to receive my ParseHub runs' status changes (by pointing the webhook at a Postman URL:500/callback_function), but then just look at them, without being able to do anything with them?

Andrew11

Jun 30, 2021, 5:31:27 PM
to Web Scraping
I think you just use Postman to compose your API URLs; it doesn't have a role in the final program, just PHP and the API. In your program I noticed that there's a comma between "api_key" and "api_key" instead of a =>, is that a typo? And just to ask a foolish question: you do know how to replace "api_key" with your actual API key, right?

Adrián

Jun 30, 2021, 7:01:37 PM
to Web Scraping
Good call on the =>, it was a typo, thanks. Unfortunately, there was no change. Is passing that data through the addHeaders method the right way? I feel it's not. In what way does ParseHub's API need the api_key to be passed?

On a side note: In var_dump($response); I can see there's plenty of data like:
["username"]=> NULL ["password"]=> NULL ["serialized_payload"]=> NULL ["payload"]=> NULL ["parse_callback"]=> NULL ["error_callback"]=> NULL ["send_callback"]=> NULL ["follow_redirects"]=> bool(false) 
and so on.

Good that you mention it; yeah, I could have been missing that too, but I do replace that placeholder with the correct sensitive data, both in the URL and in the api_key.

By the way, I've also seen that variable written as _apikey. Changing to it makes no difference.

About Postman:
"Compose your API URLs": my URLs?
There's ParseHub's API, there's the API I've just realized I can create at Postman, and then there's the phphttpclient API.
If I understand correctly, you're implying that Postman won't work for what I want to do, is that correct?

Thank you!

Andrew11

Jun 30, 2021, 7:05:34 PM
to Web Scraping
I think it's just simpler than you'd expect. If you go to https://www.parsehub.com/docs/ref/api/v2/?php#backwards-compatibility and scroll down, there's sample PHP code that doesn't use weird libraries and such.
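The gist of that sample, as far as I can tell, is that authentication is just an api_key query parameter appended to the endpoint URL, with no auth headers and no Httpful. Roughly (placeholder key, not tested here):

```php
<?php
// Build a ParseHub API v2 URL with the api_key in the query string.
function ph_url(string $path, string $api_key): string {
    return 'https://www.parsehub.com/api/v2' . $path
         . '?' . http_build_query(['api_key' => $api_key]);
}

// Plain-curl GET against that URL -- no wrapper library involved.
function ph_get(string $path, string $api_key) {
    $ch = curl_init(ph_url($path, $api_key));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Usage (placeholder key): echo ph_get('/projects', 'YOUR_API_KEY');
```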

Adrián

Jun 30, 2021, 7:15:21 PM
to Web Scraping
Sweet Jesus, indeed that worked.
I'll see what I can do from here.
Much appreciated, Andrew.
Cheers!

Andrew11

Jun 30, 2021, 7:16:17 PM
to Web Scraping
: ) I'll tell the next guy. Cheers!

Adrián

Jun 30, 2021, 8:08:44 PM
to Web Scraping
Man.
I only needed to pass the api_key through the URL (?api_key=API_KEY) and then use GET or POST depending on which method I'm calling.
Now Postman is working, and I bet Pipedream would be too, though I can't confirm right now; no extra headers needed.

Andrew11

Jun 30, 2021, 8:14:58 PM
to Web Scraping
So what is this program going to do? If you can't tell me, that's OK. I imagine it uses early-stage scrape results in later-stage scrapes?