Connection Refused while scraping the data

Kuldeep Avsar

Mar 6, 2020, 11:20:54 PM
to golang-nuts
I am trying to scrape job titles one by one from the Indeed.co.in website,
but I get a "connection refused" error when I visit a particular job-title
category page and try to read the response.
I have tried to solve this myself without success. Please help.
2020/03/07 09:08:41 Error to Connect with Indeed Jobs Category Page.Get https://indeed.co.in/browsejobs/Engineering: dial tcp 169.44.165.69:443: connect: connection refused

package main

import (
    "crypto/tls"
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

// GetBrowseJobs fetches the home page and follows every footer link.
func GetBrowseJobs(Url string) {
    response, err := http.Get(Url)
    if err != nil {
        log.Println("Error to Connect with Indeed Home page.", err)
        return
    }
    defer response.Body.Close()
    document, err := goquery.NewDocumentFromReader(response.Body)
    if err != nil {
        // log.Fatal exits the program, so no return is needed after it.
        log.Fatal("Error loading HTTP response body: ", err)
    }
    document.Find("a.icl-GlobalFooter-link").Each(processElement)
}

// processElement passes the href of a footer link to BrowseJobsPage.
func processElement(index int, element *goquery.Selection) {
    href, exists := element.Attr("href")
    if exists {
        BrowseJobsPage(href)
    }
}

// BrowseJobsPage fetches the browse-jobs page and visits each category link.
func BrowseJobsPage(Urls string) {
    fmt.Println(Urls)
    response, err := http.Get(Urls)
    if err != nil {
        log.Println("Error to Connect with Indeed Browse Jobs Page.", err)
        return
    }
    defer response.Body.Close()
    document, err := goquery.NewDocumentFromReader(response.Body)
    if err != nil {
        log.Fatal("Error loading HTTP response body: ", err)
    }
    document.Find("table#categories tbody tr td a").Each(Processjobs)
    fmt.Println("***********************************************************************")
}

// Processjobs passes the href of a category link to PerJobsTitlePage.
func Processjobs(index int, element *goquery.Selection) {
    href, exists := element.Attr("href")
    if exists {
        PerJobsTitlePage(href)
    }
}

// PerJobsTitlePage fetches one category page and prints its job titles.
func PerJobsTitlePage(Urls string) {
    fmt.Println(Urls)
    // This client skips TLS certificate verification.
    tlsConfig := &tls.Config{
        InsecureSkipVerify: true,
    }
    transport := &http.Transport{
        TLSClientConfig: tlsConfig,
    }
    client := http.Client{Transport: transport}
    response, err := client.Get("https://indeed.co.in" + Urls)
    if err != nil {
        log.Println("Error to Connect with Indeed Jobs Category Page.", err)
        return
    }
    defer response.Body.Close()
    // Parse the body directly: it is a stream and can only be read once.
    document, err := goquery.NewDocumentFromReader(response.Body)
    if err != nil {
        log.Fatal("Error loading HTTP response body: ", err)
    }
    document.Find("table#titles tbody tr td p.job a").Each(ProcessSinglejob)
    fmt.Println("***********************************************************************")
}

// ProcessSinglejob prints the title attribute of a job link.
func ProcessSinglejob(index int, element *goquery.Selection) {
    title, exists := element.Attr("title")
    if exists {
        fmt.Println(title)
    }
}

func main() {
    GetBrowseJobs("https://www.indeed.co.in/")
}

Kurtis Rader

Mar 6, 2020, 11:36:04 PM
to Kuldeep Avsar, golang-nuts

The web site you are accessing thinks you are executing a DDOS attack or are otherwise violating their terms of service. This has nothing to do with the Go language. You need to rate limit your requests of that site.

--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

Amnon Baron Cohen

Mar 7, 2020, 3:21:23 AM
to golang-nuts
You might get better results if you add a www to the beginning of the hostname.

curl https://www.indeed.co.in/browsejobs/Engineering

does download a page

whereas curl https://indeed.co.in/browsejobs/Engineering
does give a connection refused error.
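
One way to avoid hard-coding the bare hostname is to resolve each href against the URL of the page it was found on, so links inherit the www host. The following is only a sketch: resolveLink is a hypothetical helper name, and it assumes the hrefs on the page are either absolute or relative to that page.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// resolveLink (hypothetical helper) resolves href, which may be absolute or
// relative, against the URL of the page on which the link was found.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	link, err := resolveLink("https://www.indeed.co.in/", "/browsejobs/Engineering")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(link) // prints https://www.indeed.co.in/browsejobs/Engineering
}
```

In the posted program this would replace the string concatenation "https://indeed.co.in" + Urls in PerJobsTitlePage.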

Jesper Louis Andersen

Mar 7, 2020, 6:40:11 AM
to Kurtis Rader, Kuldeep Avsar, golang-nuts
On Sat, Mar 7, 2020 at 5:35 AM Kurtis Rader <kra...@skepticism.us> wrote:
The web site you are accessing thinks you are executing a DDOS attack or are otherwise violating their terms of service. This has nothing to do with the Go language. You need to rate limit your requests of that site.


To add:

Look up the concept of a circuit breaker and run it in reverse. Rate limit your requests, and if the site takes too long to respond, or if it returns a 429 or the like, you should trip the circuit breaker for a while to cool down. Scraping can put a tremendous load on a web site, and it is your duty as the scraper to be careful. This is especially important in a language like Go, where you can easily make ten thousand requests to a website from a single program.


--
J.