Web Crawler Coding Issues

andreok7

Jun 29, 2016, 7:14:05 AM
to beautifulsoup
I can't figure out why this script produces no output! There are no syntax errors and no apparent mistakes. All I get is this:

        C:\Python34\python.exe C:\Users\TheWo_000\PycharmProjects\Youtube\lesson1_on.py

        Process finished with exit code 0

The code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
from bs4 import BeautifulSoup


def analyst_spider(max_pages):
    current_page = 0

    while current_page < max_pages:
        # Appending '0' turns the page index into the offset: start=00, start=10, ...
        url = 'http://kr.indeed.com/%EC%B7%A8%EC%97%85?q=analyst&l=%EC%84%9C%EC%9A%B8&start=' + str(current_page) + '0'
        plain_text = requests.get(url).text
        soup = BeautifulSoup(plain_text, 'html.parser')

        print('\n----------------------------------\n')
        # 10 / 10 is evaluated first, so this prints current_page + 1
        print('START OF PAGE #%s' % str(int(current_page + 10 / 10)))
        for link in soup.findAll('a', attrs={'class': 'turnstileLink'}):
            href = link.get('href')
            title = link.get('title')
            print('\t%s: %s' % (title, href))
        print('END OF PAGE #%s' % str(int(current_page + 10 / 10)))
        current_page += 1


analyst_spider(2)

Connor

Jun 29, 2016, 8:48:12 PM
to beautifulsoup
Hey there,

Which version of Python are you using to run this? You can check by typing "python --version" in a terminal.

Which version of bs4 are you using? You can check by typing "python" to get an interactive prompt, then "import bs4", then "bs4.__version__".
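
If it's easier, the same checks can be run as one short script. This is only a sketch: report_versions is a name made up for illustration, and it relies on each package exposing a __version__ attribute (bs4 and requests both do). A package that isn't installed is reported as such instead of raising an error.

```python
import importlib
import sys


def report_versions(module_names=('bs4', 'requests')):
    """Return a dict of version strings for Python and the named packages."""
    versions = {'python': sys.version.split()[0]}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is missing from the interpreter running this script.
            versions[name] = 'not installed'
        else:
            versions[name] = getattr(module, '__version__', 'unknown')
    return versions


for name, version in report_versions().items():
    print('%s: %s' % (name, version))
```

Run it with the same interpreter PyCharm uses (C:\Python34\python.exe in your output), since a different Python install may have different packages.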

I want to see if I can reproduce your problem, because the code you posted works for me. I get the following output:


----------------------------------

START OF PAGE #1
    CIB - DCM - Analyst/Associate - Seoul: /rc/clk?jk=5e6e026d37a65890&fccid=c46d0116f6e69eae
    글로벌 데이터 애널리스트: /rc/clk?jk=2150a9eb69799f8d&fccid=a8886658d41406ea
    Global Data - Data Analyst, Seoul: /rc/clk?jk=1234bd9fe628813a&fccid=f770da67b3b51c62
    CIB - Equity Research 1st year Analyst - Seoul: /rc/clk?jk=4e953d5b12bedf1b&fccid=c46d0116f6e69eae
    Strategic Communications Analyst: /rc/clk?jk=e9279ec49d6d78fe&fccid=4e041af1d0af1bc8
    2016 하반기 글로벌 특파원(Global Jewelry Analyst): /rc/clk?jk=99f475a69bb1f898&fccid=5fbcb6010f943951
    게임 데이터 분석가 모집: /rc/clk?jk=7c67344e74684640&fccid=c8d68a4a912ede1d
    Global Data - Data Analyst, Seoul: /rc/clk?jk=2044587dcd4f046c&fccid=f770da67b3b51c62
    언더라이팅팀 - Uderwriting Business Analyst: /rc/clk?jk=c23ab48eaa8b6229&fccid=afbf8c270610a38a
    Junior Staff Analyst: /rc/clk?jk=def55ce8b1a048a2&fccid=8765a4045377753a
END OF PAGE #1

----------------------------------

START OF PAGE #2
    Developer (data team) Experienced/Intermediate: /rc/clk?jk=158844bca5825524&fccid=181799781da546b7
    지멘스헬시니어스 초음파사업부 Complaint Analyst: /rc/clk?jk=96cc1b106aaac1e1&fccid=752bf4f536d77fd5
    Data Analyst (Fundamentals) - 6 Months Contract - Seoul: /rc/clk?jk=690ea2f1dc4d037d&fccid=f770da67b3b51c62
    Data Analyst (Localization) - 6 Months Contract, Maternity Cover - Seoul: /rc/clk?jk=e5223f4f8362676d&fccid=f770da67b3b51c62
    Client Service Analyst II: /rc/clk?jk=431cb2b4346c57ba&fccid=d221b1c65f2323a8
    [존슨앤드존슨] North Asia, Senior IT Analyst - Analytics & Insights: /rc/clk?jk=2839b193c834be7c&fccid=8c86105b331d5cc4
    Associate Producer: /rc/clk?jk=ad3cce288afeffaa&fccid=617d7f961cfcf54a
    Data Regional Contributions Analyst (6 months contract) - Seoul: /rc/clk?jk=324436511410c419&fccid=f770da67b3b51c62
    Analyst, Global Shared Service - Advertiser Services: /rc/clk?jk=338c19027bfbdad7&fccid=cb87ec4d7da9286d
    Data Analyst: /rc/clk?jk=5549e4b0132a3626&fccid=19a6225b52746746
END OF PAGE #2
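
One thing worth ruling out: in the code as posted, the START OF PAGE line is printed unconditionally inside the loop, so even a page with zero matching links would print something. Completely empty output with exit code 0 therefore suggests the loop body never ran, for example because the call to analyst_spider(2) isn't at the top level of the file that's actually being executed. That's a guess about your setup, not something I can see from the post. Here is a minimal sketch with the network calls stripped out (spider_stub and captured_output are hypothetical names, not part of your script):

```python
import contextlib
import io


def spider_stub(max_pages):
    """Stand-in for analyst_spider with the requests/bs4 calls removed."""
    current_page = 0
    while current_page < max_pages:
        # Same shape as the posted loop: this print runs on every iteration,
        # whether or not any links were found on the page.
        print('START OF PAGE #%d' % (current_page + 1))
        current_page += 1


def captured_output(func, *args):
    """Run func(*args) and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


called = captured_output(spider_stub, 2)       # loop body runs twice
not_entered = captured_output(spider_stub, 0)  # loop body never runs

print(repr(called))
print(repr(not_entered))
```

If your real script behaves like the second case, adding a print('starting') as the very first line of the file is a quick way to confirm that PyCharm is running the file you think it is.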


-Connor