failed import of gevent???

jrm...@gmail.com

Aug 19, 2020, 11:16:44 AM
to gevent: coroutine-based Python network library
Here is my entire code for a PGATour web site scraper. Any help resolving this issue, and also the problem on lines 33/34, would be greatly appreciated. My directory does begin with C:// or, if forward slashes are permitted, C:\\Users/jrm12\ etc., etc.

import os
import gevent
import requests
from bs4 import BeautifulSoup

statUrlFormat = "https://www.pgatour.com/stats/stat.%s.%s.html" # statId, year
categoryLabels = ['ROTT_INQ', 'RAPP_INQ', 'RARG_INQ', 'RPUT_INQ', 'RSCR_INQ', 'RSTR_INQ', 'RMNY_INQ', 'RPTS_INQ']

def saveHTML(url, filename):
    print ("Saving", url, "to", filename)
    r = requests.get(url)
    with open(filename, 'wt') as f:
        f.write(r.text)

# startYear: Most recent year of stats
# numYears:  Previous # of years
def generateURL(startYear, numYears):
    statIds = []
    for category in categoryLabels:
        categoryUrl = categoryUrlFormat % (category)
        page = requests.get(categoryUrl)
        html = BeautifulSoup(page.text.replace('\n',''), 'html.parser')
        for table in html.find_all("div", class_="table-content"):
            for link in table.find_all("a"):
                statIds.append(link['href'].split('.')[1])
    for statId in statIds:
        url = statUrlFormat % (statId, startYear)
        page = requests.get(url)
        html = BeautifulSoup(page.text.replace('\n',''), 'html.parser')
        stat = html.find("div", class_="main-content-off-the-tee-details").find('h1').text
        directory = "all_stats_html/%s" % stat.replace('/', ' ') #need to replace to avoid
        if not os.path.exists '(\Users\jrm12\OneDrive\Documents\GitHub\pga_analytics\)' : 
            os.makedirs '(\Users\jrm12\OneDrive\Documents\GitHub\pga_analytics\)'
        years = []
        for option in html.find("select", class_="statistics-details-select").find_all("option"):
            year = option['value']
            if year not in years and len(years) < numYears and year != "y2020":
                years.append(year)
        urlFilenamePairs = []
        for year in years:
            url = statUrlFormat % (statId, year)
            filename = "%s/%s.html" % (directory, year)
            if not os.path.isfile(filename):
                urlFilenamePairs.append((url, filename))
        jobs = [gevent.spawn(saveHTML, pair[0], pair[1]) for pair in urlFilenamePairs]
        gevent.joinall(jobs)

# Main
generateURL("y2019", 5)

Kevin Tewouda

Aug 20, 2020, 2:56:35 AM
to gev...@googlegroups.com
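Hi jrm,
The issue on lines 33/34 (the os.path.exists and os.makedirs calls) is that your quotes surround your parentheses, but it should be the other way around: the parentheses go on the outside and the quoted path inside them ('\Users...').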
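To make the point about quotes and parentheses concrete, here is a minimal sketch of the corrected call shape. The folder name pga_analytics_demo and the temporary base directory are assumptions chosen so the snippet runs anywhere; they are not the poster's real path. One related detail worth knowing: in Python 3 an ordinary string literal containing \U is read as a Unicode escape, so a Windows path written with backslashes is usually given as a raw string (r'...') or with forward slashes.

```python
import os
import tempfile

# Hypothetical target directory under a temp folder (assumption for the demo);
# substitute your own project path.
base = tempfile.mkdtemp()
target = os.path.join(base, "pga_analytics_demo")

# The parentheses belong to the function call; the quotes (or a variable
# holding the path string) go inside them.
if not os.path.exists(target):
    os.makedirs(target)

# A raw string keeps backslashes literal, so \U is not treated as an escape.
raw_path = r"C:\Users\example"

print(os.path.isdir(target))
```

The same shape applies to both calls in the script: os.path.exists(path) and os.makedirs(path).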
Some advice:
- You should probably use a good editor that can quickly show you this kind of error. Two editors I can recommend are Visual Studio Code (you need to install the Python extension) and PyCharm (the Community Edition is free).
- I assume you are coding in Python 3, so I recommend using the pathlib module to handle file paths instead of writing them by hand as you are doing. You can write something like this:
from pathlib import Path

path = Path('C:/Users/jrm12/OneDrive/Documents/GitHub/pga_analytics')
..
if not path.exists():
    path.mkdir()
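As a slightly fuller sketch of the same pathlib idea, the snippet below builds the per-stat folder and the per-year file name with the / operator and lets mkdir handle the "already exists" case. The temporary root folder and the stat name "Example/Stat" are assumptions for illustration; in the original script the stat name comes from the page's h1 element.

```python
import tempfile
from pathlib import Path

# A temp folder stands in for the project directory (assumption for the demo).
root = Path(tempfile.mkdtemp())

# Hypothetical stat name; slashes are replaced so it is a single folder name.
stat = "Example/Stat".replace("/", " ")

directory = root / "all_stats_html" / stat
# parents=True creates missing intermediate folders;
# exist_ok=True makes the call a no-op if the folder already exists.
directory.mkdir(parents=True, exist_ok=True)
directory.mkdir(parents=True, exist_ok=True)  # repeating the call is safe

# Build the output file path for one year and write a placeholder page.
filename = directory / "y2019.html"
filename.write_text("<html></html>", encoding="utf-8")

print(filename.is_file())
```

With parents=True and exist_ok=True the separate exists() check becomes optional, and Path objects can be passed directly to open().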

Hope this will help you for your scraping :)

Best regards

--
Tewouda T. R. Kevin
Computer engineer, specializing in software engineering and computer networks, 3IL
Holder of a post-master's degree in telecoms from Télécoms Paris Tech
Python developer at Gandi