Weird issue trying to get Scrapy to run on Windows Scheduled Task

131 views

alvin. zing

8 May 2017, 8:38:32 AM
to scrapy-users
So I am trying to run the script below from a Windows Scheduled Task.
The weird thing is that when I run it by hand in a command prompt, it works.
However, when the Scheduled Task runs it, it only prints the output below and then the program ends.
I am struggling with this; may the community please help me?

Thank you in advance.



[The Command Prompt output when run from the ST]
C:\Windows\system32>python "C:\Users\xyz\Google Drive\cineplex\start.py" seatings
2017-05-06 21:47:03 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-05-06 21:47:03 [scrapy.utils.log] INFO: Overridden settings: {}

C:\Windows\system32>pause
Press any key to continue . . .



[The Python script I am running]
from cineplex import utils
from cineplex.spiders import showtimes_spider as st
from cineplex.spiders import seatings_spider as seat
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import sys
import time
from twisted.internet import reactor, defer

'''
Constant for Parent Directory.
Subfolders will contain all movie times and seatings for the day
'''
PARENT_DIR = r'./data/'


'''
Crawls all Seatings per Cinema
'''
def crawl_all_seatings():
    # Create a CrawlerProcess instance to run multiple spiders simultaneously
    process = CrawlerProcess()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all showtimes files' filepaths
    filepaths = utils.get_all_showtimes_filepaths(directory_for_today)

    # In every filepath is a file with all the movie session ids
    for filepath in filepaths:
        sessions = utils.get_all_sessions(filepath)
        # Only start crawling if there are sessions.
        if len(sessions) > 0:
            # Add spiders to the crawler process
            for session_id in sessions:
                process.crawl(seat.SeatingsSpider, session_id=session_id, output_dir=directory_for_today)

    # Start crawling
    process.start()


'''
Crawls all Cinemas' movies' showtimes
'''
def crawl_all_showtimes():
    # Create a CrawlerProcess instance to run spiders simultaneously
    process = CrawlerProcess()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all cinema ids and names first
    cinema_dict = utils.get_all_cinemas()

    # Iterate through all cinemas to get show timings
    # Add spiders to the crawler process
    for cinema_id, cinema_name in cinema_dict.iteritems():
        process.crawl(st.ShowTimesSpider, cinema_id=cinema_id, cinema_name=cinema_name, output_dir=directory_for_today)

    # Start crawling
    process.start()


'''
Main program run spiders
'''
def main(argv):
    # Turns on Scrapy logging
    # configure_logging()

    crawl_type = argv[1]
    if crawl_type == 'showtimes':
        # Collect all Showtimes
        crawl_all_showtimes()

    elif crawl_type == 'seatings':
        # Collect all Seatings
        crawl_all_seatings()

    else:
        print 'usage: showtimes for crawling show timings or seatings to crawl seat occupancy'



if __name__ == "__main__":
    # main(sys.argv)
    main(['', 'seatings'])

    # Exit the program
    sys.exit()

ObserverEffect

10 May 2017, 9:46:52 AM
to scrapy-users
In Windows Task Scheduler, did you specify the optional "Start in" folder setting?

Maybe set it to C:\Users\xyz\Google Drive\cineplex\
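If you also want the script itself to be immune to whatever working directory Task Scheduler happens to use, a minimal sketch (assuming the original start.py layout, where the data folder sits next to the script) is to resolve PARENT_DIR against the script's own location instead of the relative path r'./data/':

```python
import os

# Minimal sketch: build PARENT_DIR from this script's location rather than
# the process's current working directory, so the Scheduled Task can find
# the data folder even when "Start in" is not set.
# ('data' matches the r'./data/' constant in the original start.py;
# adjust it to your own layout.)
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
PARENT_DIR = os.path.join(SCRIPT_DIR, 'data')
```

This keeps the crawl output in the same place regardless of how the script is launched.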