Weird issue trying to get Scrapy to run on Windows Scheduled Task

131 views

alvin. zing

8 May 2017, 8:38:32 AM
to scrapy-users
So I am trying to run the script below from a Windows Scheduled Task.
The weird thing is that when I run it by hand in a command prompt, it works.
However, when the Scheduled Task runs it, it only prints the output below and then the program ends.
I am struggling with this; may the community please help me?

Thank you in advance.



[The Command Prompt output when run from the ST]
C:\Windows\system32>python "C:\Users\xyz\Google Drive\cineplex\start.py" seatings
2017-05-06 21:47:03 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-05-06 21:47:03 [scrapy.utils.log] INFO: Overridden settings: {}

C:\Windows\system32>pause
Press any key to continue . . .



[The Python script I am running]
from cineplex import utils
from cineplex.spiders import showtimes_spider as st
from cineplex.spiders import seatings_spider as seat
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import sys
import time
from twisted.internet import reactor, defer

'''
Constant for Parent Directory.
Subfolders will contain all movie times and seatings for the day
'''
PARENT_DIR = r'./data/'


'''
Crawls all Seatings per Cinema
'''
def crawl_all_seatings():
    # Create a CrawlerProcess instance to run multiple spiders simultaneously
    process = CrawlerProcess()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all showtimes files' filepaths
    filepaths = utils.get_all_showtimes_filepaths(directory_for_today)

    # In every filepath is a file with all the movie session ids
    for filepath in filepaths:
        sessions = utils.get_all_sessions(filepath)
        # Only start crawling if there are sessions.
        if len(sessions) > 0:
            # Add spiders to the crawler process
            for session_id in sessions:
                process.crawl(seat.SeatingsSpider, session_id=session_id, output_dir=directory_for_today)

    # Start crawling
    process.start()


'''
Crawls all Cinemas' movies' showtimes
'''
def crawl_all_showtimes():
    # Create a CrawlerProcess instance to run spiders simultaneously
    process = CrawlerProcess()

    # Check folder for today
    directory_for_today = utils.create_dir_for_today(PARENT_DIR)

    # Get all cinema ids and names first
    cinema_dict = utils.get_all_cinemas()

    # Iterate through all cinemas to get show timings
    # Add spiders to the crawler process
    for cinema_id, cinema_name in cinema_dict.iteritems():
        process.crawl(st.ShowTimesSpider, cinema_id=cinema_id, cinema_name=cinema_name, output_dir=directory_for_today)

    # Start crawling
    process.start()


'''
Main program run spiders
'''
def main(argv):
    # Turns on Scrapy logging
    # configure_logging()

    crawl_type = argv[1]
    if crawl_type == 'showtimes':
        # Collect all Showtimes
        crawl_all_showtimes()

    elif crawl_type == 'seatings':
        # Collect all Seatings
        crawl_all_seatings()

    else:
        print 'usage: showtimes for crawling show timings or seatings to crawl seat occupancy'



if __name__ == "__main__":
    # main(sys.argv)
    main(['', 'seatings'])

    # Exit the program
    sys.exit()

ObserverEffect

10 May 2017, 9:46:52 AM
to scrapy-users
In Windows Task Scheduler, did you specify the optional "Start in" folder setting?

Maybe set it to C:\Users\xyz\Google Drive\cineplex\
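If you also want the script itself to be immune to whatever working directory Task Scheduler happens to use, a minimal sketch (assuming the original start.py layout, where the data folder sits next to the script) is to resolve PARENT_DIR against the script's own location instead of the relative path r'./data/':

```python
import os

# Minimal sketch: build PARENT_DIR from this script's location rather than
# the process's current working directory, so the Scheduled Task can find
# the data folder even when "Start in" is not set.
# ('data' matches the r'./data/' constant in the original start.py;
# adjust it to your own layout.)
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
PARENT_DIR = os.path.join(SCRIPT_DIR, 'data')
```

This keeps the crawl output in the same place regardless of how the script is launched.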