Need help with Beautifulsoup, not working as expected.


W. D. Hare

Apr 21, 2021, 9:21:01 PM
to beautifulsoup
I am very new to Python. I have copied some code and modified it for my requirements, but I am not getting the expected results. Any assistance is GREATLY appreciated!!!

Expected Result:
If the page at the assigned URL contains the word "Wine" exactly 1 time, wait 15 seconds, loop back and start the code over again.
Else, print message.
Actual Result:
The page at the URL I have provided contains "Wine" 1 time, but the code is taking the "else" branch and printing the message.

My Code:
# Import requests (to download the page)
import requests

# Import BeautifulSoup (to parse what we download)
from bs4 import BeautifulSoup

# Import time (to add a delay between the times the scrape runs)
import time

# Import smtplib (to allow us to email)
import smtplib

# Import URLOPEN
from urllib.request import urlopen

# while this is true (it is true by default),
while True:
    # set the url,
    
    # set the headers like we are a browser,
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    
    # download the URL
    response = requests.get(url, headers=headers)
    
    # parse the downloaded URL and grab all text
    soup = BeautifulSoup(response.text, "lxml")
    
    # if the number of times the defined word occurs on the page is 1,
    if str(soup).find('Wine') == 1:
        # print "Please wait 15 seconds",
        print("Please wait 15 seconds")
        # pause 15 seconds,
        time.sleep(15)
        # continue with the script,
        continue
        
    # but if the defined word occurs any other number of times,
    else:
        print("Word occurs on website more or less than 1 time.")
        break

facelessuser

Apr 21, 2021, 9:34:29 PM
to beautifulsoup

You are casting the soup object to a string and then using the string’s find method that returns an integer indicating the position of the found word and -1 if nothing is found. The word is not found at 1, but it is found at position 287. You should change the logic to str(soup).find('Wine') != -1:.
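To make the difference concrete, here is a minimal sketch contrasting the two string methods. The `html` value is a made-up stand-in for the downloaded page, not the asker's actual page. Note that `!= -1` only tests whether the word is present at all; if the goal really is "exactly 1 time", `str.count` is the method that returns the number of occurrences.

```python
# Hypothetical sample markup standing in for the downloaded page.
html = "<html><body><p>Red Wine and white Wine</p></body></html>"

# str.find returns the index of the first match, or -1 if there is none.
position = html.find("Wine")

# str.count returns the number of non-overlapping matches.
occurrences = html.count("Wine")

print("first match at index:", position)
print("number of matches:", occurrences)

word_present = position != -1    # True if the word appears at least once
exactly_once = occurrences == 1  # True only if the word appears exactly once
```

Comparing the result of `find` to 1 asks "does the word start at index 1?", which is almost never the intended question.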

In the future, if you are confused about why something is not working, you should, at the very least, put in some print statements so you can see what is actually happening. Simply adding print(str(soup).find('Wine')) would have shown you exactly why things were going wrong.
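Putting both points together, one possible shape for the loop is sketched below. This is an illustration under assumptions, not the original script: `count_word`, `poll`, `fetch`, and `max_checks` are hypothetical names, and the download is passed in as a function so the logic can be tried without a network connection.

```python
import time


def count_word(page_text, word):
    """Return how many times `word` occurs in `page_text`."""
    return page_text.count(word)


def poll(fetch, word, delay_seconds=15, max_checks=3):
    """Re-check the page while `word` occurs exactly once.

    `fetch` is a hypothetical zero-argument callable that returns the
    page text. Returns the count seen on the last check.
    """
    count = None
    for _ in range(max_checks):
        count = count_word(fetch(), word)
        # Diagnostic print: shows what the check actually sees.
        print(f"{word!r} occurs {count} time(s)")
        if count != 1:
            break
        time.sleep(delay_seconds)
    return count


# Usage with canned pages instead of a real download (assumed data).
pages = iter(["<p>Wine</p>", "<p>Wine and more Wine</p>"])
last_count = poll(lambda: next(pages), "Wine", delay_seconds=0)
```

With the original script, `fetch` could be something like `lambda: requests.get(url, headers=headers).text`. Counting on `soup.get_text()` instead of `str(soup)` would ignore matches inside tags and attributes, which may or may not be what is wanted.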

W. D. Hare

Apr 22, 2021, 6:09:51 PM
to beautifulsoup
Thank you!  The comments that were provided in the code I borrowed made it seem like it was counting the number of times the word was appearing in the text.  Thank you for clarifying for me; I understand now.  This is a big help.