Passing Variables into find_all()

Mike L

Oct 7, 2023, 10:37:12 PM
to beautifulsoup
Hi, I'm trying to set up a tool that passes a variable to a function, which then passes it on to soup.find_all(), like so:

```py
import requests
from bs4 import BeautifulSoup

def fetch_chosen_stat(selected_url, selected_attr):
    if selected_url:
        # Download the page and parse it
        response = requests.get(selected_url)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            print(selected_attr)
            chosen_stat = soup.find_all(selected_attr)
            print(chosen_stat)
            # Print the text of every matching element
            if chosen_stat:
                for stat in chosen_stat:
                    print(stat.get_text())
            else:
                print('Data not found.')
        else:
            print('Failed to retrieve the web page.')
```

(The selected_attr variable is taken from a list)
"attrs={'class': 'left', 'data-stat': 'opp'}"

When I pass this variable, I get an empty list.

Output:
attrs={'class': 'left', 'data-stat': 'opp'}
[]
Data not found.



I'm wondering if anyone who has experience with this concept can point me in the right direction.
I've found very little help searching online or asking ChatGPT. Is it impossible for soup to parse the variable?
Is there a better way to scrape specific data based on what is passed to the function?

Isaac Muse

Oct 9, 2023, 12:10:10 PM

I imagine your issue is that you need to specify

chosen_stat = soup.find_all(attrs=selected_attr)
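A minimal sketch of that suggestion, assuming `selected_attr` holds an actual Python dict rather than a string. The HTML snippet is a hypothetical stand-in for the downloaded page, loosely modeled on the `data-stat="opp"` cells discussed in this thread, so no network request is made.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html = """
<table>
  <tr><td class="left" data-stat="opp">Washington Commanders</td><td data-stat="pts">20</td></tr>
  <tr><td class="left" data-stat="opp">New York Giants</td><td data-stat="pts">17</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# This must be a real dict, not a string that looks like one
selected_attr = {"class": "left", "data-stat": "opp"}

# find_all(name=None, attrs={}, ...): the first positional slot is the tag
# name, so the attribute dict has to be passed by keyword.
chosen_stat = soup.find_all(attrs=selected_attr)
names = [stat.get_text() for stat in chosen_stat]
print(names)  # ['Washington Commanders', 'New York Giants']
```

Only cells carrying both `class="left"` and `data-stat="opp"` match; the `pts` cells are skipped.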

Mike L

Oct 9, 2023, 1:13:33 PM
I'll give that a try when I'm able, but I think it still gave me the same issue. Thanks for replying.

Isaac Muse

Oct 9, 2023, 1:20:47 PM
Well, the reason it definitely wasn't working is that you were omitting the first argument (`name`) and passing the value meant for the second argument (`attrs`) in the first argument's position. That is absolutely wrong.

If you are still not finding the elements you want, there may be a disconnect between what you think is in the HTML and what is actually in it. In that case, you would need to provide a reproducible example that people can evaluate to determine where you are going wrong; based on what you've provided, evaluation by a third party such as myself is impossible. I'd dump the HTML content and make sure the content is actually there when you download it. Checking the content in your web browser may not be sufficient: if JavaScript adds attributes after the page loads, those won't be seen when downloading via requests.
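A rough way to do that check, sketched with hypothetical helper names (`dump_html`, `count_marker`) and a hypothetical snippet in place of `response.text` so it runs offline. A plain substring count is only a quick sanity check: the raw HTML may quote or order attributes differently from what the browser's inspector shows.

```python
import tempfile
from pathlib import Path

def dump_html(html, path):
    """Write the raw HTML exactly as downloaded so it can be opened and inspected."""
    path = Path(path)
    path.write_text(html, encoding="utf-8")
    return path

def count_marker(html, marker):
    """Count how often an expected marker (e.g. an attribute) appears in the raw HTML."""
    return html.count(marker)

# In the real script this would be response.text from requests.get(selected_url)
downloaded = (
    '<td class="left" data-stat="opp">New York Giants</td>'
    '<td class="left" data-stat="opp">Dallas Cowboys</td>'
)

# Save to a temporary directory here; a real script would pick a fixed path
with tempfile.TemporaryDirectory() as tmp:
    saved = dump_html(downloaded, Path(tmp) / "page_dump.html")
    round_trip = saved.read_text(encoding="utf-8")

hits = count_marker(downloaded, 'data-stat="opp"')
print(hits)  # 2
```

If the count is zero even though the data shows up in the browser, that points toward the JavaScript issue described above (or simply different quoting in the raw markup).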

Mike L

Oct 9, 2023, 3:47:12 PM
Ok, I added some output to show what is working, what isn't, and what I'm attempting to accomplish.

This is my code:

```py
import requests
from bs4 import BeautifulSoup

def fetch_chosen_stat(selected_url, selected_attr):
    if selected_url:
        response = requests.get(selected_url)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            print(selected_attr)
            print(soup)
            # Lookup using the variable passed into the function
            chosen_stat = soup.find_all(selected_attr)
            # Same lookup with the attributes written out directly
            chosen_stat_hardcoded = soup.find_all(attrs={'class': 'left', 'data-stat': 'opp'})

            if chosen_stat:
                for stat in chosen_stat:
                    print(stat.get_text())
            else:
                print('Data not found.\n')

            if chosen_stat_hardcoded:
                for stat in chosen_stat_hardcoded:
                    print(stat.get_text())
            else:
                print('Hardcoded Data not found.')
        else:
            print('Failed to retrieve the web page.')
```

This is the output I'm getting (unnecessary sections abbreviated):
_______________________________________
attrs={'class': 'left', 'data-stat': 'opp'}

<!DOCTYPE html>

<html class="no-js" data-root="/home/pfr/build" data-version="klecko-" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="ie=edge" http-equiv="x-ua-compatible"/>
<meta content="width=device-width, initial-scale=1.0, maximum-scale=2.0" name="viewport">
<link href="https://cdn.ssref.net/req/202309261" rel="dns-prefetch"/>
<!-- Quantcast Choice. Consent Manager Tag v2.0 (for TCF 2.0) -->
<script async="true" type="text/javascript">
    (function() {
        var host = window.location.hostname;
        var element = document.createElement('script');
        var firstScript = document.getElementsByTagName('script')[0];
        var url = 'https://cmp.quantcast.com'
            .concat('/choice/', 'XwNYEpNeFfhfr', '/', host,
                    '/choice.js?tag_version=V2');
        var uspTries = 0;
        var uspTriesLimit = 3;
        element.async = true;
        element.type = 'text/javascript';
        element.src = url;

...

<!-- End Google Analytics -->
<!-- Start of HubSpot Embed Code -->
<script async="" defer="" id="hs-script-loader" src="//js.hs-scripts.com/20503178.js" type="text/javascript"></script>
<!-- End of HubSpot Embed Code -->
</body>
<!-- SR -->
</html>

Data not found.

Washington Commanders
New York Giants
Dallas Cowboys
San Francisco 49ers
Cincinnati Bengals
Los Angeles Rams
Seattle Seahawks
Baltimore Ravens
Cleveland Browns
Atlanta Falcons
Houston Texans
Los Angeles Rams
Pittsburgh Steelers
Bye Week
San Francisco 49ers
Chicago Bears
Philadelphia Eagles
Seattle Seahawks
_________________________________


What I'm seeing is that the function is called correctly and selected_attr is the correct string.

The full HTML page is also retrieved and parsed by soup, but find_all(selected_attr) does not find the data, while the hardcoded find_all(attrs={'class': 'left', 'data-stat': 'opp'}) does.

I had also tried the following, with the same result:

```py
selected_attr = "attrs={'class': 'left', 'data-stat': 'opp'}"

chosen_stat = soup.find_all(attrs=selected_attr)
```
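Why that attempt still returns an empty list: `selected_attr` here is a `str` that happens to contain Python-looking text, not a dict. Passed positionally, a string is treated as the tag name to search for; passed as `attrs=`, a non-dict value is matched against the `class` attribute. Neither can match. A small offline check, using a hypothetical one-row table:

```python
from bs4 import BeautifulSoup

html = '<table><tr><td class="left" data-stat="opp">Dallas Cowboys</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# What the combo box currently produces: a string that merely looks like code
as_string = "attrs={'class': 'left', 'data-stat': 'opp'}"
# What find_all() actually needs: a real dict
as_dict = {"class": "left", "data-stat": "opp"}

# Searches for a tag literally named "attrs={'class': ...}" -> nothing
by_name = soup.find_all(as_string)
# A non-dict attrs value is compared with the class attribute -> nothing
by_class = soup.find_all(attrs=as_string)
# A real dict is matched attribute by attribute
by_dict = soup.find_all(attrs=as_dict)

print(type(as_string).__name__, len(by_name), len(by_class))  # str 0 0
print([tag.get_text() for tag in by_dict])                     # ['Dallas Cowboys']
```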


Isaac Muse

Oct 9, 2023, 4:49:26 PM
As I said, you need to use:

```py
chosen_stat = soup.find_all(attrs=selected_attr)
```

Not

```py
chosen_stat = soup.find_all(selected_attr)
```

It isn't the fact that you've hardcoded it that makes it work; it's the way you are passing the argument. Please use `attrs=`.

Mike L

Oct 9, 2023, 5:14:09 PM
I see. I have updated my code to reflect that.

The variable comes from a dictionary:

stat_list = {
    "opponent": "attrs={'class': 'left', 'data-stat': 'opp'}"
...
}

I select the opponent key with a combo box, which sets the variable to the attrs= string.

Would you recommend I store the variables in another way?

Thank you

Isaac Muse

Oct 10, 2023, 9:48:50 AM
I don't think I understand: why are you storing it as a string? I don't have the context of how you plan to use it, so I'm not sure I can answer.
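One way to store the lookups, sketched under assumptions: keep real dicts in `stat_list` instead of strings, so the key chosen in the combo box maps straight to something `find_all(attrs=...)` accepts. Only the "opponent" entry comes from this thread; the helper name `fetch_chosen_stat_from_html` and the choice to pass already-downloaded HTML (rather than a URL) are hypothetical, made so the sketch runs offline.

```python
from bs4 import BeautifulSoup

# Each value is a real dict of attribute filters, not a string of code
stat_list = {
    "opponent": {"class": "left", "data-stat": "opp"},
}

def fetch_chosen_stat_from_html(html, stat_key):
    """Hypothetical helper: return the text of every element matching stat_list[stat_key]."""
    soup = BeautifulSoup(html, "html.parser")
    selected_attr = stat_list[stat_key]  # e.g. the key picked in the combo box
    chosen_stat = soup.find_all(attrs=selected_attr)
    return [stat.get_text() for stat in chosen_stat]

html = """
<tr><td class="left" data-stat="opp">Cincinnati Bengals</td></tr>
<tr><td class="left" data-stat="opp">Bye Week</td></tr>
"""
print(fetch_chosen_stat_from_html(html, "opponent"))
```

Keeping the filters as data avoids having to turn text back into Python. If they must live as text (say, in a config file), `ast.literal_eval` can parse a bare dict literal such as `"{'class': 'left', 'data-stat': 'opp'}"`, but not a string carrying the `attrs=` prefix.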