How to Get the Current URL of an Item in a Loop Using Selenium

Miracle Akinsola Ayodele

Nov 30, 2017, 2:40:02 AM

to Selenium Users

I'm trying to scrape a website with Selenium, but I keep getting:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: headless chrome=62.0.3202.94)
(Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.4.0-101-generic x86_64)

What could be the issue? Here is the code:

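from datetime import datetime

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def get_financial_info(self):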
    # instantiate a chrome options object so you can set the size and headless preference
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--window-size=1920,1080")  # Chrome expects "width,height"
    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path='/home/miracle/chromedriver')

    driver.get("https://www.financialjuice.com")

    try:
        WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='trendWrap']")))
    except TimeoutException:
        driver.quit()
        return  # the driver is closed, so stop here instead of using it below

    category_url = [a.get_attribute("href") for a in
                    driver.find_elements_by_xpath("//ul[@class='nav navbar-nav']/li[@class='text-uppercase']/a[@href]")]

    for record in category_url:
        driver.get(record)
        item = {}
        cat = driver.find_elements_by_xpath("//h2[@class='text-uppercase corpName']")
        title_element = driver.find_elements_by_xpath("//p[@class='headline-title']")
        source_element = driver.find_elements_by_xpath("//p[@class='time']/span[@class='resource-name']/a")
        url_element = [a.get_attribute('href') for a in driver.find_elements_by_xpath("//p[@class='headline-title']/a")]

        categories = []

        for category in cat:
            categories.append(category.text)

        for title, source, urls in zip(title_element, source_element, url_element):
            item['category'] = str(categories)[1:-1].strip('"u')
            item['title'] = title.text
            item['source'] = source.text
            item['date'] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            driver.get(urls)
            item['url'] = driver.current_url
            print(item)

Scott Babcock

Nov 30, 2017, 8:19:10 PM

to Selenium Users

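You're navigating away from the page where you found the elements. The driver.get(urls) call inside the loop loads a new page, so the title and source element references from the category page are no longer attached to the document. Instead of working with the element references directly, pull the values you need (the text and the [href] attribute) out of them before you navigate, and use those values instead.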
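A minimal sketch of that approach, not the original poster's code. It reuses the XPaths from the question and assumes the find_elements(by, value) method available in both Selenium 3 and 4, where "xpath" is the string value of By.XPATH. The function name scrape_category and the dictionary layout are hypothetical. It also joins the category names with commas instead of the question's str() slicing. Every text and href value is copied into a plain string while the category page is loaded, so the later driver.get() calls cannot make them stale.

```python
from datetime import datetime

# XPaths copied from the question; they depend on the site's markup at the time.
CATEGORY_XPATH = "//h2[@class='text-uppercase corpName']"
TITLE_XPATH = "//p[@class='headline-title']"
SOURCE_XPATH = "//p[@class='time']/span[@class='resource-name']/a"
LINK_XPATH = "//p[@class='headline-title']/a"


def scrape_category(driver, category_url):
    """Hypothetical helper: return one dict per headline on category_url.

    All values are read out of the WebElements while the category page is
    still loaded, so visiting each headline link afterwards cannot raise
    StaleElementReferenceException for them.
    """
    driver.get(category_url)
    # "xpath" is the value of By.XPATH, so no extra Selenium import is needed.
    categories = [e.text for e in driver.find_elements("xpath", CATEGORY_XPATH)]
    titles = [e.text for e in driver.find_elements("xpath", TITLE_XPATH)]
    sources = [e.text for e in driver.find_elements("xpath", SOURCE_XPATH)]
    links = [e.get_attribute("href") for e in driver.find_elements("xpath", LINK_XPATH)]

    items = []
    for title, source, link in zip(titles, sources, links):
        # Leaving the page is safe now: only plain strings are used below.
        driver.get(link)
        items.append({
            "category": ", ".join(categories),
            "title": title,
            "source": source,
            "date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            # current_url reflects any redirect the headline link went through.
            "url": driver.current_url,
        })
    return items
```

In the question's loop, this would replace the body of "for record in category_url:" with something like items = scrape_category(driver, record), followed by printing each item.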
