How do I find broken links on a website using Selenium WebDriver?


pavan raj

Mar 5, 2014, 1:21:34 AM
to webd...@googlegroups.com

darrell

Mar 5, 2014, 6:40:09 PM
to webd...@googlegroups.com
It would be possible, but there are better tools for checking links on a website. Have a read of http://darrellgrainger.blogspot.com/2011/12/using-right-tool-for-job.html. I'd check the docs for wget; there should be a way to get it to report or record broken links by crawling the entire website.



Chris Merrill

Mar 5, 2014, 8:42:34 PM
to webd...@googlegroups.com
It is worth mentioning that there are pros and cons to each approach. In some ways,
WebDriver is exactly the right tool for this job.


In the PRO column for using WebDriver:
- the ability to check links that are created dynamically by JS and don't appear in the page
source. This is _really_ difficult without a real browser or without making assumptions about
the page construction
- following links that are implemented via an onClick() method (see above)

In the CON column:
- harder to check for "page not found" (404s) without access to the HTTP status code - the
solution may need to be browser specific.
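
One possible way to work with both columns is to let the browser render the page, read the href values from the live DOM, and then check those URLs outside the browser where the status code is available. The sketch below covers only the middle step: it takes href strings that are assumed to have been collected with something like driver.findElements(By.tagName("a")) and getAttribute("href"), resolves them against the page URL, and keeps only http(s) links. The class and method names are made up for illustration, and it does nothing for links that exist only as onClick() handlers; those would still have to be exercised in the browser.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the class and method names are illustrative only.
class RenderedLinkFilter {

    // pageUrl: the address of the page the hrefs were read from.
    // hrefs: raw href values taken from the rendered DOM, for example by calling
    // getAttribute("href") on each element from driver.findElements(By.tagName("a")).
    static Set<String> checkableLinks(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        Set<String> result = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String href : hrefs) {
            if (href == null || href.trim().isEmpty()) {
                continue; // anchor without a usable href
            }
            URI resolved;
            try {
                resolved = base.resolve(href.trim()); // handles relative and absolute values
            } catch (IllegalArgumentException e) {
                continue; // not parseable as a URI, nothing to request
            }
            String scheme = resolved.getScheme();
            if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
                continue; // javascript:, mailto: and similar cannot be fetched over HTTP
            }
            // Drop the fragment: "#top" points into a page, not at a separate resource.
            String link = resolved.toString();
            int fragment = link.indexOf('#');
            result.add(fragment >= 0 ? link.substring(0, fragment) : link);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList("/about", "contact.html", "javascript:void(0)", "#top");
        System.out.println(checkableLinks("http://example.com/index.html", hrefs));
    }
}
```

The resulting set can then be handed to any HTTP client to read status codes, which sidesteps the browser-specific part of the CON above.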



darrell

Mar 6, 2014, 9:27:47 AM
to webd...@googlegroups.com
Chris is correct that tools like wget might not follow JavaScript links; I'm assuming you will spike on a variety of tools to find out what works for you. The important thing is not to reinvent what someone has probably already done.

The site I just released had a trigger such that if the User-Agent was a known bot, e.g. "User-Agent: googlebot", it would resolve the JavaScript server side and give the bot a fully resolved page without any JavaScript links. If the site you are testing does not do this, then wget would not be sufficient. Even in that case, I would still not use WebDriver to find all the broken links.
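
To see what such a site serves to a bot, you can request a page with the bot's User-Agent header set. Below is a minimal sketch using the JDK's HttpURLConnection. The class name, the method name and the example.com address are placeholders, and the "googlebot" value is just the example string from above; use whatever string your site's bot detection actually matches on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; the class and method names are illustrative only.
class BotUserAgentFetch {

    // Assumed value, taken from the example above; adjust to match your site's bot detection.
    static final String BOT_USER_AGENT = "googlebot";

    // Prepares a GET request that identifies itself as a bot.
    // Nothing is sent until the caller reads the response (e.g. getResponseCode()).
    static HttpURLConnection openAsBot(String pageUrl) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
            connection.setRequestMethod("GET");
            connection.setRequestProperty("User-Agent", BOT_USER_AGENT);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass the page you want to inspect as the first argument.
        String pageUrl = args.length > 0 ? args[0] : "http://example.com/";
        HttpURLConnection connection = openAsBot(pageUrl);
        System.out.println("Status as bot: " + connection.getResponseCode());
        connection.disconnect();
    }
}
```

Comparing the body returned this way with the body returned for a normal browser User-Agent should show whether the server really hands bots a page with plain links.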

Writing the algorithm to find all the links is hard when they aren't necessarily anchors but could be things like a div with an onclick handler or other event-driven elements; it is difficult to be sure you have found all the ways the application creates 'links'. There are tools people use for web crawling, like http://www.httrack.com/, http://nutch.apache.org/, http://code.google.com/p/crawler4j/ or http://www.sphider.eu/. I'm sure if you go to the Google, Yahoo or Bing sites for Search Engine Optimization (SEO), they will suggest even more modern tools which traverse an entire website.

If these tools don't work for traversing your website, then rather than building a tool which does, it might be better to change your website to work with them. If crawlers cannot traverse your site, it will not get properly indexed by search engines like Google, Yahoo and Bing.

Aditya Aggarwal

Mar 7, 2014, 3:32:25 AM
to webd...@googlegroups.com
You can use the HttpURLConnection class to read the HTTP response code of a link:

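import java.net.HttpURLConnection;
import java.net.URL;

class ResponseCodeCheck
{

    public static void main (String[] args) throws Exception
    {

        // Open an HTTP connection to the link under test
        URL url = new URL("http://google.com");
        HttpURLConnection connection = (HttpURLConnection)url.openConnection();
        connection.setRequestMethod("GET");
        connection.connect();

        // The status code tells us whether the link resolved
        int code = connection.getResponseCode();
        System.out.println("Response code of the object is "+code);
        if (code==200)
        {
            System.out.println("OK");
        }

        // Release the connection once the code has been read
        connection.disconnect();
    }
}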
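
The same idea can be extended from one URL to a list of links. The following is only a sketch under a few assumptions of mine: 4xx and 5xx responses and failed requests count as broken, HEAD requests are enough for your server, and a 5 second timeout is acceptable. The class and method names are made up for illustration. The status lookup is passed in as a function so the reporting logic can be tried without touching the network.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper; names and timeouts are illustrative only.
class BrokenLinkReport {

    // Assumption: 4xx and 5xx responses, plus requests that fail outright, count as broken.
    static boolean isBroken(int statusCode) {
        return statusCode < 0 || statusCode >= 400;
    }

    // Returns the HTTP status code for the link, or -1 if no valid response was obtained.
    static int fetchStatus(String link) {
        try {
            URLConnection raw = URI.create(link).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1; // not an http(s) link
            }
            HttpURLConnection connection = (HttpURLConnection) raw;
            connection.setRequestMethod("HEAD"); // headers only; some servers may require GET
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            try {
                return connection.getResponseCode();
            } finally {
                connection.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            return -1; // malformed link, unknown host, timeout, refused connection
        }
    }

    // Returns one "link -> code" entry for every link whose status counts as broken.
    static List<String> brokenLinks(List<String> links, ToIntFunction<String> statusOf) {
        List<String> broken = new ArrayList<>();
        for (String link : links) {
            int code = statusOf.applyAsInt(link);
            if (isBroken(code)) {
                broken.add(link + " -> " + code);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Links to check are passed on the command line.
        for (String line : brokenLinks(Arrays.asList(args), BrokenLinkReport::fetchStatus)) {
            System.out.println(line);
        }
    }
}
```

HttpURLConnection follows same-protocol redirects by default, so a 3xx code usually only shows up here when the redirect could not be followed; decide for yourself whether that should count as broken.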

