Hey there,
If you're interested in running LinkRank or PageRank, there are a number of options. If it's your first time learning about and running these algorithms, I'd suggest either a smaller dataset or a preprocessed one such as the
Hyperlink Graph provided by the Web Data Commons, which is based on the Common Crawl dataset. You're correct that the dataset already has all the information you need: each page comes with a list of its outgoing links, which is the only input these algorithms require.
PageRank is an iterative algorithm, meaning it needs multiple passes over the data. To run as fast as possible, the whole dataset should fit in memory, which is where the sheer size of Common Crawl complicates things.
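To make the iterative idea concrete, here's a minimal sketch of PageRank over an in-memory adjacency list. The graph, the `pagerank` function name, and parameters like `damping` are all my own illustrative choices, not from any particular library; real input (e.g. the Hyperlink Graph) would just be parsed into the same dict-of-outgoing-links shape.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for outs in links.values():
        pages.update(outs)
    n = len(pages)

    # Start with uniform rank.
    rank = {p: 1.0 / n for p in pages}

    # Each iteration is one full pass over the data -- this loop is why
    # you want the whole graph in memory.
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # A page shares its rank equally among its outgoing links.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank
                # uniformly so the total mass stays at 1.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Tiny hypothetical graph: "c" is linked to by both "a" and "b".
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

A fixed iteration count keeps the sketch short; in practice you'd iterate until the ranks stop changing by more than some small tolerance.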
I'd suggest reading
Wiki Pagerank using Hadoop to see how to implement the iterative algorithm I mentioned, or using a pre-existing package that provides PageRank, such as
GraphX [which runs on Spark, which runs on top of Hadoop -- it's turtles all the way down!].