Wednesday, 16 September 2009

Calculating PageRank with matrices



Google PageRank is calculated as a function of how many pages link to you, how many pages link to those pages, and so on.

Evidence suggests that Google calculates PageRank using matrix operations, not least because that would be much faster.

Here is how it is done.

Imagine six web pages that interlink as shown (see image, left).

From this, we create a 6x6 matrix, where each row represents a page and each column represents a page it can link to.

A link is a matrix cell of the form 1/x, where x is the total number of out-links from that page; every other cell is 0. So 1/2 means that the page has 2 out-links.


This is the matrix shown in fractions (above). It needs to be converted to decimals so I can feed it into Wolfram Alpha for the calculations (1/3 is rounded to 0.33):
{{0,0.5,0.5,0,0,0},
{0,0,0,0,0,0},
{0.33,0.33,0,0,0.33,0},
{0,0,0,0,0.5,0.5},
{0,0,0,0.5,0,0.5},
{0,0,0,1,0,0}}
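
As a rough sketch of how a matrix like this can be built from the link structure, here is a small Python example. The `links` dictionary is read off the rows of the matrix above (the original diagram is an image), and the function name `build_link_matrix` is just an illustrative choice. It uses exact fractions rather than rounded decimals.

```python
from fractions import Fraction

# Out-links for each page, read off the rows of the matrix above.
# Page numbers are 1-based; page 2 has no out-links.
links = {
    1: [2, 3],
    2: [],
    3: [1, 2, 5],
    4: [5, 6],
    5: [4, 6],
    6: [4],
}


def build_link_matrix(links):
    """Return a square matrix where cell [i][j] is 1/x if page i+1
    links to page j+1 (x = number of out-links of page i+1), else 0."""
    n = len(links)
    matrix = [[Fraction(0)] * n for _ in range(n)]
    for page, targets in links.items():
        for target in targets:
            matrix[page - 1][target - 1] = Fraction(1, len(targets))
    return matrix


link_matrix = build_link_matrix(links)
for row in link_matrix:
    print([str(cell) for cell in row])
```

Each printed row should match the corresponding row of the matrix, with 1/3 in place of 0.33.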

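To calculate the PageRank, we pre-multiply the matrix by the row vector {1,1,1,1,1,1}, known as the PageRank vector. This adds up each column, so each page's score is the total weight of the links pointing to it.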




This then yields the result:

{0.33, 0.83, 0.5, 1.5, 0.83, 1}

This shows the relative importance of each page. The highest score here is 1.5, which belongs to Page 4, so Page 4 would be deemed the most important.
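
For anyone without Wolfram Alpha to hand, the same calculation can be sketched in plain Python. This is only a minimal illustration of the single multiplication described above, using the decimal matrix from this post; the helper name `premultiply` is made up for the example.

```python
# The decimal link matrix from this post (1/3 rounded to 0.33).
matrix = [
    [0, 0.5, 0.5, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0.33, 0.33, 0, 0, 0.33, 0],
    [0, 0, 0, 0, 0.5, 0.5],
    [0, 0, 0, 0.5, 0, 0.5],
    [0, 0, 0, 1, 0, 0],
]

# The starting vector of ones.
vector = [1, 1, 1, 1, 1, 1]


def premultiply(vector, matrix):
    """Row vector times matrix: entry j is the sum over i of
    vector[i] * matrix[i][j]."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(matrix)))
        for j in range(len(matrix[0]))
    ]


scores = premultiply(vector, matrix)
rounded = [round(score, 2) for score in scores]

# Pages are numbered from 1, list indexes from 0.
best_page = scores.index(max(scores)) + 1

print(rounded)
print("Highest score belongs to page", best_page)
```

Because the vector is all ones, each score is simply the sum of one column of the matrix.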
