ImplicitStructure Review


Implicit Structure and the Dynamics of Blogspace

by Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose

Reviewed 1/13/2006

The goal of this paper is to track 'information epidemics' and infer the routes they take through Blogspace.

They are able to cluster information spread into four different patterns: sustained interest, peak on day two with slow decay, peak on day one with slow decay, and peak on day one with fast decay.

iRank is able to rank the importance of blogs better than PageRank, in that more importance can be given to blogs that tend to appear earlier in the information spread and are hence better sources. iRank is also claimed to be a more dynamic solution than PageRank. iRank adds an edge between two blogs any time one blog posted a link before the other. Each edge is weighted by the time between the two occurrences (following an exponential curve that peaks on days 1 and 2). Edges are then merged and PageRank is run on this new implicit graph. Mentioning a link before it becomes widely used will therefore greatly increase the iRank of a blog.

The dataset comprised 37,153 blogs and 175,712 links appearing more than once. However, they used only 259 distinct URLs for the analysis, after filtering out portals, old memes, and sparsely used links. Infection routes are inferred by measuring blog community similarity, general link similarity, textual similarity, and the historic relative infection timing between the two blogs.
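The iRank construction described above can be sketched roughly as follows. This is only an illustration of the idea as summarized in this review, not the authors' implementation. The weight function `timing_weight`, its `scale` parameter, the choice to point edges from the later poster to the earlier one, and the choice to ignore same-day postings are all assumptions made for the sketch.

```python
import math
from collections import defaultdict


def timing_weight(delay_days, scale=2.0):
    # Hypothetical weight: largest for a one-day delay, decaying
    # exponentially afterwards. The paper's actual curve may differ.
    return math.exp(-(delay_days - 1) / scale)


def build_implicit_graph(postings):
    # postings maps url -> {blog: day on which the blog first posted the url}.
    # Assumption: an edge points from the later poster to the earlier one,
    # so that rank flows toward likely sources. Same-day postings are
    # skipped because day-level timestamps cannot order them.
    weights = defaultdict(float)
    for days in postings.values():
        for later, day_later in days.items():
            for earlier, day_earlier in days.items():
                delay = day_later - day_earlier
                if delay > 0:
                    # Merge edges across URLs by summing their weights.
                    weights[(later, earlier)] += timing_weight(delay)
    return weights


def pagerank(weights, nodes, damping=0.85, iterations=100):
    # Plain power-iteration PageRank on a weighted directed graph.
    out_total = defaultdict(float)
    for (src, _dst), w in weights.items():
        out_total[src] += w
    n = len(nodes)
    rank = {b: 1.0 / n for b in nodes}
    for _ in range(iterations):
        # Blogs with no outgoing edges spread their rank uniformly.
        dangling = sum(rank[b] for b in nodes if out_total[b] == 0.0)
        new_rank = {b: (1 - damping) / n + damping * dangling / n for b in nodes}
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


def irank(postings):
    nodes = sorted({blog for days in postings.values() for blog in days})
    return pagerank(build_implicit_graph(postings), nodes)


if __name__ == "__main__":
    # Toy data: blog A posts both URLs first, C is always last.
    toy = {
        "url1": {"A": 0, "B": 1, "C": 2},
        "url2": {"A": 0, "C": 1},
    }
    for blog, score in sorted(irank(toy).items(), key=lambda kv: -kv[1]):
        print(blog, round(score, 3))
```

On the toy data, blog A, which always posts first, ends up with the highest score, which matches the observation that mentioning a link early raises a blog's iRank.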


This paper tracks blog postings only to the nearest day and tracks only links (no topics or words). Otherwise it is novel in tracking the flow of topics and attempting to rank users' importance. It might be interesting if the category of each link were taken into account in the iRank calculation, but it is not clear how that should be handled. The authors also express interest in tracking how the ratings of blogs change over time. They are answering the question of how to find the source of information from its spread through blogs.

It would be interesting to adapt this work to a larger data set that includes more precise timing information, and to track topics as well as links.

Problem Solved: Finding day-zero blogs, inferring infection links.