Link Spamming and its Impact
on Ranking Algorithms (1 of 6)

Abstract: Link analysis algorithms constitute the backbone of most search engines. Because ranking highly is of paramount importance to any website owner, many attempts have been made to deceive those algorithms and manipulate rankings for convenience. These attempts have become increasingly successful and numerous in recent years, so fighting this new form of spam, called "link spam", is considered a high-priority problem. Because spammers are obtaining higher rankings than they deserve, the quality of search results is declining and web surfers are being affected. This paper introduces the problem of link spamming, reviews how the best-known ranking algorithms can be defeated by attacking their weaknesses, and surveys much of the existing work on the war against spam. Additionally, it suggests new research directions in this emerging field.

Introduction

Today, the web has become a ubiquitous source of information for people with all kinds of interests. With its tremendous growth, lists of bookmarks have fallen into disuse, and the preferred method for information foraging is now the use of search engines.

As these tools have become essential for information retrieval on the WWW, users demand ever higher quality in the results presented for any type of query. However, several phenomena affect quality negatively. One of the most recently discovered is link spamming, which is the subject of this survey.

For any website owner, it is important to be listed among the top results of queries related (or not) to their activity. This matters even more when economic interests are involved: being at the top positions can constitute a commercial advantage for any enterprise. As a consequence, search engines have become the target of manipulation capable of biasing the rankings in favour of minorities anxious to grab this opportunity. This phenomenon, called web spam or "spamdexing" Gyongyi05, can take a variety of forms. In general, these are strategies to mislead the normal functioning of web search.

To establish the rankings, search engines use the information implicit in the associations between pages: the hyperlinks. The existence of a hyperlink denotes a relationship between two documents (web pages) in which a source document cites a destination document, implying that the former bestows some importance on the latter. Based on this concession or endorsement of authority, search engines determine, in a global setting, which pages are said to be "more important". The algorithms whose objective is to compute this level of importance are called ranking algorithms, and they work on the induced graph of the web: each page is considered a node and each hyperlink an edge, resulting in a directed graph representing the web.
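As a rough illustration of the induced web graph, the sketch below builds a directed graph from a handful of hyperlinks and orders pages by the number of links pointing at them. The page names, the helper functions `build_web_graph` and `in_degree_ranking`, and the use of plain in-link counting are assumptions made only for this example; counting in-links is a deliberately naive stand-in for importance, not one of the ranking algorithms reviewed later.

```python
from collections import defaultdict


def build_web_graph(hyperlinks):
    """Build a directed graph from (source, destination) hyperlink pairs.

    Returns two adjacency maps: out_links[page] is the set of pages that
    `page` cites, and in_links[page] is the set of pages citing `page`.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for source, destination in hyperlinks:
        out_links[source].add(destination)
        in_links[destination].add(source)
        # Touch the opposite maps so every page appears as a node in both,
        # even if it has no outgoing or no incoming links.
        out_links[destination]
        in_links[source]
    return out_links, in_links


def in_degree_ranking(in_links):
    """Order pages by number of incoming links (ties broken by name)."""
    return sorted(in_links, key=lambda page: (-len(in_links[page]), page))


# Hypothetical pages and hyperlinks, invented for illustration only.
hyperlinks = [
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
    ("c.html", "a.html"),
]

out_links, in_links = build_web_graph(hyperlinks)
ranking = in_degree_ranking(in_links)
print(ranking)
```

Because every hyperlink counts as an endorsement here, whoever can create pages and links can influence the resulting order, which is the weakness that link spamming exploits.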

Most spammers concentrate their attacks on ranking algorithms in order to boost the ranking of their pages, resulting in undeserved importance and, in general, an unfair game for other website owners. Attacks intended to mislead these "link analysis algorithms" are a form of web spam called link spamming.

Undoubtedly, regular users are harmed the most, as the quality of results decreases considerably when these types of situations occur. Moreover, link spamming has become a nuisance because, besides the tens of useless results users obtain at the top, the web is being inflated by thousands of worthless documents that increase the computational resources the algorithms consume.

Taking the previous facts into account, combating link spam is a high-priority task for those involved in the development and maintenance of search engines and, in general, for all kinds of web miners whose functioning depends on the links of the web. With this problem in mind, I intend to survey the phenomenon of link spamming in this document. The main objectives are to provide a good starting point for those interested in the topic, describe the scenarios and algorithms in which link spamming takes place, explain the existing strategies to fight the problem, and suggest new approaches and research directions. I also intend to bring together the scarce and heterogeneous previous work in this relatively new field.

This scarcity by no means implies a lack of interest. Indeed, awareness of the existence of web spam was first seen in Davison00, and the term link spam was coined recently (2002) Henzinger02. Significant work has been done since then; however, it was not until 2005 that the first specialized venue was created to promote research of this kind: the International Workshop on Adversarial Information Retrieval on the Web (AIRWeb, AIRWEB). This workshop has already been held twice and is intended to bring together researchers and practitioners concerned with topics such as automated link spam detection, link bombs, cloaking, redirection, link optimisation for PageRank, propaganda, etc. Unlike e-mail spam, web spam research is still in its infancy, at least with respect to thwarting the initiatives of spammers.

This paper is structured as follows: section two provides an overview of the most common ranking algorithms, including their weaknesses; section three gives a more formal definition of link spamming, along with the details of how the algorithms are actually being spammed. Section four addresses previous work and mentions some well-known link spam strategies and famous cases of web spam that have occurred on the Internet. Section five surveys the existing techniques to combat the problem. Finally, a future work section is presented.

Because this paper not only illustrates link spam techniques but also gives some directions for fighting the problem, readers may think that it will encourage spammers to do their job more intensively. However, the purpose here is academic, and the main objective is to provide a general framework for the field and establish a basis for quantifying, comparing, and analysing the link spamming phenomenon.