Before going further into the main topic, a quick note on the authors.
Baoning Wu is a search quality scientist; you can read more about him here: http://wume.cse.lehigh.edu/~wu/
Brian D. Davison is an associate professor of Computer Science & Engineering; for more information you can go here: http://www.cse.lehigh.edu/~brian/
Research paper title: "Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings", April 2006.
This paper has a lot in it that we can easily relate to Google's Panda and Penguin updates. After spending 3–4 hours reading it, I found some things worth sharing. They may help you build your link building strategy 🙂
My intention is to share some inside detail about search engine algorithms and how search quality is improved. Search engines previously used the Hyperlink-Induced Topic Search (HITS) algorithm for search ranking.
But three major factors degraded that algorithm:
- Mutually Reinforcing Relationships
- The existence of many duplicate pages, making links cited within them rank highly
- Link farms
Mutually Reinforcing Relationships???
Here, a "mutually reinforcing relationship" means a page that is the target of many links from multiple pages of the same website, or alternatively, one page that points to many pages of another website.
We all know the other two terms very well, right? 🙂
Why was it required?
In the SEO community we know it used to be very easy to build hundreds of sites, directories, link farms, etc. to manipulate search engine rankings (targeted anchor text and tons of links were enough to rank for the desired keywords). But in the end, the data scientists win the battle 🙂
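To see why those factors mattered, here is a minimal sketch of the classic HITS idea mentioned earlier: hub and authority scores reinforce each other over the link graph, which is exactly what link farms exploited. The tiny graph below is my own made-up example, not data from the paper.

```python
# Minimal HITS sketch: authorities are pages linked to by good hubs,
# hubs are pages that link to good authorities.

def hits(graph, iterations=20):
    """graph: dict mapping page -> list of pages it links to."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in.
        auth = {p: sum(hub[q] for q in pages if p in graph.get(q, [])) for p in pages}
        # Hub score: sum of authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

graph = {"A": ["C"], "B": ["C"], "C": []}
hub, auth = hits(graph)
# "C" collects links from A and B, so it ends up with the top authority score.
```

You can see the weakness: spin up enough pages pointing at one target and its authority score climbs, which is why the paper's detection techniques were needed.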
The spam detection technique proposed in this research paper falls into three categories:
1) Detecting link farm spam
2) Generating clusters, or using bipartite cores, to find web communities
3) Finding duplicate web pages
Detect Link Farm Spam
Link farm detection is implemented by identifying web pages that have a high number of links from a specified set of bad pages; this way, a set of bad networks is identified.
"We present a different method for finding link farm spam, in which an automated method is used to select a seed set of bad pages (spam pages) which are then expanded to include others that link to enough known bad pages. Our current approach is additionally able to find duplicated pages and has an accuracy of about 5% higher."
Read the paragraph above to understand the approach to spam.
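The seed-and-expand idea in that quote can be sketched roughly like this: start from known bad pages and repeatedly mark any page that links to "enough" of them. The threshold and the data are my own illustrative assumptions, not the paper's tuned values.

```python
# Hedged sketch of seed-set expansion for link farm detection.

def expand_bad_set(outlinks, seed_bad, threshold=2):
    """outlinks: dict page -> set of pages it links to."""
    bad = set(seed_bad)
    changed = True
    while changed:
        changed = False
        for page, targets in outlinks.items():
            if page not in bad and len(targets & bad) >= threshold:
                bad.add(page)   # links to enough known bad pages
                changed = True  # newly bad pages can implicate others
    return bad

outlinks = {
    "farm1": {"spam_a", "spam_b"},   # links to two seed spam pages
    "farm2": {"farm1", "spam_a"},    # becomes bad once farm1 does
    "honest": {"spam_a", "news"},    # only one bad outlink: stays clean
}
bad = expand_bad_set(outlinks, seed_bad={"spam_a", "spam_b"})
# → {"spam_a", "spam_b", "farm1", "farm2"}
```

Note how "farm2" is only caught on a later pass, after "farm1" joins the bad set — that is the "expanded to include others" part of the quote.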
Web communities or Clusters
A lot is covered here too, but I think the lines below are the easiest to understand:
"Roberts and Rosenthal propose a simple algorithm to find clusters of web pages based on their outlink sets, and the authority value of a page is proportional to the number of clusters which tend to link to it rather than the number of pages which link to it."
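A rough sketch of that quoted idea: group pages into clusters and let each cluster cast one vote for a target, so a hundred near-identical farm pages count once. Clustering by identical outlink sets is my own simplification for illustration; the actual clustering method is more nuanced.

```python
from collections import defaultdict

def cluster_authority(outlinks):
    """Score targets by how many clusters (not pages) link to them."""
    clusters = defaultdict(list)          # outlink set -> pages sharing it
    for page, targets in outlinks.items():
        clusters[frozenset(targets)].append(page)
    score = defaultdict(int)
    for targets in clusters:              # one vote per cluster
        for t in targets:
            score[t] += 1
    return dict(score)

outlinks = {
    "farm1": {"target"}, "farm2": {"target"}, "farm3": {"target"},  # one cluster
    "indie": {"target", "news"},                                    # another cluster
}
score = cluster_authority(outlinks)
# Three farm pages collapse into a single cluster, so "target" scores 2, not 4.
```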
Duplicate and near-duplicate Web pages
Just read these lines to understand duplicate web pages:
"Two documents are considered duplicates if they have significant overlap of the chunks or fingerprints. In our approach, only links and anchor texts are extracted from a page and two pages are considered to contain duplicate material if they have common links with common anchor texts without requiring them to be in sequence."
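The duplicate test described above can be sketched as a set overlap of "complete hyperlinks" — (URL, anchor text) pairs — with no ordering requirement. The overlap threshold here is my own assumption, not the paper's value.

```python
# Sketch: two pages are flagged as duplicates if they share enough
# (url, anchor text) pairs, regardless of the order they appear in.

def is_duplicate(page_a, page_b, min_common=3):
    """page_a, page_b: sets of (url, anchor_text) pairs."""
    return len(page_a & page_b) >= min_common

a = {("http://x.com", "cheap pills"), ("http://y.com", "buy now"),
     ("http://z.com", "casino"), ("http://w.com", "loans")}
b = {("http://y.com", "buy now"), ("http://x.com", "cheap pills"),
     ("http://z.com", "casino")}
# Three shared (url, anchor) pairs, order ignored → flagged as duplicates.
```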
That covers Section 2. In Section 3, the paper proposes its search quality algorithm.
Have a glance at this image about the algorithm to detect link plagiarism.
If you want to go through the whole document, you can read it here.
Here are a few things that can help the SEO community understand more:
3.1 Complete hyperlinks
"If two pages have a bunch of common links with same anchor texts, it is a stronger sign than using links alone that these two pages are duplicates or made by the same person or machine on purpose, which is an obvious behavior of link farm spammers."
3.2 Finding bipartite components
I am attaching an image to read about it; it is also focused on the same thing: reducing the "link" effect from the same link network and from low-weighted documents.
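The bipartite-component idea can be sketched roughly: if a group of source pages all carry the same complete hyperlinks (URL + anchor), they and their common targets form a dense bipartite core, a typical link farm footprint. Grouping by *identical* link sets is my own simplification; the paper's component-finding is more general.

```python
from collections import defaultdict

def bipartite_groups(pages, min_sources=2):
    """pages: dict page -> frozenset of (url, anchor) complete links.
    Returns the link sets shared by several sources: suspected cores."""
    groups = defaultdict(list)
    for page, links in pages.items():
        groups[links].append(page)
    return {links: srcs for links, srcs in groups.items()
            if len(srcs) >= min_sources and links}

pages = {
    "farm1": frozenset({("http://t.com", "win big")}),
    "farm2": frozenset({("http://t.com", "win big")}),
    "blog":  frozenset({("http://news.com", "story")}),
}
cores = bipartite_groups(pages)
# farm1 and farm2 share an identical complete-link set → one suspect core.
```

Links found inside such a core can then be down-weighted, which is the "reducing the link effect" the image illustrates.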
5.2 Exact match vs. bag of words
“One obvious question people may raise is that what if the spammers deliberately use different anchor texts for the same link. Since our algorithm counts same link with different anchor texts as different complete links, the deliberately generated anchor texts for the same link may escape our bipartite graph detection and more spam links may survive.“
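The evasion described in that quote is easy to demonstrate: vary the anchor text and an exact (URL, anchor) comparison no longer matches, while a bag-of-words comparison — treating the anchor as an unordered set of words, the alternative the section's title refers to — still catches simple reorderings. The data is illustrative only.

```python
# Exact anchor matching vs. a bag-of-words comparison.

def exact_match(anchor_a, anchor_b):
    return anchor_a == anchor_b

def bag_of_words_match(anchor_a, anchor_b):
    # Compare anchors as unordered sets of lowercased words.
    return set(anchor_a.lower().split()) == set(anchor_b.lower().split())

anchor1 = "buy cheap pills"
anchor2 = "cheap pills buy"   # same words, shuffled to dodge exact match
# exact_match(anchor1, anchor2) → False: the exact test is evaded.
# bag_of_words_match(anchor1, anchor2) → True: the variant is caught.
```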
So, there are a number of scientists improving the algorithms that detect link farms and other manipulative techniques. Sooner or later, they will win this game 🙂
Everyone suffering from Panda and Penguin hits is asking only one question: how do I recover?
Here are a few easy steps:
1) Review your SEO strategy. Is it only focused on link building? SEO is not about link building; it is around link building.
2) Focus on building a brand. If you build a brand, in a few years you will get more people searching for your name or your brand name 🙂 and nobody will steal your keyword 🙂
3) Think long term. SEO (Search Engine Optimisation) and SIPs (Systematic Investment Plans) are no longer different.
4) Users' search patterns have changed. Searches are now more specific and targeted, so aim to be found for those targeted queries instead of generic keywords. Long queries also have a lower cost per conversion.
So, I hope you will do the right things, not the easy things 🙂