Boiling Puppies - How Search Engines Really Work
"Computers effing stupid!" is a cry often heard in freshmen computer programming labs in universities around the world.
Teaching assistants (TAs) usually respond, "Well, that's not really true. They are only as stupid as you programmed them to be."
Jerks.
Although they have a point.
Many people think that the real power of Google is the fact that they have massive amounts of massive computers that crawl the internet all day and find stuff.
But the real power is the amount of data they have about people using their services. Not to sound all paranoid-android on you, but the reason that Gmail, Blogger, YouTube, Google Analytics, etc., etc., etc. are free is that Google thinks the data you create by using their services is much more valuable than the cost of giving those services away.
And they're right! And here's how.
The greatest source of data comes from the searchers themselves. Most likely, if you don't find what you're searching for on the first page or so, you'll change your query to something that you think is more relevant. And Google tracks this.
So when you search for "pictures of dogs" and then you change it to "pictures of puppies", over time, and with enough people doing the same thing, Google knows that "dogs" is associated with "puppies". Smart computer.
And when you search for "hot water" and then change it to "boiling water", Google learns that "hot" is associated with "boiling". Smart computer.
The problem is when you search for something like "hot dog" and you get results about "boiling puppies". Stupid computer.
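To make the "stupid computer" concrete, here is a minimal sketch of learning word associations from query rewrites and then applying them blindly. It is a toy under stated assumptions, not how Google actually does it: the session data is made up, and `refinement_pairs` and `naive_rewrite` are hypothetical names invented for this example.

```python
from collections import Counter

def refinement_pairs(sessions):
    """Count single-word swaps between consecutive queries in each session."""
    swaps = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            old_words, new_words = before.split(), after.split()
            # Toy assumption: only handle rewrites that change exactly one word.
            if len(old_words) != len(new_words):
                continue
            changed = [(o, n) for o, n in zip(old_words, new_words) if o != n]
            if len(changed) == 1:
                swaps[changed[0]] += 1
    return swaps

def naive_rewrite(query, swaps, min_count=2):
    """Swap every word for its most common replacement, ignoring context."""
    best = {}
    for (old, new), count in swaps.items():
        if count >= min_count and count > best.get(old, ("", 0))[1]:
            best[old] = (new, count)
    return " ".join(best.get(word, (word, 0))[0] for word in query.split())

# Made-up search sessions: each list is one person's queries, in order.
sessions = [
    ["pictures of dogs", "pictures of puppies"],
    ["cute dogs", "cute puppies"],
    ["hot water", "boiling water"],
    ["hot oil", "boiling oil"],
]

swaps = refinement_pairs(sessions)
print(naive_rewrite("hot dogs", swaps))  # boiling puppies
```

Each association is reasonable on its own. The trouble only shows up when they get applied word by word with no idea what the whole phrase means.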
It's All About Context
Obviously, if Google continued to work this way, they'd be out of business. A lot of people think hot dogs are gross enough by themselves without the added visual of a boiling puppy.
So they got smarter.
They found that "hot dogs" shows up on pages that also contain the words "bread", "buns", "mustard", "ketchup", etc. and NOT boiled puppies. With this simple change, the algorithm evolved into something capable of handling many more types of searches.
The trick was determining the context of the search words. And not just the context, but the overall topic as well.
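Here is a rough sketch of that context idea: collect the words that keep appearing on pages alongside "hot dogs", then score other text by how many of those context words it contains. The pages, the stopword list, and the function names `context_words` and `relevance` are all assumptions made up for illustration.

```python
from collections import Counter

# Tiny made-up stopword list so filler words don't count as context.
STOPWORDS = {"on", "with", "and", "the", "in", "for", "other"}

# Made-up "pages" standing in for a crawled index.
pages = [
    "grilled hot dogs on buns with mustard and ketchup",
    "hot dogs served on bread with ketchup",
    "puppies playing in the yard with other dogs",
    "boiling water for pasta",
]

def context_words(phrase, pages):
    """Count the words that appear on pages containing the exact phrase."""
    skip = set(phrase.split()) | STOPWORDS
    counts = Counter()
    for page in pages:
        if phrase in page:
            counts.update(word for word in page.split() if word not in skip)
    return counts

def relevance(text, context):
    """Score text by how much of the phrase's usual context it shares."""
    return sum(context[word] for word in text.split())

ctx = context_words("hot dogs", pages)
print(ctx.most_common(3))
print(relevance("mustard and ketchup recipes", ctx))
print(relevance("boiling puppies", ctx))
```

In this toy index, "ketchup" and "mustard" end up in the context for "hot dogs" while "boiling" and "puppies" never do, so a page about boiled puppies scores zero for the query.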
Latent Semantic Indexing (LSI)
In the seminal forum post that launched him into guru-dom, Ferny broke LSI down and explained how it worked. But mind you, this was 2007, and things have changed a lot on the internet since then.
There are many problems with LSI, but one of the major ones is that it's based on words, not phrases. As such, it can't tell the difference when a word has multiple meanings. Take, for instance, the difference between a "home business" lead and the metal lead. LSI doesn't understand context, and as we saw above, that's really important for our "hot dog" example.
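A quick way to see the words-versus-phrases problem, taking the point above at face value: compare the two "lead" phrases first as single words, then as two-word phrases. This is only a sketch of the tokenization difference, not an implementation of LSI, and the helper names are made up for the example.

```python
def unigrams(text):
    """Single words, which is all a word-based model gets to see."""
    return set(text.lower().split())

def bigrams(text):
    """Adjacent word pairs, a crude stand-in for 'phrases'."""
    words = text.lower().split()
    return set(zip(words, words[1:]))

business = "home business lead"
metal = "the metal lead"

shared_words = unigrams(business) & unigrams(metal)
shared_phrases = bigrams(business) & bigrams(metal)

print(shared_words)    # {'lead'}
print(shared_phrases)  # set()
```

At the word level the two phrases overlap on "lead", so they look related. At the phrase level they share nothing, which is what a human reader would expect.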
So Now What?
There's new buzz on the SEO street, and that buzz is called "Latent Dirichlet Allocation" (LDA). I'm assuming that most of you aren't the types that like studying multidimensional nonlinear vector spaces, so I'm going to make this simple.

Consider two separate topics: Water and Oil. For simplicity's sake, these are the only two topics that exist. There is a range of phrases that fall between pure water and pure oil (lemon juice, salad dressing, Gulf of Mexico, transmission fluid), and each can be given a score for how much it relates to one or the other. Thus the "topicness" of any specific phrase can be related to one or the other.
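As a toy version of that score, the sketch below places each phrase on a line from 0.0 (pure water) to 1.0 (pure oil) by counting how often it shows up in documents about each topic. To be clear about the assumptions: this is not real LDA (which infers the topics itself instead of being handed labeled documents), the documents are made up, and `oil_share` is a hypothetical name.

```python
# Made-up documents, already sorted into the only two topics that exist.
water_docs = [
    "lemon juice is mostly water",
    "the gulf of mexico is salt water",
    "salad dressing needs a splash of water",
]
oil_docs = [
    "salad dressing is oil and vinegar",
    "transmission fluid is a refined oil",
    "oil spilled into the gulf of mexico",
    "salad dressing with olive oil",
]

def oil_share(phrase, water_docs, oil_docs):
    """Return 0.0 for pure water, 1.0 for pure oil, None if the phrase is unseen."""
    in_water = sum(phrase in doc for doc in water_docs)
    in_oil = sum(phrase in doc for doc in oil_docs)
    total = in_water + in_oil
    return None if total == 0 else in_oil / total

for phrase in ["lemon juice", "gulf of mexico", "salad dressing", "transmission fluid"]:
    print(phrase, oil_share(phrase, water_docs, oil_docs))
```

With this made-up data, lemon juice lands at the water end, transmission fluid at the oil end, and the Gulf of Mexico sits right in the middle.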

OK, Great, What Does This Mean For SEO?
I have a very concrete example here. Remember when I was talking about "home business leads" and "the metal lead"? Here's how LDA bit me right in the @ss.
A couple of years ago, I had a Magnetic Sponsoring review site. This site was very thin and didn't have much content because I never spent much time building it up. It did, however, have 300 or 400 links. But it didn't rank very well. There were plenty of crap(ier) sites above mine that had far fewer links.
But there was one keyword phrase that kept showing up in my analytics over and over again. And when I searched for this term, I ranked right near the top. What was the search term?
"is lead magnetic"
Now initially, stupid me, thinking Mike Dillard had reached unheard-of penetration into the entire American market, thought that people were getting leads for their home business and wondering if they qualified as being "magnetic" or not. Apparently I spent far too much time thinking about these things and it distorted my reality significantly.
Then I thought, "No, idiot, they are talking about the metal lead, not a business lead!" Crap!
My domain name ruined that site. It wasn't the content. That was fine. Using "magnetic" and "lead" along with more home-business related terms was not a problem.
http://magneticleadgeneration.com
I thought I was clever with that domain name. Certainly on topic, had the word "magnetic" in it, and the end result of applying Mike's course, "lead generation".
But LDA screwed me. Magnetic lead vs. lead generation. Guess which one won? Apparently Google thinks that the "topicness" of lead is much more related to "magnetism", and far less related to "generation". And if you really think about it, all those sites talking about medieval alchemy and "generating gold from lead" probably screwed me too.
And the exact-matching domain trick that Ferny and Ray talk about in their Traffic Cipher course, and that has been discussed here and here, actually hurt me because of LDA.
It was doomed from the start.
Now, there are definitely things I could have done to overcome this: more links, more content, better anchor text. But basically, I started myself in a hole and I would have had to dig myself out.
The Takeaway
Think about your SEO and content strategy from an objective viewpoint. Don't restrict your research to your own niche. And be on the lookout for topic collisions like the one I talked about.
As SEOmoz put it in their post about LDA:
"If we want to rank well for "the rolling stones" it's probably a really good idea to use words like "Mick Jagger," "Keith Richards," and "tour dates." It's also probably not super smart to use words like "rubies," "emeralds," "gemstones," or the phrase "gathers no moss," as these might confuse search engines (and visitors) as to the topic we're covering."
They also have a handy LDA tool to determine if you're drifting off-topic.
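If you want a feel for what "drifting off-topic" could mean in numbers, here is a bare-bones sketch that compares word counts using cosine similarity. It is not the SEOmoz tool and it is not LDA, just a simple bag-of-words stand-in; the sample text and the names `bag` and `cosine` are assumptions for illustration.

```python
import math
from collections import Counter

def bag(text):
    """Turn text into word counts (a bag of words)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bags: 0.0 is unrelated, 1.0 is identical."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

topic = bag("the rolling stones mick jagger keith richards tour dates")
on_topic = bag("mick jagger and keith richards announce new tour dates")
off_topic = bag("rubies emeralds and other gemstones")

print(round(cosine(topic, on_topic), 2))
print(round(cosine(topic, off_topic), 2))
```

The copy that mentions Mick, Keith, and tour dates scores well above zero against the topic, while the gemstone copy shares no words with it at all.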
-----------------
Joe O'Day is a big fan of the easy tactics in Traffic Cipher and works with Ray and Ferny getting massive amounts of backlinks for our clients. You can discover more about linking at our soon-to-be launched free members site at LinkNetworker.com
-----------------
About the Author: Joseph ODay
Member Since: 01/14/2008
Company: Attraction Marketing Formula
Industry: MLM
Primary Web Site: http://xowiipro.com


Pushing Daisies
Nice eye-catching title Joe, haha.
Good article, and congrats on being a new Dad! :-)
Adam
All Hail - The New King Dork!
Wow, Joe! You are a HUGE dork! This is dorkier than my OG Silo and LSI video from 2008.
You are the new King Dork! LOL :)
-Ferny