Tuesday, December 26, 2006

Latent Semantic Indexing (LSI) And SEO

Indexing has always been considered a highly targeted science. Enter a search query into Google and the pages that are displayed are generally optimized towards that exact word or term. However, in its continual battle to serve the most relevant and most natural pages with genuinely useful information, Google has injected latent semantic indexing (LSI) into its algorithms.

What Is LSI?

LSI is a unique indexing method that potentially takes Google search one step closer to becoming human in its way of thinking. If we were to manually search through web pages to find information related to a given search term we would be likely to generate our own results based on the theme of the site, rather than whether a word exists or doesn't exist on the page.
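
To make the idea concrete, here is a minimal, purely illustrative sketch of the mathematics usually associated with LSI: a toy term-document matrix reduced with a truncated singular value decomposition (SVD), so that pages cluster by theme rather than by exact words. The corpus, the term counts, and the use of numpy are all assumptions for this example; Google's actual implementation is not public.

```python
# A minimal LSI-style sketch (illustrative only; not Google's algorithm).
# Build a tiny term-document matrix and reduce it with a truncated SVD.
import numpy as np

# Hypothetical toy corpus: rows are documents, columns are term counts.
terms = ["engine", "search", "optimization", "steam", "locomotive"]
docs = np.array([
    [1, 2, 2, 0, 0],   # a page about search engine optimization
    [1, 0, 0, 2, 1],   # a page about steam locomotives
    [1, 1, 1, 0, 0],   # another internet-marketing page
], dtype=float)

# Keep k latent "concept" dimensions instead of the raw term columns.
k = 2
U, S, Vt = np.linalg.svd(docs, full_matrices=False)
doc_concepts = U[:, :k] * S[:k]        # documents projected into concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two marketing pages land close together in concept space, while the
# locomotive page sits elsewhere, even though all three share the word "engine".
print(cosine(doc_concepts[0], doc_concepts[2]))  # high
print(cosine(doc_concepts[0], doc_concepts[1]))  # low
```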

Why Search Engines Might Adopt Latent Semantic Indexing

The extremely rigid form of "keyword indexing" also meant that black hat SEO techniques were easier to implement. Search engines could be manipulated into ranking a site highly by following a set formula. Originally, cramming a page with a particular keyword or set of keywords meant a site would rank highly for that search term. The next generation of algorithms ensured that your link profile played a more important part than your keyword density. Reciprocal linking soon followed, once again making it possible to manipulate the search engine spiders by exchanging links with tens, hundreds, or thousands of websites.

Reciprocal linking was soon beaten as Google, and to a lesser extent Yahoo and MSN, gave less credence to a reciprocal link than to a one-way inbound link. Latent semantic indexing is another, particularly powerful, method of making their result pages appear more natural: natural pages filled with natural content.

The Effects

The introduction of LSI has already brought some dramatic changes to the search engine result pages. Sites that had previously performed well because of an impressive link profile built around a single keyword have seen their pages slip in the rankings. Other pages with a more diverse portfolio of inbound links are taking the lead for search terms on which they had not previously performed.

SEO is far from dead because of LSI; if anything, it has probably increased the need for professional white-hat SEO on your website. The field of SEO, though, has almost certainly changed. Copywriting website content for Google's benefit is no longer merely a matter of keyword density and keyword placement, as it once was, and link-building techniques will need to change to accommodate LSI algorithms. But it can be done.

Writing Content For LSI

If optimizing solely for Google, then a web page can, theoretically, be naturally written and naturally worded. When we write, we instinctively include the appropriate keyword in our text. To avoid repetition (or keyword optimization, as it was once called) we often replace some instances of these keywords with other words that have the same or a very similar meaning. We naturally include the plural or singular form of a keyword, as well as different tenses and a number of different stems of that keyword. In the eyes of LSI algorithms, this is all good news.
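
As a rough illustration of why that natural variation need not hurt you, the sketch below (scikit-learn, with invented example documents and an invented query) projects a handful of pages and a query into a reduced concept space. Thematically related pages should score closer to the query than the off-topic one, even where the exact wording differs. It is a toy under those assumptions, not a description of how Google scores pages.

```python
# Toy demonstration: naturally varied wording in a latent-concept model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "copywriting tips for writing natural web content and copy",
    "write naturally worded website copy for your readers",
    "steam locomotive engines and railway history",
]
query = ["web content writing"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs + query)

# Reduce the term space to a couple of latent concept dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
concepts = svd.fit_transform(X)

# Compare the query (last row) against each document in concept space.
# On-theme pages should score higher than the railway page, even where
# they don't share the query's exact tokens.
print(cosine_similarity(concepts[-1:], concepts[:-1]))
```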

Looking At Your Link Profile

A link profile should no longer consist of thousands of links with the same anchor text (that of your primary keyword). There's no reason to panic if you already have this kind of profile. Instead you should look at relevant and similar terms and improve your link profile by gaining links using these as your anchor text.
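
If you want a quick picture of how concentrated your existing anchor text is, a few lines of Python will do it. The link data below is invented; in practice you would feed in an export from whatever backlink report you use.

```python
# Quick, illustrative anchor-text audit for a link profile.
from collections import Counter

inbound_anchors = [
    "cheap widgets", "cheap widgets", "cheap widgets",
    "buy widgets online", "widget store", "cheap widgets",
]

counts = Counter(inbound_anchors)
total = len(inbound_anchors)
for anchor, n in counts.most_common():
    share = n / total
    flag = "  <-- heavily concentrated; dilute with related terms" if share > 0.5 else ""
    print(f"{anchor}: {n} links ({share:.0%}){flag}")
```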

What It Offers Web Searchers

From the point of view of web searchers, LSI offers some distinct advantages over earlier forms of indexing. For example, LSI recognizes that the word "engine" in "search engine optimization" is not related to searches for terms like "steam engine" or "locomotive engine" and is instead related to Internet marketing topics. In theory, LSI gives a much more accurate page of results as well as a broader range of pages still geared towards a particular topic.

Where Google Leads, Others Generally Follow

It is widely acknowledged that Google is the search engine at the forefront of latent semantic indexing. On the whole they try to generate results pages filled with genuine, useful results, and LSI certainly provides another string to their bow. Yahoo and MSN, for now, seem more than happy to stick with keyword-specific indexing, although Yahoo are known to look at singular and plural keyword variations, as well as keyword stemming, when judging keyword density.

The Effect On Your Website

How LSI affects the individual webmaster depends on how they already go about promoting their site. If the pages are filled with natural content, including keywords and keyword alternatives, and the link profile is similarly diversified across a number of related keywords, then the fact is it won't change very much. However, if all of your efforts have been concentrated, either on-page or off-page, on a single keyword, then it's time to redress the balance.

About the Author:
Matt Jackson is a homepage content author for WebWiseWords. WebWiseWords specializes in natural web content writing that appeals to search engine spiders and to human visitors.

Friday, December 22, 2006

Matt Cutts: Website Reviews, Duplicate Content

by: Matt Cutts: Gadgets, Google, and SEO

I woke up early on Thursday and was at the convention center by 8am to check on my backpack and laptop. It was still tucked away under a table, untouched. Whew! I hunkered down in the speaker room and started working on my slides. (I hate seeing the same presentations over and over again at conferences, so I always try to make a new presentation for each show.)

By 9am, I was still really behind, so I decided to skip Danny’s keynote and kept chugging. I missed a few other sessions on Thursday, but I figured it was worth it to be prepared.

Around 1:30, Brett had signed me up for a site review panel with Greg Boser, Tim Mayer, Danny Sullivan, and Todd Friesen, with Jake Baillie moderating. The idea behind a site review panel is that people volunteer their sites and get advice on how to do things better. We discussed a promotional gifts company, the Dollar Stretcher site, a real estate company in Las Vegas, a chiropractic doctor, a real estate licensing company, a computer peripheral site, a hifi sounds store, and a day spa in Arizona. In his PubCon recap, Barry said I ripped on sites. For most of the sites I tried to give positive advice, but I wasn’t afraid to tell people of potential problems.

Once again, I sat on the end and had my wireless and VPN working so that I could use all of my Google tools. The promotional gifts company had a couple of issues. For one thing, I was immediately able to find 20+ other sites that also belonged to the promotional gifts person. The other sites offered overlapping content and overlapping pages on different urls. The larger issue was that searching for a few words from a description quickly found dozens of other sites with the exact same descriptions. We discussed the difficulty of adding value to feeds when you’re running lots of sites. One thing to do is to find ways to incorporate user feedback (forums, reviews, etc.). The wrong thing to do is to try to add a few extra sentences or to scramble a few words or bullet points trying to avoid duplicate content detection. If I can spot duplicate content in a minute with a search, Google has time to do more in-depth duplicate detection in its index.
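
For a sense of what even a crude automated version of that check looks like, here is a sketch of word shingling with Jaccard similarity, a classic way of flagging near-duplicate text. The two product descriptions are invented, and this illustrates the general technique rather than anything Google has published about its own detection.

```python
# Rough near-duplicate check: overlapping word n-grams ("shingles")
# compared with Jaccard similarity. Illustrative only.
def shingles(text, n=4):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

desc_a = "This stylish promotional pen is printed with your company logo and message"
desc_b = "This stylish promotional pen is printed with your logo and marketing message"

similarity = jaccard(shingles(desc_a), shingles(desc_b))
print(f"Shingle similarity: {similarity:.2f}")  # high values suggest duplicated copy
```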

Next up was the Dollar Stretcher. I’d actually heard of this site before (stretcher.com), and everything I checked off-page and on-page looked fine. The site owner said that Google was doing fine on this site, but Yahoo! didn’t seem to like it. Greg Boser managed to find a sitemap-type page that listed hundreds of articles, but in my opinion it only looked bad because the site has been live since 1996 and they had tons of original articles that they had written over the years. So how should a webmaster make a sitemap on their site when they have hundreds of articles? My advice would be to break the sitemap up across your pages. Instead of hundreds of links all on one page, you could organize your articles chronologically (each year could be a page), alphabetically, or by topic. Danny and Todd noticed a mismatch between uppercase url titles on the live pages and lowercase url titles according to Yahoo!’s Site Explorer, and a few folks started to wonder if cloaking was going on. The site was definitely hitting a “this is a legit site” chord for me, and I didn’t think they were cloaking, so I checked with a quick wget and also told Googlebot to fetch the page. It all checked out–no cloaking going on to Google. I gently tried to suggest that it might be a Site Explorer issue, which a few people took as a diss. No diss was intended to Site Explorer; I think it’s a fine way to explore urls; I just don’t think stretcher.com was trying to pull any tricks with lowercasing their titles to search engines (not that cloaking to lowercase a title would help a site anyway).
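
The wget check described here is easy to reproduce. Below is a rough sketch using the Python requests library that fetches the same page as a regular browser and as Googlebot and compares the responses; the URL is a placeholder, and a simple user-agent swap obviously won’t catch cloaking that keys off crawler IP addresses.

```python
# Fetch a page under two user agents and compare the responses.
import requests

URL = "http://www.example.com/"  # placeholder; substitute the page to test
AGENTS = {
    "browser": "Mozilla/5.0 (Windows; U; Windows NT 5.1)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

pages = {name: requests.get(URL, headers={"User-Agent": ua}, timeout=10).text
         for name, ua in AGENTS.items()}

if pages["browser"] == pages["googlebot"]:
    print("Identical responses -- no user-agent cloaking detected.")
else:
    print("Responses differ -- worth a closer look (could be ads, timestamps, or cloaking).")
```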

Holy crawp this is taking forever to write. Let me kick it into high gear. The real estate site was functional (~100-ish pages about different projects + 10 about us/contact sort of pages) about Las Vegas real estate, but it was also pretty close to brochureware. There was nothing compelling or exciting about the site. I recommended looking for good ways to attract links: surveys, articles about the crazy construction levels in Vegas, contests–basically just looking at ways to create a little buzz, as opposed to standard corporate brochureware sites. Linkbait doesn’t have to be sneaky or cunning; great content can be linkbait as well, if you let people know about it.

The chiropractor site looked good. Danny Sullivan made some good points that they wanted to show up for a keyword phrase, but that phrase didn’t occur on the site’s home page. Think about what users will type (and what you want to rank for), and make sure to use those words on your page. The site owner was also using Comic Sans, which is a font that a few people hate. I recommended something more conservative for a medical site. Greg Boser mentioned being aware of local medical associations and similar community organizations. I recommended local newspapers, and gave the example that when my Mom shows up with a prepared article for her small hometown newspaper about a charity event, they’re usually happy to run it or something close to it. Don’t neglect local resources when you’re trying to promote your site.

My favorite for the real estate licensing site is that in less than a minute, I was able to find 50+ other domains that this person had–everything from learning Spanish to military training. So I got to say “Let us be frank, you and I: how many sites do you have?” He paused for a while, then said “a handful.” After I ran through several of his sites, he agreed that he had quite a few. My quick take is that if you’re running 50 or 100 domains yourself, you’re fundamentally different than the chiropractor with his one site: with that many domains, each domain doesn’t always get as much loving attention, and that can really show. Ask yourself how many domains you have, and whether it’s so many that lots of them end up a bit cookie-cutter-like.

Several times during the session, it was readily apparent that someone had tried to do reciprocal links as a “quick hit” to increase their link popularity. When I saw that in the backlinks, I tried to communicate that 1) it was immediately obvious to me, and therefore our algorithms can do a pretty good job of spotting excessive reciprocal links, and 2) in the instances that I looked at, the reciprocal links weren’t doing any good. I urged folks to spend more time looking for ways to make a compelling site that attracts viral buzz or word of mouth. Compelling sites that are well-marketed attract editorially chosen links, which tend to help a site more.
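
As a small illustration of why link swaps are so easy to spot, the sketch below takes an invented site-to-site link graph and pulls out the reciprocal pairs with a single set intersection; a real crawler works at vastly larger scale, but the underlying signal is just this simple.

```python
# Find reciprocal link pairs in a toy link graph (invented data).
links = {
    ("sitea.com", "siteb.com"),
    ("siteb.com", "sitea.com"),   # swap partner of the line above
    ("sitea.com", "sitec.com"),   # one-way editorial link
    ("sited.com", "sitea.com"),
}

reciprocal = {tuple(sorted(pair)) for pair in links if (pair[1], pair[0]) in links}
print(f"{len(reciprocal)} reciprocal pair(s) out of {len(links)} links: {reciprocal}")
```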


The computer peripheral site had a few issues, but it was a solid site. They had genuine links from e.g. Lexar listing them as a place to buy their memory cards. When you’re a well-known site like that, it’s worth trying to find even more manufacturers whose products you sell. Links from computer part makers would be pretty good links, for example. The peripheral site had urls that were like /i-Linksys-WRT54G-Wireless-G-54Mbps-Broadband-Router-4-Port-10100Mbps-Switch-54Mbps-80211G-Access-Point-519, which looks kinda cruddy. Instead of using the first 14-15 words of the description, the panel recommended truncating the keywords in the url to ~4-5 words. The site also had session ID stuff like “sid=te8is439m75w6mp” that I recommended dropping if they could. The site also had product categories, but the urls were like “/s-subcat-NETWORK~.html”. Personally, I think having “/network/” and then having the networking products in that subdirectory is a little cleaner.
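
A sketch of that clean-up in code: trim a long hyphenated slug to a handful of words and strip the session-ID parameter. The example URL and the “sid” parameter name echo the ones above but are otherwise illustrative.

```python
# Tidy a product URL: shorten the keyword slug, drop the session ID.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def tidy(url, max_slug_words=5):
    parts = urlparse(url)
    # Keep only the first few words of the hyphenated slug.
    segments = parts.path.strip("/").split("/")
    slug_words = segments[-1].split("-")
    segments[-1] = "-".join(slug_words[:max_slug_words])
    # Drop session-ID style query parameters.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() != "sid"]
    return urlunparse(parts._replace(path="/" + "/".join(segments),
                                     query=urlencode(query)))

print(tidy("http://example.com/i-Linksys-WRT54G-Wireless-G-54Mbps-Broadband-Router-519?sid=te8is439m75w6mp"))
# -> http://example.com/i-Linksys-WRT54G-Wireless-G
```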

The HiFi store was fine, but this was another example where someone had 40+ other sites. Having lots of sites isn’t bad, but I’ve mentioned the risk that not all the sites get as much attention as they should. In this case, 1-2 of the sites were stuff like cheap-cheap-(something-related-to-telephone-calling).com. Rather than any real content, most of the pages were pay-per-click (PPC) parked pages, and when I checked the whois on them, they all had “whois privacy protection service” on them. That’s relatively unusual. Having lots of sites isn’t automatically bad, and having PPC sites isn’t automatically bad, and having whois privacy turned on isn’t automatically bad, but once you get several of these factors all together, you’re often talking about a very different type of webmaster than the fellow who just has a single site or so.

Closing out on a fun note, the day spa was done by someone who was definitely a novice. The site owner seemed to have trouble accessing all the pages on the site, so she had loaded a ton of content onto the main page. But it was a real site, and it still ranked well at Yahoo! and Google. The site owner said that she’d been trying to find a good SEO for ages, so Todd guilted the audience and said “Will someone volunteer to help out this nice person for free?” A whole mess of people raised their hand–good evidence that SEOs have big hearts and that this site owner picked the right conference to ask for help.

Okay, that’s a rundown of SEO feedback on some real-world sites, and this post is like a gajillion words long, so I’ll stop now and save more write-ups for later.

Monday, December 11, 2006

Google Gets Personalized

by Kim Roach

Have you ever become overwhelmed by the number of documents accessible via a search engine? If you're like most people, then you probably have. There are often millions of results and not every result is likely to be of equal importance to you.

In addition to that, there is also ambiguity of language. Words often have multiple meanings and people can have different interpretations of the same word. How does a search engine know the difference? Well, at this point, they don't.

They certainly can't read your mind so the only other alternative is to track your online activities in order to custom tailor your search results based on your recorded preferences.

Google is one of the first major search engines to test this new technology. They have released a total of 15 new patent applications this month in relation to this very endeavor.

Actually, I'm not too surprised that Google is taking a closer look at personalization. Google has already begun testing many of these new search features in Google's personalized search http://www.google.com/psearch, which is currently in beta.

Traditional algorithmic search engines have reached their peak. Personalized search is a natural and necessary progression for Google and other search engines as well. Some alternative search engines have already taken the lead in this endeavor. Eurekster http://www.eurekster.com is one of the main ones that comes to mind, using a searcher's history to bring them more relevant results.

Here is an abstract from one of the Google patents, entitled "Systems and methods for analyzing a user's web history" http://tinyurl.com/ycdhxl: A user's prior searching and browsing activities are recorded for subsequent use. A user may examine the user's prior searching and browsing activities in a number of different ways, including indications of the user's prior activities related to advertisements.

A set of search results may be modified in accordance with the user's historical activities. The user's activities may be examined to identify a set of preferred locations. The user's set of activities may be shared with one or more other users. The set of preferred locations presented to the user may be enhanced to include the preferred locations of one or more other users.

A user's browsing activities may be monitored from one or more different client devices or client applications. A user's browsing volume may be graphically displayed.

Now, let's talk about all of that in English. Over time, we develop a history of search queries, selected results that were clicked on, advertisements that were clicked on, and a multitude of other browsing activities. Each of these actions reflects our preferences and interests. Other examples of user activity Google may begin tracking include instant messaging, word processing, participation in chat rooms, and internet phone calls.

Talk about an invasion of privacy. Unfortunately, we don't have enough time to get into that issue.

Within the proposed system, users are able to access their past searching and/or browsing activities to enhance their experience. Each of their online activities gives clues to what they might ultimately be looking for or related areas of interests.

In addition, users can also modify their profile information to better represent their interests. For example, a user may delete a search query from his/her history or he/she could also provide updated information as to new areas of interest.

One of the most interesting aspects of the patent filings involves the re-ranking of search results according to the user's preferences.

After a query is made and the results are received, they are then adjusted based upon information from the user's history. The order of the search results can be adjusted in accordance with a history score and/or any user modified result score. Search results can also be ordered based upon the combined search result score and the history score to come up with optimal results.
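
In other words, the patent describes blending a conventional relevance score with a per-user history score before ordering the results. A toy sketch of that idea is below; the urls, scores, and weighting are all invented, since the filings don't publish actual numbers.

```python
# Toy re-ranking: blend a base search score with a per-user history score.
results = [
    {"url": "a.com", "search_score": 0.90, "history_score": 0.00},
    {"url": "b.com", "search_score": 0.80, "history_score": 0.60},  # visited often
    {"url": "c.com", "search_score": 0.85, "history_score": 0.10},
]

HISTORY_WEIGHT = 0.5  # assumed blend weight for the illustration

for r in results:
    r["combined"] = r["search_score"] + HISTORY_WEIGHT * r["history_score"]

# b.com rises to the top for this user despite a lower base search score.
for r in sorted(results, key=lambda r: r["combined"], reverse=True):
    print(f'{r["url"]}: {r["combined"]:.2f}')
```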

A searcher may also be shown an indication of previously visited pages among the SERPs, including information such as the date and time a page was previously visited and the number of times that the user has visited the site within a certain period of time.

A certain number of the most highly ranked results that the user has previously visited may be displayed in a region above the search results for easy access (kind of like memorized favorites).

They could also be displayed in another section of the page, or even in a separate window. These previously visited pages may be ordered based upon a number of different ranking criteria, including the history score, PageRank, time of last access, number of accesses, etc.

A user's browsing activities may also play a part in the ranking of search results. For example, if a website was previously visited by the user, it could have its score boosted based upon the number of times the user has visited that particular website. Google may also track how long a visitor stays at any given website. A site that is bookmarked and visited frequently will almost always rank higher.

On the other hand, search results that were previously presented to searchers but not clicked through could be lowered in the results.

What does this mean for you as a webmaster and SEO? It means that your focus should be on quality. In creating your website, you must emphasize visitor optimization and content optimization over search engine optimization.

The visitor always comes first and you must create a valuable experience for them. Allow them to quickly and easily bookmark your website. Give them a reason to hang out for a while, whether it be a forum, lots of great content, or fun quizzes.

The future of SEO is about creating quality, authority sites.

About This Author
Kim Roach is a staff writer and editor for the SiteProNews & SEO-News newsletters. You can also find additional tips and news on webmaster and SEO topics by Kim at the SiteProNews blog. Kim's email is: kim@seo-news.com
This article may be freely distributed without modification and provided that the copyright notice and author information remain intact.