Friday, July 27, 2012

Have all your posts been added to Google search results yet?

How can you tell if Google has indexed your blog - or if it's indexed all your posts?

The quick way is to search for site:YOUR-BLOG-NAME and see how many results are returned.   If it's roughly the same as your number of published posts, then you're about right.   But what if it's not?

For example, if I google
site:blogger-hints-and-tips.blogspot.com
I get
About 385 results (0.26 seconds)


However I've only 226 published posts in blogger-hints-and-tips.   That tells me there's a problem with content being indexed more than once.   Looking at the results, I can see the problem immediately:  my dynamic-views have been indexed (goodness only knows why) - but also, my archive pages appear to have been indexed.  
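If you'd like to make the "roughly the same" check a bit less hand-wavy, here is a minimal sketch. The function name compare_counts and the 10% tolerance are my own assumptions, not anything Google provides, and the count that site: reports is itself only approximate.

```python
def compare_counts(indexed: int, published: int, tolerance: float = 0.10) -> str:
    """Classify a site: result count against the number of published posts.

    `tolerance` is an arbitrary assumption: how far apart the two numbers
    may be before we stop calling them "roughly the same".
    """
    if published <= 0:
        raise ValueError("published must be a positive number")
    ratio = indexed / published
    if ratio > 1 + tolerance:
        # More results than posts: extra pages (archives, views) or duplicates.
        return "over-indexed"
    if ratio < 1 - tolerance:
        # Fewer results than posts: some posts are probably missing.
        return "under-indexed"
    return "about right"


# The numbers from the example above: 385 results, 226 published posts.
print(compare_counts(385, 226))
```

With my numbers this reports that the blog is over-indexed, which matches what I saw by looking through the results by hand.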

For a more in-depth analysis, though, Webmaster Tools now has an Index Status feature. It is particularly useful if there are fewer pages indexed than published posts, because it shows you whether it's always been like that, or whether a change at some point led to a lot of your posts being removed from Google's index.

To use it:



  • If you haven't verified the account before, you may need to do this.
    (Google have said before that blog-owners are automatically verified, but I've found a few times that this hasn't quite worked - I've had to manually verify by installing a meta-tag that they provide)


  • Look under the Health tab for Index Status. It shows a graph of the number of pages indexed vs date for (up to) the last year.



Looking at the graph, if:

  • the line is flat and you haven't made posts at that time, or
  • you have been posting and the line is going upwards,

then it probably means that all your posts are being crawled and indexed (and so can be found in Google searches).

But if there's a different pattern, the advanced tab might be helpful.   It shows:

  • the number of pages indexed
  • the cumulative number of pages crawled,
  • the number of pages that Google knows that it couldn't look at because they are blocked by robots.txt, and
  • the number of pages that were not selected for inclusion in search results.


These numbers are point-in-time totals, not numbers of additions.

Note the important difference: the first graph shows pages indexed (i.e. added to Google). The advanced one also shows pages crawled. If Google crawls a page and finds it hasn't changed since the last time they looked at it, then they don't bother indexing it again.
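One of the advanced numbers is the count of pages blocked by robots.txt. If you want to check whether a particular address on your blog would be blocked, Python's standard urllib.robotparser module can test a URL against a set of rules. This is only a sketch: the sample rules and the example.blogspot.com addresses are made up for illustration, so paste in the contents of your own blog's robots.txt instead, and bear in mind that Google's own crawler may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; use your own blog's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules stop `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


BLOG = "http://example.blogspot.com"  # placeholder address
print(is_blocked(SAMPLE_ROBOTS_TXT, BLOG + "/search/label/seo"))
print(is_blocked(SAMPLE_ROBOTS_TXT, BLOG + "/2012/07/some-post.html"))
```

With these sample rules, the label-search page is blocked and the post page is not.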

If some of your posts aren't in Google anymore, and the graph of pages indexed has a sudden drop, then most likely a change that you made at that time has caused a problem with indexing or including your posts in the results.
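If you note down the figures from the graph now and then, spotting a sudden drop can be sketched like this. The function name find_drops, the 20% threshold and the sample numbers are all invented for illustration; they are not real figures from my blog.

```python
from datetime import date


def find_drops(series, threshold=0.2):
    """Return (date, previous, current) for each reading that fell by more
    than `threshold` (as a fraction) compared with the reading before it."""
    drops = []
    for (_, prev), (day, cur) in zip(series, series[1:]):
        if prev > 0 and (prev - cur) / prev > threshold:
            drops.append((day, prev, cur))
    return drops


# Made-up readings of "pages indexed", typed in by hand from the graph.
indexed = [
    (date(2012, 4, 1), 210),
    (date(2012, 5, 1), 218),
    (date(2012, 6, 1), 140),
    (date(2012, 7, 1), 145),
]

for day, prev, cur in find_drops(indexed):
    print(f"{day}: dropped from {prev} to {cur}")
```

Small wobbles are ignored; only a fall bigger than the threshold is reported, along with the date to go looking at.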

Of course, just knowing when things happened doesn't tell you why they happened.  In my case, I remember adding the Archive gadget into Blogger-HAT's layout.   Obviously I didn't stop the archive pages from getting indexed (though it does make me bitter and twisted that Google doesn't just handle this for us!).

What this does point out, though, is that if you're serious about managing your site for maximum searchability (SEO-friendliness), you should keep a note of the date and time of any structural changes you make, so that you can compare this with diagnostic tools and see if there's a link between the changes and changes in your visits, indexing etc.
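A change log doesn't need to be fancy. As a rough sketch, assuming you keep a simple list of dates and notes, this looks up which changes were made shortly before a date where the graph changed. The name changes_before, the 14-day window and the dates in the sample log are assumptions I've made up for the example.

```python
from datetime import date, timedelta


def changes_before(event_day, change_log, window_days=14):
    """Return the logged changes made in the `window_days` up to `event_day`."""
    start = event_day - timedelta(days=window_days)
    return [(day, note) for day, note in change_log if start <= day <= event_day]


# Hypothetical change log; the dates are invented for illustration.
change_log = [
    (date(2012, 3, 10), "Switched template"),
    (date(2012, 5, 25), "Added Archive gadget to layout"),
]

# Suppose the Index Status graph changed around 1 June 2012.
for day, note in changes_before(date(2012, 6, 1), change_log):
    print(day, note)
```

A match here is only a hint about where to look first; it doesn't prove that the change caused the problem.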
