Semantic Web

Semantic Blogging Redux

A while back I posted something about WordPress’ taxonomy model.  At the time I thought it was clever and thought we should use something like it for the DotNetNuke Blog module.  Now, I’m less enamored with it.

Here’s why.

To recap, have a look at this database diagram:

The seeming coolness stemmed from the decision to make “terms” unique, regardless of their use, and to build various taxonomies from them using the wp_11_term_taxonomy structure.  So let’s say you have the term “point-and-shoot”, and you use that as both a tag and a category.  “Point-and-shoot” exists once in the wp_11_terms table and twice in the wp_11_term_taxonomy table, with each entry indicating the term’s inclusion in a different structure.  This seems useful because the system “understands” that the tag “point-and-shoot” and the category “point-and-shoot” both mean the same thing.

But is that always a safe assumption?

Consider the case of a photo blog, where the writer is posting photos and writing a little about each.  This photographer has a professional studio, and also shoots portraits in public locations, as well as impromptu shots at parties.

This photographer has set up a category structure indicating the situation in which the photo was taken, “Studio/Location/Point-and-Shoot” (where “Point-and-Shoot” means an impromptu photograph), and another structure or set of tags indicating what sort of camera was used: “Point-and-Shoot”, as opposed to “DSLR”.

Same term.  Two completely different meanings.  Use that term as a search filter and you will get two sets of results, possibly mutually exclusive.

And so, to be truly “semantic”, a term cannot exist independently of its etymology (as expressed in the category hierarchy), yet independent terms are exactly what WordPress attempts to implement.
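Here is a minimal sketch of the problem in Python.  The rows are made up and heavily simplified; they only mimic the terms / term_taxonomy split described above, and the post titles and helper function are hypothetical.

```python
# Hypothetical, simplified rows modeled on the terms / term_taxonomy split.
terms = {1: "point-and-shoot"}  # term_id -> name, one row per unique term

# Each row places a term inside one taxonomy; the same term_id appears twice.
term_taxonomy = [
    {"term_taxonomy_id": 10, "term_id": 1, "taxonomy": "category"},  # meaning: impromptu shot
    {"term_taxonomy_id": 11, "term_id": 1, "taxonomy": "post_tag"},  # meaning: camera type
]

# Posts are attached to term_taxonomy rows, not directly to terms.
term_relationships = [
    {"post": "Birthday party candids (shot on a DSLR)", "term_taxonomy_id": 10},
    {"post": "Pocket camera review (shot in the studio)", "term_taxonomy_id": 11},
]


def posts_for_term(name, taxonomy=None):
    """Find posts by term name, optionally restricted to one taxonomy."""
    term_ids = {tid for tid, term_name in terms.items() if term_name == name}
    tt_ids = {
        row["term_taxonomy_id"]
        for row in term_taxonomy
        if row["term_id"] in term_ids
        and (taxonomy is None or row["taxonomy"] == taxonomy)
    }
    return [r["post"] for r in term_relationships if r["term_taxonomy_id"] in tt_ids]


# Filtering on the shared term alone mixes both meanings together.
print(posts_for_term("point-and-shoot"))
# Only the taxonomy (the term's place in a structure) tells them apart.
print(posts_for_term("point-and-shoot", taxonomy="category"))
```

The first call returns both posts even though they have nothing in common; the term only becomes meaningful once you say which structure it belongs to.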

Taxonomy and SEO

Taxonomy is one of the least understood weapons available for SEO.  We all know the basics of effective SEO:

  • URLs constructed with relevant terms, avoiding parameterization
  • Each page can be accessed by only one URL
  • Effective use of keywords in the title tag
  • Use of keywords in H1 tags
  • Links back to the page from other pages

How does taxonomy fit into all of this?

I started a webzine in 1998 called ProRec.  I built a custom CMS to run it, and spent a few years on SEO back before there was something called “SEO”.  In fact ProRec predates Google.  By the spring of 2000, ProRec consistently ranked in the top 10 search results on all relevant terms, usually in the top 3.  Due to many factors, some beyond my control, ProRec went dark in 2005 and was relaunched on DotNetNuke’s Blog module in 2007.  It no longer enjoys its former ranking glory, but I hope to use the lessons I learned to improve the Blog module in future versions.

One of the lessons I learned was the importance of effective use of taxonomy on SEO.  Designing and properly using effective taxonomy solves several problems:

  1. Populates META tags appropriately
  2. Encourages or enforces consistent use of similar keywords across the site
  3. Forms basis for navigation within the site, linking related pages
  4. Forms the basis for navigation outside the site, linking to other related information

Let’s look at these one at a time.

Populating META Tags

It’s true that META tags are not as important to search engines as they once were, but they are still used, and therefore still important.  Most blogging systems will take the keywords entered as Category or Tags and use them as META tags.  If you’re using DotNetNuke’s blog module, however, you’re out of luck.  The system simply doesn’t comprehend any kind of taxonomy and doesn’t let you inject keywords into the META tags except at the site level.  Opportunity missed.
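As a rough illustration of what “use them as META tags” amounts to, here is a small Python sketch.  The function name and the de-duplication rule are my own assumptions, not how any particular blog engine does it.

```python
from html import escape


def meta_keywords_tag(categories, tags):
    """Build a <meta name="keywords"> element from a post's taxonomy terms."""
    seen = set()
    keywords = []
    for term in list(categories) + list(tags):
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:  # skip blanks and case-insensitive duplicates
            seen.add(key)
            keywords.append(cleaned)
    content = escape(", ".join(keywords), quote=True)
    return f'<meta name="keywords" content="{content}">'


print(meta_keywords_tag(["Lumix", "High ISO"], ["high iso", "TZ3"]))
# prints: <meta name="keywords" content="Lumix, High ISO, TZ3">
```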

When it comes to content tagging, a structured taxonomy (categories) offers benefits over ad-hoc keywords (tags).  The obvious reason is that a predefined and well-engineered taxonomy is more likely to apply the “right” words since a user manually entering tags on the fly can easily be sloppy or forget the appropriate term to apply.   The less obvious reason is that as a search engine crawls the site, it will consistently see the same words over and over again used to describe related content on your site.

Why is it important for the search engine to see the same words over and over again?  Because “spray and pray” (applying lots of different related words to a given piece of content) doesn’t cut it.  You don’t want to be the 1,922nd site on 100 different search terms.  You want to be the #1, #2, or #3 site on just a few.

So think of a search engine like a really stupid baby.  Your job is to “teach” the baby to use a few important words to describe stuff on your site.  Just like teaching a human, the more consistent you are, the more likely the search engine is to “learn” the content of your site and attach it to a small set of high-value terms.

Enforcing Keyword Usage

One of my main complaints about “tags” versus “categories” is that tags added to content on-the-fly tend to be added off the top of one’s head.  That’s fine for casual bloggers who just want to provide some simple indexing.  But if you are a content site with a lot of information about some particular subject, chances are that tagging like this can get you into trouble.  On-the-fly tags often inadvertently split a cluster of information into several groups, because two or three (or more) terms get used interchangeably instead of just one.

Consider a site with a well-defined and structured taxonomy.  Let’s consider a very common application: a photography site primarily covering reviews of cameras and photography how-tos.  A solid taxonomy structure would probably include four indexes:

  • Manufacturer (Canon, Nikon, Lumix, etc.)
  • Product Model (EOS, D40, TZ3, etc.)
  • Product Type (DSLR, Rangefinder, micro, etc.)
  • Topic (Product Review, Lighting, Nature, Weddings, etc.)

Generally, the product reviews would be indexed by manufacturer, product model, and product type, with the “Topic” categorized as “Product Review”.  How-tos would be indexed by their topic (“Weddings”) as well as any camera information if the article covered the use of a specific camera.  For example, an article called “How to Improve Low-Light Performance of the Lumix TZ3” might be indexed thusly:

  • Manufacturer: Lumix
  • Product Model: TZ3
  • Product Type: Compact Digital
  • Topic: High ISO

Having a system that prompts the user to appropriately classify each article ensures that the correct keywords will be applied.  Getting the manufacturer and model correct is probably pretty easy.  It’s harder to remember the correct product type (“Compact Digital” versus “Compact”).  And remembering the right topic is a real challenge (“High ISO” versus “Low Light” versus “Exposure” or any of a hundred other terms I could throw at it).  Moreover, the user must remember to apply all four keywords when the article is created.

We can see the value of focused keywords from this example.  At a site level, relevant keywords are at a high abstraction level, like “camera review”.  It’s unrealistic to think a web site could own a top search engine ranking for such a broad term.  At the time of this writing, Google shows almost 14 million web pages in the search result for “camera review”.  But a search for the new Nikon laser rangefinder “nikon forestry 550” returned only 138!  An early review on this product with the right SEO terms could easily capture that search space.

Having a system with four specific prompts and some kind of list is essential to keeping these indexes accurate.  Ideally the system provides a drop down or type-ahead list that encourages reuse of existing keywords.
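To make the idea of “four specific prompts and some kind of list” concrete, here is a hedged Python sketch of the check such a system might run when an article is saved.  The vocabulary, the function name, and the decision to require all four indexes are assumptions for illustration only; a real module would load the lists from its database and would probably let how-tos skip the camera fields.

```python
import difflib

REQUIRED_INDEXES = ("Manufacturer", "Product Model", "Product Type", "Topic")

# Hypothetical controlled vocabulary for each index.
VOCABULARY = {
    "Manufacturer": ["Canon", "Nikon", "Lumix"],
    "Product Model": ["EOS", "D40", "TZ3"],
    "Product Type": ["DSLR", "Rangefinder", "Compact Digital"],
    "Topic": ["Product Review", "Lighting", "Nature", "Weddings", "High ISO"],
}


def check_classification(entry):
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    for index in REQUIRED_INDEXES:
        value = entry.get(index, "").strip()
        if not value:
            problems.append(f"{index}: missing")
            continue
        known = VOCABULARY[index]
        if value in known:
            continue
        # Nudge the author toward an existing term instead of a near-duplicate.
        close = difflib.get_close_matches(value, known, n=1, cutoff=0.6)
        hint = f" (did you mean '{close[0]}'?)" if close else ""
        problems.append(f"{index}: '{value}' is not an existing term{hint}")
    return problems


draft = {"Manufacturer": "Lumix", "Product Model": "TZ3", "Product Type": "Compact"}
for problem in check_classification(draft):
    print(problem)
```

For this draft the check flags “Compact” as a near miss for “Compact Digital” and reports the missing Topic, which are exactly the two slips described above.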

Creating a Navigation System

Here’s where it all starts to come together.  Once you have a big pile of content all indexed using the above four indexes, the next obvious step is to create entry points into your content based on the index, and to cross-link related content by index.

On ProRec, we had five entry points into the content:

  • Main view (chronological)
  • Manufacturer index
  • Product Model index
  • Product Type index
  • Topic index

Needless to say, when a search engine finds a comprehensive listing of articles on your site, categorized by major topic, it greatly increases the relevance of those articles because the engine is able to better understand your content.  Think about it: right there under the big H1 tag that says “High ISO” is this list of six articles all of which deeply cover the ins and outs of low-light photography.  It’s a search engine gold mine.  Obviously it also helps users navigate your site and find articles of interest, too.
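A sketch of how one of those index pages could be generated, in Python.  The article records and function names are hypothetical, and the markup is only a bare-bones stand-in for a real template.

```python
from collections import defaultdict
from html import escape

# Hypothetical article records; not every article uses every index.
articles = [
    {"title": "How to Improve Low-Light Performance of the Lumix TZ3",
     "Manufacturer": "Lumix", "Product Model": "TZ3",
     "Product Type": "Compact Digital", "Topic": "High ISO"},
    {"title": "Nikon D40 Review",
     "Manufacturer": "Nikon", "Product Model": "D40",
     "Product Type": "DSLR", "Topic": "Product Review"},
    {"title": "Shooting Receptions Without Flash", "Topic": "High ISO"},
]


def build_index(articles, dimension):
    """Group article titles under each term of one index dimension."""
    index = defaultdict(list)
    for article in articles:
        term = article.get(dimension)
        if term:
            index[term].append(article["title"])
    return {term: sorted(titles) for term, titles in sorted(index.items())}


def render_index_page(term, titles):
    """Render one entry point: the term as the H1, followed by its articles."""
    lines = [f"<h1>{escape(term)}</h1>", "<ul>"]
    lines += [f"  <li>{escape(title)}</li>" for title in titles]
    lines.append("</ul>")
    return "\n".join(lines)


topic_index = build_index(articles, "Topic")
print(render_index_page("High ISO", topic_index["High ISO"]))
```

Calling build_index once per dimension gives the four taxonomy-driven entry points; the chronological main view needs no taxonomy at all.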

My favorite part of the magic, however, was using the taxonomy to create a “Related Articles” list on each article.  Say you’re reading a review of a Lumix TZ3.  We can use the taxonomy to display a list of articles about other Lumix cameras as well as other Compact Digital cameras.  On ProRec this was even more valuable, because ProRec reviews (and how-tos) many different types of gear and covers a lot of different topics.  Go to a review of a Shure KSM32 microphone, and here’s this list of reviews of other mics.

The “Related Articles” list immediately creates a web interconnecting each article to a set of the most similar articles on the site.  Instantly the search engine is able to make much more sense out of the site.  And, of course, readers will be encouraged to navigate to those other pages, increasing site stickiness.
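One simple way to build such a list is to count how many index terms two articles share and sort by that count.  The Python below is a sketch under that assumption; it is not the code ProRec actually ran, and the article records are invented.

```python
def related_articles(current, articles, dimensions, limit=5):
    """Rank other articles by how many index terms they share with the current one."""
    scored = []
    for other in articles:
        if other is current:
            continue
        shared = sum(
            1 for d in dimensions
            if current.get(d) and current.get(d) == other.get(d)
        )
        if shared:
            scored.append((shared, other["title"]))
    # Most shared terms first; ties broken alphabetically for a stable list.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


# Hypothetical article records.
tz3 = {"title": "Lumix TZ3 Review", "Manufacturer": "Lumix", "Product Type": "Compact Digital"}
articles = [
    tz3,
    {"title": "Another Lumix Compact Review", "Manufacturer": "Lumix", "Product Type": "Compact Digital"},
    {"title": "Canon Compact Review", "Manufacturer": "Canon", "Product Type": "Compact Digital"},
    {"title": "Lumix DSLR Review", "Manufacturer": "Lumix", "Product Type": "DSLR"},
    {"title": "Nikon D40 Review", "Manufacturer": "Nikon", "Product Type": "DSLR"},
]

print(related_articles(tz3, articles, ("Manufacturer", "Product Type")))
# prints: ['Another Lumix Compact Review', 'Canon Compact Review', 'Lumix DSLR Review']
```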

More SEO Fun with Taxonomy

Once the system was in place I was able to extend it nicely.  For example, I created a Barnes & Noble Affiliate box that used the taxonomy to pull the most relevant book out of a list of ISBNs categorized using this same taxonomy and display it in a “Recommended Reading” box on the page.  So you’re reading an article called “Home Studio Basics” and right there on the page is “Home Studio Soundproofing for Beginners by F. Alton Everest” recommended to you.  The benefit to readers is obvious.  But there are SEO benefits, too, because search engines know “Home Studio Soundproofing for Beginners by F. Alton Everest” only shows up on pages dealing with soundproofing home studios.  Pages with that title listed on them (linked to the related page on Barnes & Noble) will rank higher than those that don’t.

You can start to see how quickly a simple “tagging” interface starts to break down.  You need the ability to create multiple index dimensions (like product, product type, and topic) as well as some system to encourage or enforce consistent use of the correct terms.  Otherwise, you’re doing most of the work, but only getting part of the benefit.

Taxonomy, Blogging, and DNN

Obviously, most casual bloggers don’t want to be forced into engineering and maintaining a predefined taxonomy.  That’s why “tagging” became popular.  Casual bloggers want to be able to add content quickly and easily and anything that makes them stop and think is a serious impediment to workflow.  So you just don’t see blog platforms with well-engineered categorization schemes, and you definitely don’t see any that allow for multiple category dimensions.

In my article “Blog Module Musings” I wondered aloud about what sort of people really use DotNetNuke as a blogging platform in the traditional sense of the word “blogging”.  My guess is that most people using DNN as a personal weblog probably have some personal reason for choosing DNN instead of any of the free and easy tools readily available like WordPress or Blogger.  So I have a belief about DNN that it isn’t a good platform for a “blog” per se, but it’s a great platform for content management and publishing.  My guess is that the DNN Blog module has much greater utility as a “publishing platform” instead of a “personal weblog”.

As such, I think it makes sense that DNN’s publishing module should offer more taxonomy power than the typical blog.  I also think that it’s possible, using well-designed user interfaces, to make a powerful taxonomy easy to manage.  My experience with ProRec demonstrated this.  It was very easy to manage ProRec’s various indices, primarily because I had a fat client to provide a rich user interface.  With Web 2.0 technologies, we can now provide these user experiences in the browser.

More on Tagging

I stumbled across a bit of text that clarified an earlier discussion on tagging:

Hierarchical: indicates a parent-child (vertical) relationship (like cat and dog are children of mammals).

Association: indicates a “similar to” (horizontal) relationship like mammals is similar to animals.

Bingo!  This is what people think of when they create categories and tags.  Categories are hierarchical, and tags are associative.  The problem is – they’re both right and wrong.

They’re right, because this is in fact what categories and tags provide.  But they’re wrong in the sense that all knowledge is hierarchical, because all human comprehension is based on sets, and sets are inherently hierarchical.

This proves an earlier point.  It isn’t the case that some knowledge is hierarchical and some isn’t.  It’s just that some topics are members of hierarchies that haven’t been defined yet.  “Mammals” isn’t similar to “animals”.  Mammals are animals.

“No problem,” you rejoin.  “That’s just a bad example.”  To which I reply: prove it.  Show me an example of an association that relates two topics yet isn’t part of a definable hierarchy.  By definition, you can’t, because the presence of an association automatically implies some set within which both topics belong.

Now, when it comes to pouring this into software, an obvious fact springs to mind: we can’t possibly be expected to have a perfectly complete taxonomy available for use within our publishing platform.  Instead, we need a system that is flexible enough to let us build as much or as little hierarchical structure as we need, and then to apply “associative” tags for topics that don’t fit into our structure.  Furthermore, it would be ideal if there was a way to “round up” topics that aren’t part of the structure, and fit them in ex post facto.
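Here is one way that could look in code: a minimal Python sketch in which every term has an optional parent, a term with no parent and no children counts as a loose “associative” tag, and loose tags can be rounded up and placed in the structure later.  The class and method names are hypothetical.

```python
class Taxonomy:
    """Terms with an optional parent; no parent and no children means a loose tag."""

    def __init__(self):
        self.parent = {}  # term -> parent term, or None

    def add(self, term, parent=None):
        """Add a term, creating the parent as a root if it does not exist yet."""
        if parent is not None and parent not in self.parent:
            self.parent[parent] = None
        self.parent.setdefault(term, parent)

    def path(self, term):
        """Return the chain of terms from the root down to the given term."""
        chain = []
        while term is not None:
            chain.append(term)
            term = self.parent[term]
        return list(reversed(chain))

    def loose_tags(self):
        """Round up the terms that have not been fitted into any hierarchy."""
        used_as_parent = {p for p in self.parent.values() if p is not None}
        return sorted(
            t for t, p in self.parent.items()
            if p is None and t not in used_as_parent
        )

    def place(self, term, new_parent):
        """Fit a term into the structure after the fact."""
        if new_parent not in self.parent:
            self.parent[new_parent] = None
        if term in self.path(new_parent):
            raise ValueError("placing the term there would create a cycle")
        self.parent[term] = new_parent


tax = Taxonomy()
tax.add("Mammals", parent="Animals")
tax.add("Cat", parent="Mammals")
tax.add("British Comedy")            # applied as a plain tag for now
print(tax.loose_tags())              # the tags still waiting for a home
tax.place("British Comedy", "Comedy")
print(tax.path("British Comedy"))
```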

Tags, Categories… What’s the Difference?

I’m cooking up a categorization / tagging module for the DotNetNuke blog module, and really would like to get it right the first time.

I want to balance ease-of-use, practicality, and power.  Right away, the issue of managing a “true” hierarchical tag structure with multiple selection capability rears its ugly head.  Building a tree-style hierarchy manager is tricky, and can be confusing to users.  And then, assigning categories from a tree structure can be very clumsy, especially if the tree is very tall.  I’ve seen this done with side-by-side listboxes & tree controls, and it’s always really clumsy.

If we set aside the idea of hierarchy for a second, and just look at good ways to apply tags, I like the way Amazon handles it by using a simple Ajax-enabled auto-suggestion textbox.  You can easily type a few characters of an appropriate tag, and the app will suggest the rest of the word.  More tags can be added with a comma delimiter.  If you want to add a brand new tag, you just type it.  This eliminates the need for a long list with checkboxes or a UI for creating and managing tags.  A significant advantage is that it works for short lists yet scales up very well for long lists.   I strongly prefer this UI for applying tags to blog posts.

Trying to implement this with true hierarchies is trickier.  My idea is the use of a hierarchy delimiter.  The user can create hierarchies as deep as they like, and the auto-suggestion box helps them out.  So a blog post might be tagged up as follows:

Tags: [PlacesDallas; PlacesNew York; PlacesParis; FoodsSteak; FoodsFish; FoodsPizza; Time of DayMorning; Time of DayEvening]

The data-entry UI would automagically suggest each portion of the hierarchy at a time, so when the user types “Pl” the textbox responds “Places”, and the user hits the delimiter key “” to accept and begin entering the next hierarchy component:

Tags: (type) Pl
Tags: (UI responds) Places
Tags: (type) Places (user enters delimiter key)
Tags: (UI presents) Places[drop-down list of places in the places list, if list is < 10 entries]
Tags: (type) PlacesDal
Tags: (UI presents) Places[drop-down list: Dalhart Dallas Dalton]
Tags: (user selects Dalhart)
Tags: (UI presents) PlacesDalhart; (adds delimiter so user can begin entering next tag)
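The lookup behind that interaction can be sketched in a few lines of Python.  Everything here is an assumption for illustration: the “>” delimiter is a stand-in I picked for the sketch, the nested dictionary stands in for the stored hierarchy, and the function only computes suggestions (the textbox and Ajax plumbing are left out).

```python
DELIMITER = ">"     # stand-in hierarchy delimiter, chosen only for this sketch
MAX_DROPDOWN = 10   # only show an unprompted drop-down for lists shorter than this

# Hypothetical stored hierarchy as nested dictionaries.
HIERARCHY = {
    "Places": {"Dalhart": {}, "Dallas": {}, "Dalton": {}, "New York": {}, "Paris": {}},
    "Foods": {"Steak": {}, "Fish": {}, "Pizza": {}},
    "Time of Day": {"Morning": {}, "Evening": {}},
}


def suggest(typed, hierarchy=HIERARCHY):
    """Suggest completions for the last component of a partially typed tag."""
    *parents, prefix = typed.split(DELIMITER)
    level = hierarchy
    for name in parents:
        level = level.get(name)
        if level is None:
            return []  # unknown parent: the user is creating a brand new branch
    matches = sorted(n for n in level if n.lower().startswith(prefix.lower()))
    if not prefix and len(matches) >= MAX_DROPDOWN:
        return []      # too long to list until the user types a few characters
    return matches


print(suggest("Pl"))           # ['Places']
print(suggest("Places>"))      # ['Dalhart', 'Dallas', 'Dalton', 'New York', 'Paris']
print(suggest("Places>Dal"))   # ['Dalhart', 'Dallas', 'Dalton']
```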

The data can be stored in the database using a traditional n-level-hierarchy (parent-child) structure.

A hierarchical list can then be presented:

..New York
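As a sketch of the storage and presentation steps, the Python below parses a tag string into parent-child rows and prints them back as an indented list.  The “>” delimiter, the row layout (id, parent id, name), and the “..” indentation per level are assumptions made for the example.

```python
DELIMITER = ">"  # stand-in hierarchy delimiter, chosen only for this sketch


def store_tags(tag_string, rows=None):
    """Parse a ';'-separated tag string into rows of id -> (parent_id, name)."""
    rows = {} if rows is None else rows
    by_key = {key: row_id for row_id, key in rows.items()}
    for tag in tag_string.split(";"):
        parent_id = None
        for name in (part.strip() for part in tag.split(DELIMITER)):
            if not name:
                continue
            key = (parent_id, name)  # the same name under another parent is a new row
            if key not in by_key:
                by_key[key] = len(rows) + 1
                rows[by_key[key]] = key
            parent_id = by_key[key]
    return rows


def render(rows, parent_id=None, depth=0):
    """Return the hierarchy as indented lines, children sorted by name."""
    lines = []
    children = sorted(
        (name, row_id) for row_id, (pid, name) in rows.items() if pid == parent_id
    )
    for name, row_id in children:
        lines.append(".." * depth + name)
        lines.extend(render(rows, row_id, depth + 1))
    return lines


rows = store_tags("Places>Dallas; Places>New York; Foods>Steak")
print("\n".join(render(rows)))
```

Because a row is identified by its parent as well as its name, a term like “Point-and-Shoot” filed under two different parents stays two different rows, each carrying its own meaning.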

Not Alone

In my latest post, I mentioned that I really view tags and categories as two facets of the same thing.  Turns out I’m not alone.

This interesting post explains how WordPress implements tags and categories.  It’s apparent that, like me, they view both tags and categories ultimately as the same entity type: “terms”.  They then overlay a set of structures that enable various uses of “terms” in the WP UI.

I think a similar – if not identical – approach should form the basis of tagging and categorization in the DNN blog module.

Tags, Categories, and Knowledge Management

I’m inclined to view “tags” and “categories” as just two facets of the same thing: knowledge hooks we apply to content to help us find it.  My gut tells me that these things tend to be viewed as two different beasts because of the way they’ve historically been implemented.

While I’d love to get the concept of tags and categories right the first time, and do it in a way that’s cohesive and elegant, in the end, compatibility with other approaches is probably the right solution.  So we may just need both tags and categories regardless of personal beliefs on my part.

Of course, one way to kludge both into a common solution (and it isn’t terribly kludgy) is to have a predefined “category” called “tags” (or “Keywords”) that contains all the “tags” and is wired through the API to the correct fields in LiveWriter.  The tags could be presented in the DNN module as a separate field that works just like “categories”, only without hierarchy separators.  They would still reside in the same data structures and produce the same lists and “related entries” capabilities.
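A rough Python sketch of that idea: all terms live in one structure, and anything filed under a reserved root is handed to the client as a flat keyword rather than a category.  The root name, the output field names, and the “ / ” separator are placeholders I made up; they are not LiveWriter’s or DNN’s actual API.

```python
TAGS_ROOT = "Tags"  # hypothetical name of the predefined category holding flat tags


def split_for_client(term_paths):
    """Split stored term paths into separate category and keyword fields."""
    categories, keywords = [], []
    for path in term_paths:
        if path[0] == TAGS_ROOT:
            keywords.append(path[-1])            # flat tag: drop the reserved root
        else:
            categories.append(" / ".join(path))  # real category: keep its hierarchy
    return {"categories": categories, "keywords": ", ".join(keywords)}


post_terms = [("Places", "Dallas"), ("Tags", "road trip"), ("Foods", "Steak"), ("Tags", "bbq")]
print(split_for_client(post_terms))
# prints: {'categories': ['Places / Dallas', 'Foods / Steak'], 'keywords': 'road trip, bbq'}
```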

I have a lot of experience with knowledge management solutions (I’ve been doing KM since 1993), and think that “best of both worlds” is really the best approach, because it offers the flexibility of tags with the structure of categories – in other words, it’s a lot more like the way the mind manages these knowledge structures.  My experience is that “structured categories” always start out as “flexible keywords”.   At some point, there is sufficient comprehension to establish the structure that was previously invisible, and then your “keywords” become “categories”.

Let’s take a cue from Amazon.  They sell items in categories.  And items can have multiple categories.  They also assign author (artist) and other common attributes.  However, they only offer one “category” structure – a “genre” list.

They also offer tagging.  If you look at what sort of tags get created, it becomes plain that there are three sorts of tags:

1. Spurious tags – tags that duplicate existing attributes and shouldn’t be there in the first place, e.g. “Monty Python” tag associated with the movie “Monty Python and the Holy Grail” which has an artist of “Monty Python”, or tags which just don’t add practical value, like “Movies that Include the word ‘swallow’”.

2. Tags that ought to be part of a data structure that just doesn’t exist – e.g. “Graham Chapman” tag on “Holy Grail”, which really ought to be part of a structure called “Actors” but isn’t provided by Amazon.  Or “British Comedy” which really should have been a subset of the “Comedy” genre, but Amazon didn’t provide this option.  Or “Arthurian Legends” – a tag that could easily have been a category in the “Subject Matter” hierarchy.  Or “Party” – a tag that could well belong in a “Mood” category.

3. Tags that don’t seem like they should be part of a data structure, YET, because not enough material has been tagged on this dimension to understand the dimension.  For example, your hypothetical post that you wanted to tag “suggestions” could easily have been a node in the “Article Type” category, along with “Ratings”, “Reviews”, “Comparisons”, and “Recipes”.

Back in the olden days of doing KM in Lotus Notes, there was often great confusion about the difference between “Categories” and “Keywords” and the reality is that they’re both the same thing, with differing amounts of structure.

IMHO the problem that “tags” have shown up to address is the same one that “keywords” showed up to address – the problem of only offering one category dimension.  People get stuck in the paradigm and can’t get out.  Consider this article covering the subject – it’s clear that the author views “tags” and “categories” not as KM abstractions in and of themselves, but as artifacts of the particular implementation in WordPress.  WordPress has a particular implementation of Categories that lends itself to a limited use – e.g. you can’t have too many, because the sidebar list will be too long.  Well, that’s an artifact of putting the whole thing on a sidebar all at once, and not a consideration of the KM implications at all.

Consider that most books just have an “index”, which is a “subject category” structure.  But some reference books have indices for many different dimensions.  Likewise, most blogs just give you the capability to build the single “subject category” structure.  But if we built just one iota of flexibility into the “category view” module, then you could (for example) have one category hierarchy that is short and highly topical and shown on the sidebar (like WordPress) as well as other hierarchies that are deep and multi-layered, but viewed through a larger display on a different page, and other views like the “Related Entries” views that mine the hierarchies and just return the most relevant entries.

Not only does this offer a lot of UI flexibility for sites with a lot of structured content, but also, given the way the topics, keywords, and URLs are all associated, it’s a real SEO boon.  You can build a very strong semantic map into the linked content.

Having said all that, I will return to the position that I suppose we’ll have to support both tags and categories, but perhaps we can do it in a more powerful, elegant way than just duplicating a bunch of middle-of-the-road functionality.