Semantic Blogging Redux

A while back I posted something about WordPress’ taxonomy model.  At the time I thought it was clever and thought we should use something like it for the DotNetNuke Blog module.  Now, I’m less enamored with it.

Here’s why.

To recap, have a look at this database diagram:

The seeming coolness stemmed from the decision to make “terms” unique, regardless of their use, and to build various taxonomies from them using the wp_11_term_taxonomy structure.  So let’s say you have the term “point-and-shoot”, and you use that as both a tag and a category.  “Point-and-shoot” exists once in the wp_11_terms table and twice in the wp_11_term_taxonomy table – one entry for each structure that includes the term.  This seems useful because the system “understands” that the tag “point-and-shoot” and the category “point-and-shoot” both mean the same thing.

But is that always a safe assumption?

Consider the case of a photo blog, where the writer is posting photos and writing a little about each.  This photographer has a professional studio, and also shoots portraits in public locations, as well as impromptu shots at parties.

This photographer has set up a category structure indicating the situation in which the photo was taken (“Studio”, “Location”, or “Point-and-Shoot”, meaning an impromptu photograph) and another structure or set of tags indicating what sort of camera was used (“Point-and-Shoot”, as opposed to “DSLR”).

Same term.  Two completely different meanings.  Use that term as a search filter and you will get two sets of results, possibly mutually exclusive.

And so, to truly be “semantic”, a term cannot exist independently of its context (as expressed in the category hierarchy), which is how WordPress attempts to implement it.
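To make the problem concrete, here is a rough Python sketch of the photo blog scenario.  The two structures are loosely modeled on the terms and term_taxonomy tables described above, but the column layout is simplified and every row, file name, and function name is invented for illustration.

```python
# Simplified model: one row per unique term, and one row per
# (term, taxonomy) pairing. All data here is invented.
terms = {1: "point-and-shoot", 2: "studio", 3: "dslr"}

term_taxonomy = [
    # (term_taxonomy_id, term_id, taxonomy)
    (10, 1, "situation"),    # "point-and-shoot" = impromptu photograph
    (11, 1, "camera_type"),  # "point-and-shoot" = kind of camera
    (12, 2, "situation"),
    (13, 3, "camera_type"),
]

# Which term_taxonomy rows each photo is filed under.
photo_links = {
    "party-candid.jpg": [10, 13],  # impromptu shot, taken with a DSLR
    "studio-test.jpg": [12, 11],   # studio shot, taken with a compact
}


def photos_for_term(name):
    """Filter by the bare term, ignoring which taxonomy it belongs to."""
    term_ids = {tid for tid, n in terms.items() if n == name}
    tt_ids = {tt for tt, tid, _ in term_taxonomy if tid in term_ids}
    return sorted(p for p, links in photo_links.items() if tt_ids & set(links))


def photos_for_term_in(name, taxonomy):
    """Filter by the term within one taxonomy, which preserves its meaning."""
    term_ids = {tid for tid, n in terms.items() if n == name}
    tt_ids = {tt for tt, tid, tax in term_taxonomy
              if tid in term_ids and tax == taxonomy}
    return sorted(p for p, links in photo_links.items() if tt_ids & set(links))
```

Filtering on the bare term returns both photos, even though they have nothing in common.  Filtering on the term within a taxonomy returns one photo each, and the two result sets do not overlap.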

Putting the “Perma” Back in Permalink

The DotNetNuke Blog module has had a checkered history with Permalinks.  The earliest versions did not use them, so old blog entries never had a Permalink created for them.  Instead, links to entries were generated programmatically, on the fly.

It’s been trouble ever since.

Permalinks were later introduced, but the old code that generated links on the fly was allowed to remain.  In theory, this shouldn’t cause any problems so long as everyone is using the same rules to create the link.  In reality, depending on how a reader navigated to a blog entry, any number of URL formats might be used.  A particular blog entry might reside at any number of URLs.

From a reader’s point of view, there is really no issue with an entry residing at various URLs.  But from an SEO perspective, it’s a bad idea for a given piece of content to reside at more than one URL: it dilutes the linkback concentration that search engines use to determine relevance.

It’s also a troubleshooting nightmare.  Since there are so many different places in the code where URLs are being created, if a user discovers an incorrect or malformed URL, the source of the problem could be any number of places.

Finally, it’s a maintenance annoyance.  If you are publishing content using the blog, you don’t want URLs that change.  You want the confidence of knowing that when you publish a blog entry, it resides at one URL, and that URL is reasonably immutable.  The old system that generated URLs on the fly was subject to generating different URLs if there were various ways for users to navigate to the blog.

The Permalink Vision

The Blog team has a vision of where we want to take URL handling:

  1. All Blog entries should reside at one URL only (the Permalink).
  2. The Permalink URL for the entry should be “permanently” stored in the database, not generated “on the fly”.
  3. The Permalink should be SEO-friendly.
  4. Once created, the system will never “automatically” change your Permalink URLs for you.
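The first two points can be sketched in a few lines of Python.  This is not the module’s code: the class name, the dictionary standing in for a database table, and the URL layout are all assumptions made for illustration.

```python
class PermalinkStore:
    """Keeps one permalink per entry; `_rows` stands in for a database table."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._rows = {}  # entry_id -> permalink

    def create(self, entry_id, tab_id):
        """Generate the permalink once, at publish time, and persist it."""
        if entry_id in self._rows:
            raise ValueError(f"entry {entry_id} already has a permalink")
        # Hypothetical URL layout, chosen only for the example.
        url = f"{self.base_url}/tabid/{tab_id}/EntryId/{entry_id}/default.aspx"
        self._rows[entry_id] = url
        return url

    def url_for(self, entry_id):
        """Every link the module displays reads the stored value."""
        return self._rows[entry_id]
```

The point of the design is that `url_for` never rebuilds anything.  However a reader navigated to the entry, the link they see is the one stored at publish time.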

We’ve come really close to achieving this vision in 03.05.x.

With the 03.05.00 version of the Blog module, we have undertaken an effort to ensure that the Permalink (as stored in the database) is always used for every entry URL displayed by the module.  After releasing 03.05.00 we discovered a few remnants of old code, and believe that as of the 03.05.01 maintenance release we will have ensured that all URLs pointing to entries are always using the Permalink stored in the database.

But there was a problem with changing all the URLs to use the Permalink stored in the database.  Since old versions of the Blog didn’t generate Permalinks (and some versions generated broken Permalinks), how could we safely use Permalinks from the database for all entry URLs?  The answer was to force the module to regenerate all the Permalinks on first use.  When you first use the Blog module, it will automatically regenerate all of your Permalinks for the entire portal, ensuring that the database is correctly populated with the appropriate URLs for each entry.

The decision to force all users to regenerate their Permalinks was a measured one.  Obviously, automatically forcing Permalink regeneration violates the fourth rule listed above, and theoretically could cause the URLs for some entries to “move around” depending on how broken their Permalinks were.  But we believed that we needed a one-time fix to get all entries on the new Permalink approach, and that this approach was only likely to “move” entries that had truly broken Permalinks in the first place.

Going forward we are confident that this represents the best approach to finally resolving the Permalink issue once and for all.

SEO-Friendly URLs and Permalinks

With version 03.05.00, we introduced SEO-friendly URLs that change the ending of our URLs from “default.aspx” to “my-post-title.aspx”.  We also introduced a 301 redirect that automatically intercepts requests for entries at the old “unfriendly” URL, and redirects to the new “friendly” URL.
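The redirect logic amounts to a lookup.  Here is a minimal Python sketch of the idea, assuming a table of current Permalinks and a table of old paths; the function name, both tables, and the paths in the usage note are hypothetical, and the real module does this inside the ASP.NET request pipeline.

```python
def resolve(path, permalinks, legacy_paths):
    """Return (status, location) for a requested path.

    permalinks:   entry_id -> current permalink path
    legacy_paths: old "unfriendly" path -> entry_id
    """
    if path in permalinks.values():
        return 200, path                  # already the canonical URL
    entry_id = legacy_paths.get(path)
    if entry_id is not None and entry_id in permalinks:
        return 301, permalinks[entry_id]  # moved permanently
    return 404, None
```

A request for an old path such as “/EntryId/12/default.aspx” would come back as a 301 pointing at “/EntryId/12/my-post-title.aspx”.  Because the status is 301 rather than 302, search engines treat the move as permanent.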

When you install 03.05.00, it will by default still be using the old, “unfriendly” URLs.  If you want SEO-friendly URLs, you must enable them using a setting found in Module Options.

When you change the setting, only your new posts will use the new SEO-friendly URLs.  This is consistent with the fourth rule of the Permalink vision: you shouldn’t click an option and suddenly have all of your existing URLs changed for you.  If you want to make your old entries SEO-friendly, you must change the option, then use the “Regenerate Permalinks” option to apply the change to all entries.
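The intended behavior is easy to model.  In this Python sketch the class, the `seo_friendly` flag, the helper that turns a title into a file name, and the path layout are all stand-ins of my own, not the module’s actual names.

```python
import re


def title_to_filename(title):
    """Hypothetical helper: 'My Post Title' -> 'my-post-title.aspx'."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words) + ".aspx"


class Blog:
    def __init__(self):
        self.seo_friendly = False  # stands in for the Module Options setting
        self.entries = {}          # entry_id -> {"title": ..., "permalink": ...}

    def _build_permalink(self, entry_id, title):
        ending = title_to_filename(title) if self.seo_friendly else "default.aspx"
        return f"/EntryId/{entry_id}/{ending}"

    def publish(self, entry_id, title):
        """New posts pick up whatever the setting is at publish time."""
        self.entries[entry_id] = {
            "title": title,
            "permalink": self._build_permalink(entry_id, title),
        }

    def regenerate_permalinks(self):
        """The explicit, owner-initiated step that rewrites every stored URL."""
        for entry_id, entry in self.entries.items():
            entry["permalink"] = self._build_permalink(entry_id, entry["title"])
```

Flipping `seo_friendly` touches nothing that is already stored.  Old entries only change when the owner calls `regenerate_permalinks`.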

A Couple of Issues

As I mentioned earlier, after the release of 03.05.00, we discovered a few areas in the code where the system was still generating URLs “on the fly” instead of using the Permalink.  So, if you’re using 03.05.00, and change the “SEO-Friendly” setting, you will discover that some of your existing URLs do, in fact, change to the new format.  This is a bug that is being corrected in 03.05.01.

There is one other way that a Permalink URL might change unexpectedly.  If you use the SEO-friendly URL setting, the module uses the post title to create the “friendly” portion of the link.  If, after you post an entry, you change its title, the URL will change.  Fortunately, links to the old URL will be caught by the 301 handler and redirected correctly.  This problem will not be corrected in version 03.05.01 but will probably remain until version 4.

Thoughts About Version 4

Version 4 of the Blog module is still on the back of a cocktail napkin.  No hard and fast decisions have been made yet about its feature set.  But I will preview where I think version 4 might go, at least as regards Permalinks and SEO-friendliness.

In version 4, I believe we will introduce the concept of a “slug” to the blog module.  A slug is simply a unique, SEO-friendly text string that is used to create a portion of a URL and is unchangeable except by the blog editor.  So, for example, given a URL ending in “my-post-title.aspx”, the slug is “my-post-title”.

How are slugs different from what we have today?  The only difference is that today, the string “my-post-title” is generated automatically from the title, and if the title changes, the string changes.  With a slug, the string would not change automatically if the title changes, but could only be changed manually.  Slugs ensure that once an entry is posted, it stays put unless the publisher expressly decides to move it.
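Since version 4 is still speculative, treat the following Python as a sketch of the concept only.  The `slugify` rules, the `Entry` class, and the numeric suffix used to keep slugs unique are my own assumptions.

```python
import re
import unicodedata


def slugify(title):
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore").decode("ascii"))
    return "-".join(re.findall(r"[a-z0-9]+", ascii_title.lower()))


class Entry:
    def __init__(self, title, taken):
        """`taken` is the set of slugs already in use on the blog."""
        self.title = title
        base = slugify(title) or "untitled"
        candidate, n = base, 2
        while candidate in taken:      # keep slugs unique
            candidate = f"{base}-{n}"
            n += 1
        self.slug = candidate
        taken.add(candidate)

    def rename(self, new_title):
        """Changing the title leaves the slug, and so the URL, alone."""
        self.title = new_title

    def set_slug(self, new_slug, taken):
        """Only an explicit editor action changes the slug."""
        if new_slug in taken:
            raise ValueError("slug already in use")
        taken.discard(self.slug)
        taken.add(new_slug)
        self.slug = new_slug
```

An entry created with the title “My Post Title” gets the slug “my-post-title” and keeps it through any number of `rename` calls.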

If we do deploy slugs, then there will have to be a few other changes.

First of all, the entire point of using slugs is that, once created, they can only be changed manually.  That means that the “Regenerate Permalinks” functions will have to be removed.  Once each entry has a slug, it can’t be “regenerated” programmatically.  The very idea of “regenerating” becomes moot.

Secondly, the point of a slug is to provide the SEO-friendly ending to each URL.  It presumes that the blog is “SEO-friendly”.  If you aren’t “SEO-friendly” there is no slug.  So for version 4, we may make “SEO-friendliness” mandatory and force it on all blog entries, old and new.

“But wait!” you cry.  “I thought that the point of Permalinks was to ensure that the system would never again change my URLs, and here you are saying that in a future version, you’re going to change all my URLs whether I like it or not!”

Well, yeah.  Guilty as charged.

First off, think of this as the very last step in achieving SEO-friendly Permalinks that are truly and finally “perma”.  Once we achieve SEO-friendly slugs, we have made it all the way to the goal.  And this is really the only way to get there, at least, the only way that is easy to support and not confusing to the end-user.

Secondly, the 301 redirection built into the module should ensure that the transition from old URL to SEO-friendly slug is completely transparent to all users and to search engines.  All the old links will work, and they will correctly report the move to search engines, which will update themselves accordingly.  Thousands of Blog module users are already testing this in version 03.05.x, and I believe that by version 4 we will be confident in this approach.

Of course, all of this is speculative, since version 4 isn’t even in the design stage yet.  But I hope that this information helps illuminate how the Blog team is thinking about the module and where it is likely to go in the future.  And, as usual, your feedback is highly encouraged.

Taxonomy and SEO

Taxonomy is one of the least understood weapons available for SEO.  We all know the basics of effective SEO:

  • URLs constructed with relevant terms, avoiding parameterization
  • Each page can be accessed by only one URL
  • Effective use of keywords in the title tag
  • Use of keywords in H1 tags
  • Links back to the page from other pages

How does taxonomy fit into all of this?

I started a webzine in 1998 called ProRec.  I built a custom CMS to run it, and spent a few years on SEO back before there was something called “SEO”.  In fact ProRec predates Google.  By the spring of 2000, ProRec consistently ranked in the top 10 search results on all relevant terms, usually in the top 3.  Due to many factors, some beyond my control, ProRec went dark in 2005 and was relaunched on DotNetNuke’s Blog module in 2007.  It no longer enjoys its former ranking glory, but I hope to use the lessons I learned to improve the Blog module in future versions.

One of the lessons I learned was how much effective taxonomy matters for SEO.  Designing and properly using an effective taxonomy solves several problems:

  1. Populates META tags appropriately
  2. Encourages or enforces consistent use of similar keywords across the site
  3. Forms basis for navigation within the site, linking related pages
  4. Forms the basis for navigation outside the site, linking to other related information

Let’s look at these one at a time.

Populating META Tags

It’s true that META tags are not as important to search engines as they once were, but they are still used, and therefore still important.  Most blogging systems will take the keywords entered as Category or Tags and use them as META tags.  If you’re using DotNetNuke’s blog module, however, you’re out of luck.  The system simply doesn’t comprehend any kind of taxonomy and doesn’t let you inject keywords into the META tags except at the site level.  Opportunity missed.
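What I have in mind is nothing elaborate.  Here is a Python sketch, assuming each entry carries a list of taxonomy terms and the site has its own keyword list; the function and parameter names are hypothetical.

```python
from html import escape


def meta_keywords_tag(site_keywords, entry_terms):
    """Merge entry-level terms with site-level keywords.

    Entry terms come first; duplicates are dropped case-insensitively.
    """
    seen, merged = set(), []
    for word in list(entry_terms) + list(site_keywords):
        key = word.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(word.strip())
    content = escape(", ".join(merged), quote=True)
    return f'<meta name="keywords" content="{content}" />'
```

For an entry tagged “Lumix”, “TZ3”, and “Photography” on a site whose keywords are “camera review” and “photography”, this yields a tag whose content is “Lumix, TZ3, Photography, camera review”.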

When it comes to content tagging, a structured taxonomy (categories) offers benefits over ad-hoc keywords (tags).  The obvious reason is that a predefined and well-engineered taxonomy is more likely to apply the “right” words since a user manually entering tags on the fly can easily be sloppy or forget the appropriate term to apply.   The less obvious reason is that as a search engine crawls the site, it will consistently see the same words over and over again used to describe related content on your site.

Why is it important for the search engine to see the same words over and over again?  Because “spray and pray” (applying lots of different related words to a given piece of content) doesn’t cut it.  You don’t want to be the 1,922nd site on 100 different search terms.  You want to be the #1, #2, or #3 site on just a few.

So think of a search engine like a really stupid baby.  Your job is to “teach” the baby to use a few important words to describe stuff on your site.  Just like teaching a human, the more consistent you are, the more likely the search engine is to “learn” the content of your site and attach it to a small set of high-value terms.

Enforcing Keyword Usage

One of my main complaints about “tags” versus “categories” is that tags added to content on-the-fly tend to be added off the top of one’s head.  That’s fine for casual bloggers who just want to provide some simple indexing.  But if you are a content site with a lot of information about some particular subject, chances are that tagging like this can get you into trouble.  The reason is that on-the-fly tags often inadvertently split a cluster of information into several groups, because two or three (or more) terms get used interchangeably instead of just one.

Consider a site with a well-defined and structured taxonomy.  Let’s consider a very common application: a photography site primarily covering reviews of cameras and photography how-tos.  A solid taxonomy structure would probably include four indexes:

  • Manufacturer (Canon, Nikon, Lumix, etc.)
  • Product Model (EOS, D40, TZ3, etc.)
  • Product Type (DSLR, Rangefinder, micro, etc.)
  • Topic (Product Review, Lighting, Nature, Weddings, etc.)

Generally, the product reviews would be indexed by manufacturer, product model, and product type, with the “Topic” categorized as “Product Review”.  How-tos would be indexed by their topic (“Weddings”) as well as any camera information if the article covered the use of a specific camera.  For example, an article called “How to Improve Low-Light Performance of the Lumix TZ3” might be indexed thusly:

  • Manufacturer: Lumix
  • Product Model: TZ3
  • Product Type: Compact Digital
  • Topic: High ISO

Having a system that prompts the user to appropriately classify each article ensures that the correct keywords will be applied.  Getting the manufacturer and model correct is probably pretty easy.  It’s harder to remember the correct product type (“Compact Digital” versus “Compact”).  And remembering the right topic is a real challenge (“High ISO” versus “Low Light” versus “Exposure” or any of a hundred other terms I could throw at it).  Moreover, the user must remember to apply all four keywords when the article is created.

We can see the value of focused keywords from this example.  At a site level, relevant keywords are at a high abstraction level, like “camera review”.  It’s unrealistic to think a web site could own a top search engine ranking for such a broad term.  At the time of this writing, Google shows almost 14 million web pages in the search result for “camera review”.  But a search for the new Nikon laser rangefinder “nikon forestry 550” returned only 138!  An early review on this product with the right SEO terms could easily capture that search space.

Having a system with four specific prompts and some kind of list is essential to keeping these indexes accurate.  Ideally the system provides a drop down or type-ahead list that encourages reuse of existing keywords.
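Here is roughly what I mean, sketched in Python.  The vocabulary reuses the example keywords from this article, the function names are invented, and requiring all four indexes on every article is a simplification (a how-to might only need a topic).

```python
VOCABULARY = {
    "Manufacturer": ["Canon", "Lumix", "Nikon"],
    "Product Model": ["D40", "EOS", "TZ3"],
    "Product Type": ["Compact Digital", "DSLR", "Rangefinder"],
    "Topic": ["High ISO", "Lighting", "Nature", "Product Review", "Weddings"],
}


def suggest(index, typed):
    """Type-ahead: existing keywords in one index that start with the input."""
    prefix = typed.strip().lower()
    return [k for k in VOCABULARY[index] if k.lower().startswith(prefix)]


def validate(classification):
    """Return a list of problems: missing indexes or unknown keywords."""
    problems = []
    for index, allowed in VOCABULARY.items():
        value = classification.get(index)
        if value is None:
            problems.append(f"{index}: missing")
        elif value not in allowed:
            problems.append(f"{index}: unknown keyword {value!r}")
    return problems
```

Typing “comp” into the Product Type prompt suggests “Compact Digital”, and an article filed under plain “Compact” gets flagged before it can split the cluster.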

Creating a Navigation System

Here’s where it all starts to come together.  Once you have a big pile of content all indexed using the above four indexes, the next obvious step is to create entry points into your content based on the index, and to cross-link related content by index.

On ProRec, we had five entry points into the content:

  • Main view (chronological)
  • Manufacturer index
  • Product Model index
  • Product Type index
  • Topic index

Needless to say, when a search engine finds a comprehensive listing of articles on your site, categorized by major topic, it greatly increases the relevance of those articles because the engine is able to better understand your content.  Think about it: right there under the big H1 tag that says “High ISO” is this list of six articles all of which deeply cover the ins and outs of low-light photography.  It’s a search engine gold mine.  Obviously it also helps users navigate your site and find articles of interest, too.
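Building those index pages is a simple grouping operation.  This Python sketch is not ProRec’s code; the article records and the `build_index` function are invented to show the shape of it.

```python
from collections import defaultdict

# Invented sample data: each article is classified once per index.
ARTICLES = [
    {"title": "Lumix TZ3 Review",
     "terms": {"Manufacturer": "Lumix", "Product Model": "TZ3",
               "Product Type": "Compact Digital", "Topic": "Product Review"}},
    {"title": "How to Improve Low-Light Performance of the Lumix TZ3",
     "terms": {"Manufacturer": "Lumix", "Product Model": "TZ3",
               "Product Type": "Compact Digital", "Topic": "High ISO"}},
    {"title": "Nikon D40 Review",
     "terms": {"Manufacturer": "Nikon", "Product Model": "D40",
               "Product Type": "DSLR", "Topic": "Product Review"}},
]


def build_index(articles, index_name):
    """Map each keyword in one index to the titles filed under it."""
    pages = defaultdict(list)
    for article in articles:
        keyword = article["terms"].get(index_name)
        if keyword:
            pages[keyword].append(article["title"])
    return {k: sorted(v) for k, v in sorted(pages.items())}
```

Each key in the result becomes the H1 of an index page, and each list becomes the set of links under it.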

My favorite part of the magic, however, was using the taxonomy to create a “Related Articles” list on each article.  Say you’re reading a review of a Lumix TZ3.  We can use the taxonomy to display a list of articles about other Lumix cameras as well as other Compact Digital cameras.  On ProRec this was even more valuable, because ProRec reviews (and how-tos) many different types of gear and covers a lot of different topics.  Go to a review of a Shure KSM32 microphone, and here’s this list of reviews of other mics.

The “Related Articles” list immediately creates a web interconnecting each article to a set of the most similar articles on the site.  Instantly the search engine is able to make much more sense out of the site.  And, of course, readers will be encouraged to navigate to those other pages, increasing site stickiness.
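One simple way to build such a list is to count shared classifications.  The Python below is a sketch under that assumption; ProRec’s actual ranking is not described here, and the sample articles and function name are made up.

```python
# Invented sample data: each article is classified by (index, keyword) pairs.
ARTICLES = [
    {"title": "Lumix TZ3 Review",
     "terms": {"Manufacturer": "Lumix", "Product Model": "TZ3",
               "Product Type": "Compact Digital", "Topic": "Product Review"}},
    {"title": "Lumix LX2 Review",
     "terms": {"Manufacturer": "Lumix", "Product Model": "LX2",
               "Product Type": "Compact Digital", "Topic": "Product Review"}},
    {"title": "Nikon D40 Review",
     "terms": {"Manufacturer": "Nikon", "Product Model": "D40",
               "Product Type": "DSLR", "Topic": "Product Review"}},
    {"title": "How to Improve Low-Light Performance of the Lumix TZ3",
     "terms": {"Manufacturer": "Lumix", "Product Model": "TZ3",
               "Product Type": "Compact Digital", "Topic": "High ISO"}},
    {"title": "Wedding Lighting Basics",
     "terms": {"Topic": "Weddings"}},
]


def related(articles, current_title, limit=5):
    """Rank other articles by how many (index, keyword) pairs they share."""
    current = next(a for a in articles if a["title"] == current_title)
    current_terms = set(current["terms"].items())
    scored = []
    for other in articles:
        if other is current:
            continue
        shared = len(current_terms & set(other["terms"].items()))
        if shared:
            scored.append((shared, other["title"]))
    # Most shared pairs first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]
```

Matching on (index, keyword) pairs rather than bare keywords means a term only counts when it is used in the same index, so the same word in two different indexes never produces a false match.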

More SEO Fun with Taxonomy

Once the system was in place I was able to extend it nicely.  For example, I created a Barnes & Noble Affiliate box that used the taxonomy to pull the most relevant book out of a list of ISBNs categorized using this same taxonomy and display it in a “Recommended Reading” box on the page.  So you’re reading an article called “Home Studio Basics” and right there on the page is “Home Studio Soundproofing for Beginners by F. Alton Everest” recommended to you.  The benefit to readers is obvious.  But there are SEO benefits, too, because search engines know “Home Studio Soundproofing for Beginners by F. Alton Everest” only shows up on pages dealing with soundproofing home studios.  Pages with that title listed on them (linked to the related page on Barnes & Noble) will rank higher than those that don’t.

You can start to see how quickly a simple “tagging” interface starts to break down.  You need the ability to create multiple index dimensions (like product, product type, and topic) as well as some system to encourage or enforce consistent use of the correct terms.  Otherwise, you’re doing most of the work, but only getting part of the benefit.

Taxonomy, Blogging, and DNN

Obviously, most casual bloggers don’t want to be forced into engineering and maintaining a predefined taxonomy.  That’s why “tagging” became popular.  Casual bloggers want to be able to add content quickly and easily and anything that makes them stop and think is a serious impediment to workflow.  So you just don’t see blog platforms with well-engineered categorization schemes, and you definitely don’t see any that allow for multiple category dimensions.

In my article “Blog Module Musings” I wondered aloud about what sort of people really use DotNetNuke as a blogging platform in the traditional sense of the word “blogging”.  My guess is that most people using DNN as a personal weblog probably have some personal reason for choosing DNN instead of any of the free and easy tools readily available like WordPress or Blogger.  So I have a belief about DNN that it isn’t a good platform for a “blog” per se, but it’s a great platform for content management and publishing.  My guess is that the DNN Blog module has much greater utility as a “publishing platform” instead of a “personal weblog”.

As such, I think it makes sense that DNN’s publishing module should offer more taxonomy power than the typical blog.  I also think that it’s possible, using well-designed user interfaces, to make a powerful taxonomy easy to manage.  My experience with ProRec demonstrated this.  It was very easy to manage ProRec’s various indices, primarily because I had a fat client to provide a rich user interface.  With Web 2.0 technologies, we can now provide these user experiences in the browser.

Blog Module Moving to Version 4

In a previous post I stated that the Blog module would offer an interim 3.6 release to provide users with a few more features before the team undertook the full-on rewrite to move the module to version 4.

Well, as it turns out, plans change.  The team has decided to go directly to version 4.  There will likely be a 3.5.1 release to patch up any bugs that surface after 3.5 is released, but no 3.6 “feature upgrade”.

This is really great news.  The team has grand plans for this module which are currently stymied by a few factors, including a lot of old deadwood in the code and poor developer productivity in the older VS 2003 environment.  Of course, the key reason is that DotNetNuke has officially left the .NET 1.1 environment so all new releases must be based on .NET 2.0.

New DotNetNuke MSDN-Style Help

Last night I was desperately seeking help for some DotNetNuke core classes, and I came up short.  Fortunately I was able to resolve my problem with a little help from Antonio, but I still wished I had a better help file available.

Well, today I discovered that Ernst Peter Tamminga has put together an MSDN-style help system for DotNetNuke.  Exactly what I was looking for.

If you do serious DNN development, this is a must-have.  Thanks Ernst!

Blog 3.5.0 Set for Release

After a few months’ delay, the Blog team is set to release the 3.5.0 version of the DNN Blog module.

I won’t go into the details of the reasons behind the holdup.  Our team leader has done a good job of that here, if you’re interested.  Suffice to say, sometimes, there are circumstances beyond one’s control.

I am not sure at this point if there will still be a 3.6 interim, or if we’ll proceed directly to version 4.  I’m sure everyone knows my opinion!  At any rate, it’s good to be back on track.

Blog Team Announces Interim 3.6 Release

The DNN Blog team has announced plans to release an interim 3.6 release to provide some final changes before undertaking the effort to rewrite the code for the version 4.x release.

The 3.6 feature set has not been made official, but current plans are to add support for BlogML, tagging, 301 redirects, and custom RSS URLs.

All effort will be made to minimize scope creep, since it is a high priority to move forward with 4.x, but we felt that these critical changes needed to happen sooner than could be provided by 4.x.

Exciting New Enhancements to the Blog Module Comments Section

Identification icons are quickly becoming a popular way for bloggers to encourage responsible use of blog comments.  A variety of solutions are available, all of which aim to provide useful benefits to the blog reader.

Identification Icons

Identification icons allow blog readers to personalize their posts just like a forum avatar.  The additional personalization may encourage responsible posting as well as increased commenting and discussion.  Identification icons prevent users from impersonating one another, leading to more responsible posting.  They also may prevent flaming, since a user tied to an identification icon will have to take additional steps to obfuscate their identity.

Version 3.4.1 of the Blog module supports all popular identification options available today:

  • Gravatar
  • Identicon
  • Wavicon
  • MonsterID

Gravatar, or Globally Recognized Avatar, provides an easy service that allows users to upload an image avatar that follows them from site to site.  This simple solution ties the image to an email address, providing a very easy way for blog software to retrieve the image.

Identicons, Wavicons, and MonsterIDs are automatically generated images that can be used in place of a Gravatar in the event that the user does not want to create one.  These images are generated by a hash of the user’s email address (or IP address, in the event that the user chooses not to enter an email address).
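For the curious, here is a Python sketch of how such an image URL can be built.  It follows Gravatar’s image request scheme as I understand it (an MD5 hash of the trimmed, lowercased email address, with a `d` parameter naming the fallback style).  Hashing the IP address when no email is given mirrors the behavior described above, but the function itself is my illustration, not the module’s code.

```python
import hashlib
from urllib.parse import urlencode


def icon_url(email, ip_address, fallback="identicon", size=40):
    """Build an avatar URL keyed on the email, or on the IP if none was given.

    `fallback` is passed as Gravatar's `d` parameter, which selects the
    generated image style used when no uploaded avatar exists.
    """
    key = email.strip().lower() if email and email.strip() else ip_address
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    query = urlencode({"s": size, "d": fallback})
    return f"https://www.gravatar.com/avatar/{digest}?{query}"
```

Because the email is normalized before hashing, “ Foo@Example.com ” and “foo@example.com” produce the same icon, and two anonymous commenters on different IP addresses get different ones.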

One feature that may be unique to DotNetNuke is the ability for the user to instantly preview their image in the comments section before submitting their comments.  As soon as the user enters their email address and tabs out of the email field, their Gravatar (or other icon) will automatically display in a preview area.  As far as we know, this preview capability doesn’t exist in any other blogging platform.

Other Comments Changes

We’ve included a few other improvements to the comments area as well:

  • Users can now enter a website address with their comment.  This feature can be enabled or disabled by the blog owner
  • Blog owner can show or suppress unique comment titles
  • Blog owner’s comments have a unique CSS tag, enabling them to stand out from other comments

Blog Module Musings

A lot needs to happen to make the DotNetNuke Blog module truly competitive.  Part of the problem is that there are competing needs for the module:

  • Use as a “personal” weblog
  • Use as a publishing platform

Joe Blogger

Of course, the Blog module was originally meant to serve the needs of… bloggers, that is to say, people writing journal-style weblogs.  Like this one.  That’s why it’s called a BLOG module, stupid.  OK, but suffice to say, there are particular needs of a personal weblog application:

  • Easy to use, simple
  • No need for workflow tools
  • Most will be single-author
  • Great looking, easily skinned
  • All the coolest social networking yada yada
  • etc.

In other words, a personal weblog needs to compete effectively with WordPress, Blogger, TypePad, and other popular blogging tools by offering an app that works at least as well (which will be hard, considering that several of these are free, including the hosting).  To that end, some of the features that the Blog module needs to consider are:

  • Built-in skinning (perhaps a set of 5-10 built-in template skins)
  • Email-to-blog capability
  • Metaweblog support (already on its way)
  • Social networking support (already on its way)
  • Categorization & tagging

It might also be nice if there was a way to do a DNN “blog” install, in other words, a single installer that performed the basic DNN install as well as getting the basic Blog module installed and configured.  Of course, a DNN install is still not as simple as it ought to be, and until it is, there really is no point in refining the install process of the Blog module.  DNN itself is already a sufficiently high hurdle that most casual users will shy away from using it just for blogging.

Which raises an interesting question:  are DNN bloggers ever really going to be casual users?

Casual DNN Users?

After all, how many people really run DNN just to operate a personal weblog?  Doesn’t the implicit power – and complexity – of DNN in and of itself filter out almost all casual bloggers?  I mean, if I just wanted to start blogging, there’s no way I’d use DNN.  I operate this blog on DNN because I’m already running several DNN sites.  And I still question my logic in setting this up as a DNN blog instead of using WordPress or Blogger.

Seems to me that for most Blog module users, what they have is a website, part of which is a blog.  Think about this.  If they’re running DNN, it’s very likely that they’re doing “other stuff” with it other than just running a blog.

Which raises some interesting points:

  • The blog may be much more likely to be multi-user
  • The blog may be a kind of substitute for the Announcements module or FAQ, providing company information
  • The blog may be a publishing platform more than a weblog

DNN Publisher

Which brings me to the other competing need for the Blog module – the publishing platform.  If you need to manage content – by which I mean significant amounts of printed material – in DNN, then the Blog module quickly becomes your only choice, short of purchasing a publishing tool.  Nothing else in the DNN module base comes close to meeting this need, with the possible exception of the Announcements module.

As a DNN consultant, I always advise against purchasing modules if it is at all possible to conform a preexisting base module to the need.  I see the base modules as part of the open-source draw of DNN, and while they may evolve more slowly than commercial modules, they’re likely to have good quality and ultimately stand the test of time better than commercial modules.  I want to stay on open-source code as long as possible.

That’s why I chose the Blog module instead of buying a module or building one of my own from scratch.  The fact is, it meets about 65% of my need.  Really, just barely enough to limp along.  I don’t really want it to look like a weblog, with the calendar and month list being the primary navigation tool.  But I’m willing to make do, because I get so much for free.  Free is good.

The needs of people using the Blog module as a publishing platform (like me) include almost all the needs of the casual bloggers, but add a few twists:

  • Increased need for workflow
  • Different (non-traditional) navigation
  • Better multi-author / multi-department support
  • Different “main page” support

This is by no means comprehensive, but hits the high points.  With only a few improvements, the Blog module becomes “DNN Publisher” – a flexible publishing platform.  Suddenly, this tool can support lots of publishing operations, specifically, content sites (like newspapers and magazines) and multi-department corporate sites.

Push and Pull

All this flexibility will come at the price of complexity.  I believe that with some elegant design, we can minimize the complexity and maximize the flexibility, but increased complexity is probably a given.  So there will be inevitable battles between the people who want to use the module as a simple blog platform, and others who want to use it as a more powerful publishing platform.

Which takes me back to that earlier question: how many people really run DNN just to operate a personal weblog?  Doesn’t the implicit power – and complexity – of DNN in and of itself filter out almost all casual bloggers?  Is it really reasonable to expect DNN to compete with a free WordPress account for the market of people seeking to journal about their trip to Spain?

If there are significant disagreements about the direction of the Blog module, I think there will need to be some sort of informed answer to these questions.  Perhaps a survey or some kind of market research will be in order.  At any rate, I intend to push the Blog module in the direction of “DNN Publisher”, because I think that’s its unique value and a better fit with likely DNN users, and if I take a few bullets, well, they’ll be neither the first nor the worst.

The Great Child Blog Opportunity

Currently, the Blog module implements a concept called “child blogs”.  These are sub-blogs of a parent blog, which are broken out in the user interface as separate sections, each with its own administrative rules.

Since the Blog module doesn’t currently support the idea of categories or tagging, a lot of people use the child blog functionality as a means of providing a categorization function.  In fact this is often the recommended use of child blogs.  IT Crossing’s lovely metaPost product takes this one step further, by integrating the categorization function in LiveWriter with the child blog capability in the Blog module.  Unfortunately, this misuse of the existing functionality is going to have to change, and the sooner, the better.

What the Hell is a Child Blog Anyway?

I can’t read the minds of the original designers.  However, a peek at the database helps to shed some light on the issue, and it reveals an altogether different (and better) purpose for child blogs than categorization.

Child blogs are implemented as complete blogs in their own right, the only difference being that they have a “ParentID” that points to an uber-blog.  This parent-child relationship allows child blogs to “roll up” into a parent blog in a very useful way.
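To make the roll-up idea concrete, here is a minimal sketch in Python. It is not the module's actual schema or code: the `Blog` class, its field names (other than the idea of a parent ID), and the sample blogs are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Blog:
    blog_id: int
    title: str
    parent_id: Optional[int] = None  # None marks a top-level (parent) blog
    entries: List[str] = field(default_factory=list)


def roll_up(blogs: List[Blog], parent_id: int) -> List[str]:
    """Return the parent's own entries plus those of its direct children."""
    rolled: List[str] = []
    for blog in blogs:
        if blog.blog_id == parent_id or blog.parent_id == parent_id:
            rolled.extend(blog.entries)
    return rolled


# Hypothetical sample data: one parent with two child blogs, plus an unrelated blog.
blogs = [
    Blog(1, "Daily Paper", entries=["Welcome"]),
    Blog(2, "Sports", parent_id=1, entries=["Match report"]),
    Blog(3, "Business", parent_id=1, entries=["Market wrap"]),
    Blog(4, "Unrelated", entries=["Other"]),
]

print(roll_up(blogs, 1))
```

Each child is still a complete blog with its own entries; the parent pointer is the only thing that ties it to the larger publication.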

Why is the child blog structure a poor place into which to pour the concept of a category?

1. Blog entries can belong to more than one category / topic
2. Categories, as a concept, do not possess all the other managerial attributes of a blog (e.g. “allow anonymous comments” is not an attribute of a “category”)
3. Categories do not have “owners”
4. There can be more than one category hierarchy (for example, an auto-review publication might have an index of cars by manufacturer, and another index of cars by car type)

It seems apparent to me that a category scheme for blog entries ought to look a lot different from that of a child blog.  In summary, there ought to be an organized list of category “topics”, with a many-to-many relationship between blog entries and topics.
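A rough sketch of what that many-to-many relationship could look like, using the auto-review example from the list above. The table shapes, IDs, and names here are hypothetical, not a proposed schema:

```python
# topic_id -> (name, parent_topic_id); a parent of None marks the root of a hierarchy.
# Two independent hierarchies coexist: one by manufacturer, one by car type.
topics = {
    1: ("By Manufacturer", None),
    2: ("Honda", 1),
    3: ("By Car Type", None),
    4: ("Sedan", 3),
    5: ("Hybrid", 3),
}

# Junction rows (entry_id, topic_id): an entry may carry many topics,
# and a topic may be attached to many entries.
entry_topics = [
    (100, 2), (100, 4),
    (101, 2), (101, 5),
    (102, 4),
]


def entries_for_topic(topic_id):
    """All entry IDs filed under the given topic."""
    return sorted(e for e, t in entry_topics if t == topic_id)


def topics_for_entry(entry_id):
    """Names of all topics attached to the given entry."""
    return sorted(topics[t][0] for e, t in entry_topics if e == entry_id)


print(entries_for_topic(2))
print(topics_for_entry(100))
```

Note that nothing in this structure carries an owner or a comment policy, which is exactly why a child blog is the wrong container for it.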

What, then, are child blogs?  And how should we steer their use going forward?

To me, it’s clear that child blogs represent “sub-publications”.  In a large publishing organization, say, a newspaper, different sections of the paper are managed by different organizations.  They all roll up into a single publication, but by and large each section is an almost stand-alone entity.

Consider the sports section of a major newspaper.  It has its own editorial staff, its own submission policies, maybe even its own index.  There really isn’t any reason that it couldn’t be sold separately as a sports-only specialty newspaper (and some are), except for the fact that it’s bundled together with the rest of the content.

Or consider the test-drive section of a car magazine.  There is a dedicated team of reviewers who do nothing but drive and rate cars.  They have their own procedures for this.  They’re answerable to a Practices editor who manages the review practices for fairness.  They have their own team of people who manage the pipeline of new cars that need to be tested and evaluated.  Really, there could be a publication that is nothing but car reviews, except that the auto magazine has decided to include it with all of the other sections in a bundle.

I think it’s clear that child blogs ought to represent “sections” – pieces of the publication that are broken out for arbitrary business reasons, and not necessarily because they relate to common topics.  Now, a given publication might very well like to split out its publication according to some major topical divisions, and child blogs could be the right way to handle this task.  But in this case it’s simply a coincidence that child blogs are split out according to major topical divisions.

The other thing that a child blog shouldn’t necessarily be is a set of entries by a given author.  Of course, it may be useful in a given publication (a moblog, for example) to give each author his own child blog and let each one be individually managed.  But it shouldn’t be necessary.

Child blogs really map to the idea of organizational structure.  Some companies are organized by product.  Others by function.  Some are organized geographically, others by technology.  Some are split in order to serve government customers separately from private industry.  In every case it’s a mistake to confuse the org structure (e.g. “product organization”) with the thing itself (e.g. “actual products”).  Likewise, child blogs ought to be used to solve the arbitrary managerial or organizational needs of a publication – which may map to high-level topics or authors / author groups, but only by coincidence.

Get to the Point Already

Where am I leading this?  Let’s get to the punch line.  I want to separate the semantics from the presentation.

Categories and tags – which I am increasingly beginning to think about collectively as “topics” – represent ways to chunk the content by its meaning, organizing it hierarchically (as with a category) or flat (as with a tag).  There are different user interface widgets we can employ to get at the content by topic, including widgets that display “index views”, widgets that display “tag clouds”, and widgets that display “related entries”.  Ultimately, however, we’re just talking about different ways of doing the same thing – grouping entries by topic.

Child blogs – which I am increasingly beginning to think about as “sections” – represent ways to chunk the content according to arbitrary managerial need.  Child blogs should allow a large publishing team to segment the ownership of pieces of the publication and roll up the content into a coherent whole.  In fact the current design of the Blog module comes surprisingly close to this goal.  If any workflow capability is attached to the Blog module, it’s clear that child blogs represent the set of structures needed to pull it off, each offering its own set of editors and authors, approval workflows, roll-ups to parent teams, etc.  As with “topics”, there are different user interfaces we can employ to get at the content of “sections”, including widgets that allow individual child blogs to live on different DNN pages (each with its own look and feel) while rolling up to a “summary” page elsewhere, widgets that roll up the child blogs together by section (rather than grouping all the content together chronologically), or widgets that provide category “indexes” within a given section (like a sports section index).  Ultimately, however, we’re just talking about different ways of organizing content according to an arbitrary managerial need.

Authorship needs to be thought about independently from “blog ownership” and “sections” in a multi-author publication system.  Authorship is an attribute of an “entry”, while “editorship” is an attribute of a “section”.  While a given publication might create a different “section” for each “author” (e.g. a moblog), I think that ought to be the exception instead of the rule.  A given author might create content in various “sections”.  Yet there remains a need to be able to find content by author.  Ideally, we should provide widgets that can display content by author – even allowing a publication to create “author pages” – without necessarily creating a child blog for each author.  So, for example, a moblog like CuzWeSaidSo would probably create one section per author – because that’s the idea behind the publication – whereas a product-review magazine like ProRec would have sections like “From the Editor”, “Product Reviews”, and “Industry News”, with a given author publishing content in any or all sections.
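Here is a small sketch of topics, sections, and authorship as three independent axes on an entry. The `Entry` class, the author names, and the sample entries are hypothetical; only the section names come from the ProRec example above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Entry:
    title: str
    author: str              # authorship is an attribute of the entry
    section: str             # the section (child blog) that manages the entry
    topics: FrozenSet[str]   # categories and tags alike


# Hypothetical content for a product-review magazine.
entries = [
    Entry("Mic shootout", "Ann", "Product Reviews", frozenset({"Microphones"})),
    Entry("Trade show recap", "Ann", "Industry News", frozenset({"Events"})),
    Entry("Welcome", "Bob", "From the Editor", frozenset({"Events"})),
]


def by_author(author: str) -> List[str]:
    """An "author page": every entry by one author, across all sections."""
    return [e.title for e in entries if e.author == author]


def by_section(section: str) -> List[str]:
    """A section view: every entry managed by one child blog."""
    return [e.title for e in entries if e.section == section]


def by_topic(topic: str) -> List[str]:
    """A topic view: every entry carrying a given category or tag."""
    return [e.title for e in entries if topic in e.topics]


print(by_author("Ann"))
print(by_section("Industry News"))
print(by_topic("Events"))
```

The same author shows up in two sections, and the same topic spans two sections and two authors, without any child blog having to stand in for either.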

The Pain of Change

One reason I mentioned metaPost earlier in the article is because I believe that improvements in the blog module, as well as its support tools like metaPost, will lead more people to choose the DNN Blog module for their publication needs.  Let me be blunt: what we have now is a sort of bastardized system that solves none of the above problems well at all.  The longer we go on without making these critical structural changes, the more we doom the Blog module to a second-class existence.  Conversely, the sooner we tackle these problems, and the sooner we address the pain associated with making these changes, the easier it will be.  Delaying these changes only means more pain later on.

Tags, Categories… What’s the Difference?

I’m cooking up a categorization / tagging module for the DotNetNuke blog module, and really would like to get it right the first time.

I want to balance ease-of-use, practicality, and power.  Right away, the issue of managing a “true” hierarchical tag structure with multiple selection capability rears its ugly head.  Building a tree-style hierarchy manager is tricky, and can be confusing to users.  And then, assigning categories from a tree structure can be very clumsy, especially if the tree is very tall.  I’ve seen this done with side-by-side listboxes & tree controls, and it’s always really clumsy.

If we set aside the idea of hierarchy for a second, and just look at good ways to apply tags, I like the way Amazon handles it by using a simple Ajax-enabled auto-suggestion textbox.  You can easily type a few characters of an appropriate tag, and the app will suggest the rest of the word.  More tags can be added with a comma delimiter.  If you want to add a brand new tag, you just type it.  This eliminates the need for a long list with checkboxes or a UI for creating and managing tags.  A significant advantage is that it works for short lists yet scales up very well for long lists.  I strongly prefer this UI for applying tags to blog posts.

Trying to implement this with true hierarchies is trickier.  My idea is the use of a hierarchy delimiter.  The user can create hierarchies as deep as they like, and the auto-suggestion box helps them out.  So a blog post might be tagged up as follows:

Tags: [PlacesDallas; PlacesNew York; PlacesParis; FoodsSteak; FoodsFish; FoodsPizza; Time of DayMorning; Time of DayEvening]

The data-entry UI would automagically suggest each portion of the hierarchy at a time, so when the user types “Pl” the textbox responds “Places”, and the user hits the delimiter key “” to accept and begin entering the next hierarchy component:

Tags: (type) Pl
Tags: (UI responds) Places
Tags: (type) Places (user enters delimiter key)
Tags: (UI presents) Places[drop-down list of places in the places list, if list is < 10 entries]
Tags: (type) PlacesDal
Tags: (UI presents) Places[drop-down list: Dalhart Dallas Dalton]
Tags: (user selects Dalhart)
Tags: (UI presents) PlacesDalhart; (adds delimiter so user can begin entering next tag)
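The suggestion step in the walk-through above might be sketched like this. It is only an illustration: the hierarchy delimiter does not appear in the examples above, so this sketch assumes "/" purely as a stand-in, and the `suggest` function and the list of known tags are hypothetical.

```python
DELIMITER = "/"  # assumption: stand-in for the hierarchy delimiter, which is not shown above

# Hypothetical set of tags already known to the system.
KNOWN_TAGS = (
    "Places/Dalhart",
    "Places/Dallas",
    "Places/Dalton",
    "Places/Paris",
    "Foods/Steak",
    "Foods/Pizza",
)


def suggest(typed, known=KNOWN_TAGS, delimiter=DELIMITER):
    """Suggest completions for the hierarchy component currently being typed."""
    # Everything before the last delimiter is already accepted; the rest is partial.
    *parents, partial = typed.split(delimiter)
    depth = len(parents)
    matches = set()
    for tag in known:
        parts = tag.split(delimiter)
        if (
            len(parts) > depth
            and parts[:depth] == parents
            and parts[depth].lower().startswith(partial.lower())
        ):
            matches.add(parts[depth])
    return sorted(matches)


print(suggest("Pl"))
print(suggest("Places/Dal"))
```

Typing the delimiter right after an accepted component, as in `suggest("Places/")`, leaves an empty partial, so every child at that level is offered. That corresponds to the drop-down list step in the walk-through.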

The data can be stored in the database using a traditional n-level-hierarchy (parent-child) structure.
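As a rough sketch of that storage step, the function below turns a delimited tag field into parent-child rows. The row layout `(id, name, parent_id)`, the function name, and the "/" hierarchy delimiter are all assumptions for illustration; the ";" separator between tags follows the example above.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, Optional[int]]  # (id, name, parent_id)


def build_rows(tag_field: str, delimiter: str = "/") -> List[Row]:
    """Convert 'A/B; A/C' style input into parent-child rows, reusing shared parents."""
    rows: List[Row] = []
    ids: Dict[Tuple[Optional[int], str], int] = {}
    for tag in tag_field.split(";"):
        parent_id: Optional[int] = None
        for name in tag.strip().split(delimiter):
            name = name.strip()
            if not name:
                continue
            key = (parent_id, name)
            if key not in ids:
                # First time this name appears under this parent: create a row.
                ids[key] = len(rows) + 1
                rows.append((ids[key], name, parent_id))
            parent_id = ids[key]
    return rows


for row in build_rows("Places/Dallas; Places/New York; Foods/Steak"):
    print(row)
```

Because rows are keyed on the pair of parent and name, the same word can appear under two different parents as two distinct nodes.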

A hierarchical list can then be presented:

..New York

Not Alone

In my latest post, I mentioned that I really view tags and categories as two facets of the same thing.  Turns out I’m not alone.

This interesting post explains how WordPress implements tags and categories.  It’s apparent that, like me, they view both tags and categories ultimately as the same entity type: “terms”.  They then overlay a set of structures that enable various uses of “terms” in the WP UI.

I think a similar – if not identical – approach should form the basis of tagging and categorization in the DNN blog module.

Tags, Categories, and Knowledge Management

I’m inclined to view “tags” and “categories” as just two facets of the same thing: knowledge hooks we apply to content to help us find it.  My gut tells me that these things tend to be viewed as two different beasts because of the way they’ve historically been implemented.

While I’d love to get the concept of tags and categories right the first time, and do it in a way that’s cohesive and elegant, in the end, compatibility with other approaches is probably the right solution.  So we may just need both tags and categories regardless of personal beliefs on my part.

Of course, one way to kludge both into a common solution – that isn’t terribly kludgy – is to have a predefined “category” called “tags” (or “Keywords”) that contains all the “tags” and is wired through the API to the correct fields in LiveWriter.  The tags could be presented in the DNN module as a separate field that works just like “categories” only without hierarchy separators.  They would still reside in the same data structures and produce the same lists and “related entries” capabilities.
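A tiny sketch of that idea, with flat tags filed under one reserved root so that both kinds of topic live in the same structure. The root name and both functions are hypothetical:

```python
from typing import List, Tuple

TAGS_ROOT = "Tags"  # hypothetical name for the predefined category that holds all tags

Path = Tuple[str, ...]


def unify(categories: List[Path], tags: List[str]) -> List[Path]:
    """Express hierarchical categories and flat tags as paths in one topic structure."""
    return list(categories) + [(TAGS_ROOT, tag) for tag in tags]


def split(paths: List[Path]) -> Tuple[List[Path], List[str]]:
    """Recover the two views, e.g. to fill separate category and keyword fields."""
    categories = [p for p in paths if p[0] != TAGS_ROOT]
    tags = [p[1] for p in paths if p[0] == TAGS_ROOT]
    return categories, tags


paths = unify([("Places", "Dallas"), ("Foods", "Steak")], ["travel", "dinner"])
print(paths)
print(split(paths))
```

Anything that mines the topic structure, such as a “related entries” list, would then see tags and categories alike without special cases.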

I have a lot of experience with knowledge management solutions (I’ve been doing KM since 1993), and think that “best of both worlds” is really the best approach, because it offers the flexibility of tags with the structure of categories – in other words, it’s a lot more like the way the mind manages these knowledge structures.  My experience is that “structured categories” always start out as “flexible keywords”.   At some point, there is sufficient comprehension to establish the structure that was previously invisible, and then your “keywords” become “categories”.

Let’s take a cue from Amazon.  They sell items in categories.  And items can have multiple categories.  They also assign author (artist) and other common attributes.  However, they only offer one “category” structure – a “genre” list.

They also offer tagging.  If you look at what sort of tags get created, it becomes plain that there are three sorts of tags:

1. Spurious tags – tags that duplicate existing attributes and shouldn’t be there in the first place, e.g. “Monty Python” tag associated with the movie “Monty Python and the Holy Grail” which has an artist of “Monty Python”, or tags which just don’t add practical value, like “Movies that Include the word ‘swallow'”.

2. Tags that ought to be part of a data structure that just doesn’t exist – e.g. “Graham Chapman” tag on “Holy Grail”, which really ought to be part of a structure called “Actors” but isn’t provided by Amazon.  Or “British Comedy” which really should have been a subset of the “Comedy” genre, but Amazon didn’t provide this option.  Or “Arthurian Legends” – a tag that could easily have been a category in the “Subject Matter” hierarchy.  Or “Party” – a tag that could well belong in a “Mood” category.

3. Tags that don’t seem like they should be part of a data structure, YET, because not enough material has been tagged on this dimension to understand the dimension.  For example, your hypothetical post that you wanted to tag “suggestions” could easily have been a node in the “Article Type” category, along with “Ratings”, “Reviews”, “Comparisons”, and “Recipes”.

Back in the olden days of doing KM in Lotus Notes, there was often great confusion about the difference between “Categories” and “Keywords” and the reality is that they’re both the same thing, with differing amounts of structure.

IMHO the problem that “tags” have shown up to address is the same one that “keywords” showed up to address – the problem of only offering one category dimension.  People get stuck in the paradigm and can’t get out.  Consider this article covering the subject – it’s clear that the author views “tags” and “categories” not as KM abstractions in and of themselves, but as artifacts of the particular implementation in WordPress.  WordPress has a particular implementation of Categories that lends itself to a limited use – e.g. you can’t have too many, because the sidebar list will be too long.  Well, that’s an artifact of putting the whole thing on a sidebar all at once, and not a consideration of the KM implications at all.

Consider that most books just have an “index”, which is a “subject category” structure.  But some reference books have indices for many different dimensions.  Likewise, most blogs just give you the capability to build the single “subject category” structure.  But if we built just one iota of flexibility into the “category view” module, then you could (for example) have one category hierarchy that is short and highly topical and shown on the sidebar (like WordPress) as well as other hierarchies that are deep and multi-layered, but viewed through a larger display on a different page, and other views like the “Related Entries” views that mine the hierarchies and just return the most relevant entries.

Not only does this offer a lot of UI flexibility for sites with a lot of structured content, but also, given the way the topics, keywords, and URLs are all associated, it’s a real SEO boon.  You can build a very strong semantic map into the linked content.

Having said all that, I will return to the position that I suppose we’ll have to support both tags and categories, but perhaps we can do it in a more powerful, elegant way than just duplicating a bunch of middle-of-the-road functionality.