Contentation Re-considered

Stéphane Croisier  //  Sharing ideas on the future of (Open Source) WCM, Portals, ECM and Social Software. Product Strategy Manager at Jahia (www.jahia.com). Follow me on Twitter: http://twitter.com/scroisier

Oct 9 / 1:39am

CMS: Your next content hypercube?

Not so long ago, the main purpose of acquiring and installing a CMS was to ease site building and page creation. With the emergence of Web 2.0 technology, this paradigm is shifting. The key question is no longer how to ease content creation and delegate it to non-technical users, but rather how to offer better content filtering, repurposing and aggregation capabilities (among other kinds of added value).

One of the earlier criteria for classifying CMSs was their “page-based” versus “content-based” approach. “Page-based” systems were considered low-end or lower mid-range systems because of their poorer content reuse, multi-channel, multilingual and multi-site capabilities (e.g. “5 ways to improve content reuse”). Thanks to the growing commoditization of a new generation of content repositories (JCR-based, CMIS-compliant repositories, …) and to their free and open source reference implementations, this problem is gradually disappearing.

But content reuse, especially when applied to an increasingly interconnected and federated content universe, is now also bringing new challenges.

In many respects, content repositories can now be compared to Business Intelligence-style hypercubes, except that they focus mainly on storing unstructured or semi-structured content items rather than highly structured data. Content developers can now access their “multidimensional content cubes” according to dozens of different criteria (e.g. a sub-site, a language, a content type, a piece of metadata, …). Transversal queries across all content assets, or within a dedicated sub-cube, are becoming very common.
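To make the cube analogy a bit more concrete, here is a minimal Python sketch of slicing a set of content items along a few dimensions. Everything in it is an assumption made for illustration: the `ContentItem` fields, the `slice_cube` helper and the sample data are hypothetical and do not correspond to the JCR or CMIS APIs, where such a query would normally run inside the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """A content item described by a few hypothetical cube dimensions."""
    title: str
    site: str
    language: str
    content_type: str


def slice_cube(items, **criteria):
    """Return the items whose dimensions match every given criterion.

    Calling it with no criteria is the "transversal" query over all assets;
    each extra criterion narrows the result to a smaller sub-cube.
    """
    return [
        item for item in items
        if all(getattr(item, dim) == value for dim, value in criteria.items())
    ]


# Illustrative sample data only.
ITEMS = [
    ContentItem("Q3 results", "corporate", "en", "news"),
    ContentItem("Résultats T3", "corporate", "fr", "news"),
    ContentItem("Install guide", "support", "en", "article"),
    ContentItem("Release notes", "support", "en", "news"),
]

english_news = slice_cube(ITEMS, language="en", content_type="news")
support_only = slice_cube(ITEMS, site="support")
```

The same helper serves both a query across the whole repository and a query inside one sub-site, which is roughly what the “sub-cube” idea amounts to.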

Such a change affects not only developers but also end-users. Rather than thinking in terms of pages (which content do I want to see on which page of my site?), users now have to think about better ways to leverage their “content cubes”.

In any case, except for smaller web sites, more and more web initiatives are now dealing with larger volumes of content. A “site” composed of several tens of thousands of “articles” faces increasing difficulty in finding the right navigation modes. Navigating menus 20 levels deep is no longer a solution.

The underlying content infrastructure (or, in certain use cases, the federation of content silos, as content interoperability is also rapidly becoming a key success factor) is exposing end-users to much more information than ever before. Combine this with an increase in employee-generated content thanks to new Web 2.0/E2.0 tools, and the volume of information exposed to any employee will rapidly explode.

This access to a plethora of enterprise information is, however, currently at odds with what employees actually do with their intranets. Forrester recently “found that these intranets were still mostly accessed for basic functions such as company directory, benefits information, and payroll”. This perhaps means that too much information kills information. Or rather, as Clay Shirky put it: "It's not Information Overload. It’s Filter Failure". And this in turn means that current CMS solutions have failed to provide users with the proper filters.

Indeed, if one stops thinking in terms of sites and page trees and starts thinking in terms of content cubes, this has many consequences for your front-end WCM system. So what are the right tools to let knowledge workers properly filter and assemble relevant content on the web, making the best use of content reuse capabilities?

1) Dynamic navigations / Faceted Searches
-----------------------------------------------------------------
First, we see an increasing need for faceted search and other types of dynamic navigation that can be managed automatically by the underlying content management system. These are first steps toward automating navigation and letting end-users drill down into their universe of content objects. Tag clouds and other search refinement capabilities behave in quite similar ways.

Technically speaking, most search engines now provide faceted search capabilities (e.g. Lucene/Solr). We are, however, still far from some of the multidimensional capabilities offered by OLAP/ROLAP technologies working on terabytes of data.
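For readers who have never looked under the hood of faceted navigation, the core idea can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the `facet_counts` function, the field names and the sample documents are hypothetical, and a real engine such as Solr computes these counts inside its index rather than by scanning a list.

```python
from collections import Counter


def facet_counts(docs, facets, selections=None):
    """Count facet values over the documents matching the current selections.

    `selections` maps a facet name to the value the user has already clicked;
    the returned counts are what a UI would show next to each refinement link.
    """
    selections = selections or {}
    matching = [
        doc for doc in docs
        if all(doc.get(name) == value for name, value in selections.items())
    ]
    counts = {facet: Counter() for facet in facets}
    for doc in matching:
        for facet in facets:
            value = doc.get(facet)
            if value is not None:
                counts[facet][value] += 1
    return counts


# Illustrative sample data only.
DOCS = [
    {"title": "Q3 results", "language": "en", "type": "news"},
    {"title": "Résultats T3", "language": "fr", "type": "news"},
    {"title": "Install guide", "language": "en", "type": "article"},
]

all_counts = facet_counts(DOCS, ["language", "type"])
english_counts = facet_counts(DOCS, ["type"], selections={"language": "en"})
```

Each click on a facet value simply adds an entry to `selections` and recomputes the counts, which is the drill-down behavior described above.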

2) Information Dashboards
--------------------------------------
One of the problems with the previous point is that people are lazy. They no longer want to look for information. They want information to be pushed to them automatically. This is what activity streams and other kinds of “real-time web” initiatives are all about.

Microsoft responded this summer to an RFP from the Newspaper Association of America by stating: “The Next-Generation Newspaper is the user’s information hub, aggregating content from different sources and matching it to the user’s profile, preferences, and context (situation). It is accessible from any device, both online and offline, and helps the user to navigate the content universe through search, links, and recommendations.” (Details: http://www.niemanlab.org/pdfs/Microsoft.pdf).

Applied to articles rather than to 140-character messages, this gives new “information dashboards” where all relevant federated information is pushed and assembled on the fly. Transpose this vision to an E2.0 context, and we get closer to a kind of new-generation personalized enterprise eZine.

Several examples of such personalized, dynamic news aggregation engines are currently available on the market, and they tend to emulate more traditional “paper-based magazines”. Google News is of course the best known. Newscred has a version that perhaps looks a bit better. If you follow a lot of RSS feeds, you are certainly familiar by now with Feedly.com, which also automatically and dynamically recomposes aggregated information dashboards on a per-topic basis. At the latest TechCrunch50, Fresh Sliced News also tried to introduce a RIA version of something quite similar (http://www.freshslicednews.com/).

All these solutions need not only to access and extract heterogeneous sources of information but also to apply some valuation mechanisms to the underlying content assets. This is generally done by mixing and weighting several variables, as Google News already does:
  • The interests of the current user (according to their profile, browsing history, etc.)
  • The source of the article (within a company we could, for example, assume that a blog post from the CEO or from a top manager is more relevant/trustworthy than one from the new intern)
  • Some social rating (e.g. let the users/employees vote for their favorite articles)
  • Other content valuation mechanisms, including some manual moderation.

Relevancy is then no longer based on a simple scoring function and becomes a bit more complex to grasp. Combine that with some current limitations of the standards in a cross-silo federated environment (e.g. how meaningful is the Score function in the upcoming CMIS standard?), and your personalized “content cube” is not that easy to assemble.
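As a rough illustration of what “mixing and weighting” such variables could look like, here is a minimal Python sketch of a weighted relevance score. It is an assumption-laden toy: the signal names, the weights and the sample values are all hypothetical, and it does not describe how Google News or any CMIS repository actually scores content.

```python
# Hypothetical weights for the four signals listed above; they sum to 1.0
# so that a score stays in [0, 1] when every signal is in [0, 1].
WEIGHTS = {"interest": 0.4, "source": 0.3, "social": 0.2, "editorial": 0.1}


def relevance(signals, weights=WEIGHTS):
    """Return the weighted sum of the signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


# Illustrative values only: two posts that differ solely in source trust.
ARTICLES = {
    "CEO blog post": {"interest": 0.5, "source": 1.0, "social": 0.5, "editorial": 0.0},
    "Intern blog post": {"interest": 0.5, "source": 0.0, "social": 0.5, "editorial": 0.0},
}

# Titles ordered from most to least relevant for the dashboard.
ranked = sorted(ARTICLES, key=lambda title: relevance(ARTICLES[title]), reverse=True)
```

Even this toy version shows where the difficulty lies: the ranking depends entirely on how the weights are chosen, and in a federated setup each silo may compute its own signals on a different scale.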

Moreover, given the increasing number of possible facets and of content objects to manage, performance problems will quickly arise. It will be interesting to see how such large, complex and federated content requests will be executed, especially in a WCM context where end-users do not usually want to wait more than one or at most two seconds to see results in their browser.

3) Semantic Information Dashboards
----------------------------------------------------
In the long run, we could even envision making such information dashboards more dynamic. Some start-ups are already leveraging semantic data to let end-users easily refine and drill down into their content items along several dimensions by simple drag and drop (e.g. the demo of Paggr).

We are then getting even closer to the Business Intelligence paradigms applied to content, at least by analogy. Technically speaking, it would be interesting to explore what the OLAP/ROLAP world could bring to content repositories, whether they are JCR repositories or semantic triple stores.

In any case, your page-based approach is dead; long live your intelligent content hypercubes!

3 comments

Oct 09, 2009
billycripe said...
Nail meet hammer head! You nailed it from the outset: "They key question is no more to know how to ease content creation and to delegate it to non technical users but rather to offer better content filtering, repurposing and aggregation capabilities (among other kind of additional added values)." YES.
Oct 21, 2009
On the Semantic News aggregation side, I found this interesting site and story:
http://www.klezio.com/about-us/ - OpenCalais powered.
Jan 27, 2010
juanpaalbuja said...
I agree with you, and I think that we are entering the age of hypercubes. For example, I see that current CMSs are including functionalities that you mention here, such as “Dynamic Navigation”. The big advantage is that it is generated according to the site structure, and there are also complete modules that make dynamic navigation easy to implement. One example is how Jahia manages navigation: it is dynamic and is updated automatically with site changes. Here is an example: http://www.oshyn.com/_blog/Web_Content_Management/post/Jahia_WCM_Quick_Review_Maven,_Templates_and_Navigation.
