Some thoughts on the Search Industry after the recent Acquia Search Service announcement
Acquia recently released a new SaaS search subscription service for Drupal WCM users, which can act as a substitute for the search engine currently embedded in Drupal.
This news is interesting for more than one reason.
1) End of the PHP WCM vs Java WCM war - Start of a new best of breed WCM approach
First of all, this announcement emphasizes the end of the PHP WCM vs Java WCM war. The new Acquia Search Service is indeed based on an open source Java-based technology stack (Apache Solr/Lucene) and mainly aims to power PHP-based Drupal web sites. This announcement can also be related to other new CMIS-based initiatives (such as the "Alfrescal" prototype), which are now trying to combine a strong JEE back-end with the strengths of a PHP-based front-end.
Dietmar Rietasch, Executive Board Member at New Media Solutions, wrote something similar to us a few weeks ago, and I tend to fully agree with him (Dietmar, I hope it is not a problem that I repost part of your email on this blog):
"When looking at all those different solutions we see a huge change in the market. PHP is a clear leader in the field of wcms, but things are changing. If you just look at the development of Typo3 5.0 and their move to a JCR compatible content repository architecture ( http://lists.liip.ch/pipermail/jackalope/2009-May.txt; http://blog.liip.ch/archive/2009/05/11/jackalope-started.html)... Almost every PHP wcms solution is getting more professional and most of them don't even run on a shared hosting environment any longer - at least not efficiently and in a fast manner. It seems, that they invent features, protocols and services that have been available in the JAVA world for a long time. So the question is: Why do they (PHP or especially PHP wcms developers) try to reinvent the wheel? I guess, because programming new features and functionalities and exhausting the language is a cool and interesting thing to to. But it's by far not very wise and in terms of doing business not effective. We think, and so do others, that these two languages have to fusion in an effective way. Currently this already happens a lot of times at a lot of global players. Just have a look at http://developers.facebook.com/thrift/, http://www.zend.com/en/products/platform/product-comparison/java-bridge or http://quercus.caucho.com/. Therefore we see a huge potential in a CMS system, which is based on a solid JAVA stack, but also compatible with the PHP world - in terms of templates, additional functionalities,... You could be the first product, that archives this goal and be in a really interesting business segment. This would be almost a small revolution - a solid cms core and a flexible front-end made both for JAVA and PHP developers."
The question now, of course, is where to draw the line between what should be Java-based and what should be PHP-based.
2) New SaaS based On-Demand Search Service vs On-Site search servers
The second interesting piece of information is that Acquia strongly believes customers are now ready to outsource their search facilities to the cloud. The idea is that embedding a basic search server is no longer enough and that customers are now looking for richer search-oriented user experiences. Such solutions will of course become more complex to set up, administer and scale internally over time. This is not a wrong assumption: lots of users, including intranet employees who could benefit from a local, dedicated search appliance, experience severe performance issues, while they are used to finding public web content in milliseconds with Google, Bing or Yahoo. Of course, Drupal mainly powers public-facing web sites, and such a service could make a lot of sense for them as the information is already available to everybody. It would however be interesting to know whether customers hosting confidential and heavily secured data on extranets or intranets will be ready to let their search indexes be stored in the cloud.
Then, from a market positioning point of view, we should also ask whether offering such a service and entering the search industry is really the core business of companies such as Acquia. Of course, they could offer a better search experience for Drupal users thanks to a deeper integration. But as far as I understand, nothing prevents Jahia or any other WCM from using the new Acquia Search web services. There are dozens of CMSs which now leverage an Apache Solr/Lucene back-end (cf: CMSWatch article on this topic). So perhaps all these WCMs will soon integrate a switch to let customers choose, on a case-by-case basis, whether to host their search indexes locally or remotely in the cloud. But in that case these WCM vendors will ultimately prefer a vendor-neutral search provider that does not directly compete with their lines of business over the Acquia services. So in my opinion such a hosted Lucene service would make far more sense if it were directly managed by Lucid, for example (which just raised 6 million in Series A funding).
Finally, the global search vendors will certainly not stay passive and will also try to jump into this market by making new APIs available to every web developer. There are already initiatives to help webmasters better customize their search results (e.g. Yahoo Search Monkey or Google Rich Snippets). But this is of course only a first milestone. Yahoo! BOSS (Build your Own Search Service) already looks much closer to the new Acquia Search Service (for more information about the main goal of BOSS, read this blog entry from Yahoo! BOSS architect Vic Singh).
I must say that fighting head-on against Yahoo! or Google does not look like a brilliant idea, at least at first glance.
3) The quest for a standardized Search API
The third point that quickly comes to mind is: "Yet another Search API". There are already so many search and query APIs out there that it is impossible for a web developer to integrate all of them into a web application (be it a simple portlet or a complex content management platform). The same is true for Yahoo! BOSS and for all the other search vendors. It is also true for standards. CMIS, for example, defines a new CMIS SQL query API:
[From CMIS 0.6 spec - p.44] "CMIS provides a type-based query service for discovering objects that match specified criteria, by defining a read-only projection of the CMIS data model into a Relational View. Through this relational view, queries may be performed via a simplified SQL SELECT statement. This query language, called CMIS SQL, is based on a subset of the SQL-92 grammar (ISO/IEC 9075: 1992 Database Language SQL), with a few extensions to enhance its filtering capability for the CMIS data model, such as existential quantification for multi-valued property, full-text search, and folder membership. Other statements of the SQL language are not adopted by CMIS. The semantics of CMIS SQL is defined by the SQL-92 standard, plus the extensions, in conjunction with the model mapping defined by CMIS's relational view."
And this does not include XPath or other ways of building search queries.
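To make the quoted description a bit more concrete, here is a minimal sketch of how such a query could be assembled from a web application. It assumes the names used in the later CMIS 1.0 specification (`cmis:document`, `cmis:objectId`, `cmis:name`, `CONTAINS`, `IN_FOLDER`); the 0.6 draft quoted above may spell them differently. The helper `build_cmis_query` and the folder id are hypothetical, made up for illustration only.

```python
from typing import Optional


def _literal(value: str) -> str:
    """Wrap a value in single quotes; refuse values that would need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError("escaping is out of scope for this sketch")
    return "'" + value + "'"


def build_cmis_query(full_text: str, folder_id: Optional[str] = None) -> str:
    """Assemble a read-only CMIS SQL SELECT (hypothetical helper).

    Combines the full-text and folder-membership extensions mentioned
    in the spec excerpt with a plain SQL-92 style SELECT.
    """
    clauses = ["CONTAINS(" + _literal(full_text) + ")"]
    if folder_id is not None:
        clauses.append("IN_FOLDER(" + _literal(folder_id) + ")")
    return (
        "SELECT cmis:objectId, cmis:name FROM cmis:document WHERE "
        + " AND ".join(clauses)
    )


# "folder-123" is a made-up object id, not a real repository value.
print(build_cmis_query("drupal", folder_id="folder-123"))
```

The resulting string would still have to be sent to a repository through one of the CMIS bindings, which is deliberately left out here.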
JCR 2.0 introduced a new abstract Query Object Model (QOM) that Jahia already integrates. It would be nice to see how we could bring all these different query systems under one single umbrella (or create a whole new standardized Search API).
Of course, multiplying the number of query systems and making them specific to each vendor ensures a certain kind of vendor lock-in. But eventually, similar to what is currently happening in the Content Management industry, we will certainly see a need for standardization emerge in the Search industry. I must say: the sooner, the better.
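As a rough illustration of the "one single umbrella" idea, the sketch below describes a query once as a small tree of objects and then renders it into two different syntaxes: a CMIS SQL style WHERE clause and a Lucene style query string. It is only loosely inspired by the notion of a query object model and is not the JCR API. All class and function names, the `author` property and the `text` field are assumptions invented for this example, and escaping of special characters is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullText:
    """Match objects whose indexed text contains the given term."""
    term: str


@dataclass(frozen=True)
class Equals:
    """Match objects whose property equals the given value."""
    prop: str
    value: str


@dataclass(frozen=True)
class And:
    """Match objects that satisfy both sub-queries."""
    left: object
    right: object


def to_cmis_where(node) -> str:
    """Render the query tree as a CMIS SQL style WHERE clause."""
    if isinstance(node, FullText):
        return f"CONTAINS('{node.term}')"
    if isinstance(node, Equals):
        return f"{node.prop} = '{node.value}'"
    if isinstance(node, And):
        return f"({to_cmis_where(node.left)} AND {to_cmis_where(node.right)})"
    raise TypeError(f"unsupported node: {node!r}")


def to_lucene(node) -> str:
    """Render the same tree in Lucene query syntax ('text' is an assumed field)."""
    if isinstance(node, FullText):
        return f'text:"{node.term}"'
    if isinstance(node, Equals):
        return f'{node.prop}:"{node.value}"'
    if isinstance(node, And):
        return f"({to_lucene(node.left)} AND {to_lucene(node.right)})"
    raise TypeError(f"unsupported node: {node!r}")


query = And(FullText("solr"), Equals("author", "alice"))
print(to_cmis_where(query))
print(to_lucene(query))
```

The point of the design is that the application only ever builds the tree; supporting one more vendor means writing one more renderer rather than touching every place that issues a search.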
4) What's next for WCM search?
I read yesterday on a blog: "Acquia Search: now that's the most interesting development in the CMS area I've heard of lately". Come on. Faceted search is simply provided as part of Solr/Lucene. All the WCMs which rely on such a technology stack have already been offering the same kind of features for months (including Jahia of course: cf screenshot below).
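To show how little is needed on the client side, here is a minimal sketch that builds a faceted search request for Solr's standard select handler, using the `q`, `wt`, `facet`, `facet.field` and `facet.mincount` parameters. The base URL, the field names `type` and `author`, and the helper name `faceted_search_url` are assumptions for illustration; a real installation would use its own URL and schema.

```python
from typing import Sequence
from urllib.parse import urlencode


def faceted_search_url(base_url: str, query: str, facet_fields: Sequence[str]) -> str:
    """Build a Solr select URL that asks for facet counts (hypothetical helper)."""
    params = [
        ("q", query),             # the user's search terms
        ("wt", "json"),           # ask for a JSON response
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", "1"),  # hide facet values with no matching document
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# The host, port and field names below are placeholders, not a real deployment.
print(faceted_search_url("http://localhost:8983/solr", "search api", ["type", "author"]))
```

The response then carries, next to the matching documents, a count per value for each requested field, which is what a faceted navigation is rendered from.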
So the real question is: what is coming next besides faceted search capabilities?
What is certain is that there is currently increased momentum around search features. They are becoming a more and more important part of the web site experience. Perhaps this is related to the fact that end-users now face information overload and are looking for solutions. It is probably also related to the fact that end-users are now used to "googlizing" any web site rather than using more classical navigation elements.
So in my opinion:
- Faceted search will rapidly turn into more dynamic navigation, allowing anyone to browse content along several axes.
- There are a number of new (or older) initiatives which are now trying to better leverage "linked data" (Web 3.0 or the Semantic Web): semantic search will rapidly become another hot story.
- Finally, the federated vs unified search question will rapidly become (again) a hot topic.
Do you see any other hot search topics?

However, we have to remember that the vast majority of our users have sites with at most a few hundred content pages, plus perhaps user forums and/or online stores with low SKU counts. This probably covers 95% of our "customer base". These sites run, and will continue to run, in shared hosting environments for quite some time to come. Any WCMS that moves to Java is abandoning that user base, and any OSS project that decides to go "all Java" can expect a PHP-based fork to continue (and thrive).
That said, there is some scale at which PHP becomes the wrong tool for the job. If everyone works to the same standards, then we have the option of swapping out components of a PHP solution for a Java based one. I believe that we're not in an either-or situation, and that the best WCMS solutions will be Java/PHP hybrids. The specific requirements of a site will determine whether or not PHP is being used for "heavy" tasks such as indexing, search and others.