SolrSherlock: towards Watson-like agent capabilities

February 12, 2013

SolrSherlock is the name I have given to a project which aims to take advantage of one of the gifts of IBM’s Smarter Planet initiative: the gift of UIMA to the Apache Software Foundation. The project itself lives, at this time, at DebateGraph, where a structured conversation and topic map are growing around design ideas, with plenty of room and encouragement for other design ideas to emerge. The DebateGraph site is aimed at the creation of more than a single Watson-like agent. Soon, some components of the codebase will be available at GitHub.

The underlying thesis of SolrSherlock rests on the marriage of the Apache Solr project with UIMA, as is well documented at the Apache site and around the web, together with a topic map. That marriage came from a history of writing programs in Forth, where the lessons learned included “let the compiler do the work for you”. What is the mapping from such lessons to this project? A topic map is a kind of compiled information bank, well structured and navigable. The compiler, in this case, is the platform that machine-reads natural language and harvests topic map structures from it. That’s one among many tasks for UIMA and its many kinds of annotators.

The earliest code to hit GitHub will consist of these components:

  • A Solr Update Request interceptor which lives in the chain of processes described in a Solr configuration declaration; its task is to send the new document just indexed by Solr out to a society of agents for further processing
  • An agent coordinator system, based on the tuple space concept and accessible over a network; this system accepts documents from the Solr interceptors
  • An agent framework, which provides an API for plug-in agents, some based on UIMA, some completely different. In each case, the agent has access to the coordinator to fetch or return resources (documents), and access through the internet to the Solr platform as needed. One particular plug-in agent is the topic map’s merge platform. A topic map keeps its promise of one location per topic by merging new resources into those topics whose identity is determined to be the same. For complex topics such as those created during conversations, merging requires the services of machine reading to map sentences to structures which support comparison.
  • Solr configurations and schema definitions which permit a single Solr installation, or a SolrCloud cluster, to behave as a topic map in this environment
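
To make the flow between these components a little more concrete, here is a minimal, in-process sketch in Python. It is not the SolrSherlock code, which is not yet published; every name in it (`TupleSpace`, `TopicMap`, `MergeAgent`, `on_document_indexed`, the `"new-document"` and `"merged-topic"` tags) is hypothetical. It also assumes a deliberately naive identity rule: two documents describe the same topic when their `identity` strings match after trimming and lowercasing. A real merge agent would need machine reading for anything more complex, and the real coordinator would sit behind a network rather than in one process.

```python
import queue
import threading
from collections import defaultdict


class TupleSpace:
    """Toy coordinator: agents put payloads under a tag and take them by tag."""

    def __init__(self):
        self._queues = defaultdict(queue.Queue)
        self._lock = threading.Lock()

    def _queue_for(self, tag):
        # Guard creation of per-tag queues so concurrent agents share one queue.
        with self._lock:
            return self._queues[tag]

    def put(self, tag, payload):
        self._queue_for(tag).put(payload)

    def take(self, tag, timeout=0.0):
        """Remove and return one payload for `tag`, or None if none is available."""
        try:
            return self._queue_for(tag).get(timeout=timeout)
        except queue.Empty:
            return None


class TopicMap:
    """Keeps one topic per identity key (assumed rule: trimmed, lowercased string)."""

    def __init__(self):
        self.topics = {}

    def merge(self, doc):
        key = doc["identity"].strip().lower()
        topic = self.topics.setdefault(
            key, {"identity": key, "labels": set(), "sources": []}
        )
        topic["labels"].add(doc["label"])
        topic["sources"].append(doc["id"])
        return topic


class MergeAgent:
    """Hypothetical plug-in agent: takes new documents and merges them into topics."""

    handles = "new-document"

    def __init__(self, space, topic_map):
        self.space = space
        self.topic_map = topic_map

    def run_once(self):
        doc = self.space.take(self.handles)
        if doc is None:
            return False
        topic = self.topic_map.merge(doc)
        # Return a resource to the coordinator so other agents can react to it.
        self.space.put("merged-topic", topic["identity"])
        return True


def on_document_indexed(space, doc):
    """Stand-in for the Solr update interceptor: forward a just-indexed document."""
    space.put("new-document", doc)


space = TupleSpace()
topics = TopicMap()
agent = MergeAgent(space, topics)

on_document_indexed(space, {"id": "doc-1", "identity": "Apache UIMA", "label": "UIMA"})
on_document_indexed(space, {"id": "doc-2", "identity": "apache uima ", "label": "Apache UIMA"})

while agent.run_once():
    pass

print(len(topics.topics), "topic(s) after merging")
```

The two documents differ in case and whitespace but end up in a single topic, which is the "one location per topic" promise in miniature. Routing everything through the tuple space rather than calling the agent directly is what would let other plug-in agents, UIMA-based or otherwise, pick up the same documents without the interceptor knowing about them.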

The project grew legs when it was suggested that the book Taming Text already had Apache-licensed code at GitHub which showed how to use Solr together with another Apache project, OpenNLP, to build a question answering system. The book is rich in ideas which parallel many of the concepts documented in the extensive literature being published by Watson’s creators. That by no means signals that the project is a slam dunk; it is not. But it is a worthy mountain to climb.

Consistent with the SolrSherlock project, others are invited, as explained here, to participate, either in this or in similar or related projects. There will be many tasks associated with the SolrSherlock project; the society of agents is a plug-in framework, which means that many different experiments can bloom, have their day in the sun, and maybe grow roots and stay around.

There is much more to say about the project; I hope to do so after setting up the GitHub repo for it.