I have been thinking about this question:
Is a NoSQL database an alternative to a search engine?
I think I just found an answer here.
Let's talk about some terms and definitions first.
- NoSQL: "Not only SQL", meaning that a NoSQL database differs from an RDBMS in some way.
- IR: Information Retrieval is the science of searching documents and their metadata and retrieving them.
Here we compare the NoSQL storage engine MongoDB with the Information Retrieval library Apache Lucene. Let us list the features of each and then compare them.
MongoDB is a document-oriented database with the following features ( reference http://www.mongodb.org/ ):
- Document-oriented storage
- Full Index Support
- Replication and High Availability
- Auto-Sharding
- Querying
- Fast In-Place Updates
- Map/Reduce
- GridFS
- Commercial Support
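To make the later contrast with Lucene concrete, here is a minimal Python sketch of match-style querying over documents. The `find` function and the sample `posts` are hypothetical and are not MongoDB's or pymongo's API; the point is only that a filter query returns every document that matches, with no notion of which match is most relevant.

```python
from typing import Any


# Hypothetical in-memory stand-in for a document collection. This is NOT the
# MongoDB/pymongo API; it only mimics the idea of an exact-match filter query.
def find(collection: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Return every document whose fields equal all key/value pairs in `query`."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]


posts = [
    {"_id": 1, "author": "alice", "title": "NoSQL and search"},
    {"_id": 2, "author": "bob", "title": "Lucene basics"},
    {"_id": 3, "author": "alice", "title": "Sharding notes"},
]

# Matching documents come back in storage order; nothing says which is "best".
print([doc["_id"] for doc in find(posts, {"author": "alice"})])  # [1, 3]
```

A document either matches or it does not; ordering by relevance is simply outside the model.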
Lucene has the following features ( reference http://lucene.apache.org/java/docs/features.html ):
- Scalable, High-Performance Indexing ( which is actually quite fast )
- Powerful, Accurate and Efficient Search Algorithms
  - ranked searching – best results returned first
  - many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  - fielded searching (e.g., title, author, contents)
  - date-range searching
  - sorting by any field
  - multiple-index searching with merged results
  - allows simultaneous update and searching
- Cross Platform
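Several of the features above, such as fielded searching and fast term lookup, rest on an inverted index. The toy Python version below works under simplifying assumptions: whitespace tokenization, exact-term lookup, and the hypothetical names `build_index` and `fielded_search`. It is not Lucene's API or its on-disk format.

```python
from collections import defaultdict

# (field, term) -> ids of documents whose `field` contains `term`
Index = dict[tuple[str, str], set[int]]


def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase and split on whitespace (real analyzers do much more).
    return text.lower().split()


def build_index(docs: dict[int, dict[str, str]]) -> Index:
    """Build an inverted index mapping (field, term) to the ids of matching documents."""
    index: Index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in tokenize(text):
                index[(field, term)].add(doc_id)
    return index


def fielded_search(index: Index, field: str, term: str) -> set[int]:
    """Look up a single term restricted to one field, e.g. title:lucene."""
    return set(index.get((field, term.lower()), set()))


books = {
    1: {"title": "Intro to Lucene", "author": "alice"},
    2: {"title": "MongoDB notes", "author": "bob"},
    3: {"title": "Scaling Lucene search", "author": "carol"},
}
index = build_index(books)
print(sorted(fielded_search(index, "title", "Lucene")))   # [1, 3]
print(sorted(fielded_search(index, "author", "Lucene")))  # []
```

Keying postings by field is what lets the same word mean different things in a title and in an author name.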
A NoSQL database is preferable when the data store needs to be scalable and highly available and to return query results quickly. However, it doesn’t fully solve the problem of Information Retrieval.
Search (Information Retrieval) isn’t just about grabbing every document that matches. If you want your search results to have any relevance at all, you’re going to need something along the lines of TF-IDF weighting, phrase matching (words that appear in sequence score higher), or any number of other IR techniques that improve search precision.
NoSQL databases such as MongoDB don’t provide relevance-ranked search results, and I think this is the biggest factor to consider when choosing between a NoSQL database and a search engine framework.
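The relevance argument can be made concrete with a small ranking sketch in Python. It uses one common TF-IDF variant (length-normalized term frequency times a smoothed log inverse document frequency) plus a hypothetical `PHRASE_BOOST` multiplier for documents that contain the query words in sequence. Lucene's actual scoring formula differs; this only illustrates the idea.

```python
import math
from collections import Counter

PHRASE_BOOST = 2.0  # hypothetical multiplier for an exact in-order phrase match


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs in `tokens` as a contiguous, in-order run."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def rank(query: str, docs: dict[int, str]) -> list[int]:
    """Return ids of matching documents, highest score first."""
    terms = tokenize(query)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: how many documents contain each query term.
    df = {t: sum(t in toks for toks in tokenized.values()) for t in set(terms)}

    scores: dict[int, float] = {}
    for doc_id, toks in tokenized.items():
        counts = Counter(toks)
        score = 0.0
        for t in terms:
            if counts[t]:
                tf = counts[t] / len(toks)          # length-normalized term frequency
                idf = math.log(1 + n_docs / df[t])  # smoothed inverse document frequency
                score += tf * idf
        if score > 0 and contains_phrase(toks, terms):
            score *= PHRASE_BOOST                   # words in query order score higher
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: scores[d], reverse=True)


docs = {1: "engine search tips", 2: "search engine tips", 3: "nosql database scaling"}
print(rank("search engine", docs))  # [2, 1]
```

Documents 1 and 2 contain exactly the same words, so a plain match query cannot tell them apart; only the phrase boost puts document 2 first.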
Another alternative is to couple a database with a search engine to achieve both goals. For example:
- couchdb-lucene provides such an integration with CouchDB and Lucene
- Solr provides integration with RDBMSes ( such as MySQL ) and uses Lucene as its search library.
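The coupling idea can be sketched in Python as a store that updates a search index on every write. `IndexedStore` is hypothetical and assumes a synchronous write-through design. It does not describe how couchdb-lucene or Solr actually integrate; real integrations may index asynchronously instead.

```python
from collections import defaultdict
from typing import Any


class IndexedStore:
    """Hypothetical pairing of a primary document store with a search index.

    Both halves live in memory here. In a real deployment the store would be
    a database (CouchDB, MySQL, ...) and the index a search engine such as
    Lucene or Solr, kept in sync by an integration layer.
    """

    def __init__(self, indexed_field: str) -> None:
        self.indexed_field = indexed_field
        self.docs: dict[int, dict[str, Any]] = {}             # source of truth
        self.index: dict[str, set[int]] = defaultdict(set)    # term -> doc ids

    def _terms(self, doc: dict[str, Any]) -> set[str]:
        return set(str(doc.get(self.indexed_field, "")).lower().split())

    def save(self, doc_id: int, doc: dict[str, Any]) -> None:
        """Write to the store, then update the index (synchronous write-through)."""
        old = self.docs.get(doc_id)
        if old is not None:
            # Remove postings for the previous version so the index does not go stale.
            for term in self._terms(old):
                self.index[term].discard(doc_id)
        self.docs[doc_id] = doc
        for term in self._terms(doc):
            self.index[term].add(doc_id)

    def search(self, term: str) -> list[dict[str, Any]]:
        """The index answers 'which ids?'; the store supplies the full documents."""
        ids = self.index.get(term.lower(), set())
        return [self.docs[i] for i in sorted(ids)]


store = IndexedStore("body")
store.save(1, {"body": "CouchDB with Lucene"})
store.save(2, {"body": "MySQL with Solr"})
print(store.search("lucene"))  # [{'body': 'CouchDB with Lucene'}]
store.save(1, {"body": "CouchDB only"})
print(store.search("lucene"))  # []
```

Synchronous write-through keeps the index consistent with the database at the cost of slower writes; indexing asynchronously trades some freshness for faster writes.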
That’s all for now. Comments and suggestions are welcome :)