Publication Date
Spring 2015
Document Type
Project Summary
Degree Name
Master of Science
Department
Computer Science
First Advisor
Soon-Ok Park, Ph.D.
Second Advisor
Stephen Hyzny, M.S.
Third Advisor
Michael Kelly, M.S.
Abstract
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.
Databases and Solr have complementary strengths and weaknesses. SQL supports only very simple wildcard-based text search, with some simple normalization such as matching upper case to lower case. The problem is that such searches typically require full table scans. In Solr, all searchable words are stored in an "inverted index", which can be searched orders of magnitude faster.
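The contrast between a full scan and an inverted index can be illustrated with a minimal sketch. This is not Solr's or Lucene's implementation: the sample documents and the helper names (`tokenize`, `scan_search`, `build_inverted_index`, `index_search`) are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_search(docs, word):
    """Full scan: inspect every document, as a wildcard match over a table would."""
    word = word.lower()
    return sorted(doc_id for doc_id, text in docs.items() if word in text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def index_search(index, word):
    """Look the word up directly instead of reading every document."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample data, not taken from the project.
docs = {
    1: "Solr powers enterprise search",
    2: "SQL Server stores relational data",
    3: "Enterprise data in the cloud",
}
index = build_inverted_index(docs)
```

The scan touches every document on every query, while the index lookup cost depends on the number of matching entries rather than on the size of the collection. The index is built once at indexing time, which is the trade-off a search server makes.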
Solr is a standalone/cloud enterprise search server with a REST-like API. You add documents to it (a step called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results. The project will be implemented using Amazon/Azure cloud, Apache Solr, Windows/Linux, Microsoft SQL Server and open source tools.
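As a rough sketch of the HTTP interaction described above, the following builds (but does not send) one JSON indexing request and one GET query request using only the Python standard library. The base URL `http://localhost:8983/solr`, the collection name `capstone`, and the field names are assumptions for illustration, not details of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr address
COLLECTION = "capstone"                   # hypothetical collection name


def build_index_request(docs):
    """Build a POST request that sends a JSON list of documents to the update handler."""
    url = f"{SOLR_BASE}/{COLLECTION}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(query, rows=10):
    """Build a GET request against the select handler, asking for JSON results."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(f"{SOLR_BASE}/{COLLECTION}/select?{params}", method="GET")


# Hypothetical document and query; the field names are illustrative only.
index_req = build_index_request([{"id": "1", "title": "Enterprise search with Solr"}])
query_req = build_query_request("title:solr")
```

To run these against a live server, each request object could be passed to `urllib.request.urlopen`, assuming a Solr instance with a matching collection is reachable at that address.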
Recommended Citation
Ravikumar, Padmavathy, "Enterprise Search Technology Using Solr and Cloud" (2015). All Capstone Projects. 91.
https://opus.govst.edu/capstones/91
Comments
Student ID number redacted by OPUS staff.