Apache Lucene is a free, open source information retrieval software library that indexes and searches fields of text within documents. The ongoing effort is also known as the Apache Lucene Project. Lucene is developed by the Apache Software Foundation and distributed under the open source Apache License 2.0.
The Lucene application programming interface (API) is the same regardless of the format of the file being indexed. As long as the text can be extracted from a document, Lucene can index practically any type of text-containing file. Lucene is widely used both in Internet search engines and for search within individual websites.
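To illustrate what indexing fields of extracted text means, here is a minimal toy inverted index in Java. It is a sketch under stated assumptions, not Lucene's API: the class `ToyIndex`, its methods `addDocument` and `search`, and the simple lowercase-and-split tokenizer are hypothetical, and format-specific text extraction is assumed to have already happened, so each document arrives as field names mapped to plain text.

```java
import java.util.*;

// Hypothetical toy inverted index, NOT Lucene's API: maps each (field, term)
// pair to the set of document IDs that contain that term in that field.
class ToyIndex {
    // Key is "field:term"; value is the sorted set of matching document IDs.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // skip the empty string a leading delimiter produces
        }
        return tokens;
    }

    // Index one document given as field name -> already-extracted plain text.
    // Returns the ID assigned to the new document.
    int addDocument(Map<String, String> fields) {
        int id = nextId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : tokenize(field.getValue())) {
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Return the IDs of documents whose given field contains the term (a copy, so callers cannot mutate the index).
    SortedSet<Integer> search(String field, String term) {
        SortedSet<Integer> hits = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<>() : new TreeSet<>(hits);
    }
}
```

For example, after indexing a document whose `body` field is "Full-text search with Java", `search("body", "search")` returns a set containing that document's ID. Lucene itself layers configurable analyzers, relevance scoring, and an on-disk index format over the same basic term-to-document mapping.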
The Apache Lucene Project comprises four main components:
- Lucene Core: Indexing, searching, spell checking, hit highlighting, and tokenization.
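- Solr: High-performance search server built on Lucene Core.
- PyLucene: Python extension for using Lucene Core from Python; it embeds a Java VM rather than reimplementing Lucene.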
- Open Relevance Project: Freely distributed materials for performance testing and relevance evaluation.
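Hit highlighting, one of the Lucene Core features listed above, marks where query terms occur in a matching document so a user can see why it matched. The following toy Java sketch shows the idea. `ToyHighlighter` and its `highlight` method are hypothetical names, not Lucene's highlighter API, and the matching rule (words are runs of letters and digits, compared case-insensitively) is an assumption.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical toy hit highlighter, NOT Lucene's highlighter API: wraps every
// word in the text that matches a query term (case-insensitively) in <b>...</b>.
class ToyHighlighter {
    static String highlight(String text, Set<String> queryTerms) {
        // Normalize query terms the same way words are compared below.
        Set<String> terms = new HashSet<>();
        for (String t : queryTerms) terms.add(t.toLowerCase(Locale.ROOT));

        // Assumed word definition: a maximal run of letters or digits.
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy the text between words unchanged
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(word).append("</b>"); // keep the original casing
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text.substring(last)); // copy any trailing non-word text
        return out.toString();
    }
}
```

For instance, `ToyHighlighter.highlight("Lucene makes search fast; Search it!", Set.of("search"))` returns `Lucene makes <b>search</b> fast; <b>Search</b> it!`. Keeping the original casing inside the tags means the highlighted snippet still reads exactly like the source text.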