Saturday, January 31, 2015

Lucene inverted index explained


 Lucene Indexer input

 file1.txt
brown fox jumps above lazy dog
 file2.txt
red fox jumps above active cat
 file3.txt
green fox lives in Pune


Lucene Internal Data Structures


 Document Map (doc map)

 1 = file1.txt
 2 = file2.txt
 3 = file3.txt



 Inverted Index (Lucene Index Structure / Dictionary Structure / MultiMap )

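 brown = 1
 fox = 1,2,3
 jumps = 1,2
 above = 1,2
 lazy = 1
 dog = 1
 red = 2
 active = 2
 cat = 2
 green = 3
 lives = 3
 pune = 3

 (Terms are lowercased before indexing. "in" from file3.txt is left out here, as it would be if it is treated as a stop word.)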
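
As a rough illustration of how such a term-to-doc-id map can be built, here is a minimal Python sketch. It is not Lucene code: the function name `build_index`, the whitespace tokenizer, and the one-word `STOP_WORDS` set are assumptions made for this example. Real analyzers are configurable and far more elaborate.

```python
from collections import defaultdict

# Hypothetical stop-word list for this sketch only.
STOP_WORDS = {"in"}

def build_index(docs):
    """Map each lowercased term to the sorted list of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase, then split on whitespace.
        for token in text.lower().split():
            if token not in STOP_WORDS:
                postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Doc ids and contents taken from the indexer input above.
docs = {
    1: "brown fox jumps above lazy dog",
    2: "red fox jumps above active cat",
    3: "green fox lives in Pune",
}

index = build_index(docs)
print(index["fox"])    # [1, 2, 3]
print(index["jumps"])  # [1, 2]
print(index["pune"])   # [3]
```

Each term appears once as a key, and its value is the list of documents that contain it, which is why this structure is called an inverted index.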


Search Example 1 :

Query : Search for "fox"

Result : doc ids 1,2,3

Search Example 2 :

Query : Search for "fox" AND "brown"

Result :

Result for "fox" = 1,2,3
Result for "brown" = 1

ANDing (intersecting) the result sets = (1,2,3) & (1) = doc id 1

So the query matches file1.txt.
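
The AND step can be sketched in Python as an intersection of sorted doc id lists. This is only an illustration under stated assumptions: `intersect` and `search_and` are hypothetical helper names, and the small `index` and `doc_map` dictionaries are hard-coded from this example. Lucene's real postings are stored and traversed in a more sophisticated way.

```python
def intersect(a, b):
    """Intersect two sorted doc id lists with a linear two-pointer merge."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

def search_and(index, terms):
    """Return doc ids that contain every term; an unknown term matches nothing."""
    lists = [index.get(term.lower(), []) for term in terms]
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result

# Hard-coded from the example for this sketch.
index = {"fox": [1, 2, 3], "brown": [1]}
doc_map = {1: "file1.txt", 2: "file2.txt", 3: "file3.txt"}

hits = search_and(index, ["fox", "brown"])
print(hits)                             # [1]
print([doc_map[d] for d in hits])       # ['file1.txt']
```

Because both lists are sorted, the merge visits each doc id at most once, so the cost grows with the combined length of the lists instead of their product.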


When we have to search across millions of files, the doc id lists and result sets become large and hard to handle on a single machine. For that we need a distributed setup, for example Elasticsearch.