Latent Semantic Indexing - part 3 - Searching

  16 August 2012
 

< Continued from Part 2:  Creating Index

In the earlier post, we saw how to create an index from the list of documents. In this post, we will see how to search the index created earlier. This is only a walkthrough of the code - you should be able to map these steps to the code based on the inline comments.

After the user types in a query and clicks Search, the following steps are performed:

Step  1 - Fetch the data from the index that we stored. In this case, that is the documents list, the word list, S(k)-inverse, U(k), and the WTDM (weighted term-document matrix).
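To make the shapes of these pieces concrete, here is a minimal Python sketch of a container for the fetched data. The class name `LsiIndex`, its field names, and the sample numbers are all made up for this illustration and are not taken from the demo code. It also assumes the WTDM has one row per word and one column per document.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class LsiIndex:
    """Hypothetical container for the data fetched in Step 1."""
    documents: List[str]   # one entry per WTDM column
    words: List[str]       # one entry per WTDM row
    u_k: Matrix            # U(k): len(words) rows x k columns
    s_k_inverse: Matrix    # S(k)-inverse: k x k diagonal matrix
    wtdm: Matrix           # assumed layout: len(words) x len(documents)

    def check_shapes(self) -> int:
        """Return k after verifying that the pieces fit together."""
        k = len(self.s_k_inverse)
        assert all(len(row) == k for row in self.s_k_inverse)
        assert len(self.u_k) == len(self.words)
        assert all(len(row) == k for row in self.u_k)
        assert len(self.wtdm) == len(self.words)
        assert all(len(row) == len(self.documents) for row in self.wtdm)
        return k


# Made-up numbers that only demonstrate the shapes; this is not a real SVD.
index = LsiIndex(
    documents=["doc0", "doc1", "doc2"],
    words=["cat", "dog"],
    u_k=[[1.0], [0.0]],
    s_k_inverse=[[0.5]],
    wtdm=[[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]],
)
print(index.check_shapes())  # prints 1, the value of k in this toy index
```

A shape check like this is cheap and catches a mismatched word list or a stale index before any of the matrix products in the later steps are attempted.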

 

Step  2 - Fetch the query text, filter out the stop words, and apply stemming to the remaining words. Stemming is done using the same mechanism that we used while creating the index.
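Here is a rough Python sketch of this preprocessing step. The stop-word list and the `toy_stem` function are hypothetical stand-ins: the real search must reuse the exact stop-word list and stemmer from indexing, otherwise the query terms will not match the word list.

```python
import re

# Hypothetical stop-word list; the real one comes from the indexing code.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def toy_stem(word):
    """Very crude suffix stripper; only a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters of the word after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_query(text):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess_query("Searching the indexes of documents"))
# prints ['search', 'index', 'document']
```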

 

Step  3 - Create a vector over the word list, called qT (q-transpose).

This is used to create the query vector.

 

Step  4 - Normalize qT (simple normalization).
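Steps 3 and 4 could look like the following Python sketch. It makes two assumptions that the text does not spell out: each entry of qT is the number of times the corresponding word occurs in the query, and "simple normalization" means scaling to unit Euclidean length. The function names are made up for this example.

```python
import math


def build_query_row(query_terms, words):
    """Step 3: count how often each word-list entry occurs in the query."""
    position = {word: i for i, word in enumerate(words)}
    q_t = [0.0] * len(words)
    for term in query_terms:
        if term in position:  # terms missing from the index are ignored
            q_t[position[term]] += 1.0
    return q_t


def normalize(vector):
    """Step 4 (assumed): scale to unit length; a zero vector is left as is."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        return list(vector)
    return [x / length for x in vector]


words = ["cat", "dog", "fish"]
q_t = normalize(build_query_row(["dog", "dog", "bird"], words))
print(q_t)  # prints [0.0, 1.0, 0.0]
```

Note that "bird" is silently dropped because it is not in the word list, so a query made up only of unknown words produces an all-zero vector.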

 

Step  5 - Compute the query-vector q = qT x U(k) x S(k)(inverse)
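The product in Step 5 can be sketched in plain Python as below. The helper names and the numbers are made up for illustration (the matrices are not a real SVD). The sketch assumes U(k) has one row per word and k columns, and that S(k)(inverse) is a k x k matrix.

```python
def row_times_matrix(row, matrix):
    """Multiply a 1 x n row vector by an n x m matrix, giving a 1 x m row."""
    assert len(row) == len(matrix), "row length must equal matrix row count"
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(len(matrix[0]))
    ]


def fold_in(row_t, u_k, s_k_inverse):
    """Compute row_t x U(k) x S(k)(inverse), a vector with k entries."""
    return row_times_matrix(row_times_matrix(row_t, u_k), s_k_inverse)


# Toy data: 3 words, k = 2.
u_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_k_inverse = [[0.5, 0.0], [0.0, 0.25]]
q = fold_in([1.0, 0.0, 1.0], u_k, s_k_inverse)
print(q)  # prints [1.0, 0.25]
```

The result has k entries instead of one entry per word, which is what makes the later comparison with the document vectors cheap.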

Step  6 - Now that we have the query vector, create a document vector for comparison with the query vector

(I think this document vector creation could be done in the indexing part instead.)

For each document:

Create a vector dT - the transpose of the document's column in the WTDM

Compute the document vector d = dT x U(k) x S(k)(inverse)
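Here is a small Python sketch of this loop. It assumes the WTDM is stored with one row per word and one column per document, so dT is read off as a column. The function names and numbers are illustrative only.

```python
def row_times_matrix(row, matrix):
    """Multiply a 1 x n row vector by an n x m matrix, giving a 1 x m row."""
    return [
        sum(value * matrix_row[j] for value, matrix_row in zip(row, matrix))
        for j in range(len(matrix[0]))
    ]


def document_vectors(wtdm, u_k, s_k_inverse):
    """Return one k-dimensional vector d per document (per WTDM column)."""
    vectors = []
    for doc_id in range(len(wtdm[0])):
        d_t = [term_row[doc_id] for term_row in wtdm]  # column doc_id as a row
        d = row_times_matrix(row_times_matrix(d_t, u_k), s_k_inverse)
        vectors.append(d)
    return vectors


# Toy data: 3 words, 2 documents, k = 2 (not a real SVD).
wtdm = [[1.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
u_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_k_inverse = [[0.5, 0.0], [0.0, 0.25]]
doc_vectors = document_vectors(wtdm, u_k, s_k_inverse)
print(doc_vectors)  # prints [[1.0, 0.25], [0.0, 0.5]]
```

Since none of the inputs depend on the query, the returned list could be computed once and stored with the index rather than rebuilt on every search.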

 

Step  7 - Finally, we compute the similarity between the query and each document.

Initialize the results - a list called similarityArray (it's just a variable name, which can be changed)

For each document:

Fetch the document vector “d”

Compute cosine similarity between query vector “q” and vector “d”

similarityArray[document-ID] = similarity(from above)

 

We then display the similarityArray sorted in descending order of similarity, so the best-matching documents come first.
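Step 7 and the final sort can be sketched in Python as follows. The names `cosine_similarity` and `rank_documents` and the sample vectors are made up for this example. Returning 0.0 for a zero-length vector is an assumption, chosen so that a query with no known words simply matches nothing.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as "no match"
    return dot / (norm_a * norm_b)


def rank_documents(q, doc_vectors):
    """Return (document ID, similarity) pairs, best match first."""
    similarity_array = {
        doc_id: cosine_similarity(q, d) for doc_id, d in enumerate(doc_vectors)
    }
    return sorted(similarity_array.items(), key=lambda item: item[1], reverse=True)


q = [1.0, 0.0]
doc_vectors = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_documents(q, doc_vectors)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy example document 1 ranks first because it points in the same direction as the query, even though its vector is twice as long. Cosine similarity ignores vector length and compares direction only.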

I am not covering the UI part here; I think the UI created for this is pretty simple to understand.

I hope you found this series useful. If you like this series and/or if it was helpful to you, please do rate it and leave a comment below.

Happy researching!

< Part 1:  Understanding LSI (tutorial, demo, code, references)

< Part 2:  Creating Index




 



© Copyright Anup Shinde. All rights reserved.