Latent Semantic Indexing - part 3 - Searching
< Continued from Part 2: Creating Index
In the earlier post, we saw how to create an index from a list of documents. In this post, we will see how to search that index. This is only a walkthrough of the code; you should be able to map these steps to the code using its inline comments.
After the user types in a query and clicks Search, the following steps are performed:
Step 1 - Fetch the data that we stored in the index. In this case, that is the documents list, the word list, S(k)-inverse, U(k), and the WTDM (weighted term-document matrix).
Step 2 - Fetch the query text, filter out stop words, and apply stemming to the remaining words. Stemming is done using the same mechanism that we used when creating the index.
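To make Step 2 concrete, here is a minimal sketch in Python. The stop-word list and the suffix-stripping "stemmer" are hypothetical stand-ins of my own, not the mechanism used by the indexing code; whatever you use here must match what was used at indexing time, otherwise the query terms will not line up with the word list.

```python
import re

# Hypothetical stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_query(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess_query("Searching the indexed documents"))
# → ['search', 'index', 'document']
```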
Step 3 - Create a vector over the word list, called qT (q-transpose), with one entry per indexed word. The query vector is built from this.
Step 4 - Normalize qT (simple normalization).
Step 5 - Compute the query-vector q = qT x U(k) x S(k)(inverse)
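Steps 3 to 5 could be sketched roughly as follows. This is plain Python with nested lists as matrices, and it makes a few assumptions: qT holds raw term counts, "simple normalization" means scaling to unit Euclidean length, U(k) has one row per word and k columns, and S(k)-inverse is a k x k matrix. The function names and the toy numbers are mine, purely for illustration.

```python
import math


def query_term_vector(query_terms, word_list):
    """Build qT (1 x m): how often each indexed word occurs in the query."""
    return [float(query_terms.count(word)) for word in word_list]


def normalize(vec):
    """Scale to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return vec if norm == 0 else [x / norm for x in vec]


def fold_in(row_vec, u_k, s_k_inv):
    """Compute row_vec x U(k) x S(k)-inverse, giving a 1 x k vector."""
    m, k = len(row_vec), len(s_k_inv)
    projected = [sum(row_vec[i] * u_k[i][j] for i in range(m)) for j in range(k)]
    return [sum(projected[i] * s_k_inv[i][j] for i in range(k)) for j in range(k)]


# Toy index data (made-up numbers): 3 words, k = 2.
word_list = ["search", "index", "document"]
u_k = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
s_k_inv = [[0.5, 0.0], [0.0, 0.25]]

q_t = normalize(query_term_vector(["search", "index"], word_list))
q = fold_in(q_t, u_k, s_k_inv)
print(q)
```

Words in the query that are not in the word list simply contribute nothing to qT, so a query made up entirely of unknown words ends up as a zero vector.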
Step 6 - Now that we have the query vector, create a vector for each document so that it can be compared with the query vector.
(This step does not depend on the query, so I think the document vectors could be created once, during indexing.)
For each document:
Create a vector dT, the transpose of the document's term vector
Compute document vector d = dT x U(k) x S(k)(inverse)
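The per-document loop above might look like this in Python. I am assuming here that each document's term vector is its column in the WTDM (rows are words, columns are documents), and that the matrices are plain nested lists; the function name and the toy numbers are illustrative only.

```python
def document_vectors(wtdm, u_k, s_k_inv):
    """Return one k-dimensional vector per document (one per WTDM column)."""
    m, n, k = len(wtdm), len(wtdm[0]), len(s_k_inv)
    vectors = []
    for doc in range(n):
        # dT (1 x m): the document's column, read out as a row.
        d_t = [wtdm[term][doc] for term in range(m)]
        # dT x U(k) gives a 1 x k row ...
        proj = [sum(d_t[i] * u_k[i][j] for i in range(m)) for j in range(k)]
        # ... which is then multiplied by S(k)-inverse.
        d = [sum(proj[i] * s_k_inv[i][j] for i in range(k)) for j in range(k)]
        vectors.append(d)
    return vectors


# Toy data (made-up numbers): 3 words, 2 documents, k = 2.
wtdm = [[2.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
u_k = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
s_k_inv = [[0.5, 0.0], [0.0, 0.25]]

doc_vecs = document_vectors(wtdm, u_k, s_k_inv)
print(doc_vecs)
```

Since nothing in this function uses the query, its output could be computed once and stored alongside the rest of the index.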
Step 7 - Finally, we compute the similarity between the query and each document.
Initialize the results, a list called similarityArray (that is just a variable name and can be changed)
For each document:
Fetch the document vector “d”
Compute cosine similarity between query vector “q” and vector “d”
similarityArray[document-ID] = similarity(from above)
We then display the contents of similarityArray in descending order of similarity.
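Step 7 boils down to a cosine similarity and a sort. Here is a small Python sketch, assuming the query vector and document vectors are plain lists of the same length k; the document IDs and numbers below are made up, and returning 0.0 for a zero-length vector is my own choice for handling that edge case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(q, doc_vectors):
    """Score every document against q and sort by similarity, best first."""
    similarity_array = {
        doc_id: cosine_similarity(q, d) for doc_id, d in doc_vectors.items()
    }
    return sorted(similarity_array.items(), key=lambda item: item[1], reverse=True)


# Toy vectors (made-up numbers), keyed by a hypothetical document ID.
q = [1.0, 0.0]
doc_vectors = {"doc-a": [2.0, 0.0], "doc-b": [0.0, 3.0], "doc-c": [1.0, 1.0]}

for doc_id, score in rank_documents(q, doc_vectors):
    print(doc_id, round(score, 4))
```

Because cosine similarity only looks at direction, "doc-a" scores a perfect match here even though its vector is twice as long as the query vector.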
I am not covering the UI part here. I think the UI created for this is simple enough to understand on its own.
I hope you found this series useful. If you like this series and/or if it was helpful to you, please do rate it and leave a comment below.
Happy researching!
< Part 1: Understanding LSI (tutorial, demo, code, references)