forked from b2l-chennai/project-ideas
Entity Extractor from Google SERPs
Dorai Thodla edited this page Nov 22, 2020 · 1 revision
Goal:
Find how closely the search terms align with the top keywords and phrases in the result documents
Input:
- A search string (for example: "What is AI")
- Number of top keywords/phrases to report (n)
Output:
- List of the top n unigrams, bigrams, and trigrams by frequency in the search engine result pages (SERPs)
Process:
- Get the search term/expression
- Perform a Google query
- Get the results
- For each result page, extract text
- Split text into sentences
- For each sentence, extract terms (unigrams/bigrams/trigrams)
- Count the term frequency for each document (search result page)
- Store it in a dictionary or database
- Create a report of the top n terms in descending order of frequency
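
The text-processing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a definitive implementation. It assumes the result pages have already been fetched and are available as HTML strings, so the Google query step is left out. The names `extract_text`, `split_sentences`, `ngrams`, `count_terms`, and `top_terms` are hypothetical, and the sentence splitter and tokenizer are deliberately naive.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Return the text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def split_sentences(text):
    """Naive split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def ngrams(sentence, n):
    """Return the lowercase word n-grams of one sentence."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_terms(html):
    """Count unigrams, bigrams, and trigrams for one document."""
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for sentence in split_sentences(extract_text(html)):
        for n in counts:
            counts[n].update(ngrams(sentence, n))
    return counts


def top_terms(pages, top_n):
    """Merge per-document counts and return the top_n terms per n-gram size."""
    totals = {1: Counter(), 2: Counter(), 3: Counter()}
    for html in pages:
        per_doc = count_terms(html)
        for n in totals:
            totals[n].update(per_doc[n])
    # most_common() returns (term, count) pairs in descending order of count
    return {n: totals[n].most_common(top_n) for n in totals}


if __name__ == "__main__":
    # Stand-in for fetched result pages (assumption: fetching happens elsewhere)
    pages = ["<html><body><p>AI is artificial intelligence. "
             "AI is everywhere.</p></body></html>"]
    for size, terms in top_terms(pages, 2).items():
        print(size, terms)
```

Extracting n-grams per sentence keeps phrases from spanning sentence boundaries. The per-document `Counter` objects play the role of the dictionary store; a database could replace them if the counts need to persist.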