NLP & Info Retrieval: A Review

Question 1:

a) Using Broder’s taxonomy of web search, identify the correct class for the following web information needs:
1. “I want to find the QUT homepage”.
2. “I want to find the best provider for health insurance for single female, aged 20-40”
3. “I want to find out information about Queensland’s Old Government House. For example: When was it built? Who lived there? At which University is it situated in?”
4. “I want to sign up for a cheap music streaming service that has coverage of songs from the 1980s, 1990s and 2000s”.
b) Using a suitable example, explain the “Ellis Model” in Information Behaviour Models. (You should select your own example to explain each stage of the Ellis Model).
c) Compare the Ellis Model with two other Models of Information Seeking. Your comparison should cover the strengths and weaknesses of the selected models.

Question 2:

a) Define the following six areas of natural language processing that are commonly associated with information retrieval. (Max 150 words for each area of NLP)
1) Morphological
2) Lexical
3) Syntactic
4) Semantic
5) Discourse
6) Pragmatic
b) Identify which area or areas of natural language processing are applied in the following algorithms/techniques. (For each simply state the analysis employed)
1) Brill Tagger
2) Porter Stemmer
3) Wordnet
4) Noun Phrase Shallow Parser/Chunker

Categorization of Web Searches

“I want to find the QUT homepage”.

This information need belongs to the navigational class of web searches, because the user’s main intent is to reach one particular site on the internet.

“I want to find the best provider for health insurance for single female, aged 20-40”

This information need belongs to the informational class of web searches, because the user’s main intent is to acquire information that is available on one or more web pages.

“I want to find out information about Queensland’s Old Government House. For example: When was it built? Who lived there? At which University is it situated in?”

This information need belongs to the informational class of web searches, because the user’s main intent is to acquire facts about the building that are available on one or more web pages.

“I want to sign up for a cheap music streaming service that has coverage of songs from the 1980s, 1990s and 2000s”.

This information need belongs to the transactional class of web searches, because the user’s main intent is to perform a web-mediated activity: signing up for a music streaming service that is cheap and covers songs from the 1980s, 1990s and 2000s.
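
The classification above can be illustrated with a small keyword heuristic. This is only a sketch: the cue phrases and the function name classify_need are assumptions made for illustration, and real query-intent classification relies on much richer signals than a handful of keywords.

```python
# Hypothetical cue phrases chosen for this illustration only.
NAVIGATIONAL_CUES = ("homepage", "home page", "website", "login page")
TRANSACTIONAL_CUES = ("sign up", "buy", "download", "subscribe", "book a")


def classify_need(need: str) -> str:
    """Return a Broder class for a stated information need (keyword heuristic)."""
    text = need.lower()
    if any(cue in text for cue in NAVIGATIONAL_CUES):
        return "navigational"
    if any(cue in text for cue in TRANSACTIONAL_CUES):
        return "transactional"
    # Default: the user wants information found on one or more pages.
    return "informational"
```

Under these assumed cues, the QUT homepage need is labelled navigational, the music streaming need transactional, and the remaining needs informational.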

Ellis Model

Ellis described this model of information-seeking behavior, which was developed in the year 1984. The model contains eight generic characteristics that are used to recognize patterns of information seeking. The main stages of the model are Starting, Chaining, Differentiating, Extracting, Verifying, Browsing, Monitoring and Ending.

Starting: Users employ starting activities to begin the information-seeking process, for example by asking knowledgeable colleagues. The main activity is identifying sources of interest, which usually include familiar resources.

Chaining: In this stage the user follows leads from the initial sources, either backward (to the references a source cites) or forward (to later sources that cite it). For instance, chaining is used when the bibliographical tools that users need are unavailable to them.

Browsing: This stage involves semi-directed or semi-structured searching in areas of potential interest. Browsing often takes place on the internet. For instance, a user browses by scanning the table of contents and other parts of a report.

Differentiating:

Users filter and select sources by taking note of the nature and the quality of the information offered to them.

Monitoring:

In monitoring, the user keeps up to date by regularly following particular sources, for example by checking selected websites while surfing the internet.

Ellis’ Information-Seeking Behavior Model

Extracting:

Extracting involves working through a particular source in a systematic manner to locate material of interest. For instance, the physicists and chemists in the studies based on this model exhibited this behavior.

Verifying:

Verifying involves checking the accuracy of the information obtained from the various types of sources.

Ending:

Ending concludes the information-seeking process.
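
The Chaining stage described earlier can be sketched with a small citation map. The paper names and the CITES dictionary are hypothetical data invented for this illustration; the sketch only shows the difference between following references backward and following citations forward.

```python
# Hypothetical citation data: each paper maps to the papers it cites.
CITES = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_d": ["paper_a"],
}


def backward_chain(paper: str) -> list[str]:
    """Backward chaining: follow the references listed in the starting paper."""
    return sorted(CITES.get(paper, []))


def forward_chain(paper: str) -> list[str]:
    """Forward chaining: find the papers that cite the starting paper."""
    return sorted(p for p, refs in CITES.items() if paper in refs)
```

Starting from paper_a, backward chaining leads to the papers it cites, while forward chaining leads to paper_d, which cites it.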

This section compares the Ellis model with the Campbell model. The Ellis model was developed by Albert Ellis and the Campbell model by Ian Campbell. Campbell’s theory concerns the relationship between the core irrational beliefs and the derivatives. Ellis, on the other hand, emphasized the interactional connection between the core irrational beliefs and the derivatives.

Morphological

Morphological analysis is one of the steps of natural language processing. It separates words into morphemes and identifies the class of each morpheme. The difficulty of the task depends on the morphology of the language. English morphology is simpler than that of many other languages. For a language such as Turkish, listing every word form is not a practical approach, because thousands of word forms are possible for a single entry.

Lexical

Lexical analysis works at the level of individual words. It relies on a lexicon: a structured collection of lexical entries that records information about the words of a language and the categories they belong to.

Syntactic

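Syntactic analysis examines how words combine into phrases and sentences according to the grammar of the language. It is commonly supported by a corpus, which is a large body of natural language text used to accumulate statistics about the language.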

Semantics

Lexical semantics is used to understand the meaning of individual words in context. Machine translation, which automatically transfers content from one human language to another, belongs to this area of natural language processing. It is considered one of the most difficult problems, and solving it requires the full range of knowledge that humans possess. Semantic analysis also supports named entity recognition (NER). Other tasks associated with semantics are natural language generation, natural language understanding, optical character recognition, question answering and word sense disambiguation.

Discourse

The discourse area of natural language processing supports automatic summarization, which produces a summary of a body of text, for example a summary of newspaper articles. Coreference resolution is the next feature: it finds the expressions in a text that refer to the same entity. Anaphora resolution is an example; it aims to match pronouns with the nouns they refer to. A further feature is discourse analysis, which covers a number of related tasks. In addition, there is a feature that aims to recognize and classify the speech acts in a text.

Pragmatic

Pragmatic analysis is part of the process of extracting information from text. Specifically, it is the portion that takes a structured set of text and works out what the actual meaning was. The concept comes from the field of linguistics.

1) Brill Tagger: part-of-speech (POS) tagging (available in NLTK for Python)

2) Porter Stemmer: stemming

3) WordNet: lexical database (accessed in NLTK through a corpus reader)

4) Noun Phrase Shallow Parser/Chunker: chunking based on part-of-speech (POS) tags (available in NLTK for Python)
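
To make the idea of stemming concrete, the sketch below strips a few common English suffixes. It is deliberately not the Porter algorithm, which applies several ordered rule phases with conditions on the remaining stem; the suffix list and the name toy_stem are assumptions made for illustration.

```python
# Hypothetical suffix list, longest first. This is a toy suffix stripper,
# not an implementation of the Porter stemmer.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first listed suffix that leaves a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With this list, "connected", "connecting" and "connects" all reduce to "connect", which is the kind of conflation a stemmer provides to a retrieval system.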

Topic models are tools that can be shared via suitable platform technology, allowing users to benefit from the work of others, “long-tail”-style. Topic modeling is the process of building and maintaining topic models.
The three types of topic models are:

Correlated topic model
Dynamic Topic Model
Continuous Time Dynamic Topic Models

LDA works best on longer texts, but it also appears to work on shorter texts such as tweets. A bag-of-words representation is suggested for the features, with the openNLP or coreNLP tokenizers used to obtain the tokens from the text. Tf-idf weighting is often used, and a Gini-based weighting method can also be applied to the LDA results. The lda.gibbs.sampler function takes a document-term matrix as input, which can be built using the tm package.
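
The bag-of-words and tf-idf weighting steps mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library: whitespace tokenization stands in for a proper tokenizer, and the function name tf_idf and the unsmoothed log(N/df) form of the inverse document frequency are choices made for this sketch.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term by its count times log(N / document frequency)."""
    # Bag of words: word order is discarded, only counts are kept.
    bags = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(bags)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]
```

A term that occurs in every document receives a weight of zero, while a term that is rare across the collection is weighted more heavily.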
The number of topics in the dataset is specified by the user (or sampled from a distribution such as the Poisson), which is subjective and does not always reflect the true distribution of topics. For each word, a topic is drawn from a multinomial distribution, and the word is then drawn from another multinomial distribution specific to that topic. If the true structure is more complex than a multinomial distribution, or if there is not enough training data, the model might underfit.
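
The two-step draw described above (topic first, then word) can be sketched as a small generator. The topics, words and probabilities in TOPIC_WORDS are hypothetical, and the topic mixture is passed in as a fixed argument, whereas full LDA draws it from a Dirichlet prior for each document; this is a sketch of the generative idea, not of inference.

```python
import random

# Hypothetical topic-word distributions (each one is a multinomial over words).
TOPIC_WORDS = {
    "sports": {"match": 0.5, "team": 0.4, "vote": 0.1},
    "politics": {"vote": 0.6, "party": 0.3, "team": 0.1},
}


def generate_document(topic_mix: dict[str, float], length: int, seed: int = 0) -> list[str]:
    """Draw each word by first sampling a topic, then a word from that topic."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    words = []
    for _ in range(length):
        topic = rng.choices(list(topic_mix), weights=list(topic_mix.values()))[0]
        dist = TOPIC_WORDS[topic]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return words
```

Fitting LDA runs this process in reverse: given only the words, it estimates the topic mixtures and the topic-word distributions that most plausibly generated them.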

An information retrieval system plays an important role in current search engines: it performs the search for the user and lets the user make sense of the most fundamental and vital data. The strategy is generally implemented in a question answering (QA) framework, in which the user’s question passes through several stages that convert it into query form in order to find a correct answer. In conceptual search, the search engine interprets the meaning of the user’s query and the relationships among the concepts that documents contain for a specific domain, and it produces specific answers instead of a list of answers. The limitation of keyword-based search may be overcome by another web architecture known as the semantic web, which replaces keyword matching with a method called conceptual or semantic search. The ontology-based approach depends on the Jena semantic web framework and a Semantic Information Retrieval System, in which the user enters an input query that the Standard Parser Triplet uses for the triplet extraction algorithm.
The main advantage of an ontology is its simplicity; the main disadvantage is that it consumes a lot of space.
An ontology is used to increase the accuracy of the information retrieval system, which in turn increases the efficiency of the system.
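
The triplets mentioned above can be pictured as subject-predicate-object statements that a query is matched against. The sketch below is plain Python and does not use the Jena API; the example triples, the predicate names and the function match are hypothetical and serve only to show the matching idea.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical triples, of the kind a triplet extractor might produce.
TRIPLES: list[Triple] = [
    ("Old Government House", "locatedAt", "QUT"),
    ("QUT", "isA", "University"),
    ("Old Government House", "isA", "Building"),
]


def match(subject: Optional[str] = None, predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
```

A query such as match(subject="Old Government House", predicate="locatedAt") returns a specific statement rather than a list of documents, which is the behavior attributed to semantic search above.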
A variety of candidates have been proposed for the status of category. Two have been mentioned thus far: particular and property. Others include event or activity, process, state of affairs, and fact. For example, some ontologists propose that there are particular objects, such as molecules, trees and people, that events take place in these things, and that the objects persist through the changes that take place within them. Events are also complex entities that have event-properties. Particulars also have the capacity to change, which is a basic property that they hold. Or at any rate, the persisting particulars do have capacities, and these are exhibited by their activities. Other ontologists, however, hold that particulars are an illusion, reducible to strings of events or processes. These event ontologies take events or processes as basic.

The formula for page rank is provided below:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Explanation of the terms of page rank

PR(Tn) – Each page has its own measure of importance. That is “PR(T1)” for the first page that links to A, up to “PR(Tn)” for the last page that links to A.

C(Tn) – Each page spreads its vote evenly across its outgoing links.

Number of outgoing links for page 1 = “C(T1)”,

Number of outgoing links for page n = “C(Tn)”,

PR(Tn)/C(Tn) – The share of the vote that page A receives from page Tn.

d = 0.85 – The damping factor.

(1 - d) – This term compensates for the share of the vote removed by the damping factor. With this form of the formula, and provided every page has at least one outgoing link, the page ranks of all the pages on the web average 1.

Therefore a page with no incoming links has a PR of (1 - d) = 0.15.

The page rank of all the three documents is provided below:

PR(D1) = (1-d) + d (PR(D1)/C(D1) + ... + PR(D3)/C(D3))

PR(D2) = (1-d) + d (PR(D1)/C(D1) + ... + PR(D3)/C(D3))

PR(D3) = (1-d) + d (PR(D1)/C(D1) + ... + PR(D3)/C(D3))
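
The formula can be evaluated by repeated substitution until the values settle. The sketch below assumes a hypothetical link structure among D1, D2 and D3, because the actual links between the three documents are not given here; the function name pagerank, the starting value of 1.0 and the iteration count are also choices made for this illustration.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A."""
    ranks = {page: 1.0 for page in links}  # initial guess for every page
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Votes from each page T that links to `page`, split over C(T) links.
            votes = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * votes
        ranks = new_ranks
    return ranks


# Hypothetical link structure: D1 -> D2 and D3; D2 -> D3; D3 -> D1.
example_links = {"D1": ["D2", "D3"], "D2": ["D3"], "D3": ["D1"]}
example_ranks = pagerank(example_links)
```

Under this assumed structure D3 collects votes from both D1 and D2, so it ends up ranked above D2, which only receives half of D1's vote.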

My Assignment Help. (2019). Natural Language Processing And Information Retrieval: A Review. Retrieved from https://myassignmenthelp.com/free-samples/ifn647-advanced-information-retrieval-and-storage.