1. What is structured data and unstructured data? Give an example of each from your experience with data that you may have used.
2. Give a general definition of information retrieval (IR). What does information retrieval involve when we consider information on the Web?
3. What is meant by navigational, informational, and transformational search?
4. What are the different phases of the knowledge discovery from databases? Describe a complete application scenario in which new knowledge may be mined from an existing database of transactions.
5. What are the goals or tasks that data mining attempts to facilitate?
1. Structured data is data stored in a strict, predefined format and structure. For example, data stored in a relational database follows a rigid schema: every row of a table has the same attributes with declared types. Hence, it is structured data.
In contrast, unstructured data follows no predefined format or structure when it is stored, which makes it harder to query and analyze automatically. Examples are a plain text file containing free-form notes, or HTML pages that contain some data embedded in free text.
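The contrast can be sketched in Python. This is a minimal illustration that assumes a hypothetical `employee` table and a made-up free-text note. The structured records can be queried precisely by attribute with SQL, while the same facts stored as free text can only be located by pattern matching over the prose.

```python
import re
import sqlite3

# Structured data: every record follows the same fixed schema (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Alice", "Sales", 52000.0), ("Bob", "IT", 61000.0), ("Chen", "IT", 58000.0)],
)
# Because the structure is known, we can ask a precise question by attribute.
it_staff = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("IT",))]

# Unstructured data: free text with no schema; the same facts are buried in prose.
notes = """Met Bob from the IT team today. Alice in Sales asked about the
quarterly report, and Chen (also IT) fixed the printer."""

# Without a schema we fall back to text processing, e.g. a keyword search
# that returns whole sentences rather than clean attribute values.
sentences = re.split(r"(?<=[.!?])\s+", notes)
mentions_it = [s for s in sentences if re.search(r"\bIT\b", s)]

print(it_staff)  # ['Bob', 'Chen']
print(mentions_it)
```

The SQL query returns exactly the names, whereas the text search can only return the sentences in which "IT" occurs; extracting the names from them would need further processing.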
2. Gerald Salton defined information retrieval (IR) as “the discipline that deals with the structure, analysis, organization, storage, searching, and retrieval of information”. In general, IR is the process of retrieving relevant information from a collection of documents in response to a query supplied by a user.
IR deals mainly with unstructured or semi-structured data, such as text documents. On the Web, IR involves crawling web pages, indexing their content, and ranking the pages that match a user's keyword query by their estimated relevance.
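The core data structure behind most text retrieval is an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal, in-memory Python illustration over made-up documents; the tokenizer, the Boolean AND matching, and the frequency-based ranking are simplifying assumptions, not how any particular search engine works.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: frequency of the term in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return ids of documents containing every query term,
    ranked by the total number of query-term occurrences."""
    terms = tokenize(query)
    if not terms:
        return []
    # Boolean AND: intersect the posting lists of all query terms.
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    # Toy ranking by summed term frequency; real engines use richer scores.
    scores = {d: sum(index[t][d] for t in terms) for d in matches}
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "Information retrieval deals with unstructured text documents.",
    "d2": "Relational databases store structured data in tables.",
    "d3": "Web search is information retrieval over web pages; retrieval is ranked.",
}
index = build_index(docs)
print(search(index, "information retrieval"))  # ['d3', 'd1']
```

Here `d3` ranks first because "retrieval" occurs in it twice; `d2` is not returned because it contains neither query term.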
3. Searches on the Web can be classified into three types:
- Navigational search, which refers to quickly finding a particular piece of information, such as a specific website, that the user needs. For example, searching for 'Amazon' to reach Amazon's home page.
- Informational search, which refers to finding current information about a topic. For example, searching for research activities on IR.
- Transformational (also called transactional) search, which refers to reaching a site where further interaction takes place. For example, searching for Facebook in order to sign up for an account.
4. There are six phases of knowledge discovery in databases (KDD):
- Data selection
- Data cleansing
- Enrichment
- Data transformation or encoding
- Data mining
- Reporting and display of the discovered information
Consider the transaction database of a retailer. It contains information about customers and their purchases, such as name, address, contact number, item purchased, quantity, price, and total amount.
New knowledge can be mined from this database through KDD as follows:
- Data selection: data about specific items or entities is selected, for example, customers from a particular geographic area.
- Data cleansing: the data is checked and corrected, for example, by verifying that every ZIP code is valid and in the same format.
- Enrichment: data from other sources, such as demographic or social media data, is added.
- Data transformation: the data is encoded into shorter, more compact forms, for example, by grouping ZIP codes into regions.
- Data mining: mining algorithms are applied to find patterns, for example, items that are frequently bought together.
- Reporting: the results are presented in an understandable form, such as lists, tables, or graphs.
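The stages above can be sketched as a small Python pipeline, one function per stage. This is a minimal illustration under stated assumptions: the records in `transactions`, the lookup table `REGION_BY_ZIP3`, and all function names are hypothetical, and the mining step simply counts item pairs that occur together in at least `min_support` baskets (a simplified first step of association-rule mining, not a full algorithm such as Apriori).

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical transaction records: (customer, ZIP code, items bought).
transactions = [
    ("Ann",  "30332",      ["bread", "milk", "eggs"]),
    ("Ben",  "30332-0280", ["bread", "milk"]),
    ("Cara", "3033",       ["milk", "butter"]),  # malformed ZIP code
    ("Dev",  "30309",      ["bread", "milk", "butter"]),
    ("Eve",  "94105",      ["coffee", "bread"]),
]

# Hypothetical external source used for enrichment: ZIP prefix -> region.
REGION_BY_ZIP3 = {"303": "Atlanta", "941": "San Francisco"}


def select(rows, zip_prefix):
    """Selection: keep customers from one geographic area."""
    return [r for r in rows if r[1].startswith(zip_prefix)]


def cleanse(rows):
    """Cleansing: normalize ZIP+4 codes to 5 digits and drop invalid ones."""
    clean = []
    for name, zip_code, items in rows:
        m = re.fullmatch(r"(\d{5})(-\d{4})?", zip_code)
        if m:
            clean.append((name, m.group(1), items))
    return clean


def enrich(rows):
    """Enrichment: attach a region looked up in the external table."""
    return [(n, z, REGION_BY_ZIP3.get(z[:3], "unknown"), items) for n, z, items in rows]


def transform(rows):
    """Transformation: encode each basket as a compact, sorted tuple of items.
    (The region is dropped here; it would be used for per-region analysis.)"""
    return [tuple(sorted(set(items))) for _, _, _, items in rows]


def mine(baskets, min_support):
    """Mining: find item pairs bought together in at least min_support baskets."""
    counts = Counter(pair for b in baskets for pair in combinations(b, 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}


def report(pairs, n_baskets):
    """Reporting: print the discovered patterns in a readable form."""
    for (a, b), c in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{a} + {b}: bought together in {c}/{n_baskets} baskets")


baskets = transform(enrich(cleanse(select(transactions, "303"))))
frequent = mine(baskets, min_support=2)
report(frequent, len(baskets))  # bread + milk: bought together in 3/3 baskets
```

With this toy data, selection keeps the four customers whose ZIP code starts with "303", cleansing removes Cara's malformed code, and the only pair that meets the support threshold is bread and milk.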
5. The goals that data mining attempts to facilitate are:
- Prediction: forecasting how certain attributes of the data will behave in the future.
- Identification: detecting the existence of patterns in the data.
- Classification: partitioning data into different classes or categories.
- Optimization: making the best use of limited resources such as space, time, and cost, and maximizing output variables such as profit.
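Two of these goals, prediction and classification, can be illustrated with a minimal Python sketch on made-up data. The least-squares line and the nearest-centroid classifier are simple stand-ins chosen for illustration; they are not the only, or necessarily the usual, techniques for these tasks, and the sales figures and customer segments are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Prediction: ordinary least-squares fit of y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def nearest_centroid(train, point):
    """Classification: assign point to the class whose mean (centroid) is closest."""
    centroids = {}
    for label in {lbl for _, lbl in train}:
        pts = [p for p, lbl in train if lbl == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*pts))

    def dist2(c):
        # Squared Euclidean distance between the point and a centroid.
        return sum((pi - ci) ** 2 for pi, ci in zip(point, c))

    return min(centroids, key=lambda lbl: dist2(centroids[lbl]))


# Hypothetical monthly sales (month number -> units sold).
months, sales = [1, 2, 3, 4], [100, 120, 140, 160]
a, b = fit_line(months, sales)
print(a + b * 5)  # forecast for month 5: 180.0

# Hypothetical customers: (visits per month, average spend) -> segment label.
customers = [((2, 20), "occasional"), ((3, 25), "occasional"),
             ((12, 80), "loyal"), ((15, 95), "loyal")]
print(nearest_centroid(customers, (10, 70)))  # loyal
```

The fitted line is y = 80 + 20x, so the forecast for month 5 is 180 units; the new customer (10, 70) lies much closer to the "loyal" centroid (13.5, 87.5) than to the "occasional" centroid (2.5, 22.5).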