Steel L.C. is a family business in Barcelona that sells hardware and machinery products.
The company has been active for more than one hundred years and has an extensive client portfolio. The client data form a vast database of valuable information, but that information is not being used properly. We have data going back approximately eighty years. The data from the first years were recorded on paper, but little by little they were computerised.
The main problem with the customer data is that they are not unified. Each department has its own database, and the values used in one do not match those used in another. In addition, some data are recorded in Spanish and others in Catalan, and some monetary amounts are expressed in pesetas and others in euros.
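The currency and language mismatches described above can be illustrated with a minimal Python sketch. It assumes the irrevocably fixed rate of 166.386 pesetas per euro, and it assumes Spanish is chosen as the canonical language. The currency codes, the label mapping and the function names are hypothetical and are not taken from the Steel L.C. databases.

```python
from decimal import Decimal, ROUND_HALF_UP

# Fixed conversion rate set when Spain adopted the euro: 1 EUR = 166.386 ESP.
PESETAS_PER_EURO = Decimal("166.386")

# Hypothetical mapping from Catalan labels to the Spanish labels chosen as canonical.
CATALAN_TO_SPANISH = {
    "ferreteria": "ferretería",
    "maquinària": "maquinaria",
}


def to_euros(amount, currency):
    """Return the amount in euros, rounded to the nearest cent."""
    value = Decimal(str(amount))
    code = currency.strip().upper()
    if code in ("ESP", "PTS"):  # assumed codes for amounts stored in pesetas
        value = value / PESETAS_PER_EURO
    elif code != "EUR":
        raise ValueError(f"unknown currency code: {currency!r}")
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def unify_label(label):
    """Lower-case and trim a label, then map Catalan variants to Spanish."""
    key = label.strip().lower()
    return CATALAN_TO_SPANISH.get(key, key)
```

Using `Decimal` rather than floating point keeps the converted amounts exact to the cent, which matters when totals from several departments are later compared.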
Soon, Samuel, the son of the current owner, will inherit the business. He is aware of how important a good database is to the development of the company: it would let him know first-hand what his clients are like and make strategic decisions. Therefore, he asks for help to sort and unify the data, check whether they are valid, and eliminate records that are no longer valid, such as duplicates or records of customers who have died.
1. Having assessed the current state of the Steel L.C. database, do you consider it appropriate to carry out an ETL process? Justify your answer, considering the benefits that it would bring to Samuel's company. In addition, establish the objectives of implementing this process.
2. Taking into account the information about the company that is collected in the databases, do you think it would be useful to obtain other types of information? What information would you add? Justify your answer.
3. Describe the activities you would carry out in each phase of the ETL process (cleaning, extraction, transformation and loading).
4. Steel L.C. has been active for more than a hundred years, so it has a large amount of data on most of its clients. As a result, there may be erroneous values, poorly entered data, duplicate records, values that do not match, etc. For this reason, it will be necessary to carry out a process to establish the quality of the data and detect the errors. Point out the errors that you may encounter in this process and propose how each of them can be solved. It is essential that you justify your answer.
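As a starting point for the data-quality step, the kind of checks mentioned above (missing values, deceased customers, duplicates) can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names `tax_id` and `deceased`, and the rule that a normalised tax identifier defines a duplicate, are hypothetical and would need to be adapted to the real Steel L.C. records.

```python
def audit_customers(records):
    """Split customer records into clean rows and rejected rows with a reason.

    Each record is a dict. The keys "tax_id" and "deceased" are assumed
    field names used for illustration only.
    """
    seen_ids = set()
    clean, rejected = [], []
    for rec in records:
        # Normalise the identifier so "x1" and " X1 " count as the same customer.
        tax_id = (rec.get("tax_id") or "").strip().upper()
        if not tax_id:
            rejected.append((rec, "missing tax_id"))
        elif rec.get("deceased"):
            rejected.append((rec, "customer deceased"))
        elif tax_id in seen_ids:
            rejected.append((rec, "duplicate"))
        else:
            seen_ids.add(tax_id)
            clean.append(rec)
    return clean, rejected
```

Keeping the rejected rows together with the reason, instead of deleting them silently, leaves an audit trail that each department can review before the records are finally removed.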