This assessment addresses the following Unit Learning Outcomes:
1. Describe the common data sources that exist in organisations and their use in BI
2. Demonstrate practical skills in the processes associated with extraction, transformation and loading (ETL) of organisational data
3. Design and implement a simple data warehouse environment
You will need to draw your data from a number of sources. The sources, and the tables within them that are or may be of interest to you, are described below (Primary Key, Foreign Key):
Source 1: Course Handbook
The Course Handbook is a FileMaker Pro Database. It contains data regarding all courses, units and offerings of units that are offered by Mudrock University. A course is made up of units, and a unit will have at least one offering each year.
COURSE (CourseCode, Version, CourseName, SchoolName)
UNIT (UnitCode, CourseCode, Version, UnitName)
UNIT_OFFERING (OfferingNumber, UnitCode, Year, TeachingPeriod)
OFFERING_COORDINATOR (StaffID, UnitOfferingNumber)
Source 2: Student Information System
The Student Information System has its data stored in a relational DBMS (Oracle) at present.
STUDENT (StudentID, StudentName, DateOfBirth)
ENROLMENT (EnrolNumber, StudentID, UnitOfferingNumber, Grade)
Source 3: Human Resources System
The HR System is a proprietary system that is owned by the HR Department.
STAFF_MEMBER (StaffNumber, StaffName, SchoolCode)
SCHOOL (SchoolCode, SchoolTitle)
What you have to do:
TASK 1 (30%): Discuss two (2) issues that may be problematic in the creation of the data warehouse that are apparent from the description above. For both, explain what you see as being the issue, why it is problematic, and what you will suggest needs to be done. This should take no more than two (2) pages in total.
TASK 2 (25%): Discuss what you see as being the most appropriate level of granularity for your data warehouse. Your discussion will need to explain why you have made this choice, and why the alternatives have been discarded. This should take not more than one (1) page.
TASK 3 (30%): Assuming that the issues you have raised in TASK 1 have been addressed to your satisfaction, design a Star Schema that will support the analyses as listed above.
TASK 4 (15%): Provide the SQL statements you would use to create the tables if you were to be implementing this design using Oracle.
To Submit:
You will need to submit a single word-processed document that includes
- The written answers for Tasks 1 and 2
- A screenshot of the design you have created for Task 3 (please do not submit your Visio file, or whatever file you use; just a screenshot, copied and pasted into the Word document).
A data warehouse is the means by which Mudrock University will convert its raw data into meaningful information that can be presented in various forms through its reporting capabilities. It brings together large volumes of cleansed, aggregated and merged data, stored in a multidimensional structure to support multidimensional analysis. It recognizes the need for historical data as well as current and future data, and it enables an organization to manage a large amount of data in a meaningful form (Al-Debei, 2011).
The main issue in building the data warehouse is data quality, which is integral to Mudrock University. Data quality must be analyzed before work on the data warehouse proceeds. Knowing the key dimensions of data quality allows the data warehouse for Mudrock University to be defined effectively. To be processed and interpreted effectively and efficiently, data needs to satisfy quality criteria. Dimensions of data quality commonly include accuracy, reliability, relevance, consistency, precision, timeliness, understandability, conciseness and usability. The key dimensions that need to be satisfied are as follows:
Completeness: All essential data must be available. No important data may be missing.
Consistency: Data loaded into the data warehouse must be consistent, so that the same fact is represented the same way wherever it appears.
Validity: This refers to the correctness and reasonableness of the data, and must be satisfied when the data is organized.
Accuracy: Data must represent real-world values, as inaccurate data can affect operational as well as analytical applications.
Integrity: No relationship between data items may be missing. Data that is not properly linked can result in duplicate records (Pandey, 2014).
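The completeness and integrity dimensions above can be illustrated with a minimal Python sketch. It assumes the source rows have already been extracted into lists of dictionaries; the sample rows and the helper names (incomplete_rows, orphan_rows, duplicate_keys) are hypothetical, and only the column names come from the STUDENT and ENROLMENT tables in the brief.

```python
# Hypothetical sample rows; only the column names come from the source tables.
students = [
    {"StudentID": 1, "StudentName": "A. Lee", "DateOfBirth": "2001-04-12"},
    {"StudentID": 2, "StudentName": None, "DateOfBirth": "2000-09-30"},
]
enrolments = [
    {"EnrolNumber": 10, "StudentID": 1, "UnitOfferingNumber": 500, "Grade": "D"},
    {"EnrolNumber": 11, "StudentID": 3, "UnitOfferingNumber": 500, "Grade": "P"},
    {"EnrolNumber": 11, "StudentID": 1, "UnitOfferingNumber": 501, "Grade": "C"},
]


def incomplete_rows(rows):
    """Completeness: return rows that have any missing (None or empty) value."""
    return [r for r in rows if any(v is None or v == "" for v in r.values())]


def orphan_rows(child_rows, fk, parent_rows, pk):
    """Integrity: return child rows whose foreign key matches no parent key."""
    parent_keys = {p[pk] for p in parent_rows}
    return [c for c in child_rows if c[fk] not in parent_keys]


def duplicate_keys(rows, pk):
    """Integrity: return primary-key values that occur more than once."""
    seen, dupes = set(), set()
    for r in rows:
        if r[pk] in seen:
            dupes.add(r[pk])
        else:
            seen.add(r[pk])
    return sorted(dupes)


if __name__ == "__main__":
    print("Incomplete students:", incomplete_rows(students))
    print("Orphan enrolments:", orphan_rows(enrolments, "StudentID", students, "StudentID"))
    print("Duplicate EnrolNumbers:", duplicate_keys(enrolments, "EnrolNumber"))
```

Checks of this kind would normally run during the transformation step, before any rows are loaded into the warehouse, so that problem records can be corrected or rejected at the source.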
Data Security Issue
Mudrock University must address security challenges when implementing the data warehouse. It requires a security framework that guarantees that staff can access only the data applicable to their own division, while the IT administration department can access all of the data. The data warehouse stores personal data about employees as well as students. Privacy laws may govern the use of such personal data, and compliance with these laws must be implemented in the data warehouse.
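The access rule described above (staff see their own division's data, IT administration sees everything) can be sketched in Python as a simple row filter. This is only an illustration under stated assumptions: the User class, the "IT" department label and the sample staff rows are hypothetical, and a production warehouse would more likely enforce such a rule inside the database than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    department: str  # hypothetical attribute: a SchoolCode, or "IT" for administrators


ADMIN_DEPARTMENT = "IT"  # assumption: IT administration may see all rows


def visible_rows(user, rows, dept_column="SchoolCode"):
    """Return only the rows the given user is allowed to see."""
    if user.department == ADMIN_DEPARTMENT:
        return list(rows)
    return [r for r in rows if r.get(dept_column) == user.department]


# Hypothetical sample rows; column names follow the STAFF_MEMBER table.
staff = [
    {"StaffNumber": 1, "StaffName": "J. Smith", "SchoolCode": "ENG"},
    {"StaffNumber": 2, "StaffName": "K. Wong", "SchoolCode": "BUS"},
]

if __name__ == "__main__":
    print(visible_rows(User("eng_user", "ENG"), staff))
    print(visible_rows(User("it_admin", "IT"), staff))
```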
A data warehouse comprises much more than a database. The whole environment ranges from the extraction of data from operational systems, the transfer of this data to the data warehouse, and the distribution of this data to data marts and other analytical servers, to the delivery of this data to end users. The whole network spans multiple servers and multiple software products, and each component must be secure (Oracle White Paper, 2005).
Granularity refers to the level of detail at which data will be stored for future use. In the given scenario, Offering_Coordinator and Enrolment will be detailed fact tables, whereas Student, Staff_Member and Unit will be aggregate fact tables. Here, Course, Student,