Designing and Building a Data Warehouse for Water Quality Monitoring - A Scenario

Description of the Scenario

Students are required to read the following scenario, then design and build a data warehouse that meets the needs it describes.

Traditional water quality monitoring involves three steps, namely periodic water sampling, testing and investigation. This technique can be expensive, human-centric and time-consuming, and it only provides data at the point of sampling. Water samples are taken back to the lab for analysis, or expensive handheld devices are used at test locations, with each parameter requiring a specific sensor. Lab testing produces accurate spot measurements of pollutants, but under normal circumstances it is not economical to carry it out at many locations across the country. In some cases, the trend of the data or a reasonably accurate estimate of future parameter values could be enough for businesses.

For customers, the ability to monitor data remotely is the ideal solution, saving both time and money, because they are interested only in the data and not necessarily in how it is collected. Although in-location sensors typically have an accuracy of 95% compared with lab results, this is acceptable because the customer now has real-time data, trend analysis and alerts, enabling them to act much sooner than before. Multiple nodes can be deployed to build up an extensive picture of water quality. However, in-location sensors (optical or ionic) are expensive, require maintenance and, more importantly, can detect only a limited range of pollutants. Customers are asking for measurements of pollutants for which no commercial sensor is available, for instance phosphates in water (organic, inorganic and total), or for which sensors are expensive, such as nitrate and nitrite. In-location sensors are not suitable for long-term deployment in water, typically lasting about 6 weeks before maintenance is needed. Ionic sensors can be confused by other heavy ions in the water and give false readings.

It has been decided to use data from the Department for Environment, Food & Rural Affairs in the first instance. The Environment Agency uses an online Data Service Platform to provide the water quality dataset to the public. The Water Quality Archive provides data on water quality measurements. Samples are taken at sampling points around England and can be from coastal or estuarine waters, rivers, lakes, ponds, canals or groundwaters. They are taken for a number of purposes, including compliance assessment against discharge permits, investigation of pollution incidents and environmental monitoring. The archive provides data on measurements and samples dating from 2000. The data columns for the system are given below. The data collected from 2000 until 2016 can be found on the student portal, or obtained from your course coordinator for Data Warehousing COMP1848, under the name WaterQuality_CW.zip. There are many concerns about the quality of the data in the database.
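The brief does not spell out which data-quality problems to expect, and the column list is not reproduced in this text. As a minimal sketch only, assuming hypothetical column names (`sample_id`, `sampling_point`, `sample_date`, `determinand`, `result`, `unit`), ISO-formatted dates, and a hypothetical convention that a leading `<` marks a below-detection-limit value, a Python cleansing pass might trim and normalise text fields, reject rows with missing, non-numeric or non-finite results or malformed dates, and drop duplicate measurements of the same sample:

```python
# Minimal data-cleansing sketch. Column names and the "<" detection-limit
# convention are hypothetical assumptions, not taken from the archive.
import csv
import math
from datetime import datetime
from typing import Iterable, Optional


def parse_result(raw: Optional[str]) -> Optional[float]:
    """Return the numeric result, or None if it is missing or unusable."""
    text = (raw or "").strip()
    if text.startswith("<"):  # assumed marker for "below detection limit"
        text = text[1:].strip()
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None  # reject "nan"/"inf"


def clean_rows(rows: Iterable[dict]) -> list:
    """Trim/normalise fields, reject invalid rows and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        value = parse_result(row.get("result"))
        if value is None:
            continue  # missing or non-numeric result
        try:
            raw_date = (row.get("sample_date") or "").strip()
            sampled = datetime.strptime(raw_date, "%Y-%m-%d").date()
        except ValueError:
            continue  # missing or malformed date
        point = (row.get("sampling_point") or "").strip().upper()
        determinand = (row.get("determinand") or "").strip().lower()
        if not point or not determinand:
            continue  # cannot be linked to a location or sensor
        key = (row.get("sample_id"), determinand)
        if key in seen:
            continue  # duplicate measurement of the same sample
        seen.add(key)
        cleaned.append({
            "sample_id": row.get("sample_id"),
            "sampling_point": point,
            "sample_date": sampled,
            "determinand": determinand,
            "result": value,
            "unit": (row.get("unit") or "").strip(),
        })
    return cleaned


def clean_csv(path: str) -> list:
    """Read a CSV export (hypothetical layout) and return the cleaned rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return clean_rows(csv.DictReader(handle))
```

Normalising determinand names to lower case and sampling points to upper case is an arbitrary choice here; what matters is that the same rule is applied before the rows are matched against the dimension tables.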

You should design a data warehouse that provides the following information:

• The list of water sensors measured, by sensor type and month

• The number of sensor measurements collected, by sensor type and week

• The number of measurements made, by location and month

• The average number of measurements taken for pH, by year

• The average value of nitrate measurements, by location and year
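These requirements map naturally onto a star schema: a fact table holding one row per measurement, surrounded by Time, Location and Sensor dimensions. The sketch below is one possible design, not the required one. It uses Python's built-in `sqlite3` module purely as a runnable stand-in for Oracle, and every table and column name (`dim_time`, `dim_location`, `dim_sensor`, `fact_measurement`, `determinand`, and so on) is a hypothetical choice. The Time dimension stores the ISO year alongside the ISO week, so that weekly counts near the turn of the year are grouped correctly.

```python
# Star-schema sketch: one fact table of measurements plus Time, Location and
# Sensor dimensions. sqlite3 is a runnable stand-in for Oracle; every table
# and column name here is a hypothetical choice.
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE dim_time (
    time_id   INTEGER PRIMARY KEY,  -- surrogate key such as 20100304
    full_date TEXT NOT NULL,
    iso_year  INTEGER NOT NULL,     -- ISO year that owns iso_week
    iso_week  INTEGER NOT NULL,
    month     INTEGER NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE dim_location (
    location_id    INTEGER PRIMARY KEY,
    sampling_point TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_sensor (
    sensor_id   INTEGER PRIMARY KEY,
    determinand TEXT NOT NULL UNIQUE,
    unit        TEXT
);
CREATE TABLE fact_measurement (
    time_id     INTEGER NOT NULL REFERENCES dim_time (time_id),
    location_id INTEGER NOT NULL REFERENCES dim_location (location_id),
    sensor_id   INTEGER NOT NULL REFERENCES dim_sensor (sensor_id),
    result      REAL NOT NULL
);
"""


def time_row(d: date) -> tuple:
    """Derive the Time dimension attributes for one calendar date."""
    iso_year, iso_week, _ = d.isocalendar()
    return (d.year * 10000 + d.month * 100 + d.day, d.isoformat(),
            iso_year, iso_week, d.month, d.year)


def build_warehouse(measurements) -> sqlite3.Connection:
    """Load (date, sampling_point, determinand, unit, result) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for d, point, determinand, unit, result in measurements:
        trow = time_row(d)
        conn.execute("INSERT OR IGNORE INTO dim_time VALUES (?, ?, ?, ?, ?, ?)", trow)
        conn.execute("INSERT OR IGNORE INTO dim_location (sampling_point) VALUES (?)",
                     (point,))
        conn.execute("INSERT OR IGNORE INTO dim_sensor (determinand, unit) VALUES (?, ?)",
                     (determinand, unit))
        # Resolve the surrogate keys and insert one fact row per measurement.
        conn.execute(
            """INSERT INTO fact_measurement (time_id, location_id, sensor_id, result)
               SELECT ?, l.location_id, s.sensor_id, ?
               FROM dim_location l, dim_sensor s
               WHERE l.sampling_point = ? AND s.determinand = ?""",
            (trow[0], result, point, determinand))
    conn.commit()
    return conn


# Number of measurements made, by location and month.
COUNT_BY_LOCATION_MONTH = """
SELECT l.sampling_point, t.year, t.month, COUNT(*)
FROM fact_measurement f
JOIN dim_time t ON t.time_id = f.time_id
JOIN dim_location l ON l.location_id = f.location_id
GROUP BY l.sampling_point, t.year, t.month
ORDER BY l.sampling_point, t.year, t.month"""

# Number of sensor measurements, by sensor type and ISO week.
COUNT_BY_SENSOR_WEEK = """
SELECT s.determinand, t.iso_year, t.iso_week, COUNT(*)
FROM fact_measurement f
JOIN dim_time t ON t.time_id = f.time_id
JOIN dim_sensor s ON s.sensor_id = f.sensor_id
GROUP BY s.determinand, t.iso_year, t.iso_week
ORDER BY s.determinand, t.iso_year, t.iso_week"""

# Average nitrate value, by location and year (assumes lower-case determinands).
AVG_NITRATE_BY_LOCATION_YEAR = """
SELECT l.sampling_point, t.year, AVG(f.result)
FROM fact_measurement f
JOIN dim_time t ON t.time_id = f.time_id
JOIN dim_location l ON l.location_id = f.location_id
JOIN dim_sensor s ON s.sensor_id = f.sensor_id
WHERE s.determinand = 'nitrate'
GROUP BY l.sampling_point, t.year
ORDER BY l.sampling_point, t.year"""
```

In Oracle, the same design would more likely use identity columns or sequences for the surrogate keys, and `INSERT OR IGNORE` (which is SQLite-specific) would be replaced by a `MERGE` statement or an explicit existence check.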

Shared task [marks equally distributed among group members] (25%)
1. Designing a star schema, including the Time dimension (10%)
2. ETL: exporting the data from a Microsoft Access database into Oracle (5%)
3. Queries (10%)

Individual tasks (75%)
1. ETL: from the staging area to the dimension and fact tables, using a cursor based on the specific water sensor (10%)
2. Data cleansing (5%)
3. Implementation of the dimension and fact tables (10%)
4. Python programming for connecting to Oracle, data pre-processing, feature extraction and a machine learning algorithm (25%)
5. Evaluating, presenting, analysing and explaining the performance of the ML method, including a discussion of its pros and cons and of both good and bad or unsatisfactory results (20%)
6. Report (5%)
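The cursor-based ETL step can be illustrated in outline. In Oracle this would typically be written in PL/SQL with an explicit cursor; the Python sketch below only mirrors the same row-by-row pattern through a DB-API cursor, with `sqlite3` standing in for Oracle. The staging and warehouse table names, the `get_or_create` and `load_sensor` helpers, and the use of a determinand name to identify "the specific water sensor" are all hypothetical, and the Time dimension is left out to keep the loop short.

```python
# Cursor-based ETL sketch: staging area -> dimension and fact tables for one
# water sensor (determinand) at a time. sqlite3 stands in for Oracle, and all
# table and column names are hypothetical. The Time dimension is omitted.
import sqlite3

SCHEMA = """
CREATE TABLE staging (
    sampling_point TEXT, sample_date TEXT, determinand TEXT, result REAL
);
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY, sampling_point TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_sensor (
    sensor_id INTEGER PRIMARY KEY, determinand TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_measurement (
    sample_date TEXT, location_id INTEGER, sensor_id INTEGER, result REAL
);
"""


def get_or_create(cur: sqlite3.Cursor, table: str, key_col: str,
                  id_col: str, value: str) -> int:
    """Return a dimension's surrogate key, inserting the row if it is new.

    Table and column names are fixed strings from this module, never user
    input, so formatting them into the SQL text is safe here.
    """
    cur.execute(f"INSERT OR IGNORE INTO {table} ({key_col}) VALUES (?)", (value,))
    cur.execute(f"SELECT {id_col} FROM {table} WHERE {key_col} = ?", (value,))
    return cur.fetchone()[0]


def load_sensor(conn: sqlite3.Connection, determinand: str) -> int:
    """Walk the staged rows for one sensor and load them; return rows loaded."""
    write_cur = conn.cursor()
    sensor_id = get_or_create(write_cur, "dim_sensor", "determinand",
                              "sensor_id", determinand)
    read_cur = conn.cursor()  # cursor restricted to one water sensor
    read_cur.execute(
        "SELECT sampling_point, sample_date, result FROM staging "
        "WHERE determinand = ? AND result IS NOT NULL",
        (determinand,),
    )
    loaded = 0
    for point, sample_date, result in read_cur:  # one staged row at a time
        location_id = get_or_create(write_cur, "dim_location", "sampling_point",
                                    "location_id", point)
        write_cur.execute(
            "INSERT INTO fact_measurement VALUES (?, ?, ?, ?)",
            (sample_date, location_id, sensor_id, result),
        )
        loaded += 1
    conn.commit()
    return loaded
```

Against Oracle, the python-oracledb driver offers the same DB-API style of `cursor.execute()` and cursor iteration, but bind variables are written as `:1` or `:name` rather than `?`, and the SQLite-only `INSERT OR IGNORE` would need to become a `MERGE` or a lookup followed by an insert.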
