ML Algos for Real-World Regression - Classification & Mining.

Machine Learning Algorithms for Real-World Problems in Regression, Classification, and Unstructured

Module Learning Outcomes Assessed

On completion of this module the student should be able to:
1. Apply supervised and unsupervised learning applications using Gaussian process emulators.
2. Apply Dirichlet processes for unsupervised learning applications.
3. Develop the knowledge and skills necessary to design, implement and apply graphical models to solve real-world applications.
4. Evaluate the applications of fuzzy systems and their usage in hybrid intelligent systems, in combination with evolutionary computing and other machine learning methods.
5. Apply evolutionary computing methods to develop solutions for real-world optimisation problems and appraise their advantages and limitations.

Task and Mark distribution:
This coursework consists of two tasks. You should attempt both and submit one Word or PDF file (or similar) for each task. Each task is worth 50 marks, and the marks breakdown is provided with each task. This coursework contributes 100% to your overall module mark.
This document is for Coventry University students for their own use in completing their assessed work for this module and should not be passed to third parties or posted on any website. Any infringements of this rule should be reported.

Task 1: The Machine learning algorithms for solving real-world problems in Regression,

During this module, you learned about different advanced machine learning techniques and their associated concepts and applications. We explored the Gaussian process model, which is a computationally efficient method for regression, classification, optimization, etc. We also covered Bayesian networks as promising tools for modelling data with a complex dependency structure. Finally, you learned how to use Dirichlet Latent processes for unsupervised learning applications, particularly text mining.

In this assignment, you will have to select an application related to a regression, classification, unstructured-data modelling, or text mining problem, and explore how best to apply the machine learning algorithms to solve it. The selected application for each of the methods mentioned above should have the following features:

1. Gaussian process regression and classification: The application selected for either of these two methods must consist of at least four input variables and a single output variable. You must also implement Gaussian process classification by first defining an appropriate threshold on the output variable to create binary or multiple classes, and then applying Gaussian process classification to the categorized output.
2. Bayesian network: If you choose an application for this method, it must consist of at least eight random variables. The random variables may be all discrete, all continuous, or a mixture of both.
3. There is no restriction on selecting the application to apply the Latent Dirichlet allocation model for topic modelling.

Some potential projects are listed below, which could be studied to get some ideas. However, I strongly recommend that you come up with your own idea(s) by reviewing these projects and other relevant and recent articles.
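The thresholding step required above for Gaussian process classification can be sketched as follows. This is only an illustrative sketch, not a prescribed method: the helper name `to_classes`, the use of quantiles as cut points, and the sample values in `y` are all assumptions made for the example. Any well-justified threshold on the output variable would serve the same purpose.

```python
import numpy as np

def to_classes(y, quantiles=(0.5,)):
    """Map a continuous output to integer class labels using quantile cut points.

    quantiles=(0.5,) gives a binary split at the median;
    quantiles=(1/3, 2/3) gives three roughly balanced classes.
    (Hypothetical helper, not part of any library.)
    """
    y = np.asarray(y, dtype=float)
    cuts = np.quantile(y, quantiles)   # threshold values on the output scale
    labels = np.digitize(y, cuts)      # 0 below the first cut, 1 in the next bin, ...
    return labels, cuts

# Made-up continuous output values, for illustration only.
y = np.array([1.2, 3.4, 0.7, 5.1, 2.8, 4.0])

binary_labels, binary_cuts = to_classes(y)                      # two classes
three_labels, three_cuts = to_classes(y, quantiles=(1/3, 2/3))  # three classes
```

The resulting labels can then be passed to whichever Gaussian process classifier you choose. To avoid information leaking from the test set, it is usually safer to compute the cut points on the training portion of the data only and reuse them for the test portion.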
1. This dataset from the UCI repository is quite interesting. The task is to predict the depth in the body (effectively, the depth along the spine) given the properties of a two-dimensional "slice" of the body. The hard part about this problem is that the output actually causes the input rather than the other way around. I have not had luck designing a good regression method for this data. Can you do this?
2. Find a Bayesian interpretation of elastic net regularization, and compare this method for regression against "standard" Bayesian regression (with a Gaussian prior) on a dataset of your choosing.
3. Probabilistic PCA using Gaussian processes is a Bayesian interpretation of the classical PCA algorithm for dimensionality reduction. Implement Gaussian-process-based PPCA in Python, R or Matlab, and compare its performance with other methods (such as "standard" PCA) on a dataset of your choosing.


4. Bayesian optimization is a very important topic with a wide range of applications. It was not fully studied during lectures, but it can be easily implemented using Gaussian processes. The Python code and some examples can be found here!
5. The squared exponential covariance is widely used for Gaussian process regression. It is probably used in 90+% of all GP publications. That said, it is widely believed to be "too smooth" for many real-world regression tasks. Compare the squared exponential covariance versus the Matérn covariance on several datasets via Bayesian model selection. How often is the squared exponential the right choice?

6. Latent Dirichlet allocation (LDA) is a Bayesian method for creating "topic models" of text documents. There are plenty of interesting text datasets available (e.g., DBpedia could be a good resource!). One idea would be to compare the behavior of LDA with other techniques, such as latent semantic analysis. You may be able to get relevant datasets and ideas by visiting the following sites:

This competition site contains some relevant data, and relevant ideas could be developed by analyzing this data. Also check the datasets in Kaggle competitions. This website has a fantastic compilation of 100 interesting, relevant datasets from all sorts of application areas.

The creators of libSVM have also compiled a great list of datasets, all in a standardized format. The libSVM codebase also includes libsvmread for reading these in MATLAB. The UCI Machine Learning Repository is a mainstay in machine-learning research. There is a wide range of datasets there from many different application areas and with many different properties (large, small, high-dimensional, low-dimensional, classification, regression, etc.).
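For the covariance comparison suggested in project idea 5, one possible starting point is to compare the log marginal likelihood (the model evidence) that each covariance function assigns to the same data. The sketch below uses plain NumPy and makes several assumptions that are not part of this brief: a zero-mean GP, a Matérn covariance with smoothness 3/2, a fixed signal and noise variance, a small grid search over the length-scale in place of full hyperparameter optimization, and synthetic data. All function names are hypothetical.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def se_kernel(X, length_scale, variance):
    """Squared exponential covariance matrix."""
    return variance * np.exp(-0.5 * sq_dists(X) / length_scale**2)

def matern32_kernel(X, length_scale, variance):
    """Matern covariance matrix with smoothness 3/2 (an assumed choice)."""
    a = np.sqrt(3.0) * np.sqrt(sq_dists(X)) / length_scale
    return variance * (1.0 + a) * np.exp(-a)

def log_marginal_likelihood(K, y, noise_variance):
    """log p(y | X) for a zero-mean GP with Gaussian observation noise."""
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise_variance * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))      # equals -0.5 * log det(K + noise * I)
            - 0.5 * n * np.log(2.0 * np.pi))

def best_evidence(kernel, X, y, length_scales, variance=1.0, noise_variance=0.01):
    """Largest log marginal likelihood over a grid of length-scales."""
    return max(log_marginal_likelihood(kernel(X, ls, variance), y, noise_variance)
               for ls in length_scales)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

grid = np.logspace(-1, 1, 20)
lml_se = best_evidence(se_kernel, X, y, grid)
lml_m32 = best_evidence(matern32_kernel, X, y, grid)
print(f"SE: {lml_se:.2f}  Matern 3/2: {lml_m32:.2f}")
```

Maximizing the evidence over hyperparameters is only an approximation to full Bayesian model selection, which would integrate over them. A real study would also repeat the comparison on several real datasets rather than one synthetic one.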

Please note that the following guidelines are good practice and should lead to better results, but you have the freedom to pick whatever is suitable for your style. Working in groups of at most 2 or 3, you have to select a challenging real-world problem and one (or more) appropriate data set(s) as suggested above. You could also use the following links, which have numerous problems and data sets.
Notes:
1. You will not get full marks for this section if you submit your proposal late.
2. If the final submission of your coursework is different from what you proposed in your proposal, you will not get any marks for parts 2 and 3.
2) Technical quality
1. Rigour and extent of the experiments.
2. Correct application of the selected algorithms and suitability of the methods.
3. Data preparation - technical quality.
4. Extent of evidence of running the experiments provided in appendices.

3) Evaluation
1. Evaluation and discussion of the results. Why are the results important? How would the results be useful to other researchers or practitioners?
2. Is this a "real" problem or a small "toy" problem? How does the paper advance the state of the art?

5) Clarity of the writing:
1. Is there sufficient information for the reader to reproduce the results? Is the language used in the paper good?
2. References and general presentation: are results clearly presented, with appropriate visualisations?
