Machine Learning Algorithms for Solving Real-World Problems in Regression, Classification, and Unstr

Module Learning Outcomes Assessed:

On completion of this module the student should be able to: 
1. Apply supervised and unsupervised learning applications using Gaussian process emulators.
2. Apply Dirichlet processes for unsupervised learning applications.
3. Develop the knowledge and skills necessary to design, implement and apply graphical models to solve real-world applications.
4. Evaluate the applications of fuzzy systems and their usage in hybrid intelligent systems, in combination with evolutionary computing and other machine learning methods.
5. Apply evolutionary computing methods to develop solutions for real-world optimisation problems and appraise their advantages and limitations.


Task and Mark distribution: 
This coursework consists of two tasks. You should attempt both and submit one Word or PDF file (or similar) for each task. Each task is worth 50 marks, and the marks breakdown is provided with each task. This coursework contributes 100% to your overall module mark.
 
This document is for Coventry University students for their own use in completing their assessed work for this module and should not be passed to third parties or posted on any website. Any infringements of this rule should be reported.


Task 1: The Machine learning algorithms for solving real-world problems in Regression, 

During this module, you learned about different advanced machine learning techniques and their associated concepts and applications. We explored the Gaussian process model, which is a computationally efficient method for regression, classification, optimization, etc. We also covered Bayesian networks as promising tools for modelling data with a complex dependency structure. Finally, you learned how to use latent Dirichlet allocation for unsupervised learning applications, particularly text mining.


In this assignment, you will select an application related to a regression, classification, unstructured-data modelling, or text mining problem, and explore how best to apply the machine learning algorithms to solve it. The selected application for each of the methods mentioned above should have the following features:
 
1. Gaussian process regression and classification: The application selected for either of these two methods must have at least four input variables and a single output variable. You must also implement Gaussian process classification: first define an appropriate threshold (or thresholds) on the output variable to create binary or multiple classes, then apply Gaussian process classification to the categorised output.
2. Bayesian network: If you choose an application for this method, it must involve at least eight random variables. The random variables may be all discrete, all continuous, or a mix of both.
3. Latent Dirichlet allocation: There is no restriction on the application selected for topic modelling with the latent Dirichlet allocation model.

Some potential projects are listed below, which you can study to get ideas. However, I strongly recommend that you come up with your own idea(s) by reviewing these projects and other relevant, recent articles.
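
Before the project ideas, here is a rough illustration of the Gaussian process requirement in item 1 above: fit a GP regressor on four inputs, threshold the continuous output into two classes, then fit a GP classifier on the categorised output. It is a minimal sketch, assuming scikit-learn is available. The synthetic data, the median threshold and the kernel choices are illustrative assumptions, not part of the brief.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier, GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset: four inputs, one continuous output.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: GP regression on the continuous output.
# One length scale per input (ARD) plus a white-noise term for observation noise.
reg_kernel = 1.0 * RBF(length_scale=np.ones(4)) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=reg_kernel, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)
r2 = gpr.score(X_test, y_test)  # coefficient of determination on held-out data

# Step 2: threshold the output to create two classes.
# Assumption: the training-set median is used as the threshold.
threshold = np.median(y_train)
c_train = (y_train > threshold).astype(int)
c_test = (y_test > threshold).astype(int)

# Step 3: GP classification on the categorised output.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=np.ones(4)), random_state=0)
gpc.fit(X_train, c_train)
accuracy = gpc.score(X_test, c_test)
proba = gpc.predict_proba(X_test)  # class probabilities, one row per test point

print(f"GP regression R^2: {r2:.3f}")
print(f"GP classification accuracy: {accuracy:.3f}")
```

For a multi-class version, several thresholds (for example quantiles of the output) could be used in step 2; the classifier call stays the same.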
 
1. This dataset from the UCI repository is quite interesting. The task is to predict the depth in the body (effectively, the depth along the spine) given the properties of a two-dimensional "slice" of the body. The hard part about this problem is that the output actually causes the input rather than the other way around. I have not had luck designing a good regression method for this data. Can you do this?
2. Find a Bayesian interpretation of elastic net regularization, and compare this method for regression against "standard" Bayesian regression (with a Gaussian prior) on a dataset of your choosing.
3. Probabilistic PCA (PPCA) using Gaussian processes is a Bayesian interpretation of the classical PCA algorithm for dimensionality reduction. Implement Gaussian-process-based PPCA in Python, R or Matlab, and compare its performance with other methods (such as "standard" PCA) on a dataset of your choosing.

4. Bayesian optimization is a very important topic with a wide range of applications. It was not fully covered in the lectures, but it can be easily implemented using Gaussian processes. Python code and some examples can be found here.
5. The squared exponential covariance is widely used for Gaussian process regression; it is probably used in over 90% of all GP publications. That said, it is widely believed to be "too smooth" for many real-world regression tasks. Compare the squared exponential covariance with the Matérn covariance on several datasets via Bayesian model selection. How often is the squared exponential the right choice?


6. Latent Dirichlet allocation (LDA) is a Bayesian method for creating "topic models" of text documents. There are plenty of interesting text datasets available (e.g., DBpedia could be a good resource). One idea would be to compare the behavior of LDA with other techniques, such as latent semantic analysis. You may be able to find relevant datasets and ideas by visiting the following sites:
 
This competition site contains some relevant data, and relevant ideas could be developed by analyzing this data. Also check the datasets in Kaggle competitions. This website has a fantastic compilation of 100 interesting, relevant datasets from all sorts of application areas.


The creators of libSVM have also compiled a great list of datasets, all in a standardized format. The libSVM codebase also includes libsvmread for reading these in MATLAB. The UCI Machine Learning Repository is a mainstay in machine-learning research. There is a wide range of datasets there from many different application areas and with many different properties (large, small, high-dimensional, low-dimensional, classification, regression, etc.).
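
For the covariance comparison in project idea 5, one possible starting point is to fit the same data with each kernel and compare the optimised log marginal likelihoods. This is a minimal sketch, assuming scikit-learn is available and using a synthetic one-dimensional dataset. Maximising the marginal likelihood over hyperparameters (type-II maximum likelihood) only approximates full Bayesian model selection, and the kernels, noise model and data here are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel

# Synthetic stand-in for a real dataset: a target with kinks plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(80, 1))
y = np.abs(np.sin(X[:, 0])) + 0.05 * rng.normal(size=80)

# Candidate covariance functions, each with a learned noise term.
candidates = {
    "squared exponential": 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
    "Matern nu=1.5": 1.0 * Matern(length_scale=1.0, nu=1.5) + WhiteKernel(noise_level=0.1),
    "Matern nu=2.5": 1.0 * Matern(length_scale=1.0, nu=2.5) + WhiteKernel(noise_level=0.1),
}

log_evidence = {}
for name, kernel in candidates.items():
    gp = GaussianProcessRegressor(
        kernel=kernel,
        normalize_y=True,
        n_restarts_optimizer=2,  # a few restarts to reduce the risk of poor local optima
        random_state=0,
    )
    gp.fit(X, y)
    # Log marginal likelihood at the optimised hyperparameters.
    log_evidence[name] = gp.log_marginal_likelihood_value_

best = max(log_evidence, key=log_evidence.get)
for name, value in log_evidence.items():
    print(f"{name}: log marginal likelihood = {value:.2f}")
print(f"Preferred on this dataset: {best}")
```

Repeating this over several real datasets, and recording which kernel is preferred each time, gives a first answer to the question of how often the squared exponential is the right choice.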


Please note that the following guidelines are good practice and should lead to better results, but you have the freedom to pick whatever suits your style. Working in groups of at most 2 or 3, you have to select a challenging real-world problem and one (or more) appropriate dataset(s) as suggested above. You could also use the following links, which have numerous problems and datasets.
 
 
Notes: 
1. You will not get full marks for this section if you submit your proposal late.
2. If the final submission of your CW differs from what you proposed in your proposal, you will not get any marks for parts 2 & 3.
2) Technical quality 
1. Rigour and extent of the experiments. 
2. Correct application of the selected algorithms and suitability of the methods. 
3. Data preparation - technical quality. 
4. Extent of evidence of running the experiments provided in appendices.


3) Evaluation 
1. Evaluation and discussion of the results. Why are the results important? How would the results be useful to other researchers or practitioners?
2. Is this a “real” problem or a small “toy” problem? How does the paper advance the state of the art?


5) Clarity of the writing: 
1. Is there sufficient information for the reader to reproduce the results? Is the language used in the paper good?
2. References and general presentation. Are results clearly presented, with appropriate visualisations?
