Comparing Machine Learning Performance on Large Datasets - Project Overview

1 Project Overview

Produce a portfolio of studies that critically compare the performance of different machine learning methods applied to at least 3 (possibly related) large datasets. The overarching focus of the project is to develop a portfolio of methods that can reveal insights into the performance and application limitations of machine learning methods in different contexts. Each method should be applied in order to answer a specific (small-scale) research question aligned with the overall goal(s) of the project. The application of each method is also expected to be accompanied by an appropriately sized literature review documenting pertinent and contemporary approaches that can both inform the application of the method and justify its potential merit(s). Projects will be assessed on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. Code and datasets must also be submitted with the paper. Algorithms and resources used in a paper should be described as completely as possible to allow reproducibility; this includes the experimental methodology, empirical evaluations, and results. Reproducibility will play an important role in the assessment of each submission.

2 Key details, requirements, and definitions

Data Requirements: Each dataset should be for predictive analytics tasks, i.e. it should have a meaningful, easily identifiable response variable. Each dataset should also be suitably large (at least 10000 rows and at least 10 columns). An example dataset meeting these requirements is the

Effort: It is expected that this project will require approximately 50-80 hours of work.
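The size thresholds in the data requirements can be checked mechanically before any modelling starts. The following is a minimal sketch, not a prescribed tool: it assumes the dataset is a CSV file with a header row, and that the column count includes the response variable (the brief does not say either way). The function name `check_dataset` and the column name `target` are hypothetical.

```python
import csv
import io

MIN_ROWS = 10_000   # from the brief: at least 10000 rows
MIN_COLS = 10       # from the brief: at least 10 columns (assumed to include the response)


def check_dataset(handle, response_column):
    """Return (n_rows, n_cols, ok) for a CSV file handle with a header row."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # blank lines are not counted
    n_cols = len(header)
    ok = (
        n_rows >= MIN_ROWS
        and n_cols >= MIN_COLS
        and response_column in header
    )
    return n_rows, n_cols, ok


# Synthetic stand-in for a real file: 12 columns, 10,000 data rows.
header = [f"x{i}" for i in range(11)] + ["target"]
lines = [",".join(header)] + [",".join(["0"] * 12) for _ in range(10_000)]
demo = io.StringIO("\n".join(lines))
print(check_dataset(demo, "target"))  # (10000, 12, True)
```

For a real file, the same function could be called with `open(path, newline="")` in place of the in-memory demo. The check only confirms size and the presence of a named response column; whether that response is "meaningful" remains a judgement for the report.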

Number of methods: In total, you should apply and critically evaluate at least 5 machine or statistical learning methods in this project to support your discussion.

It is essential that projects unambiguously evidence all of the following.

1. A critical analysis of fundamental data mining and knowledge discovery methodologies in order to assess best practice guidance when applied to data mining problems in the specific context of the project.

2. The extraction, transformation, exploration, and cleaning of datasets in preparation for the data mining and machine learning methods used in the project.

3. The building and evaluation of data mining and machine learning models on a variety of datasets.

4. The extraction, interpretation and evaluation of information and knowledge that is drawn from the datasets as a central theme in the project.

5. The critical review of relevant data mining research to afford the assessment of research methods applied in the project.

Your report should discuss your approach with respect to the application of CRISP-DM [1] or KDD [2], with an emphasis on the critical evaluation of the methods selected. The following structure is suggested for the report (see Table 1 for more detail):

Introduction: remainder of 1st page (+ up to 1 column). It should motivate the work, present and discuss the research question(s) / objective(s) of the project, and (optionally) provide a concise overview of the following sections (max 1-2 lines each).

Data Mining Methodology (can be named differently): how you have approached answering your question. Additional (technical) details can also be discussed here. Essentially, you should recount how you applied either CRISP-DM [1] or KDD [2] (but not both) to address your research question(s).

You should also include here a discussion of key preliminary aspects of the methodology, such as how the datasets have been prepared for study (i.e. the pre-processing and transformation stages).
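One preparation detail worth documenting is where the pre-processing parameters come from. The sketch below is an illustration under stated assumptions, not a required approach: it assumes a single numeric column with missing values encoded as `None`, uses median imputation followed by standardisation, and learns all parameters from the training split only so that nothing from the test split leaks into the transformation. The function names `fit_scaler` and `transform` are hypothetical.

```python
import statistics


def fit_scaler(train_column):
    """Learn imputation and scaling parameters from training data only."""
    observed = [v for v in train_column if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in train_column]
    mean = statistics.fmean(filled)
    std = statistics.pstdev(filled) or 1.0   # guard against zero variance
    return {"median": median, "mean": mean, "std": std}


def transform(column, params):
    """Impute missing values, then standardise, using stored parameters."""
    return [
        ((params["median"] if v is None else v) - params["mean"]) / params["std"]
        for v in column
    ]


# Toy values for illustration only.
train = [1.0, 2.0, None, 3.0, 4.0]
test = [None, 10.0]

params = fit_scaler(train)          # parameters come from the training split
print(params)
print(transform(test, params))      # the test split reuses those parameters
```

Recording the fitted parameters alongside the code also supports the reproducibility requirement, since a reader can re-apply exactly the same transformation to the same data.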

Evaluation: how have you used your method(ology) to answer the question (evaluation methodology), i.e. how do you know that a method is good? What performance measures have you selected and why (discuss how the choice of performance measures is appropriate)? If you have to parametrise part of an approach, how have you done that, why were these choices made, and what impact can different parameterisations have on your results? You should also discuss the results in detail in this section: what are their implications? What do they show or not show? A discussion of sampling methods is expected here too.
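To make the link between sampling and the choice of performance measure concrete, here is a small sketch. It assumes a binary response with an imbalanced class distribution (synthetic labels, not data from any real project) and uses a hand-written stratified fold assignment plus a majority-class baseline. The helper names `stratified_folds` and `binary_scores` are hypothetical; in practice a library implementation would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)   # deal indices round-robin
    return folds


def binary_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Synthetic imbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
folds = stratified_folds(y, k=5)

# A baseline that always predicts the majority class.
for fold in folds:
    y_true = [y[i] for i in fold]
    y_pred = [0] * len(fold)
    print(binary_scores(y_true, y_pred))
```

Every fold holds 18 negatives and 2 positives, so the baseline scores 0.9 accuracy with zero precision, recall and F1 on each. This is one reason a report might justify a measure other than accuracy for imbalanced responses, and why comparing each of the methods against a trivial baseline on the same folds gives the comparison a reference point.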

Conclusions and future work: summarise your findings, and discuss limitations and the extensions you would pursue next, were you to have more time, to improve or extend your study. Summarise the (partial) answer to the research question(s) at a high level, and note the key implications of your findings with respect to the methods studied.

References: Include a list of references used in your report. Note that websites are not references; they should be referred to in footnotes. All referenced works should be locatable in Scopus. Do not use papers from any of the sources noted in this list: https://beallslist.weebly.com; these papers may be plagiarised, low in quality, or not subject to rigorous (or any appropriate) peer review, and should generally be held as dubious and untrustworthy. Note that, typically, if a paper is in Scopus, it is unlikely to be in this list.
