The major aims of this module are threefold:
- undertake a collaborative research project in computing and develop the appropriate skills to manage and contribute to it;
- identify a range of investigative methods that may be applied to complex problems in computer science, and understand how those methods may be put to use in the context of an advanced computing project, individually or in a group;
- become familiar with examples of professional and research literature and gain an insight into the setting up, running and presentation of research projects.
Toward that end, you will work in groups of five (5) to pose and answer a research question, using a dataset of your choosing selected from a specified set of candidates.
You will be required to use specific tools to achieve this:
- Trello, a tool for creating Kanban-style project plans.
- R, a programming language for statistical analysis.
- Git, a source code control tool for managing artifacts.
In addition, you will complete a set of quizzes that test your knowledge of the background reading, and you will evaluate your group members’ contribution to the final deliverable.
Caution: in case of ambiguity in either the problem statement or assessment criteria, the interpretation of the assignment tutor will apply. As such, if any statement is unclear, seek clarification rather than make assumptions, as your assumptions are likely to differ from the tutor’s interpretation.
Part A–Research Infrastructure and Process
- (10%) Research project infrastructure:
- Form a group of five (5) people.
- Register your group on the “group” or “new_group” tab of the “people”
(b) Project Plan and Trello board (due 2020-11-13)
- Create a work breakdown structure that identifies all of the artifacts you need to create in order to complete the coursework.
- Create a Trello (www.trello.com) board for your group.
- Name the board but replace “NN” with the group number from Canvas.
- Invite each group member and John Noll to join your board.
Caution: be sure you invite the right John Noll. I have several Trello accounts that I use for different purposes; you need to invite [email protected] herts.ac.uk, which is the one our marking scripts use to download your board.
- Create four columns in your board:
- Backlog - this is your “to do” column, which should initially contain all the tasks you need to do to complete the coursework.
- Doing - this is for tasks that you are actively working on. Each task in this column should have one person assigned to be the person responsible for ensuring it is completed on time.
- Review - this is for tasks that the responsible person thinks are done, but need to be reviewed by another member of the group.
- Done - this is for tasks that have been fully completed: finished, committed, reviewed, and pushed to BitBucket.
Caution: be sure you use these exact names for the columns in your board, so our marking scripts can track your progress. You may add additional columns to suit your project, but you must have the columns listed above, exactly as specified.
- Make the board public so it’s easy to download.
- Add each of the tasks from your work breakdown structure as cards in your Trello Backlog column, one card per task.
- Submit the URL for your Trello board via Canvas using the “Trello URL” assignment.
Caution: test this URL by having one of your group members who is not the board owner use it to access the board. If we can’t access the board, you get zero (0) credit.
(c) Git repository.
- Create a Git repository for your coursework artifacts on BitBucket. Invite [email protected] to join your repository.
- Submit the URL for your Git repository via Canvas using the “Git URL” assignment by 23:59 on 2020-11-20. Caution: test this URL by having one of your group members who is not the repository owner clone the repository. If we can’t access the repository, you get zero (0) credit.
- (10%) Process (checked throughout the term):
(a) Update the Trello board frequently (at least every week) with progress.
(b) Update the artifacts in your Git repository regularly (weekly at least).
The Trello board and Git repository will be checked randomly throughout the term. Marks will depend on regular, meaningful activity.
Part B–Research Question and Answer
IMPORTANT! You must use the exact names for files specified below. Assignments will be marked automatically by simple Unix scripts, so if you don’t name your files in the way the script expects, you will get zero (0) credit for that assignment.
- (15%) Research question(s) and dataset (due 2020-11-27).
(a) Choose a dataset from www.kaggle.com. Be sure your choice allows you to ask an interesting question that can be answered via correlation analysis, comparison of means, or comparison of proportions.
(b) Commit and push your dataset (in CSV format) to your BitBucket repository.
(c) Formulate one research question that can be answered using correlation analysis, comparison of means, or comparison of proportions.
(d) Specify the null and alternative hypotheses for your research questions.
(e) Write your research question, null hypothesis, and alternative hypothesis, using correctly-spelled, correctly-punctuated, grammatically correct English, in a plain text, Markdown, or LaTeX file (NO Microsoft Word!) called “research_questions.txt” (for plain text), “research_questions.md” (for Markdown), or “research_questions.tex” (for LaTeX).
(f) Commit and push your file to BitBucket by the due date given above.
- (15%) Visualization and descriptive statistics (due 2020-12-11)
(a) Create an R script called “visualization.R” that will load your dataset, create an appropriate visualization of your data, and write the result to a file called “visualization.pdf”.
(b) Commit and push your “visualization.R” file to your BitBucket repository by 23:59 on 2020-12-11. Do NOT commit “visualization.pdf”; your R script will create it for us.
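As a starting point for item (a), a minimal sketch of what “visualization.R” might look like is shown below. It is an illustration under stated assumptions, not a model answer: the function name make_visualization, the column names “wt” and “mpg”, and the use of R's built-in mtcars data as a stand-in CSV are all hypothetical, and a scatter plot is only one of many possible visualizations. Your own script should load the dataset you committed instead.

```r
# visualization.R (illustrative sketch only)
# Assumptions: the function name, the column names, and the stand-in data
# below are hypothetical; substitute your own dataset and columns.

make_visualization <- function(csv_path, x_col, y_col,
                               pdf_path = "visualization.pdf") {
  data <- read.csv(csv_path)

  # Open a PDF graphics device; everything plotted until dev.off()
  # is written to pdf_path.
  pdf(pdf_path)
  on.exit(dev.off())

  # Scatter plot of the two chosen columns, labelled with their names.
  plot(data[[x_col]], data[[y_col]],
       xlab = x_col, ylab = y_col,
       main = paste(y_col, "versus", x_col))

  invisible(pdf_path)
}

# Stand-in dataset so the sketch runs on its own: export the built-in
# mtcars data frame to a temporary CSV file.
demo_csv <- tempfile(fileext = ".csv")
write.csv(mtcars, demo_csv, row.names = FALSE)

make_visualization(demo_csv, x_col = "wt", y_col = "mpg")
```

Because a script like this writes “visualization.pdf” itself, running `Rscript visualization.R` regenerates the file on demand, which is why the PDF does not need to be committed.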
- (20%) Final report (due 2021-01-08).
(a) Create an R script called “analysis.R” that computes appropriate statistics to test your hypotheses. Commit and push this file to your BitBucket repository by 23:59 on 2021-01-08.
(b) Write a report, in correctly-spelled, correctly-punctuated, grammatically correct English, using Markdown, LaTeX, or Microsoft Word (Word is OK for this deliverable), comprising the following sections:
- Introduction, describing your dataset, your research question, and your null and alternative hypotheses.
- Visualization, containing your data visualization (or visualizations, if more than one) and an explanation of what it shows (or they show if more than one). Visualizations should be imported into your document as images created by your R script; do NOT import screenshots!
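For the “analysis.R” script required by this deliverable, the following sketch shows one way a correlation-based hypothesis test could be computed. It assumes a research question about the linear association between two numeric columns, with the null hypothesis that the population correlation is zero and the alternative that it is not. The function name run_analysis, the column names, the 0.05 significance level, and the mtcars stand-in data are all hypothetical. A comparison of means or of proportions would need a different test (for example, R's t.test or prop.test).

```r
# analysis.R (illustrative sketch only)
# H0: the population correlation between x and y is zero.
# H1: the population correlation between x and y is not zero.
# Assumptions: function name, columns, alpha, and stand-in data are
# hypothetical; substitute your own dataset and hypotheses.

run_analysis <- function(csv_path, x_col, y_col, alpha = 0.05) {
  data <- read.csv(csv_path)

  # Pearson correlation test (two-sided by default).
  result <- cor.test(data[[x_col]], data[[y_col]], method = "pearson")

  cat("Correlation estimate:", unname(result$estimate), "\n")
  cat("p-value:", result$p.value, "\n")
  if (result$p.value < alpha) {
    cat("Reject H0 at significance level", alpha, "\n")
  } else {
    cat("Fail to reject H0 at significance level", alpha, "\n")
  }

  invisible(result)
}

# Stand-in dataset so the sketch runs on its own.
demo_csv <- tempfile(fileext = ".csv")
write.csv(mtcars, demo_csv, row.names = FALSE)
result <- run_analysis(demo_csv, x_col = "wt", y_col = "mpg")
```

Printing the estimate and p-value from the script makes it straightforward to quote the same numbers in the report without retyping them by hand.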
Approaching expectations, but some mistakes or omissions.
Examples: Solutions identify the main concepts. Writing may use colloquialisms, but is understandable and mostly free of grammar, punctuation, and spelling errors. References (when needed) are not cited correctly. Diagrams have correct syntax, are readable, and identify the main concepts or interactions. Project plan is updated sporadically.
Marginal fail: Some correct performance, emerging understanding, but mastery not thorough and there are numerous mistakes or omissions.
Examples: Solutions in general are missing important elements, and/or have errors. Diagrams have errors and omissions, but show some understanding of the core concepts. Writing lacks focus, uses colloquialisms, is repetitive, and/or contains numerous grammatical, spelling, and punctuation errors. References are missing or not cited correctly. Project plan is not maintained in a way that appears to be useful.