CAP 5768 Introduction to Data Science

Task:

All analyses must be performed in R using the tidyverse and glmnet packages discussed in class. Fill in all your solutions in the appropriate spaces provided in this Word document, and then upload a PDF copy of your solutions to Canvas. Only PDF copies will be graded.
Brief overview of assignment
In this assignment you will be using the dataset GlobalAncestry.csv, which is available on Canvas. You will be analyzing genetic data from 242 humans sampled across the world from six ancestries. The first column in each dataset, labeled ancestry, takes the following values:
African           San and Yoruban individuals from sub-Saharan Africa
European          Italian and Russian individuals from Europe
EastAsian         Chinese and Japanese individuals from East Asia
Oceanian          Melanesian and Papuan individuals from Oceania
NativeAmerican    Pima and Mayan individuals from the Americas
Mexican           Mexican individuals from the Americas
Unknown1          Unknown ancestry
Unknown2          Unknown ancestry
Unknown3          Unknown ancestry
Unknown4          Unknown ancestry
Unknown5          Unknown ancestry
GlobalAncestry.csv is a large dataset with genetic data for 242 individuals at 8916 genomic locations. As we discussed in our introductory lecture for this course, each individual will have a value of 0, 1, or 2 at each of these genomic locations, indicating the “genotype” that the individual has at this location.
Training a lasso penalized multinomial regression classifier
The goal is to train a multinomial regression classifier to predict K=5 ancestries (African, European, EastAsian, Oceanian, and NativeAmerican). The training dataset will consist only of individuals with African, European, EastAsian, Oceanian, and NativeAmerican ancestries, and the best classifier will be determined by lasso-penalized multinomial regression and 10-fold cross validation. You will consider 100 tuning parameter values (λ) between 0.001 and 1000, evenly spaced on a base-10 logarithmic scale, as we have highlighted several times in class. You will then choose the simplest classifier whose cross-validation error is within 1 standard error of that of the best classifier.
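The tuning grid and the 1-standard-error rule described above can be sketched as follows. This is only an illustration under stated assumptions: the data are a hypothetical toy example (not GlobalAncestry.csv), the names lambda_grid and cv_fit are made up for this sketch, and the grid is written in decreasing order because glmnet's documentation recommends supplying lambda as a decreasing sequence.

```r
library(glmnet)

# 100 lambda values from 10^3 down to 10^-3, evenly spaced on a log10 scale.
lambda_grid <- 10^seq(3, -3, length.out = 100)

# Hypothetical toy data (NOT the assignment data): 150 individuals, 20 loci,
# genotypes coded 0/1/2, three made-up classes with a built-in signal in
# the first two loci.
set.seed(1)
n <- 150
p <- 20
y <- factor(rep(c("A", "B", "C"), each = 50))
x <- matrix(sample(c(0, 1, 2), n * p, replace = TRUE), nrow = n, ncol = p)
x[y == "A", 1] <- 2
x[y == "B", 2] <- 2

# 10-fold cross-validation of the lasso (alpha = 1) multinomial model.
cv_fit <- cv.glmnet(x, y, family = "multinomial", alpha = 1,
                    lambda = lambda_grid, nfolds = 10)

# lambda.min: the lambda with the lowest mean cross-validation error.
# lambda.1se: the largest lambda whose error is within 1 standard error of
#             that minimum, i.e. the most heavily penalized model under
#             this rule.
cv_fit$lambda.min
cv_fit$lambda.1se
```

Because a larger λ shrinks more coefficients to zero, lambda.1se is never smaller than lambda.min, which is why it corresponds to the "simplest" classifier in the sense used here.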
Predicting ancestry of individuals with unknown ancestry
You will then use this classifier to predict the ancestries of the five unknown individuals (Unknown1, Unknown2, Unknown3, Unknown4, and Unknown5) based on their genetics.
Predicting ancestry proportions of individuals with Mexican ancestry
You will also use predicted class probabilities to estimate the fraction of ancestry that each individual of Mexican descent has from each of the five continental ancestries used to train the classifier. You will then use violin plots to visualize the distributions of these probabilities across the set of individuals of Mexican ancestry, and hypothesize about the historical reasons for the ancestry distributions you observe.
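The violin-plot step might look roughly like the sketch below. The probabilities here are randomly generated stand-ins (they do not come from any fitted classifier), and the object names probs, probs_long, and violin_plot are assumptions made for illustration only.

```r
library(tidyverse)

# Hypothetical stand-in for predicted class probabilities: 20 individuals,
# 5 classes, each row rescaled to sum to 1. These numbers are random and
# are NOT results from the assignment data.
set.seed(1)
raw <- matrix(rexp(20 * 5), nrow = 20, ncol = 5)
probs <- raw / rowSums(raw)
colnames(probs) <- c("African", "European", "EastAsian",
                     "Oceanian", "NativeAmerican")

# Reshape to long format: one row per (individual, ancestry) pair.
probs_long <- as_tibble(probs) %>%
  pivot_longer(everything(),
               names_to = "ancestry",
               values_to = "probability")

# One violin per ancestry, showing the distribution across individuals.
violin_plot <- ggplot(probs_long, aes(x = ancestry, y = probability)) +
  geom_violin() +
  labs(x = "Ancestry", y = "Predicted probability")

print(violin_plot)
```

With a fitted glmnet model, the class probabilities would typically come from predict() with type = "response" rather than from random numbers.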
Instructions for loading GlobalAncestry dataset into your RStudio Cloud environment
Recall that to upload a file to RStudio Cloud, you first must download the GlobalAncestry.csv file to your computer. Once the file is downloaded, within the “Files” panel of the RStudio Cloud environment, click “Upload” and browse to the appropriate directory on your computer to upload the GlobalAncestry.csv file. 
The GlobalAncestry.csv file can be loaded using the read_csv() function of the readr package that comes loaded with tidyverse, and assigned to an object called GlobalAncestry as follows:
GlobalAncestry <- read_csv("GlobalAncestry.csv")
If you are having trouble loading the file, then refer back to the video lecture on Linear Regression where this was demonstrated in class.
Note about using glmnet for classification
When using glmnet, you will not need to recode classes as values 1, 2, 3, etc. We only performed this recoding in class to illustrate the connection with using linear regression applied to a response with values 0 and 1, as linear regression requires a quantitative response. Therefore, do not recode the ancestry values in the dataset, and simply use the values as is.
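As a small illustration of this point, the sketch below passes class labels to glmnet as a factor and gets labels back from predict(). The toy data, the three class names used, and the value s = 0.01 are assumptions chosen only for this example.

```r
library(glmnet)

# Hypothetical toy data (NOT the assignment data): 120 individuals, 10 loci,
# genotypes coded 0/1/2, and a response stored as text labels.
set.seed(1)
y <- factor(rep(c("African", "European", "EastAsian"), each = 40))
x <- matrix(sample(c(0, 1, 2), 120 * 10, replace = TRUE),
            nrow = 120, ncol = 10)
x[y == "African", 1] <- 2
x[y == "European", 2] <- 2

# The factor response is passed as is; no recoding to 1, 2, 3 is needed.
fit <- glmnet(x, y, family = "multinomial", alpha = 1)

# type = "class" returns predicted labels using the original class names.
pred <- predict(fit, newx = x[1:5, ], s = 0.01, type = "class")
pred
```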
Questions and problems
1. [10%] Load the GlobalAncestry.csv dataset and split it into three separate datasets: a training dataset, a test dataset of unknown ancestries, and a test dataset of Mexican ancestry. That is, create the following three datasets:
   a. Training data frame called train, which only includes observations with ancestry values African, European, EastAsian, Oceanian, and NativeAmerican.
   b. Test data frame called test, which only includes observations with ancestry values Unknown1, Unknown2, Unknown3, Unknown4, and Unknown5.
   c. Test data frame called testmex, which only includes observations with ancestry value Mexican.
Provide code below:
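One way to express this kind of split with dplyr is sketched here on a tiny made-up data frame. The object toy and its columns snp1 and snp2 are hypothetical and stand in for the real GlobalAncestry object; only the ancestry labels and the names train, test, and testmex come from the question.

```r
library(tidyverse)

# Hypothetical miniature dataset: an ancestry column plus two made-up
# genotype columns coded 0/1/2.
toy <- tibble(
  ancestry = c("African", "European", "Mexican",
               "Unknown1", "Oceanian", "Unknown2"),
  snp1 = c(0, 1, 2, 1, 0, 2),
  snp2 = c(2, 0, 1, 1, 2, 0)
)

known_ancestries   <- c("African", "European", "EastAsian",
                        "Oceanian", "NativeAmerican")
unknown_ancestries <- paste0("Unknown", 1:5)

# Keep only rows whose ancestry falls in each group.
train   <- toy %>% filter(ancestry %in% known_ancestries)
test    <- toy %>% filter(ancestry %in% unknown_ancestries)
testmex <- toy %>% filter(ancestry == "Mexican")
```

Using %in% with a vector of labels avoids chaining several == comparisons with |.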

2. [20%] Apply glmnet to the training dataset train from Question 1 to train a multinomial regression classifier with a lasso penalty across 100 tuning parameter (λ) values between 0.001 and 1000, evenly spaced on a base-10 logarithmic scale. The response will be ancestry, and the input features will be the values at the set of 8916 genomic locations. Plot the regression coefficients for each of the K=5 classes as a function of log(λ). Based on these results, does it appear that regularization and feature selection are working? Explain your answer.
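A minimal sketch of fitting a lasso path over a fixed grid and inspecting it, again on hypothetical toy data rather than the train data frame. The matrix x, the response y, and the 30 made-up loci are assumptions; with a real data frame, the predictors would first need to be converted to a numeric matrix (for example with as.matrix()), since glmnet expects a matrix input.

```r
library(glmnet)

# Hypothetical toy data (NOT the assignment data): 150 individuals, 30 loci,
# genotypes coded 0/1/2, three made-up classes.
set.seed(1)
n <- 150
p <- 30
y <- factor(rep(c("A", "B", "C"), each = 50))
x <- matrix(sample(c(0, 1, 2), n * p, replace = TRUE), nrow = n, ncol = p)
x[y == "A", 1] <- 2
x[y == "B", 2] <- 2

# 100 lambda values between 10^-3 and 10^3, evenly spaced on a log10 scale,
# supplied in decreasing order.
lambda_grid <- 10^seq(3, -3, length.out = 100)

# alpha = 1 selects the lasso penalty.
fit <- glmnet(x, y, family = "multinomial", alpha = 1, lambda = lambda_grid)

# For a multinomial fit this draws one coefficient-path plot per class.
plot(fit, xvar = "lambda")

# Number of loci with a nonzero coefficient in any class, at the smallest
# and largest lambda actually kept on the fitted path.
nonzero_small <- fit$df[which.min(fit$lambda)]
nonzero_large <- fit$df[which.max(fit$lambda)]
c(smallest_lambda = nonzero_small, largest_lambda = nonzero_large)
```

If the penalty is doing its job, the count of nonzero coefficients should shrink toward zero as λ grows, which is the pattern to look for in the coefficient plots.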
