Prevent Card Fraud: Data Analytics Assessment

Reduce Credit Card Fraud with Data Analytics: Assessment Brief

Module Learning Outcomes (LOs)

This assessment brief gives you an overview of the formative and summative assessments that are part of this module. The learning outcomes below will be tested in the assessment contained in this brief.

ILO1: Formulate innovative data-driven solutions to commercial problems

ILO2: Critically evaluate the use of algorithms and models when developing analytical solutions

ILO3: Critically appraise the concepts, tools and techniques for data visualisation

The assessment is a written report, with the code submitted either in an appendix or alongside the report in a single zip file. See the instructions at the end of this document on how to create a zip file containing your work. Please note: ensure you read the general assessment guidance at the end of this document.

Every year billions of pounds are lost to credit card fraud. While fraud is a substantial cost to the financial system, so too is the cost of detecting it. In a competitive market, banks need to balance the cost of fraud against the impact of onerous controls on customer experience. In particular, false positives in the fraud detection process can lead to legitimate customer transactions being declined in error. This highlights the need for effective detection systems with low error rates.

Your task, as a data analyst working for a large financial services advisory firm, is to analyse a set of transactions, develop the respective feature inputs and create an appropriate, scalable model for predicting potential fraudulent transactions.

Your formative submission will be a written report (at most 1000 words) that attempts tasks 1 and 2 as described in the summative submission, selects one relevant analytical model to classify whether a card transaction is potentially fraudulent or not, and critically analyses that model as described in task 3. You should include a code appendix that performs the associated tasks.

You are provided with a set of 100,000 card transactions* – recorded on 13th and 14th October 2020.

The features recorded for each transaction are:

Transaction ID
Date
Time
Type of Card – Visa, MasterCard
Entry Mode – Tap, PIN
Amount
Type of Transaction – Online, POS, ATM
Merchant Group
Transaction Country
Shipping Address
Billing Address
Gender of Cardholder
Age of Cardholder
Issuing Bank
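
As a minimal sketch of how records with some of these fields might be read and typed, the snippet below parses a small in-memory CSV using only the Python standard library. The column headers, the date format, the reading of Time as an hour of the day, and the two sample rows are all assumptions made for illustration; the headers and formats in the dataset supplied may differ.

```python
import csv
import io
from datetime import datetime

# Illustrative rows only: headers, formats and values are assumptions,
# not taken from the dataset provided with this brief.
SAMPLE_CSV = """Transaction ID,Date,Time,Type of Card,Entry Mode,Amount,Type of Transaction
T0001,2020-10-13,9,Visa,Tap,12.50,POS
T0002,2020-10-14,23,MasterCard,PIN,250.00,ATM
"""


def parse_transactions(text):
    """Return a list of dicts with typed Date, Time and Amount fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        # Assumed ISO date format (YYYY-MM-DD)
        row["Date"] = datetime.strptime(raw["Date"], "%Y-%m-%d").date()
        # Assumed to be the hour of the day (0-23)
        row["Time"] = int(raw["Time"])
        row["Amount"] = float(raw["Amount"])
        rows.append(row)
    return rows


transactions = parse_transactions(SAMPLE_CSV)
print(transactions[0]["Date"], transactions[0]["Amount"])
```

Typing the fields at load time makes anomalies such as malformed dates or non-numeric amounts surface immediately as errors, which can then be documented and handled in the ETL discussion.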

Your summative submission should be a written report (at most 3000 words) that describes how modelling and visualisation could be applied to reducing the rate of fraud in a financial context. You must also include a code appendix that can perform the following tasks:

Provide a rationale for the steps taken during each stage of the Extract, Transform and Load (ETL) phase of the project, discussing any ambiguities, assumptions, and anomalies in the provided data and how you dealt with them (ILO1).

Explain the justification for performing Exploratory Data Analysis (EDA) and the use of appropriate descriptive statistics and visualisations to understand the results of that analysis, and critically analyse how the EDA process will guide your selection of analytical models (ILO3).

Select two relevant analytical models to classify whether a card transaction is potentially fraudulent or not (ILO2). Critically analyse the strengths and limitations of each model with references to the relevant literature. You should choose from the following models:

Logistic Regression
Decision Tree Classifier
Bagging Classifier
Random forest classifier
AdaBoost Classifier
XGBoost
Artificial neural network
Another appropriate state-of-the-art algorithm

Provide a critical evaluation of each model selected in the previous task by using your test data set (ILO2).

Include an explanation of your chosen loss function.

Include a short discussion of the accuracy metrics.

Collate the accuracy metric, the number of correct predictions, and the number of incorrect predictions for every model in a table to allow for comparison.

Based on your findings, make a critically justified recommendation for the use of one model for reducing the rate of fraud.
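
The comparison table described above could be assembled along the following lines. This is only a sketch: the labels and predictions are made-up values (1 = fraud, 0 = legitimate), "Model A" and "Model B" are placeholder names, and the additional false positive and false negative columns are an assumption about what may be useful rather than a requirement of this brief.

```python
def summarise(name, y_true, y_pred):
    """Count correct and incorrect predictions and derive simple metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate declined
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    correct = tp + tn
    incorrect = fp + fn
    return {
        "model": name,
        "accuracy": correct / len(pairs),
        "correct": correct,
        "incorrect": incorrect,
        "false_positives": fp,
        "false_negatives": fn,
    }


# Hypothetical test labels and predictions, for illustration only
y_test = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = {
    "Model A": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # never flags fraud
    "Model B": [0, 0, 0, 0, 0, 0, 1, 0, 1, 1],  # one false alarm
}
rows = [summarise(name, y_test, y_pred) for name, y_pred in predictions.items()]

print(f"{'Model':<10}{'Accuracy':>10}{'Correct':>9}{'Incorrect':>11}{'FP':>4}{'FN':>4}")
for r in rows:
    print(
        f"{r['model']:<10}{r['accuracy']:>10.2f}{r['correct']:>9}"
        f"{r['incorrect']:>11}{r['false_positives']:>4}{r['false_negatives']:>4}"
    )
```

In this made-up example, Model A scores 80% accuracy without detecting a single fraudulent transaction, which illustrates why accuracy alone can be misleading when fraud is rare and why the counts of each error type are worth reporting alongside it.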

Communicate your findings (ILO3). Provide several graphical outputs (with commentary) such as a correlation matrix, a heat map or confusion matrix of your results, in order to illustrate your analytical outputs in a visual manner.

A code appendix that performs the following tasks (ILO1):

Import, clean and prepare the data for analysis, ensuring the relevant test, validation and training sets are prepared

Perform Exploratory Data Analysis with appropriate visualisations
Train and test the two analytical models you selected in task 3
Evaluate the two models based on your choice of loss function
Produce appropriate visualisations of your results
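
One way to prepare the training, validation and test sets is a stratified split, which keeps the proportion of fraudulent transactions roughly equal across the three sets. The sketch below uses only the standard library; the 60/20/20 fractions, the fixed seed and the synthetic labels are illustrative assumptions, and the brief does not prescribe a particular split. It also assumes that a fraud label is available for each transaction.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split row indices into train/validation/test, preserving class ratios.

    `labels` holds one class label per row. The default fractions are
    illustrative, not values prescribed by the brief.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test


# Synthetic labels for illustration: 1 = fraud, 0 = legitimate
labels = [1] * 10 + [0] * 90
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))
```

Splitting per class before combining means a rare class cannot end up entirely in one set by chance. Libraries such as scikit-learn offer an equivalent through the `stratify` argument of `train_test_split`.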

Your report should include a list of the references used to develop the report and the research that supports the suggested approach. The list should use only the Harvard Referencing System, as highlighted in the General Assessment Guidance section of this document. All figures and tables used in the report must have captions and, wherever needed, be properly referenced and explained in your submission.

Step 1: Locate the report file and Python notebook (or a folder that contains them both). Select both files (use Ctrl+click), or select the folder.
Step 2: Once you have selected them, right-click on them, select “Send to”, and then select Compressed (zipped) folder.

A new zipped folder with the same name is created in the same location.

Check that it worked correctly: double-click the compressed file to open it. You should see that it contains both files, or the folder with the files inside it.
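
As an alternative to the steps above, the same archive could be created and checked from Python with the standard `zipfile` module. This is a sketch only: the file names `report.docx`, `analysis.ipynb` and `submission.zip` are hypothetical, and the demonstration writes placeholder files to a temporary folder rather than touching real work.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_submission(files, archive_path):
    """Write the given files into one zip archive and return its contents."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            # Store each file under its bare name, without the folder path
            archive.write(file, arcname=Path(file).name)
    # Reopen the archive to confirm what was actually written
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


# Demonstration with placeholder files; the names are hypothetical
with tempfile.TemporaryDirectory() as folder:
    report = Path(folder) / "report.docx"
    notebook = Path(folder) / "analysis.ipynb"
    report.write_text("placeholder report")
    notebook.write_text("placeholder notebook")
    contents = zip_submission([report, notebook], Path(folder) / "submission.zip")
    print(contents)
```

Listing the archive contents after writing plays the same role as the manual check above: it confirms that both the report and the notebook are inside the zip file.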