Credit Card Fraud Detection using Keras and R

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.

It contains only numeric input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are ‘Time’ and ‘Amount’. The feature ‘Time’ contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature ‘Amount’ is the transaction amount, which can be used for example-dependent cost-sensitive learning. The feature ‘Class’ is the response variable: it takes value 1 in case of fraud and 0 otherwise.

Now let’s load the CSV file using the read.csv function in R.
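A minimal sketch of that step, assuming the standard Kaggle file name creditcard.csv and a data frame called df (both names are my assumptions):

```r
# Load the credit card transactions (file name assumed from the Kaggle download)
df <- read.csv("creditcard.csv")

# Quick sanity checks: dimensions and class balance
dim(df)          # expected: 284807 rows, 31 columns
table(df$Class)  # 0 = non-fraudulent, 1 = fraudulent
head(df)
```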

As you can see, it is quite hard to simply distinguish fraudulent from non-fraudulent transactions. In general, it is always a very good idea, before applying any fancy and complex classification or regression algorithm, to explore your data a bit and understand it! A very useful tool for this is the ggplotgui package.
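As one illustration of that exploration step (assuming the df data frame from the snippet above), ggplotgui launches an interactive Shiny app where you can build plots by clicking rather than coding:

```r
install.packages("ggplotgui")
library(ggplotgui)

# Opens a Shiny app in the browser; drag variables such as Amount,
# Time and Class onto different plot types to eyeball the data
ggplot_shiny(df)
```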

Most probably (since you are reading this article) you already know that Keras is a high-level neural network API. A key feature of Keras is that it is very quick to learn, and it is actually fast to run!

So, in RStudio, just copy-paste the following code to install Keras and its R interface.
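A sketch of a typical installation (install_keras() is a one-off step that sets up the TensorFlow backend the R package talks to):

```r
# Install the R interface to Keras and its backend
install.packages("keras")
library(keras)
install_keras()   # run once: downloads and configures TensorFlow
```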

From the term “fraudulent transaction detection” we understand that transactions can be either fraudulent or not. Does that ring a bell (metaphorically)? I hope so, because this is a typical binary classification scenario: you have a transaction and you want to classify it either as fraudulent (1) or non-fraudulent (0).

Initially, let’s split the dataset into two parts:
1. Training
2. Test

The first set of data will be used to train the prediction model, while the second set of data will be used to test the result of the prediction model.
These datasets have no intersection, i.e. the training data are used only in the training process, while the test data are used only for evaluating the model.
So let’s do it in R!
We are using the caret package and its “createDataPartition” function. Using the “Class” column from our data, we create the indexes to subset (or better, split) the initial dataset into the two parts.
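A sketch of that split; the object names (train_index, train_data, test_data) and the seed are my own choices:

```r
library(caret)

set.seed(42)  # assumed seed, just for reproducibility
# Split indexes based on the Class column, 70% for training
train_index <- createDataPartition(df$Class, p = 0.7, list = FALSE)

train_data <- df[train_index, ]
test_data  <- df[-train_index, ]
```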

We’ve created an index that will put 70% of the data in the training set and the other 30% in the test set.

So the next step is to prepare these two parts for Keras. As you can see in the code below, we also scale the data. This is very important because it numerically helps the neural network calculate weights that better fit our data.

Now let’s see all these in R code:
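A sketch of the preparation, assuming the train_data/test_data split from above (the matrix names x_train, y_train, x_test, y_test are my own):

```r
# Keras expects plain numeric matrices: separate the features from the label
feature_cols <- setdiff(names(train_data), "Class")

x_train <- scale(as.matrix(train_data[, feature_cols]))
y_train <- train_data$Class

# Scale the test features with the *training* centre and spread,
# so both sets live on the same scale
x_test <- scale(as.matrix(test_data[, feature_cols]),
                center = attr(x_train, "scaled:center"),
                scale  = attr(x_train, "scaled:scale"))
y_test <- test_data$Class
```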

And now let’s run the training!
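A sketch of the model and the training call; the layer sizes, epochs and batch size below are illustrative choices of mine, not necessarily those of the original run:

```r
# A small fully connected network for binary classification
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = ncol(x_train)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "adam",
  loss      = "binary_crossentropy",
  metrics   = "accuracy"
)

# fit() trains the network and records the loss/accuracy history
history <- model %>% fit(
  x_train, y_train,
  epochs = 20,
  batch_size = 256,
  validation_split = 0.2
)
```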

You will notice that during the training you get a history graph that shows the progress of the training.
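If you want the same graph back later, the history object returned by fit() can be re-plotted:

```r
plot(history)   # loss and accuracy per epoch, for training and validation
```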

Now let’s check out the summary of our model!
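This is just the standard summary() call on the model object defined above:

```r
summary(model)  # layers, output shapes and parameter counts
```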

Cool? Now it’s time for the truth: let’s evaluate our model on the test dataset.
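A sketch of that evaluation, assuming the objects defined above; the 0.5 probability threshold for the confusion table is my assumption:

```r
# Overall loss and accuracy on the held-out data
model %>% evaluate(x_test, y_test)

# Confusion table: threshold the predicted probabilities at 0.5
pred_prob  <- model %>% predict(x_test)
pred_class <- ifelse(pred_prob > 0.5, 1, 0)
table(Predicted = pred_class, Actual = y_test)
```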

From the resulting table we see that the error rate for classifying a non-fraudulent transaction as fraudulent is only 9/85288 (0.01%), while the error rate for classifying a fraudulent transaction as non-fraudulent is 36/109 (33%)!

This kind of imbalance in the errors makes sense, taking into account how imbalanced the dataset is (85297 non-fraudulent vs 145 fraudulent transactions) and the way we split the dataset. In order to improve our accuracy on the fraudulent class, we need to sample more data classified as fraudulent and use less data classified as non-fraudulent (or weight the classes differently, as sketched below). Fine-tuning the parameters and structure of the neural network will also increase the accuracy.
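One simple way to act on that without touching the data themselves is class weighting rather than resampling: fit() accepts a class_weight argument, so fraudulent examples can be made to count more in the loss. The weight of 100 below is an arbitrary illustration:

```r
history <- model %>% fit(
  x_train, y_train,
  epochs = 20,
  batch_size = 256,
  validation_split = 0.2,
  class_weight = list("0" = 1, "1" = 100)  # weight chosen for illustration only
)
```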

Hope you enjoyed this article and reached the end!

And here you can download all the code that I used for this project.

And here you can find and fork the notebook on Kaggle.
