Identifying Bank Frauds Using CRISP-DM and Decision Trees

International journal of computer science & information Technology (IJCSIT)
Vol. 2, No. 5, October 2010. DOI: 10.5121/ijcsit.2010.2512

Bruno Carneiro da Rocha (1,2) and Rafael Timóteo de Sousa Júnior (2)
(1) Bank of Brazil, Brasília-DF, Brazil
(2) Network Engineering Laboratory, University of Brasilia (UnB), Brasilia-DF, Brazil

ABSTRACT

This article evaluates the use of decision tree techniques, in conjunction with the CRISP-DM management model, to help prevent bank fraud. It offers a study of decision trees, an important concept in the field of artificial intelligence. The study focuses on how these trees can assist the decision-making process of identifying fraud by analysing information about bank transactions. This information is captured by applying data mining techniques and the CRISP-DM management model to large operational databases logged from internet banking transactions.

KEYWORDS

Fraud detection, fraud prevention, decision taking, machine learning, decision trees, data mining.

1. INTRODUCTION

Banks have strong security systems that protect access to internet banking services, but they cannot guarantee the security of the computers that customers use, or of how those computers are used, so as to avoid electronic fraud.

It is almost impossible to eradicate bank fraud. What can be done is to minimize and prevent it. Quinlan, creator of the ID3 and C4.5 algorithms, observed in his 1993 book C4.5: Programs for Machine Learning [1] that many applications of artificial intelligence are based on a model of knowledge that is usually employed by a human specialist. In some cases, the data analyzed by the expert should be classified for better observation, or placed in certain categories or classes according to their main features.
In this paper, classification studies and their results are used to help prevent bank fraud. The method studied is decision trees, which are discussed in the sections below and implemented within a data mining management model called CRISP-DM.

This paper is organized as follows. Section 2 discusses related work. Section 3 presents a review of CRISP-DM. Section 4 discusses the main characteristics of decision trees and the methods for building a good decision tree based on information theory principles. Section 5 describes the implementation of a decision tree for bank fraud detection and presents the analysis of the results. Finally, Section 6 concludes our work.

2. RELATED WORK

There are several lines of research in the domain of fraud detection. They include fraud detection in credit cards, telecommunications, money laundering, and intrusion detection. The proposed techniques usually draw on artificial intelligence in general, employing either individually or jointly solutions from artificial neural networks, statistical analysis, econometrics, expert systems, fuzzy logic, genetic algorithms, machine learning, pattern recognition, visualization and others.

Papers [2] and [3] present broad surveys and discussions of research on techniques for tackling various types of fraud. Paper [4] describes the tools available for statistical fraud detection and the areas in which fraud detection technologies are most used, pointing out the fundamental fact that one can seldom be certain, by statistical analysis alone, that a fraud has been perpetrated.
Because of this uncertainty, the discussion in [5] centers on how databases of customer transactions have to be submitted to several data mining techniques that search for patterns indicative of fraud. This process is a challenge in fraud detection, given the need to find algorithms that can learn to recognize a great variety of fraud scenarios and adapt to identify and predict new ones.

This paper takes these studies into account and seeks an effective fraud detection solution based on decision trees and data mining, with tests on large databases logged from bank transactions.

3. CRISP-DM

The Cross Industry Standard Process for Data Mining (CRISP-DM) [6] [7] is a model of the data mining process used by experts to solve problems. The model identifies the different stages in implementing a data mining project, as described below.

3.1. Implementation of the CRISP-DM

CRISP-DM is based on the process flow shown in Figure 1. The model proposes the following steps:

1. Business Understanding: to understand the rules and business objectives of the company.
2. Data Understanding: to collect and describe data.
3. Data Preparation: to prepare data for import into the software.
4. Modelling: to select the modelling technique to be used.
5. Evaluation: to evaluate the process to see if the technique solves the problem of modelling and creation of rules.
6. Deployment: to deploy the system and train its users.

3.2. Business Understanding

The first phase of CRISP-DM is Business Understanding. For the purposes of this paper, this phase defines the business objectives of the bank. The proposed goal is to detect fraud from a fraud history log. This phase should also consider the need to extract data so as to better understand the transactions that may result in fraud. A good assessment of the bank's current situation is also very important, especially regarding the losses that fraud is causing to customers and to the bank itself.
After the model is implemented, the evaluation should check whether these losses were minimized. Also in the Business Understanding phase, a risk assessment and a project plan must be developed, with the next steps for implementing the CRISP-DM process.

Figure 1. Phases of the CRISP-DM Process

3.3. Data Understanding

The second phase of CRISP-DM is Data Understanding. The initial data should be collected and described, and its quality verified. This is where the fraud history of the bank is synthesized, with the required attributes such as time of the fraud, number of frauds, fraud types, and so on.

3.4. Data Preparation

The next step prepares the data for import into fraud detection software; this is the Data Preparation phase. In our case study, we prepare data for use in decision tree algorithms. In this phase we derive calculated fields, incorporate external databases, perform thorough data cleaning and classify the attributes as irrelevant, categorical or numerical.

3.5. Modelling

This phase applies modelling techniques to the data prepared in the Data Preparation phase, so as to select, try and use an adequate modelling technique, such as neural networks. In our case study, we use decision trees, with a database for training, validation and testing on bank frauds.

3.6. Evaluation

In this phase a check is performed to assess whether we have used the best tool for data mining and to verify that the data really portrays the reality understood in the Business Understanding phase. If more processes are to be modelled, the process returns to the Business Understanding phase and the whole cycle is repeated.

3.7. Deployment

When the design is ready, the implementation is carried out in the Deployment phase. This requires that artefacts be created in each preceding phase of the process so that training sessions can be conducted with the users of the system.

With bank transaction logs being produced continuously and new frauds being devised at a rapid pace, a data mining project does not last long and should always be kept up to date. Information that is true today may not be true tomorrow, since the data are very volatile and new types of fraud are always expected.

4. DECISION TREES

A decision tree is both a data structure and a method used for data mining and machine learning. This is the technique used in this paper for modelling fraud during the CRISP-DM Modelling phase described in the previous section.

Suppose we have a large amount of data and need to classify it to find answers on some subject. For this, we can use a decision tree as a model that maps the observations, taking a selected attribute as its starting point. The most difficult question here is finding the best attribute. Decision tree algorithms assist in selecting the attribute that will perform best in finding the required information.

Decision trees use the divide-and-conquer technique, which consists of breaking the problem into simpler problems that are easier to solve [8]. Furthermore, strategies applied in a certain section of a tree can be applied recursively.

As illustrated in Figure 2, a decision tree is composed of the following parts [8], [11], [12], [13]:

1. Node: contains a test of an attribute.
2. Branch: corresponds to one possible response to the attribute test.
3. Leaf: each leaf is associated with a class.
4. Rule: each route from the root to a leaf corresponds to a classification rule.

Figure 2. Structure of a Decision Tree
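The four parts listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation used in this paper: the attribute names (new_device, amount), their values and the class labels are hypothetical and were chosen solely to make the structure concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf: associated with a single class label."""
    label: str


@dataclass
class Node:
    """Node: tests one attribute; each branch is one response to that test."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


def extract_rules(tree, path=()):
    """Rule: every route from the root to a leaf yields (conditions, class)."""
    if isinstance(tree, Leaf):
        return [(list(path), tree.label)]
    rules = []
    for value, subtree in tree.branches.items():
        # Recursion mirrors divide and conquer: each subtree is a smaller problem.
        rules.extend(extract_rules(subtree, path + ((tree.attribute, value),)))
    return rules


# Hypothetical tree: attributes, values and labels are assumptions for illustration.
example_tree = Node("new_device", {
    "no": Leaf("legitimate"),
    "yes": Node("amount", {
        "low": Leaf("legitimate"),
        "high": Leaf("fraud"),
    }),
})

if __name__ == "__main__":
    print(classify(example_tree, {"new_device": "yes", "amount": "high"}))
    for conditions, label in extract_rules(example_tree):
        print(conditions, "->", label)
```

Here the tree is written by hand; choosing which attribute to test at each node is the harder question discussed above and is not addressed by this sketch.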