Analysis and Assessment of Credit rating model in P2P lending
An instrument to solve information asymmetry between lenders and borrowers

By Yang Yang

B.Sc. Management of Science and Project, University of Science and Technology of China, 2007

SUBMITTED TO THE MIT SLOAN SCHOOL OF MANAGEMENT IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR DEGREE OF MASTER OF SCIENCE IN MANAGEMENT STUDIES AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2015

© 2015 Yang Yang. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: [Signature redacted]
MIT Sloan School of Management
May 8, 2015

Certified by: [Signature redacted]
Christian Catalini
Assistant Professor of Technological Innovation, Entrepreneurship, and Strategic Management
Thesis Supervisor

Accepted by: [Signature redacted]
Michael A. Cusumano
SMR Distinguished Professor of Management
Program Director, M.S. in Management Studies Program
MIT Sloan School of Management


Submitted to the MIT Sloan School of Management on May 8, 2015 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Management Studies.

ABSTRACT

Since the establishment of the first P2P lending platform in 2005, the P2P lending industry has been nibbling away at the market share of traditional consumer credit. In 2014, Lending Club and Prosper originated over 7 billion USD in personal loans. By comparison, Citi, one of the biggest traditional banks in the U.S., issued 25.2 billion USD in 2014. Given the advantages of P2P lending over traditional banks, the market for P2P lending is expected to grow rapidly along with improvements in the internal systems of P2P lending platforms, external regulation, and greater participation from borrowers and lenders. Although most P2P lending platforms in China first imitated the business models of U.S. or European platforms, they have progressively evolved to incorporate different business models for legislative, economic or behavioral reasons.

Several findings emerge from analyzing the data from Lending Club and Prosper. First, although both platforms have improved their default rates each year, both currently offer negative returns for investors. Second, when only finished/matured loans are considered, a higher credit score does not lead to lower default risk. Third, on average, a defaulted loan costs investors a loss more than twice as large as the interest return offered to them. Taking this cost matrix into consideration, the optimal model is not necessarily the one with the highest accuracy but the one with the maximum return. Fourth, the ex post return offered by the platforms is not enough to cover the potential risk facing investors.

Thesis Supervisor: Christian Catalini
Title: Assistant Professor of Technological Innovation, Entrepreneurship, and Strategic Management


PURPOSES OF THIS PAPER

It has been almost 10 years since the first P2P lending platform was founded in the UK. While P2P lending has grown rapidly over those 10 years, it is still in its infancy compared to the traditional banking industry. More than 70 academic papers on P2P lending appeared between 2008 and 2015, written from different perspectives, including analyses of the determinants of a loan being successfully funded by investors, regulations, credit risks, determinants of credit quality and default probability, business models of P2P lending across countries, internal information systems and literature reviews. Even though a handful of papers studied credit risk using data mining methodologies, most of them focused on explaining the determinants of a loan being successfully funded. Few papers considered a cost matrix in the model or compared results from Prosper and Lending Club.

P2P lending is a two-sided market. In order to further boost market growth, P2P lending platforms also need to enhance the ability of investors to assess credit risk. By doing this, platforms can offer higher returns and thus attract more investors to lending activity. The main purpose of this paper is to identify the key determinants of a loan's default probability and their respective coefficients, and then to build the optimal model to predict the loan's status. This model will act as a way to mitigate information asymmetry in P2P lending and gaming behavior by borrowers. In addition, this paper will review the current development of P2P lending, building on previous literature.
Another motivation for this paper is that the Chinese government has just allowed non-state-owned companies to participate in the personal credit rating business. The public believes this move will be a game changer for the internet finance industry, especially the P2P lending segment. This paper will assess whether a third-party credit rating will help investors avoid adverse selection.
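The point about the cost matrix can be made concrete with a small sketch. It is not the model built in this thesis: the loan scores, the outcomes and the payoff units below are hypothetical and were chosen only to mirror the observation that a defaulted loan costs roughly twice the interest earned on a good one. The function `evaluate` and the constants are illustrative names.

```python
from typing import List, Tuple

# Hypothetical cost matrix, in arbitrary units of return per funded loan.
# Assumption: a default loses twice what a repaid loan earns.
GAIN_GOOD_LOAN = 1.0
LOSS_BAD_LOAN = -2.0

# Hypothetical (predicted default probability, actually defaulted) pairs.
LOANS: List[Tuple[float, bool]] = [
    (0.05, False), (0.10, False), (0.15, False),
    (0.25, True), (0.30, False), (0.35, True), (0.40, False), (0.45, False),
    (0.60, True), (0.80, True),
]


def evaluate(threshold: float, loans: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Fund every loan whose predicted default probability is below threshold.

    Returns (classification accuracy, total return under the cost matrix).
    """
    correct = 0
    total_return = 0.0
    for prob_default, defaulted in loans:
        funded = prob_default < threshold
        # Not funding a loan is the same as predicting it will default.
        if (not funded) == defaulted:
            correct += 1
        if funded:
            total_return += LOSS_BAD_LOAN if defaulted else GAIN_GOOD_LOAN
    return correct / len(loans), total_return


if __name__ == "__main__":
    for t in (0.2, 0.5):
        accuracy, ret = evaluate(t, LOANS)
        print(f"threshold={t}: accuracy={accuracy:.2f}, return={ret:+.1f}")
```

On this toy data the looser threshold classifies more loans correctly, while the stricter threshold earns more, which is the sense in which the most accurate model need not be the most profitable one.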


Table of Contents

1. Introduction
  1.1 Definition of P2P Lending
  1.2 How P2P Lending Works (Lending Club, Prosper)
2. Market Review of P2P Lending
  2.1 Market Size
  2.2 Key Players and Respective Marketplace
  2.3 Market Outlook of P2P Lending
  2.4 Business Models of P2P Lending
3. Data Analysis and Modeling
  3.1 Introduction
  3.2 Key Variables
    3.2.1 Prosper
    3.2.2 Lending Club
  3.3 Distribution of Dataset
    3.3.1 Prosper
    3.3.2 Lending Club
  3.4 Model Building and Interpretation - Lending Club
    3.4.1 Data Preparation
    3.4.2 Model Building
    3.4.3 Model Interpretation
    3.4.4 Robustness Check
  3.5 Model Building and Interpretation - Prosper
    3.5.1 Data Preparation
    3.5.2 Model Building
    3.5.3 Model Interpretation
    3.5.4 Robustness Check
  3.6 Comparison of Findings in Model Building for Lending Club and Prosper
    3.6.1 Similarities
    3.6.2 Differences
    3.6.3 Lessons for China's P2P Lending
4. Conclusion
  4.1 Conclusion of This Paper
  4.2 Further Research Proposed
5. References

1. Introduction

Freedman and Zhe Jin (2007) wrote the first academic paper to look into the business of P2P lending. They raised the question of whether P2P lending would reshape the future of the financial industry or would be a fad that wanes over time. Even though it has been over 6 years since that paper, it is still too early to answer that question. What we see on the market is the emergence of more P2P lending platforms globally and the IPO of Lending Club in December 2014. In addition, the attitude of traditional banks toward this infant industry is evolving. For instance, in early 2014, an employee of Wells Fargo told the media that an internal email had been sent by the principal requesting all Wells Fargo employees not to engage in any P2P lending business. By contrast, many hedge funds and regional banks are purchasing personal loan products from P2P lending platforms because of their stable and attractive returns. In addition, more traditional financial institutions have opened their own P2P platforms to catch up with the trend.

1.1 Definition of P2P Lending

P2P stands for Peer-to-Peer or Person-to-Person. In P2P lending, platforms act as intermediaries that match lenders with borrowers and handle the transfer of money. P2P lending was first introduced by Zopa in the UK in 2005. At the time of writing, Zopa has originated 713 million GBP and is one of the biggest platforms in the world. The emergence of P2P lending is also a result of applying Web 2.0 to the financial industry. By avoiding the overhead cost and infrastructure of traditional banks, P2P lending platforms can offer lower interest rates to borrowers and accumulate huge traffic within a short period (Dhand et al., 2008).

1.2 How P2P Lending Works (Lending Club, Prosper)


Borrowers apply for personal loans for various reasons. The main purpose of personal loans on Lending Club and Prosper is credit consolidation. A borrower applies for a loan by providing private information such as the loan amount, term, credit rating score, debt-to-income ratio, monthly income, occupation and loan purpose. The platform then assesses the information and sets a fixed interest rate for the loan. After the borrower agrees to the interest rate, the loan is listed on the platform, where investors can browse loan information and decide whether to invest and how much.

Among the 73 papers on P2P lending between 2008 and 2015, 20 discussed how to increase the likelihood of a loan being successfully funded and what the key determinants are. Compared with unverified variables, verified variables play a much more significant role in the decision to invest in a loan (Gregor et al., 2010). Also, borrowers who are willing to disclose more information normally pay a lower interest rate (Böhme et al., 2010). Social ties increase the chances of having the loan fully funded (Sergio, 2009; Greiner & Wang, 2009; Herrero-Lopez, 2009; Hildebrand & Rocholl, 2010; Lin, 2009), reduce the ex post interest charged on the loan, and decrease the default risk associated with the loan (Lin et al., 2009; Zhensheng, 2014). Furthermore, some research focuses on how borrowers' demographic information, such as appearance and gender, affects loan funding. Research shows that appearance does influence lenders' decisions to fund a loan (Jefferson et al., 2012). Female borrowers are less likely to get loans funded than male borrowers.

Based on all the information provided by the borrower, investors then need to determine whether to lend and how much. The objective of lending money on P2P platforms is to gain a high return while mitigating default risk. Investors on P2P lending platforms are inclined to invest in loans with higher ex post return, which also carry higher default risk. Assessing default risk based on previous loans' performance is another focus of academic papers.

There are 8 papers that built models to investigate the key determinants of default risk, so that investors can use them as a guideline to avoid adverse selection. Loans with lower credit grades and longer terms carry higher default risk (Riza et al., 2015). This finding is the opposite of the result in this paper because, rather than using either completed loans or matured/finished loans, I used a combination of both. There are discrepancies between the risk premiums charged and the real default risk associated with loans on P2P lending platforms (Kumar, 2007). This conclusion is supported by evidence that the premium charged by P2P platforms is not enough to cover the potential loss of investors (Riza et al., 2015). It has also been recommended that default risk be mitigated by setting up a social reputation system on P2P lending platforms (Everett, 2010; Lin, 2009).

Platforms charge borrowers a loan origination fee once the loan is successfully funded. Investors are also charged a service fee for the management of installment payments from borrowers. A handful of papers focused on building the internal information systems of P2P platforms. For instance, Collier (2010) informed practice and theory on developing a community reputation that can reduce information asymmetry on Prosper and mitigate adverse selection. Also, as intermediaries in the financial market, platforms are regulated by both the SEC and the CFPB. Four papers examined the current regulations on P2P lending and drew implications for the further development of regulation specific to P2P lending. A multi-agency regulatory approach to P2P lending should be implemented that imitates the approach applied to regulate traditional lending (Eric et al., 2012).

Borrowers need to make monthly installment payments until the loans reach maturity. If desired, they can also choose to repay all remaining principal ahead of the loan's maturity by paying a service fee. Platforms also provide a trading system for investors who want to sell the loans they hold at a certain discount. This trading system, like an open market, helps platforms provide more flexibility to investors. However, some loans default in the early stages of installment payments. This causes a huge loss for investors as a whole. Investors are inclined not to hire an agency to collect the net principal loss because of the small amounts invested (Freedman & Jin, 2008). Further research into after-default management in P2P lending is urgently needed because it can help mitigate investors' net principal loss and improve the risk-adjusted return of platforms as a whole.

2. Market Review of P2P Lending

2.1 Market Size

The potential market size of P2P lending can be measured in both a micro and a macro way. The market for P2P lending consists mainly of unsecured loans, including unsecured personal loans and lines of credit. The total amount of consumer credit in the U.S. as of October 2014 was 3.283 trillion USD, according to the Federal Reserve G.19 release. Per the Federal Reserve E.2 release, the total amount of outstanding business loans ranging from $10,000 to $99,000 is 3.4 billion USD. Summing these two components gives a potential market size for P2P lending of 3.286 trillion USD in the U.S. market alone. Currently, Prosper contributes 2 billion USD in loans and Lending Club contributes 6 billion USD.

In a macro way, we can expand the market to mid-sized business loans, since Lending Club also provides business loans up to 300K USD. The total amount of business loans ranging from 100K to 999K USD is 12 billion USD (Donghon, 2014). Conservatively, we can add another 2.4 billion USD to the potential P2P lending market. This results in a market with a total size of 3.288 trillion USD. Investors on P2P lending platforms are expected to take between 25 percent and 30 percent of the business that traditional banks are doing. The overall market for P2P lending would then grow to about $1 trillion by 2025 (Cromwell, 2015).
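As a quick check of the arithmetic in this section, the quoted components can be summed directly. The sketch below uses only the figures stated in the text, expressed in billions of USD; the constant and function names are illustrative. The two totals come to about 3,286.4 billion and 3,288.8 billion USD, and the 8 billion USD originated by Prosper and Lending Club is roughly 0.24% of the smaller total.

```python
# Figures quoted in Section 2.1, all in billions of USD.
CONSUMER_CREDIT_BN = 3283.0      # total U.S. consumer credit (Fed G.19, Oct 2014)
SMALL_BUSINESS_LOANS_BN = 3.4    # business loans of $10,000 to $99,000
MID_BUSINESS_ADDITION_BN = 2.4   # conservative addition from the 12 billion in 100K-999K loans
ORIGINATED_BN = {"Prosper": 2.0, "Lending Club": 6.0}


def potential_market_bn(include_mid_business: bool = False) -> float:
    """Potential U.S. market for P2P lending, in billions of USD."""
    total = CONSUMER_CREDIT_BN + SMALL_BUSINESS_LOANS_BN
    if include_mid_business:
        total += MID_BUSINESS_ADDITION_BN
    return total


def current_penetration() -> float:
    """Share of the micro estimate already originated by the two platforms."""
    return sum(ORIGINATED_BN.values()) / potential_market_bn()


if __name__ == "__main__":
    print(f"micro estimate: {potential_market_bn():.1f} billion USD")
    print(f"macro estimate: {potential_market_bn(True):.1f} billion USD")
    print(f"current penetration: {current_penetration():.2%}")
```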

2.2 Key Players and Respective Marketplace

Rank  Lending Site  Year Founded  Loan Volume ($ billion)  Country
1     Lending Club  2007          6                        USA
2     CreditEase    2006          3.2                      China
3     Upstart       2012          3                        USA
4     Prosper       2006          2                        USA
5     Zopa          2005          0.8                      UK

Lending Club. Lending Club, founded in 2007, has paid investors $590 million in interest returns. According to statistics on Lending Club's website, as of 30 September 2014, 83.17% of Lending Club borrowers reported that they use loans from Lending Club to refinance existing loans or pay off their credit cards. The breakdown of the main purposes of Lending Club loans is shown below.

[Figure: breakdown of the main purposes of Lending Club loans]

Prosper. Prosper, founded by Chris Larsen and John Witchel on February 5, 2006, was the first P2P lending platform in the U.S. It remains unlisted and is financially backed by several big names in venture capital. To date, Prosper has more than 2 million members and has originated over $2 billion in loans.

Upstart. It was founded by ex-Googlers in 2012 in the U.S. and has originated more than $3 billion in loans with an annual growth rate of 265%. The major difference between Upstart and other platforms is that, when assessing the credit quality of borrowers, Upstart starts with the same information but also includes academic variables to arrive at a more statistically grounded risk assessment.

CreditEase. As reported by Peter Renton in 2014, CreditEase is the largest P2P lending platform in China and has generated more than $3.2 billion USD in loans to over 500,000 borrowers. The company was founded in 2006 and now operates in over 150 cities in China.

Zopa. Zopa is the oldest peer-to-peer lending company in the world. The company was founded in 2005 in the UK. It has lent $1 billion USD and has helped both borrowers and investors get better rates.

2.3 Market Outlook of P2P Lending

The growth of P2P lending has exceeded the public's expectations in recent years. Gartner (2010) projected that P2P lending would increase by 66% to a total size of 5 billion USD by the end of 2013. Looking at the statistics of the biggest platforms, I found that Lending Club experienced an annual growth rate of over 150% through 2014. Prosper.com has also achieved exponential growth since its establishment: by the end of 2013 it had originated over 300 million USD in loans, and it raised this number to over 1.5 billion USD by the end of 2014.

Although it is extremely difficult to estimate the exact growth rate of P2P lending, several determinants indicate its future trend from a macro perspective.

1) Geographic expansion. P2P lending is not yet fully authorized in all states of the U.S. due to the complexity of state autonomy. Even in China, the acceptance of P2P lending varies among regions. Further geographic expansion is expected in the next few years.
2) More comprehensive legislation. The main reason that certain public authorities or groups are still skeptical about P2P lending is that it is still in its infancy and is less regulated than traditional banks. Regulations specific to P2P lending are urgently needed in the market.
3) Challenges from traditional banking. Although P2P lending has a large cost advantage over traditional banks, with the recovery of the U.S. economy the government is considering loosening the requirements for loan borrowers. This will help traditional banks regain borrowers who currently do not qualify for a loan. In China, many financial institutions have also introduced their own P2P platforms to gain a piece of the pie.
4) Information asymmetry. Information asymmetry might lead investors into adverse selection (Akerlof, 1970) and moral hazard (Stiglitz and Weiss, 1981). The platforms are making various efforts to mitigate information asymmetry.
5) Bottom line of the economy and employment. The performance of both the economy and employment will affect the further development of P2P lending. According to statistics from Prosper and Lending Club, most borrowers' purpose is credit consolidation. A stronger economy, higher wages and a higher employment rate mean that people's financial condition will improve and the need for credit consolidation will decline accordingly.
6) Institutional investors. P2P lending can provide a higher ROI than many other investments in the financial market. There are institutional investors who purchase loan packages from platforms to gain stable cash flow and return.

A simple comparison among different financial investments is listed below. In 2013, P2P lending generated a much lower return than the NYSE and the Dow Jones Industry Composite, but it outperformed both in 2014. However, for the P2P lending platforms I am using the official investment return rate, and the true risk-adjusted investment return might differ from this data. Another point worth noting is that the superior stock market return in 2013 was due to the recovery from an economic and financial downturn. An ROI of around 10% is already very impressive in the financial investment sector. As reported by Bloomberg, the average return of hedge funds was 7.4% in 2013.

Investment  Lending Club  Prosper  3-yr Treasury  NYSE    Dow Jones
2014        10.50%        9.79%    1.10%          4.22%   7.52%
2013        8.75%         9.86%    0.78%          23.18%  26.50%

By the end of 2014, the total amount of loans originated through P2P lending in China had reached $40 billion, with a default rate of 17.46%. A total of 1.16 million borrowers had their loans funded by 630,000 investors; these numbers increased by 364% and 320%, respectively, compared with 2013. There are 1,575 P2P lending platforms in China, and 275 went bankrupt in 2014, implying that roughly one out of six platforms was not sound. The average amounts borrowed per borrower and funded per individual investor are $35,000 USD and $64,000 USD, respectively. These statistics come from Wangdaizhijia.com in China.

2.4 Business Models of P2P Lending

This section introduces the business models used by major P2P lending platforms in the U.S. and China and addresses the major differences between the two markets.

In the U.S. market, the business models of P2P lending platforms are quite similar to each other. Borrowers post their loans on platforms, and investors browse and choose loans to invest in. The P2P lending platform acts as an intermediary and is responsible for risk rating, determining the interest rate, document verification and interest payment management. However, Prosper and Lending Club still differ in several ways, as listed below.

1) Loan type. Prosper only originates personal loans ($2,000-$35,000 USD), while Lending Club originates personal loans ranging from $1,000 to $35,000 USD and also business loans up to $300,000 USD. In addition, Prosper and Lending Club provide loans with different maturities. Both provide 3-year and 5-year loans; Lending Club provides a 1-year loan as well.
2) Interest rate. P2P platforms determine the interest rate by considering information that reflects borrowers' credit quality. Both Prosper and Lending Club stipulate a cap and a floor on the interest rate for loans in each credit rating/grade. However, the interest rate in the same credit category differs between Prosper and Lending Club because of their different credit rating logic.

3) Credit scoring. Prosper and Lending Club each provide a proprietary credit score as a major indicator of loan risk. They both offer 7 rating categories, Prosper from HR (worst) to AA (best) and Lending Club from G (worst) to A (best).
4) Origination fee. Platforms earn money by charging fees to borrowers. The cap and floor fee rates charged by Prosper and Lending Club are the same, whereas different rates are charged to borrowers in different risk categories. A simple comparison is listed below, including credit rating, respective interest rate and origination fee.

Prosper
Rating  Interest Rate   Origination Fee
AA      6.05%-7.96%     1%-2%
A       8.19%-11.33%    4%
B       11.56%-14.06%   5%
C       14.59%-18.27%   5%
D       19%-22.68%      5%
E       23.44%-27.04%   5%
HR      27.75%-31.25%   5%

Lending Club
Rating  Interest Rate   Origination Fee
A       5.49%-8.19%     [illegible]-3%
B       8.67%-11.99%    4%-5%
C       12.39%-14.99%   5%
D       15.59%-17.86%   5%
E       18.54%-21.99%   5%
F       22.99%-25.57%   5%
G       25.8%-26.06%    5%

5) Affiliate & Referral Programs. Prosper introduced an affiliate program to attract more borrowers and lenders through referrers, paying $100-150 USD for borrower leads and $50 USD for lender leads. Lending Club also introduced an affiliate and referral program, but detailed bonuses are not provided on its website.
6) Notes trading. Both Prosper and Lending Club provide a notes trading platform, where investors can trade the notes they hold with each other. Folio is a broker-dealer platform which only charges sellers 1%.
7) Early repayment. Borrowers can choose to pay off the remaining balance without any penalty, in order to avoid paying monthly interest in the future.
8) Interest auction. P2P lending platforms normally set the interest rate for loans based on the information provided by the borrowers. However, in its early years, Prosper offered an interest rate auction in which investors bid the lowest interest rate they would accept in order to compete to fund the most popular loans. This is why some loans were originated with a lower interest rate. Prosper stopped the interest auction service in 2011 and implemented a fixed interest rate like Lending Club.

In China's market, P2P lending platforms basically follow the same model as those in the U.S., acting as intermediaries between borrowers and lenders. However, due to differences in the economic and legal environment, as well as in customer behavior, P2P lending in China has evolved unique features. We use Hongling Capital and Creditease as representatives since they are two of the earliest P2P platforms to originate in China.

1) Loan type. Hongling Capital offers personal and business loans with amounts between $500 and $1,600,000 USD and maturities between 3 months and 12 months. Creditease offers personal loans with amounts between $1,600 USD and $1,000,000 USD and maturities between 1 year and 4 years. P2P lending platforms in China's market are clearly more aggressive and also bear higher default risk.
2) Interest rate. Rather than determining the interest rate based on credit score, maturity and amount as P2P platforms in the U.S. do, China's platforms determine interest rates simply based on loan type or maturity, because there is no credit agency that can provide a comprehensive credit report for individuals (China's PBOC only authorized certificates for credit agencies in January 2015). Hongling Capital sets interest rates between 8% and 18%, and Creditease between 10% and 12.5%.

3) Credit Scoring. The only credit report that a borrower can submit is the one provided by PBOC, which includes the history of credit card usage and loan repayment. Unlike U.S. platforms, platforms in China don't rate borrowers into different credit categories. It's a common practice for platforms to award credits to borrowers/investors if they successfully make the monthly payment or make an investment. For instance, Hongling Capital sorts customers into 5 categories from V1 (lowest) to V5 (highest). Investors on Hongling Capital can refer to these categories as a risk indicator.

4) Origination fee. Creditease charges investors 10% of interest earnings and borrowers 10% as a service fee. Rates and fees on Hongling Capital are more complex. Hongling charges investors from 0% to 10% as fees, depending on the customer's category, which ranges from V1 to V5. For instance, investors in V1 need to pay 10% of interest earnings as a service fee, and those in V5 don't need to pay any service fee. For borrowers, Hongling also charges various percentages of the loan as a service fee, based on the loan type. The overall range is from 3% to 14.6%.

5) Affiliate & Referral Programs. Creditease doesn't pay a referral bonus, while Hongling pays $6 USD if the referred customer registers as a normal member, and $12 USD if he registers as a VIP.

6) Notes trading. Platforms in China also provide notes trading services to investors.

7) Early repayment. On Creditease, if borrowers want to pay off the remaining loan early, then besides the interest for the current month, the remaining loan and the service fee, they need to pay 0.5% of the remaining loan as a penalty to the platform. Similarly, borrowers on Hongling Capital need to pay interest for an extra month as a penalty if they want to pay off the remaining loan early.

8) Principal Guarantee. The biggest difference between the U.S. and China in P2P lending is that many platforms in China introduce a 3rd party company to guarantee the safety of investors' money, in case any fraudulent funding happens. This is a remedy for the lack of credit scores for borrowers, and it improves the confidence of investors. However, the 3rd party guarantee is not a panacea for P2P lending in China. A guarantee company certificate costs only $1 million USD, and there are cases where owners disappeared with the money, leaving investors to lose all their money.
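The Creditease early-repayment rule in item 7 can be sketched as a small calculation. This is only an illustrative reading of the description above, not the platform's billing logic: the function name and the sample figures are hypothetical, and only the 0.5% penalty rate is taken from the text.

```python
def creditease_early_payoff(remaining_principal, current_month_interest,
                            service_fee, penalty_rate=0.005):
    """Amount due on early repayment under the rule described in item 7.

    penalty_rate=0.005 is the 0.5% penalty on the remaining loan; all other
    inputs are hypothetical figures supplied by the caller.
    """
    penalty = remaining_principal * penalty_rate
    return remaining_principal + current_month_interest + service_fee + penalty


# Hypothetical loan: 10,000 remaining, 100 interest this month, 50 service fee.
amount_due = creditease_early_payoff(10_000, 100, 50)
```

On these made-up numbers the penalty is 50, so the borrower would owe 10,200 in total.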

3. Data Analysis and Modeling

3.1 Introduction

This section addresses the following questions: 1) What are the distributions of PV, rate of bad loans and interest across credit categories? 2) Does the risk-return improve from year to year, especially when platforms change their policy? 3) Are there behavioral differences between borrowers and investors on Prosper and Lending Club? 4) How much do the determinant variables contribute to the performance of loans? 5) Can a model be built to determine the probability of default using different data mining methodologies? 6) As researched by Riza, Yanbin, Benjamas and Min in 2014, the higher interest rate set by Prosper and Lending Club for riskier loans is not enough to compensate investors for the potential loss they are exposed to. This section will use a FCFF methodology to test this conclusion, considering the time value of future cash flows.

3.2 Key Variables

3.2.1 Prosper

Variable name | Type | Definition
Credit Rating | Numeric | Proprietary credit rating by P2P lending platforms
Loan Status | Dummy | Whether the loan is active, completed or default
Borrower Rate | Numeric | Interest rate borrower is willing to pay
Borrower APR | Numeric | Actual rate borrower needs to pay considering service cost
Lender Yield | Numeric | Actual rate lenders receive considering service cost
Listing Category | Dummy | The purpose of the loan
Employment Duration | Numeric | The time period of employment till the creation of listing
Is Borrower home owner | Numeric | Whether the borrower owns real estate
Current Credit Line | Numeric | The number of credit lines the borrower owns
OpenRevolvingMonthlyPayment | Numeric | The monthly payment of revolving account
RevolvingCreditBalance | Numeric | The current credit balance of revolving account
BankcardUtilization | Numeric | The percentage utilization of revolving credit balance
AvailableBankcardCredit | Numeric | The total amount of bank card credit till the creation of the loan
TradesNeverDelinquent | Numeric | The percentage of delinquency of trades
DebtToIncomeRatio | Numeric | The percentage of debt to income
StatedMonthlyIncome | Numeric | Monthly income stated by borrowers
LoanOriginalAmount | Numeric | The original amount of loan originated
Investors | Numeric | The number of investors who fund the loan
Terms | Numeric | The term length of the loan

Both Prosper and Lending Club define "bad loans" as loans that are 60+ days past due within the first twelve months from the date of loan origination.
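The "bad loan" definition above can be written as a simple flag. The sketch below assumes a hypothetical per-loan delinquency history of (months since origination, days past due) pairs; neither platform's data export is claimed to have this shape.

```python
def is_bad_loan(delinquency_history):
    """Return True if the loan was 60+ days past due within its first 12 months.

    delinquency_history: iterable of (months_since_origination, days_past_due)
    observations. This layout is an assumption made for illustration.
    """
    return any(month <= 12 and days_past_due >= 60
               for month, days_past_due in delinquency_history)


# Hypothetical histories
early_trouble = [(3, 0), (8, 75)]    # 75 days late in month 8 -> bad loan
late_trouble = [(14, 90)]            # late only after the first year
mild_trouble = [(5, 30)]             # never reaches 60 days
```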

3.2.2 Lending Club

Variables | Type | Definition
Grade | Dummy | The proprietary credit rating of Lending Club
loan_status | Dummy | The current status of the loan
int_rate | Numeric | The interest rate the borrower needs to pay
Purpose | Dummy | The purpose of the loan
emp_length | Numeric | The time length of the employment of the borrower
home_ownership | Dummy | If the borrower owns or rents an apartment
open_acc | Numeric | The number of open credit lines of the borrower
revol_bal | Numeric | The amount of current credit balance
revol_util | Numeric | The current ratio of credit balance utilization
dti | Numeric | The debt to income ratio
annual_inc | Numeric | The amount of annual income
loan_amnt | Numeric | The amount of the loan
installment | Numeric | The amount of monthly payment
Term | Numeric | The term length of the loan

3.3 Distribution of Dataset

3.3.1 Prosper

When depicting the distribution of loan characteristics, we exclude current listings and cancelled listings, which have not been completed or funded. In addition, records with the proprietary credit rating "NC" are excluded due to incomplete information; those loans were originated in 2006 and early 2007, when Prosper was in its infancy. There are 113 records that are missing a proprietary credit rating. We assume that these records won't influence the validity of our analysis because they are so few.

Credit Category | Successful Rate | Number of Loans | Total Amount | Average Amount | Median Amount | STDEV | Default Rate
AA | 30% | 6,487 | 61,402,940 | 9,466 | 12,000 | 6,664 | 11%
A | 23% | 10,479 | 101,490,254 | 9,685 | 11,000 | 6,664 | 16%
B | 25% | 12,023 | 117,411,802 | 9,764 | 12,000 | 8,345 | 22%
C | 29% | 14,892 | 125,436,437 | 8,423 | 10,000 | 7,044 | 28%
D | 47% | 15,259 | 96,539,254 | 6,326 | 7,500 | 5,853 | 31%
E | 49% | 10,286 | 43,717,649 | 4,250 | 4,000 | 2,629 | 37%
HR | 76% | 8,846 | 27,031,067 | 3,056 | 3,500 | 1,323 | 46%

Credit Category | Term 1 year | Term 3 years | Term 5 years | Interest rate | $/investor | Credit Score
AA | 3% | 93% | 4% | 8.9% | 53 | 791
A | 3% | 86% | 12% | 11.4% | 73 | 738
B | 3% | 79% | 19% | 15.4% | 87 | 712
C | 2% | 76% | 22% | 18.9% | 104 | 682
D | 2% | 83% | 15% | 23.6% | 91 | 667
E | 3% | 87% | 10% | 28.3% | 103 | 640
HR | 0% | 100% | 0% | 29.3% | 89 | 621

There are several features of the dataset distribution of Prosper.

1) Surprisingly, the success rate of a listing being funded as a loan increases as credit worsens. This might be caused by the higher interest rate paid by borrowers with worse credit ratings.

2) The majority of loans are from C and D, consistent with our expectation that most loans on Prosper (and on most P2P lending platforms) come from borrowers with a poor credit record.

3) From the best credit rating to the worst, the average and median loan amounts generally decline, mainly because of the limits placed by P2P platforms.

4) The default rate climbs as credit gets worse. The default rate of AA loans is 11%, while it is 46% for HR loans.

5) As we expected, the interest rate increases when credit quality declines. An assessment will be done in the following section to test whether the interest rate advised by Prosper is enough to cover the potential loss.

6) There is a trend that for loans with a poor credit rating, investors tend to place more money on each investment.
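As a quick consistency check on the Prosper distribution figures, the average loan amount per credit category can be recomputed from the reported number of loans and total amount. The sketch below uses only those two reported columns; the variable names are illustrative.

```python
# (number of loans, total amount) per Prosper credit category, as reported above
prosper = {
    "AA": (6_487, 61_402_940),
    "A": (10_479, 101_490_254),
    "B": (12_023, 117_411_802),
    "C": (14_892, 125_436_437),
    "D": (15_259, 96_539_254),
    "E": (10_286, 43_717_649),
    "HR": (8_846, 27_031_067),
}

# Average amount = total amount / number of loans
average_amount = {cat: total / count for cat, (count, total) in prosper.items()}

# Share of all loans (by count) that fall in categories C and D
all_loans = sum(count for count, _ in prosper.values())
share_c_d = (prosper["C"][0] + prosper["D"][0]) / all_loans
```

The recomputed averages agree with the reported ones to within a couple of dollars, and categories C and D together hold a little under 40% of the loans by count.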

[Figure: Number of Loans and Average Amount by Prosper credit category (AA to HR)]

[Figure: Borrower Rate vs. Prosper Rating, with Smooth(Borrower Rate)]

Percentage of Total Loans by amount

Year | AA | A | B | C | D | E | HR | Default Rate
2006 | 7.5% | 7.7% | 9.3% | 11.2% | 9.8% | 8.9% | 45.6% | 39.2%
2007 | 15.4% | 16.8% | 19.9% | 21.3% | 15.3% | 6.2% | 5.2% | 39.5%
2008 | 23.3% | 19.5% | 23.2% | 17.4% | 11.2% | 3.0% | 2.5% | 33.0%
2009 | 21.6% | 24.9% | 6.9% | 17.9% | 13.6% | 5.2% | 9.8% | 15.2%
2010 | 16.1% | 20.9% | 14.7% | 9.0% | 19.5% | 8.3% | 11.5% | 16.7%
2011 | 7.3% | 17.5% | 16.9% | 9.1% | 27.1% | 16.7% | 5.5% | 22.6%
2012 | 7.3% | 17.7% | 18.1% | 22.8% | 18.9% | 5.4% | 9.9% | 31.2%
2013 | 4.6% | 16.5% | 24.5% | 31.2% | 14.9% | 6.8% | 1.5% | 23.6%
2014 | 6.8% | 18.3% | 24.0% | 29.4% | 1.4% | 6.5% | 13.6% | 24.5%

7) Year by year, more investors switch from A or AA loans to riskier loans, especially those in B and C. This trend might be caused by investors seeking higher interest rates as well as by the improved default rate within each credit category.

8) Both the overall default rate and the default rate for each credit category have generally declined, even though investors are taking on more risk. This improvement can be explained by Prosper's improved risk screening and verification. (When calculating the default rate, loans that originated after Q2 2014 are excluded from the dataset, because they have not yet had time to become more than 60 days past due, the point at which they are considered as default.)
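The year-by-year default rate, with the exclusion of recently originated loans described in the note above, could be computed along the following lines. This is a minimal sketch: the (origination date, defaulted) record layout and the sample loans are hypothetical, and the cutoff is taken to be the last day of Q2 2014.

```python
from collections import defaultdict
from datetime import date

ORIGINATION_CUTOFF = date(2014, 6, 30)  # end of Q2 2014, per the note above


def default_rate_by_year(loans, cutoff=ORIGINATION_CUTOFF):
    """Default rate per origination year, skipping loans originated after cutoff.

    loans: iterable of (origination_date, defaulted) pairs (assumed layout).
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for originated, defaulted in loans:
        if originated > cutoff:
            continue  # too young to have reached 60+ days past due
        totals[originated.year] += 1
        defaults[originated.year] += int(defaulted)
    return {year: defaults[year] / totals[year] for year in sorted(totals)}


# Hypothetical sample
sample_loans = [
    (date(2013, 3, 1), True),
    (date(2013, 9, 15), False),
    (date(2014, 2, 10), False),
    (date(2014, 8, 1), True),   # originated after Q2 2014, so excluded
]
rates = default_rate_by_year(sample_loans)
```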

Default rate YoY

Year | AA | A | B | C | D | E | HR | Overall
2006 | 8.8% | 16.7% | 24.7% | 36.2% | 35.8% | 48.8% | 64.8% | 39.2%
2007 | 14.3% | 25.8% | 33.3% | 41.1% | 42.8% | 53.2% | 62.2% | 39.5%
2008 | 18.3% | 25.6% | 32.9% | 33.4% | 37.4% | 43.6% | 52.5% | 33.0%
2009 | 6.0% | 9.3% | 16.8% | 15.4% | 22.4% | 22.3% | 23.7% | 15.2%
2010 | 3.9% | 9.8% | 11.2% | 15.3% | 21.4% | 24.9% | 25.4% | 16.7%
2011 | 2.9% | 9.4% | 15.5% | 14.9% | 24.8% | 32.1% | 31.0% | 22.6%
2012 | 8.1% | 9.3% | 14.1% | 20.1% | 23.9% | 25.9% | 28.5% | 31.2%
2013 | 4.1% | 2.8% | 4.6% | 7.5% | 10.8% | 13.1% | 13.6% | 23.6%
2014 | 8.7% | 0.4% | 0.7% | 1.2% | 1.6% | 2.5% | 1.7% | 24.5%

3.3.2 Lending Club

Credit Category | Successful Rate | Number of Loans | Total Amount | Average Amount | STDEV of Amount | Default Rate
A | 32.6% | 20,076 | 213,245,525 | 10,622 | 6,586 | 8.5%
B | 28.8% | 33,882 | 402,115,200 | 11,868 | 6,861 | 17.2%
C | 26.7% | 27,641 | 352,094,900 | 12,738 | 7,769 | 24.2%
D | 28.3% | 17,980 | 246,222,500 | 13,694 | 8,426 | 30.8%
E | 29.1% | 8,484 | 148,964,150 | 17,558 | 9,505 | 36.4%
F | 33.6% | 3,772 | 73,021,450 | 19,359 | 9,225 | 43.5%
G | 33.6% | 916 | 20,171,950 | 22,022 | 8,417 | 43.2%

1) There is no significant difference in the success rate of listings being funded across credit categories on Lending Club.

2) Loans are concentrated in the better credit grades, from A to D, in terms of both number of loans and total amount.

3) Unlike on Prosper, lower-credit loans on LC tend to be larger than higher-credit loans. This indicates that LC considers the loan amount as a contributing factor when rating loans.

4) There is no significant shift in investors' risk aversion from year to year on Lending Club.

5) The default rate of LC is much lower than Prosper's in each year and in each category, but this doesn't mean that the overall risk return that Lending Club generates is higher than Prosper's as a whole. More detail is given in the following sections.

6) Interest rates for loans of the same credit rank on LC and Prosper are similar.

7) There is a trend of improvement in the default rate from 2007 to 2010. I don't take years after 2011 into consideration, since most of those loans are still in the regular payment process, whereas most loans originated in the early years are either fully paid or in default.

Percentage of Loans by credit grade-LC

Year | A | B | C | D | E | F | G
2007 | 22.7% | 24.3% | 29.9% | 14.7% | 5.6% | 2.8% | 0.0%
2008 | 18.9% | 32.5% | 28.0% | 14.2% | 4.8% | 1.3% | 0.3%
2009 | 25.0% | 28.9% | 25.3% | 13.9% | 5.0% | 1.4% | 0.5%
2010 | 24.3% | 30.7% | 21.4% | 14.0% | 6.9% | 2.1% | 0.8%
2011 | 26.5% | 30.2% | 18.1% | 12.9% | 8.0% | 3.3% | 0.9%
2012 | 20.4% | 34.7% | 22.3% | 13.7% | 6.0% | 2.5% | 0.5%
2013 | 13.1% | 32.7% | 28.3% | 15.3% | 6.7% | 3.3% | 0.6%
2014 | 14.2% | 26.6% | 28.1% | 18.9% | 8.7% | 2.8% | 0.8%

Default Rate YoY-LC

Year | A | B | C | D | E | F | G | Overall
2007 | 1.8% | 13.1% | 18.7% | 40.5% | 35.7% | 28.6% | 0.0% | 17.9%
2008 | 5.8% | 14.6% | 17.8% | 24.3% | 16.0% | 47.6% | 50.0% | 15.8%
2009 | 6.7% | 11.4% | 14.8% | 17.4% | 21.6% | 17.2% | 34.8% | 12.6%
2010 | 4.7% | 11.1% | 14.5% | 18.6% | 22.5% | 30.0% | 28.4% | 12.6%
2011 | 6.6% | 11.5% | 16.8% | 20.9% | 23.8% | 28.1% | 31.5% | 14.1%
2012 | 6.3% | 11.0% | 15.1% | 19.1% | 23.4% | 25.6% | 30.7% | 13.2%
2013 | 1.7% | 4.4% | 7.4% | 10.8% | 12.8% | 17.0% | 16.6% | 6.9%
2014 | 0.5% | 1.1% | 1.8% | 2.8% | 3.8% | 5.8% | 5.8% | 1.9%

[Figure: Number of Loans by Risk Category. Number of loans and average amount by Lending Club credit grade (A to G)]

[Figure: Interest Rate Range by Risk Category. Interest rate vs. credit grade, with smoothed line]

3.4 Model Building and Interpretation-Lending Club

This section contains five steps. First, prune the datasets of Lending Club and Prosper for model building. Second, select variables and build the logistic model to predict the default probability. Third, interpret the significance of each variable and compare the estimates with expectation. Fourth, choose alternative data models to predict the loan status as well as net profit/loss, and compare the results with the conclusions drawn from logistic regression. Last, as a robustness check, test the linearity assumption between the predicting variables and the target prediction, and explore the nonlinear relationship between the target prediction and each individual predicting variable.

3.4.1 Data Preparation

In the data preparation, I tried to incorporate only parameters that can be at least partly verified. Some variables, such as loan purpose, can be fabricated by borrowers. Even if we could build a well-performing model using those subjective parameters, its reliability would be questionable.

1) Homeownership. The original options for this variable include "rent", "own", "Mortgage", "None" and "Other". We create a dummy variable that takes the value 1 for "own" or "Mortgage" and 0 for the rest.

2) There are over 300,000 rows of data; all current listings are excluded from the dataset, since we're aiming to detect indicators of risk from an investor's perspective.

3) Loan Status is the target to predict. A loan status of "0" represents loans that have already finished all payments or are still in the payment process. "1" represents default loans, including loans that are charged off, in default, or delinquent for more than 31 days (there are only two categories for delinquent loans: less than or equal to 30 days, or more than 31 days). Initially, there are 87880 "completed" loans listed on Lending Club, while my interest is in loans that have either finished all payments or already declared default. Keeping that in mind, I further split completed loans into two categories: paid and in-process. Among the completed loans, only 5509 have finished all payments. The remaining 82371 completed loans are still in the payment process. However, as shown in the graph below, 50% of bad loans declared default before the Ih month, and 75% of bad loans declared default before the 17th month. This implies that many of the 82371 loans that haven't finished all payments have a good chance of eventually paying off all installments. Therefore, in order to provide a reliable data model and mitigate bias toward completed loans, I treat completed loans that have paid at least 17 installments as finished loans, and assume that they won't default in future. By doing this, I get 38555 good loans (finished all payments) and 24871 bad loans (default or charged off).

[Figure: NO. of Month Paid vs. loan status; x-axis: loan_status (0, 1), y-axis: NO. of Month Paid]
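The labeling rule described in item 3 can be sketched as a small function. The status strings below are hypothetical placeholders rather than Lending Club's actual field values; only the 17-installment threshold and the 0/1 coding come from the text.

```python
# Hypothetical status labels standing in for the raw Lending Club values
BAD_STATUSES = {"charged off", "default", "late (31+ days)"}


def label_loan(status, installments_paid, min_installments=17):
    """Return 0 for a good loan, 1 for a bad loan, None if the loan is dropped.

    A loan still in payment counts as good only once it has paid at least
    `min_installments` installments, mirroring the assumption in item 3.
    """
    if status in BAD_STATUSES:
        return 1
    if status == "fully paid":
        return 0
    if status == "in process" and installments_paid >= min_installments:
        return 0
    return None  # too young to judge, excluded from the modeling dataset


labels = [
    label_loan("fully paid", 36),
    label_loan("in process", 20),
    label_loan("in process", 5),
    label_loan("charged off", 3),
]
```

The threshold is exposed as a parameter because it is a modeling assumption: a different percentile of the time-to-default distribution would give a different sample of "good" loans.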

4) Income verified. "0" represents that the income is not verified, while "1" means that the income is verified.

5) Independent variables involved in the regression: loan amount, term, employment length, homeownership, annual income, whether the income is verified, debt-to-income ratio, FICO credit score, open accounts, revolving credit balance, the utilization ratio of revolving credit balance, and total accounts. I excluded the variable "purpose" from the model due to the low reliability of the value that borrowers enter when they apply for the loan.

6) Training and validation. The whole dataset is randomly partitioned into 43426 training rows and 20000 validation rows.

7) Profit/Cost matrix. I need a cutoff value in order to classify the predictions into 0 or 1. To do that, I first need to compute the profit/cost matrix for Lending Club. There are 63426 loans in the dataset, including 38555 good loans and 24871 bad loans. Good loans generate $108,339,408 on a total original amount of $450,364,975, representing an ROI of 24.1%. Bad loans cost investors a total loss of $219,172,141 on a total original amount of $350,771,625, representing a negative ROI of 62.5%. Finished loans as a whole cause a loss of $110,832,732 on a total amount of $801,136,600, representing a negative ROI of 13.8%. The real ROI that Lending Club offers to investors is therefore much lower than the one it advertises on its website. Per dollar lent, a bad loan loses about 2.6 times what a good loan earns (62.5% / 24.1%), which is the source of the -2.6 entry in the profit/cost matrix below.

Profit Matrix (Actual vs. Predicted Loan Status)

Loan Status | 0 | 1
0 | 1 | -1
1 | -2.6 | 0
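The ROI figures in item 7, and the 2.6 ratio that appears in the profit matrix, can be reproduced from the reported totals. In the sketch below the dollar amounts are taken from the text; the reading of the matrix used in `PROFIT` (funding a bad loan costs 2.6, turning away a good loan costs 1) is an assumption about the intended orientation, not something the flattened table states unambiguously.

```python
good_profit = 108_339_408       # earned on good loans
good_principal = 450_364_975    # original amount of good loans
bad_loss = 219_172_141          # lost on bad loans
bad_principal = 350_771_625     # original amount of bad loans

roi_good = good_profit / good_principal
roi_bad = -bad_loss / bad_principal
roi_all = (good_profit - bad_loss) / (good_principal + bad_principal)
loss_to_gain = abs(roi_bad) / roi_good   # how many good loans one bad loan wipes out

# Assumed reading of the profit matrix, keyed by (actual, predicted)
PROFIT = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -2.6, (1, 1): 0.0}


def total_profit(outcomes):
    """Sum the matrix payoff over (actual, predicted) pairs."""
    return sum(PROFIT[pair] for pair in outcomes)
```

Under this reading, a classifier is judged by `total_profit` rather than by raw accuracy, which is why the best cutoff need not be the most accurate one.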

3.4.2 Model Building

Before building the model, I selected variables based on the R-Square, AIC and BIC rules. Then I compared the performance of models using different variable combinations. 1) R-Square-oriented stepwise selection removes open_acc from the model. 2) Minimum AIC recommends further removing home_ownership from the model. 3) Minimum BIC gives the same result as minimum AIC, excluding open_acc and home_ownership from the model. Detailed results are listed below.

Parameter | Sig Prob | Entered (Maximize Rsquare) | Entered (Minimum AIC) | Entered (Minimum BIC)
Intercept[1] | 1 | [X] | [X] | [X]
loan_amnt | 8.30E-70 | [X] | [X] | [X]
term | 3.00E-233 | [X] | [X] | [X]
emp_length | 5.00E-15 | [X] | [X] | [X]
home_ownership | 0.51441 | [X] | [ ] | [ ]
annual_inc | 1.30E-41 | [X] | [X] | [X]
is_inc_v | 6.81E-09 | [X] | [X] | [X]
dti | 3.20E-84 | [X] | [X] | [X]
FICOScore | 0 | [X] | [X] | [X]
open_acc | 0.88003 | [ ] | [ ] | [ ]
revol_bal | 3.76E-09 | [X] | [X] | [X]
revol_util | 4.57E-06 | [X] | [X] | [X]
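For readers unfamiliar with the two information criteria used in the selection above, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The sketch below applies them to two hypothetical log-likelihood values (the thesis does not report them); only the training-set size of 43426 comes from the text.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters by ln(n_obs)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


N_TRAIN = 43_426  # training rows reported in the data preparation step

# Hypothetical fits: dropping two weak predictors barely changes the likelihood
full_model = {"k": 12, "log_lik": -25_000.0}
reduced_model = {"k": 10, "log_lik": -25_000.5}

reduced_wins_aic = (aic(reduced_model["log_lik"], reduced_model["k"])
                    < aic(full_model["log_lik"], full_model["k"]))
reduced_wins_bic = (bic(reduced_model["log_lik"], reduced_model["k"], N_TRAIN)
                    < bic(full_model["log_lik"], full_model["k"], N_TRAIN))
```

With tens of thousands of rows, ln(n) is above 10, so BIC charges far more per parameter than AIC; a variable with a large Sig Prob is unlikely to survive either rule.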

Based on the result of the variable selection, I ran the logistic regression. Estimates of parameters under the two slightly different variable combinations are listed below. There is no significant difference in value or sign between the two results. The RSquare-oriented variable combination offers an RSquare of 0.2135, while the AIC/BIC-selected combination gives only a slightly lower RSquare of 0.2134.

Term | Maximize Rsquare | Minimum AIC/BIC
Intercept | -10.66162 | -10.67306
loan_amnt | -0.00003 | -0.00003
term | -0.03942 | -0.03937
emp_length | -0.02573 | -0.02533
home_ownership | 0.01513 | N/A
annual_inc | 0.00001 | 0.00001
is_inc_v | -0.13985 | -0.13967
dti | -0.03298 | -0.03296
FICOScore | 0.01900 | 0.01902
revol_bal | 0.00001 | 0.00001
revol_util | 0.21735 | 0.21590

Since the model using parameters selected by RSquare stepwise offers a slightly better result, I computed the formula accordingly, as below.

$$P(\text{Default}) = \frac{1}{1 + e^{-\left(-10.66162 + \sum_i \beta_i X_i\right)}}$$

$\beta_i$: coefficient of parameter $X_i$; $X_i$: parameters.

The confusion matrices generated from the two combinations are listed below. Both models achieve the best performance under a cutoff value of 0.44, meaning that if the default probability is equal to or greater than 0.44, the loan is classified as default, and otherwise as good. The overall accuracy of the two combinations is close: 69.1% for the RSquare combination and 68.8% for AIC/BIC. The former does a better job in identifying good loans, while the latter is more accurate in identifying bad ones. Both combinations can improve the overall ROI of Lending Club, to negative 1.2% with the AIC/BIC combination and to negative 1.7% with the RSquare combination. Even though the risk return after enhancement is still negative, progress has been made by mitigating a loss of about 12%. Not surprisingly, there is a price to pay for improving the overall risk-adjusted return to investors. Applying this model means that the overall volume of loan origination will decline by 37.8%, while the improvement in risk-adjusted return can help build the creditworthiness of P2P platforms and attract more investors, and thus borrowers, in the long run.

Confusion Matrix-RSquare

Predicted loan status | Actual 0 | Actual 1
0 | 9180 | 3256
1 | 2923 | 4621

Confusion Matrix-AIC/BIC

Predicted loan status | Actual 0 | Actual 1
0 | 8959 | 3099
1 | 3144 | 4778
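The scoring formula and the 0.44 cutoff can be applied to a single loan as follows. This is a minimal sketch using the Maximize Rsquare coefficients as printed (rounded to five decimals, so results are approximate). The units of each input, such as term in months and revol_util as a fraction, are assumptions, and the sample loan is hypothetical.

```python
import math

# Coefficients from the Maximize Rsquare column, as printed in the text
COEF = {
    "loan_amnt": -0.00003, "term": -0.03942, "emp_length": -0.02573,
    "home_ownership": 0.01513, "annual_inc": 0.00001, "is_inc_v": -0.13985,
    "dti": -0.03298, "FICOScore": 0.01900, "revol_bal": 0.00001,
    "revol_util": 0.21735,
}
INTERCEPT = -10.66162
CUTOFF = 0.44


def default_probability(loan):
    """Logistic model: P(default) = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    z = INTERCEPT + sum(COEF[name] * loan[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-z))


def classify(loan, cutoff=CUTOFF):
    """Return 1 (predicted default) if the probability reaches the cutoff, else 0."""
    return int(default_probability(loan) >= cutoff)


# Hypothetical loan; the units are assumed, not taken from the thesis
sample_loan = {
    "loan_amnt": 10_000, "term": 36, "emp_length": 5, "home_ownership": 1,
    "annual_inc": 60_000, "is_inc_v": 1, "dti": 15, "FICOScore": 700,
    "revol_bal": 10_000, "revol_util": 0.5,
}
p = default_probability(sample_loan)
```

Because the cutoff was tuned against the profit/cost matrix rather than set at 0.5, the classifier flags a loan as a likely default at a lower predicted probability than a purely accuracy-driven rule would.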

3.4.3 Model interpretation

In this section, I analyze the parameter estimates obtained in model building and compare the result with business intuitions held by the public. To make the interpretation clearer: when a parameter is said to have a positive impact on the default rate, it means that the higher the value of the parameter, the higher the default probability of the loan, and vice versa. Several papers have also tried to interpret the impact of parameters. FICOScore has a negative impact on the default rate, while debt-to-income ratio and credit line utilization have a positive impact (Riza, Yanbin, Benjamas and Min, 2015). However, in the result from the model that only included finished loans, some of the estimates are not intuitive. This section starts by interpreting variables that run counter to our expectation, and then goes through those that match it.

1) "loan_amnt" has a negative impact on the default probability. Normally, a higher loan amount gives people the impression of higher risk, but it turns out that this is not the case.

2) The same goes for "term". Two terms are allowed on Lending Club: 36 and 60 months. Holding all other features constant, a 60-month loan doesn't carry a higher default risk than a 36-month loan. A possible explanation is that Lending Club only approves a longer-term loan if the borrower is more qualified.

3) "home_ownership". Owning real estate doesn't necessarily mean that you're more creditworthy. It's actually the opposite.

4) "annual_inc". A higher income stated by the borrower when applying for a loan doesn't guarantee a better outcome. The impact of this variable should be considered together with "is_inc_v", which has a negative impact on the default rate.

5) "dti" (debt-to-income ratio). This ratio also has a negative impact on the default rate. One possible explanation is that some of the income information provided by borrowers is fictitious. Further research in the paper will only include loans with verified income, to detect any different result.

6) The most surprising finding is that "FICOScore" has a positive impact on the default rate. People might think that borrowers with a higher FICOScore normally have better credit quality, since a credit score backed by a 3rd party agency is normally very reliable. However, on Lending Club (and, as mentioned later, in Prosper's model), FICOScore is not a good indicator of credit quality. Lenders can't simply make the decision based on this score, which is actually what lots of investors are doing.

7) "revol_util" and "revol_bal" have a positive impact on the default rate, which is consistent with expectation. Because the majority of borrowers on Lending Club are applying for loans to consolidate personal credit lines, a higher balance and utilization ratio indicate higher financial pressure in paying back the balance.
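One common way to read logistic coefficients such as those discussed above is to convert them to odds ratios: exp(beta) is the factor by which the odds of default are multiplied when the variable rises by one unit, holding the others fixed. The sketch below applies this standard transformation to two of the reported estimates; it is an interpretive aid, not a calculation the thesis itself presents.

```python
import math


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds of default for a `delta` change in x."""
    return math.exp(beta * delta)


# Reported Maximize Rsquare estimates
fico_per_point = odds_ratio(0.01900)          # one extra FICO point
fico_per_50_points = odds_ratio(0.01900, 50)  # a 50-point difference
verified_income = odds_ratio(-0.13985)        # is_inc_v going from 0 to 1
```

On these estimates each additional FICO point multiplies the odds of default by roughly 1.02, while verified income multiplies them by roughly 0.87, which matches the signs discussed in items 4 and 6.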

3.4.4 Robustness Check

Besides building the model to predict the nominal target parameter, I also considered using the same predicting variables to predict a numeric parameter, net profit/loss, to check whether numeric regression outperforms logistic regression. As in the previous section, I pruned the predicting variable combination using RSquare, AIC and BIC, and list the result below. The three ways of ruling out variables give us the same result: keep all variables in the linear regression model.

Entered | Parameter | Estimate
[X] | Intercept | -13687.535
[X] | loan_amnt | -0.1754355
[X] | term | -106.27022
[X] | emp_length | -44.95143
[X] | annual_inc | 0.00523282
[X] | is_inc_v | -239.96839
[X] | dti | -96.287356
[X] | FICOScore | 27.2358601
[X] | revol_bal | 0.01584572
[X] | revol_util | 1126.27626

Looking at the estimates of variables in the linear regression, they make more intuitive sense than the result from the logistic regression. For instance, "loan_amnt", "term" and "dti" have negative coefficients with net profit, in the sense that the higher the value of the variable, the lower the profit or the higher the loss that the loan will cause investors. By contrast, FICOScore and annual_inc have a positive effect on the loan's net profit/loss. The model generates an RSquare of 0.1072, which is significantly lower than the value from the logistic model. To further test which model is superior, I also drew the confusion matrix for the linear regression model by setting a profit/loss value as the cutoff between good and bad loans. Under a cutoff of a net profit/loss of negative $2,100, the model achieves its highest accuracy of 67%, which breaks down into 74% accuracy in identifying good loans and 55% accuracy in identifying bad loans. However, the performance of this model is still worse than that of the logistic model.

Confusion Matrix-RSquare

Predicted loan status | Actual 0 | Actual 1
0 | 9152 | 3422
1 | 3146 | 4258
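The accuracy figures quoted for the linear model can be recomputed from its confusion matrix. The sketch below assumes that rows are predicted status and columns are actual status; this orientation is inferred from the fact that it reproduces the 67% / 74% / 55% breakdown reported in the text, and the function and key names are illustrative.

```python
def confusion_metrics(matrix):
    """Summary rates for a 2x2 matrix laid out as rows=predicted, cols=actual.

    matrix[0] = [predicted good & actually good, predicted good & actually bad]
    matrix[1] = [predicted bad  & actually good, predicted bad  & actually bad]
    """
    (good_ok, bad_missed), (good_rejected, bad_caught) = matrix
    total = good_ok + bad_missed + good_rejected + bad_caught
    return {
        "accuracy": (good_ok + bad_caught) / total,
        "good_accuracy": good_ok / (good_ok + good_rejected),
        "bad_accuracy": bad_caught / (bad_caught + bad_missed),
        "rejected_share": (good_rejected + bad_caught) / total,
    }


linear_model = confusion_metrics([[9152, 3422], [3146, 4258]])
```

The `rejected_share` entry is the fraction of loans the model would turn away, which is the quantity behind the trade-off between better returns and lower origination volume.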

The different coefficients of the same parameter for default probability and for net profit can be understood in two ways. 1) The amount of net loss significantly outweighs that of net profit, so the positive impact of FICOScore or annual_inc can't bring enough profit to push the net P/L to positive numbers. 2) However, it's true that a higher FICOScore and annual_inc can reduce the net loss if loans go into default, and can also increase the positive return if loans prove to be good.

I also used discriminant analysis and a neural network to classify good and bad loans, and got the confusion matrices listed below. On the face of it, both models outperform the logistic model in overall accuracy and in net profit when the cost matrix is applied to the results below. The overall accuracy of the discriminant model is 68%, with a further breakdown of 70% accuracy for good loans and 65% for bad loans. Using the neural network, the accuracy turns out to be 69%, with 76% accuracy for good loans and 59% for bad ones. However, there are two key disadvantages of discriminant analysis and neural networks. One is that the structure of the model is non-transparent, so users can't interpret the importance of each parameter, and investors can't apply the model easily when making investment decisions. The other is that both models need to be updated dynamically whenever new data enters the original dataset.

Confusion Matrix-Discriminant

Predicted loan status | Actual 0 | Actual 1
0 | 8608 | 2680
1 | 3690 | 5000

Confusion Matrix-Neural Network

Predicted loan status | Actual 0 | Actual 1
0 | 10268 | 3932
1 | 2030 | 3748

Another key robustness check is to test the assumption of a linear relationship between each predicting parameter and the target prediction (P/L). To do that, I test the optimal structural relationship for each predictor, using RSquare as the rule to judge the optimal exponentiation. Detailed results are listed below.

Predictor | Formula | Rsquare
loan_amnt | Quintic | 0.0567
term | Linear | 0.053
emp_length | Quintic | 0.0043
home_ownership | Linear | 0.0005
annual_inc | Logistic 3P | 0.0019
is_inc_v | Linear | 0.0172
dti | Quintic | 0.0244
FICOScore | Quartic | 0.0169
open_acc | Quartic | 0.0077
revol_bal | Quintic | 0.0082
revol_util | Logistic 3P | 0.0055
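The idea of choosing a functional form per predictor by RSquare can be illustrated with a much simpler stand-in. The sketch below compares single-term power transforms (x, x squared, x cubed) by the RSquare of a one-predictor least-squares fit; it does not reproduce the full polynomial (quartic, quintic) or three-parameter logistic fits reported above, and the toy data are made up.

```python
def r_squared(x, y):
    """RSquare of a one-predictor least-squares fit (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Candidate transforms: a simplified stand-in for the forms compared in the text
TRANSFORMS = {
    "linear": lambda v: v,
    "quadratic": lambda v: v ** 2,
    "cubic": lambda v: v ** 3,
}


def best_transform(x, y):
    """Return (name of best transform, RSquare per transform)."""
    scores = {name: r_squared([f(v) for v in x], y)
              for name, f in TRANSFORMS.items()}
    return max(scores, key=scores.get), scores


# Toy data in which the target really is quadratic in the predictor
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [v ** 2 for v in xs]
winner, scores = best_transform(xs, ys)
```

Comparing higher-degree fits by RSquare alone always favors the more flexible form on the training data, so a check on the validation rows would be a sensible companion to this selection rule.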

After obtaining the formula for each parameter, I first return to the linear regression model, using the newly formularized parameters together to predict net profit/loss, and compare it with the previous linear regression model to check whether the performance improves. Coefficients and selection results are listed below.

Entered | Parameter | Estimate
[X] | Intercept | 2103.486
[X] | loan_amnt | 0.777
[X] | term | 0.612
[X] | emp_length | 0.404
[X] | annual_inc | -0.555
[X] | dti 2 | 0.778
[X] | FICOScore 2 | 1.021
[ ] | open_acc 2 | 0.000
[X] | revol_bal | -0.124
[X] | revol_util | -0.687
[X] | home_ownership | 174.240
[X] | is_inc_v | -96.392

Using the newly formularized parameters, the linear regression model achieves its best performance under a cutoff value of negative 2300, meaning that loans with a potential loss equal to or bigger than 2300 are marked as default, and otherwise as good loans. The overall accuracy equals 68.4%, with 80% accuracy in identifying good loans and 50% accuracy in identifying bad loans. This model outperforms the previous linear model by a small margin, while it is still not as good as the discriminant or neural network models.

Confusion Matrix-RSquare

Predicted loan status | Actual 0 | Actual 1
0 | 9828 | 3843
1 | 2470 | 3837

I further tried to use the newly formularized parameters to predict loan status using logistic regression, and received the result below. This model outperforms the previous logistic model by 16%, with an overall accuracy of 70%. I haven't gone the extra mile to explore whether the discriminant and neural network models improve when using the new parameters, but a reasonable guess is that the performance of these two models would also improve. Testing formulas or additional data structures reveals the necessity of investigating the nonlinear relationship between predictors and the target parameter.

Confusion Matrix

Predicted loan status | Actual 0 | Actual 1
0 | 9609 | 3377
1 | 2689 | 4303

Risk Premium by Lending Club

Another important topic is to assess whether the interest rate charged to borrowers on Lending Club is enough to compensate investors for their potential loss. To achieve this, I used IRR and FCFF to compute the rate and PV for each loan (including current loans). When using IRR, we also need to estimate the number of installments that an investor receives on average, and then combine it with the probability of default for each loan. For FCFF, it's important to find the proper discount rate for each loan. Because loans differ from one another, the discount rate that needs to be used also differs from loan to loan: the higher the risk, the higher the discount rate should be. I used the interest rate computed from the regression model as the discount rate, together with the expected number of installments that the investor can receive.
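The present-value side of this assessment can be sketched with the standard level-payment and discounting formulas. The sketch below is an illustration under stated assumptions, not the thesis's FCFF or IRR procedure: monthly compounding, a fixed installment, a hypothetical 10,000 loan at 15% over 36 months, and a borrower who simply stops paying after a given number of installments.

```python
def monthly_installment(principal, annual_rate, n_months):
    """Level monthly payment that amortizes `principal` over `n_months`."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -n_months)


def present_value(payment, annual_discount_rate, n_payments):
    """PV of `n_payments` equal monthly payments, the first one month from now."""
    d = annual_discount_rate / 12
    return sum(payment / (1 + d) ** t for t in range(1, n_payments + 1))


# Hypothetical loan
principal, rate, term = 10_000, 0.15, 36
payment = monthly_installment(principal, rate, term)

pv_if_fully_paid = present_value(payment, rate, term)   # recovers the principal
pv_if_stops_at_17 = present_value(payment, rate, 17)    # borrower defaults early
shortfall = principal - pv_if_stops_at_17
```

Discounting at the loan's own rate makes a fully repaid loan worth exactly its principal, so any early stop in payments shows up directly as a shortfall. A riskier loan would be discounted at a higher rate, which lowers the PV further.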

3.5 Model Building and Interpretation-Prosper

3.5.1 Data Preparation

There are 230448 rows in Prosper's dataset, including 151903 current loans; the rest are either completed or default loans. Even though the data layout of Prosper is slightly different from Lending Club's, the major variables are still available for logistic regression, and I will compare features between Lending Club and Prosper at the end of this section. We will build the prediction model for Prosper following the same rules as for Lending Club, and interpret and visualize what we conclude from the model.

1) We also take out all current loans and look only at the completed and default loans of Prosper. In addition, I checked whether each specific loan has completed its payment term by cross-checking the columns "month since loan origination" and "term". Given this purpose of only analyzing finished loans, I also excluded all loans with a status of "completed", "past due