The Missing Trade of China: Balls and Bins Model

Huimin Shi∗
The Ohio State University
May 11, 2010

Preliminary and Incomplete

Abstract

Based on the Customs transaction records of China, the paper establishes several stylized facts on the extensive margin of Chinese trade across trade partners, products and firms. The paper applies the balls-and-bins model of Armenter, Koren (2008) to Chinese trade data. The sparse nature of trade data is crucial to understanding the prevalence of zeros among product-country trade flows. The balls-and-bins model matches zero trade flows at the country-product level. However, at the firm level, the balls-and-bins model cannot predict the percentage of small traders, which differs from the finding for the U.S. data.

Keywords: Shipment; Balls-and-Bins; Chinese Trade



Email: [email protected].


1 Introduction

Shipment-level data from Customs transaction records are the most disaggregated trade data available for research. Currently, the most widely exploited datasets are U.S. and French trade and firm data. The newly released Chinese Customs transaction data open the door to understanding the largest developing trader in the world. China's trade is important both to world trade and to the Chinese economy itself. China had the third largest trade value in 2005. The measure of economic openness, the (Export + Import)/GDP ratio, was 64% for China in 2005. The U.S. and Germany were the first and second largest traders in 2005. However, their trade reliance ratios were below 20%. Trade plays a far more important role in the Chinese economy than in any other big economy in the world.

The paper establishes several stylized facts on the extensive margin of Chinese trade and compares them with their U.S. counterparts. They include:

1. Most potential trade flows are missing for product-country combinations.
2. The incidence of non-zero product-country trade flows follows a gravity equation.
3. Most firms export a single product to a single country. However, the percentage of this type of small exporter is much smaller in China than in the U.S.
4. Most exports are done by multi-product, multi-destination exporters.

I apply the balls-and-bins model of trade from Armenter, Koren (2008) to check the predictions of the model on these facts for the Chinese data. The balls-and-bins model takes the trade data structure as given, without explaining why certain countries have more trade volume or more shipments. Starting from the Chinese trade structure, balls-and-bins builds only upon the aggregate trade pattern. Surprisingly, first, for the country-product combination, the balls-and-bins model generates a similar number of zeros for the missing trade at different product classification levels.
It also matches the gravity property of the data: China exports more types of goods to, and imports more types of goods from, closer and richer trade partners. Second, for the firm-country combination, balls-and-bins can also match the number of zeros and the gravity properties in the data. The paper confirms the firm selection effect of trade across trade partners modeled in Helpman, Melitz, Rubinstein (HMR) (2008). These two points show that even the balls-and-bins model, which is built on randomness, can match the extensive margin of trade for country-product and country-firm combinations as long as it captures the aggregate properties of trade shares across countries and products. Third, for the firm-product combination, taking advantage of the more detailed data and contrary to the U.S. data, the paper finds that balls-and-bins cannot predict the number of small exporters who sell a single product or trade with a single destination country. The economic forces that shape a country's aggregate trade data cannot be the only main forces that shape exporters' product scope or trade partner scope.

The balls-and-bins model provides a benchmark for understanding the structure of the extensive margin of trade. How balls fall into bins depends only on randomness and the uniform distribution assumption. I am not arguing that firms and countries export or import at random. What I argue is that whenever balls-and-bins matches a stylized fact on the extensive margin of trade to some extent, a theoretical trade model should match the data better in that dimension to be considered a success. This type of exercise has been done in the exchange rate prediction literature by comparing the predictive power of exchange rate models against a random walk. It is still new in the trade field.

The first strand of related literature is the recent discussion of Chinese trading firms. Manova, Zhang (2009a) document many facts about Chinese trading firms in terms of products, trading partners and firm ownership.
Manova, Zhang (2009b) document how firms of different ownership types behave differently in exporting under credit constraints. Manova, Zhang (2009c) use Chinese trade data to explain quality heterogeneity across firms and export destinations. My paper focuses on the missing trade of China on the extensive margin at the country, product and firm levels.

The second strand of related literature is the discussion of the extensive margin of trade. World trade grows along intensive and extensive margins. The intensive margin is the trade volume of existing trade products, trade partners and trading firms, while the extensive margin is the number of trade products, partners and firms. Neglecting the extensive margin biases gravity estimation, because only positive trade flows are considered. Felbermayr, Kohler (2004) propose “a corner solution version of the gravity equation” to explain movements along both margins and to resolve the “distance puzzle” (the elasticity of trade volume with respect to distance has increased, which runs counter to the evidence that trade costs have decreased dramatically over time). More and more empirical work has pointed out the importance of the extensive margin, while more and more theoretical work tries to incorporate it. The existing literature discusses the extensive and intensive margins at three levels: country, product and firm.

At the country level, HMR (2008) find that the rapid growth of world trade from 1970 to 1997 was predominantly due to the growth of trade volume among countries that already traded with each other in 1970 rather than to the expansion of trade among new trade partners. Although this paper does not discuss trade dynamics, the top 5 destination countries of China account for 57.66% of total export value, while the top 5 source countries account for 51.53% of import value. The median export destination country accounts for only 0.018%, while the median import source country accounts for only 0.019%. At the country level, the intensive margin of the trade relationship with a few important trade partners matters more for Chinese trade.

At the product level, Hummels, Klenow (2005) state that the extensive margin accounts for around 60% of the greater exports of larger economies.
They also compare the implications for the extensive margin of the Armington model, the Krugman model, a quality differentiation model, and a model with fixed costs of exporting to a given market. Baldwin, Harrigan (2007) decompose trade value data into the number of goods traded, the amount of each good shipped, and the price of the good. For missing trade in the U.S. data, they find that the incidence of export zeros is strongly correlated with distance and importing country size. My paper finds a similar result for the Chinese data: China exports more types of products to, and imports more types of products from, closer and richer trade partners.

At the firm level, HMR (2008) build a model based on Melitz (2003) with fixed costs, variable costs, and endogenous firm entry and export. Firms endogenously select themselves into exporting to closer and richer countries. They emphasize that neglecting this selection effect biases the estimation of the gravity equation. The results I find for the Chinese data support their selection framework. The number of firms exporting to a given destination country is positively related to that country's GDP and negatively related to its distance from China. The same is true for importing firms and source countries.

The third strand of related literature concerns the product scope of trading firms. Although multi-product, multi-destination firms dominate international trade, little research has modeled their production and export decisions in general equilibrium. Bernard, Redding and Schott (2009) develop a general equilibrium model based on Melitz (2003) that features selection across firms, products and countries through productivity draws and consumer taste draws. They model firm-level product decisions as well as export decisions. They find that more productive firms export more products to more destination countries. In contrast to this focus on multi-product, multi-destination exporters, my paper looks for features of single-product, single-destination exporters. For each firm, the number of shipments and the trade value are explicitly known for China. That is the advantage of the Chinese dataset compared with the U.S.
dataset used by Armenter, Koren (2008). With firm-level shipment counts, balls-and-bins under-predicts the number of small exporters. The approximation method changes the prediction of the balls-and-bins model for small exporters. Economic forces other than randomness and those shaping the aggregate trade flows must be at work in shaping the product scope of exporters. My paper discusses the extensive margin of trade along all three dimensions. Basically, at the country and product levels, the balls-and-bins model performs well. At the firm level, however, firm decisions need to be introduced to explain the data.

As for the fourth strand of related literature, the paper reports some findings that might be important for the trade cost literature. On the one hand, at the shipment level, 31.52% of Chinese export shipments are low-value (below $2500). In the U.S. data, shipments are reported only above $2500. Low-value shipments imply low unit value cost or low trade cost. Trade cost is related to distance as well as to the types of goods exported. A lot of work has been done on identifying trade cost using trade data at different levels. On the other hand, there are far fewer small exporters (single-product, single-destination) in China than in the U.S., while they account for a larger percentage of trade value. This means that, for exporters, China has a less granular distribution than the U.S. in terms of trade value. As analyzed in Giovanni, Levchenko (2009), in a granular world of firm size, like the U.S., lowering variable cost is more important than lowering trade entry cost. In a non-granular world, trade entry cost matters for welfare. Lowering entry cost is more important for a less granular economy, like China. Assuming a certain correlation between trade value and firm size, this finding points to a direction for future research: lowering trade entry cost might increase the trade volume and the welfare of China. Overall, I find that the data structure for shipments and small firms in China is very different from that in the U.S.

The rest of the paper is organized as follows: Section 2 presents the balls-and-bins model. Section 3 applies the balls-and-bins model to export and import data. Section 4 concludes.


2 Model

A good example of a balls-and-bins model is the birthday paradox. The problem can be stated as follows: how many people are needed for the probability that at least two of them share a birthday to reach 50%? Assuming birthdays are uniformly distributed across the 365 days of the year, the answer is surprisingly few: 23. To restate the problem in balls-and-bins terms: suppose there are k equal-sized bins (k = 365) and balls (people) are thrown randomly into the bins until some bin contains at least two balls; how many balls are needed?

Consider the process of tossing n balls into k bins. All balls are identical points without weight, so the sizes of balls do not matter. Each toss is random and independent of the others. Thus, for each ball, the probability of falling into a given bin depends only on the size of the bin. The uniform distribution assumption means that a ball is equally likely to land at any point within the range. Given k equal-sized bins, after the nth toss the expected number of bins containing exactly m balls is

\[ E(m, n) = E(m, n-1) - \frac{1}{k} E(m, n-1) + \frac{1}{k} E(m-1, n-1) \tag{1} \]

That is, the number of bins containing m balls after round n−1, minus the expected number of those bins that receive a new ball in round n and move to m+1 balls, plus the expected number of bins with m−1 balls that receive a new ball in round n and move to m balls. It can be rewritten as:

\[ E(m, n) = C_n^m \left(1 - \frac{1}{k}\right)^{n-m} \left(\frac{1}{k}\right)^{m} \tag{2} \]
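The birthday example and the recursion in equation (1) can be checked numerically. The following is a minimal Python sketch, not the paper's own code; the function names are hypothetical. It assumes the starting condition E(0, 0) = k (all k bins empty before the first toss). Under that assumption the closed form that satisfies the recursion is k times the binomial term shown in equation (2); without the factor k, the expression is the probability that one given bin holds exactly m balls.

```python
from math import comb


def birthday_threshold(k=365, target=0.5):
    """Smallest n such that P(some bin holds at least two balls) >= target."""
    p_no_collision = 1.0
    n = 0
    while 1.0 - p_no_collision < target:
        # The next ball must avoid the n bins that are already occupied.
        p_no_collision *= (k - n) / k
        n += 1
    return n


def expected_bins_recursive(m, n, k):
    """Iterate equation (1) toss by toss, starting from E(0, 0) = k (assumed)."""
    e = [float(k)] + [0.0] * n  # e[j] = expected number of bins holding j balls
    for _ in range(n):
        e = [e[j] * (1 - 1 / k) + (e[j - 1] / k if j > 0 else 0.0)
             for j in range(n + 1)]
    return e[m]


def expected_bins_closed_form(m, n, k):
    """k times the binomial term of equation (2); requires 0 <= m <= n."""
    return k * comb(n, m) * (1 / k) ** m * (1 - 1 / k) ** (n - m)
```

With k = 365 the first function returns the familiar answer of 23 people, and the two expectation functions agree for any 0 ≤ m ≤ n.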

It is laborious to track how many balls each individual bin holds. For the questions on the extensive margin of trade (the incidence of non-zero trade flows and the prevalence of single-product, single-destination exporting firms), it is enough to know the expected number of non-empty bins and the probability of an all-ball bin (a single bin that takes all the balls). For unequal-sized bins, $s_i$ is the probability that a ball falls into bin i, which is the size of the bin, and $m_i$ denotes the number of balls in bin i. The probability that bin i receives at least one ball is:

\[ E(d_i \mid n) = 1 - \Pr(m_i = 0 \mid n) = 1 - (1 - s_i)^{n} \tag{3} \]

The expected total number of non-empty bins is:

\[ E(k \mid n) = \sum_{i=1}^{K} \left[ 1 - (1 - s_i)^{n} \right] \tag{4} \]

The probability that a single bin takes all the balls is

\[ \Pr(k = 1 \mid n) = \sum_{i=1}^{K} \Pr(m_i = n \mid n) = \sum_{i=1}^{K} s_i^{n} \tag{5} \]
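Equations (4) and (5) translate almost directly into code. The sketch below is illustrative only; the function names are hypothetical, and it assumes `sizes` is a list of bin probabilities that sum to one.

```python
def expected_nonempty_bins(sizes, n):
    """Equation (4): expected number of bins that receive at least one of n balls."""
    return sum(1.0 - (1.0 - s) ** n for s in sizes)


def prob_single_bin(sizes, n):
    """Equation (5): probability that one bin receives all n balls."""
    return sum(s ** n for s in sizes)
```

For example, with 365 equal-sized bins and n = 365 balls, `expected_nonempty_bins` gives about 231, so many bins stay empty even when there are as many balls as bins.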

The interactions among trade partners, firms and products are important for understanding the trade structure. To capture more properties of the data, a two-dimensional balls-and-bins model is built for convenience in calculating conditional probabilities. The data are divided along two dimensions (K × T bins in total). The bin sizes on the two dimensions are $\{s_j\}_{j=1}^{T}$ and $\{v_i\}_{i=1}^{K}$. The expected number of non-empty bins is

\[ E(k \mid n) = \sum_{j=1}^{T} \sum_{i=1}^{K} \left[ 1 - (1 - v_i s_j)^{n} \right] \tag{6} \]

Given the number of balls on one dimension, $\{n_1, n_2, \ldots, n_T\}$, the expected total number of non-empty bins is

\[ E(k \mid n_1, n_2, \ldots, n_T) = \sum_{j=1}^{T} \sum_{i=1}^{K} \left[ 1 - (1 - s_i)^{n_j} \right] \tag{7} \]

The expected number of single non-empty bins (one bin takes all the balls along one dimension, for example the i dimension) is

\[ \Pr(k_t = 1 \mid n_1, n_2, \ldots, n_T) = \sum_{j=1}^{T} \sum_{i=1}^{K} s_i^{n_j} \tag{8} \]
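The two-dimensional formulas (6) to (8) can be sketched in the same way. This is an illustration under stated assumptions, with hypothetical function names: `sizes` holds the bin shares along the K dimension (written $s_i$ in equations (7) and (8)), `row_balls` holds the ball counts $n_1, \ldots, n_T$, and rows with zero balls are skipped in the last function because no bin can take "all" of zero balls.

```python
def expected_nonempty_joint(v, s, n):
    """Equation (6): n balls thrown over K*T bins of size v_i * s_j."""
    return sum(1.0 - (1.0 - vi * sj) ** n for sj in s for vi in v)


def expected_nonempty_by_row(sizes, row_balls):
    """Equation (7): row j receives n_j balls spread over bins of the given sizes."""
    return sum(1.0 - (1.0 - si) ** nj for nj in row_balls for si in sizes)


def expected_single_bin_rows(sizes, row_balls):
    """Equation (8): sum over rows of the probability that one bin takes all n_j balls."""
    # Assumption: rows without any ball are skipped rather than counted.
    return sum(si ** nj for nj in row_balls if nj > 0 for si in sizes)
```

In the applications that follow, a row would be a country (or a firm) and `row_balls` its observed number of shipments.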

3 Data

3.1 Dataset Introduction

The shipment-level trade data of China (2003-2008), released by the Chinese Customs Office, contain a number of variables: destination/source country, HS 8-digit product code, firm ownership type, means of transportation, local customs office, unit of measure, quantity, unit price, value, firm name and code, and firm contact information. For the purpose of the balls-and-bins model, I use only the destination/source country, HS 8-digit product code, firm code, and shipment value. The dataset is organized monthly. Each shipment entry contains a unique destination/source country, HS 8-digit product code and firm code. For example, if a firm exports each of two types of goods ten times in a month, the dataset should contain twenty entries documenting all the transactions.

Differences in HS classification systems are not a problem for any balls-and-bins conclusion as long as I discuss only the Chinese data. They might be a problem for cross-country comparisons between China and the U.S. The Harmonized System of product classification is comparable across countries at the 6-digit level. For finer sub-categories, countries have their own classification systems. The U.S. defines the HS classification at the 2-, 4-, 6- and 10-digit levels for both imports and exports. China defines it at the 2-, 4-, 6- and 8-digit levels for both imports and exports. I compare only the export data of China and the U.S. Roughly, I claim that the Chinese HS 8-digit system is comparable with the U.S. HS 10-digit system in terms of product heterogeneity. There are about 9,000 export and 18,000 import HS codes in the U.S. The Chinese HS 8-digit system has about 12,000 codes, so the two systems are comparable in the number of codes. In table 1 and table 2, I also list an example of the trade classification under heading 0101 of the Chinese and U.S. HS systems. Since the paper focuses only on the extensive margin (not on any particular product, trade partner or firm), comparison across countries makes sense as long as the heterogeneity of products is similar. [Table 1 Here] [Table 2 Here]

3.2 Shipment Data

In the U.S. census trade data, only export shipments with value above $2500 and import shipments with value above $250 are reported. One advantage of the Chinese trade data is that they are not truncated. However, 3.8% of export shipments are below $100, a mysteriously low value for international trade. On the one hand, shipment value is directly related to the types of goods a country exports. On the other hand, there might be fixed effects related to certain products or customs offices, or possible mistakes in documentation. Understanding the structure behind these low-value export shipments is an interesting topic in its own right for understanding the entry cost and fixed cost of trade in China. Since this paper focuses on the balls-and-bins model, I drop export shipments below $500, which make up 11.83% of the total number of shipments and 0.04% of total export value. In table 5, I document the median shipment value of China's export products. Fewer than 1.8% of export products are dropped. [Table 3 Here] [Table 4 Here]


For the import data of China, I first drop shipment entries below $250, following the U.S. standard; these make up about 19.43% of the total number of shipments. Second, I drop entries that list China as the source country. The resulting dataset is used for the balls-and-bins exercise. [Table 5 Here]
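The two cleaning steps described above can be sketched as follows. This is a hypothetical illustration, not the paper's code: the record layout (dictionaries with `value` and `country` keys), the country code `"CHN"`, and the reading of "below" as strictly less than the threshold are all assumptions.

```python
def drop_small_shipments(shipments, threshold):
    """Drop shipments valued below `threshold`.

    Returns the kept shipments, the share of shipments dropped,
    and the share of total value dropped.
    """
    kept = [s for s in shipments if s["value"] >= threshold]
    total_value = sum(s["value"] for s in shipments)
    kept_value = sum(s["value"] for s in kept)
    share_count = (len(shipments) - len(kept)) / len(shipments)
    share_value = (total_value - kept_value) / total_value
    return kept, share_count, share_value


def drop_home_source(shipments, home="CHN"):
    """Drop import entries whose source country is the home country (assumed code)."""
    return [s for s in shipments if s["country"] != home]
```

Reporting both the count share and the value share mirrors the way the text describes the dropped export shipments (a sizeable share of entries but a very small share of value).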

3.3 Summary of China’s Exports and Imports

Table 6 summarizes the extensive margin of China’s imports and exports. For ease of comparison, I also cite the extensive margin of U.S. exports (2005) from Armenter, Koren (2008). China and the U.S. are similar traders in terms of the number of products, destination countries and firms. The U.S. has twice as many shipments as China. China’s import data are smaller in every dimension. [Table 6 Here]

4 Trade Partner, Trade Product

Among the 233 destination countries that China exports to, the top 5 make up 57.66% of China’s export value while the top 10 make up 69.55%. Among the 175 source countries that China imports from, the top 5 make up 51.53% of import value while the top 10 make up 66.06%. At the country level, export and import data are skewed toward a few big countries. [Table 7 Here] [Table 8 Here]

China’s top 5 export categories at the 2-digit level make up 54.39% of export value. Electronic machinery is the largest category. The top 5 import categories make up only 13.09% of import value. Roughly, along these two dimensions, China plays the role of the world’s factory, importing resources and exporting manufactured products. [Table 9 Here] [Table 10 Here]

In 2005, China exported 7054 types and imported 6951 types of HS 8-digit goods. I focus on 8-digit products for most of the analysis. There are many product categories at the 8-digit level; the biggest single category makes up only 3.79% of exports and 9.57% of imports.
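Concentration figures such as the top-5 and top-10 shares above can be computed from a list of trade values per partner or per product category. A minimal sketch, with a hypothetical function name and toy inputs:

```python
def top_share(values, n):
    """Share of the total accounted for by the n largest entries."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:n]) / sum(ordered)
```

For instance, `top_share([50, 30, 10, 5, 5], 2)` gives 0.8, meaning the two largest entries make up 80% of the total.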

5 Country-Product Combination

Most potential trade flows appear as zeros in the data. A zero export flow for a country-product combination arises when a product is exported from China to at least one country but not to the country in question. A zero export flow indicates either that the destination country has zero demand for the product or that high trade cost prevents the trade. Similarly, a zero import flow arises when a product is imported by China from at least one country but not from the country in question. A zero import flow indicates either that a potential source country does not supply the good or that high trade cost prevents the trade. Although balls-and-bins cannot distinguish the trade cost explanation from the no-supply or no-demand explanation, examining the zeros in trade flows helps us understand the gravity property of trade on the extensive margin.

5.1 Zeros at Country-Product Combination

The majority of country-product trade combinations are zeros. The more disaggregated the product classification, the more zeros appear. In the export data, zeros stop at the 2-digit level, which means that China exports a relatively wide range of products at the 2-digit level. In the import data, there are still many zeros even at the 2-digit level, which means that China imports from a relatively small range of suppliers.

To predict the number of empty bins of trade flows, I use trade shares (shares of the number of shipments) across products to build the product bins, and the number of shipments of each country as the number of balls; the reported results use this setup. I also check the results by building the bins from trade value shares, which assumes that shipments are uniformly distributed across trade value. The results are very similar in both cases. By assuming that countries buy or sell goods in the same proportions from and to China, the balls-and-bins model starts from matching the aggregate trade shares across products and trade partners. [Table 11 Here] [Table 12 Here]

Surprisingly, the balls-and-bins model captures the number of zeros well at different product levels for both imports and exports (see table 11 and table 12). The birthday paradox predicts that the first bin with more than one ball appears very early. Thus, to hit all 365 birthday bins, a number of balls much larger than 365 is needed. Although the number of shipments is larger than the number of country-product bins for both imports and exports, it is still not enough to fill the large number of bins. To hit a small bin, a much larger number of balls is needed. The U.S. data also show prevalent country-product zeros in both the data and the model.

Meanwhile, the balls-and-bins model systematically under-predicts the number of zeros to some extent at all levels except for import data at the 2-digit level. Why does balls-and-bins under-predict zeros? The model assumes that balls fall into bins independently. However, in the real world, shipments to a given country are positively correlated because of fixed trade costs. If China exports its first shipment to the U.S., the second shipment is more likely to go to the U.S. than to Mali because of the fixed cost of building a trade relationship. Adding positive correlation between balls falling into bins would generate more zeros. Capturing the correlation requires a theoretical trade model that explicitly models fixed cost.

Why does the balls-and-bins model generate more zeros than the data for imports at the 2-digit level? One possible argument is that trade risk plays a more important role for imports than for exports. China has more incentive to diversify its source countries than its destination countries. Capturing these features requires a much richer model.
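The comparison between zeros in the data and zeros in the model can be sketched as below. This is a hypothetical illustration of the procedure described in this subsection, not the paper's code: it assumes each shipment is given as a (country, product) pair, builds product bins from shipment-count shares, uses each country's shipment count as its number of balls, and applies equation (7).

```python
from collections import Counter


def zero_share_data_vs_model(records):
    """Return (observed share of zeros, model-predicted share of zeros).

    records: iterable of (country, product) pairs, one per shipment.
    """
    records = list(records)
    product_counts = Counter(product for _, product in records)
    country_counts = Counter(country for country, _ in records)
    total = len(records)

    # Product bins: share of each product in the total number of shipments.
    shares = [count / total for count in product_counts.values()]
    n_bins = len(product_counts) * len(country_counts)

    # Data: country-product pairs with at least one shipment.
    observed_nonempty = len(set(records))

    # Model: equation (7), with n_j the number of shipments of country j.
    expected_nonempty = sum(1.0 - (1.0 - s) ** n
                            for n in country_counts.values()
                            for s in shares)

    return 1 - observed_nonempty / n_bins, 1 - expected_nonempty / n_bins
```

Replacing the shipment-count shares with value shares would give the robustness check mentioned in the text; that variant is not shown here.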

5.2 Gravity Equation for Country-Product Combination

The gravity equation is very powerful in matching trade volume. Trade volume between countries can be decomposed into the types of goods traded, their prices and the quantities traded. Here, I focus on the first channel. Do gravity factors affect how many types of products China exports to or imports from a given country? [Figure 1 Here] [Figure 2 Here]

The graphs show that more types of products are traded with closer economies. The balls-and-bins model starts from the number of shipments across countries, which is highly correlated with gravity factors. It is therefore no surprise that it captures the incidence of non-zero trade flows well.

For estimation, I use the gravity equation to describe the effect of trade partners’ characteristics on the incidence of China’s non-zero trade. Real GDP and real GDP per capita for 2005 are taken from the Penn World Table. They are used as proxies for country-level demand and supply effects. Distance is the proxy for trade cost. The distance data are from the geographic dataset of CEPII (the main independent French institute for research into international economics), which uses the great circle distance. I divide distance into four groups at the natural gaps: 1–4,000km, 4,000–8,500km, 8,500–14,000km, and over 14,000km. The distance groups are listed in the appendix. I use China’s top 100 trade partners with available measures of distance and GDP, which make up the majority of China’s trade volume. The closer and richer the trade partner, the more types of goods are traded, which means the larger the incidence of non-zero trade flows. [Table 13 Here] [Table 14 Here]

The balls-and-bins model matches the gravity equation qualitatively in the signs of real GDP and distance. I take the number of shipments of each trade partner as the number of balls for each row and assume that each partner buys goods in the same proportions as China’s overall shipment shares across goods, which define the product bins. I then calculate the expected number of non-empty bins for each trade partner of China, which is the model’s estimate of the incidence of non-zero trade flows. I take the expected number of non-empty bins as the dependent variable and check its relationship with the gravity variables. Comparing the data results with the balls-and-bins prediction, the model matches well qualitatively for both imports and exports. Balls-and-bins under-predicts the distance effect found in the data. The same type of under-prediction also exists for the U.S. data. Why does balls-and-bins underestimate the influence of distance? It might be related to the assumption that countries buy or sell the same shares of products from and to China. Differences in product shares across countries are correlated with distance, an effect that the balls-and-bins model does not capture.
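The four distance groups can be encoded as dummy variables for the regression. The sketch below is an assumption-laden illustration: the function names are hypothetical, a distance exactly on a cutoff is assigned to the nearer group, and the nearest group is taken as the omitted base category. The paper does not state either convention in this section.

```python
from bisect import bisect_left

# Cutoffs in km taken from the text: 4,000, 8,500 and 14,000.
DISTANCE_CUTOFFS_KM = [4000, 8500, 14000]


def distance_group(distance_km):
    """Return the distance group (1 to 4); cutoff values fall in the nearer group."""
    return bisect_left(DISTANCE_CUTOFFS_KM, distance_km) + 1


def distance_dummies(distance_km):
    """Dummies for groups 2, 3 and 4, with group 1 as the assumed base category."""
    group = distance_group(distance_km)
    return [1 if group == g else 0 for g in (2, 3, 4)]
```

A partner 10,000 km away falls in group 3 and gets the dummy vector `[0, 1, 0]`.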


6 Firm-Country Combination

6.1 Zeros at Firm-Country Level

The number of trading firms is much larger than the number of traded products, which means there are many more bins for firm-country combinations than for product-country combinations. Meanwhile, the median firm exports to only three destination countries. Therefore, there are more firm-country zeros in the data. Firm bins are built from firm trade shares (shares of the number of shipments). The number of shipments for each country is used as the number of balls for each row. The balls-and-bins model captures the number of firm-country zeros well. Once again, it under-predicts the number of zeros. The firm-country result for the balls-and-bins model is robust across the U.S. and China. [Table 15 Here]

6.2 Gravity Equation of Firm-Country Combination

[Figure 3 Here] [Figure 4 Here]

From figure 3 and figure 4, we can see that fewer firms trade with more distant destination or source countries, controlling for GDP. I assume that firms do not differ in how they choose trade partners other than through the properties of the destination and source countries. The incidence of non-zero trade flows for firm-country combinations (the number of Chinese firms trading with a given partner / the total number of trading firms) is regressed on the gravity variables using a linear probability model. The data show that more firms trade with richer and closer trade partners, as listed in the first column of table 16 and table 17. [Table 16 Here]


[Table 17 Here] For the balls-and-bins model, the expected number of non-empty bins for each country can be calculated once the bins and the number of balls are known. The balls-and-bins estimate of the incidence of non-zero firm-country trade flows is the expected number of non-empty bins divided by the total number of firms. The prediction of balls-and-bins is qualitatively consistent with the finding in the data. The specious result of the table is the parameter in front of the distance dummy(8, 500