Advances in Patent Research: Data, Tools, and Policy
Academy Of Management PDW
August 2013, Orlando, Florida
Today’s workshop
• 2:30pm-3:00pm: Introduction and Fundamentals of Patent Research
  – Kwanghui Lim (with Michael Roach)
• 3:00pm-4:15pm: Presentations (20 minutes plus 5 minutes Q&A each)
  – Stuart Graham: USPTO as Data Provider
  – Lee Fleming: Tools for Patent Analysis
  – Kenneth Huang: Institutional regime shift in intellectual property rights and patenting strategies of firms in China
• 4:15pm-5:00pm: Future Directions for Patent Research and Panel Discussion
  – Rosemarie Ziedonis: Current state and future directions in patent research
  – Panel discussion (by presenters and discussant) and audience-driven Q&A
Introduction and Fundamentals of Patent Research
Kwanghui Lim, Melbourne Business School & IPRIA.ORG and Michael Roach, Duke University
August, 2013 2:30 – 3:00 pm
Agenda
• Brief introduction to patent data
  – What is a patent?
  – Why patent-based measures? Why not?
  – How is patent data used?
• The use of patent citations
  – Limitations of citation-based measures
  – Sources of measurement error
  – Lessons from Roach and Cohen
• Lessons from a personal example
  – Hsu & Lim’s ‘knowledge brokering’ paper (Organization Science, forthcoming)
I. BRIEF INTRODUCTION TO PATENT DATA
What is a Patent?
• A form of intellectual property
  – The right to exclude others for a limited period of time, in exchange for disclosing an invention.
  – Criteria: novel, useful, non-obvious.
  – Intangible, unlike a house or car.
  – Tradable: you can buy, sell or license a patent.
  – Granted at the national level (no such thing as an “international” patent). Extent of protection and enforcement varies by jurisdiction.
• Apart from being used to measure innovation-related activities, patents are interesting in and of themselves:
  – Recent controversy over patent thickets, gene patents, software patents, business process patents, etc.
Patents as Measures of Innovation
• Advantages of patent-based measures
  – Readily available and standardized
  – Archival across firms, industries, and over time
  – Can be used in creative ways to measure many phenomena
• Used extensively to measure…
  – Innovative outputs and firm innovative performance
  – Knowledge flows into firms, knowledge flows out of firms
  – Firm knowledge search strategies
  – Inventor mobility
  – Patenting activities, patenting strategies, signals of quality
  – Scientific human capital, collaboration ties
• Dramatic growth in use of patent data
  – Title/abstract/keyword search “patent” results: SMJ (463), Management Science (229), Research Policy (347), Organization Science (137), AMJ (138)
Legal versus Management Approaches
• In a legal sense, what matters most is the claims section.
• But for management research, the background and citation sections are quite important, containing information about:
  – Inventors, assignees (names, addresses).
  – Application and issue dates.
  – Technology classes.
  – Citations.
• Perhaps this is why patent databases are so messy: they were not designed for management research in the first place!
Anatomy of a Patent
[Figure: sample patent document with its patent references and non-patent references labeled.]
Claims for US patent 6673986: transgenic mouse
What is claimed is:
• 1. A transgenic mouse comprising in its germline a modified genome wherein said modification comprises inactivated endogenous immunoglobulin heavy chain loci in which all of the J segment genes from both copies of the immunoglobulin heavy chain locus are deleted to prevent rearrangement and to prevent formation of a transcript of a rearranged locus and the expression of an endogenous immunoglobulin heavy chain from the inactivated loci.
• 2. The mouse of claim 1 wherein said modification further comprises an inactivated endogenous immunoglobulin light chain locus in which all of the J segment genes from at least one copy of an immunoglobulin light chain locus are deleted to prevent rearrangement and to prevent formation of a transcript of a rearranged locus and the expression of an endogenous immunoglobulin light chain from the inactivated locus.
• 3. The mouse of claim 1 wherein said modification comprises inactivated endogenous immunoglobulin light chain loci in which all of the J segment genes from both copies of the immunoglobulin light chain locus are deleted to prevent rearrangement and to prevent formation of a transcript of a rearranged locus and the expression of an endogenous immunoglobulin light chain from the inactivated loci.
Resources
• Patent PDW reading list
  – http://patentpdw.wordpress.com
  – Contains a comprehensive reading list, links to patent databases and websites, and a historical record of AOM patent PDWs.
• Free datasets
  – NBER datafiles
    • … plus inventor matching data by Lee Fleming and others
  – USPTO, EPO and other patent offices. Ask Stuart Graham for details.
  – NUS-MBS
  – Patstat (global)
  – Google patents, including bulk source file downloads
• Commercial datasets
  – Thomson Reuters: Derwent/Delphion
  – Patsnap, etc.
II. THE USE OF CITATION BASED MEASURES
Citation Measures of Knowledge Flows
• Jaffe, Trajtenberg & Henderson (QJE 1993)
  – Seminal paper that established use of citations (to patents) as a “paper trail” to measure spillovers to firms
  – Key finding is that nonpecuniary spillovers are geographically localized
  – 4,663 cites on Google Scholar (2012). 5,352 cites as of Aug 2013.
• Uses of citation-based measures
  – Knowledge flows within and between firms (Almeida & Kogut 1999; Rosenkopf & Almeida 2003; Duguet & MacGarvie 2005; Singh 2005; Agarwal, Ganco & Ziedonis 2009; Singh & Agrawal 2011)
  – Knowledge flows from universities (Narin et al. 1997; Henderson et al. 1998; Mowery et al. 2002; Gittelman & Kogut 2003; Sorenson & Fleming 2004)
  – Knowledge flows across geographic boundaries (Duguet & MacGarvie 2005; Singh 2005; MacGarvie 2006; Huang & Li 2012)
Several Causes for Concern
• Not all inventions are patentable or patented
  – Unpatented inventions are different than patented inventions
  – Unpatented inventions may use different knowledge
  – Firms vary, even within industry, in their propensity to patent
• Not all knowledge flows are citable or cited
  – Tacit knowledge and know-how are not in citable form
  – Not all codified knowledge is publicly available (trade secrets)
  – Duty to disclose prior art is different from norms to cite publications
• Citations may be influenced by firm strategies
  – Appropriability strategies influence both patenting and citing
  – Citing strategies to strengthen patents against litigation
• Patent examiners add citations
  – >40% of citations are added by examiners; systematic difference between examiner-added and firm-added citations (Alcacer & Gittelman 2006; Alcacer, Gittelman & Sampat 2009)
Assumptions & Validity of Citations
• Key assumptions
  – Citations are a useful, but imperfect, measure of knowledge flows
  – Imperfections are simply “noise” and are not correlated with other observable or unobservable variables, nor with other variables unrelated to knowledge flows
  – Coefficient estimates likely attenuated, but unbiased
• Limited evidence on validity of assumptions
  – Jaffe, Fogarty & Banks (1998) and Jaffe, Trajtenberg & Fogarty (2002) find evidence of a systematic relationship between citations and knowledge flows, but much of the variation is unexplained and we know little about whether it is noise or systematic bias
• Critical questions:
  – Are citations noisy, or are they systematically biased?
  – If citations are biased, what are the sources of bias?
Roach and Cohen (Mgt Sci 2013)
• “Lens or Prism” paper
  – Evaluates patent citations as a measure of knowledge flows from public research
  – Matched patent-survey data
    • Compare backward citations to a different measure of knowledge flows, gathered using the Carnegie Mellon survey of R&D managers.
• Empirical Analysis
  – Estimate measurement model to explore where citations and survey measures agree and disagree, suggesting possible sources of measurement error
Measurement Error in Citations
• “True” knowledge flows: k = β1X1 + β2X2
  – Assume that some dimensions of knowledge flows are captured by citations (X1), and other dimensions are not (X2)
• Citations as a measure of k: kc = α1X1 + µc
  – If factors in X2 are correlated with true knowledge flows but not citations, then they are a potential source of error (errors of omission)
  – Citations may be correlated with factors not related to true knowledge flows (e.g., patent strategies), denoted as P, that are often omitted in regressions (errors of commission)
  – Thus, the composite error term is: µc = γcP – β2X2 + vc
  – Both X2 and P are possible sources of nonclassical measurement error that may bias not only estimates of kc but also other correlates of either knowledge flows or patenting
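To make the errors-of-commission point concrete, here is a minimal simulation sketch. It is not from Roach and Cohen. It assumes the composite error is read as the gap between citations and true flows, so that citations equal β1X1 + γcP + vc. All coefficient values, the sample size, and the firm trait `z` are hypothetical. The trait drives citing strategy (P) but has no relation to true knowledge flows, yet it shows up as a predictor of citations.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate(n=20000, gamma_c=0.5, seed=0):
    """Return (slope of true flows on z, slope of citations on z).

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    z, k, kc = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)  # dimensions of knowledge flows citations capture
        x2 = rng.gauss(0, 1)  # dimensions citations miss (errors of omission)
        p = rng.gauss(0, 1)   # citing strategy, unrelated to knowledge flows
        v = rng.gauss(0, 1)   # classical noise
        z.append(p)           # hypothetical firm trait that only drives P
        k.append(1.0 * x1 + 1.0 * x2)          # true flows: b1*X1 + b2*X2
        kc.append(1.0 * x1 + gamma_c * p + v)  # citations: b1*X1 + gamma_c*P + v
    return ols_slope(z, k), ols_slope(z, kc)

slope_true, slope_cites = simulate()
print(round(slope_true, 2), round(slope_cites, 2))
```

Under these assumptions the first slope should be close to zero and the second close to `gamma_c`: a regression using citations would credit the trait with an effect on knowledge flows that it does not have.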
Data
• Matched patent-survey data
  – Carnegie Mellon Survey (CMS) matched to NBER/Delphion patent data and Science Citation Index (SCI) publications for non-patent references.
  – We assume that citations and CMS are independent measures of knowledge flows and not subject to the same sources of error
• Measures of knowledge flows (LHS variables)
  – Citations to patent references (PR) assigned to university, federal lab, or research institute (count)
  – Citations to non-patent references (NPR) where author is affiliated with university, federal lab, or research institute (count)
  – Survey-reported fraction of R&D projects that use public research (5-point categorical scale)
• Objectives of analysis
  – Identify elements of X1 (variables citations should reflect and do)
  – Identify elements of X2 (variables citations should reflect but do not)
  – Identify elements of P (variables citations should not reflect but do)
Mean Comparison by Industry
[Figure: scatter plot by industry. Vertical axis: % patents that cite public research (patent), 0% to 100%. Horizontal axis: % R&D projects that use public research (survey), 0% to 100%, rescaled to midpoints for comparison. Industries shown: Biotechnology, Pharmaceuticals, Aerospace, Medical Devices, Chemicals, Semiconductors/Computers, Automobiles.]
Independent Variables
• Variables correlated with knowledge flows (X1, X2)
  – Channels of knowledge flows
    • Open science, private/contract-based interactions, industrial S&E PhDs
  – Uses of knowledge flows in firm R&D
    • Suggesting new R&D projects or solutions to existing R&D projects
  – Composition of firm R&D activity
    • Basic research
• Variables not correlated with knowledge flows (P)
  – Appropriability mechanisms
    • Effectiveness of patents and secrecy
  – Strategic citing practices
    • Firm citing propensity
Summary of Results
• What citations capture (X1)
  – Knowledge flows through channels of open science (e.g., publications) that tend to suggest new project ideas for a firm’s applied research (i.e., open, codified knowledge with direct impact)
• What citations don’t capture, but should (X2, errors of omission)
  – Knowledge flows through private, contract-based relationships, especially those used to solve technical problems (i.e., undisclosed documents, tacit knowledge and know-how with indirect impact)
  – Knowledge flows to firm basic research activity (i.e., not patented)
• What citations capture, but shouldn’t (P, errors of commission)
  – Differences in appropriability and firms’ citing strategies
  – Citation practices tied more to scientific norms to cite prior research may overstate the actual use of public research
Implications for Citation Measures
• General findings from assessment of citations
  – (Backward) patent citations are not simply noisy measures of knowledge flows but also biased, although the magnitude will vary with specification
• Errors of Omission
  – Dimensions of knowledge flows that citations do not reflect but should, such as tacit knowledge and intermediate knowledge inputs to firm R&D
  – Likely correlated with other predictors of innovative performance, and thus are potential sources of omitted variable bias
• Errors of Commission
  – Factors associated with patenting and citing strategies that are not related to knowledge flows
  – Potential source of nonclassical measurement error that may bias estimates in either direction, especially when patent-based measures are used on both the LHS (citation-weighted patent counts) and RHS (backward citations)
Implications
• Sources of errors of omission and commission should guide thinking about limitations in citations and possible controls
  – To the extent that geographically proximate knowledge flows are tacit or not cited, citations likely understate proximity effects and may be systematically biased
• Scholars should give careful consideration to what citations can and cannot measure and interpret results accordingly
• Use citations as specific, rather than broad, measures and discuss possible limitations
III. LESSONS FROM HSU & LIM PAPER ON ‘KNOWLEDGE BROKERING’
Knowledge Brokering
• Hsu and Lim, “Knowledge Brokering and Organizational Innovation: Founder Imprinting Effects”
  – First presented at this AOM PDW circa 2005!
  – Forthcoming, Organization Science, after many revisions and improvements.
  – A difficult journey but we hope the lessons we learned will be useful to you.
• Knowledge brokering is the ability to effectively apply knowledge from one technical domain to innovate in another.
  – Do founding conditions shape its use in firms?
  – What are the organizational consequences (citations)?
“I gave that Citroen suspension system as a quiz in acoustics to my students," Dr. Bose says. "The same mathematical principles apply.” Source: http://www.automobilemag.com/news/0410_bose/
The approach
• We investigate how organizational innovation outcomes vary by founders’ initial mode of venture ideation.
  – Compare how firms started with knowledge brokering-based ideation differ in their methods of sustaining ongoing knowledge brokering capacity as compared to firms not started in such a manner.
  – Track all the start-up biotechnology firms founded to commercialize the then-emergent rDNA technology (initial knowledge brokers) together with contemporaneously founded biotechnology firms that did not license rDNA technology (initial non-brokers).
Key results
A. Ongoing knowledge brokering has an inverted U-shaped relationship with innovative performance in general
  – More is not necessarily better.
B. Initial knowledge brokers have a positive imprinting effect on their organizations’ search patterns over time, resulting in superior performance relative to non-brokers
  – The means by which entrepreneurial opportunity identification takes place has long-lasting performance consequences.
C. Initial non-brokers rely more on external channels of sourcing knowledge, such as hiring technical staff, relative to initial brokers, reinforcing the imprinting interpretation
Results
[Figure: line plot. Vertical axis: predicted no. of forward citations within 5 years (0 to 3). Horizontal axis: ongoing firm knowledge brokering (0 to 1). Curves: initial brokers, initial non-brokers.]
Measuring knowledge brokering
• Premise: the extent to which patents and firms cite patents in different technical areas indicates the degree of knowledge brokering from those areas.
  – Firm-level knowledge brokering: for firm i in year t, firm knowledge brokering is defined as (the number of backward citations to patents in primary US classes firm i did not patent in during year t) divided by (the number of backward citations made by firm i in year t).
  – Firm knowledge brokering stock: starting from its founding year, each firm’s knowledge brokering stock is calculated as the cumulative sum over previous years of (firm knowledge brokering_it × number of patents_it).
    • We include an exponential depreciation parameter in computing these stocks to accommodate the possibility of a degree of organizational “forgetting” over time. We vary the depreciation parameter from 0 to 20% to test robustness, in line with the 20% rate used by Macher and Boerner for the pharmaceutical industry and the 15% depreciation rate for patent stocks used by Hall et al. (2005).
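The two definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors’ code. The data layout (one primary class per patent, one primary class per cited patent), the class codes in the example, and the recursion stock_t = (1 - d) × stock_{t-1} + brokering_t × patents_t are all assumptions. The slide does not say whether the current year enters the stock or exactly how depreciation is applied.

```python
def firm_knowledge_brokering(firm_classes, cited_classes):
    """Share of a firm-year's backward citations that go to patents whose
    primary class is outside the classes the firm patented in that year."""
    if not cited_classes:
        return 0.0  # assumption: no backward citations means no brokering
    own = set(firm_classes)
    outside = sum(1 for c in cited_classes if c not in own)
    return outside / len(cited_classes)

def brokering_stock(yearly, depreciation=0.15):
    """Depreciated running sum of (brokering_it * number of patents_it).

    `yearly` maps year -> (firm_classes, cited_classes), where firm_classes
    holds one primary class per patent the firm applied for in that year.
    Years with no entry are treated as years with no patents.
    """
    stock = 0.0
    result = {}
    for year in range(min(yearly), max(yearly) + 1):
        firm_classes, cited_classes = yearly.get(year, ([], []))
        flow = firm_knowledge_brokering(firm_classes, cited_classes) * len(firm_classes)
        stock = (1 - depreciation) * stock + flow  # assumed recursion
        result[year] = stock
    return result

# Hypothetical firm: two patents per year, four backward citations per year.
example = {
    1980: (["435", "435"], ["435", "424", "530", "435"]),
    1981: (["435", "424"], ["435", "424", "530", "530"]),
}
stocks = brokering_stock(example, depreciation=0.15)
print(stocks)
```

Setting `depreciation=0` gives the plain cumulative sum, which is one end of the 0 to 20% robustness range mentioned above.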
Patent-related methodological notes
• Use of a new citation-based measure.
  – Earlier versions of the manuscript also include a vector-based measure. Many robustness checks.
• Identifying and exploiting an appropriate empirical setting: firms founded to commercialize the Cohen-Boyer invention, compared to a sample of firms that were not set up as brokers.
• A new measure of technical labor mobility based on prior tech overlap.
• Overlap with initial technology focus, defined as the share of a firm’s patents in the same technology classes as those applied for in its first three years since founding.
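As a rough sketch of the overlap measure in the last bullet, the function below computes the share of a firm’s later patents whose class also appears among its first-three-year patents. The input layout, the choice to compute the share over patents applied for after the initial window, and the example data are assumptions. The slide does not specify these details.

```python
def initial_focus_overlap(patents, founding_year, window=3):
    """Share of later patents in a class the firm used in its initial window.

    `patents` is a list of (application_year, primary_class) pairs.
    Returns None when the firm has no patents after the initial window.
    """
    initial_classes = {
        cls for year, cls in patents
        if founding_year <= year < founding_year + window
    }
    later = [cls for year, cls in patents if year >= founding_year + window]
    if not later:
        return None
    return sum(1 for cls in later if cls in initial_classes) / len(later)

# Hypothetical firm founded in 1980; initial classes are 435 and 424.
patents = [
    (1980, "435"), (1981, "424"),
    (1983, "435"), (1984, "530"), (1985, "424"), (1986, "514"),
]
overlap = initial_focus_overlap(patents, founding_year=1980)
print(overlap)
```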
Key Lessons from our journey (1)
• Facing familiar challenges in patent research
  – Controls for many other patent measures including patent originality, the Fleming-Sorenson complexity and ease of recombination measures, citations to non-patent references, wholesale vs partial brokering.
  – Use linked databases at multiple levels of analysis:
    • Firm: alliance, venture capital, and founder data, therapeutics.
    • Patent level: technology classes, dates.
    • Inventor level data: identity, mobility.
    • Backward and forward citations: dyads and removing self-cites.
  – Matching names of inventors across patents (hard!)
  – High computational intensity of database; costly to make changes during R&R if you are not entirely organized.
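Removing self-cites from citation dyads, mentioned in the list above, can be sketched as follows. The patent identifiers, the assignee lookup table, and the rule that a dyad counts as a self-citation only when both assignees are known and identical are assumptions for illustration. Real assignee names usually need cleaning before they can be compared this way.

```python
def remove_self_citations(dyads, assignee_of):
    """Drop (citing, cited) dyads where both patents share an assignee.

    `dyads` is an iterable of (citing_patent, cited_patent) pairs and
    `assignee_of` maps a patent id to a cleaned assignee name.
    Dyads with an unknown assignee on either side are kept.
    """
    kept = []
    for citing, cited in dyads:
        citing_firm = assignee_of.get(citing)
        cited_firm = assignee_of.get(cited)
        if citing_firm is not None and citing_firm == cited_firm:
            continue  # same assignee on both sides: a self-citation
        kept.append((citing, cited))
    return kept

# Hypothetical patents and firms.
assignee_of = {"P1": "FirmA", "P2": "FirmA", "P3": "FirmB"}
dyads = [("P1", "P2"), ("P1", "P3"), ("P3", "P4")]
external = remove_self_citations(dyads, assignee_of)
print(external)
```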
Key Lessons from our journey (2)
• Simple measures have to convince
  – Using a simple measure, based on proportion of prior cites in the same class, versus the angular measure based on vectors in earlier drafts.
• Tie patent data to non-patent data
  – We use a rich set of alternates including comprehensive firm-level profiles. It isn’t just a ‘patent’ study although we use patent data extensively.
  – It is also not just about large samples: ours is a study of a small number of firms (but many patents & citations).
    • But we know the story of each one of them and are confident of what is really happening “behind the data”.
• Ruling out alternative explanations
  – We use all sorts of techniques to rule out counter-arguments, including threshold regressions, controls for self-citations, measures for patent thickets, hand-collected firm histories (how do we know firm x really did not broker?).
  – I.e., just showing a correlation is not enough!
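For readers unfamiliar with the contrast between a simple proportion measure and an angular one, here is a hedged sketch. The slides do not define the angular measure from earlier drafts. One common construction, assumed here, is the angle between a firm’s own class-count vector and the class-count vector of the patents it cites, scaled to lie between 0 and 1. The function names and the example vectors are hypothetical.

```python
import math

def same_class_share(own_classes, cited_classes):
    """Simple measure: proportion of cited patents in the firm's own classes."""
    if not cited_classes:
        return None
    own = set(own_classes)
    return sum(1 for c in cited_classes if c in own) / len(cited_classes)

def angular_distance(own_counts, cited_counts):
    """Assumed angular measure: angle between two class-count vectors,
    divided by 90 degrees so that 0 means same direction and 1 means
    no classes in common. Inputs are dicts of class -> count."""
    dot = sum(n * cited_counts.get(cls, 0) for cls, n in own_counts.items())
    norm_own = math.sqrt(sum(n * n for n in own_counts.values()))
    norm_cited = math.sqrt(sum(n * n for n in cited_counts.values()))
    if norm_own == 0 or norm_cited == 0:
        return None
    cosine = min(1.0, dot / (norm_own * norm_cited))  # guard against rounding
    return math.acos(cosine) / (math.pi / 2)

share = same_class_share(["435"], ["435", "424", "530", "435"])
same_direction = angular_distance({"435": 2, "424": 1}, {"435": 2, "424": 1})
no_overlap = angular_distance({"435": 3}, {"530": 4})
print(share, same_direction, no_overlap)
```

The proportion can be explained in one sentence, which is the point of the lesson above. The angular version uses more information about the distribution of classes but is harder to explain to readers.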
Finally
• Patent data may have limitations, but it remains valuable.
  – E.g., citations may be problematic but sometimes the alternative is no measure at all. Even if a survey is possible, it may also be subject to biases, e.g., recall bias.
  – E.g., Kapoor & Lim (AMJ, 2007): we trace inventors after their firms get acquired using patent data. It would be hard to do this almost any other way, and we are able to offer one of the few extant studies that looks at acquisitions from the acquired firm’s point of view, rather than the acquiring firm’s point of view.