Advances in Patent Research: Data, Tools, and Policy


Academy of Management PDW

August 2013, Orlando, Florida

Today’s workshop

•  2:30pm-3:00pm: Introduction and Fundamentals of Patent Research
   –  Kwanghui Lim (with Michael Roach)
•  3:00pm-4:15pm: Presentations (20 minutes plus 5 minutes Q&A each)
   –  Stuart Graham: USPTO as Data Provider
   –  Lee Fleming: Tools for Patent Analysis
   –  Kenneth Huang: Institutional regime shift in intellectual property rights and patenting strategies of firms in China
•  4:15pm-5:00pm: Future Directions for Patent Research and Panel Discussion
   –  Rosemarie Ziedonis: Current state and future directions in patent research
   –  Panel discussion (by presenters and discussant) and audience-driven Q&A

Introduction and Fundamentals of Patent Research

Kwanghui Lim, Melbourne Business School & IPRIA.ORG
and
Michael Roach, Duke University

August 2013, 2:30pm-3:00pm

Agenda

•  Brief introduction to patent data
   –  What is a patent?
   –  Why patent-based measures? Why not?
   –  How is patent data used?
•  The use of patent citations
   –  Limitations of citation-based measures
   –  Sources of measurement error
   –  Lessons from Roach and Cohen
•  Lessons from a personal example
   –  Hsu & Lim’s ‘knowledge brokering’ paper (Organization Science, forthcoming)

I. BRIEF INTRODUCTION TO PATENT DATA

What is a Patent?

•  A form of intellectual property
   –  The right to exclude others for a limited period of time, in exchange for disclosing an invention.
   –  Criteria: novel, useful, non-obvious.
   –  Intangible, unlike a house or car.
   –  Tradable: you can buy, sell or license a patent.
   –  Granted at the national level (no such thing as an “international” patent). Extent of protection and enforcement varies by jurisdiction.
•  Apart from being used to measure innovation-related activities, patents are interesting in and of themselves:
   –  Recent controversy over patent thickets, gene patents, software patents, business process patents, etc.

Patents as Measures of Innovation

•  Advantages of patent-based measures
   –  Readily available and standardized
   –  Archival across firms, industries, and over time
   –  Can be used in creative ways to measure many phenomena
•  Used extensively to measure…
   –  Innovative outputs and firm innovative performance
   –  Knowledge flows into firms, knowledge flows out of firms
   –  Firm knowledge search strategies
   –  Inventor mobility
   –  Patenting activities, patenting strategies, signals of quality
   –  Scientific human capital, collaboration ties
•  Dramatic growth in use of patent data
   –  Title/abstract/keyword search results for “patent”: SMJ (463), Management Science (229), Research Policy (347), Organization Science (137), AMJ (138)

Legal versus Management Approaches

•  In a legal sense, what matters most is the claims section.
•  But for management research, the background and citation sections are quite important, containing information about:
   –  Inventors, assignees (names, addresses).
   –  Application and issue dates.
   –  Technology classes.
   –  Citations.
•  Perhaps this is why patent databases are so messy: they were not designed for management research in the first place!

Anatomy of a Patent

[Figure: annotated patent document highlighting the patent references and the non-patent references.]

Claims for US patent 6673986: transgenic mouse

What is claimed is:
•  1. A transgenic mouse comprising in its germline a modified genome wherein said modification comprises inactivated endogenous immunoglobulin heavy chain loci in which all of the J segment genes from both copies of the immunoglobulin heavy chain locus are deleted to prevent rearrangement and to prevent formation of a transcript of a rearranged locus and the expression of an endogenous immunoglobulin heavy chain from the inactivated loci.
•  2. The mouse of claim 1 wherein said modification further comprises an inactivated endogenous immunoglobulin light chain locus in which all of the J segment genes from at least one copy of an immunoglobulin light chain locus are deleted to prevent rearrangement and to prevent formation of a transcript of a rearranged locus and the expression of an endogenous immunoglobulin light chain from the inactivated locus.
•  3. The mouse of claim 1 wherein said modification comprises inactivated endogenous immunoglobulin light chain loci in which all of the J segment genes from both copies of the immunoglobulin light chain locus are deleted to prevent rearrangement and to prevent formation of a transcript of a rearranged locus and the expression of an endogenous immunoglobulin light chain from the inactivated loci.

Resources

•  Patent PDW reading list
   –  http://patentpdw.wordpress.com
   –  Contains a comprehensive reading list, links to patent databases and websites, and a historical record of AOM patent PDWs.
•  Free datasets
   –  NBER datafiles
      •  … plus inventor matching data by Lee Fleming and others
   –  USPTO, EPO and other patent offices. Ask Stuart Graham for details.
   –  NUS-MBS
   –  Patstat (global)
   –  Google Patents, including bulk source file downloads
•  Commercial datasets:
   –  Thomson Reuters: Derwent/Delphion
   –  Patsnap, etc.

II. THE USE OF CITATION-BASED MEASURES

Citation Measures of Knowledge Flows

•  Jaffe, Trajtenberg & Henderson (QJE 1993)
   –  Seminal paper that established the use of citations (to patents) as a “paper trail” to measure spillovers to firms
   –  Key finding is that nonpecuniary spillovers are geographically localized
   –  4,663 cites on Google Scholar (2012); 5,352 cites as of Aug 2013
•  Uses of citation-based measures
   –  Knowledge flows within and between firms (Almeida & Kogut 1999; Rosenkopf & Almeida 2003; Duguet & MacGarvie 2005; Singh 2005; Agarwal, Ganco & Ziedonis 2009; Singh & Agrawal 2011)
   –  Knowledge flows from universities (Narin et al. 1997; Henderson et al. 1998; Mowery et al. 2002; Gittelman & Kogut 2003; Sorenson & Fleming 2004)
   –  Knowledge flows across geographic boundaries (Duguet & MacGarvie 2005; Singh 2005; MacGarvie 2006; Huang & Li 2012)

Several Causes for Concern

•  Not all inventions are patentable or patented
   –  Unpatented inventions differ from patented inventions
   –  Unpatented inventions may use different knowledge
   –  Firms vary, even within an industry, in their propensity to patent
•  Not all knowledge flows are citable or cited
   –  Tacit knowledge and know-how are not in citable form
   –  Not all codified knowledge is publicly available (trade secrets)
   –  The duty to disclose prior art is different from norms to cite publications
•  Citations may be influenced by firm strategies
   –  Appropriability strategies influence both patenting and citing
   –  Citing strategies to strengthen patents against litigation
•  Patent examiners add citations
   –  >40% of citations are added by examiners; there are systematic differences between examiner-added and firm-added citations (Alcacer & Gittelman 2006; Alcacer, Gittelman & Sampat 2009)

Assumptions & Validity of Citations

•  Key assumptions
   –  Citations are a useful, but imperfect, measure of knowledge flows
   –  Imperfections are simply “noise” and are not correlated with other observable or unobservable variables, nor with other variables unrelated to knowledge flows
   –  Coefficient estimates are likely attenuated, but not otherwise biased
•  Limited evidence on validity of assumptions
   –  Jaffe, Fogarty & Banks (1998) and Jaffe, Trajtenberg & Fogarty (2002) find evidence of a systematic relationship between citations and knowledge flows, but much of the variation is unexplained and we know little about whether it is noise or systematic bias
•  Critical questions:
   –  Are citations noisy, or are they systematically biased?
   –  If citations are biased, what are the sources of bias?

Roach and Cohen (Mgt Sci 2013)

•  “Lens or Prism” paper
   –  Evaluates patent citations as a measure of knowledge flows from public research
   –  Matched patent-survey data
      •  Compares backward citations to a different measure of knowledge flows, gathered using the Carnegie Mellon survey of R&D managers.
•  Empirical analysis
   –  Estimates a measurement model to explore where citations and survey measures agree and disagree, suggesting possible sources of measurement error

Measurement Error in Citations

•  “True” knowledge flows: $k = \beta_1 X_1 + \beta_2 X_2$
   –  Assume that some dimensions of knowledge flows are captured by citations ($X_1$), and other dimensions are not ($X_2$)
•  Citations as a measure of $k$: $k_c = \alpha_1 X_1 + \mu_c$
   –  If factors in $X_2$ are correlated with true knowledge flows but not citations, then they are a potential source of error (errors of omission)
   –  Citations may be correlated with factors not related to true knowledge flows (e.g., patent strategies), denoted as $P$, that are often omitted in regressions (errors of commission)
   –  Thus, the composite error term is: $\mu_c = \gamma_c P - \beta_2 X_2 + v_c$
   –  Both $X_2$ and $P$ are possible sources of nonclassical measurement error that may bias not only the estimated effect of $k_c$ but also the estimated effects of other correlates of either knowledge flows or patenting
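The difference between classical noise and nonclassical error can be illustrated with a small simulation. This is only a sketch under made-up assumptions: unit variances, a true effect of 1, and a strategy factor P that raises both citations and the outcome. None of the numbers or names (`simulate`, `strategy_effect`, and so on) come from Roach and Cohen.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate(n=20000, true_effect=1.0, strategy_effect=3.0, seed=0):
    """Compare slopes when citations carry classical vs. nonclassical error.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    k = [rng.gauss(0, 1) for _ in range(n)]  # "true" knowledge flows
    p = [rng.gauss(0, 1) for _ in range(n)]  # patenting/citing strategy (P)
    # Outcome (e.g., citation-weighted patents) depends on k and also on P.
    y = [true_effect * ki + strategy_effect * pi + rng.gauss(0, 1)
         for ki, pi in zip(k, p)]
    # Classical case: citations = true flows + pure noise.
    kc_classical = [ki + rng.gauss(0, 1) for ki in k]
    # Nonclassical case: citations also load on P (an error of commission).
    kc_nonclassical = [ki + pi + rng.gauss(0, 1) for ki, pi in zip(k, p)]
    return {
        "true_k": ols_slope(k, y),
        "classical": ols_slope(kc_classical, y),
        "nonclassical": ols_slope(kc_nonclassical, y),
    }

slopes = simulate()
print(slopes)
```

Under these assumptions, pure noise pulls the citation coefficient toward zero, whereas letting citations load on P can push it above the true effect, which is one way estimates may be biased in either direction.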

Data

•  Matched patent-survey data
   –  Carnegie Mellon Survey (CMS) matched to NBER/Delphion patent data and Science Citation Index (SCI) publications for non-patent references.
   –  We assume that citations and CMS are independent measures of knowledge flows and not subject to the same sources of error
•  Measures of knowledge flows (LHS variables)
   –  Citations to patent references (PR) assigned to a university, federal lab, or research institute (count)
   –  Citations to non-patent references (NPR) where the author is affiliated with a university, federal lab, or research institute (count)
   –  Survey-reported fraction of R&D projects that use public research (5-point categorical scale)
•  Objectives of analysis
   –  Identify elements of $X_1$ (variables citations should reflect and do)
   –  Identify elements of $X_2$ (variables citations should reflect but do not)
   –  Identify elements of $P$ (variables citations should not reflect but do)

Mean Comparison by Industry

[Figure: scatter plot by industry. Vertical axis: % patents that cite public research (patent data), 0% to 100%. Horizontal axis: % R&D projects that use public research (survey), 0% to 100%, rescaled to midpoints for comparison. Industries plotted: Biotechnology, Pharmaceuticals, Aerospace, Medical Devices, Chemicals, Semiconductors/Computers, Automobiles.]

Independent Variables

•  Variables correlated with knowledge flows ($X_1$, $X_2$)
   –  Channels of knowledge flows
      •  Open science, private/contract-based interactions, industrial S&E PhDs
   –  Uses of knowledge flows in firm R&D
      •  Suggesting new R&D projects or solutions to existing R&D projects
   –  Composition of firm R&D activity
      •  Basic research
•  Variables not correlated with knowledge flows ($P$)
   –  Appropriability mechanisms
      •  Effectiveness of patents and secrecy
   –  Strategic citing practices
      •  Firm citing propensity

Summary of Results

•  What citations capture ($X_1$)
   –  Knowledge flows through channels of open science (e.g., publications) that tend to suggest new project ideas for a firm’s applied research (i.e., open, codified knowledge with direct impact)
•  What citations don’t capture, but should ($X_2$, errors of omission)
   –  Knowledge flows through private, contract-based relationships, especially those used to solve technical problems (i.e., undisclosed documents, tacit knowledge and know-how with indirect impact)
   –  Knowledge flows to firm basic research activity (i.e., not patented)
•  What citations capture, but shouldn’t ($P$, errors of commission)
   –  Differences in appropriability and firms’ citing strategies
   –  Citation practices tied more to scientific norms to cite prior research, which may overstate the actual use of public research

Implications for Citation Measures

•  General findings from assessment of citations
   –  (Backward) patent citations are not simply noisy measures of knowledge flows but also biased, although the magnitude will vary with specification
•  Errors of omission
   –  Dimensions of knowledge flows that citations do not reflect but should, such as tacit knowledge and intermediate knowledge inputs to firm R&D
   –  Likely correlated with other predictors of innovative performance, and thus are potential sources of omitted variable bias
•  Errors of commission
   –  Factors associated with patenting and citing strategies that are not related to knowledge flows
   –  Potential source of nonclassical measurement error that may bias estimates in either direction, especially when patent-based measures are used on both the LHS (citation-weighted patent counts) and RHS (backward citations)

Implications

•  Sources of errors of omission and commission should guide thinking about limitations in citations and possible controls
   –  To the extent that geographically proximate knowledge flows are tacit or not cited, citations likely understate proximity effects and may be systematically biased
•  Scholars should give careful consideration to what citations can and cannot measure and interpret results accordingly
•  Use citations as specific, rather than broad, measures and discuss possible limitations

III. LESSONS FROM HSU & LIM PAPER ON ‘KNOWLEDGE BROKERING’

Knowledge Brokering

•  Hsu and Lim, “Knowledge Brokering and Organizational Innovation: Founder Imprinting Effects”
   –  First presented at this AOM PDW circa 2005!
   –  Forthcoming in Organization Science after many revisions and improvements.
   –  A difficult journey, but we hope the lessons we learned will be useful to you.
•  Knowledge brokering is the ability to effectively apply knowledge from one technical domain to innovate in another.
   –  Do founding conditions shape its use in firms?
   –  What are the organizational consequences (citations)?

“I gave that Citroen suspension system as a quiz in acoustics to my students,” Dr. Bose says. “The same mathematical principles apply.”
Source: http://www.automobilemag.com/news/0410_bose/

The approach

•  We investigate how organizational innovation outcomes vary by founders’ initial mode of venture ideation.
   –  Compare how firms started with knowledge brokering-based ideation differ from firms not started in such a manner in their methods of sustaining ongoing knowledge brokering capacity.
   –  Track all the start-up biotechnology firms founded to commercialize the then-emergent rDNA technology (initial knowledge brokers) together with contemporaneously founded biotechnology firms that did not license rDNA technology (initial non-brokers).

Key results

A.  Ongoing knowledge brokering has an inverted U-shaped relationship with innovative performance in general
   –  More is not necessarily better.
B.  Initial knowledge brokers have a positive imprinting effect on their organizations’ search patterns over time, resulting in superior performance relative to non-brokers
   –  The means by which entrepreneurial opportunity identification takes place has long-lasting performance consequences
C.  Initial non-brokers rely more on external channels of sourcing knowledge, such as hiring technical staff, relative to initial brokers, reinforcing the imprinting interpretation

Results

[Figure: predicted number of forward citations within 5 years (vertical axis, 0 to 3) plotted against ongoing firm knowledge brokering (horizontal axis, 0 to 1), with separate curves for initial brokers and initial non-brokers.]

Measuring knowledge brokering

•  Premise: the extent to which patents and firms cite patents in different technical areas indicates the degree of knowledge brokering from those areas.
   –  Firm-level knowledge brokering: For firm i in year t, firm knowledge brokering is defined as (the number of backward citations to patents in primary US classes firm i did not patent in during year t) divided by (the number of backward citations made by firm i in year t)
   –  Firm knowledge brokering stock: Starting from its founding year, each firm’s knowledge brokering stock is calculated as the cumulative sum over previous years of ($\text{firm knowledge brokering}_{it} \times \text{number of patents}_{it}$)
      •  We include an exponential depreciation parameter in computing these stocks to accommodate the possibility of a degree of organizational “forgetting” over time. We vary the depreciation parameter from 0 to 20% to test robustness, in line with the 20% rate used by Macher and Boerner for the pharmaceutical industry and the 15% depreciation rate for patent stocks used by Hall et al. (2005).
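The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout (`year`, `primary_class`, `cited_classes`), the function names, the class codes in the toy data, the treatment of years with no backward citations (ratio set to 0), and the discount factor (1 - depreciation) raised to the number of elapsed years are all assumptions made for the example.

```python
from collections import defaultdict

def yearly_brokering(patents):
    """Return {year: (brokering_ratio, n_patents)} for one firm.

    patents: list of dicts with hypothetical keys 'year', 'primary_class'
    and 'cited_classes' (primary classes of the patents each one cites).
    """
    by_year = defaultdict(list)
    for pat in patents:
        by_year[pat["year"]].append(pat)
    result = {}
    for year, pats in by_year.items():
        # Classes the firm itself patented in during this year.
        own_classes = {p["primary_class"] for p in pats}
        cited = [c for p in pats for c in p["cited_classes"]]
        outside = sum(1 for c in cited if c not in own_classes)
        # Assumption: a year with no backward citations gets a ratio of 0.
        ratio = outside / len(cited) if cited else 0.0
        result[year] = (ratio, len(pats))
    return result

def brokering_stock(yearly, year, depreciation=0.15):
    """Depreciated sum over years before `year` of ratio x patent count."""
    stock = 0.0
    for s, (ratio, n_patents) in yearly.items():
        if s < year:
            # Assumption: each elapsed year discounts by (1 - depreciation).
            stock += ratio * n_patents * (1 - depreciation) ** (year - s)
    return stock

# Toy data for a single firm; class codes are illustrative only.
firm_patents = [
    {"year": 1980, "primary_class": "435", "cited_classes": ["435", "536", "424"]},
    {"year": 1980, "primary_class": "536", "cited_classes": ["435", "514"]},
    {"year": 1981, "primary_class": "435", "cited_classes": ["435", "435", "530", "424"]},
]
yearly = yearly_brokering(firm_patents)
stock_no_decay = brokering_stock(yearly, 1982, depreciation=0.0)
stock_decay = brokering_stock(yearly, 1982, depreciation=0.2)
print(yearly, stock_no_decay, stock_decay)
```

Varying `depreciation` between 0 and 0.2 mirrors the robustness range described on the slide; whether the current year enters the stock is a design choice that the slide does not settle.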

Patent-related methodological notes

•  Use of a new citation-based measure.
   –  Earlier versions of the manuscript also included a vector-based measure. Many robustness checks.
•  Identifying and exploiting an appropriate empirical setting: firms founded to commercialize the Cohen-Boyer invention, compared to a sample of firms that were not set up as brokers.
•  A new measure of technical labor mobility based on prior tech overlap.
•  Overlap with initial technology focus, defined as the share of a firm’s patents in the same technology classes as those applied for in its first three years since founding.
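The overlap measure in the last bullet could be computed along the following lines. This is a hedged sketch: the function name, the tuple layout, and the choice to compute the share over patents applied for after the initial three-year window are assumptions, since the slide does not specify the denominator.

```python
def initial_focus_overlap(patents, founding_year, window=3):
    """Share of later patents falling in the firm's initial technology classes.

    patents: list of (application_year, primary_class) tuples for one firm.
    Returns None when the firm has no patents after the initial window.
    """
    # Classes of patents applied for in the first `window` years.
    initial_classes = {cls for yr, cls in patents
                       if founding_year <= yr < founding_year + window}
    # Assumption: the share is taken over patents applied for afterwards.
    later = [cls for yr, cls in patents if yr >= founding_year + window]
    if not later:
        return None
    return sum(1 for cls in later if cls in initial_classes) / len(later)

# Toy example with illustrative class codes.
example = [(1980, "435"), (1981, "435"), (1982, "536"),
           (1984, "435"), (1985, "424"), (1986, "536"), (1987, "530")]
overlap = initial_focus_overlap(example, founding_year=1980)
print(overlap)
```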

Key Lessons from our journey (1)

•  Facing familiar challenges in patent research
   –  Controls for many other patent measures including patent originality, the Fleming-Sorenson complexity and ease of recombination measures, citations to non-patent references, wholesale vs partial brokering.
   –  Use linked databases at multiple levels of analysis:
      •  Firm: alliance, venture capital, and founder data, therapeutics.
      •  Patent level: technology classes, dates.
      •  Inventor-level data: identity, mobility.
      •  Backward and forward citations: dyads and removing self-cites.
   –  Matching names of inventors across patents (hard!)
   –  High computational intensity of database; costly to make changes during R&R if you are not entirely organized.

Key Lessons from our journey (2)

•  Simple measures have to convince
   –  Using a simple measure, based on the proportion of prior cites in the same class, versus the angular measure based on vectors in earlier drafts.
•  Tie patent data to non-patent data
   –  We use a rich set of alternates including comprehensive firm-level profiles. It isn’t just a ‘patent’ study although we use patent data extensively.
   –  It is also not just about large samples: ours is a study of a small number of firms (but many patents & citations).
      •  But we know the story of each one of them and are confident of what is really happening “behind the data”.
•  Ruling out alternative explanations
   –  We use all sorts of techniques to rule out counterarguments, including threshold regressions, controls for self-citations, measures for patent thickets, and hand-collected firm histories (how do we know firm x really did not broker?).
   –  I.e., just showing a correlation is not enough!

Finally

•  Patent data may have limitations, but it remains valuable.
   –  E.g., citations may be problematic but sometimes the alternative is no measure at all. Even if a survey is possible, it may also be subject to biases, e.g., recall bias.
   –  E.g., Kapoor & Lim (AMJ, 2007): we trace inventors after their firms get acquired using patent data. It would be hard to do it almost any other way, and we are able to offer one of the few extant studies that looks at acquisitions from the acquired firm’s point of view, rather than the acquiring firm’s point of view.