Waterline Data Inventory API Reference

 

Waterline Data Inventory

API Reference
Product Version 2.1
Document Version 3.18.2016

© 2014 - 2016 Waterline Data, Inc. All rights reserved.

Table of Contents

Preface
Programming guide
    Authentication
    Error messages
    Programming languages
    Waterline Data Inventory “resources”
    cURL syntax
Sample applications
    Importing tags
REST Calls
auth/login
    POST
/auth/logout
    POST
/v1/metadata/collection
    POST
    DELETE
    GET
/v1/metadata/datasource
    GET
/v1/metadata/lineage
    POST
    DELETE
/v1/metadata/lineage/children
    GET
/v1/metadata/lineage/parents
    GET
/v1/metadata/origin
    POST
    PUT
    DELETE
/v1/metadata/origin/allorigins
    GET
/v1/metadata/origin/landing
    POST
    DELETE
/v1/metadata/origin/origins
    GET
/v1/metadata/origin/resources
    GET
/v1/metadata/resource
    GET
/v1/metadata/tagdomain
    POST
    PUT
    DELETE
    GET
/v1/metadata/tag
    POST
    DELETE
    GET
/v1/metadata/tagassociation/field
    POST
    DELETE
/v1/metadata/tagassociation/field/fields
    GET
/v1/metadata/tagassociation/field/tags
    GET
/v1/metadata/tagassociation/resource
    POST
    DELETE
/v1/metadata/tagassociation/resource/resources
    GET
/v1/metadata/tagassociation/resource/tags
    GET
/version
    GET

Preface

Waterline Data Inventory provides a Representational State Transfer (REST) API to access information discovered and collected on files, folders, and Hive tables. The same API allows applications to insert information into the Waterline Data Inventory repository, such as tags, tag associations, and lineage relationships. The API provides access to the same operations available from the Waterline Data Inventory browser application.

The API uses JSON objects as request and response payloads. Each HTTP call returns a general pass/fail status; calls that fail at the Waterline Data Inventory server return a failure response with an error code and a more detailed message.

The API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application. Before sending API calls, an application sends an authentication request. The server responds to a successful request with a session cookie, which the application then sends in the header of API calls. The cookie, also called the authentication token in this guide, is valid for the length of the session.

This documentation organizes the components of the API in the following sections:

• Programming guide
• Sample applications
• REST Calls

Programming guide

Authentication

This API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application, that is, the credentials of any authenticated user. Before sending API calls, an application sends an authentication request. The server responds to a successful request with a session cookie, which the application then sends in the header of API calls. The cookie, also called the authentication token in this guide, is valid for the length of the session.

For example, to create an authentication token saved in the text file “cookie.txt” where the Waterline Data Inventory web server is running on an edge node and the HDFS root is on a different node:

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "username": "waterlinedata", "password": "waterlinedata" }' \
      "http://edge.productionsystem.com:8082/waterlinedata/auth/login" -c cookie.txt

The token would then be used in subsequent API calls:

    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=hdfs://master.productionsystem.com:8020" \
      --data-urlencode "path=/user/waterlinedata/Landing/data.gov/mo_alc_lic.csv" \
      --data-urlencode "verbose=true" \
      "http://edge.productionsystem.com:8082/api/v1/metadata/resource" \
      -b cookie.txt

Error messages

This API returns standard HTTP status codes to indicate the success or failure of a call. Successful calls always return the HTTP code "200". Calls that succeed through the HTTP protocol but fail to produce the expected response from Waterline Data Inventory return the HTTP code "500", with specific information about the failure in the response payload. For example:

    { "error":4, "message":"Existing lineage already present between the pair of data resources" }

Other failure codes may be returned through HTTP before Waterline Data Inventory receives the request.

HTTP errors

The HTTP errors you can expect are as follows:

HTTP Errors   API Errors   Description
200           NA           OK. Request is successful.
403           NA           FORBIDDEN. The user is not yet authenticated with the server. Call POST auth/login to request an authentication session cookie.
500           1            BAD REQUEST. Returned when parts of the parameters or request payload are not specified correctly or are missing.
500           2            NOT AUTHORIZED. The user is not authorized to perform the operation or access the resource based on the system access permissions.
500           3            NOT-FOUND. The request cannot locate the asset based on the parameters. For example, the request specifies a file or tag that does not exist in the repository.
500           4            CONFLICT. There is a conflict in satisfying the request. For example, the request is trying to create a tag that already exists or to create a duplicate lineage.
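A client usually needs to turn the numeric "error" value from an HTTP 500 payload back into one of the conditions listed in the table above. The following is a minimal sketch of one way to do that, not part of the product's sample code. The class name WdiErrorSketch, the enum constant names, and the UNKNOWN fallback are assumptions made for illustration. To stay dependency-free, the sketch pulls the code out with a regular expression; an application that already uses a JSON library would normally parse the payload with that instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: interprets the "error" code in a Waterline Data Inventory failure payload. */
final class WdiErrorSketch {

    /** API error codes from the HTTP errors table; the constant names are illustrative. */
    enum ApiError {
        BAD_REQUEST(1), NOT_AUTHORIZED(2), NOT_FOUND(3), CONFLICT(4), UNKNOWN(-1);

        final int code;

        ApiError(int code) {
            this.code = code;
        }

        /** Maps a numeric code to a constant, falling back to UNKNOWN. */
        static ApiError fromCode(int code) {
            for (ApiError e : values()) {
                if (e.code == code) {
                    return e;
                }
            }
            return UNKNOWN;
        }
    }

    // Matches the numeric "error" property, for example "error":4 (whitespace tolerated).
    private static final Pattern ERROR_CODE = Pattern.compile("\"error\"\\s*:\\s*(\\d+)");

    /** Returns the API error found in a payload, or UNKNOWN if the payload has no code. */
    static ApiError parse(String payload) {
        Matcher m = ERROR_CODE.matcher(payload);
        return m.find() ? ApiError.fromCode(Integer.parseInt(m.group(1))) : ApiError.UNKNOWN;
    }

    public static void main(String[] args) {
        String payload = "{ \"error\":4, \"message\":\"Existing lineage already present "
                + "between the pair of data resources\" }";
        System.out.println(parse(payload)); // prints CONFLICT
    }
}
```

Payloads that carry only a "message" property, such as the "User cannot access the resource." response, map to UNKNOWN in this sketch, so the caller should still log the message text.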

Typical Waterline Data Inventory errors

Some typical errors you may encounter are:

• Data resource cannot be found by dataSource.

    {"error":3,"message":"\"from\" data resource cannot be found by dataSource <maprfs:///>, path , path = <default.tablename>."}

  Make sure the data source value you provide matches the value found in the repository. Use the specific string that is returned by the GET /v1/metadata/datasource call. With Hive data sources, the correct data source string may not be intuitive. In addition, the path value for a Hive table includes the database name and the table name, separated by a dot (.).

• User cannot access the resource.

    {"message":"User cannot access the resource."}

  Make sure you have a valid authentication cookie. You will see this error if you have refreshed the cookie but the valid cookie isn't in the local directory (or in the location specified by the command).

Programming languages

While you can make RESTful calls from many programming languages, the examples provided in this documentation are written in Java.

Waterline Data Inventory “resources”

This API uses the term “resource” to indicate an HDFS file, folder, or Hive table in the Waterline Data Inventory repository. The Waterline Data Inventory concept of a collection can also be a resource.

Because REST uses the term “resource” differently, there may be some awkwardness in how we're using the term. For example, there are REST resources described in the API (such as /v1/metadata/tag) that refer to objects that are not considered resources in the Waterline Data Inventory sense. We hope this doesn't cause too much confusion.

To completely identify a resource, you need both a data source and a resource path that includes the name of the resource. In this API, you can query Waterline Data Inventory for a list of data sources (GET /v1/metadata/datasource). Data sources can be an HDFS root (hdfs://:<port>) or Hive (jdbc:hive2://:10000).

Resources are typically identified by their path within the data source. For example, to retrieve HDFS file metadata from Waterline Data Inventory for the file “public_art_inventory.csv” found in the Waterline Data Inventory sandbox, the application would provide the following query parameters to identify the file:

• dataSource: "hdfs://finance.acme.com:8020"
• path: "/user/purchasing/Landing/data.gov/public_art_inventory.csv"

To retrieve metadata for a Hive table, identify the table as follows:

• dataSource: "jdbc:hive2://hive.hostname.com:10000"
• path: "sales.2015revenue"
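Because data source strings and paths contain characters such as ":" and "/", they must be percent-encoded when they are sent as query parameters. The following is a minimal sketch of building the dataSource and path portion of a query string for the two examples above. The class and method names (ResourceQuerySketch, encode, resourceQuery) are assumptions made for illustration, and the sketch assumes Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: builds the query string that identifies a resource by data source and path. */
final class ResourceQuerySketch {

    /** Percent-encodes one query parameter value. */
    static String encode(String value) {
        // URLEncoder writes a space as "+"; a literal "+" in the input becomes %2B,
        // so every "+" left in the output is a space and can be rewritten as %20.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Returns "dataSource=...&path=..." with both values encoded. */
    static String resourceQuery(String dataSource, String path) {
        return "dataSource=" + encode(dataSource) + "&path=" + encode(path);
    }

    public static void main(String[] args) {
        // HDFS file
        System.out.println(resourceQuery(
                "hdfs://finance.acme.com:8020",
                "/user/purchasing/Landing/data.gov/public_art_inventory.csv"));
        // Hive table: the path is the database name, a dot, and the table name
        System.out.println(resourceQuery(
                "jdbc:hive2://hive.hostname.com:10000",
                "sales.2015revenue"));
    }
}
```

The resulting string would be appended after a "?" to a resource URL such as the /v1/metadata/resource endpoint. The cURL examples in this guide get the same effect from the "--data-urlencode" option.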

cURL syntax

The REST call descriptions in this guide include cURL and Java examples. The following guidelines may help you use these examples more efficiently.

POST and PUT operations require that you format the request payload as a JSON object; GET and DELETE operations, however, can use the cURL "--data-urlencode" option to list the parameters.

POST and PUT syntax

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "dataSource":"", "path":"/user/waterlinedata/Landing/restaurant_inspections/elp", "type":"PARTITION" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt

• Line continuation markers (\) are allowed between cURL parameters, but are not used inside the JSON object.
• JSON object element names and values are enclosed in double-quotation marks, with a colon between the element and the value: "type":"PARTITION"
• Commas separate elements inside the JSON object.

GET and DELETE syntax

    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Landing/restaurant_inspections/elp" \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt

• Parameters are separated by white space.
• Line continuation markers (\) are used as needed at any point in the command.
• Data parameters are introduced by "--data-urlencode".
• The option "-G" makes cURL append the "--data-urlencode" parameters to the URL as a query string instead of sending them in the request body.
• The data parameters are enclosed in double-quotation marks, with an equal sign between the name and the value: "type=PARTITION"

 

Sample applications

We've provided the following sample application to make it easier for you to implement your own application. This app is written in Java.

The application is set up to be managed and built using Maven. We've also provided the complete application so you can run it without having to build it. The application requires that Waterline Data Inventory be installed and that the repository (Derby) and web server (Jetty) be running.

Importing tags

This sample application imports a list of tags and their descriptions into Waterline Data Inventory's repository. The input is a comma-delimited text file with tags and their descriptions. The output is a message indicating success for each tag creation.

Tag names can indicate hierarchical nesting using dot (.) notation. Waterline Data Inventory generates a tag for each item in the hierarchy if it doesn't already exist in the repository. For example, if the input includes a tag named “Organization.Property.Brand”, Waterline Data Inventory produces three tags organized hierarchically.

The application includes the following logic:

• Collect input from the command line
• Retrieve the Waterline Data Inventory session authentication token (POST auth/login)
• Check for the existence of an input file
• Read an entry (tag and description) from the input file
• Send the entry to Waterline Data Inventory (POST /v1/metadata/tag)
• Repeat for each entry in the input file
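The dot notation for hierarchical tag names described above can be illustrated with a short sketch that lists the levels implied by a name such as “Organization.Property.Brand”. This is only an illustration of the naming scheme; it is not the sample application's code, and it does not show how Waterline Data Inventory names or stores the intermediate tags. The class and method names (TagHierarchySketch, levels) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: enumerates the hierarchy levels implied by a dot-delimited tag name. */
final class TagHierarchySketch {

    /** Returns each level of the tag name, outermost first. */
    static List<String> levels(String tagName) {
        List<String> result = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        // split takes a regular expression, so the dot must be escaped
        for (String part : tagName.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            result.add(prefix.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Organization, Organization.Property, Organization.Property.Brand]
        System.out.println(levels("Organization.Property.Brand"));
    }
}
```

A name with three dot-separated parts yields three levels, which matches the three tags mentioned for “Organization.Property.Brand”.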

The sample also includes a script that provides a usage message and collects input from the command line.

To run the application:

1. Create a tag list file or update the sample tag list file:

    /waterlinedata/samples/TagManager/Sample-Tag-List.txt

   Be sure that there are no empty lines at the end of the data in the file.

2. Navigate to the script location:

    /waterlinedata/samples/TagManager/target

3. Call the tagManager script with the importtags command:

    tagManager importtags <user> <password>

   where the arguments, in the order used in the example that follows, are:

   • The host name or IP address for the node on which Waterline Data Inventory is running
   • <user> and <password>, which are valid credentials for a user account on the Linux file system where the cluster is installed
   • The list of tags to create

   For example:

    ./tagManager importtags 192.168.1.249 waterlinedata waterlinedata ../Sample-TagList.txt

To build the application:

    cd /waterlinedata/samples/tagManager
    mvn clean install

Check to see that the script file target/tagManager has execute permissions.

 

REST Calls

• login, logout
• collections
• datasources
• lineage, origins
• resources
• tag domains, tags
• version

This API supports a REST model for accessing a set of resources through a fixed set of operations. The following resources are accessible through the RESTful model:

• auth/login
• /auth/logout
• /v1/metadata/collection
• /v1/metadata/datasource
• /v1/metadata/lineage
• /v1/metadata/lineage/children
• /v1/metadata/lineage/parents
• /v1/metadata/origin
• /v1/metadata/origin/allorigins
• /v1/metadata/origin/landing
• /v1/metadata/origin/origins
• /v1/metadata/origin/resources
• /v1/metadata/resource
• /v1/metadata/tagdomain
• /v1/metadata/tag
• /v1/metadata/tagassociation/field
• /v1/metadata/tagassociation/field/fields
• /v1/metadata/tagassociation/field/tags
• /v1/metadata/tagassociation/resource
• /v1/metadata/tagassociation/resource/resources
• /v1/metadata/tagassociation/resource/tags
• /version

auth/login

The Waterline Data Inventory API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application. These calls control the creation and destruction of a session cookie to validate other API calls.

The session cookie is valid for the UI timeout duration, which defaults to 120 minutes. You can override this duration by editing the file:

    jetty-distribution-9.2.1.v20140609/waterlinedata-base/etc/waterlinedata-overridedescriptor.xml

and adding the following block:

    <session-config>
        <session-timeout>30</session-timeout>
    </session-config>

POST

This call requests an authentication token to be used in subsequent API calls.

Request body

A JSON object including a username and password.

    { "username" : "...", "password" : "..." }

The JSON payload consists of the following properties:

Property   Type     Description
username   string   Username under which operations will be performed by Waterline Data Inventory.
password   string   Password for the username.

Response body

The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns a JSON object containing a string to be used as a session authorization cookie in the header of subsequent API requests.

Failures can occur if:

• The credentials are missing or not valid.

    {"message":"Wrong Username/Password"}

• The web server connection URL isn’t correct.

    Failed to connect to <WDI host name> port 8062: Connection refused

  or if the provided host name isn’t correct:

    Could not resolve host: <provided host name>

• The repository database isn’t available. If the Waterline Data Inventory repository is running on Derby:

    {"message":"Exception [EclipseLink-4002] (Eclipse Persistence Services 2.5.2.v20131113-a7346c6): org.eclipse.persistence.exceptions.DatabaseException\nInternal Exception: java.sql.SQLNonTransientConnectionException: java.net.ConnectException : Error connecting to server sandbox.hortonworks.com on port 4,444 with message Connection refused.\nError Code: 40000\nQuery: ReadAllQuery(name=\"getUserProfileByPrimaryUserName\" referenceClass=PersistentUserProfile sql=\"SELECT ID, KLASS, PAYLOAD, PRIMARYUSERNAME FROM WD__USERPROFILE WHERE (PRIMARYUSERNAME = ?)\")"}

Sample Invocation

The POST /auth/login call expects a JSON request body containing a user name and password.

cURL

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "username": "waterlinedata", "password": "waterlinedata" }' \
      "http://<WDI-host-name>:8082/waterlinedata/auth/login" -c cookie.txt

Java

    // Jersey 1.x client classes used by this example
    import java.util.List;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.NewCookie;
    import com.sun.jersey.api.client.Client;
    import com.sun.jersey.api.client.ClientResponse;
    import com.sun.jersey.api.client.WebResource;

    // Construct the JSON request payload.
    String data = "{ \"username\": \"" + "waterlinedata"
            + "\", \"password\": \"" + "waterlinedata" + "\" }";

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 =
            client.resource("http://<WDI-host-name>:8082/waterlinedata/auth/");

    // Send the request with user credentials
    ClientResponse response1 = webResource1.path("login")
            .type(MediaType.APPLICATION_JSON)
            .post(ClientResponse.class, data);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

    // Extract the session ID (WDISessionId) from the response cookies
    List<NewCookie> cookie = response1.getCookies();
    String sessionId = cookie.get(1).toString();
    System.out.println(sessionId + ":" + responseOutput);

/auth/logout

The Waterline Data Inventory API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application. These calls control the creation and destruction of a session cookie to validate other API calls.

The session cookie is valid for the UI timeout duration, as configured in the web server parameter "session-timeout" in the file:

    jetty-distribution-9.2.1.v20140609/waterlinedata-base/etc/webdefault.xml

By default, this setting is 30 minutes.

POST

Close the API session. This call logs out of the session identified by the authorization cookie passed in the header of the request.

Response body

The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns "User logged out."

Sample Invocation

cURL

    curl -H "Content-Type:application/json" -X POST \
      "http://<WDI-host-name>:8082/waterlinedata/auth/logout" \
      -b cookie.txt

Java

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 =
            client.resource("http://<WDI-host-name>:8082/waterlinedata/auth/");

    // Send the logout request
    ClientResponse response1 = webResource1.path("logout")
            .type(MediaType.APPLICATION_JSON)
            .post(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

/v1/metadata/collection

Waterline Data Inventory "collections" are sets of files with the same or slowly evolving schema. These files are organized in a single folder or sets of subfolders within a single folder. Think of collections as a way to represent a single set of data that happens to be distributed across multiple files and over time or across geographies. Waterline Data Inventory discovers files that qualify as collections; you can also manually identify a set of files as a collection. Waterline Data Inventory supports two types of collections:

• Snapshots: Each file in a snapshot holds essentially the same data, where each new file may include additional data or additional fields.
• Partitions: The data set is distributed across many files, where the data does not overlap except for values in the partition fields.

In both cases, the collection represents the superset of data and the most recent schema. Waterline Data Inventory validates the type of collection against the files inside the specified folder.

For any collection:

• The number of files in the folder (or its subfolders) must be greater than the value set by waterlinedata.discovery.smallest.collection.size (default is 3 files).
• All files in the folder have the same file type.
• The files have the same schema, or only newer files have added fields at the end of the list of fields.
• If the fields in the files do not have names, there need to be at least ten fields in the oldest schema.

In addition, partitioned collections must have:

• At least one field with non-overlapping values across all the files.
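The schema rule above (the same schema, or newer files that only add fields at the end) can be expressed as a simple prefix check on field lists. The following sketch illustrates that rule only; it is not Waterline Data Inventory's discovery logic, and the class and method names (SchemaEvolutionSketch, extendsAtEnd, slowlyEvolving) are assumptions made for illustration. It also assumes that fields are compared by name and that schemas are supplied oldest first.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: checks the "same schema or fields added at the end" rule on field-name lists. */
final class SchemaEvolutionSketch {

    /** True if newer has the same fields as older, or only adds fields at the end. */
    static boolean extendsAtEnd(List<String> older, List<String> newer) {
        return newer.size() >= older.size()
                && newer.subList(0, older.size()).equals(older);
    }

    /** Applies the rule to each consecutive pair of schemas, ordered oldest first. */
    static boolean slowlyEvolving(List<List<String>> schemasOldestFirst) {
        for (int i = 1; i < schemasOldestFirst.size(); i++) {
            if (!extendsAtEnd(schemasOldestFirst.get(i - 1), schemasOldestFirst.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("id", "name", "city");
        List<String> v2 = Arrays.asList("id", "name", "city", "zip");
        List<String> reordered = Arrays.asList("name", "id", "city");

        System.out.println(slowlyEvolving(Arrays.asList(v1, v1, v2))); // prints true
        System.out.println(slowlyEvolving(Arrays.asList(v1, reordered))); // prints false
    }
}
```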

The metadata for a collection combines features of folders and files: it includes child information to describe the files or folders contained in the collection, and it also contains field information. Just as you would in the Waterline Data Inventory browser application, you can choose to manage the collection as a single aggregation of data (like a file) or as a set of individual files (like a folder).

Note that the directory structure that makes up a collection can be any number of levels deep. For example, a typical collection might be log files collected an hour at a time. The collection might include a folder for each year that logs were collected. The year folders would contain folders for each month; the month folders would contain folders for each day; the day folders would contain files for each hour.

The following operations are supported on this resource:
• POST
• DELETE
• GET


POST
Identify a folder as a collection. The folder must already exist as a resource in the Waterline Data Inventory repository. The request contains the data source and path for the folder resource and the type of collection, either SNAPSHOT or PARTITION.

Request body
JSON payload that identifies the folder and the type of collection.

    {
      "dataSource" : "...",
      "path" : "...",
      "type" : "..."
    }

The JSON payload consists of the following properties:

Property     Type     Description
dataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path         string   Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
type         string   The collection type, either SNAPSHOT or PARTITION.
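Because the three properties are sent as JSON strings, any quote or backslash in a path has to be escaped before it is placed in the payload. The sketch below shows one way to assemble the request body by hand; the class and method names are hypothetical, and it handles only quotes and backslashes (control characters are not escaped), so a JSON library is a reasonable alternative.

```java
// Hypothetical helper for building the POST /v1/metadata/collection payload.
final class CollectionPayload {
    /** Wraps a value in double quotes, escaping backslashes and double quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the JSON body; rejects any type other than SNAPSHOT or PARTITION. */
    static String build(String dataSource, String path, String type) {
        if (!type.equals("SNAPSHOT") && !type.equals("PARTITION")) {
            throw new IllegalArgumentException("type must be SNAPSHOT or PARTITION");
        }
        return "{\"dataSource\":" + quote(dataSource)
                + ",\"path\":" + quote(path)
                + ",\"type\":" + quote(type) + "}";
    }
}
```

The resulting string can be passed as the entity of the POST request in place of a hand-concatenated payload.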

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns "Collection created." Failures can occur when:

• The specified folder does not exist in the repository. Check the path to make sure it points to an existing folder. This can also happen when the folder has not yet been profiled. For example, if the folder is not profiled and no entry for it exists in the Waterline Data Inventory repository:

    {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}

• The folder fails to qualify as a collection, either because the contained files don't share the same schema:

    {"error":512,"message":"Error while setting collection type:Unable to create PARTITION collection: File schemas do not match"}


Or the folder does not have enough files. This error indicates that there are fewer files in the folder or the subdirectories than the setting from discovery.properties: waterlinedata.discovery.smallest.collection.size. This value defaults to 3.

    {"error":512,"message":"Error while setting collection type:Unable to create PARTITION collection: Folder does not have enough files"}

You may also see this error if the folder exists in the repository but has not been profiled.

Sample Invocation
First call POST auth/login to get a session cookie. This Collection call takes parameters as part of the JSON payload.

cURL

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "dataSource":"", "path":"/user/waterlinedata/Landing/restaurant_inspections/elp", "type":"PARTITION" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt

Java

    // Collect the data source and path names for the request payload.
    String sData = "{\"dataSource\":\"" + sDataSource + "\",\"path\":\"" + sPath + "\",\"type\":\"" + sType + "\"}";
    //System.out.println(sData);

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with the request payload and session cookie.
    ClientResponse response1 = webResource1.path("collection")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class, sData);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
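The Java sample collects a status code and a response string but does not interpret them. As a rough sketch, the error objects shown in this section (an integer "error" and a "message" string) could be picked apart as follows. The class name and the regular expression are assumptions; the pattern only fits the single-line shape shown in the examples above and does not unescape JSON, so a JSON parser is the safer choice in production code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the {"error":N,"message":"..."} objects shown above.
final class WdiError {
    private static final Pattern ERROR = Pattern.compile(
            "\\{\\s*\"error\"\\s*:\\s*(\\d+)\\s*,\\s*\"message\"\\s*:\\s*\"(.*)\"\\s*\\}");

    final int code;
    final String message;

    private WdiError(int code, String message) {
        this.code = code;
        this.message = message;
    }

    /** Returns null when the body does not match the documented error shape. */
    static WdiError parse(String body) {
        Matcher m = ERROR.matcher(body.trim());
        return m.matches() ? new WdiError(Integer.parseInt(m.group(1)), m.group(2)) : null;
    }
}
```

A caller might check for status 500 first and then use `WdiError.parse(responseOutput)` to distinguish, for example, error 3 (resource not found) from error 512 (folder does not qualify).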


DELETE
Removes the collection identifier from a folder. This call removes the aggregated data object and returns the folder to a folder resource. It does not delete the folder from the repository. The request contains the data source and path to the existing collection.

Parameters

Name         Type     Description
dataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path         string   Resource name of the folder. This name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "path=/user/me/myproject/myCollectionFolder".

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns "The folder is no longer a collection." You see a success message even if the specified resource is not currently a collection.

Failures can be caused if:

• The specified folder does not exist in the repository.

    {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}

• The resource identified is not a folder.

    {"error":1,"message":"Path is not a directory."}

Sample Invocation
First call POST auth/login to get a session cookie. This Collection call takes query parameters for data source and path.

cURL

    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Landing/restaurant_inspections/elp" \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt


Java

    // Collect the data source and path names for the query.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sDataSource);
    queryParams.add("path", sPath);

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with data source, path, and session authentication cookie.
    ClientResponse response1 = webResource1.path("collection")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
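Data source names and paths contain characters such as ":" and "/" that must be URL-encoded when they travel as query parameters, which is what --data-urlencode does in the cURL sample. For clients that build the URL themselves, a minimal sketch using only the JDK might look like this. The class name is hypothetical, and `URLEncoder.encode(String, Charset)` assumes Java 10 or later; note that `URLEncoder` writes a space as "+", which servers normally decode the same way as "%20" in a query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns parameter names and values into an encoded query string.
final class QueryString {
    static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode both the name and the value, then join them as name=value.
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }
}
```

The result would be appended after a "?" to the resource URL, for example ".../api/v1/metadata/collection?" followed by the encoded string.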

GET
Retrieves all resources identified as collections from the repository. The response includes a list of resources augmented with the collection type (SNAPSHOT or PARTITION).

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns a JSON object containing a list of resource descriptions:

Property     Type     Description
dataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
owner        string   Indicates the file system owner of the file, folder, or table.
path         string   Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
type         string   Whether the collection is a SNAPSHOT or PARTITION.

Sample Invocation
First call POST auth/login to get a session cookie.


cURL

    curl -H "Content-Type:application/json" -X GET \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt

Java

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with the session authentication cookie (this call takes no parameters).
    ClientResponse response1 = webResource1.path("collection")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

   


/v1/metadata/datasource
In Waterline Data Inventory, a data source describes an HDFS cluster or Hive instance. Waterline Data Inventory needs a data source description to fully qualify a resource, where a "resource" is a folder, file, collection, or table.

GET
Retrieve information for all data sources configured for this system: an HDFS cluster and, if configured, a Hive instance. Information includes:
• The name of the data source in the form of a URI
• A description of the data source, which is available in the repository but not used in the Waterline Data Inventory UI
• The data source type, either HDFS or Hive

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns a JSON payload with a count of the data sources included and an array of objects that describe the data sources in the repository.

    {
      "count" : ...,
      "datasources" : [
        {
          "name" : "...",
          "description" : "...",
          "type" : "..."
        },
        ...
      ]
    }

The JSON payload consists of the following properties:

Property      Type                    Description
count         long                    The number of data sources included in the inventory.
datasources   array of data sources   Container for the list of data sources.
name          string                  Data source name. A string formatted as a URI that identifies the data source. For example: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
description   string                  Description is not used in the Waterline Data Inventory user interface; it is only available through the API.
type          string                  Type can include "HdfsDataSource" or "HiveDataSource".
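Other calls need a data source name taken from this response, so a client typically selects the entry whose type matches the resource it is working with. The sketch below models one element of the "datasources" array and filters by type. The class and method names are hypothetical, and parsing the JSON into these objects is left to whichever JSON library the application already uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one element of the "datasources" array.
final class DataSourceInfo {
    final String name;
    final String description;
    final String type;

    DataSourceInfo(String name, String description, String type) {
        this.name = name;
        this.description = description;
        this.type = type;
    }

    /** Returns the names of all data sources whose type equals the given value. */
    static List<String> namesOfType(List<DataSourceInfo> sources, String type) {
        List<String> names = new ArrayList<>();
        for (DataSourceInfo source : sources) {
            if (type.equals(source.type)) {
                names.add(source.name);
            }
        }
        return names;
    }
}
```

For example, `namesOfType(sources, "HiveDataSource")` would yield the value to pass as dataSource when working with Hive tables.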


Sample Invocation
First call POST auth/login to get a session cookie.

cURL

    curl -H "Content-Type:application/json" -X GET \
      "http://<WDI-host-name>:8082/api/v1/metadata/datasource" \
      -b cookie.txt

Java

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send request with session cookie.
    ClientResponse response1 = webResource1.path("datasource")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

 


/v1/metadata/lineage
Waterline Data Inventory maintains relationships among files and tables that share data or schemas. These relationships are arranged chronologically such that newer files are considered "children" of older files. This chain of relationships forms the lineage of a given resource. Waterline Data Inventory can discover these relationships, or the relationships can be identified and applied in Waterline Data Inventory's repository.

When Waterline Data Inventory discovers lineage relationships, each relationship is identified as "suggested". Users can validate a relationship by identifying it as "accepted". When a lineage relationship is added using the API, it is created as an accepted relationship, even if it already existed in the repository as a suggested relationship.

Applications can perform the following lineage operations:
• Create a lineage relationship between two existing resources: POST /v1/metadata/lineage
• Remove a lineage relationship between two resources: DELETE /v1/metadata/lineage
• Retrieve the next earlier relationship(s) for a resource (parents): GET /v1/metadata/lineage/parents
• Retrieve the next later relationship(s) for a resource (children): GET /v1/metadata/lineage/children

The following operations are supported on the resource /v1/metadata/lineage:
• POST
• DELETE

POST
Create a lineage relationship between two existing resources. This operation can apply to existing files, collections, or Hive tables.

Request body
A JSON object indicating the two resources to relate and a description of the relationship.

    {
      "fromDataSource" : "...",
      "fromPath" : "...",
      "toDataSource" : "...",
      "toPath" : "...",
      "description" : "..."
    }


The JSON payload consists of the following properties:

Property         Type     Description
fromDataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
fromPath         string   Resource name of the parent resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "fromPath=/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "fromPath=financedb.revenue".
toDataSource     string   Data source of the child resource.
toPath           string   Resource name of the child resource.
description      string   (Optional) A description of the relationship. Use this field to indicate how the resources are related, such as which parent fields map to child fields. The description is not used in the Waterline Data Inventory user interface; it is only available through the API.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns "Lineage created successfully." Failures can be caused if:
• Either of the specified resources does not exist in the repository. This can happen if the data has not yet been profiled.
• One of the specified resources is a folder. Folders can't be included in lineage relationships.
• A lineage relationship already exists that conflicts with the specified relationship, such as if the "to" resource is already indicated as a parent of the "from" resource.
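The conflict case in the last bullet can be pictured with a small in-memory model of parent-to-child relationships. This is only an illustrative sketch of the rule as described here, not how the repository stores lineage; the class and method names are hypothetical, resources are reduced to plain strings, and only the direct reversal named in the text is checked.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of lineage relationships (parent -> children).
final class LineageGraph {
    private final Map<String, Set<String>> children = new HashMap<>();

    /**
     * Records "from" as a parent of "to". Returns false, and records nothing,
     * when "to" is already recorded as a parent of "from".
     */
    boolean add(String from, String to) {
        if (childrenOf(to).contains(from)) {
            return false;
        }
        children.computeIfAbsent(from, key -> new HashSet<>()).add(to);
        return true;
    }

    /** Returns the direct children recorded for a resource (empty if none). */
    Set<String> childrenOf(String resource) {
        return children.getOrDefault(resource, Collections.<String>emptySet());
    }
}
```

In this model, adding /raw/a.csv as the parent of /clean/a.csv succeeds, while a later attempt to add the reverse relationship is refused.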

Sample Invocation
First call POST auth/login to get a session cookie. This Lineage call takes parameters as part of the JSON payload.

cURL

    curl -H "Content-Type:application/json" -X POST \
      -d '{"fromDataSource":"", "fromPath":"/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv", "toDataSource":"", "toPath":"/user/waterlinedata/Lestrade/wrangled_inspections/d/14/8_8.csv" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage" \
      -b cookie.txt

Java

    // Construct the request payload, including the "from" resource's data source and path
    // and the "to" resource's data source and path.
    String sData = "{\"fromDataSource\":\"" + sFromDS
        + "\",\"fromPath\":\"" + sFromPath
        + "\",\"toDataSource\":\"" + sToDS
        + "\",\"toPath\":\"" + sToPath
        + "\",\"description\":\"" + sDescription + "\"}";

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with the payload and session cookie.
    ClientResponse response1 = webResource1.path("lineage")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class, sData);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

DELETE
Remove existing lineage relationships. This call can be used in the following ways:
• Delete a specific lineage relationship: specify both "from" and "to" resources.
• Delete all parent lineage relationships for a given resource: specify only the "to" resource.
• Delete all child lineage relationships for a given resource: specify only the "from" resource.
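The three modes above differ only in which query parameters are sent. A minimal sketch of choosing the parameter set is shown below; the class and method names are hypothetical, and passing null for both halves of a resource is used here as the (assumed) way to say "not specified".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that selects the query parameters for DELETE /v1/metadata/lineage.
final class LineageDeleteParams {
    static Map<String, String> build(String fromDataSource, String fromPath,
                                     String toDataSource, String toPath) {
        boolean hasFrom = fromDataSource != null && fromPath != null;
        boolean hasTo = toDataSource != null && toPath != null;
        if (!hasFrom && !hasTo) {
            throw new IllegalArgumentException("specify the \"from\" resource, the \"to\" resource, or both");
        }
        Map<String, String> params = new LinkedHashMap<>();
        if (hasFrom) {
            // "from" only: removes all child relationships of this resource.
            params.put("fromDataSource", fromDataSource);
            params.put("fromPath", fromPath);
        }
        if (hasTo) {
            // "to" only: removes all parent relationships of this resource.
            params.put("toDataSource", toDataSource);
            params.put("toPath", toPath);
        }
        return params;
    }
}
```

When both resources are supplied, all four parameters are sent and only that specific relationship is targeted.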

If no lineage relationship exists between the specified resource or resources, the call returns a success message.

Parameters

Name             Type     Description
fromDataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
fromPath         string   Resource name of the parent resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "fromPath=/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "fromPath=financedb.revenue".
toDataSource     string   Data source of the child resource.
toPath           string   Resource name of the child resource. For example, "toPath=/user/me/myproject/myfile". If toDataSource and toPath are not specified, all children for the "from" resource are removed.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns "Lineage deleted successfully." Failures can be caused if:
• Either of the specified resources does not exist in the repository. This can happen if the data has not yet been profiled.
• One of the specified resources is a folder. Folders can't be included in lineage relationships.

Sample Invocation
First call POST auth/login to get a session cookie. This Lineage call takes parameters as part of the query.

cURL

    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "fromDataSource=" \
      --data-urlencode "fromPath=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
      --data-urlencode "toDataSource=" \
      --data-urlencode "toPath=/user/waterlinedata/Lestrade/wrangled_inspections/d/14/8_8.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage" \
      -b cookie.txt

Java

    // Construct the URL parameters, including the "from" data source and path
    // and the "to" data source and path.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("fromDataSource", sParentDataSource);
    queryParams.add("fromPath", sParentPath);
    queryParams.add("toDataSource", sChildDataSource);
    queryParams.add("toPath", sChildPath);

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request including the query parameters and the session cookie.
    ClientResponse response1 = webResource1.path("lineage")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

 


/v1/metadata/lineage/children
Use this call to retrieve a list of the children of a given resource. The children are the newer resources that have accepted or suggested lineage relationships with the specified resource. Typically, the children will be copies or transformed versions of the parent where at least 2 fields overlap between parent and child.

GET
Retrieve the child resource(s) to which the specified resource contributes.

Parameters

Name         Type     Description
dataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path         string   Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns a JSON array containing zero or more resources identified by data source and path. The array is empty if there are no children associated with the specified resource.

    [
      {
        "dataSource" : "...",
        "path" : "...",
        "owner" : "...",
        "lineageState" : "...",
        "lastChange" : "..."
      },
      ...additional children...
    ]


Name           Type        Description
dataSource     string      Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path           string      Resource name of the child resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner          string      Resource owner as identified in the last profiling operation.
lineageState   string      Whether the lineage relationship was suggested by Waterline Data Inventory (Suggested) or was created or approved by a user or through the API (Approved).
lastChange     timestamp   The time when this lineage relationship was created, accepted, or rejected. The time is in GMT and formatted as yyyy-MM-dd HH:mm:ss.SSS zzz (24-hour clock). For example: "lastChange":"2016-02-16 20:42:29.037 GMT".
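A client that needs to compare or sort lastChange values can parse them with java.time. The sketch below assumes the 24-hour pattern letters HH, which is what the example value (20:42) implies; the class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for reading the lastChange property.
final class LastChange {
    // Pattern assumed from the documented example "2016-02-16 20:42:29.037 GMT".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS zzz", Locale.ENGLISH);

    /** Parses a lastChange value into an Instant; throws DateTimeParseException on bad input. */
    static Instant parse(String lastChange) {
        return ZonedDateTime.parse(lastChange, FORMAT).toInstant();
    }
}
```

Converting to `Instant` keeps later comparisons independent of the time zone text in the original string.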

Failures can be caused if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
• The specified resource is a folder. Folders can't be included in lineage relationships.

Sample Invocation
First call POST auth/login to get a session cookie. This Lineage call takes parameters as part of the query.

cURL

    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Landing/restaurant_inspections/inspections.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage/children" \
      -b cookie.txt

Java

    // Construct the query parameters: the data source and path of the resource
    // whose children are requested.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sDataSource);
    queryParams.add("path", sPath);

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request including query parameters and session cookie.
    ClientResponse response1 = webResource1.path("lineage/children")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);


/v1/metadata/lineage/parents
Use this call to retrieve a list of the parents of a given resource. The parents are the older resources that have accepted or suggested lineage relationships with the specified resource. Typically, the parents will be source data where at least 2 fields overlap between parent and child.

GET
Retrieve resources from which the specified resource is derived.

Parameters

Name         Type     Description
dataSource   string   Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path         string   Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns a JSON array containing zero or more resources identified by data source and path. The array is empty if there are no parents associated with the specified resource.

    [
      {
        "dataSource" : "...",
        "path" : "...",
        "owner" : "...",
        "lineageState" : "...",
        "lastChange" : "..."
      },
      ...additional parents...
    ]


Name           Type        Description
dataSource     string      Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path           string      Resource name of the parent resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner          string      Resource owner as identified in the last profiling operation.
lineageState   string      Whether the lineage relationship was suggested by Waterline Data Inventory (Suggested) or was created or approved by a user or through the API (Approved).
lastChange     timestamp   The time when this lineage relationship was created, accepted, or rejected. The time is in GMT and formatted as yyyy-MM-dd HH:mm:ss.SSS zzz (24-hour clock). For example: "lastChange":"2016-02-16 20:42:29.037 GMT".

Failures can be caused if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
• The specified resource is a folder. Folders can't be included in lineage relationships.

Sample Invocation
First call POST auth/login to get a session cookie. This call takes parameters in the query.

cURL

    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage/parents" \
      -b cookie.txt

Java

    // Construct the query parameters, including data source and path of the resource.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sChildDataSource);
    queryParams.add("path", sChildPath);

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request including query parameters and session cookie.
    ClientResponse response1 = webResource1.path("lineage/parents")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);


/v1/metadata/origin
An "origin" in Waterline Data Inventory is a landing label given to files or tables that arrived in HDFS from an external source; the label is then propagated through lineage relationships. If no origin label exists for a resource, the resource cannot be traced through lineage to a resource with a landing label. While a resource marked as a landing can have only one origin, it is possible for other resources to have more than one origin. Multiple origins indicate the resource has elements of more than one ancestor and that the ancestors trace back to a resource with a landing label.

Applications can perform the following origin operations:
• Create an origin label: POST /v1/metadata/origin
• Update an origin label or description: PUT /v1/metadata/origin
• Delete an origin label: DELETE /v1/metadata/origin
• List all origin labels: GET /v1/metadata/origin/allorigins
• Associate an existing origin label with a resource (Mark as landing): POST /v1/metadata/origin/landing
• Remove an origin label from a resource (Unmark as landing): DELETE /v1/metadata/origin/landing
• List the origin(s) associated with a resource: GET /v1/metadata/origin/origins
• List the resources that can be traced back to an origin: GET /v1/metadata/origin/resources

The  following  operations  are  supported  on  this  resource  (/v1/metadata/origin):     ! ! !

POST   PUT   DELETE  

POST
Create an origin label. To use this label to mark a resource as a "landing", call POST /v1/metadata/origin/landing.

Request body
A JSON object containing the origin name and description.
    {
      "origin" : "...",
      "description" : "..."
    }


The JSON payload consists of the following properties:

| Property | Type | Description |
|---|---|---|
| origin | string | Origin name. Limited to 256 characters. |
| description | string | Origin description. Limited to 512 characters. |
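The payload rules above (two string properties with 256- and 512-character limits) can be sketched as a small helper that validates lengths and escapes values before concatenating them into JSON. This is a minimal sketch under stated assumptions, not product code: the class `OriginPayload` and its methods are hypothetical names, the limits are assumed to count Java `char` units, and the server is assumed to accept standard JSON string escaping.

```java
/** Hypothetical helper for building the POST /v1/metadata/origin payload. */
final class OriginPayload {
    // Limits taken from the property table above.
    static final int MAX_ORIGIN_LENGTH = 256;
    static final int MAX_DESCRIPTION_LENGTH = 512;

    /** Escapes the characters that JSON requires to be escaped inside a string. */
    static String escapeJson(String value) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Builds the JSON body, rejecting values longer than the documented limits. */
    static String build(String origin, String description) {
        if (origin.length() > MAX_ORIGIN_LENGTH) {
            throw new IllegalArgumentException("origin exceeds 256 characters");
        }
        if (description.length() > MAX_DESCRIPTION_LENGTH) {
            throw new IllegalArgumentException("description exceeds 512 characters");
        }
        return "{\"origin\":\"" + escapeJson(origin)
             + "\",\"description\":\"" + escapeJson(description) + "\"}";
    }
}
```

Escaping matters because a plain string concatenation produces invalid JSON as soon as an origin name or description contains a double quote or a backslash.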

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Origin created successfully." The call can fail if:
- The specified origin already exists in the repository.

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{"origin":"data.gov", "description":"Public data download for US government" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin" \
      -b cookie.txt

Java
    // Construct the JSON payload with the origin name and description.
    String data = "{\"origin\":\"" + sOriginName + "\",\"description\":\"" + sOriginDescription + "\"}";

    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with session cookie and payload.
    ClientResponse response1 = webResource1.path("origin")
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .post(ClientResponse.class, data);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);


PUT
Update an origin label or description. The new origin name is updated on any resource with this origin.

Request body
A JSON object containing the existing origin name and either a new name, a new description, or both.
    {
      "origin" : "...",
      "newOrigin" : "...",
      "newDescription" : "..."
    }

The JSON payload consists of the following properties:

| Property | Type | Description |
|---|---|---|
| origin | string | Existing origin name. |
| newOrigin | string | New origin name. Limited to 256 characters. |
| newDescription | string | New description. Limited to 512 characters. |

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Origin updated successfully." The call can fail if:
- The specified origin does not exist in the repository.

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X PUT \
      -d '{"origin":"data.gov", "newDescription":"Public data download from US government (updated)" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin" \
      -b cookie.txt

Java
    // Construct the JSON payload with the existing origin name and the new description.
    String data = "{\"origin\":\"" + sOriginName + "\",\"newDescription\":\"" + sOriginDescription + "\"}";

    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the PUT request with session cookie and payload.
    ClientResponse response1 = webResource1.path("origin")
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .put(ClientResponse.class, data);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

DELETE
Delete an origin label from the system. Deleting an origin removes that origin label from resources that are "marked as landing" with that origin.

Parameters

| Name | Type | Description |
|---|---|---|
| origin | string | Origin label. |

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Origin deleted successfully." The call can fail if:
- The specified origin does not exist in the repository.
    {"error":3,"message":"Cannot locate the origin . Please make sure the origin was previously created."}
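The DELETE call described above (origin passed as a query parameter, session cookie in a header, 200 meaning success) can also be assembled with the JDK's built-in `java.net.http` client instead of the Jersey client used in this guide's samples. The following is a minimal sketch assuming Java 11 or later; the class `DeleteOriginRequest`, the host `wdi.example.com`, and the cookie value are hypothetical, and the exact cookie string is whatever POST auth/login returned.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds (but does not send) a DELETE origin request. */
final class DeleteOriginRequest {

    /**
     * @param baseUrl       for example "http://wdi.example.com:8082/api/v1/metadata" (assumed host)
     * @param origin        origin label to delete
     * @param sessionCookie cookie string obtained from POST auth/login
     */
    static HttpRequest build(String baseUrl, String origin, String sessionCookie) {
        // URLEncoder applies form encoding, so a space becomes "+".
        String query = "origin=" + URLEncoder.encode(origin, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/origin?" + query))
                .header("Content-Type", "application/json")
                .header("Cookie", sessionCookie)
                .DELETE()
                .build();
    }

    /** Per the text above, 200 indicates success and 500 indicates an error. */
    static boolean isSuccess(int statusCode) {
        return statusCode == 200;
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and check the status code of the returned response with `isSuccess`.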

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes an origin name as part of the query.

cURL
    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "origin=human-resources" \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin" \
      -b cookie.txt

Java
    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with origin name and session cookie.
    ClientResponse response1 = webResource1.path("origin")
        .queryParam("origin", sOriginName)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);


/v1/metadata/origin/allorigins

GET
Retrieve information for all origins created in the system. The information includes the origin names and their descriptions. This call has no parameters.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON payload with the count of origins and an array of zero or more objects, each giving the name and description of an origin.
    {
      "count" : ...,
      "origins" : [
        { "name" : "...", "description" : "..." },
        ...
      ]
    }

The JSON payload consists of the following properties:

| Property | Type | Description |
|---|---|---|
| count | long | Count of origin labels in the system. |
| origins | array of origins | An array of origin objects, including the origin name and description. |
| name | string | Origin name. Limited to 256 characters. |
| description | string | Origin description. Limited to 512 characters. |
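As an illustration of reading the payload described above, the sketch below pulls the `name` values out of a response body using only the JDK. It is a deliberately small, regex-based sketch and not a JSON parser: the class `OriginNames` is a hypothetical name, the sample body in the usage note is invented for illustration, and the returned values keep any JSON escape sequences as-is. A real application would more likely use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that lists origin names found in an allorigins response. */
final class OriginNames {
    // Matches "name" : "<value>", allowing backslash-escaped characters inside the value.
    private static final Pattern NAME =
            Pattern.compile("\"name\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the raw (still JSON-escaped) name values in document order. */
    static List<String> extract(String responseBody) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(responseBody);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

For a body such as `{"count":2,"origins":[{"name":"data.gov","description":"..."},{"name":"NYC Open Data","description":"..."}]}`, `OriginNames.extract` returns the two names in order, and the list size can be compared against `count` as a sanity check.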

Sample Invocation
First call POST auth/login to get a session cookie. This origin call has no parameters.

cURL
    curl -H "Content-Type:application/json" -X GET \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin/allorigins" \
      -b cookie.txt

Java
    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with session cookie.
    ClientResponse response1 = webResource1.path("origin/allorigins")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);


/v1/metadata/origin/landing
The following operations are supported on this resource:
- POST
- DELETE

POST
Mark a resource as a landing. Both the resource and the origin used to indicate the landing must exist in the repository. If the resource is already marked as a landing, this call replaces the existing landing label. If the resource has parent lineage relationships, this call removes those lineage relationships.

Request body
A JSON object containing the origin label and the data source and path that identify the resource.
    {
      "dataSource" : "...",
      "path" : "...",
      "origin" : "..."
    }

The JSON payload consists of the following properties:

| Property | Type | Description |
|---|---|---|
| dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000" |
| path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue". |
| origin | string | Origin name. |

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Landing created successfully." The call can fail if:
- The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
    {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}
- The specified origin does not exist in the repository. Call POST /v1/metadata/origin to create the origin before using it to mark a landing.
    {"error":3,"message":"Origin is not found. Please first create the origin."}

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{"dataSource":"", "path":"/user/waterlinedata/Landing/nyc_open_data/", "origin":"NYC Open Data"}' \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin/landing" \
      -b cookie.txt

Java
    // Construct the JSON request, including the data source and path for the resource.
    String data = "{\"dataSource\":\"" + ds
        + "\",\"path\":\"" + path
        + "\",\"origin\":\"" + origin + "\"}";

    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with session cookie and JSON payload with resource information.
    ClientResponse response1 = webResource1.path("origin/landing")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class, data);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);


DELETE
Remove ("unmark") the landing label from a resource. This call removes the landing label from all subfolders and files, whether they have the same landing label or a different one. The origin itself is not removed from the list of origins.

Parameters

| Name | Type | Description |
|---|---|---|
| dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000" |
| path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". |
| origin | string | The landing label. |

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Landing deleted successfully." If the resource exists but is not marked with the landing, the call returns the success message and makes no changes.
The call can fail if:
- The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
    {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes the resource information and origin as query parameters.

cURL
    curl -H "Content-Type:application/json" -X DELETE -G \
      --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Adler/hdfs_u1/AL1" \
      --data-urlencode "origin=human-resources" \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin/landing" \
      -b cookie.txt


Java
    // Construct the query parameters.
    MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", ds);
    queryParams.add("path", sPath);
    queryParams.add("origin", sOriginName);

    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with query parameters and session cookie.
    ClientResponse response1 = webResource1.path("origin/landing")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

 


/v1/metadata/origin/origins

GET
Retrieve the origin or origins assigned to a resource.

Parameters

| Name | Type | Description |
|---|---|---|
| dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000" |
| path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. |
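Because both parameters contain characters that are reserved in URLs (colons and slashes), they need to be URL-encoded when placed in the query string, which is what `--data-urlencode` does in the cURL samples. The following is a minimal Java sketch of the same step using only the JDK (Java 10 or later). The class `QueryUrl`, the host `wdi.example.com`, and the data source `hdfs://namenode:8020` are hypothetical values chosen for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that appends URL-encoded query parameters to an endpoint. */
final class QueryUrl {

    /**
     * Encodes each name and value with form encoding (a space becomes "+")
     * and joins them with "&". Parameter order follows the map's iteration order.
     */
    static String withQuery(String endpoint, Map<String, String> params) {
        if (params.isEmpty()) {
            return endpoint;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }
}
```

Passing a `LinkedHashMap` with `dataSource` and `path` entries keeps the parameters in insertion order. The same helper would apply to any call in this reference that takes its parameters in the query.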

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON payload with the count of origins and an array of zero or more objects that include the name of the origin or origins. Folders return zero results unless they are marked with a landing.
    {
      "count" : ...,
      "origins" : [
        { "name" : "...", "description" : "..." },
        ...
      ]
    }

The JSON payload consists of the following properties:

| Property | Type | Description |
|---|---|---|
| count | long | Count of origin labels returned for the resource. |
| origins | array of origins | An array of origin objects, including the origin name and description. |
| name | string | Origin name. Limited to 256 characters. |
| description | string | Origin description. Limited to 512 characters. |

The call can fail if:
- The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
    {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes parameters as part of the query.

cURL
    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=" \
      --data-urlencode \
      "path=/user/waterlinedata/Sherlock/dohmh_inspections/restaurants_sep_2014_vc.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin/origins" \
      -b cookie.txt

Java
    // Construct the query parameters, including data source and path of the resource.
    MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sDS);
    queryParams.add("path", sPath);

    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request including query parameters and session cookie.
    ClientResponse response1 = webResource1.path("origin/origins")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

 


/v1/metadata/origin/resources

GET
Retrieve all resources identified with a given origin. The resources could be marked with the origin as a landing, or they could be derived from other resources marked with the origin as a landing.

Parameters

| Name | Type | Description |
|---|---|---|
| origin | query | Origin label. |

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON object containing a count and an array of zero or more objects, each giving the details of a resource, including whether it is marked as a landing.
    {
      "count" : ...,
      "resources" : [
        {
          "landing" : false,
          "dataSource" : "...",
          "owner" : "...",
          "path" : "..."
        },
        ...
      ]
    }

The JSON payload consists of the following properties:

| Property | Type | Description |
|---|---|---|
| count | long | Number of resources returned. |
| resources | array of resources | An array of resource objects. |
| landing | Boolean | Whether this resource is the landing resource for the given origin. |
| dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000" |
| owner | string | The file system owner of the file, folder, or table. |
| path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue". |


The call can fail if:
- The specified origin does not exist in the repository.
    {"error":3,"message":"Cannot locate the origin . Please make sure the origin was previously created."}

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes an origin name as part of the query.

cURL
    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "origin=data.gov" \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin/resources" \
      -b cookie.txt

Java
    // Create a connection to Waterline Data Inventory.
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");

    // Send the request with the origin as a query parameter and with session cookie.
    ClientResponse response1 = webResource1.path("origin/resources")
        .queryParam("origin", sOriginName)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

    // Collect the response code and response output.
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

   


/v1/metadata/resource
In Waterline Data Inventory, a "resource" describes an HDFS file or folder, a set of files organized in folders that is identified as a Waterline Data Inventory collection, or a Hive table. Waterline Data Inventory needs a data source description to fully qualify a resource.
If a file or table has not been profiled by Waterline Data Inventory, it won't have a corresponding resource in the Waterline Data Inventory repository.

GET
Get details of a particular resource (file, folder, collection, or Hive table). This is the core call for exporting Waterline Data Inventory metadata, such as tags, origins, or lineage. If the resource indicated is a Waterline Data Inventory collection, this call returns details of the aggregated schema and lists the files or directories included in the collection.

Parameters

| Name | Type | Description |
|---|---|---|
| dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000" |
| path | string | The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue". |
| verbose | Boolean | (optional) The level of detail of the reply. The brief reply does not include profiling information for each field in the file or table. |

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON object containing resource details.
    {
      "dirEntries" : [ { "path" : "...", "owner" : "..." }, ... ],
      "fieldCount" : ...,
      "path" : "...",
      "name" : "...",
      "owner" : "...",
      "size" : ...,
      "fileType" : "...",
      "timeOfCreation" : "...",
      "timeOfLastAccess" : "...",
      "timeOfLastChange" : "...",
      "timeOfLastProfile" : "...",
      "recordCount" : ...,
      "bytesProfiled" : ...,
      "profileType" : "...",
      "inputFormat" : "...",
      "serde" : "...",
      "landing" : false,
      "numChildren" : ...,
      "filePermission" : "...",
      "fileFormat" : "...",
      "hiveDatabase" : "...",
      "fileSeparator" : "...",
      "dirEntryCount" : ...,
      "collectionInfo" : {
        "relatedObjectsCount" : ...,
        "partitionColumn" : "...",
        "userCreatedCollection" : false
      },
      "tags" : [ { "tagDomain" : "...", "tag" : "..." }, ... ],
      "origins" : [ "...", ... ],
      "fields" : [
        {
          "field" : { "fieldNo" : ..., "name" : "...", "type" : "..." },
          "tags" : [ { "tagDomain" : "...", "tag" : "..." }, ... ],
          "fieldProfile" : {
            "rowCount" : ...,
            "nullCount" : ...,
            "cardinality" : ...,
            "selectivity" : ...,
            "maxValue" : "...",
            "minValue" : "...",
            "stringCount" : ...,
            "stringCardinality" : ...,
            "stringSelectivity" : ...,
            "minString" : "...",
            "maxString" : "...",
            "numericCount" : ...,
            "numericCardinality" : ...,
            "numericSelectivity" : ...,
            "maxNumeric" : ...,
            "minNumeric" : ...,
            "mean" : ...,
            "stdDeviation" : ...,
            "dateCount" : ...,
            "dateCardinality" : ...,
            "dateSelectivity" : ...,
            "maxDate" : "...",
            "minDate" : "...",
            "minBoolean" : ...,
            "maxBoolean" : ...,
            "booleanCount" : ...,
            "booleanCardinality" : ...,
            "booleanSelectivity" : ...,
            "modeValue" : "...",
            "modeCount" : ...,
            "additionalDataTypes" : [ "...", ... ],
            "finalRegexCounts" : { "..." : ..., "---" : ... },
            "dateFormatCounts" : { "..." : ..., "---" : ... },
            "numericFormatCounts" : { "..." : ..., "---" : ... }
          }
        },
        ...
      ],
      "resourceState" : "..."
    }

The JSON payload consists of the following properties. An "x" in the Brief column marks properties included in the brief reply.

| Property | Brief | Type | Description |
|---|---|---|---|
| name | x | string | The name of the file, folder, collection, or table without any path information. |
| owner | x | string | File-system owner of this resource. |
| path | x | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue". |
| size | x | long | In bytes. Folders have size=0. The size of a collection resource indicates the aggregated size of the files in the collection. |
| fileType | x | string | FOLDER, HIVE, CSV, XML, JSON, etc. This value appears as "Content Type" in the Waterline Data Inventory user interface. |
| timeOfCreation | x | dateTime | Timestamp of when the resource was created. Note that this information is not generated for HDFS files. |
| timeOfLastAccess | x | dateTime | Timestamp, in GMT, of last time the resource was read. Format "yyyy-MM-dd HH:mm:ss.SSS zzz". |
| timeOfLastChange | x | dateTime | Timestamp, in GMT, of last time the resource was written to. Format "yyyy-MM-dd HH:mm:ss.SSS zzz". |
| timeOfLastProfile | x | dateTime | Timestamp, in GMT, of last time the resource was profiled by Waterline Data Inventory. Format "yyyy-MM-dd HH:mm:ss.SSS zzz". |
| recordCount | x | long | Number of records in a file or table resource. For complex data, this is the number of "records" rather than rows and may be different from the counts for individual fields. |
| bytesProfiled | x | long | If the file, collection, or table is sampled (see profileType), how many bytes were used to generate the sample. Corresponds to the profiler property waterlinedata.profile.sampled.fraction. |


Property  

Brief   Type  

Description  

profileType  

x  

string  

Whether  the  metadata  for  this  file,  collection,  or  table  is  collected  from  all   the  data  in  the  file  (FULL)  or  from  a  sample  of  the  data  (SAMPLE).    

inputFormat  

x  

string  

Input  format  used  to  read  the  file.    

serde  

x  

string  

The  SerDe  used  to  serialize  and  de-­‐serialize  the  file,  collection,  or  table.    

landing  

x  

Boolean  

If  "true",  this  resource  has  no  ancestors  in  the  cluster:  it  is  the  first  place   this  data  has  arrived  in  the  cluster.  The  landing  label  appears  as  an  "origin"   for  data  derived  from  this  data  and  linked  through  lineage  relationships.  If   the  resource  is  a  collection  or  folder,  by  default  all  files  contained  in  the   resource  are  identified  with  the  same  landing.    

numChildren  

x  

long  

The  number  of  files  or  folders  contained  one  level  down  inside  this   resource.    

filePermission  

x  

string  

A  string  representing  the  file  permissions  from  HDFS.  For  example,   "-­‐rw-­‐r-­‐-­‐r-­‐-­‐".    

fileFormat  

x  

string  

FOLDER,  Hive,  CSV,  XML,  JSON,  etc.  This  value  appears  as  "Content  Type"  in   the  Waterline  Data  Inventory  user  interface.    

fileSeparator  

x  

string  

For  flat  files,  the  file  separator  discovered  by  Waterline  Data  Inventory  and   used  for  profiling.  If  the  separator  is  an  ASCII  code,  the  output  includes  a   Java  escape  character  to  ensure  the  output  appears  properly.  

hiveDatabase  

x  

string  

Hive  database.  

dirEntryCount  

x  

long  

Number  of  items  in  dirEntries.    

collectionInfo  

x  

collectionInfo     Description  of  the  contents  if  the  resource  is  a  collection.    

relatedObjectsCount   x  

int  

Number  of  files  in  the  collection.  The  files  may  be  organized  in  any  number   of  folders.    

collectionType  

x  

string  

PARTITION  or  SNAPSHOT.  Collection  discovery  creates  only  partition  type   collections.  

collectionCreateTime   x  

string  

Timestamp,  in  GMT,  of  when  the  collection  was  created,  whether  through   collection  discovery  or  manually.  Format  "yyyy-­‐MM-­‐dd  HH:mm:ss.SSS  zzz".    

partitionColumn  

string  

Not  used.    

userCreatedCollection   x  

Boolean  

If  "true",  this  collection  was  created  when  identified  by  a  user  (or  through   the  API),  not  discovered  by  Waterline  Data  Inventory.    

tags  

x  

array  of  tags  

A  list  of  tags  associated  with  the  resource.  The  list  includes  the  tag  domain   and  tag  name.    

tagDomain  

x  

string  

Tag  domain  name.  

tag  

x  

string  

Tag  name.  

origins  

x  

array  of   origins   (string)    

One  or  more  origin  labels  from  parents  of  this  file,  collection,  or  table.   Origins  are  propagated  from  landings  by  Waterline  Data  Inventory  in  the   origin  propagation  batch  job.  To  ensure  that  a  resource  has  the  appropriate   origin,  run  Waterline  Data  Inventory's  lineage  discovery  process  or  assign   lineage  relationships  to  this  resource  (see  POST  /v1/metadata/lineage).    

fieldCount  

x  

int  

For  file,  collection,  and  table  resources.  Zero  is  returned  for  folders.    

fields  

x  

array of fields

One or more fields included in this resource. Does not apply to folders. Field information includes field profiling metadata and tags associated with the field.

field  

 

field  

Wrapper  for  the  field  index  number,  name,  and  type.    

fieldNo  

 

integer  

Index  of  the  field  in  the  file  or  table.  Zero  start  value.    

name  

 

string  

Field  name.    

type  

 

string  

Field  type.  This  is  the  type  from  the  original  resource,  not  a  discovered   type.    

tags  

 

array  of  tags    

List  of  zero  or  more  tags  associated  with  this  field.    

x  


tagDomain  

 

string  

Tag  domain  name.  

tag  

 

string  

Tag  name.  

fieldProfile  

 

container  

Wrapper  for  field  profile  information.  This  information  is  populated  when   the  resource  is  profiled.    

rowCount

long

Number of rows in the file or table that include this field. This value varies by field when the file or table is based on complex data formats such as JSON or XML.

nullCount

long

Number of null values in the field.

cardinality

long

Number of unique values in the field assuming the default data type.

selectivity

double

Number of unique values over the number of non-null values in the field assuming the default data type.

maxValue

string

Maximum numeric value or last alphabetic value in the field assuming the default data type.

minValue

string

Minimum numeric value or first alphabetic value in the field assuming the default data type.

stringCount

long

Count of values when the data type is discovered as string.

stringCardinality

long

Cardinality when the data type is discovered as string.

stringSelectivity

double

Selectivity when the data type is discovered as string.

minString

string

First alphabetic value when the data type is discovered as string.

maxString

string

Last alphabetic value when the data type is discovered as string.

numericCount

long

Count of values when the data type is discovered as numeric.

numericCardinality

long

Cardinality when the data type is discovered as numeric.

numericSelectivity

double

Selectivity when the data type is discovered as numeric.

maxNumeric

double

Maximum numeric value when the data type is discovered as numeric.

minNumeric

double

Minimum numeric value when the data type is discovered as numeric.

mean

double

Mean of field values when the data type is discovered as numeric.

stdDeviation

double

Standard deviation of field values when the data type is discovered as numeric.

dateCount

long

Count of values when the data type is discovered as date.

dateCardinality

long

Cardinality when the data type is discovered as date.

dateSelectivity

double

Selectivity when the data type is discovered as date.

maxDate

dateTime

Latest date value when the data type is discovered as date.

minDate

dateTime

Earliest date value when the data type is discovered as date.

minBoolean

Boolean

Minimum value when the data type is discovered as Boolean.

maxBoolean

Boolean

Maximum value when the data type is discovered as Boolean.

booleanCount

long

Count of values when the data type is discovered as Boolean.

booleanCardinality

byte

Cardinality when the data type is discovered as Boolean.

booleanSelectivity

double

Selectivity when the data type is discovered as Boolean.

modeValue  

 

string  

 

modeCount  

 

long  

 

additionalDataTypes

array of data types (string)

Data types found in the field values in addition to the primary data type indicated by "type".

finalRegexCounts

array of key-value pairs

Number of values that matched tags with regular expression rules. List includes an entry for each regular expression tag.

dateFormatCounts

array of key-value pairs

 


numericFormatCounts

array of key-value pairs

 

resourceState  

x  

string  

Status  of  profiling  for  the  resource.  Values  include  DELETED,  CRAWLED,   PROCESSED,  UNPROCESSED,  RECOGNIZED,  UNRECOGNIZED,  PROFILED,   PROFILE  FAILED.  

dirEntries  

x  

array  of   directories    

List  of  files  or  folders  included  in  a  folder  resource.  Each  folder  is  identified   by  its  file  system  owner  and  its  path.  

path  

x  

string  

Resource  name  of  the  subfolder  or  file  included  inside  a  folder  resource.    

owner  

x  

string  

Resource  owner  of  the  subfolder  or  file  included  inside  a  folder  resource.  
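The selectivity value in the field profile is described above as the number of unique values over the number of non-null values. A minimal sketch of that ratio follows. It assumes the non-null count can be taken as rowCount minus nullCount, which this reference does not state explicitly; the class and method names are hypothetical.

```java
final class FieldProfileMath {
    // Selectivity as described in the table: unique values over non-null values.
    // Assumption: non-null values = rowCount - nullCount.
    static double selectivity(long cardinality, long rowCount, long nullCount) {
        long nonNull = rowCount - nullCount;
        if (nonNull <= 0) {
            return 0.0; // hypothetical convention for an empty or all-null field
        }
        return (double) cardinality / nonNull;
    }

    public static void main(String[] args) {
        // 1000 rows, 200 nulls, 40 distinct values: 40 / 800
        System.out.println(selectivity(40, 1000, 200)); // prints 0.05
    }
}
```

A value near 1.0 suggests that almost every non-null value is distinct; a value near 0 suggests a small set of repeated values.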

Failures can be caused if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}

Sample Invocation
First call POST auth/login to get a session cookie. This Resource call takes parameters as part of the query.

cURL
curl -H "Content-Type:application/json" -X GET \
  -G --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/Landing/data.gov/mo_alc_lic.csv" \
  --data-urlencode "verbose=true" \
  "http://<WDI-host-name>:8082/api/v1/metadata/resource" \
  -b cookie.txt

Java
// Construct the query parameters, including data source and path of the resource.
// Verbose is left out to return the more brief version of the response.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("dataSource", sDS);
queryParams.add("path", sPath);

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request including query parameters and session cookie.
ClientResponse response1 = webResource1.path("resource")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
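Because this call passes dataSource and path in the query string, their values must be URL-encoded (the cURL sample does this with --data-urlencode). The following sketch shows one way to assemble the encoded URL using only the Java standard library. The class name, helper method, and the "hdfs://namenode:8020" data source value are hypothetical and are not part of the Waterline Data Inventory API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class ResourceQuery {
    // Appends URL-encoded query parameters to a base URL, in insertion order.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(separator)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("dataSource", "hdfs://namenode:8020"); // hypothetical value
        params.put("path", "/user/waterlinedata/Landing/data.gov/mo_alc_lic.csv");
        params.put("verbose", "true");
        System.out.println(build("http://<WDI-host-name>:8082/api/v1/metadata/resource", params));
    }
}
```

Client libraries such as the one used in the Java sample typically perform this encoding when parameters are passed as a map; the sketch is only relevant when the URL is built by hand.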


/v1/metadata/tagdomain
By default, Waterline Data Inventory tags are organized into "domains." Applications working with tags need to specify a domain to fully qualify a tag.
Supported domains include a domain dedicated to built-in or system tags and a domain dedicated to user-defined tags.

The following operations are supported on this resource:
• POST
• GET
• PUT
• DELETE

POST   Create  a  tag  domain.  The  user  making  the  call  must  be  assigned  an  administrator  role  in   Waterline  Data  Inventory.    

Request body
A JSON object containing a tag domain name and description.
{
  "name" : "...",
  "description" : "..."
}

The  JSON  payload  consists  of  the  following  properties:   Property  

Type  

Description  

name  

string  

Tag  domain  name.  Limited  to  256  characters.    

description  

string  

Domain  description.  Limited  to  512  characters.    

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag domain created successfully."

Failures can be caused if:
• The authenticated user making the call doesn't have an Administrator role.
  {"message":"User cannot access the resource."}
• A domain with the same name already exists.
  {"error":500,"message":"Create tag domain failed.","additionalDetail":"Tag domain with name Ad Hoc Tags exists"}

Sample  Invocation   First  call  POST  auth/login  to  get  a  session  cookie.  This  call  takes  parameters  as  part  of  the   JSON  payload.  


cURL
curl -H "Content-Type:application/json" -X POST \
  -d '{"name":"Operations", "description":"Domain for final operations team tags"}' \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
  -b cookie.txt

Java
// Construct the JSON request, including the tag domain name and description.
String data = "{ \"name\" : \"" + sTagDomain + "\", \"description\" : \"" + sDomainDescription + "\"}";

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request with session cookie and JSON payload with the domain information.
ClientResponse response1 = webResource1.path("tagdomain")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .post(ClientResponse.class, data);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
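The Java sample above builds the payload by string concatenation, which produces invalid JSON if the name or description contains a double quote or a backslash. The sketch below shows one way to escape the values and enforce the documented 256- and 512-character limits before sending. The class and method names are hypothetical, and checking the limits on the client side is an assumption for illustration; the server remains the authority.

```java
final class TagDomainPayload {
    static final int MAX_NAME = 256;        // limit stated in the property table
    static final int MAX_DESCRIPTION = 512; // limit stated in the property table

    // Escapes the characters that are not allowed to appear raw in a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds the request body for POST /v1/metadata/tagdomain.
    static String build(String name, String description) {
        if (name == null || name.isEmpty() || name.length() > MAX_NAME) {
            throw new IllegalArgumentException("name must be 1 to " + MAX_NAME + " characters");
        }
        String desc = description == null ? "" : description;
        if (desc.length() > MAX_DESCRIPTION) {
            throw new IllegalArgumentException("description is limited to " + MAX_DESCRIPTION + " characters");
        }
        return "{\"name\":\"" + escape(name) + "\",\"description\":\"" + escape(desc) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(build("Operations", "Tags for the \"ops\" team"));
    }
}
```

The resulting string can be passed as the data argument of the post call in the sample above.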

PUT
Update an existing tag domain name or description. This call uses the domain ID that you can retrieve using GET /v1/metadata/tagdomain.

Request body
A JSON object containing the tag domain ID, name, and description.
{
  "id" : "...",
  "name" : "...",
  "description" : "..."
}

The  JSON  payload  consists  of  the  following  properties:   Property  

Type  

Description  

id  

integer  

Automatically  generated  tag  domain  ID.  You  can  retrieve  the  valid  values  using  GET   /v1/metadata/tagdomain.  

name  

string  

Tag    domain  name.  The  domain  name  is  required  even  if  it  is  not  being  updated.  Limited  to   256  characters.    

description  

string  

(Optional)  Domain  description.  Limited  to  512  characters.    

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag domain updated successfully."

Failures can be caused if:
• The authenticated user making the call doesn't have an Administrator role.
  {"message":"User cannot access the resource."}

Sample Invocation
First call POST auth/login to get a session cookie. This call takes parameters as part of the JSON payload.

cURL
curl -H "Content-Type:application/json" -X PUT \
  -d '{"id":23904, "name":"Operations", "description":"Domain for operations team"}' \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
  -b cookie.txt

Java
// Construct the JSON request, including the domain ID, name, and description.
String data = "{ \"id\" : \"" + sTagId + "\", \"name\" : \"" + sTagDomain + "\", \"description\" : \"" + sDomainDescription + "\"}";

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request with session cookie and JSON payload with the domain information.
ClientResponse response1 = webResource1.path("tagdomain")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .put(ClientResponse.class, data);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

DELETE   Delete  an  existing  tag  domain.  This  operation  requires  that  all  tags  in  the  domain  be   deleted  before  deleting  the  domain.     Parameters   Name  

Type  

Description  

name  

string  

Existing  empty  tag  domain  to  be  deleted.    

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag domain <domainName> was deleted."

Failures can be caused if:
• The tag domain does not exist.
  {"error":3,"message":"Cannot locate tag domain . Tag domain may have been deleted previously."}
• The authenticated user making the call doesn't have an Administrator role.
  {"message":"User cannot access the resource."}

Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tagdomain call expects a query parameter for the tag domain name.

cURL
curl -H "Content-Type:application/json" -X DELETE -G \
  --data-urlencode "name=Operations" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
  -b cookie.txt

Java
// Construct the query parameters.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("name", sTagDomain);

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request with session cookie and query parameters.
ClientResponse response1 = webResource1.path("tagdomain")
    .queryParams(queryParams)
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .delete(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);


GET
Retrieves all tag domains in the repository.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON payload with a count of domains and an array of tag domains, each of which contains a domain ID, name, and description.
{
  "count" : ...,
  "domains" : [ {
    "id" : ...,
    "name" : "...",
    "description" : "..."
  }, ... ]
}

The JSON payload consists of the following properties:
Property

Type

Description

count

long

Number of domains returned.

domains

array of domains

List of domains, including the ID, name, and description.

id

integer

Auto-generated ID for the tag domain.

name

string

Name of the domain.

description

string

Description of the domain.

Sample Invocation
First call POST auth/login to get a session cookie.

cURL
curl -H "Content-Type:application/json" -X GET \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
  -b cookie.txt

Java
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request with the session cookie. This call takes no query parameters.
ClientResponse response1 = webResource1.path("tagdomain")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);


/v1/metadata/tag
In Waterline Data Inventory, "tags" are annotations consisting of a label ("name") and a description, which can be associated with resources (folders, files, collections, and tables) or fields. Waterline Data Inventory maintains a glossary of tags. The glossary is organized into domains that can be created using POST /v1/metadata/tagdomain.
Applications can list tags by domain, create new tags or update descriptions for existing tags, and delete user-defined tags.

The following operations are supported on this resource:
• POST
• DELETE
• GET

POST
Create a single tag in the specified domain. Make this call multiple times to add more than one tag. Tag names can indicate hierarchical nesting using dot notation (.); Waterline Data Inventory generates a tag for each parent that does not already exist in the repository. For example, if the input includes a tag named "Organization.Property.Brand", Waterline Data Inventory produces three tags organized hierarchically: "Organization", "Property", and "Brand".
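To make the dot notation concrete, the following sketch expands a dotted tag name into the chain of tags that the hierarchy implies, from the top-level parent down to the tag itself. It runs entirely on the client and only illustrates the naming; the class and method names are hypothetical, and the parent tags themselves are created by Waterline Data Inventory, not by this code.

```java
import java.util.ArrayList;
import java.util.List;

final class TagHierarchy {
    // Returns the fully qualified name of each level, parents first.
    static List<String> qualifiedLevels(String dottedName) {
        List<String> levels = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        // The -1 limit keeps trailing empty segments so "A." is rejected too.
        for (String part : dottedName.split("\\.", -1)) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in tag name: " + dottedName);
            }
            if (path.length() > 0) {
                path.append('.');
            }
            path.append(part);
            levels.add(path.toString());
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(qualifiedLevels("Organization.Property.Brand"));
        // prints [Organization, Organization.Property, Organization.Property.Brand]
    }
}
```

The last entry is the fully qualified form that other calls in this reference expect when they refer to a nested tag, for example "Food Service.Cuisine".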

Request body
A JSON object containing a tag name and description.
{
  "domainName" : "...",
  "name" : "...",
  "description" : "...",
  "asDataFacet" : ...,
  "regexp" : "...",
  "regexpMin" : ...,
  "regexpMax" : ...,
  "regexpThreshold" : ...,
  "regexpTestdata" : "...",
  "isRegexEnabled" : ...,
  "isValueEnabled" : ...,
  "valueThreshold" : "...",
  "isManualTagging" : ...
}

The  JSON  payload  consists  of  the  following  properties:   Property  

Type  

Description  

domainName  

string  

Domain  name.  Must  be  an  existing  domain.  

name  

string  

Tag  name.  Limited  to  256  characters.    

description  

string  

Tag  description.  Limited  to  512  characters.    

asDataFacet

Boolean

Set to true to use this tag to build a data facet for searching.

regexp

string

The regular expression used to match field values for this tag. Make sure that any backslashes in the expression are escaped with a second backslash. For example, a regular expression indicating 3 digits "\d{3}" would need to be entered as "\\d{3}".

regexpMin

int

The minimum number of characters in a field for the regular expression to be applied.

regexpMax

int

The maximum number of characters in a field for the regular expression to be applied.

regexpThreshold

double

The minimum number of values in a field that have to match for this tag to be applied to the field.

regexpTestdata

string

Test data to validate the regular expression.

isRegexEnabled

Boolean

The tag is propagated using the regular expression rule. Only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true.

isValueEnabled

Boolean

The tag is propagated using value tagging. Only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true.

valueThreshold

double

The minimum tag weight calculated to assign this tag to a field (percent).

isManualTagging

Boolean

The tag is not automatically propagated. Only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true.
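The regexp, regexpMin, regexpMax, and regexpThreshold properties work together. The sketch below is one possible reading of how such a rule could be evaluated against a field's values, not a description of the product's actual algorithm. It assumes that only values whose length falls between regexpMin and regexpMax are tested, that the whole value must match, and that the threshold is a fraction of all values in the field. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RegexRuleSketch {
    // Returns true if enough values match the rule for the tag to be a candidate.
    static boolean wouldSuggest(List<String> values, String regexp,
                                int regexpMin, int regexpMax, double regexpThreshold) {
        if (values.isEmpty()) {
            return false;
        }
        Pattern pattern = Pattern.compile(regexp);
        int matched = 0;
        for (String v : values) {
            boolean lengthOk = v.length() >= regexpMin && v.length() <= regexpMax;
            // Assumption: the entire value must match, not just a substring.
            if (lengthOk && pattern.matcher(v).matches()) {
                matched++;
            }
        }
        return (double) matched / values.size() >= regexpThreshold;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
            "102-003-10021", "555-010-00042", "n/a", "731-220-90210");
        // In Java source, as in the JSON payload, each backslash is doubled.
        String regexp = "\\d{3}-\\d{3}-\\d{5}";
        System.out.println(wouldSuggest(values, regexp, 6, 13, 0.75)); // prints true (3 of 4 match)
    }
}
```

The rule values in main mirror the ones used in the cURL sample for this call.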

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag created successfully."

Failures can be caused if:
• One or more of the tag properties is not specified for the tag. This error also occurs if backslashes (\) in the regular expression or elsewhere in call values are not escaped with another backslash.
  {"message":"Input JSON Object could not be mapped. Please check documentation for the appropriate format."}
• One or more of the tag properties is out of place in the JSON object.
  {"error":1,"message":"Invalid Request, RegExp Max length must a number"}
• The tag already exists.
  {"error":4,"message":"The tag to create is already present."}
• The regular expression could not be evaluated or the test data provided doesn't match the regular expression.
  {"error":500,"message":"Create tag failed. Test didn't match, or invalid regular expression"}
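Several of the failures above can be caught before the request is sent. The following sketch checks that at most one tagging mode is enabled, that the regular expression compiles, and that the test data matches it, and it doubles backslashes for the JSON payload. It assumes the server's regular expression syntax is compatible with java.util.regex, which this reference does not confirm; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class TagPreflight {
    // Returns null if no problem is found, otherwise a short description of the problem.
    static String check(String regexp, String testData,
                        boolean isRegexEnabled, boolean isValueEnabled, boolean isManualTagging) {
        int enabled = (isRegexEnabled ? 1 : 0) + (isValueEnabled ? 1 : 0) + (isManualTagging ? 1 : 0);
        if (enabled > 1) {
            return "only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true";
        }
        if (isRegexEnabled) {
            try {
                if (!Pattern.compile(regexp).matcher(testData).matches()) {
                    return "test data does not match the regular expression";
                }
            } catch (PatternSyntaxException e) {
                return "invalid regular expression: " + e.getDescription();
            }
        }
        return null;
    }

    // Doubles each backslash so the expression survives JSON parsing.
    static String escapeBackslashes(String regexp) {
        return regexp.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String regexp = "\\d{3}-\\d{3}-\\d{5}"; // the expression \d{3}-\d{3}-\d{5}
        System.out.println(check(regexp, "102-003-10021", true, false, false)); // prints null
        System.out.println(escapeBackslashes(regexp)); // prints \\d{3}-\\d{3}-\\d{5}
    }
}
```

A null result only means these local checks passed; the server can still reject the request for other reasons, such as a tag that already exists.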

Sample Invocation
First call POST auth/login to get a session cookie. The POST /v1/metadata/tag call expects JSON formatted data with name and description elements.

cURL
curl -H "Content-Type:application/json" -X POST \
  -d '{"domainName":"User-defined tags", "name":"Product.ID", "description":"Product ID, ###-###-#####", "asDataFacet":false, "regexp":"\\d{3}-\\d{3}-\\d{5}", "regexpMin":6, "regexpMax":13, "regexpThreshold":"0.75", "regexpTestdata":"102-003-10021", "isRegexEnabled":true, "isValueEnabled":false, "valueThreshold":"", "isManualTagging":false }' \
  "http://<WDI-host-name>:8082/api/v1/metadata/tag" \
  -b cookie.txt

Java
// Construct the JSON request with the tag name and description.
String data = "{ \"name\":\"" + sTagName + "\", \"description\":\"" + sTagDescription + "\"}";

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request to the path "tag" with the authentication cookie and tag information, "data"
ClientResponse response1 = webResource1.path("tag")
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .post(ClientResponse.class, data);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

DELETE   Delete  an  existing  tag.  Warning:  This  operation  deletes  all  tag  associations  that  include   this  tag.  Make  this  call  multiple  times  to  remove  more  than  one  tag.     Parameters   Name  

Type  

Description  

name  

string  

Existing  tag  to  be  deleted.  If  the  tag  is  included  in  a  hierarchy  of  tags,  include  all   parent  tags.  For  example,  to  delete  the  tag  "Cuisine",  specify  "Food  Service.Cuisine".  

tagDomain  

string  

Tag  domain.  

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag  was deleted."

Failures can be caused if:
• The tag does not exist. This can be caused if the tag is not qualified by its parents in the domain.
  {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}

Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tag call expects query parameters for tag name and domain.

cURL
curl -H "Content-Type:application/json" -X DELETE \
  -G --data-urlencode "name=Food Service.Cuisine" \
  --data-urlencode "tagDomain=User-defined Tags" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tag" \
  -b cookie.txt

Java
// Construct the query parameters. The parameter names match the Parameters table: name and tagDomain.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("name", sTagName);

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request with session cookie and query parameters.
ClientResponse response1 = webResource1.path("tag")
    .queryParams(queryParams)
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .delete(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);


 GET   Retrieve  all  tags  in  a  given  domain  or  in  all  domains.  The  response  includes  a  list  of  tag  names   and  descriptions.    

Parameters   Name  

Type  

Description  

tagDomain  

string   (optional)  Domain  name  as  returned  by  GET  /v1/metadata/tagdomain.  If  not  specified,  tags   for  all  domains  are  returned,  grouped  by  domain.    

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON payload with a count of tags and an array of tag domains, each of which contains an array of objects describing the tags in that domain.
{
  "count" : ...,
  "tagsGroupedByDomain" : [ {
    "tagDomain" : "...",
    "count" : ...,
    "tags" : [ {
      "name" : "...",
      "description" : "...",
      "regexp" : "...",
      "regexpMin" : "...",
      "regexpMax" : "...",
      "regexpThreshold" : "...",
      "valueThreshold" : "...",
      "regexEnabled" : ...,
      "valueEnabled" : ...,
      "manualTagging" : ...,
      "facet" : ...
    }, ... ]
  }, ... ]
}

The  JSON  payload  consists  of  the  following  properties:   Property  

Type  

Description  

count

long

Number of tags returned in the call.

tagsGroupedByDomain

array of domains with tags

Container for the list of tags.

tagDomain

string

Domain identifying a group of tags.

count

long

Number of tags in a given domain.

tags

array of tags

Container for the list of tags in a given domain.

name

string

Tag name.

description

string

Tag description.

regexp

string

Regular expression rule associated with the tag. Note that this rule is used only if regexEnabled is true.

regexpMin

string

The minimum number of characters that the field value must include for the regular expression to be applied.

regexpMax

string

The maximum number of characters that the field value can include for the regular expression to be applied.

regexpThreshold

string

The percent of values in a field that must match the regular expression rule for the field to be a candidate for this tag.

valueThreshold

string

The weight (percent) that must be calculated for a field for the field to be a candidate for this tag.

regexEnabled

Boolean

If true, the tag's regular expression rule is used for tag discovery. Only one of regexEnabled, valueEnabled, or manualTagging can be true at the same time.

valueEnabled

Boolean

If true, tag discovery is enabled for this tag using value matching. Only one of regexEnabled, valueEnabled, or manualTagging can be true at the same time.

manualTagging

Boolean

If true, the tag is not used in tag discovery. Only one of regexEnabled, valueEnabled, or manualTagging can be true at the same time.

facet

Boolean

If true, the values from fields with this tag are included in a data facet for searching.

Failures can be caused if:
• The specified domain does not exist in the repository.

Sample Invocation
First call POST auth/login to get a session cookie.

cURL
curl -H "Content-Type:application/json" -X GET \
  -G --data-urlencode "tagDomain=User-defined tags" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tag" \
  -b cookie.txt

Java
// Construct the query parameters. Omit tagDomain to return tags for all domains.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", "User-defined tags");

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");

// Send the request including query parameters and session cookie.
ClientResponse response1 = webResource1.path("tag")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);


/v1/metadata/tagassociation/field
The following operations are supported on this resource:
• POST
• DELETE

POST
Associate an existing tag to a field in an existing resource. The tag association is created as "accepted." If a matching tag association already exists as rejected or suggested, this call promotes the tag association to accepted.

Request body
JSON object identifying a tag and the field (and its file, collection, or table) that the tag will be associated with.
{
  "field" : "...",
  "dataSource" : "...",
  "path" : "...",
  "tagDomain" : "...",
  "tag" : "...",
  "description" : "..."
}

The  JSON  payload  consists  of  the  following  properties:   Property  

Type  

Description  

field  

string  

The  field  name.    

dataSource   string  

Data  source  name.  A  string  formatted  as  a  URI  that  identifies  the  data  source.  You  can   retrieve  the  valid  values  using  /v1/metadata/datasource.  Examples  as  follows:   HDFS:  “hdfs://:8020”   Hive:  “jdbc:hive2://localhost:10000”  

path  

string  

The  name  of  the  resource.  For  HDFS  files,  this  name  includes  the  full  path  describing  the   resource's  location  in  the  file  system.  The  path  is  a  slash  delimited  string  starting  with  a   slash.  For  example,  "/user/me/myproject/myfile".     For  Hive  tables,  this  name  includes  the  database  name,  a  dot,  and  the  table  name.  For   example  "financedb.revenue".  

tag  

string  

The fully qualified name of an existing tag. For example, "Food Service.Cuisine".

tagDomain  

string  

The  tag  domain  where  the  tag  is  located.  

description  

string  

(Optional)  Message  to  include  in  the  audit  history  for  the  tag  and  for  the  file  or  table.  
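The path property takes a different shape depending on the data source: a slash-delimited path starting with a slash for HDFS files, and a database name, a dot, and a table name for Hive tables. The sketch below is a rough client-side sanity check based only on those two descriptions and the example data source URIs in the table. The class and method names are hypothetical, and the check is not a substitute for the server's own validation.

```java
final class ResourcePathCheck {
    // Returns true if the path has the shape described for the given data source type.
    static boolean looksValid(String dataSource, String path) {
        if (dataSource.startsWith("jdbc:hive2:")) {
            // Hive: database name, a single dot, table name (for example "financedb.revenue").
            int dot = path.indexOf('.');
            return dot > 0
                && dot < path.length() - 1
                && path.indexOf('.', dot + 1) < 0
                && path.indexOf('/') < 0;
        }
        if (dataSource.startsWith("hdfs:")) {
            // HDFS: slash-delimited path starting with a slash.
            return path.startsWith("/") && path.length() > 1;
        }
        return false; // other data source types are not covered by this sketch
    }

    public static void main(String[] args) {
        System.out.println(looksValid("jdbc:hive2://localhost:10000", "financedb.revenue")); // prints true
        System.out.println(looksValid("hdfs://namenode:8020", "user/me/myfile"));            // prints false
    }
}
```

The "hdfs://namenode:8020" value is a made-up example; use the data source names returned by /v1/metadata/datasource.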

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag successfully associated with field." If the tag is already associated with the specified field, the call succeeds but does not update the field or the audit history for the file or the tag.
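The POST behavior described above can be summarized as a small state model. This is only an illustrative sketch: the State enum and the method names are hypothetical, and it assumes that "already associated" means the association is already accepted, which this reference does not state explicitly.

```java
final class TagAssociationStates {

    /** Hypothetical client-side model; NONE means no association exists yet. */
    enum State { NONE, SUGGESTED, REJECTED, ACCEPTED }

    /** State of the association after a successful POST: always accepted. */
    static State afterPost(State before) {
        return State.ACCEPTED;
    }

    /**
     * True when the POST creates or promotes the association.
     * Assumption: an association that is already accepted is left untouched.
     */
    static boolean postChangesAnything(State before) {
        return before != State.ACCEPTED;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + afterPost(s)
                + (postChangesAnything(s) ? " (updated)" : " (no change)"));
        }
    }
}
```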


Failures can be caused if:
• The specified resource or field does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate data field in the resource."}
• The specified tag does not exist in the repository. Call POST /v1/metadata/tag to create the tag before associating the tag to a resource.
  {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}
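Because failures are reported as a small JSON object with an error code and a message, a client may want to extract both values from the response text. The following sketch uses a regular expression from the JDK rather than a JSON library; the class name WdiError is hypothetical, and the pattern assumes the body has exactly the two properties shown in the examples above, in that order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WdiError {

    // Matches {"error":<digits>,"message":"<text>"} with optional whitespace.
    private static final Pattern ERROR_BODY = Pattern.compile(
        "\\{\\s*\"error\"\\s*:\\s*(\\d+)\\s*,\\s*\"message\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"\\s*\\}");

    final int code;
    final String message;

    WdiError(int code, String message) {
        this.code = code;
        this.message = message;
    }

    /** Returns the parsed error, or null when the body is not in the expected shape. */
    static WdiError parse(String body) {
        Matcher m = ERROR_BODY.matcher(body.trim());
        if (!m.matches()) {
            return null;
        }
        // The message is returned as it appears in the body; escapes are not decoded.
        return new WdiError(Integer.parseInt(m.group(1)), m.group(2));
    }

    public static void main(String[] args) {
        WdiError e = parse("{\"error\":3,\"message\":\"Cannot locate data field in the resource.\"}");
        System.out.println(e.code + ": " + e.message);
    }
}
```

A null result can be treated as a success message or an unexpected body, depending on the HTTP status code.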

Sample Invocation
First call POST auth/login to get a session cookie. The POST /v1/metadata/tagassociation/field call expects JSON-formatted data with a resource data source, path, and field name, and a tag name and domain.

cURL
curl -H "Content-Type:application/json" -X POST \
  -d '{"tagDomain":"User-defined tags", "tag":"Food Service.Restaurant.Cuisine", "dataSource":"", "path":"/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv", "field":"CUISINE_DESCRIPTION", "description":"Marked on ingress"}' \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field" \
  -b cookie.txt

Java
// Construct the JSON request payload.
String sData = "{\"tagDomain\":\"" + sTagDomain
    + "\",\"tag\":\"" + sTagName
    + "\", \"dataSource\":\"" + sDataSource
    + "\",\"path\":\"" + sPath
    + "\",\"field\":\"" + sField
    + "\",\"description\":\"" + sDescription + "\"}";
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the request with session cookie and payload.
ClientResponse response1 = webResource1.path("tagassociation/field")
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .post(ClientResponse.class, sData);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

   


DELETE
Reject an association between a tag and a field in an existing resource. If both tag information and field information are specified, then the specific tag association between the tag and the field is rejected. If no tag information is specified, the call removes all tag associations on this field. If no field and resource information are specified, the call rejects all field-level tag associations involving this tag. To also reject resource-level associations for the tag, use DELETE /v1/metadata/tagassociation/resource.

In the context of Waterline Data Inventory, "deleting" a tag means "rejecting" it. The difference is that when a tag is "rejected", it is no longer associated with the field or file in future discovery operations.

Parameters

Name (Type): Description
tagDomain (string): Domain name as returned by GET /v1/metadata/tagdomain.
tag (string): Tag name as returned by GET /v1/metadata/tag. The tag name must include the entire hierarchy for the tag (for example, "Food Service.Restaurant.Cuisine").
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using GET /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
field (string): Field or column name.
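Because omitting parameters widens what this call rejects, a client may want to check which scope a set of parameters implies before sending the request. The sketch below is a client-side guard only; the RejectScope class and its Scope names are hypothetical. It assumes that "tag information" means both tagDomain and tag, and that "field information" means dataSource, path, and field together. Partially specified groups are flagged as incomplete because this reference does not describe how the service treats them.

```java
final class RejectScope {

    enum Scope { ONE_ASSOCIATION, ALL_TAGS_ON_FIELD, ALL_FIELDS_FOR_TAG, INCOMPLETE }

    private static boolean present(String s) {
        return s != null && !s.trim().isEmpty();
    }

    /** Classifies what a DELETE with these query parameters would reject. */
    static Scope resolve(String tagDomain, String tag,
                         String dataSource, String path, String field) {
        boolean anyTag = present(tagDomain) || present(tag);
        boolean fullTag = present(tagDomain) && present(tag);
        boolean anyField = present(dataSource) || present(path) || present(field);
        boolean fullField = present(dataSource) && present(path) && present(field);

        // A half-specified tag or field is ambiguous, so refuse to classify it.
        if (anyTag != fullTag || anyField != fullField) {
            return Scope.INCOMPLETE;
        }
        if (fullTag && fullField) {
            return Scope.ONE_ASSOCIATION;
        }
        if (fullField) {
            return Scope.ALL_TAGS_ON_FIELD;
        }
        if (fullTag) {
            return Scope.ALL_FIELDS_FOR_TAG;
        }
        return Scope.INCOMPLETE;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.println(resolve("User-defined tags", "Food Service.Restaurant.Cuisine",
            null, null, null));
    }
}
```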

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag association successfully deleted from field." If the tag is not currently associated with the field, the call returns a success message and does nothing.

Failures can be caused if:
• The specified resource or field does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate data field to delete the tag association."}
• The specified tag does not exist in the repository.
  {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}


Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tagassociation/field call expects query parameters for both tag information and a resource with field.

cURL
curl -H "Content-Type:application/json" -X DELETE -G \
  --data-urlencode "tagDomain=User-defined tags" \
  --data-urlencode "tag=Food Service.Restaurant.Cuisine" \
  --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
  --data-urlencode "field=CUISINE_DESCRIPTION" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field" \
  -b cookie.txt

Java
// Construct the query parameters.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("tag", sTagName);
queryParams.add("dataSource", ds);
queryParams.add("path", sPath);
queryParams.add("field", sField);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the request with session cookie and query parameters.
ClientResponse response1 = webResource1.path("tagassociation/field")
    .queryParams(queryParams)
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .delete(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
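Query parameter values such as tag names and paths contain spaces and slashes, so they need to be URL-encoded when the request URL is assembled by hand. The following JDK-only sketch shows one way to do that; the class name QueryStringBuilder and the host in the example are hypothetical. It emits %20 for spaces, on the assumption that the server accepts that form in query strings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringBuilder {

    /** Percent-encodes a single name or value for use in a query string. */
    static String encode(String s) {
        try {
            // URLEncoder writes spaces as '+'; a literal '+' is already "%2B",
            // so replacing '+' afterwards only affects encoded spaces.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Appends the parameters to the base URL in insertion order. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("tagDomain", "User-defined tags");
        params.put("tag", "Food Service.Restaurant.Cuisine");
        // "host" is a placeholder for the Waterline Data Inventory server name.
        System.out.println(build(
            "http://host:8082/api/v1/metadata/tagassociation/field", params));
    }
}
```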

   


/v1/metadata/tagassociation/field/fields

GET
Retrieve all fields that have suggested or accepted tag associations with a given tag.

Parameters

Name (Type): Description
tagDomain (string): Domain name as returned by GET /v1/metadata/tagdomain.
tag (string): Tag name as returned by GET /v1/metadata/tag. The tag name must include the entire hierarchy for the tag (for example, "Food Service.Restaurant.Cuisine").

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON object containing the tag information specified in the input and an array of zero or more objects, each describing a field and resource associated with the tag.
{
  "tagDomainName" : "...",
  "tagName" : "...",
  "count" : ...,
  "fields" : [
    {
      "dataSource" : "...",
      "path" : "...",
      "owner" : "...",
      "tagState" : "...",
      "weight" : ...,
      "lastChange" : "...",
      "fieldName" : "...",
      "fieldNo" : ...
    },
    ...
  ]
}

The JSON payload consists of the following properties:

Property (Type): Description
tagDomainName (string): Tag domain. One of "User-defined Tags" or "Built-in Tags".
tagName (string): Tag name.
count (long): The number of fields returned in the result.
fields (array of fields): An array of field objects, including the field name, data source and resource name, and owner.
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner (string): File or table owner.
tagState (string): One of "Suggested" or "Accepted". Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".
weight (double): The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange (date): Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format: "yyyy-MM-dd HH:mm:ss.SSS zzz".
fieldName (string): Field or column name.
fieldNo (integer): Field or column index number, starting at zero for the first column.
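The lastChange format string above is a java.text.SimpleDateFormat pattern, so the value can be converted to a java.util.Date on the client. The following is a minimal sketch; the class name LastChangeParser and the sample timestamp are hypothetical and are not taken from a real response.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class LastChangeParser {

    /** Pattern as documented for the lastChange property. */
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSS zzz";

    /** Parses a lastChange value; throws IllegalArgumentException if it does not match. */
    static Date parse(String lastChange) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.US);
        try {
            return format.parse(lastChange);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected lastChange value: " + lastChange, e);
        }
    }

    public static void main(String[] args) {
        // Illustrative value only.
        Date changed = parse("2016-03-18 14:05:09.123 GMT");
        System.out.println(changed.getTime());
    }
}
```

Parsing into a Date makes it straightforward to sort associations by their most recent change or to compare them against a cutoff time.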

Failures can be caused if:
• The tag domain does not exist in the repository.
  {"error":3,"message":"Cannot locate tag domain ."}
• The tag does not exist in the specified domain.
  {"error":3,"message":"Cannot locate tag by tag domain <Built-in Tags>, tag name ."}

Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/field/fields call expects query parameters for a tag name and domain.

cURL
curl -H "Content-Type:application/json" -X GET \
  -G --data-urlencode "tagDomain=User-defined tags" \
  --data-urlencode "tag=Food Service.Restaurant.Cuisine" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field/fields" -b cookie.txt


Java
// Construct the query parameters.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("tag", sTagName);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the query parameters with the session cookie.
ClientResponse response1 = webResource1.path("tagassociation/field/fields")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

   


/v1/metadata/tagassociation/field/tags

GET
Retrieve all suggested or rejected tags associated with a field on a given resource.

Parameters

Name (Type): Description
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile".
field (string): Field name.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON object containing the resource and field information specified in the input and an array of zero or more objects, each describing a tag associated with the field.
{
  "dataSource" : "...",
  "path" : "...",
  "fieldName" : "...",
  "count" : ...,
  "tags" : [
    {
      "tagDescriptor" : {
        "tagDomain" : "...",
        "tag" : "..."
      },
      "description" : "...",
      "weight" : ...,
      "lastChange" : "...",
      "tagState" : "..."
    },
    ...
  ]
}

The JSON payload consists of the following properties:

Property (Type): Description
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
fieldName (string): Field name.
count (long): The number of tags returned in the result.
tags (array of tag associations): Container for the list of tag associations with the field.
tagDescriptor (object): Container for the tag and tag domain.
tagDomainName (string): Tag domain.
tagName (string): Tag name.
description (string): Information about the tag association. If the tag association was suggested, the description may contain information such as "Propagated from signature [Built-in Tags/US City]; value overlap(100%)".
weight (double): The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange (date): Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format: "yyyy-MM-dd HH:mm:ss.SSS zzz".
tagState (string): One of "Suggested" or "Accepted". Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".
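For tags with a regular expression rule, the weight is described above as the percentage of field values that match the expression. The sketch below illustrates that arithmetic only. It is not the product's implementation: the class name RegexMatchWeight is hypothetical, and it assumes a value counts as a match only when the whole value matches the expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RegexMatchWeight {

    /** Returns the percentage (0 to 100) of values that fully match the regular expression. */
    static double percentMatching(List<String> values, String regex) {
        if (values.isEmpty()) {
            return 0.0;
        }
        Pattern pattern = Pattern.compile(regex);
        long matches = values.stream()
            .filter(v -> pattern.matcher(v).matches())
            .count();
        return 100.0 * matches / values.size();
    }

    public static void main(String[] args) {
        // Illustrative sample: two of the four values are five-digit strings.
        List<String> values = Arrays.asList("10001", "94105", "abc", "1234");
        System.out.println(percentMatching(values, "\\d{5}"));
    }
}
```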

Failures can be caused if:
• The specified resource or field does not exist in the repository. This can happen if the data has not yet been profiled.
  Resource not found:
  {"error":3,"message":"Cannot locate resource by dataSource = <maprfs:///>, path = ."}
  Resource found but field not found:
  {"error":3,"message":"Cannot locate data field to retrieve the tag association."}

Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/field/tags call expects query parameters for a data source, resource, and field name.

cURL
curl -H "Content-Type:application/json" -X GET -G \
  --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
  --data-urlencode "field=ACTION" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field/tags" \
  -b cookie.txt

Java
// Construct the query parameters.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("dataSource", ds);
queryParams.add("path", sPath);
queryParams.add("field", sField);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the request with query parameters and session cookie.
ClientResponse response1 = webResource1.path("tagassociation/field/tags")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

   


/v1/metadata/tagassociation/resource

The following operations are supported on this resource:
• POST
• DELETE

POST
Associate an existing tag to an existing resource. The tag association is created as "accepted." If a matching tag association already exists as suggested or rejected, this call promotes the tag association to accepted.

Request body
JSON object identifying a tag and the resource (file, folder, collection, or Hive table) that the tag will be associated with.
{
  "dataSource" : "...",
  "path" : "...",
  "tagDomain" : "...",
  "tag" : "...",
  "description" : "..."
}

The JSON payload consists of the following properties:

Property (Type): Description
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
tagDomain (string): Domain name as returned by GET /v1/metadata/tagdomain.
tag (string): Tag name.
description (string): (Optional) Description of the tag association. This text appears in the audit history for both the tag and the resource.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag successfully associated with resource." If the tag is already associated with the resource, the call returns a success message and does nothing.


Sample Invocation
First call POST auth/login to get a session cookie. The POST /v1/metadata/tagassociation/resource call expects JSON-formatted data with a resource data source and path, and a tag name and domain.

cURL
curl -H "Content-Type:application/json" -X POST \
  -d '{"tagDomain":"User-defined tags", "tag":"Food Service.Restaurant.Cuisine", "dataSource":"", "path":"/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv"}' \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource" \
  -b cookie.txt

Java
// Construct the JSON request payload.
String sData = "{\"tagDomain\":\"" + sTagDomain
    + "\",\"tag\":\"" + sTagName
    + "\", \"dataSource\":\"" + sDataSource
    + "\",\"path\":\"" + sPath
    + "\",\"description\":\"" + sDescription + "\"}";
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the request with session cookie and payload.
ClientResponse response1 = webResource1.path("tagassociation/resource")
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .post(ClientResponse.class, sData);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

   


DELETE
Reject the association between an existing tag and an existing resource. The tag remains in the Waterline Data Inventory tag glossary. If no tag is included in the request, all tags are rejected from the resource. If no resource is included in the request, all associations between this tag and any resources are rejected. To also reject field-level associations for a tag, use DELETE /v1/metadata/tagassociation/field.

Parameters

Name (Type): Description
tagDomain (string): Domain name as returned by GET /v1/metadata/tagdomain.
tag (string): Tag name.
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns "Tag association successfully deleted from resource." If the tag was not associated with the resource, the success message is returned and no action is taken.

Failures can be caused if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = <maprfs:///>, path = ."}
• The tag does not exist in the specified tag domain or in the glossary.
  {"error":3,"message":"Cannot locate tag by tag domain <Built-in Tags>, tag name ."}

Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tagassociation/resource call expects query parameters for both tag information and a resource.

cURL
curl -H "Content-Type:application/json" -X DELETE -G \
  --data-urlencode "tagDomain=User-defined tags" \
  --data-urlencode "tag=Topic.Accidents" \
  --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource" \
  -b cookie.txt

Java
// Construct the query parameters.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("tag", sTagName);
queryParams.add("dataSource", ds);
queryParams.add("path", sPath);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the request with session cookie and query parameters.
ClientResponse response1 = webResource1.path("tagassociation/resource")
    .queryParams(queryParams)
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .delete(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);

 


/v1/metadata/tagassociation/resource/resources

GET
Retrieve all resources that have accepted tag associations with a given tag.

Parameters

Name (Type): Description
tagDomain (string): Domain name as returned by GET /v1/metadata/tagdomain.
tag (string): Tag name.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON object containing the tag information specified in the input and an array of zero or more objects, each describing a resource associated with the tag.
{
  "tagDomainName" : "...",
  "tagName" : "...",
  "count" : ...,
  "resources" : [
    {
      "dataSource" : "...",
      "path" : "...",
      "owner" : "...",
      "tagState" : "...",
      "weight" : ...,
      "lastChange" : "..."
    },
    ...
  ]
}

The JSON payload consists of the following properties:

Property (Type): Description
tagDomainName (string): Tag domain.
tagName (string): Tag name.
count (long): The number of resources associated with the tag.
resources (array of resources): A list of resource objects, including the data source, resource name, and owner.
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner (string): File or table owner.
weight (double): The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange (date): Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format: "yyyy-MM-dd HH:mm:ss.SSS zzz".
tagState (string): One of "Suggested" or "Accepted". Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".

Failures can be caused if:
• The tag does not exist in the specified domain.
  {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}
• The tag domain does not exist in the repository.
  {"error":3,"message":"Cannot locate tag domain <User-defined Tags>."}

Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/resource/resources call expects query parameters for a tag name and domain.

cURL
curl -H "Content-Type:application/json" -X GET -G \
  --data-urlencode "tagDomain=User-defined tags" \
  --data-urlencode "tag=Topic.Travel.NYC" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource/resources" \
  -b cookie.txt

Java
// Construct the query parameters.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("tag", sTagName);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");
// Send the request with query parameters and session cookie.
ClientResponse response1 = webResource1.path("tagassociation/resource/resources")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);


/v1/metadata/tagassociation/resource/tags

GET
Retrieve all suggested or rejected tags associated with a resource.

Parameters

Name (Type): Description
dataSource (string): Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path (string): Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile".

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.
A successful call returns a JSON object containing the resource information specified in the input and an array of zero or more objects, each describing a tag associated with the resource.
{
  "dataSource" : "...",
  "path" : "...",
  "count" : ...,
  "tags" : [
    {
      "tagDescriptor" : {
        "tagDomain" : "...",
        "tag" : "..."
      },
      "description" : "...",
      "weight" : ...,
      "lastChange" : "...",
      "tagState" : "..."
    },
    ...
  ]
}

The JSON payload consists of the following properties:

Property        Type          Description

dataSource      string        Data source name. A string formatted as a URI that identifies
                              the data source. Examples as follows:
                              HDFS: "hdfs://:8020"
                              Hive: "jdbc:hive2://localhost:10000"

path            string        Resource name. For HDFS files, this name includes the full path
                              describing the resource's location in the file system. The path
                              is a slash-delimited string starting with a slash. For example,
                              "/user/me/myproject/myfile".
                              For Hive tables, this name includes the database name, a dot,
                              and the table name. For example, "financedb.revenue".


count           long          The number of tags returned in the result.

tags            array of tag  Container for the list of tag associations with the resource.
                associations

tagDescriptor   object        Container for the tag and tag domain.

tagDomainName   string        Tag domain.

tagName         string        Tag name.

description     string        Information about the tag association. If the tag association
                              was suggested, the description may contain information such as
                              "Propagated from signature [Built-in Tags/US City]; value
                              overlap(100%)".

weight          double        The weight calculated for the field for this tag. If the tag
                              has a regular expression rule enabled, this value corresponds
                              to the percentage of field values that match the regular
                              expression. If the tag has value tagging enabled, this value is
                              the calculated weight for how well the field matches the tag.
                              The minimum weight is defined per tag; the maximum is 100.

lastChange      date          Timestamp in GMT that describes the most recent event applied
                              to the tag association. This timestamp could describe the
                              creation of the tag association or the change of a tag
                              association from SUGGESTED to ACCEPTED. Format:
                              "yyyy-MM-dd HH:mm:ss.SSS zzz".

tagState        string        Suggested, Accepted. Tag discovery marks tag associations as
                              "Suggested". Manual tag associations are entered as "Accepted";
                              in addition, users can convert a suggested tag association
                              into "Accepted".
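
The lastChange format given in the table maps directly onto a java.time pattern. The following is a minimal sketch, assuming the zone is rendered as a short name such as GMT; the class name, the method name, and the sample timestamp are made up for illustration and are not part of the Waterline Data Inventory API.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LastChangeParser {
    // Pattern copied from the lastChange description: "yyyy-MM-dd HH:mm:ss.SSS zzz".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS zzz", Locale.ENGLISH);

    // Converts a lastChange string into an Instant; throws
    // DateTimeParseException if the text does not match the pattern.
    static Instant parse(String lastChange) {
        return ZonedDateTime.parse(lastChange, FORMAT).toInstant();
    }

    public static void main(String[] args) {
        // Hypothetical sample value, not taken from a real response.
        System.out.println(parse("2016-03-18 14:05:09.123 GMT"));
    }
}
```

Converting to an Instant makes it straightforward to compare timestamps from different tag associations without further time zone handling.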

Failures can occur if:

• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled. Resource not found:
  {"error":3,"message":"Cannot locate resource by dataSource = <maprfs:///>, path = ."}

Sample Invocation

First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/resource/tags call expects query parameters for a data source and a resource path.

cURL

curl -H "Content-Type:application/json" -X GET -G \
  --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource/tags" \
  -b cookie.txt
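
For Java callers, the URL for the same request can be assembled by URL-encoding the two query parameters. This is a minimal sketch that uses only the standard library (Java 10 or later for this URLEncoder overload). The class name, the method name, and the localhost base URL are assumptions made for illustration; the data source and path values are the Hive examples from the tables above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ResourceTagsUrl {
    // Builds the GET URL for /v1/metadata/tagassociation/resource/tags.
    // baseUrl is the scheme, host, and port, without a trailing slash.
    static String build(String baseUrl, String dataSource, String path) {
        return baseUrl + "/api/v1/metadata/tagassociation/resource/tags"
                + "?dataSource=" + URLEncoder.encode(dataSource, StandardCharsets.UTF_8)
                + "&path=" + URLEncoder.encode(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical host; substitute the Waterline Data Inventory host name.
        System.out.println(build("http://localhost:8082",
                "jdbc:hive2://localhost:10000", "financedb.revenue"));
    }
}
```

URLEncoder applies form-style encoding, so characters such as ":" and "/" in the data source URI are percent-encoded and a space would be sent as "+".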


/version

GET

Show the API version.

Response body

The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details.

A successful call returns "v" followed by the API version number. Note that the API version number does not necessarily correspond to the compatible Waterline Data Inventory version.

Sample Invocation

First call POST auth/login to get a session cookie.

cURL

curl -H "Content-Type:application/json" -X GET \
  "http://<WDI-host-name>:8082/api/version" \
  -b cookie.txt

Java

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResourceVersion = client.resource("http://:8082/api/");

// Send the request with the session cookie.
ClientResponse responseVersion = webResourceVersion.path("version")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = responseVersion.getStatus();
String responseOutput = responseVersion.getEntity(String.class);
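
Because a successful call returns "v" followed by the version number, a caller may want to strip the prefix before comparing versions. The following is a minimal sketch, assuming the response body is plain text with no surrounding JSON or quotation marks; the class name, the method name, and the sample input are hypothetical.

```java
final class ApiVersion {
    // Returns the text after the leading "v", or throws if the prefix is missing.
    static String numberOf(String responseOutput) {
        String trimmed = responseOutput.trim();
        if (!trimmed.startsWith("v")) {
            throw new IllegalArgumentException(
                    "Unexpected version response: " + responseOutput);
        }
        return trimmed.substring(1);
    }

    public static void main(String[] args) {
        // Hypothetical response text, not captured from a real server.
        System.out.println(numberOf("v1"));
    }
}
```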

 
