Waterline Data Inventory
API Reference Product Version 2.1 Document Version 3.18.2016
© 2014 - 2016 Waterline Data, Inc. All rights reserved.
Table of Contents

Preface
Programming guide
    Authentication
    Error messages
    Programming languages
    Waterline Data Inventory “resources”
    cURL syntax
Sample applications
    Importing tags
REST Calls
    auth/login: POST
    /auth/logout: POST
    /v1/metadata/collection: POST, DELETE, GET
    /v1/metadata/datasource: GET
    /v1/metadata/lineage: POST, DELETE
    /v1/metadata/lineage/children: GET
    /v1/metadata/lineage/parents: GET
    /v1/metadata/origin: POST, PUT, DELETE
    /v1/metadata/origin/allorigins: GET
    /v1/metadata/origin/landing: POST, DELETE
    /v1/metadata/origin/origins: GET
    /v1/metadata/origin/resources: GET
    /v1/metadata/resource: GET
    /v1/metadata/tagdomain: POST, PUT, DELETE, GET
    /v1/metadata/tag: POST, DELETE, GET
    /v1/metadata/tagassociation/field: POST, DELETE
    /v1/metadata/tagassociation/field/fields: GET
    /v1/metadata/tagassociation/field/tags: GET
    /v1/metadata/tagassociation/resource: POST, DELETE
    /v1/metadata/tagassociation/resource/resources: GET
    /v1/metadata/tagassociation/resource/tags: GET
    /version: GET
Preface

Waterline Data Inventory provides a Representational State Transfer (REST) API to access information discovered and collected on files, folders, and Hive tables. The same API allows applications to insert information into the Waterline Data Inventory repository, such as tags, tag associations, and lineage relationships. The API provides access to the same operations available from the Waterline Data Inventory browser application.

The API uses JSON objects as request and response payloads. Each HTTP call returns a general pass/fail status; calls that fail at the Waterline Data Inventory server return a failure response with an error code and a more detailed message.

The API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application. Before sending API calls, an application sends an authentication request. The server responds to a successful request with a session cookie, which the application then sends in the header of API calls. The cookie is valid for the length of the session.

This documentation organizes the components of the API in the following sections:

• Programming guide
• Sample applications
• REST Calls
Programming guide

Authentication

This API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application; that is, any authenticated user can call the API. Before sending API calls, an application sends an authentication request. The server responds to a successful request with a session cookie, which the application then sends in the header of API calls. The cookie is valid for the length of the session.

For example, to create an authentication cookie saved in the text file “cookie.txt” where the Waterline Data Inventory web server is running on an edge node and the HDFS root is on a different node:

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "username": "waterlinedata", "password": "waterlinedata" }' \
      "http://edge.productionsystem.com:8082/waterlinedata/auth/login" -c cookie.txt

The cookie would then be used in subsequent API calls:

    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=hdfs://master.productionsystem.com:8020" \
      --data-urlencode "path=/user/waterlinedata/Landing/data.gov/mo_alc_lic.csv" \
      --data-urlencode "verbose=true" \
      "http://edge.productionsystem.com:8082/api/v1/metadata/resource" \
      -b cookie.txt
Error messages

This API returns standard HTTP status codes to indicate the success or failure of a call. Successful calls always return the HTTP code "200". Calls that succeed through the HTTP protocol but fail to produce the expected response from Waterline Data Inventory are marked with the HTTP code "500", and specific information about the failure is provided in the response payload. For example:

    { "error":4, "message":"Existing lineage already present between the pair of data resources" }

Other failure codes may be returned through HTTP before Waterline Data Inventory receives the request.
HTTP errors

The HTTP errors you can expect are as follows:

HTTP Error | API Error | Description
200 | NA | OK. Request is successful.
403 | NA | FORBIDDEN. The user is not yet authenticated with the server. Call POST auth/login to request an authentication session cookie.
500 | 1 | BAD REQUEST. Returned when parts of the parameters or request payload are not specified correctly or are missing.
500 | 2 | NOT AUTHORIZED. The user is not authorized to perform the operation or access the resource based on the system access permissions.
500 | 3 | NOT FOUND. The request cannot locate the asset based on the parameters. For example, the request specifies a file or tag that does not exist in the repository.
500 | 4 | CONFLICT. There is a conflict in satisfying the request. For example, the request is trying to create a tag that already exists or to create a duplicate lineage.
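
An application will usually want to branch on the API error number rather than on the message text. The following is a minimal sketch of one way to do that, not part of the product: the enum name WdiApiError and its methods are hypothetical, and the regular expression is a shortcut that assumes the payload carries a top-level "error" number as in the example above. A real client would more likely use a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** API error numbers from the table above. The type and method names are illustrative. */
enum WdiApiError {
    BAD_REQUEST(1), NOT_AUTHORIZED(2), NOT_FOUND(3), CONFLICT(4), UNKNOWN(-1);

    // Matches the "error" member of a payload such as {"error":4,"message":"..."}.
    private static final Pattern ERROR_FIELD = Pattern.compile("\"error\"\\s*:\\s*(\\d+)");

    final int code;

    WdiApiError(int code) {
        this.code = code;
    }

    /** Maps a numeric code to a constant; codes not in the table map to UNKNOWN. */
    static WdiApiError fromCode(int code) {
        for (WdiApiError e : values()) {
            if (e.code == code) {
                return e;
            }
        }
        return UNKNOWN;
    }

    /** Extracts the "error" number from a response payload; UNKNOWN if none is found. */
    static WdiApiError fromPayload(String payload) {
        Matcher m = ERROR_FIELD.matcher(payload);
        if (!m.find()) {
            return UNKNOWN;
        }
        try {
            return fromCode(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException tooLarge) {
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        String payload = "{ \"error\":4, \"message\":\"Existing lineage already present "
                + "between the pair of data resources\" }";
        System.out.println(WdiApiError.fromPayload(payload));
    }
}
```

Payloads that carry only a "message" member, and error numbers that the table does not list, both fall through to UNKNOWN so the caller can log the raw payload instead.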
Typical Waterline Data Inventory errors

Some typical errors you may encounter are:

• Data resource cannot be found by dataSource.

    {"error":3,"message":"\"from\" data resource cannot be found by dataSource <maprfs:///>, path , path = <default.tablename>."}

  It is critical to make sure the data source value provided matches the value found in the repository. Use the specific string that is returned by the GET /v1/metadata/datasource call. With Hive data sources, the correct data source string may not be intuitive. In addition, the path value for a Hive table includes the database name and the table name, separated by a dot (.).

• User cannot access the resource.

    {"message":"User cannot access the resource."}

  Make sure you have a valid authentication cookie. You will see this error if you have refreshed the cookie, but the valid cookie isn't in the local directory (or in the location specified by the command).
Programming languages

While you can make RESTful calls from many programming languages, the examples provided in this documentation are written in Java, alongside equivalent cURL commands.

Waterline Data Inventory “resources”

This API uses the term “resource” to indicate an HDFS file, folder, or Hive table in the Waterline Data Inventory repository. The Waterline Data Inventory concept of a collection can also be a resource. Because REST uses the term “resource” differently, there may be some awkwardness in how we're using the term. For example, there are REST resources described in the API (such as /v1/metadata/tag) that refer to objects that are not considered resources in the Waterline Data Inventory sense. We hope this doesn't cause too much confusion.

To completely identify a resource, you need both a data source and a resource path that includes the name of the resource. In this API, you can query Waterline Data Inventory for a list of data sources (GET /v1/metadata/datasource). Data sources can be an HDFS root (hdfs://:<port>) or Hive (jdbc:hive2://:10000). Resources are typically identified by their path within the data source.

For example, to retrieve HDFS file metadata from Waterline Data Inventory for the file “public_art_inventory.csv” found in the Waterline Data Inventory sandbox, the application would provide the following query parameters to identify the file:

• dataSource: "hdfs://finance.acme.com:8020"
• path: "/user/purchasing/Landing/data.gov/public_art_inventory.csv"

To retrieve metadata for a Hive table, identify the table as follows:

• dataSource: "jdbc:hive2://hive.hostname.com:10000"
• path: "sales.2015revenue"
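
Because every call that touches a resource needs the same data source and path pair, it can help to carry the two values together. The sketch below is only an illustration: the ResourceId class and its factory methods are hypothetical names, and detecting Hive by the "jdbc:hive2://" prefix is an assumption based on the examples above rather than a documented rule.

```java
/** Illustrative value type that pairs a data source with a resource path. */
final class ResourceId {
    final String dataSource;
    final String path;

    private ResourceId(String dataSource, String path) {
        this.dataSource = dataSource;
        this.path = path;
    }

    /** HDFS files and folders: the path is slash-delimited and starts with a slash. */
    static ResourceId hdfs(String dataSource, String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must start with a slash: " + path);
        }
        return new ResourceId(dataSource, path);
    }

    /** Hive tables: the path is the database name, a dot, and the table name. */
    static ResourceId hiveTable(String dataSource, String database, String table) {
        return new ResourceId(dataSource, database + "." + table);
    }

    /** Assumption: Hive data sources are the ones whose string starts with jdbc:hive2://. */
    boolean isHive() {
        return dataSource.startsWith("jdbc:hive2://");
    }

    public static void main(String[] args) {
        ResourceId file = ResourceId.hdfs("hdfs://finance.acme.com:8020",
                "/user/purchasing/Landing/data.gov/public_art_inventory.csv");
        ResourceId table = ResourceId.hiveTable("jdbc:hive2://hive.hostname.com:10000",
                "sales", "2015revenue");
        System.out.println(file.dataSource + " " + file.path);
        System.out.println(table.dataSource + " " + table.path);
    }
}
```

In practice the data source string should be taken from the GET /v1/metadata/datasource response rather than typed by hand.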
cURL syntax

The calls in this guide include examples of cURL and Java calls. The following guidelines may help you use these examples more efficiently. POST and PUT operations require that you format the request payload as a JSON object; however, GET and DELETE operations can take advantage of the cURL "--data-urlencode" option to list the parameters.

POST and PUT syntax

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "dataSource":"", "path":"/user/waterlinedata/Landing/restaurant_inspections/elp", "type":"PARTITION" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt

• Line continuation markers (\) are allowed between cURL parameters, but are not used inside the JSON object.
• JSON object elements are enclosed in double quotation marks, with a colon between the element and the value: "type":"PARTITION"
• Commas are used to separate elements inside the JSON object.

GET and DELETE syntax

    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Landing/restaurant_inspections/elp" \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt

• Parameters are separated by white space.
• Line continuation markers (\) are used as needed at any point in the command.
• Data parameters are introduced by "--data-urlencode".
• The option "-G" makes cURL append the "--data-urlencode" parameters to the URL as a query string.
• The data parameters are enclosed in double quotation marks, with an equal sign between the name and the value: "type=PARTITION"
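
A Java client has to do the URL encoding that "--data-urlencode" does for cURL. The following is a rough sketch using only the standard library; the class name QueryStringBuilder is hypothetical. Note that java.net.URLEncoder applies form-style encoding, so a space becomes "+", which may not match cURL's output character for character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper that turns name/value pairs into an encoded query string. */
final class QueryStringBuilder {

    /** Joins the pairs as name=value&name=value, encoding both names and values. */
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("dataSource", "hdfs://master.productionsystem.com:8020");
        params.put("path", "/user/waterlinedata/Landing/restaurant_inspections/elp");
        System.out.println("?" + build(params));
    }
}
```

The resulting string would be appended after a "?" to the resource URL for GET and DELETE calls.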
Sample applications

We've provided the following sample application to make it easier for you to implement your own application. This app is written in Java. The application is set up to be managed and built using Maven. We've also provided the complete application so you can run it without having to build it. The application requires that Waterline Data Inventory be installed and that the repository (Derby) and web server (Jetty) be running.

Importing tags

This sample application imports a list of tags and their descriptions into Waterline Data Inventory's repository. The input is a comma-delimited text file with tags and their descriptions. The output is a message indicating success for each tag creation.

Tag names can indicate hierarchical nesting using dot (.) notation. Waterline Data Inventory generates a tag for each item in the hierarchy if it doesn’t already exist in the repository. For example, if the input includes a tag named “Organization.Property.Brand”, Waterline Data Inventory produces three tags organized hierarchically.

The application includes the following logic:

• Collect input from the command line
• Retrieve the Waterline Data Inventory session authentication cookie (POST auth/login)
• Check for the existence of an input file
• Read an entry (tag and description) from the input file
• Send the entry to Waterline Data Inventory (POST /v1/metadata/tag)
• Repeat for each entry in the input file
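
The read-an-entry step and the dot notation can be illustrated with a small helper. This is a sketch under assumptions, not the sample application's actual code: the class name TagNames is hypothetical, the input line is assumed to be split at the first comma, and the expansion assumes each level of the hierarchy is addressed by its dotted prefix.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helpers for reading tag entries and expanding dotted tag names. */
final class TagNames {

    /** Splits a "tag,description" line at the first comma; the description may be empty. */
    static String[] parseEntry(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            return new String[] { line.trim(), "" };
        }
        return new String[] { line.substring(0, comma).trim(), line.substring(comma + 1).trim() };
    }

    /** Lists every level of a dotted name, outermost first. */
    static List<String> expandHierarchy(String tagName) {
        List<String> levels = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        // The -1 limit keeps trailing empty segments so that "A." is rejected below.
        for (String part : tagName.split("\\.", -1)) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in tag name: " + tagName);
            }
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            levels.add(prefix.toString());
        }
        return levels;
    }

    public static void main(String[] args) {
        String[] entry = parseEntry("Organization.Property.Brand,Brand of the property");
        for (String level : expandHierarchy(entry[0])) {
            System.out.println(level);
        }
    }
}
```

According to the description above, the server creates the intermediate tags itself, so a client would only need such an expansion for reporting or validation.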
The sample also includes a script that provides a usage message and collects input from the command line.

To run the application:

1. Create a tag list file or update the sample tag list file:

    /waterlinedata/samples/TagManager/Sample-Tag-List.txt

   Be sure that there are no empty lines at the end of the data in the file.

2. Navigate to the script location:

    /waterlinedata/samples/TagManager/target

3. Call the importTag script:

    tagManager importtags <user> <password>

   where:

   • The first argument after importtags is the host name or IP address for the node on which Waterline Data Inventory is running.
   • <user> and <password> are valid credentials for a user account on the Linux file system where the cluster is installed.
   • The last argument is the list of tags to create.

   For example:

    ./tagManager importtags 192.168.1.249 waterlinedata waterlinedata ../Sample-TagList.txt

To build the application:

    cd /waterlinedata/samples/tagManager
    mvn clean install

Check that the script file target/tagManager has execute permissions.
REST Calls

• login, logout
• collections
• datasources
• lineage, origins
• resources
• tag domains, tags
• version

This API supports a REST model for accessing a set of resources through a fixed set of operations. The following resources are accessible through the RESTful model:

• auth/login
• /auth/logout
• /v1/metadata/collection
• /v1/metadata/datasource
• /v1/metadata/lineage
• /v1/metadata/lineage/children
• /v1/metadata/lineage/parents
• /v1/metadata/origin
• /v1/metadata/origin/allorigins
• /v1/metadata/origin/landing
• /v1/metadata/origin/origins
• /v1/metadata/origin/resources
• /v1/metadata/resource
• /v1/metadata/tagdomain
• /v1/metadata/tag
• /v1/metadata/tagassociation/field
• /v1/metadata/tagassociation/field/fields
• /v1/metadata/tagassociation/field/tags
• /v1/metadata/tagassociation/resource
• /v1/metadata/tagassociation/resource/resources
• /v1/metadata/tagassociation/resource/tags
• /version
auth/login

The Waterline Data Inventory API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application. These calls control the creation and destruction of a session cookie to validate other API calls.

The session cookie is valid for the UI timeout duration, which defaults to 120 minutes. You can override this duration by editing the file:

    jetty-distribution-9.2.1.v20140609/waterlinedata-base/etc/waterlinedata-overridedescriptor.xml

and adding the following block:

    <session-config> <session-timeout>30
POST

This call requests an authentication cookie to be used in subsequent API calls.

Request body

A JSON object including a username and password.

    { "username" : "...", "password" : "..." }

The JSON payload consists of the following properties:

Property | Type | Description
username | string | Username under which operations will be performed by Waterline Data Inventory.
password | string | Password for the username.
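
When the payload is assembled by string concatenation, a quotation mark or backslash in the password would produce invalid JSON. The sketch below shows one possible way to guard against that with a small escaping routine; the class name LoginPayload is hypothetical, and an application that already uses a JSON library would rely on that library instead.

```java
/** Illustrative builder for the login request body with JSON string escaping. */
final class LoginPayload {

    /** Escapes the characters that JSON does not allow unescaped inside a string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the object with the two properties described in the table above. */
    static String build(String username, String password) {
        return "{ \"username\": \"" + escape(username)
                + "\", \"password\": \"" + escape(password) + "\" }";
    }

    public static void main(String[] args) {
        System.out.println(build("waterlinedata", "waterlinedata"));
    }
}
```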
Response body

The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing a string to be used as a session authorization cookie in the header of subsequent API requests.

Failures can be caused if:

• The credentials are missing or not valid.

    {"message":"Wrong Username/Password"}

• The web server connection URL isn’t correct.

    Failed to connect to <WDI host name> port 8062: Connection refused

  or if the provided host name isn’t correct:

    Could not resolve host: <provided host name>

• The repository database isn’t available. If the Waterline Data Inventory repository is running on Derby:

    {"message":"Exception [EclipseLink-4002] (Eclipse Persistence Services 2.5.2.v20131113-a7346c6): org.eclipse.persistence.exceptions.DatabaseException\nInternal Exception: java.sql.SQLNonTransientConnectionException: java.net.ConnectException : Error connecting to server sandbox.hortonworks.com on port 4,444 with message Connection refused.\nError Code: 40000\nQuery: ReadAllQuery(name=\"getUserProfileByPrimaryUserName\" referenceClass=PersistentUserProfile sql=\"SELECT ID, KLASS, PAYLOAD, PRIMARYUSERNAME FROM WD__USERPROFILE WHERE (PRIMARYUSERNAME = ?)\")"}
Sample Invocation

The POST /auth/login call expects a JSON payload containing a user name and password.

cURL

    curl -H "Content-Type:application/json" -X POST \
      -d '{ "username": "waterlinedata", "password": "waterlinedata" }' \
      "http://<WDI-host-name>:8082/waterlinedata/auth/login" -c cookie.txt

Java

    // Imports assume the Jersey 1.x client, which matches the class names used here.
    import java.util.List;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.NewCookie;
    import com.sun.jersey.api.client.Client;
    import com.sun.jersey.api.client.ClientResponse;
    import com.sun.jersey.api.client.WebResource;

    // Construct the JSON request payload.
    String data = "{ \"username\": \"" + "waterlinedata"
        + "\", \"password\": \"" + "waterlinedata" + "\" }";

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 =
        client.resource("http://<WDI-hostname>:8082/waterlinedata/auth/");

    // Send the request with user credentials
    ClientResponse response1 = webResource1.path("login")
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class, data);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);

    // Extract the session ID (WDISessionId) from the response cookies
    List<NewCookie> cookies = response1.getCookies();
    String sessionId = cookies.get(1).toString();
    System.out.println(sessionId + ":" + responseOutput);
/auth/logout

The Waterline Data Inventory API uses the HTTP basic authentication scheme. It accepts the same user credentials as the Waterline Data Inventory browser application. These calls control the creation and destruction of a session cookie to validate other API calls.

The session cookie is valid for the UI timeout duration, as configured in the web server parameter "session-timeout" in the file:

    jetty-distribution-9.2.1.v20140609/waterlinedata-base/etc/webdefault.xml

By default, this setting is 30 minutes.

POST

Close the API session. This call logs out of the session identified by the authorization cookie passed in the header of the request.

Response body

The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "User logged out."

Sample Invocation

cURL

    curl -H "Content-Type:application/json" -X POST \
      "http://<WDI-host-name>:8082/waterlinedata/auth/logout" \
      -b cookie.txt

Java

    // Imports assume the Jersey 1.x client, which matches the class names used here.
    import javax.ws.rs.core.MediaType;
    import com.sun.jersey.api.client.Client;
    import com.sun.jersey.api.client.ClientResponse;
    import com.sun.jersey.api.client.WebResource;

    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 =
        client.resource("http://<WDI-hostname>:8082/waterlinedata/auth/");

    // Send the logout request. sSessionId is assumed to hold the
    // session cookie string returned by the login call.
    ClientResponse response1 = webResource1.path("logout")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class);

    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/collection

Waterline Data Inventory "collections" are sets of files with the same or slowly evolving schema. These files are organized in a single folder or sets of subfolders within a single folder. Think of collections as a way to represent a single set of data that happens to be distributed across multiple files and over time or across geographies. Waterline Data Inventory discovers files that qualify as collections; you can also manually identify a set of files as a collection.

Waterline Data Inventory supports two types of collections:

• Snapshots: Each file in a snapshot holds essentially the same data where each new file may include additional data or additional fields.
• Partitions: The data set is distributed across many files where the data does not overlap except for values in the partition fields.

In both cases, the collection represents the superset of data and the most recent schema. Waterline Data Inventory validates the type of collection against the files inside the specified folder. For any collection:

• The number of files in the folder (or its subfolders) must be greater than the value set by waterlinedata.discovery.smallest.collection.size (default is 3 files).
• All files in the folder have the same file type.
• The files have the same schema, or only newer files have added fields at the end of the list of fields.
• If the fields in the files do not have names, there need to be at least ten fields in the oldest schema.

In addition, partitioned collections must have:

• At least one field that includes non-overlapping values across all the files.
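
To make the file-count and schema rules concrete, the following sketch checks them for a list of schemas ordered from oldest to newest. It is only an interpretation of the rules as worded above, not the product's validation code: the class and method names are hypothetical, and the file-count check follows the "greater than" wording literally.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative checks that mirror the collection rules listed above. */
final class CollectionRules {

    /** Default of waterlinedata.discovery.smallest.collection.size as stated above. */
    static final int DEFAULT_SMALLEST_COLLECTION_SIZE = 3;

    /** The file count must be greater than the configured size. */
    static boolean hasEnoughFiles(int fileCount, int smallestCollectionSize) {
        return fileCount > smallestCollectionSize;
    }

    /**
     * True when each schema equals the previous one or extends it by appending
     * fields at the end. Schemas must be ordered oldest first.
     */
    static boolean schemasEvolveByAppending(List<List<String>> schemasOldestFirst) {
        for (int i = 1; i < schemasOldestFirst.size(); i++) {
            List<String> older = schemasOldestFirst.get(i - 1);
            List<String> newer = schemasOldestFirst.get(i);
            if (newer.size() < older.size()
                    || !newer.subList(0, older.size()).equals(older)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> schemas = Arrays.asList(
                Arrays.asList("id", "name"),
                Arrays.asList("id", "name"),
                Arrays.asList("id", "name", "city"));
        System.out.println(schemasEvolveByAppending(schemas));
        System.out.println(hasEnoughFiles(schemas.size(), DEFAULT_SMALLEST_COLLECTION_SIZE));
    }
}
```

A folder whose newer files reorder or drop fields would fail the schema check, which corresponds to the "File schemas do not match" failure that the POST call can return.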
The metadata for a collection combines features of folders and files: it includes child information to describe the files or folders contained in the collection; it also contains field information. Just as you would in the Waterline Data Inventory browser application, you can choose to manage the collection as a single aggregation of data (like a file) or as a set of individual files (like a folder).

Note that the directory structure that makes up a collection can be any number of levels deep. For example, a typical collection might be log files collected an hour at a time. The collection might include a folder for each year that logs were collected. The year folders would contain folders for each month; the month folders would contain folders for each day; the day folders would contain files for each hour.

The following operations are supported on this resource:

• POST
• DELETE
• GET
POST

Identify a folder as a collection. The folder must already exist as a resource in the Waterline Data Inventory repository. The request contains the data source and path for the folder resource and the type of collection, either SNAPSHOT or PARTITION.

Request body

JSON payload request that identifies the folder and the type of collection.

    { "dataSource" : "...", "path" : "...", "type" : "..." }

The JSON payload consists of the following properties:

Property | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples are as follows: HDFS: “hdfs://:8020”; Hive: “jdbc:hive2://localhost:10000”
path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
type | string | The collection type, either SNAPSHOT or PARTITION.
Response body The request returns HTTP error code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Collection created." Failures can occur when: !
The specified folder does not exist in the repository. Check the path to make sure it points to an existing folder. Another way this can happen is when the folder has not yet been profiled. For example, if the folder is not profiled and no entry for it exists in the Waterline Data Inventory repository: {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}
!
The folder fails to qualify as a collection, either because the contained files don’t share the same schema: {"error":512,"message":"Error while setting collection type:Unable to create PARTITION collection: File schemas do not match"}
16
© 2014 - 2016 Waterline Data, Inc. All rights reserved.
API Reference
/v1/metadata/collection
Or the folder does not have enough files. This error indicates that there are fewer files in the folder or the subdirectories than the setting from discovery.properties: waterlinedata.discovery.smallest.collection.size. This value defaults to 3. {"error":512,"message":"Error while setting collection type:Unable to create PARTITION collection: Folder does not have enough files"}
You may also see this error if the folder exists in the repository but has not been profiled.

Sample Invocation
First call POST auth/login to get a session cookie. This Collection call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{ "dataSource":"",
            "path":"/user/waterlinedata/Landing/restaurant_inspections/elp",
            "type":"PARTITION" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt
Java
    // Collect the data source and path names for the request payload.
    String sData = "{\"dataSource\":\"" + sDataSource +
        "\",\"path\":\"" + sPath +
        "\",\"type\":\"" + sType + "\"}";
    //System.out.println(sData);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request with the request payload and session cookie.
    ClientResponse response1 = webResource1.path("collection")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class, sData);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
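The sample above builds the JSON payload by string concatenation, which produces invalid JSON if a path or data source name contains a double quote or backslash. The following is a minimal sketch of one way to escape the values first, using only the JDK. The class name CollectionPayload, its methods, and the host name "namenode" are hypothetical and are not part of the Waterline Data Inventory API.

```java
final class CollectionPayload {

    /** Escapes backslashes, double quotes, and control characters for use inside a JSON string. */
    static String escapeJson(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hexadecimal escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the payload with the dataSource, path, and type properties described above. */
    static String build(String dataSource, String path, String type) {
        return "{\"dataSource\":\"" + escapeJson(dataSource)
             + "\",\"path\":\"" + escapeJson(path)
             + "\",\"type\":\"" + escapeJson(type) + "\"}";
    }

    public static void main(String[] args) {
        // "namenode" is a placeholder host name used only for illustration.
        System.out.println(build("hdfs://namenode:8020", "/user/me/my \"quoted\" folder", "PARTITION"));
    }
}
```

The string returned by build can be passed as the request entity in place of sData.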
DELETE
Removes the collection identifier from a folder. This call removes the aggregated data object and returns the folder to a folder resource. It does not delete the folder from the repository. The request contains the data source and path to the existing collection.

Parameters

Name | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path | string | Resource name of the folder. This name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myCollectionFolder".
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "The folder is no longer a collection." You see a success message even if the specified resource is not currently a collection.

Failures can occur when:
• The specified folder does not exist in the repository.
    {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}
• The resource identified is not a folder.
    {"error":1,"message":"Path is not a directory."}
Sample Invocation
First call POST auth/login to get a session cookie. This Collection call takes query parameters for data source and path.

cURL
    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Landing/restaurant_inspections/elp" \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt
Java
    // Collect the data source and path names for the query.
    MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sDataSource);
    queryParams.add("path", sPath);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request with data source, path, and session authentication cookie.
    ClientResponse response1 = webResource1.path("collection")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
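Calls that take query parameters need the data source URI and path percent-encoded, which is what curl's --data-urlencode option and the client's queryParams method take care of. For applications that assemble the request URL themselves, the following sketch shows the encoding step using only the JDK. The class name QueryStringSketch and the host name "wdi-host" are hypothetical. Note that URLEncoder writes a space as "+", which is the form-encoding convention; whether the server accepts that form is an assumption to verify in your environment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringSketch {

    /** Appends the given parameters to baseUrl as an encoded query string. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(separator)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required on every Java platform, so this is not expected to happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        // Placeholder values for illustration only.
        params.put("dataSource", "hdfs://namenode:8020");
        params.put("path", "/user/me/myproject/myCollectionFolder");
        System.out.println(build("http://wdi-host:8082/api/v1/metadata/collection", params));
    }
}
```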
GET
Retrieves all resources identified as collections from the repository. The response includes a list of resources augmented with the collection type (SNAPSHOT or PARTITION).

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing a list of resource descriptions:

Property | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
owner | string | Indicates the file system owner of the file, folder, or table.
path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
type | string | Whether the collection is a SNAPSHOT or PARTITION.
Sample Invocation
First call POST auth/login to get a session cookie.

cURL
    curl -H "Content-Type:application/json" -X GET \
      "http://<WDI-host-name>:8082/api/v1/metadata/collection" \
      -b cookie.txt
Java
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request with the session authentication cookie.
    ClientResponse response1 = webResource1.path("collection")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/datasource
In Waterline Data Inventory, a data source describes an HDFS cluster or Hive instance. Waterline Data Inventory needs a data source description to fully qualify a resource, where a "resource" is a folder, file, collection, or table.

GET
Retrieve information for all data sources configured for this system: an HDFS cluster and, if configured, a Hive instance. Information includes:
• The name of the data source in the form of a URI
• A description of the data source, which is available in the repository but not used in the Waterline Data Inventory UI
• The data source type, either HDFS or Hive
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON payload with a count of the data sources included and an array of objects that describe the data sources in the repository.

    {
      "count" : ...,
      "datasources" : [
        { "name" : "...", "description" : "...", "type" : "..." },
        ...
      ]
    }

The JSON payload consists of the following properties:

Property | Type | Description
count | long | The number of data sources included in the inventory.
datasources | array of data sources | Container for the list of data sources.
name | string | Data source name. A string formatted as a URI that identifies the data source. For example: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
description | string | Description is not used in the Waterline Data Inventory user interface; it is only available through the API.
type | string | The data source type, either "HdfsDataSource" or "HiveDataSource".
Sample Invocation
First call POST auth/login to get a session cookie.

cURL
    curl -H "Content-Type:application/json" -X GET \
      "http://<WDI-host-name>:8082/api/v1/metadata/datasource" \
      -b cookie.txt
Java
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send request with session cookie.
    ClientResponse response1 = webResource1.path("datasource")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/lineage
Waterline Data Inventory maintains relationships among files and tables that share data or schemas. These relationships are arranged chronologically such that newer files are considered "children" of older files. This chain of relationships forms the lineage of a given resource. Waterline Data Inventory can discover these relationships, or the relationships can be identified and applied in Waterline Data Inventory's repository.

When Waterline Data Inventory discovers lineage relationships, each relationship is identified as "suggested". Users can validate a relationship by identifying it as "accepted". When a lineage relationship is added using the API, it is created as an accepted relationship, even if it already existed in the repository as a suggested relationship.

Applications can perform the following lineage operations:
• Create a lineage relationship between two existing resources: POST /v1/metadata/lineage
• Remove a lineage relationship between two resources: DELETE /v1/metadata/lineage
• Retrieve the next earlier relationship(s) for a resource (parents): GET /v1/metadata/lineage/parents
• Retrieve the next later relationship(s) for a resource (children): GET /v1/metadata/lineage/children

The following operations are supported on the resource /v1/metadata/lineage:
• POST
• DELETE
POST
Create a lineage relationship between two existing resources. This operation can apply to existing files, collections, or Hive tables.

Request body
A JSON object indicating the two resources to relate and a description of the relationship.

    {
      "fromDataSource" : "...",
      "fromPath" : "...",
      "toDataSource" : "...",
      "toPath" : "...",
      "description" : "..."
    }
The JSON payload consists of the following properties:

Property | Type | Description
fromDataSource | string | Data source name of the parent resource. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
fromPath | string | Resource name of the parent resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
toDataSource | string | Data source of the child resource.
toPath | string | Resource name of the child resource.
description | string | (Optional) A description of the relationship. Use this field to indicate how the resources are related, such as which parent fields map to child fields. The description is not used in the Waterline Data Inventory user interface; it is only available through the API.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Lineage created successfully."

Failures can occur when:
• Either of the specified resources does not exist in the repository. This can happen if the data has not yet been profiled.
• One of the specified resources is a folder. Folders can't be included in lineage relationships.
• A lineage relationship already exists that conflicts with the specified relationship, such as when the "to" resource is already indicated as a parent of the "from" resource.
Sample Invocation
First call POST auth/login to get a session cookie. This Lineage call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{ "fromDataSource":"",
            "fromPath":"/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv",
            "toDataSource":"",
            "toPath":"/user/waterlinedata/Lestrade/wrangled_inspections/d/14/8_8.csv" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage" \
      -b cookie.txt
Java
    // Construct the request payload, including the "from" resource's data source and path
    // and the "to" resource's data source and path.
    String sData = "{\"fromDataSource\":\"" + sFromDS +
        "\",\"fromPath\":\"" + sFromPath +
        "\",\"toDataSource\":\"" + sToDS +
        "\",\"toPath\":\"" + sToPath +
        "\",\"description\":\"" + sDescription + "\"}";
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request with the payload and session cookie.
    ClientResponse response1 = webResource1.path("lineage")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .post(ClientResponse.class, sData);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
DELETE
Remove existing lineage relationships. This call can be used in the following ways:
• Delete a specific lineage relationship: specify both "from" and "to" resources.
• Delete all parent lineage relationships for a given resource: specify only the "to" resource.
• Delete all child lineage relationships for a given resource: specify only the "from" resource.
If no lineage relationship exists between the specified resource or resources, the call returns a success message.

Parameters

Name | Type | Description
fromDataSource | string | Data source name of the parent resource. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
fromPath | string | Resource name of the parent resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "fromPath=/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "fromPath=financedb.revenue".
toDataSource | string | Data source of the child resource.
toPath | string | Resource name of the child resource. For example, "toPath=/user/me/myproject/myfile". If toDataSource and toPath are not specified, all children for the "from" resource are removed.
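The three ways of calling DELETE differ only in which parameter pairs are sent. As an illustration, the following sketch assembles the query parameters for each case. The class LineageDeleteSketch and its method are hypothetical helpers, and rejecting a call with neither resource is an assumption of this sketch; the behavior of the API in that case is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LineageDeleteSketch {

    /**
     * Returns the query parameters for DELETE /v1/metadata/lineage.
     * Pass null for both "from" values or both "to" values to omit that resource.
     */
    static Map<String, String> queryParams(String fromDataSource, String fromPath,
                                           String toDataSource, String toPath) {
        boolean hasFrom = fromDataSource != null && fromPath != null;
        boolean hasTo = toDataSource != null && toPath != null;
        if (!hasFrom && !hasTo) {
            // Assumption: at least one complete resource must be identified.
            throw new IllegalArgumentException("Specify the \"from\" resource, the \"to\" resource, or both");
        }
        Map<String, String> params = new LinkedHashMap<>();
        if (hasFrom) {
            // Only "from": removes all child relationships of this resource.
            params.put("fromDataSource", fromDataSource);
            params.put("fromPath", fromPath);
        }
        if (hasTo) {
            // Only "to": removes all parent relationships of this resource.
            params.put("toDataSource", toDataSource);
            params.put("toPath", toPath);
        }
        return params;
    }

    public static void main(String[] args) {
        // Placeholder values; deletes all children of the "from" resource.
        System.out.println(queryParams("hdfs://namenode:8020", "/user/me/myproject/myfile", null, null));
    }
}
```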
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Lineage deleted successfully."

Failures can occur when:
• Either of the specified resources does not exist in the repository. This can happen if the data has not yet been profiled.
• One of the specified resources is a folder. Folders can't be included in lineage relationships.
Sample Invocation
First call POST auth/login to get a session cookie. This Lineage call takes parameters as part of the query.

cURL
    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "fromDataSource=" \
      --data-urlencode "fromPath=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
      --data-urlencode "toDataSource=" \
      --data-urlencode "toPath=/user/waterlinedata/Lestrade/wrangled_inspections/d/14/8_8.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage" \
      -b cookie.txt
Java
    // Construct the URL parameters, including the "from" data source and path
    // and the "to" data source and path.
    MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
    queryParams.add("fromDataSource", sParentDataSource);
    queryParams.add("fromPath", sParentPath);
    queryParams.add("toDataSource", sChildDataSource);
    queryParams.add("toPath", sChildPath);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request including the query parameters and the session cookie.
    ClientResponse response1 = webResource1.path("lineage")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/lineage/children
Use this call to retrieve a list of the children of a given resource. The children are the newer resources that have accepted or suggested lineage relationships with the specified resource. Typically, the children are copies or transformed versions of the parent where at least 2 fields overlap between parent and child.

GET
Retrieve the child resource(s) to which the specified resource contributes.

Parameters

Name | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON array of zero or more resources identified by data source and path. The array is empty if there are no children associated with the specified resource.

    [
      {
        "dataSource" : "...",
        "path" : "...",
        "owner" : "...",
        "lineageState" : "...",
        "lastChange" : "..."
      },
      ...additional children...
    ]
Name | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path | string | Resource name of the child resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner | string | Resource owner as identified in the last profiling operation.
lineageState | string | Whether the lineage relationship was suggested by Waterline Data Inventory (Suggested) or was created or approved by a user or through the API (Approved).
lastChange | timestamp | The time when this lineage relationship was created, accepted, or rejected. The time is in GMT and formatted as yyyy-MM-dd hh:mm:ss.SSS zzz. For example: "lastChange":"2016-02-16 20:42:29.037 GMT".
Failures can occur when:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
• The specified resource is a folder. Folders can't be included in lineage relationships.
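Client code often needs the lastChange value as a real timestamp rather than a string. The sketch below parses the documented example value with java.time. The class name LastChangeSketch is hypothetical. The pattern uses HH (24-hour clock) rather than the hh shown in the format description; that is an assumption based on the example hour of 20, which would not be valid on a 12-hour clock.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LastChangeSketch {

    // Assumption: hours are on a 24-hour clock (HH), as the documented example "20:42:29" suggests.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS zzz", Locale.ENGLISH);

    /** Parses a lastChange value such as "2016-02-16 20:42:29.037 GMT" into an Instant. */
    static Instant parse(String lastChange) {
        return ZonedDateTime.parse(lastChange, FORMAT).toInstant();
    }

    public static void main(String[] args) {
        Instant changed = parse("2016-02-16 20:42:29.037 GMT");
        System.out.println(changed.toEpochMilli());
    }
}
```

Converting to an Instant makes it straightforward to compare or sort relationships by the time they last changed.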
Sample Invocation
First call POST auth/login to get a session cookie. This Lineage call takes parameters as part of the query.

cURL
    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/Landing/restaurant_inspections/inspections.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage/children" \
      -b cookie.txt
Java
    // Construct the query parameters, including data source and path of the resource
    // whose children you want to retrieve.
    MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sDataSource);
    queryParams.add("path", sPath);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request including query parameters and session cookie.
    ClientResponse response1 = webResource1.path("lineage/children")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/lineage/parents
Use this call to retrieve a list of the parents of a given resource. The parents are the older resources that have accepted or suggested lineage relationships with the specified resource. Typically, the parents are source data where at least 2 fields overlap between parent and child.

GET
Retrieve resources from which the specified resource is derived.

Parameters

Name | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON array of zero or more resources identified by data source and path. The array is empty if there are no parents associated with the specified resource.

    [
      {
        "dataSource" : "...",
        "path" : "...",
        "owner" : "...",
        "lineageState" : "...",
        "lastChange" : "..."
      },
      ...additional parents...
    ]
Name | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path | string | Resource name of the parent resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner | string | Resource owner as identified in the last profiling operation.
lineageState | string | Whether the lineage relationship was suggested by Waterline Data Inventory (Suggested) or was created or approved by a user or through the API (Approved).
lastChange | timestamp | The time when this lineage relationship was created, accepted, or rejected. The time is in GMT and formatted as yyyy-MM-dd hh:mm:ss.SSS zzz. For example: "lastChange":"2016-02-16 20:42:29.037 GMT".
Failures can occur when:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
• The specified resource is a folder. Folders can't be included in lineage relationships.
Sample Invocation
First call POST auth/login to get a session cookie. This call takes parameters in the query.

cURL
    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
      "http://<WDI-host-name>:8082/api/v1/metadata/lineage/parents" \
      -b cookie.txt
Java
    // Construct the query parameters, including data source and path of the resource.
    MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
    queryParams.add("dataSource", sChildDataSource);
    queryParams.add("path", sChildPath);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request including query parameters and session cookie.
    ClientResponse response1 = webResource1.path("lineage/parents")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/origin
An "origin" in Waterline Data Inventory is a landing label given to files or tables that arrived in HDFS from an external source; the label is then propagated through lineage relationships. If no origin label exists for a resource, the resource cannot be traced through lineage to a resource with a landing label. While a resource marked as a landing can have only one origin, other resources can have more than one origin. Multiple origins indicate that the resource has elements of more than one ancestor and that the ancestors trace back to resources with landing labels.

Applications can perform the following origin operations:
• Create an origin label: POST /v1/metadata/origin
• Update an origin label or description: PUT /v1/metadata/origin
• Delete an origin label: DELETE /v1/metadata/origin
• List all origin labels: GET /v1/metadata/origin/allorigins
• Associate an existing origin label with a resource (Mark as landing): POST /v1/metadata/origin/landing
• Remove an origin label from a resource (Unmark as landing): DELETE /v1/metadata/origin/landing
• List the origin(s) associated with a resource: GET /v1/metadata/origin/origins
• List the resources that can be traced back to an origin: GET /v1/metadata/origin/resources

The following operations are supported on this resource (/v1/metadata/origin):
• POST
• PUT
• DELETE
POST
Create an origin label. To use this label to mark a resource as a "landing", call POST /v1/metadata/origin/landing.

Request body
A JSON object containing the origin name and description.

    {
      "origin" : "...",
      "description" : "..."
    }
The JSON payload consists of the following properties:

Property | Type | Description
origin | string | Origin name. Limited to 256 characters.
description | string | Origin description. Limited to 512 characters.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Origin created successfully."

Failures can occur when:
• The specified origin already exists in the repository.
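Because origin names are limited to 256 characters and descriptions to 512, an application can check the lengths before sending the request. The following is a minimal sketch of such a check. The class OriginPayloadSketch is hypothetical, and treating an empty origin name as invalid and a missing description as acceptable are assumptions of this sketch rather than documented rules.

```java
final class OriginPayloadSketch {

    static final int MAX_ORIGIN_LENGTH = 256;       // documented limit for the origin name
    static final int MAX_DESCRIPTION_LENGTH = 512;  // documented limit for the description

    /** Returns true if the values fit within the documented length limits. */
    static boolean isValid(String origin, String description) {
        // Assumption: an origin name must be present and non-empty.
        if (origin == null || origin.isEmpty() || origin.length() > MAX_ORIGIN_LENGTH) {
            return false;
        }
        // Assumption: a null description is treated as "no description" and allowed.
        return description == null || description.length() <= MAX_DESCRIPTION_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(isValid("data.gov", "Public data download for US government"));
    }
}
```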
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{ "origin":"data.gov",
            "description":"Public data download for US government" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin" \
      -b cookie.txt
Java
    // Construct the JSON payload with the origin name and description.
    String data = "{\"origin\":\"" + sOriginName +
        "\",\"description\":\"" + sOriginDescription + "\"}";
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request with session cookie and payload.
    ClientResponse response1 = webResource1.path("origin")
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .post(ClientResponse.class, data);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
PUT
Update an origin label or description. The new origin name is updated on any resource with this origin.

Request body
A JSON object containing the existing origin name and either a new name, a new description, or both.

    {
      "origin" : "...",
      "newOrigin" : "...",
      "newDescription" : "..."
    }
The JSON payload consists of the following properties:

Property | Type | Description
origin | string | Existing origin name.
newOrigin | string | New origin name. Limited to 256 characters.
newDescription | string | New description. Limited to 512 characters.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Origin updated successfully."

Failures can occur when:
• The specified origin does not exist in the repository.
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call takes parameters as part of the JSON payload.

cURL
    curl -H "Content-Type:application/json" -X PUT \
      -d '{ "origin":"data.gov",
            "newDescription":"Public data download from US government (updated)" }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin" \
      -b cookie.txt
Java
    // Construct the JSON payload with the existing origin name and the new description.
    String data = "{\"origin\":\"" + sOriginName +
        "\",\"newDescription\":\"" + sOriginDescription + "\"}";
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the PUT request with session cookie and payload.
    ClientResponse response1 = webResource1.path("origin")
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .put(ClientResponse.class, data);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
DELETE
Delete an origin label from the system. Deleting an origin removes that origin label from resources that are "marked as landing" with that origin.

Parameters

Name | Type | Description
origin | string | Origin label.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Origin deleted successfully."

Failures can occur when:
• The specified origin does not exist in the repository.
    {"error":3,"message":"Cannot locate the origin . Please make sure the origin was previously created."}
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call takes an origin name as part of the query.

cURL
    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "origin=human-resources" \
      "http://<WDI-host-name>:8082/api/v1/metadata/origin" \
      -b cookie.txt
Java
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://<WDI-hostname>:8082/api/v1/metadata/");
    // Send the request with origin name and session cookie.
    ClientResponse response1 = webResource1.path("origin")
        .queryParam("origin", sOriginName)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .delete(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/origin/allorigins

GET
Retrieve information for all origins created in the system. The information includes the origin names and their descriptions. This origin call has no parameters.

Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON payload with the count of origins and an array of zero or more objects, each including the name and description of an origin.
{
  "count" : ...,
  "origins" : [
    { "name" : "...", "description" : "..." },
    ...
  ]
}

The JSON payload consists of the following properties:
Property        Type                Description
count           long                Count of origin labels in the system.
origins         array of origins    An array of origin objects, including the origin name and description.
  name          string              Origin name. Limited to 256 characters.
  description   string              Origin description. Limited to 512 characters.
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call has no parameters.

cURL
curl -H "Content-Type:application/json" -X GET \
  "http://<WDI-host-name>:8082/api/v1/metadata/origin/allorigins" \
  -b cookie.txt

Java
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send request with session cookie.
ClientResponse response1 = webResource1.path("origin/allorigins")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
/v1/metadata/origin/landing
The following operations are supported on this resource:
• POST
• DELETE
POST
Mark a resource as a landing. Both the resource and the origin used to indicate the landing must exist in the repository. If the resource is already marked as a landing, this call replaces the existing landing label. If the resource has parent lineage relationships, this call removes those lineage relationships.

Request body
A JSON object containing the origin label and the data source and resource name (path) that identify the resource.
{
  "dataSource" : "...",
  "path" : "...",
  "origin" : "..."
}

The JSON payload consists of the following properties:
Property      Type      Description
dataSource    string    Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path          string    Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
origin        string    Origin name.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Landing created successfully."

Failures can occur if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}
• The specified origin does not exist in the repository. Call POST /v1/metadata/origin to create the origin before using it to mark a landing.
  {"error":3,"message":"Origin is not found. Please first create the origin."}
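Because failures arrive as a JSON body with a "message" property, a client usually wants to surface that message. The following is a minimal sketch that uses only the JDK; the class and method names are illustrative assumptions. It pulls the message out with a regular expression and leaves any JSON escape sequences in the message as they were sent, so a real client would more likely use a JSON library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorMessageReader {
    // Matches "message":"..." and tolerates escaped characters inside the value.
    private static final Pattern MESSAGE =
            Pattern.compile("\"message\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns an empty Optional on success (status 200). On any other status,
    // returns the "message" value if one is present, otherwise the raw body.
    static Optional<String> failureMessage(int statusCode, String body) {
        if (statusCode == 200) {
            return Optional.empty();
        }
        Matcher m = MESSAGE.matcher(body);
        return Optional.of(m.find() ? m.group(1) : body);
    }

    public static void main(String[] args) {
        String body = "{\"error\":3,\"message\":\"Origin is not found. Please first create the origin.\"}";
        failureMessage(500, body).ifPresent(System.out::println);
    }
}
```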
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call takes parameters as part of the JSON payload.

cURL
curl -H "Content-Type:application/json" -X POST \
  -d '{"dataSource":"", "path":"/user/waterlinedata/Landing/nyc_open_data/", "origin":"NYC Open Data"}' \
  "http://<WDI-host-name>:8082/api/v1/metadata/origin/landing" \
  -b cookie.txt
Java
// Construct the JSON request, including the data source and path for the resource.
String data = "{ \"dataSource\" : \"" + ds + "\", \"path\" : \"" + path + "\", \"origin\":\"" + origin + "\"}";

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send the request with session cookie and JSON payload with resource information.
ClientResponse response1 = webResource1.path("origin/landing")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .post(ClientResponse.class, data);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
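Building the payload by string concatenation produces invalid JSON when a value contains a double quote or a backslash, which is possible in origin names and paths. The following is a minimal sketch of an escaping helper that uses only the JDK; the class and method names are illustrative assumptions, and a JSON library would do the same job more thoroughly.

```java
class LandingPayload {
    // Escapes the characters that JSON requires to be escaped inside a string value.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds the request body for POST /v1/metadata/origin/landing.
    static String build(String dataSource, String path, String origin) {
        return "{\"dataSource\":\"" + escape(dataSource)
                + "\",\"path\":\"" + escape(path)
                + "\",\"origin\":\"" + escape(origin) + "\"}";
    }

    public static void main(String[] args) {
        // The data source value here is hypothetical.
        System.out.println(build("hdfs://namenode:8020", "/user/me/myproject/myfile", "NYC \"Open\" Data"));
    }
}
```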
DELETE
"Unmark" the landing label from a resource. This call removes the landing label from all subfolders and files, whether they all have the same landing label or a different label. The origin is not removed from the list of origins.

Parameters
Name          Type      Description
dataSource    string    Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path          string    Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile".
origin        string    The landing label.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Landing deleted successfully." If the resource exists but is not marked with the landing, the call returns the success message and makes no changes.

Failures can occur if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call takes the resource information and origin as query parameters.

cURL
curl -H "Content-Type:application/json" -X DELETE -G \
  --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/Adler/hdfs_u1/AL1" \
  --data-urlencode "origin=human-resources" \
  "http://<WDI-host-name>:8082/api/v1/metadata/origin/landing" \
  -b cookie.txt
Java
// Construct the query parameters.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("dataSource", ds);
queryParams.add("path", sPath);
queryParams.add("origin", sOriginName);

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send the request with query parameters and session cookie.
ClientResponse response1 = webResource1.path("origin/landing")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .delete(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
/v1/metadata/origin/origins

GET
Retrieve the origin or origins assigned to a resource.

Parameters
Name          Type      Description
dataSource    string    Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path          string    Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON payload with the count of origins and an array of zero or more objects that include the name of the origin or origins. Folders return zero results unless they are marked with a landing.
{
  "count" : ...,
  "origins" : [
    { "name" : "...", "description" : "..." },
    ...
  ]
}

The JSON payload consists of the following properties:
Property        Type                Description
count           long                Count of origin labels returned for the resource.
origins         array of origins    An array of origin objects, including the origin name and description.
  name          string              Origin name. Limited to 256 characters.
  description   string              Origin description. Limited to 512 characters.
Failures can occur if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}

Sample Invocation
First call POST auth/login to get a session cookie. This origin call takes parameters as part of the query.
cURL
curl -H "Content-Type:application/json" -X GET \
  -G --data-urlencode "dataSource=" \
  --data-urlencode \
  "path=/user/waterlinedata/Sherlock/dohmh_inspections/restaurants_sep_2014_vc.csv" \
  "http://<WDI-host-name>:8082/api/v1/metadata/origin/origins" \
  -b cookie.txt

Java
// Construct the query parameters, including data source and path of the resource.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("dataSource", sDS);
queryParams.add("path", sPath);

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send the request including query parameters and session cookie.
ClientResponse response1 = webResource1.path("origin/origins")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
/v1/metadata/origin/resources

GET
Retrieve all resources identified with a given origin. The resources could be marked with the origin as a landing, or they could be derived from other resources marked with the origin as a landing.

Parameters
Name          Type      Description
origin        query     Origin label.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing a count and an array of zero or more objects that include the details of a resource, including whether it is marked as a landing.
{
  "count" : ...,
  "resources" : [
    { "landing" : false, "dataSource" : "...", "owner" : "...", "path" : "..." },
    ...
  ]
}

The JSON payload consists of the following properties:
Property        Type                Description
count           long                Number of resources returned.
resources       array of resources  An array of resource objects.
  landing       Boolean             An indication of whether this resource is the landing resource for a given origin.
  dataSource    string              Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
  owner         string              Indicates the file system owner of the file, folder, or table.
  path          string              Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
Failures can occur if:
• The specified origin does not exist in the repository.
  {"error":3,"message":"Cannot locate the origin . Please make sure the origin was previously created."}
Sample Invocation
First call POST auth/login to get a session cookie. This Origin call takes an origin name as part of the query.

cURL
curl -H "Content-Type:application/json" -X GET \
  -G --data-urlencode "origin=data.gov" \
  "http://<WDI-host-name>:8082/api/v1/metadata/origin/resources" \
  -b cookie.txt

Java
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send the request with the origin as a query parameter and with session cookie.
ClientResponse response1 = webResource1.path("origin/resources")
    .queryParam("origin", sOriginName)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
/v1/metadata/resource
In Waterline Data Inventory, a "resource" describes an HDFS file or folder, a set of files organized in folders that is identified as a Waterline Data Inventory collection, or a Hive table. Waterline Data Inventory needs a data source description to fully qualify a resource. If a file or table has not been profiled by Waterline Data Inventory, it won't have a corresponding resource in the Waterline Data Inventory repository.

GET
Get details of a particular resource (file, folder, collection, or Hive table). This is the core call for exporting Waterline Data Inventory metadata, such as tags, origins, or lineage. If the resource indicated is a Waterline Data Inventory collection, this call returns details of the aggregated schema and lists the files or directories included in the collection.
Parameters
Name          Type      Description
dataSource    string    Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples: HDFS: "hdfs://:8020"; Hive: "jdbc:hive2://localhost:10000".
path          string    The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
verbose       Boolean   (Optional) The level of detail of the reply. The brief reply does not include profiling information for each field in the file or table.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing resource details.
{
  "fieldCount" : ...,
  "path" : "...",
  "name" : "...",
  "owner" : "...",
  "size" : ...,
  "fileType" : "...",
  "timeOfCreation" : "...",
  "timeOfLastAccess" : "...",
  "timeOfLastChange" : "...",
  "timeOfLastProfile" : "...",
  "recordCount" : ...,
  "bytesProfiled" : ...,
  "profileType" : "...",
  "inputFormat" : "...",
  "serde" : "...",
  "landing" : false,
  "numChildren" : ...,
  "filePermission" : "...",
  "fileFormat" : "...",
  "hiveDatabase" : "...",
  "fileSeparator" : "...",
  "dirEntryCount" : ...,
  "collectionInfo" : {
    "relatedObjectsCount" : ...,
    "partitionColumn" : "...",
    "userCreatedCollection" : false
  },
  "tags" : [ { "tagDomain" : "...", "tag" : "..." }, ... ],
  "origins" : [ "...", ... ],
  "fields" : [
    {
      "field" : { "fieldNo" : ..., "name" : "...", "type" : "..." },
      "tags" : [ { "tagDomain" : "...", "tag" : "..." }, ... ],
      "fieldProfile" : {
        "rowCount" : ..., "nullCount" : ..., "cardinality" : ..., "selectivity" : ...,
        "maxValue" : "...", "minValue" : "...",
        "stringCount" : ..., "stringCardinality" : ..., "stringSelectivity" : ...,
        "minString" : "...", "maxString" : "...",
        "numericCount" : ..., "numericCardinality" : ..., "numericSelectivity" : ...,
        "maxNumeric" : ..., "minNumeric" : ..., "mean" : ..., "stdDeviation" : ...,
        "dateCount" : ..., "dateCardinality" : ..., "dateSelectivity" : ...,
        "maxDate" : "...", "minDate" : "...",
        "minBoolean" : ..., "maxBoolean" : ...,
        "booleanCount" : ..., "booleanCardinality" : ..., "booleanSelectivity" : ...,
        "modeValue" : "...", "modeCount" : ...,
        "additionalDataTypes" : [ "...", ... ],
        "finalRegexCounts" : { "..." : ..., "---" : ... },
        "dateFormatCounts" : { "..." : ..., "---" : ... },
        "numericFormatCounts" : { "..." : ..., "---" : ... }
      }
    },
    ...
  ],
  "resourceState" : "...",
  "dirEntries" : [ { "path" : "...", "owner" : "..." }, ... ]
}

The JSON payload consists of the following properties:
Property                Brief  Type                          Description
name                    x      string                        The name of the file, folder, collection, or table without any path information.
owner                   x      string                        File-system owner of this resource.
path                    x      string                        Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
size                    x      long                          In bytes. Folders have size=0. The size of a collection resource indicates the aggregated size of the files in the collection.
fileType                x      string                        FOLDER, HIVE, CSV, XML, JSON, etc. This value appears as "Content Type" in the Waterline Data Inventory user interface.
timeOfCreation          x      dateTime                      Timestamp of when the resource was created. Note that this information is not generated for HDFS files.
timeOfLastAccess        x      dateTime                      Timestamp, in GMT, of last time the resource was read. Format "yyyy-MM-dd HH:mm:ss.SSS zzz".
timeOfLastChange        x      dateTime                      Timestamp, in GMT, of last time the resource was written to. Format "yyyy-MM-dd HH:mm:ss.SSS zzz".
timeOfLastProfile       x      dateTime                      Timestamp, in GMT, of last time the resource was profiled by Waterline Data Inventory. Format "yyyy-MM-dd HH:mm:ss.SSS zzz".
recordCount             x      long                          Number of records in a file or table resource. For complex data, this is the number of "records" rather than rows and may be different from the counts for individual fields.
bytesProfiled           x      long                          If the file, collection, or table is sampled (see profileType), how many bytes were used to generate the sample. Corresponds to the profiler property waterlinedata.profile.sampled.fraction.
profileType             x      string                        Whether the metadata for this file, collection, or table is collected from all the data in the file (FULL) or from a sample of the data (SAMPLE).
inputFormat             x      string                        Input format used to read the file.
serde                   x      string                        The SerDe used to serialize and de-serialize the file, collection, or table.
landing                 x      Boolean                       If "true", this resource has no ancestors in the cluster: it is the first place this data has arrived in the cluster. The landing label appears as an "origin" for data derived from this data and linked through lineage relationships. If the resource is a collection or folder, by default all files contained in the resource are identified with the same landing.
numChildren             x      long                          The number of files or folders contained one level down inside this resource.
filePermission          x      string                        A string representing the file permissions from HDFS. For example, "-rw-r--r--".
fileFormat              x      string                        FOLDER, Hive, CSV, XML, JSON, etc. This value appears as "Content Type" in the Waterline Data Inventory user interface.
fileSeparator           x      string                        For flat files, the file separator discovered by Waterline Data Inventory and used for profiling. If the separator is an ASCII code, the output includes a Java escape character to ensure the output appears properly.
hiveDatabase            x      string                        Hive database.
dirEntryCount           x      long                          Number of items in dirEntries.
collectionInfo          x      collectionInfo                Description of the contents if the resource is a collection.
  relatedObjectsCount   x      int                           Number of files in the collection. The files may be organized in any number of folders.
  collectionType        x      string                        PARTITION or SNAPSHOT. Collection discovery creates only partition type collections.
  collectionCreateTime  x      string                        Timestamp, in GMT, of when the collection was created, whether through collection discovery or manually. Format "yyyy-MM-dd HH:mm:ss.SSS zzz".
  partitionColumn              string                        Not used.
  userCreatedCollection x      Boolean                       If "true", this collection was created when identified by a user (or through the API), not discovered by Waterline Data Inventory.
tags                    x      array of tags                 A list of tags associated with the resource. The list includes the tag domain and tag name.
  tagDomain             x      string                        Tag domain name.
  tag                   x      string                        Tag name.
origins                 x      array of origins (string)     One or more origin labels from parents of this file, collection, or table. Origins are propagated from landings by Waterline Data Inventory in the origin propagation batch job. To ensure that a resource has the appropriate origin, run Waterline Data Inventory's lineage discovery process or assign lineage relationships to this resource (see POST /v1/metadata/lineage).
fieldCount              x      int                           For file, collection, and table resources. Zero is returned for folders.
fields                  x      array of fields               One or more fields included in this resource. Does not apply to folders. Field information includes field profiling metadata and tags associated with the field.
  field                        field                         Wrapper for the field index number, name, and type.
    fieldNo                    integer                       Index of the field in the file or table. Zero start value.
    name                       string                        Field name.
    type                       string                        Field type. This is the type from the original resource, not a discovered type.
  tags                  x      array of tags                 List of zero or more tags associated with this field.
    tagDomain                  string                        Tag domain name.
    tag                        string                        Tag name.
  fieldProfile                 container                     Wrapper for field profile information. This information is populated when the resource is profiled.
    rowCount                   long                          Number of rows in the file or table that include this field. This value varies by field when the file or table is based on complex data formats such as JSON or XML.
    nullCount                  long                          Number of null values in the field.
    cardinality                long                          Number of unique values in the field assuming the default data type.
    selectivity                double                        Number of unique values over the number of non-null values in the field assuming the default data type.
    maxValue                   string                        Maximum numeric value or last alphabetic value in the field assuming the default data type.
    minValue                   string                        Minimum numeric value or first alphabetic value in the field assuming the default data type.
    stringCount                long                          Count of values when the data type is discovered as string.
    stringCardinality          long                          Cardinality when the data type is discovered as string.
    stringSelectivity          double                        Selectivity when the data type is discovered as string.
    minString                  string                        First alphabetic value when the data type is discovered as string.
    maxString                  string                        Last alphabetic value when the data type is discovered as string.
    numericCount               long                          Count of values when the data type is discovered as numeric.
    numericCardinality         long                          Cardinality when the data type is discovered as numeric.
    numericSelectivity         double                        Selectivity when the data type is discovered as numeric.
    maxNumeric                 double                        Maximum numeric value when the data type is discovered as numeric.
    minNumeric                 double                        Minimum numeric value when the data type is discovered as numeric.
    mean                       double                        Mean of field values when the data type is discovered as numeric.
    stdDeviation               double                        Standard deviation of field values when the data type is discovered as numeric.
    dateCount                  long                          Count of values when the data type is discovered as date.
    dateCardinality            long                          Cardinality when the data type is discovered as date.
    dateSelectivity            double                        Selectivity when the data type is discovered as date.
    maxDate                    dateTime                      Latest date value when the data type is discovered as date.
    minDate                    dateTime                      Earliest date value when the data type is discovered as date.
    minBoolean                 Boolean                       Minimum value when the data type is discovered as Boolean.
    maxBoolean                 Boolean                       Maximum value when the data type is discovered as Boolean.
    booleanCount               long                          Count of values when the data type is discovered as Boolean.
    booleanCardinality         byte                          Cardinality when the data type is discovered as Boolean.
    booleanSelectivity         double                        Selectivity when the data type is discovered as Boolean.
    modeValue                  string
    modeCount                  long
    additionalDataTypes        array of data types (string)  Data types found in the field values in addition to the primary data type indicated by "type".
    finalRegexCounts           array of key-value pairs      Number of values that matched tags with regular expression rules. List includes an entry for each regular expression tag.
    dateFormatCounts           array of key-value pairs
    numericFormatCounts        array of key-value pairs
resourceState           x      string                        Status of profiling for the resource. Values include DELETED, CRAWLED, PROCESSED, UNPROCESSED, RECOGNIZED, UNRECOGNIZED, PROFILED, PROFILE FAILED.
dirEntries              x      array of directories          List of files or folders included in a folder resource. Each folder is identified by its file system owner and its path.
  path                  x      string                        Resource name of the subfolder or file included inside a folder resource.
  owner                 x      string                        Resource owner of the subfolder or file included inside a folder resource.
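The timestamp properties in this payload use the format "yyyy-MM-dd HH:mm:ss.SSS zzz". The following is a minimal sketch of turning such a value into a java.time.Instant with the JDK's DateTimeFormatter. The class name and the sample timestamp are illustrative assumptions, and the sketch assumes the zone is written as a short name such as GMT.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ResourceTimestamps {
    // Pattern taken from the property descriptions; Locale.ENGLISH keeps
    // zone-name parsing independent of the machine's default locale.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS zzz", Locale.ENGLISH);

    // Parses a timestamp string such as timeOfLastProfile into an Instant.
    static Instant parse(String timestamp) {
        return ZonedDateTime.parse(timestamp, FORMAT).toInstant();
    }

    public static void main(String[] args) {
        // Hypothetical value, for illustration only.
        System.out.println(parse("2016-03-18 14:05:09.123 GMT"));
    }
}
```

Comparing the resulting Instant for timeOfLastChange against timeOfLastProfile is one way to check whether a resource has changed since it was last profiled.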
Failures can occur if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = , path = ."}
Sample Invocation
First call POST auth/login to get a session cookie. This Resource call takes parameters as part of the query.

cURL
curl -H "Content-Type:application/json" -X GET \
  -G --data-urlencode "dataSource=" \
  --data-urlencode "path=/user/waterlinedata/Landing/data.gov/mo_alc_lic.csv" \
  --data-urlencode "verbose=true" \
  "http://<WDI-host-name>:8082/api/v1/metadata/resource" \
  -b cookie.txt

Java
// Construct the query parameters, including data source and path of the resource.
// Verbose is left out to return the more brief version of the response.
MultivaluedMap<String, String> queryParams = new MultivaluedMapImpl();
queryParams.add("dataSource", sDS);
queryParams.add("path", sPath);

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send the request including query parameters and session cookie.
ClientResponse response1 = webResource1.path("resource")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .get(ClientResponse.class);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
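Data source URIs and HDFS paths contain characters such as ":" and "/" that must be encoded in a query string, which is what --data-urlencode does in the cURL sample. The following is a minimal sketch of building the full request URL by hand with the JDK only, including the optional verbose flag. The class and method names, the base URL, and the sample data source are illustrative assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ResourceQuery {
    // Joins name=value pairs with '&', URL-encoding both names and values.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds the URL for GET /v1/metadata/resource; verbose is added only when requested.
    static String resourceUrl(String baseUrl, String dataSource, String path, boolean verbose) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("dataSource", dataSource);
        params.put("path", path);
        if (verbose) {
            params.put("verbose", "true");
        }
        return baseUrl + "/api/v1/metadata/resource?" + toQueryString(params);
    }

    public static void main(String[] args) {
        // Hypothetical host and data source, for illustration only.
        System.out.println(resourceUrl("http://localhost:8082", "hdfs://namenode:8020",
                "/user/me/myproject/myfile", true));
    }
}
```

Note that URLEncoder applies form encoding, so a space in a path becomes "+" rather than "%20".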
/v1/metadata/tagdomain
By default, Waterline Data Inventory tags are organized into "domains." Applications working with tags need to specify a domain to fully qualify a tag. Supported domains include a domain dedicated to built-in or system tags and a domain dedicated to user-defined tags.
• POST
• GET
• PUT
• DELETE
POST
Create a tag domain. The user making the call must be assigned an administrator role in Waterline Data Inventory.

Request body
A JSON object containing a tag domain name and description.
{
  "name" : "...",
  "description" : "..."
}

The JSON payload consists of the following properties:
Property      Type      Description
name          string    Tag domain name. Limited to 256 characters.
description   string    Domain description. Limited to 512 characters.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag domain created successfully."

Failures can occur if:
• The authenticated user making the call doesn't have an Administrator role.
  {"message":"User cannot access the resource."}
• A domain with the same name already exists.
  {"error":500,"message":"Create tag domain failed.","additionalDetail":"Tag domain with name Ad Hoc Tags exists"}

Sample Invocation
First call POST auth/login to get a session cookie. This call takes parameters as part of the JSON payload.
cURL
curl -H "Content-Type:application/json" -X POST \
  -d '{"name":"Operations", "description":"Domain for final operations team tags"}' \
  "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
  -b cookie.txt

Java
// Construct the JSON request, including the tag domain name and description.
String data = "{ \"name\" : \"" + sTagDomain + "\", \"description\" : \"" + sDomainDescription + "\"}";

// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://<WDI-host-name>:8082/api/v1/metadata/");

// Send the request with session cookie and JSON payload with tag domain information.
ClientResponse response1 = webResource1.path("tagdomain")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .post(ClientResponse.class, data);

// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
PUT
Update an existing tag domain name or description. This call uses the domain ID that you can retrieve using GET /v1/metadata/tagdomain.

Request body
A JSON object containing the tag domain ID, name, and description.
{
  "id" : "...",
  "name" : "...",
  "description" : "..."
}

The JSON payload consists of the following properties:
Property      Type      Description
id            integer   Automatically generated tag domain ID. You can retrieve the valid values using GET /v1/metadata/tagdomain.
name          string    Tag domain name. The domain name is required even if it is not being updated. Limited to 256 characters.
description   string    (Optional) Domain description. Limited to 512 characters.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag domain updated successfully."

Failures can occur if:
• The authenticated user making the call doesn't have an Administrator role.
  {"message":"User cannot access the resource."}
Sample Invocation
First call POST auth/login to get a session cookie. This call takes parameters as part of the JSON payload.
cURL
    curl -H "Content-Type:application/json" -X PUT \
      -d '{"id":23904, "name":"Operations", "description":"Domain for operations team"}' \
      "http://mapr50:8082/api/v1/metadata/tagdomain" \
      -b cookie.txt
Java
    // Construct the JSON request, including the tag domain ID, name, and description.
    String data = "{ \"id\" : \"" + sTagId + "\", \"name\" : \"" + sTagDomain + "\", \"description\" : \"" + sDomainDescription + "\"}";
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request with session cookie and JSON payload with tag domain information.
    ClientResponse response1 = webResource1.path("tagdomain")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .put(ClientResponse.class, data);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
DELETE
Delete an existing tag domain. This operation requires that all tags in the domain be deleted before deleting the domain.
Parameters
Name | Type | Description
name | string | Existing empty tag domain to be deleted.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag domain <domainName> was deleted."
Failures can be caused if:
- The tag domain does not exist.
    {"error":3,"message":"Cannot locate tag domain . Tag domain may have been deleted previously."}
- The authenticated user making the call doesn't have an Administrator role.
    {"message":"User cannot access the resource."}
Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tagdomain call expects a query parameter for the tag domain name.
cURL
    curl -H "Content-Type:application/json" -X DELETE -G \
      --data-urlencode "name=Operations" \
      "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
      -b cookie.txt
Java
    // Construct the query parameters.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("name", sTagDomain);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request with session cookie and query parameters.
    ClientResponse response1 = webResource1.path("tagdomain")
        .queryParams(queryParams)
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .delete(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
GET
Retrieves all tag domains in the repository.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON payload with a count of domains and an array of tag domains, each of which contains a domain ID, name, and description.
    {
      "count" : ...,
      "domains" : [
        {
          "id" : ...,
          "name" : "...",
          "description" : "..."
        },
        ...
      ]
    }
The JSON payload consists of the following properties:
Property | Type | Description
count | long | Number of domains returned.
domains | array of domains | List of domains, including the ID, name, and description.
id | integer | Auto-generated ID for the tag domain.
name | string | Name of the domain.
description | string | Description of the domain.
Sample Invocation
First call POST auth/login to get a session cookie.
cURL
    curl -H "Content-Type:application/json" -X GET \
      "http://<WDI-host-name>:8082/api/v1/metadata/tagdomain" \
      -b cookie.txt
Java
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request with the session cookie. This call takes no query parameters.
    ClientResponse response1 = webResource1.path("tagdomain")
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/tag
In Waterline Data Inventory, "tags" are annotations consisting of a label ("name") and a description, which can be associated with resources (folders, files, collections, and tables) or fields. Waterline Data Inventory maintains a glossary of tags. The glossary is organized into domains that can be created using POST /v1/metadata/tagdomain. Applications can list tags by domain, create new tags or update descriptions for existing tags, and delete user-defined tags.
The following operations are supported on this resource:
- POST
- DELETE
- GET
POST
Create a single tag in the specified domain. Make this call multiple times to add more than one tag. Tag names can indicate hierarchical nesting using dot notation (.); Waterline Data Inventory generates a tag for each parent that does not already exist in the repository. For example, if the input includes a tag named "Organization.Property.Brand", Waterline Data Inventory produces three tags organized hierarchically: "Organization", "Property", and "Brand".
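As a rough illustration of the dot notation, the following sketch lists the tags that a dotted name implies, one per nesting level. The class and method names (TagHierarchy, expand) are hypothetical and are not part of the Waterline Data Inventory API. The server performs the real expansion; this sketch only assumes that each level is identified by its fully qualified name, as in the "Food Service.Cuisine" example used for DELETE.

```java
import java.util.ArrayList;
import java.util.List;

public class TagHierarchy {
    // Returns the fully qualified name of every level implied by a dotted tag name.
    // Hypothetical helper for illustration; the server does the actual expansion.
    public static List<String> expand(String dottedName) {
        List<String> levels = new ArrayList<>();
        StringBuilder qualified = new StringBuilder();
        for (String part : dottedName.split("\\.")) {
            if (qualified.length() > 0) {
                qualified.append('.');
            }
            qualified.append(part);
            levels.add(qualified.toString());
        }
        return levels;
    }

    public static void main(String[] args) {
        // One line per level: the parent tags first, the leaf tag last.
        for (String level : expand("Organization.Property.Brand")) {
            System.out.println(level);
        }
    }
}
```

A client could use such a list to check, before calling POST, which parent tags the call is likely to create.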
Request body
A JSON object containing the tag domain, the tag name and description, and the tag's discovery settings.
    {
      "domainName" : "...",
      "name" : "...",
      "description" : "...",
      "asDataFacet" : ...,
      "regexp" : "...",
      "regexpMin" : ...,
      "regexpMax" : ...,
      "regexpThreshold" : ...,
      "regexpTestdata" : "...",
      "isRegexEnabled" : ...,
      "isValueEnabled" : ...,
      "valueThreshold" : "...",
      "isManualTagging" : ...
    }
The JSON payload consists of the following properties:
Property | Type | Description
domainName | string | Domain name. Must be an existing domain.
name | string | Tag name. Limited to 256 characters.
description | string | Tag description. Limited to 512 characters.
asDataFacet | Boolean | Set to true to use this tag to build a data facet for searching.
regexp | string | The regular expression used to match field values for this tag. Make sure that any backslashes in the expression are escaped with a second backslash. For example, a regular expression indicating 3 digits "\d{3}" would need to be entered as "\\d{3}".
regexpMin | int | The minimum number of characters in a field for the regular expression to be applied.
regexpMax | int | The maximum number of characters in a field for the regular expression to be applied.
regexpThreshold | double | The minimum number of values in a field that have to match for this tag to be applied to the field.
regexpTestdata | string | Test data to validate the regular expression.
isRegexEnabled | Boolean | The tag is propagated using the regular expression rule. Only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true.
isValueEnabled | Boolean | The tag is propagated using value tagging. Only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true.
valueThreshold | double | The minimum tag weight calculated to assign this tag to a field (percent).
isManualTagging | Boolean | The tag is not automatically propagated. Only one of isRegexEnabled, isValueEnabled, or isManualTagging can be true.
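Two of the rules in this table are easy to get wrong when the payload is assembled by string concatenation: backslashes in the regular expression must be doubled, and no more than one of the three propagation flags may be true. The following sketch shows one way a client might check both before sending the request. The class and method names (TagPayload, jsonEscape, atMostOneMode) are hypothetical; they are not part of the Waterline Data Inventory API.

```java
public class TagPayload {
    // Escapes backslashes and double quotes so a value can be placed inside a JSON string.
    // Backslashes are handled first so the escapes added for quotes are not doubled again.
    public static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Returns true when no more than one of the three propagation flags is set,
    // following the "only one ... can be true" rule in the properties table.
    public static boolean atMostOneMode(boolean isRegexEnabled, boolean isValueEnabled, boolean isManualTagging) {
        int enabled = (isRegexEnabled ? 1 : 0) + (isValueEnabled ? 1 : 0) + (isManualTagging ? 1 : 0);
        return enabled <= 1;
    }

    public static void main(String[] args) {
        // In Java source, "\\d{3}" is the four-character pattern \d{3}.
        String regexp = "\\d{3}-\\d{3}-\\d{5}";
        if (!atMostOneMode(true, false, false)) {
            throw new IllegalArgumentException("Only one propagation flag can be true");
        }
        // Prints the regexp property with each backslash doubled for JSON.
        System.out.println("\"regexp\":\"" + jsonEscape(regexp) + "\"");
    }
}
```

Whether the server also rejects a payload in which none of the three flags is true is not stated here, so the sketch only guards against more than one.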
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag created successfully."
Failures can be caused if:
- One or more of the tag properties is not specified for the tag. This error also occurs if backslashes (\) in the regular expression or elsewhere in call values are not escaped with another backslash.
    {"message":"Input JSON Object could not be mapped. Please check documentation for the appropriate format."}
- One or more of the tag properties is out of place in the JSON object.
    {"error":1,"message":"Invalid Request, RegExp Max length must a number"}
- The tag already exists.
    {"error":4,"message":"The tag to create is already present."}
- The regular expression could not be evaluated or the test data provided doesn't match the regular expression.
    {"error":500,"message":"Create tag failed. Test didn't match, or invalid regular expression"}
Sample Invocation
First call POST auth/login to get a session cookie. The POST /v1/metadata/tag call expects JSON formatted data with name and description elements.
cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{"domainName":"User-defined tags",
           "name":"Product.ID",
           "description":"Product ID, ###-###-#####",
           "asDataFacet":false,
           "regexp":"\\d{3}-\\d{3}-\\d{5}",
           "regexpMin":6,
           "regexpMax":13,
           "regexpThreshold":"0.75",
           "regexpTestdata":"102-003-10021",
           "isRegexEnabled":true,
           "isValueEnabled":false,
           "valueThreshold":"",
           "isManualTagging":false }' \
      "http://<WDI-host-name>:8082/api/v1/metadata/tag" \
      -b cookie.txt
Java
    String data = "{ \"name\":\"" + sTagName + "\", \"description\":\"" + sTagDescription + "\"}";
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request to the path "tag" with the authentication cookie and tag information, "data"
    ClientResponse response1 = webResource1.path("tag")
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .post(ClientResponse.class, data);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
DELETE
Delete an existing tag. Warning: This operation deletes all tag associations that include this tag. Make this call multiple times to remove more than one tag.
Parameters
Name | Type | Description
name | string | Existing tag to be deleted. If the tag is included in a hierarchy of tags, include all parent tags. For example, to delete the tag "Cuisine", specify "Food Service.Cuisine".
tagDomain | string | Tag domain.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag was deleted."
Failures can be caused if:
- The tag does not exist. This can happen if the tag name is not qualified by its parents in the domain.
    {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}
Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tag call expects query parameters for tag name and domain.
cURL
    curl -H "Content-Type:application/json" -X DELETE \
      -G --data-urlencode "name=Food Service.Cuisine" \
      --data-urlencode "tagDomain=User-defined Tags" \
      "http://<WDI-host-name>:8082/api/v1/metadata/tag" \
      -b cookie.txt
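Tag and domain names often contain spaces, which is why the cURL samples pass them with --data-urlencode. A Java client that builds the URL by hand needs the same encoding step. The following sketch uses only the standard java.net.URLEncoder class; the QueryStringBuilder class and its build method are hypothetical helpers, not part of the Waterline Data Inventory API or of the Jersey client used in the samples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringBuilder {
    // Encodes one name or value for use in a query string (spaces become "+").
    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds "?name=value&name=value" from the parameters, in insertion order.
    public static String build(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.append(query.length() == 0 ? '?' : '&');
            query.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "Food Service.Cuisine");
        params.put("tagDomain", "User-defined Tags");
        // The result is appended to the resource URL, for example .../api/v1/metadata/tag
        System.out.println(build(params));
    }
}
```

URLEncoder applies form encoding, so a space is written as "+" rather than "%20"; this sketch assumes the server decodes query parameters accordingly.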
Java
    // Construct the query parameters. The tag is passed in the "name" parameter.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("tagDomain", sTagDomain);
    queryParams.add("name", sTagName);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request with session cookie and query parameters.
    ClientResponse response1 = webResource1.path("tag")
        .queryParams(queryParams)
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .delete(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
GET
Retrieve all tags in a given domain or in all domains. The response includes a list of tag names and descriptions.
Parameters
Name | Type | Description
tagDomain | string | (Optional) Domain name as returned by GET /v1/metadata/tagdomain. If not specified, tags for all domains are returned, grouped by domain.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON payload with a count of tags and an array of tag domains (a single domain if tagDomain was specified), each of which contains an array of objects describing the tags in that domain.
    {
      "count" : ...,
      "tagsGroupedByDomain" : [
        {
          "tagDomain" : "...",
          "count" : ...,
          "tags" : [
            {
              "name" : "...",
              "description" : "...",
              "regexp" : "...",
              "regexpMin" : "...",
              "regexpMax" : "...",
              "regexpThreshold" : "...",
              "valueThreshold" : "...",
              "regexEnabled" : ...,
              "valueEnabled" : ...,
              "manualTagging" : ...,
              "facet" : ...
            },
            ...
          ]
        },
        ...
      ]
    }
The JSON payload consists of the following properties:
Property | Type | Description
count | long | Number of tags returned in the call.
tagsGroupedByDomain | array of domains with tags | Container for the list of tags.
tagDomain | string | Domain identifying a group of tags.
count | long | Number of tags in a given domain.
tags | array of tags | Container for the list of tags in a given domain.
name | string | Tag name.
description | string | Tag description.
regexp | string | Regular expression rule associated with the tag. Note that this rule is used only if regexEnabled is true.
regexpMin | string | The minimum number of characters that the field value must include for the regular expression to be applied.
regexpMax | string | The maximum number of characters that the field value must include for the regular expression to be applied.
regexpThreshold | string | The percent of values in a field that must match the regular expression rule for the field to be a candidate for this tag.
valueThreshold | string | The weight (percent) that must be calculated for a field for the field to be a candidate for this tag.
regexEnabled | Boolean | If true, the tag's regular expression rule is used for tag discovery. Only one of regexEnabled, valueEnabled, or manualTagging can be true at the same time.
valueEnabled | Boolean | If true, tag discovery is enabled for this tag using value matching. Only one of regexEnabled, valueEnabled, or manualTagging can be true at the same time.
manualTagging | Boolean | If true, the tag is not used in tag discovery. Only one of regexEnabled, valueEnabled, or manualTagging can be true at the same time.
facet | Boolean | If true, the values from fields with this tag are included in a data facet for searching.
Failures can be caused if:
- The specified domain does not exist in the repository.
Sample Invocation
First call POST auth/login to get a session cookie.
cURL
    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "tagDomain=User-defined tags" \
      "http://<WDI-host-name>:8082/api/v1/metadata/tag" \
      -b cookie.txt
Java
    // Construct the query parameters.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("tagDomain", sTagDomain);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request including query parameters and session cookie.
    ClientResponse response1 = webResource1.path("tag")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/tagassociation/field
The following operations are supported on this resource:
- POST
- DELETE
POST
Associate an existing tag with a field in an existing resource. The tag association is created as "accepted." If a matching tag association already exists as rejected or suggested, this call promotes the tag association to accepted.
Request body
JSON object identifying a tag and the field (and its file, collection, or table) that the tag will be associated with.
    {
      "field" : "...",
      "dataSource" : "...",
      "path" : "...",
      "tagDomain" : "...",
      "tag" : "...",
      "description" : "..."
    }
The JSON payload consists of the following properties:
Property | Type | Description
field | string | The field name.
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using GET /v1/metadata/datasource. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path | string | The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
tag | string | The fully qualified name of an existing tag. For example, "Food Service.Cuisine".
tagDomain | string | The tag domain where the tag is located.
description | string | (Optional) Message to include in the audit history for the tag and for the file or table.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag successfully associated with field." If the tag is already associated with the specified field, the call succeeds but does not update the field or the audit history for the file or the tag.
Failures can be caused if:
- The specified resource or field does not exist in the repository. This can happen if the data has not yet been profiled.
    {"error":3,"message":"Cannot locate data field in the resource."}
- The specified tag does not exist in the repository. Call POST /v1/metadata/tag to create the tag before associating the tag with a resource.
    {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}
Sample Invocation
First call POST auth/login to get a session cookie. The POST /v1/metadata/tagassociation/field call expects JSON formatted data with a resource data store, path, and field name, and a tag name and domain.
cURL
    curl -H "Content-Type:application/json" -X POST \
      -d '{"tagDomain":"User-defined tags",
           "tag":"Food Service.Restaurant.Cuisine",
           "dataSource":"",
           "path":"/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv",
           "field":"CUISINE_DESCRIPTION",
           "description":"Marked on ingress"}' \
      "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field" \
      -b cookie.txt
Java
    // Construct the JSON request payload.
    String sData = "{\"tagDomain\":\"" + sTagDomain + "\",\"tag\":\"" + sTagName
        + "\", \"dataSource\":\"" + sDataSource + "\",\"path\":\"" + sPath
        + "\",\"field\":\"" + sField + "\",\"description\":\"" + sDescription + "\"}";
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request with session cookie and payload.
    ClientResponse response1 = webResource1.path("tagassociation/field")
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .post(ClientResponse.class, sData);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
DELETE
Reject an association between a tag and a field in an existing resource.
- If both tag information and field information are specified, the specific tag association between the tag and the field is rejected.
- If no tag information is specified, the call rejects all tag associations on this field.
- If no field and resource information is specified, the call rejects all field-level tag associations involving this tag. To also reject resource-level associations for the tag, use DELETE /v1/metadata/tagassociation/resource.
In the context of Waterline Data Inventory, "deleting" a tag association means "rejecting" it. The difference is that a rejected tag is no longer associated with the field or file in future discovery operations.
Parameters
Name | Type | Description
tagDomain | string | Domain name as returned by GET /v1/metadata/tagdomain.
tag | string | Tag name as returned by GET /v1/metadata/tag. The tag name must include the entire hierarchy for the tag (for example, “Food Service.Restaurant.Cuisine”).
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using GET /v1/metadata/datasource. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path | string | The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
field | string | Field or column name.
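Because the effect of this call depends on which parameters are supplied, a client may want to make the intended scope explicit before sending the request. The following sketch restates the scoping rules described for DELETE as a small decision function. The RejectScope class, its Scope enum, and the method name are hypothetical. The sketch treats "tag information" as tagDomain plus tag, and "field information" as dataSource, path, and field; that grouping is an assumption.

```java
public class RejectScope {
    public enum Scope {
        ONE_ASSOCIATION,      // tag and field both given: reject that single association
        ALL_TAGS_ON_FIELD,    // only field information given: reject every tag on the field
        ALL_FIELDS_FOR_TAG,   // only tag information given: reject the tag on every field
        UNSPECIFIED           // neither given: behavior is not described for this case
    }

    // Maps the presence of tag and field information to the documented scope.
    public static Scope scopeFor(boolean hasTagInfo, boolean hasFieldInfo) {
        if (hasTagInfo && hasFieldInfo) {
            return Scope.ONE_ASSOCIATION;
        }
        if (hasFieldInfo) {
            return Scope.ALL_TAGS_ON_FIELD;
        }
        if (hasTagInfo) {
            return Scope.ALL_FIELDS_FOR_TAG;
        }
        return Scope.UNSPECIFIED;
    }

    public static void main(String[] args) {
        // A caller that omits the tag parameters would reject every tag on the field.
        System.out.println(scopeFor(false, true));
    }
}
```

A guard like this is mainly useful for avoiding an accidental broad rejection when a parameter is left empty.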
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag association successfully deleted from field." If the tag is not currently associated with the field, the call returns a success message and does nothing.
Failures can be caused if:
- The specified resource or field does not exist in the repository. This can happen if the data has not yet been profiled.
    {"error":3,"message":"Cannot locate data field to delete the tag association."}
- The specified tag does not exist in the repository.
    {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}
Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tagassociation/field call expects query parameters for both tag information and a resource with field.
cURL
    curl -H "Content-Type:application/json" -X DELETE -G \
      --data-urlencode "tagDomain=User-defined tags" \
      --data-urlencode "tag=Food Service.Restaurant.Cuisine" \
      --data-urlencode "dataSource=" \
      --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
      --data-urlencode "field=CUISINE_DESCRIPTION" \
      "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field" \
      -b cookie.txt
Java
    // Construct the query parameters.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("tagDomain", sTagDomain);
    queryParams.add("tag", sTagName);
    queryParams.add("dataSource", ds);
    queryParams.add("path", sPath);
    queryParams.add("field", sField);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the request with session cookie and query parameters.
    ClientResponse response1 = webResource1.path("tagassociation/field")
        .queryParams(queryParams)
        .type(MediaType.APPLICATION_JSON)
        .header("Cookie", sSessionId)
        .delete(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/tagassociation/field/fields
GET
Retrieve all fields that have suggested or accepted tag associations with a given tag.
Parameters
Name | Type | Description
tagDomain | string | Domain name as returned by GET /v1/metadata/tagdomain.
tag | string | Tag name as returned by GET /v1/metadata/tag. The tag name must include the entire hierarchy for the tag (for example, “Food Service.Restaurant.Cuisine”).
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing the tag information specified in the input and an array of zero or more objects, each describing a field and resource associated with the tag.
    {
      "tagDomainName" : "...",
      "tagName" : "...",
      "count" : ...,
      "fields" : [
        {
          "dataSource" : "...",
          "path" : "...",
          "owner" : "...",
          "tagState" : "...",
          "weight" : "...",
          "lastChange" : "...",
          "fieldName" : "...",
          "fieldNo" : ...
        },
        ...
      ]
    }
The JSON payload consists of the following properties:
Property | Type | Description
tagDomainName | string | Tag domain. One of "User-defined Tags" or "Built-in Tags".
tagName | string | Tag name.
count | long | The number of fields returned in the result.
fields | array of fields | An array of field objects, including the field name, data source and resource name, and owner.
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path | string | The name of the resource. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner | string | File or table owner.
tagState | string | Suggested, Accepted. Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".
weight | double | The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange | date | Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format "yyyy-MM-dd HH:mm:ss.SSS zzz"
fieldName | string | Field or column name.
fieldNo | integer | Field or column index number, starting at zero for the first column.
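The lastChange format string follows the pattern syntax of java.text.SimpleDateFormat, so a Java client can parse the value directly. The following is a minimal sketch; the LastChangeParser class name and the sample timestamp are made up for illustration and are not taken from a real response.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LastChangeParser {
    // Pattern as given in the lastChange property description.
    public static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSS zzz";

    // Parses a lastChange value; a new formatter is created per call because
    // SimpleDateFormat instances are not thread-safe.
    public static Date parse(String lastChange) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.US);
        try {
            return format.parse(lastChange);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected lastChange value: " + lastChange, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample value in the documented format.
        Date changed = parse("2016-03-18 12:00:00.000 GMT");
        // Prints the instant as milliseconds since the epoch.
        System.out.println(changed.getTime());
    }
}
```

Parsing into a Date makes it straightforward to sort associations by their most recent change or to filter for associations changed after a given time.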
Failures can be caused if:
- The tag domain does not exist in the repository.
    {"error":3,"message":"Cannot locate tag domain ."}
- The tag does not exist in the specified domain.
    {"error":3,"message":"Cannot locate tag by tag domain <Built-in Tags>, tag name ."}
Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/field/fields call expects query parameters for a tag name and domain.
cURL
    curl -H "Content-Type:application/json" -X GET \
      -G --data-urlencode "tagDomain=User-defined tags" \
      --data-urlencode "tag=Food Service.Restaurant.Cuisine" \
      "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field/fields" \
      -b cookie.txt
Java
    // Construct the query parameters.
    MultivaluedMap queryParams = new MultivaluedMapImpl();
    queryParams.add("tagDomain", sTagDomain);
    queryParams.add("tag", sTagName);
    // Create a connection to Waterline Data Inventory
    Client client = Client.create();
    WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
    // Send the query parameters with the session cookie.
    ClientResponse response1 = webResource1.path("tagassociation/field/fields")
        .queryParams(queryParams)
        .header("Cookie", sSessionId)
        .type(MediaType.APPLICATION_JSON)
        .get(ClientResponse.class);
    // Collect the response code and response output
    int statusCode = response1.getStatus();
    String responseOutput = response1.getEntity(String.class);
/v1/metadata/tagassociation/field/tags
GET
Retrieve all suggested or accepted tags associated with a field on a given resource.
Parameters
Name | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using GET /v1/metadata/datasource. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile".
field | string | Field name.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing the resource and field information specified in the input and an array of zero or more objects, each describing a tag associated with the field.
    {
      "dataSource" : "...",
      "path" : "...",
      "fieldName" : "...",
      "count" : ...,
      "tags" : [
        {
          "tagDescriptor" : {
            "tagDomain" : "...",
            "tag" : "..."
          },
          "description" : "...",
          "weight" : ...,
          "lastChange" : "...",
          "tagState" : "..."
        },
        ...
      ]
    }
The JSON payload consists of the following properties:
Property | Type | Description
dataSource | string | Data source name. A string formatted as a URI that identifies the data source. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path | string | Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
fieldName | string | Field name.
count | long | The number of tags returned in the result.
tags | array of tag associations | Container for the list of tag associations with the field.
tagDescriptor | array of tags | Container for the tag and tag domain.
tagDomainName | string | Tag domain.
tagName | string | Tag name.
description | string | Information about the tag association. If the tag association was suggested, the description may contain information such as "Propagated from signature [Built-in Tags/US City]; value overlap(100%) "
weight | double | The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange | date | Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format "yyyy-MM-dd HH:mm:ss.SSS zzz"
tagState | string | Suggested, Accepted. Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".
Failures can occur if:
• The specified resource or field does not exist in the repository. This can happen if the data has not yet been profiled.
  Resource not found: {"error":3,"message":"Cannot locate resource by dataSource = <maprfs:///>, path = ."}
  Resource found but field not found: {"error":3,"message":"Cannot locate data field to retrieve the tag association."}
Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/field/tags call expects query parameters for a data source, resource, and field name.
cURL
curl -H "Content-Type:application/json" -X GET -G \
    --data-urlencode "dataSource=" \
    --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
    --data-urlencode "field=ACTION" \
    "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/field/tags" \
    -b cookie.txt
Java
// Construct the query parameters.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("dataSource", ds);
queryParams.add("path", sPath);
queryParams.add("field", sField);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
// Send the request with query parameters and session cookie.
ClientResponse response1 = webResource1.path("tagassociation/field/tags")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
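Clients that do not use a REST client library need to URL-encode the query parameters themselves, because data source URIs and resource paths contain characters such as ":" and "/". The following is a minimal sketch using only the Java standard library. The class name FieldTagsUrl, the host wdi.example.com, and the data source value are placeholders chosen for illustration, not values defined by the API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Waterline Data Inventory.
final class FieldTagsUrl {
    // Encodes one name or value as application/x-www-form-urlencoded (space becomes "+").
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Appends the parameters to the base URL as an encoded query string.
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(separator).append(enc(p.getKey())).append('=').append(enc(p.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("dataSource", "hdfs://namenode.example.com:8020"); // placeholder value
        params.put("path", "/user/me/myproject/myfile");
        params.put("field", "ACTION");
        System.out.println(withQuery(
            "http://wdi.example.com:8082/api/v1/metadata/tagassociation/field/tags", params));
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL easier to compare against the cURL example.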
/v1/metadata/tagassociation/resource
The following operations are supported on this resource:
• POST
• DELETE
POST
Associate an existing tag to an existing resource. The tag association is created as "accepted." If a matching tag association already exists as suggested or rejected, this call promotes the tag association to accepted.
Request body
JSON object identifying a tag and the resource (file, folder, collection, or Hive table) that the tag will be associated with.
{
  "dataSource" : "...",
  "path" : "...",
  "tagDomain" : "...",
  "tag" : "...",
  "description" : "..."
}
The JSON payload consists of the following properties:
Property
Type
Description
dataSource
string
Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path
string
Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
tagDomain
string
Domain name as returned by GET /v1/metadata/tagdomain.
tag
string
Tag name.
description
string
(Optional) Description of the tag association. This text appears in the audit history for both the tag and the resource.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag successfully associated with resource." If the tag is already associated with the resource, the call returns the success message and takes no action.
Sample Invocation
First call POST auth/login to get a session cookie. The POST /v1/metadata/tagassociation/resource call expects JSON-formatted data with a resource data source and path, and a tag name and domain.
cURL
curl -H "Content-Type:application/json" -X POST \
    -d '{"tagDomain":"User-defined tags", "tag":"Food Service.Restaurant.Cuisine", "dataSource":"", "path":"/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv"}' \
    "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource" \
    -b cookie.txt
Java
// Construct the JSON request payload.
String sData = "{\"tagDomain\":\"" + sTagDomain + "\",\"tag\":\"" + sTagName
    + "\", \"dataSource\":\"" + sDataSource + "\",\"path\":\"" + sPath
    + "\",\"description\":\"" + sDescription + "\"}";
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
// Send the request with session cookie and payload.
ClientResponse response1 = webResource1.path("tagassociation/resource")
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .post(ClientResponse.class, sData);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
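Building the payload by plain string concatenation produces invalid JSON if a tag name, path, or description contains a double quote or a backslash. The following is a minimal sketch of one way to escape the values using only the Java standard library. The class name TagAssociationPayload and its methods are hypothetical. The property names come from the request body table, and description is omitted when null because it is documented as optional.

```java
// Hypothetical helper; not part of Waterline Data Inventory.
final class TagAssociationPayload {
    // Returns the value as a quoted JSON string with special characters escaped.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \ u XXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the request body; description may be null.
    static String build(String dataSource, String path, String tagDomain,
                        String tag, String description) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"dataSource\":").append(quote(dataSource));
        sb.append(",\"path\":").append(quote(path));
        sb.append(",\"tagDomain\":").append(quote(tagDomain));
        sb.append(",\"tag\":").append(quote(tag));
        if (description != null) {
            sb.append(",\"description\":").append(quote(description));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // The data source value is a placeholder.
        System.out.println(build("hdfs://namenode.example.com:8020",
            "/user/me/myproject/myfile", "User-defined tags",
            "Food Service.Restaurant.Cuisine", "Reviewed by \"data steward\""));
    }
}
```

The resulting string can be passed as the payload in place of sData in the Java sample above. A JSON library would serve the same purpose where one is already available.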
DELETE
Reject the association between an existing tag and an existing resource. The tag remains in the Waterline Data Inventory tag glossary. If no tag is included in the request, all tags are rejected from the resource. If no resource is included in the request, all associations between this tag and any resources are rejected. To also reject field-level associations for a tag, use DELETE /v1/metadata/tagassociation/field.
Parameters
Name
Type
Description
tagDomain
string
Domain name as returned by GET /v1/metadata/tagdomain.
tag
string
Tag name.
dataSource
string
Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path
string
Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "Tag association successfully deleted from resource." If the tag was not associated with the resource, the success message is returned and no action is taken.
Failures can occur if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  {"error":3,"message":"Cannot locate resource by dataSource = <maprfs:///>, path = ."}
• The tag does not exist in the specified tag domain or in the glossary.
  {"error":3,"message":"Cannot locate tag by tag domain <Built-in Tags>, tag name ."}
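A client can separate the numeric error code from the message when a call fails. The following is a minimal sketch, assuming error bodies always have the two-property shape shown in the examples above ("error" followed by "message"). The class name WdiError is hypothetical, and a JSON library would be the more robust choice if the shape varies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side holder for an error response; not part of the API.
final class WdiError {
    // Matches {"error":<digits>,"message":"<text>"} with optional whitespace.
    private static final Pattern SHAPE = Pattern.compile(
        "\\{\\s*\"error\"\\s*:\\s*(\\d+)\\s*,\\s*\"message\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"\\s*\\}");

    final int code;
    final String message;

    private WdiError(int code, String message) {
        this.code = code;
        this.message = message;
    }

    // Returns the parsed error, or null if the body does not have the assumed shape.
    // Escape sequences inside the message are left as-is.
    static WdiError parse(String responseBody) {
        Matcher m = SHAPE.matcher(responseBody.trim());
        if (!m.matches()) {
            return null;
        }
        return new WdiError(Integer.parseInt(m.group(1)), m.group(2));
    }

    public static void main(String[] args) {
        WdiError e = parse("{\"error\":3,\"message\":\"Cannot locate tag domain <User-defined Tags>.\"}");
        System.out.println(e.code + ": " + e.message);
    }
}
```

In the Java samples, the string to pass to parse would be responseOutput when statusCode is 500.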
Sample Invocation
First call POST auth/login to get a session cookie. The DELETE /v1/metadata/tagassociation/resource call expects query parameters for both tag information and a resource.
cURL
curl -H "Content-Type:application/json" -X DELETE -G \
    --data-urlencode "tagDomain=User-defined tags" \
    --data-urlencode "tag=Topic.Accidents" \
    --data-urlencode "dataSource=" \
    --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
    "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource" \
    -b cookie.txt
Java
// Construct the query parameters.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("tag", sTagName);
queryParams.add("dataSource", ds);
queryParams.add("path", sPath);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
// Send the request with session cookie and query parameters.
ClientResponse response1 = webResource1.path("tagassociation/resource")
    .queryParams(queryParams)
    .type(MediaType.APPLICATION_JSON)
    .header("Cookie", sSessionId)
    .delete(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
/v1/metadata/tagassociation/resource/resources
GET
Retrieve all resources that have accepted tag associations with a given tag.
Parameters
Name
Type
Description
tagDomain
string
Domain name as returned by GET /v1/metadata/tagdomain.
tag
string
Tag name.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing the tag information specified in the input and an array of zero or more objects, each describing a resource associated with the tag.
{
  "tagDomainName" : "...",
  "tagName" : "...",
  "count" : ...,
  "resources" : [
    {
      "dataSource" : "...",
      "path" : "...",
      "owner" : "...",
      "tagState" : "...",
      "weight" : ...,
      "lastChange" : "..."
    },
    ...
  ]
}
The JSON payload consists of the following properties:
Property
Type
Description
tagDomainName
string
Tag domain.
tagName
string
Tag name.
count
long
The number of resources associated with the tag.
resources
array of resources
A list of resource objects, including the data source, resource name, and owner.
dataSource
string
Data source name. A string formatted as a URI that identifies the data source. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path
string
Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
owner
string
File or table owner.
weight
double
The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange
date
Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format: "yyyy-MM-dd HH:mm:ss.SSS zzz".
tagState
string
Suggested, Accepted. Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".
Failures can occur if:
• The tag does not exist in the specified domain.
  {"error":3,"message":"Cannot locate tag by tag domain <User-defined tags>, tag name ."}
• The tag domain does not exist in the repository.
  {"error":3,"message":"Cannot locate tag domain <User-defined Tags>."}
Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/resource/resources call expects query parameters for a tag domain and tag name.
cURL
curl -H "Content-Type:application/json" -X GET -G \
    --data-urlencode "tagDomain=User-defined tags" \
    --data-urlencode "tag=Topic.Travel.NYC" \
    "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource/resources" \
    -b cookie.txt
Java
// Construct the query parameters.
MultivaluedMap queryParams = new MultivaluedMapImpl();
queryParams.add("tagDomain", sTagDomain);
queryParams.add("tag", sTagName);
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResource1 = client.resource("http://:8082/api/v1/metadata/");
// Send the request with query parameters and session cookie.
ClientResponse response1 = webResource1.path("tagassociation/resource/resources")
    .queryParams(queryParams)
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);
// Collect the response code and response output
int statusCode = response1.getStatus();
String responseOutput = response1.getEntity(String.class);
/v1/metadata/tagassociation/resource/tags
GET
Retrieve all suggested or rejected tags associated with a resource.
Parameters
Name
Type
Description
dataSource
string
Data source name. A string formatted as a URI that identifies the data source. You can retrieve the valid values using /v1/metadata/datasource. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path
string
Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile".
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns a JSON object containing the resource information specified in the input and an array of zero or more objects, each describing a tag associated with the resource.
{
  "dataSource" : "...",
  "path" : "...",
  "count" : ...,
  "tags" : [
    {
      "tagDescriptor" : {
        "tagDomain" : "...",
        "tag" : "..."
      },
      "description" : "...",
      "weight" : ...,
      "lastChange" : "...",
      "tagState" : "..."
    },
    ...
  ]
}
The JSON payload consists of the following properties:
Property
Type
Description
dataSource
string
Data source name. A string formatted as a URI that identifies the data source. Examples as follows: HDFS: “hdfs://:8020” Hive: “jdbc:hive2://localhost:10000”
path
string
Resource name. For HDFS files, this name includes the full path describing the resource's location in the file system. The path is a slash-delimited string starting with a slash. For example, "/user/me/myproject/myfile". For Hive tables, this name includes the database name, a dot, and the table name. For example, "financedb.revenue".
count
long
The number of tags returned in the result.
tags
array of tag associations
Container for the list of tag associations with the resource.
tagDescriptor
array of tags
Container for the tag and tag domain.
tagDomainName
string
Tag domain.
tagName
string
Tag name.
description
string
Information about the tag association. If the tag association was suggested, the description may contain information such as "Propagated from signature [Built-in Tags/US City]; value overlap(100%)".
weight
double
The weight calculated for the field for this tag. If the tag has a regular expression rule enabled, this value corresponds to the percentage of field values that match the regular expression. If the tag has value tagging enabled, this value is the calculated weight for how well the field matches the tag. The minimum weight is defined per tag; the maximum is 100.
lastChange
date
Timestamp in GMT that describes the most recent event applied to the tag association. This timestamp could describe the creation of the tag association or the change of a tag association from SUGGESTED to ACCEPTED. Format: "yyyy-MM-dd HH:mm:ss.SSS zzz".
tagState
string
Suggested, Accepted. Tag discovery marks tag associations as "Suggested". Manual tag associations are entered as "Accepted"; in addition, users can convert a suggested tag association into "Accepted".
Failures can occur if:
• The specified resource does not exist in the repository. This can happen if the data has not yet been profiled.
  Resource not found: {"error":3,"message":"Cannot locate resource by dataSource = <maprfs:///>, path = ."}
Sample Invocation
First call POST auth/login to get a session cookie. The GET /v1/metadata/tagassociation/resource/tags call expects query parameters for a data source and a resource path.
cURL
curl -H "Content-Type:application/json" -X GET -G \
    --data-urlencode "dataSource=" \
    --data-urlencode "path=/user/waterlinedata/public_health/restaurants/Inspections_sub_8.csv" \
    "http://<WDI-host-name>:8082/api/v1/metadata/tagassociation/resource/tags" \
    -b cookie.txt
/version
GET
Show the API version.
Response body
The request returns HTTP status code 200 for success and 500 if the call completes with errors. See Error Messages for details. A successful call returns "v" followed by the API version number. Note that the API version number does not necessarily correspond to the compatible Waterline Data Inventory version.
Sample Invocation
First call POST auth/login to get a session cookie.
cURL
curl -H "Content-Type:application/json" -X GET \
    "http://<WDI-host-name>:8082/api/version" \
    -b cookie.txt
Java
// Create a connection to Waterline Data Inventory
Client client = Client.create();
WebResource webResourceVersion = client.resource("http://:8082/");
// Send the request with the session cookie.
ClientResponse responseVersion = webResourceVersion.path("version")
    .header("Cookie", sSessionId)
    .type(MediaType.APPLICATION_JSON)
    .get(ClientResponse.class);
// Collect the response code and response output
int statusCode = responseVersion.getStatus();
String responseOutput = responseVersion.getEntity(String.class);