Quarterly Stakeholder Forum
April 11, 2017
Technology Platform Team 1
TPT.1 NOTE: This is a technical breakout session
intake
Agenda
• Cast of API Land Characters
• Elastic Search (Dora & Neutron)
• SAF Integration (Perry)
API Land Cast of Characters
• Phineas: Intake UI Application
• Barney: Legacy CRUD API
• Ferb: Intake API
• Dora: Search API
• Perry: Security API
• Neutron: Search Engine Loading
• Buzz: API Test Harness UI
• Candace: Audit / Logging
[Diagram: API Land Cast of Characters. Phineas (Intake UI Application), Buzz (API Test Harness UI), Ferb (Intake API), Barney (Legacy CRUD API), Dora (Search API), Perry (Security API), Candace (Audit / Logging), and Neutron (Search Engine Loading), shown alongside SAF, Production DB2, the Replicated Copy of Production DB2, Elastic Search, and the Elk Stack.]
Elastic Search
• Neutron: Search Engine Loading
• Dora: Search API
Elastic Search: Overview
Data Sources
Data can be pulled from a variety of disparate sources:
• Legacy CWS/CMS (POC)
• CWS-NS
• Interface Systems (CDCR, DOJ, etc.)
• Documents (Word, Excel, CSV, etc.)* (POC)
Data Loading (Neutron)
• Extract, Transform & Load (ETL) processes
• All data that needs to be searched against is loaded (copied) into Elastic Search as text-based indexed content, and continually updated.
Search API (Dora)
• Search API provides search resources for searching freeform text (e.g. comments, documents, etc.) and named fields (e.g. Last Name, First Name, Street Address, etc.)
User Application
• Application can present search UI as desired, and call Search API resource(s) to execute the search
• Example search UI: Freeform: "CDL12345 AND SMITH"; First Name: "Jeremy"; Last Name: "Smith"; SUBMIT
* Content from scans, photos, PDFs, etc. must be converted to text in order to be searched.
POC: Proof-of-concept completed for this source
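As a rough illustration of the two search styles, the sketch below builds an Elasticsearch query body that combines a freeform `query_string` clause with `match` clauses on named fields. The function `build_search_body` and the field names `first_name` and `last_name` are hypothetical; the slides do not show the actual request format of the Search API.

```python
import json


def build_search_body(freeform=None, named_fields=None):
    """Build an Elasticsearch bool query from freeform text and named fields.

    The field names passed in `named_fields` are illustrative assumptions.
    """
    must = []
    if freeform:
        # query_string understands operators such as AND / OR in the input.
        must.append({"query_string": {"query": freeform}})
    for field, value in (named_fields or {}).items():
        if value:
            must.append({"match": {field: value}})
    if not must:
        # Nothing entered: fall back to matching every document.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must}}}


body = build_search_body(
    freeform="CDL12345 AND SMITH",
    named_fields={"first_name": "Jeremy", "last_name": "Smith"},
)
print(json.dumps(body, indent=2))
```

Every clause under `must` has to match, so the freeform text and the named fields narrow the result set together.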
Elastic Search: Architecture / Performance
Problem: Initial loading of millions of records from Legacy Production DB2 into Elastic Search can be expensive from both performance and cost perspectives. Continual near-real-time updates would also be expensive.
Pros:
• Single data source
Cons:
• Potential performance degradation of Legacy Production system during initial loads and continual updates
• Adding table-level triggers and new tables (in order to determine which updates need to be loaded) would require lengthy and costly impact analysis and testing by the Legacy team
Elastic Search: Architecture / Performance
Solution (Implemented): Use database replication to implement a read-only copy of Legacy Production DB2.
Pros:
• Isolates ES loading and updates from actual Legacy Production DB2
• Provides ability to easily add table-level triggers and new tables (in order to determine which updates need to be loaded)
Cons:
• Database replication needed to be installed / configured on mainframe
Elastic Search: Architecture / Performance
[Diagram: AWS: Intake UI, Search API, Elastic Search. Mainframe: Production DB2, one-way replication, Replicated Copy of Production DB2.]
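To make the idea of trigger-driven updates concrete, here is a minimal sketch that turns rows from a change-tracking table on the replicated copy into a request body for the Elasticsearch bulk API. The row layout (`op`, `id`, `doc`), the index name `people`, and the function `changes_to_bulk` are assumptions for illustration; the slides do not describe how Neutron actually tracks or loads changes.

```python
import json


def changes_to_bulk(changes, index="people"):
    """Convert change-log rows into an Elasticsearch bulk request body.

    Each row is a dict with hypothetical keys:
      op  - "I" (insert), "U" (update) or "D" (delete), as a trigger might record
      id  - primary key of the changed record
      doc - current column values (absent for deletes)
    """
    lines = []
    for row in changes:
        meta = {"_index": index, "_id": str(row["id"])}
        if row["op"] == "D":
            lines.append(json.dumps({"delete": meta}))
        else:
            # Inserts and updates both become a full re-index of the record.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(row["doc"]))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""


change_log = [
    {"op": "I", "id": 1, "doc": {"first_name": "Jeremy", "last_name": "Smith"}},
    {"op": "D", "id": 2},
]
bulk_body = changes_to_bulk(change_log)
print(bulk_body)
```

Because the change log lives on the replicated copy, a loader along these lines would never query Legacy Production DB2 directly. Elasticsearch releases that use mapping types also expect a `_type` entry in each metadata line.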
Elastic Search: Search Legacy Documents (attachments)
Problem: Legacy currently does not support searching within attached documents.
Solution (POC implemented): Mapper Attachments plugin for Elastic Search
https://github.com/elastic/elasticsearch-mapper-attachments
Benefits:
• Allows searching against millions of document attachments from Legacy
• Supports many common document types (PDF, Word, XLS, PPT, etc.; uses Apache Tika)
• Automatically determines common document metadata (Author, Document Name, Document Type, Keywords, Content Length, Last Update, etc.) and populates metadata fields in the Elastic Search index (also searchable)
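The sketch below shows roughly what indexing an attachment with this plugin involves: a mapping that declares a field of type `attachment`, a document whose file content is base64-encoded, and a query against the extracted text. The request shapes follow the plugin's README as best understood and should be verified against the plugin version in use; the type name `document`, the field name `file`, and the helper `attachment_doc` are hypothetical.

```python
import base64
import json

# Mapping for a hypothetical "document" type. The plugin registers the
# "attachment" field type, which extracts text and metadata via Apache Tika.
mapping = {"document": {"properties": {"file": {"type": "attachment"}}}}


def attachment_doc(raw_bytes, name):
    """Wrap a file's bytes as a document body for an attachment field."""
    return {
        "file": {
            "_name": name,
            # The plugin expects the file content base64-encoded.
            "_content": base64.b64encode(raw_bytes).decode("ascii"),
        }
    }


doc = attachment_doc(b"Sample case note mentioning SMITH", "note.txt")

# Extracted text is searched through the "content" sub-field.
query = {"query": {"match": {"file.content": "smith"}}}

print(json.dumps(mapping), json.dumps(doc), json.dumps(query), sep="\n")
```

The three printed JSON bodies correspond to the mapping request, the index request, and the search request; no Elasticsearch client is needed to inspect them.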
Elastic Search: Secure Pass-Through
Problem: Initial implementation of Dora was customized to take an input string and return formatted results.
Pros:
• Can easily lock down access to ES data via user authorizations
• Abstracts ES query syntax from UI application
Cons:
• Requires code changes for any new query functionality (e.g. highlighting)
• Hinders UI application developers from implementing features quickly
Elastic Search: Secure Pass-Through
Solution (in flight): Implement a secure pass-through architecture, utilizing X-Pack security to lock down access.
Pros:
• Can lock down access to ES data via user authorizations
• Allows UI application developers to use regular ES query syntax (help, training, and examples are easy to find on Internet)
• UI application developers can implement new features quickly
• New features (like highlighting) will not necessarily require code changes to Dora
Cons:
• ES Query syntax knowledge necessary for UI application developers
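As an example of a feature that a pass-through design leaves to the UI application, the sketch below adds a native ES `highlight` section to an existing query body. The helper `with_highlighting` and the field name `content` are hypothetical.

```python
import json


def with_highlighting(query_body, fields):
    """Return a copy of an ES query body with a `highlight` section added."""
    body = dict(query_body)
    # An empty object per field asks ES for its default highlighting settings.
    body["highlight"] = {"fields": {name: {} for name in fields}}
    return body


base = {"query": {"match": {"content": "smith"}}}
highlighted = with_highlighting(base, ["content"])
print(json.dumps(highlighted, indent=2))
```

Only the request body changes here, which is the point of the pass-through approach: the Search API would forward the extra section without needing to understand it.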
Elastic Search: Secure Pass-Through
[Diagram: Intake UI, Search API, and Elastic Search, with arrows numbered 1 to 6.]
1. UI application passes Elastic Search (ES) query to Dora via JSON document using native ES syntax
2. Dora determines user authorizations
3. Dora submits query to ES
4. Dora receives results from ES
5. Dora filters results based on user authorizations
6. Dora returns results to UI application
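The six steps can be sketched as a single function. This is an illustration under stated assumptions, not Dora's implementation: `fake_es_search` stands in for the real ES call, `lookup_authorizations` is a hypothetical lookup, and filtering by a `county` field is an invented authorization rule.

```python
def fake_es_search(query_body):
    """Stand-in for the real ES call; returns a response-shaped dict."""
    return {
        "hits": {
            "hits": [
                {"_id": "1", "_source": {"last_name": "Smith", "county": "Yolo"}},
                {"_id": "2", "_source": {"last_name": "Smith", "county": "Kern"}},
            ]
        }
    }


def lookup_authorizations(user_id):
    """Step 2: hypothetical lookup; a real one would ask the Security API."""
    return {"counties": {"Yolo"}} if user_id == "worker1" else {"counties": set()}


def pass_through_search(user_id, query_body, es_search=fake_es_search):
    """Run steps 2 to 6 for a query received from the UI application."""
    auths = lookup_authorizations(user_id)      # step 2
    response = es_search(query_body)            # steps 3 and 4
    allowed = [                                 # step 5
        hit
        for hit in response["hits"]["hits"]
        if hit["_source"].get("county") in auths["counties"]
    ]
    return {"hits": {"hits": allowed}}          # step 6


query = {"query": {"match": {"last_name": "smith"}}}  # step 1: native ES syntax
result = pass_through_search("worker1", query)
print([hit["_id"] for hit in result["hits"]["hits"]])
```

The query body is never inspected or rewritten in this sketch; access control is applied only to what comes back, which mirrors step 5.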
SAF Integration
• Perry: Security API
SAF Integration
[Diagram: login and token-validation flow between User Browser, Intake UI, Perry Security API, RESTful API Endpoints, SAF, LDAP, Legacy RACF, and Legacy Authorizations.]
Perry Security API resource endpoints:
• /authn
• /authz
Login flow:
1. User points browser at an Intake URL
2. Intake calls Perry to determine if token valid
3. Perry calls SAF to determine if token valid
4. SAF returns “INVALID”
5. Perry returns “INVALID”
6. Intake redirects user to Perry for login (NOTE: Intake callback URL should be passed here)
7. Security API (Perry) redirects user to SAF (login page) with Perry callback URL (this URL should take a parameter of the Intake callback URL)
8. After user successfully logs in, SAF redirects user to Security API callback URL with a valid token
8a/8b. RACF ID returned
9. Security API (Perry) redirects user back to Intake UI (this URL should be the parameter 1 on Step 6)
10. 302 from step 9
Steps 11 to 18, after login:
• Intake calls Perry to get user AUTHS; Perry returns user AUTHS
• Intake calls API
• API calls Perry to determine if token valid
• Perry calls SAF to determine if token valid
• SAF returns “VALID”
• Perry returns “VALID”
• API returns 2xx
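The nested callback URLs in steps 6 to 9 are easy to get wrong, so here is a minimal sketch of how they could be chained. All hosts, paths below `/authn`, the `callback` and `token` parameter names, and the function names are hypothetical; the slides specify only that each redirect carries the previous callback URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# All hosts, paths, and parameter names below are hypothetical.
PERRY_LOGIN = "https://perry.example/authn/login"
PERRY_CALLBACK = "https://perry.example/authn/callback"
SAF_LOGIN = "https://saf.example/login"


def intake_login_redirect(intake_callback):
    """Step 6: Intake sends the browser to Perry, passing its own callback."""
    return PERRY_LOGIN + "?" + urlencode({"callback": intake_callback})


def perry_login_redirect(perry_login_url):
    """Step 7: Perry sends the browser to SAF with a Perry callback URL
    that itself carries the Intake callback as a parameter."""
    intake_callback = parse_qs(urlparse(perry_login_url).query)["callback"][0]
    perry_callback = PERRY_CALLBACK + "?" + urlencode({"callback": intake_callback})
    return SAF_LOGIN + "?" + urlencode({"callback": perry_callback})


def perry_final_redirect(perry_callback_url, token):
    """Step 9: after SAF calls back with a token, return the user to Intake."""
    intake_callback = parse_qs(urlparse(perry_callback_url).query)["callback"][0]
    return intake_callback + "?" + urlencode({"token": token})


step6 = intake_login_redirect("https://intake.example/home")
step7 = perry_login_redirect(step6)
# Step 8: SAF redirects to the Perry callback it was given, with a token.
perry_callback = parse_qs(urlparse(step7).query)["callback"][0]
step9 = perry_final_redirect(perry_callback, token="abc123")
print(step9)
```

`urlencode` percent-encodes each callback before it is embedded, so the inner Intake URL survives being wrapped inside the Perry callback and unwrapped again. Passing the token as a query parameter is only one possible choice made for this sketch.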
TPT.1
Richard Bach [TPT.1 Technical Lead]
Gregg Hill [TPT.1 Scrum Master]