Right Content, Right Audience
Using machine learning at Ringier for content optimization at scale
Zhao Wang, Ringier AG
25.10.2017
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ringier at a glance
Established in 1833; family-owned company headquartered in Zurich
The largest Swiss multinational media corporation
7,300 employees in 19 countries
539 products covering newspapers and magazines, TV and radio channels, and online and mobile digital businesses
Turnover of CHF 1.05 billion in 2016 (CHF 844.2 million in Switzerland)
A step back in history
Are we actually at the cutting edge?
[Figure: timeline of the history of artificial intelligence]
1952-1956: birth of AI
1956-1974: golden years
1974-1980: 1st winter
1980-1987: boom
1987-1993: 2nd winter
1993-2001: growth
2001-2017: industrial adoption, big data
Adoption is years behind the innovation
Media companies have benefited from adopting these technologies
[Figure: timeline of early adoption of core technologies for the media business]
1993: web analytics
1994 / 1995: affiliate marketing, SSO
1997 / 2001: content-based filtering, collaborative filtering
2003: programmatic advertising
2007: on-site personalization
2009: real-time bidding
Happy happy joy joy
What can go wrong?
The discrete effort disrupted the user experience
Not to mention the leakage of data, the most valuable asset
Protocol: visualization of page loading
Result: blick.ch homepage snapshot (as of 26-06-2016)
[Figure: page-load request graph. Legend: request origin, ads-related target, content-related target. Labeled domains: doubleclick.net, blick.ch, brightcove.net. Annotations: "more requests", "less time".]
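The kind of breakdown shown in the page-load visualization can be approximated from a plain list of request URLs. The following is a minimal sketch, not the tooling used for the slide: the domain-to-category mapping, the helper names, and the sample request log are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from target domain to category. Only the three
# domains named on the slide are listed, and their categories are assumed.
DOMAIN_CATEGORY = {
    "blick.ch": "origin",
    "doubleclick.net": "ads-related",
    "brightcove.net": "content-related",
}


def registered_domain(url: str) -> str:
    """Return the last two labels of the host name (naive, no public-suffix list)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def summarize_requests(urls):
    """Count requests per category; unknown domains are counted as 'other'."""
    counts = Counter()
    for url in urls:
        counts[DOMAIN_CATEGORY.get(registered_domain(url), "other")] += 1
    return dict(counts)


# Made-up request log, for illustration only.
sample = [
    "https://www.blick.ch/",
    "https://ad.doubleclick.net/pixel",
    "https://players.brightcove.net/player.js",
    "https://ad.doubleclick.net/bid",
    "https://cdn.example.org/lib.js",
]
summary = summarize_requests(sample)
print(summary)
```

A real analysis would read the URLs from a browser HAR export and use a public-suffix list instead of the two-label shortcut.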
Our approach at Ringier
Project Sherlock: an intelligent data hub across portfolio companies
[Figure: conceptual design. Diagram labels: Sherlock; Transactions; Social media; Customers; Financials; Subscription; Products.]
Content commerce
Dossier generation
Search engine optimization
Content recommendation
User profiling through content profiling powered by machine learning
Intelligent data hub:
Ringier Pixel: track frontend user behavior. AI: identifier de-duplication
TagCloud: construct content profiles. AI: NLP, image recognition
User Insights: construct user profiles. AI: user profiling
Advanced Analytics: dashboard, reporting. AI: predictive analytics
SSO: single sign-on, consent management
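The idea of deriving a user profile from content profiles can be illustrated with a small aggregation. This is a minimal sketch under stated assumptions, not Ringier's implementation: the tag-weight representation, the `build_user_profile` helper, and the sample articles are all hypothetical.

```python
from collections import defaultdict

# Hypothetical content profiles: article id -> {tag: weight}.
# In the deck these would come from NLP and image recognition.
CONTENT_PROFILES = {
    "article-1": {"football": 0.8, "zurich": 0.2},
    "article-2": {"football": 0.5, "transfer": 0.5},
    "article-3": {"politics": 1.0},
}


def build_user_profile(viewed_article_ids, content_profiles):
    """Sum tag weights over the viewed articles, then normalize to sum to 1."""
    totals = defaultdict(float)
    for article_id in viewed_article_ids:
        for tag, weight in content_profiles.get(article_id, {}).items():
            totals[tag] += weight
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {tag: value / norm for tag, value in totals.items()}


# A user who viewed two football articles ends up with a football-heavy profile.
profile = build_user_profile(["article-1", "article-2"], CONTENT_PROFILES)
print(profile)
```

A production profile would likely also weight events by recency and type (view, share, purchase); the plain sum is used here only to keep the sketch short.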
It is not easy to get started
Several essential questions need to be answered first
Build or buy: Initial cost? Time-to-market? Team capability and capacity? Business and technology know-how? Data protection? Data ownership? Running cost? Unique proposition?
Scoping and planning: Data volume? Data velocity? Acceptable latency? Data lifecycle? Minimum viable product of a platform?
Adoption and rollout: Who drives the adoption? Discrete effort for adoption? Central effort for rollout? Joint-venture subsidiaries? Carve-out scenario?
We tried, learned, refined... and we launched
The eventual launch scope
[Figure: launch architecture on AWS. Components shown in the diagram:]
Data sources: ad sales data / ads delivery data / 3rd-party data (S3 buckets); content management system; tracking pixels
Storage: text contents (S3); image contents (S3); content profiles (S3); tracking events (S3); graph storage backend (DynamoDB); user profiles (DynamoDB); JanusGraph DB (EC2 cluster); Redshift; Elasticsearch
Processing: NLP (Step Functions); image recognition (Step Functions); Lambda functions; GPU EC2; training fleet; Spot fleet manager; event enrichment (EMR cluster); event enrichment (ECS cluster); ECS cluster (ASG); ID deduplication; user profiling
Streaming and serving: Kinesis streams; Kinesis Firehose; Kinesis Analytics; API Gateway
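One common way to think about ID deduplication is as finding connected components in a graph of identifiers (cookies, logins, devices). The sketch below uses an in-memory union-find structure instead of a graph database such as JanusGraph, so it only illustrates the idea. The `IdentityGraph` class and the identifier formats are assumptions, not part of the deck.

```python
class IdentityGraph:
    """Union-find over identifiers; linked identifiers share one root."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Unknown identifiers start as their own root.
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def link(self, a, b):
        """Record that identifiers a and b were observed for the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a, b):
        """Return True if a and b are connected through any chain of links."""
        return self._find(a) == self._find(b)


# Hypothetical identifiers, for illustration only.
graph = IdentityGraph()
graph.link("cookie:abc", "login:42")   # user logs in on a browser
graph.link("login:42", "device:xyz")   # same login seen on a mobile device
graph.link("cookie:def", "login:7")    # a different user

print(graph.same_user("cookie:abc", "device:xyz"))
print(graph.same_user("cookie:abc", "cookie:def"))
```

A graph database adds persistence and lets the links themselves be queried, which an in-memory structure like this does not.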
Initial deployment reveals encouraging results
Machine learning even optimized its own running cost
10.88 billion total data points
35 million new events per day
5 million user profiles
10,000 profiles refreshed in one minute
22.5 seconds from activity to profile
US$ 495,365 less spending for EC2 (*)
(*) Spot Instances of p2.xlarge (each with an NVIDIA K80 GPU and 61 GB of RAM) compared with the corresponding on-demand pricing
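To relate the headline numbers to the earlier "data velocity" question, the daily and per-minute figures can be converted to average per-second rates. This is a back-of-the-envelope sketch: it assumes a uniform load over the day, which real traffic will not follow, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the results slide.
new_events_per_day = 35_000_000
profiles_refreshed_per_minute = 10_000

# Average rates, assuming the load is spread evenly over time.
events_per_second = new_events_per_day / SECONDS_PER_DAY
profiles_per_second = profiles_refreshed_per_minute / 60

print(f"{events_per_second:.0f} events/s on average")      # 405 events/s on average
print(f"{profiles_per_second:.0f} profile refreshes/s")    # 167 profile refreshes/s
```

Peak rates will be higher than these averages, so they are a lower bound for capacity planning rather than a sizing target.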
Lessons learned
Quantify “vendor lock-in” and evaluate it: sometimes it is cheaper than presumed.
Align closely with the AWS roadmap: reinventing the wheel is not nice.
Define clear ownership and accountability: the platform and the use cases need different governance models.
Give more priority to architecture design: the flexibility of the platform leaves too many possibilities.
Be careful with “serverless”: not everything is ready for it (yet).
Zhao Wang, Principal Solutions Architect
[email protected] Ringier AG