Columns and Clouds Hybrid databases: Faster queries, smaller data, and stretching your database across servers David Rader – OpenSCG PGConf.US – 2017-03-30
What’s this about? • I heard PostgreSQL doesn’t support partitions? • Can you compress your data and get faster queries? • Can you stretch your database across servers? • WTF is an FDW?
Here’s the situation… • Your large table keeps getting larger • More and more records added • Deleting is frowned upon • (and causes vacuum issues) • And you need more disk space
What are we going to do about this? • Partition • Use Foreign Data Wrappers (FDW) • Column Storage • Archive to a remote server
Managing a big table is slow, what can I do?
Partitioning - split large table in to small parts Big Table
Big Table Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Partition 6
Partitioning in PostgreSQL 9.6 • Uses table inheritance • Parent table is empty • Each partition has a date range or id range • Before insert trigger routes tuples to child partition • Optimizer uses check constraints to skip partitions
Parent Table Child Table 1 Child Table 2 Child Table 3 Child Table 4 Child Table 5 Child Table 6
That’s better. But, my disk is full – can I make my tables smaller?
PostgreSQL is a NoSQL database, too • Column Store FDW • Compresses data on disk • Speeds up many analytic queries • Older partitions, no updates • Timeseries • Logs • History
Parent Table Child Table 1 Child Table 2 Compress Column 3 Compress Column 4 Compress Column 5 Compress Column 6
I’m thinking about moving to the cloud - Can I move ½ my database ?
What? Move ½ your database?
Well, what if… Parent Table Child Table 1
• We just moved some partitions to a remote server? • And still queried them like they were local tables?
Child Table 2
Compress Column 3 Remote Server Compress Column 4 Parent Table Child Table 5 Child Table 6
Ok. Now how?
Sample data set • Based on Airline On-Time Performance data • Available from the US DoT • Bureau of Transportation Statistics • ~ 6MM flight stats per year
Partitioning: Parent table
Child partitions w/ check constraints
Select: constraint exclusion
Insert: • Trigger based routing to the correct partition • Choose how to handle missing partition: • Default to parent table • Reject insert/exception
Tip: Use pg_partman • Use pg_partman extension to automate maintenance • https://github.com/keithf4/pg_partman
• Handles time and serial ID based partitions • Child table & Trigger creation
Tip: Bulk loading • Batch load data directly to child partition tables • Avoid insert trigger overhead
Partitioning in PostgreSQL 10+
FEATURE WATCH
• PG 10.0 adds native Partitioned Tables. Yay! • Much faster data inserting – optimized tuple routing • Still uses inheritance behind the scenes but no trigger • Some limitations … no automatic indexing, no default parition, cannot convert from parent to native • More improvements to come …
Ok: • Partitioning in postgresql • Child tables inherit from parent • Check constraints for partition exclusion
Next: • Foreign Data Wrappers (FDW)
Foreign Data Wrappers (FDWs) • Implementation of SQL/MED (Management of External Data) • Enables using non-postgres data as tables • (select, insert, update)
MySQL
Foreign Data Wrappers (FDWs) • Introduced in PG 9.1 – Read-only • Enhancements in each 9.x version • Writes • Push down filters • Remote joins
• Upcoming PG 10 will add remote aggregate for postgres_fdw
FEATURE WATCH
FDW’s available • Postgres • Relational DB’s: Oracle, MS SQL, MySQL, SQLite, JDBC, ODBC, • NoSQL: Cassandra, Hadoop, MongoDB • Files: csv, xml, json, pg_dump • Other: OS processes, S3, Twitter, Google Sheets • Column store! • https://wiki.postgresql.org/wiki/Foreign_data_wrappers
cstore_fdw • http://citusdata.github.io/cstore_fdw/ • Optimized Row Columnar (ORC) on-disk format • Designed for analytics • Skip indexes • Batch inserts (not updates, not single row)
cstore_fdw
• Binary install from bigsql pgc or compile using pgxn • Add to postgresql.conf: shared_preload_libraries = 'cstore_fdw'
cstore_fdw – load data
… or use COPY to load from files
cstore_fdw - compression
PG
76% Smaller! cstore
0
100
200
300
400
500
600
700
800
cstore_fdw - speed
94% Faster!
PG
cstore
0
200
400
600
800
1000
1200
Ok: • Partitioning in postgresql • FDW’s • Cstore
Next: • Use cstore for old partitions
Use cstore partitions • Older partitions, no updates • Timeseries • Logs • History
• Convert partitions to cstore_fdw • Alter inherit from parent • Remove old child
Parent Table Child Table 1 Child Table 2 cstore Child 3 cstore Child 4 cstore Child 5 cstore Child 6
1. Verify child partition
2. Lookup check constraints
3. Create new partition
4. Add check constraint
5. & 6. Drop old, add new
Before: …
Convert:
After: …
Query parent -> cstore
Compare native
Ok: • Partitioning in postgresql • FDW’s • Cstore • Cstore old partitions
Next: • Using Postgres_fdw for partitions on remote servers
Postgres_fdw -> remote server Parent Table Child Table 1
• Move some partitions to a remote server • Execute queries remotely
Child Table 2
Compress Column 3 Remote Server Compress Column 4 Parent Table Child Table 5 Child Table 6
Setting up postgres_fdw
Moving partition data – about the same
Query w/ remote table
Remote Server Partitions - Summary • Yes, this will (mostly) work • Most PG Sharding solutions built with postgres_fdw • But: • Latency on remote queries • Pushing inserts through FDW is slow • PG 9.6 postgres_fdw does not push down aggregates (pulls all rows back to sum or count) • Lose connection – query error • Cannot execute DDL through FDW – need libpq connection to create new partitions on remote server
Ok: • Partitioning in postgresql • FDW’s • Cstore • Cstore old partitions • Postgres_fdw -> remote server
Any Questions?