Columns and Clouds

Report 12 Downloads 70 Views
Columns and Clouds Hybrid databases: Faster queries, smaller data, and stretching your database across servers David Rader – OpenSCG PGConf.US – 2017-03-30

What’s this about? • I heard PostgreSQL doesn’t support partitions? • Can you compress your data and get faster queries? • Can you stretch your database across servers? • WTF is an FDW?

Here’s the situation… • Your large table keeps getting larger • More and more records added • Deleting is frowned upon • (and causes vacuum issues) • And you need more disk space

What are we going to do about this? • Partition • Use Foreign Data Wrappers (FDW) • Column Storage • Archive to a remote server

Managing a big table is slow, what can I do?

Partitioning - split large table in to small parts Big Table

Big Table Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Partition 6

Partitioning in PostgreSQL 9.6 • Uses table inheritance • Parent table is empty • Each partition has a date range or id range • Before insert trigger routes tuples to child partition • Optimizer uses check constraints to skip partitions

Parent Table Child Table 1 Child Table 2 Child Table 3 Child Table 4 Child Table 5 Child Table 6

That’s better. But, my disk is full – can I make my tables smaller?

PostgreSQL is a NoSQL database, too • Column Store FDW • Compresses data on disk • Speeds up many analytic queries • Older partitions, no updates • Timeseries • Logs • History

Parent Table Child Table 1 Child Table 2 Compress Column 3 Compress Column 4 Compress Column 5 Compress Column 6

I’m thinking about moving to the cloud - Can I move ½ my database ?

What? Move ½ your database?

Well, what if… Parent Table Child Table 1

• We just moved some partitions to a remote server? • And still queried them like they were local tables?

Child Table 2

Compress Column 3 Remote Server Compress Column 4 Parent Table Child Table 5 Child Table 6

Ok. Now how?

Sample data set • Based on Airline On-Time Performance data • Available from the US DoT • Bureau of Transportation Statistics • ~ 6MM flight stats per year

Partitioning: Parent table

Child partitions w/ check constraints

Select: constraint exclusion

Insert: • Trigger based routing to the correct partition • Choose how to handle missing partition: • Default to parent table • Reject insert/exception

Tip: Use pg_partman • Use pg_partman extension to automate maintenance • https://github.com/keithf4/pg_partman

• Handles time and serial ID based partitions • Child table & Trigger creation

Tip: Bulk loading • Batch load data directly to child partition tables • Avoid insert trigger overhead

Partitioning in PostgreSQL 10+

FEATURE WATCH

• PG 10.0 adds native Partitioned Tables. Yay! • Much faster data inserting – optimized tuple routing • Still uses inheritance behind the scenes but no trigger • Some limitations … no automatic indexing, no default parition, cannot convert from parent to native • More improvements to come …

Ok: • Partitioning in postgresql • Child tables inherit from parent • Check constraints for partition exclusion

Next: • Foreign Data Wrappers (FDW)

Foreign Data Wrappers (FDWs) • Implementation of SQL/MED (Management of External Data) • Enables using non-postgres data as tables • (select, insert, update)

MySQL

Foreign Data Wrappers (FDWs) • Introduced in PG 9.1 – Read-only • Enhancements in each 9.x version • Writes • Push down filters • Remote joins

• Upcoming PG 10 will add remote aggregate for postgres_fdw

FEATURE WATCH

FDW’s available • Postgres • Relational DB’s: Oracle, MS SQL, MySQL, SQLite, JDBC, ODBC, • NoSQL: Cassandra, Hadoop, MongoDB • Files: csv, xml, json, pg_dump • Other: OS processes, S3, Twitter, Google Sheets • Column store! • https://wiki.postgresql.org/wiki/Foreign_data_wrappers

cstore_fdw • http://citusdata.github.io/cstore_fdw/ • Optimized Row Columnar (ORC) on-disk format • Designed for analytics • Skip indexes • Batch inserts (not updates, not single row)

cstore_fdw

• Binary install from bigsql pgc or compile using pgxn • Add to postgresql.conf: shared_preload_libraries = 'cstore_fdw'

cstore_fdw – load data

… or use COPY to load from files

cstore_fdw - compression

PG

76% Smaller! cstore

0

100

200

300

400

500

600

700

800

cstore_fdw - speed

94% Faster!

PG

cstore

0

200

400

600

800

1000

1200

Ok: • Partitioning in postgresql • FDW’s • Cstore

Next: • Use cstore for old partitions

Use cstore partitions • Older partitions, no updates • Timeseries • Logs • History

• Convert partitions to cstore_fdw • Alter inherit from parent • Remove old child

Parent Table Child Table 1 Child Table 2 cstore Child 3 cstore Child 4 cstore Child 5 cstore Child 6

1. Verify child partition

2. Lookup check constraints

3. Create new partition

4. Add check constraint

5. & 6. Drop old, add new

Before: …

Convert:

After: …

Query parent -> cstore

Compare native

Ok: • Partitioning in postgresql • FDW’s • Cstore • Cstore old partitions

Next: • Using Postgres_fdw for partitions on remote servers

Postgres_fdw -> remote server Parent Table Child Table 1

• Move some partitions to a remote server • Execute queries remotely

Child Table 2

Compress Column 3 Remote Server Compress Column 4 Parent Table Child Table 5 Child Table 6

Setting up postgres_fdw

Moving partition data – about the same

Query w/ remote table

Remote Server Partitions - Summary • Yes, this will (mostly) work • Most PG Sharding solutions built with postgres_fdw • But: • Latency on remote queries • Pushing inserts through FDW is slow • PG 9.6 postgres_fdw does not push down aggregates (pulls all rows back to sum or count) • Lose connection – query error • Cannot execute DDL through FDW – need libpq connection to create new partitions on remote server

Ok: • Partitioning in postgresql • FDW’s • Cstore • Cstore old partitions • Postgres_fdw -> remote server

Any Questions?

Recommend Documents