PostgreSQL 10 running in Docker

Before the end of this year PostgreSQL version 10 will be released. In this post I will explain how you can already have a quick look at this new version by running it in a Docker container.

On my Github page you can find a docker-compose file with five containers defined: an ELK stack (Elasticsearch, Logstash and Kibana), pgAdmin and a PostgreSQL instance. The ELK stack will be used for collecting logs from the PostgreSQL instance. I am using a host machine running Ubuntu 16.04 with Docker version 17.05.0 and Docker Compose version 1.9.0. In the Kibana dashboard you will be able to see an overview of all the SQL statements issued by a pg_restore command, and look for any errors.
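To give an idea of the structure, a minimal docker-compose.yml for this kind of setup could look roughly like the sketch below; the service names and image tags here are assumptions for illustration, the actual file in the repository is more elaborate:

version: '2'
services:
  postgres:
    image: postgres:10
    ports:
      - "5432:5432"
  pgadmin:
    image: nphung/pgadmin
    ports:
      - "5050:5050"
  elasticsearch:
    image: elasticsearch:5
  logstash:
    image: logstash:5
  kibana:
    image: kibana:5
    ports:
      - "5601:5601"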

First step is to download the Github repository:

git clone -b v1.0 https://github.com/rudibroekhuizen/docker-postgresplus

Move to the docker-postgresplus folder and run the following command:

docker-compose up

The containers will start. Start an interactive session in the Postgres container:

docker exec -it dockerpostgresplus_postgres_1 sh

Download the Sakila example database:

wget https://s3.amazonaws.com/assets.datacamp.com/course/sql/dvdrental.zip; unzip dvdrental.zip

Create the Sakila database and import data:

su postgres
psql
psql> CREATE DATABASE sakila;
\q
pg_restore -U postgres -d sakila dvdrental.tar
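To check that the restore succeeded, you can run a quick count against one of the imported tables (the film table is part of the dvdrental data set):

psql -U postgres -d sakila -c "SELECT count(*) FROM film;"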

Open the Kibana web interface and import the dashboard.json file:

http://localhost:5601

Now it is possible to see which statements were issued by pg_restore, how many there were and how long they took:

(Screenshot: Kibana dashboard showing the pg_restore statements)

Open the pgAdmin webinterface to browse through all tables in the Sakila database:

http://localhost:5050

Hostname, username and password are all set to “postgres”.

(Screenshot: browsing the Sakila tables in pgAdmin)


Trace PostgreSQL queries using logging collector

A great way to see which queries are being executed and how long they take is to enable the logging collector on your PostgreSQL database server. The logging collector is a background process that captures log messages sent to stderr and redirects them into log files. Make sure the following settings are set in postgresql.conf:

logging_collector = on
log_min_duration_statement = 0
log_directory = '/var/log/postgresql'
log_min_messages = WARNING
log_rotation_age = 1d
log_rotation_size = 1GB
log_line_prefix = 'start_prefix{"log_time":"%m","application_name":"%a","user_name":"%u","database_name":"%d","remote_host":"%r","process_id":"%p","command_tag":"%i","sql_state":"%e","session_id":"%c","session_line_number":"%l","virtual_transaction_id":"%v","session_id":"%c"}end_prefix log_line: '

The log_min_duration_statement setting controls which statements are logged based on their duration: to log only statements that take more than, for example, 5 ms, set the value to 5. If set to 0, all statements are logged; to disable it, set it to -1.
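Instead of editing postgresql.conf by hand, you can also set these parameters with ALTER SYSTEM, which writes them to postgresql.auto.conf. Most of them are picked up by a reload, but logging_collector itself only takes effect after a restart:

ALTER SYSTEM SET log_min_duration_statement = 0;
ALTER SYSTEM SET log_directory = '/var/log/postgresql';
ALTER SYSTEM SET logging_collector = on;  -- requires a server restart
SELECT pg_reload_conf();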

I changed the notation of the log_line_prefix to JSON format, so that this part can be easily parsed by Logstash.

Restart PostgreSQL to make these settings active. My test setup:

- A Drupal installation on a LAPP stack (Linux, Apache, PostgreSQL, PHP) with Filebeat installed to ship the logs
- An ELK stack (Elasticsearch, Logstash, Kibana) to parse the logs and make the results visible

In the directory /var/log/postgresql you will find *.log files with log lines that look like the following:

start_prefix{"log_time":"2017-06-09 14:09:29.141 CEST","application_name":"[unknown]","user_name":"drupal_user","database_name":"drupaldb","remote_host":"::1(49408)","process_id":"8412","command_tag":"SELECT","sql_state":"00000","session_id":"593a8ff9.20dc","session_line_number":"8","virtual_transaction_id":"2/0","session_id":"593a8ff9.20dc"}end_prefix log_line: LOG: duration: 0.501 ms statement: SELECT cid, data, created, expire, serialized, tags, checksum FROM cache_data WHERE cid IN ( 'route:/search/node:keys=foobar' ) ORDER BY cid
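To give an idea of how such a line can be taken apart, a stripped-down Logstash filter along these lines might look like the sketch below; this is an illustration, not the complete filter from the repository:

filter {
  # Split the line into the JSON prefix and the remaining log_line
  grok {
    match => { "message" => "start_prefix%{DATA:prefix_json}end_prefix log_line: %{GREEDYDATA:log_line}" }
  }
  # Turn the JSON prefix into separate fields
  json {
    source => "prefix_json"
  }
  # Pull the duration (in ms) out of the log_line itself
  grok {
    match => { "log_line" => "duration: %{NUMBER:duration_ms:float} ms" }
  }
}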

The actual log_line should also be parsed, to get the duration value out (as in the last grok pattern above). The complete Logstash filter can be found on this Github page. The end result in Elasticsearch will look like this:

(Screenshot: a parsed log entry in Elasticsearch)

Now make Drupal run some queries by searching for ‘helloworld’:

(Screenshot: the Drupal search for ‘helloworld’)

A search for ‘helloworld’ in Elasticsearch will show that three queries were executed: two SELECT queries and one INSERT query:

(Screenshots: the three queries as shown in Elasticsearch)

With pg_stat_activity it is possible to get (almost) the same results, but there are a few drawbacks to that approach (a typical query against the view is shown below):
- Logs cannot be shown in real time; scheduled queries are needed to get the information out of the pg_stat_activity view.
- The duration of a statement is not available. To see query times, you need to load the pg_stat_statements module, which requires additional shared memory.
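For comparison, polling pg_stat_activity looks roughly like this; it only shows statements that are running at the moment you query the view:

SELECT pid, usename, datname, state, query, now() - query_start AS running_for
FROM pg_stat_activity
WHERE state <> 'idle';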

The logging collector collects all logs, not only query logs, so you will have all logs in one place. If you haven’t seen enough log lines yet, set log_min_messages to DEBUG5, which will show you every log message your database server can produce.

Connect to PostgreSQL database from pgAdmin4 running in Docker container using ssh tunnel

The latest version of pgAdmin, version 4, is not available in the Ubuntu repos. To install it you need to install Python, set up a virtual environment, etc. It’s a lot easier to run pgAdmin from a Docker container. In this post I will explain how to set this up and create a secure ssh tunnel to your PostgreSQL database from the container.

Install Docker with this simple script:
sudo curl -sSL https://get.docker.com/ | sh

Start the Docker service:
sudo service docker start

Download and start the pgAdmin container:
docker run -d -p 5050:5050 -v data:/home/pgadmin --name pgadmin nphung/pgadmin

Now open your browser and navigate to http://127.0.0.1:5050. You should see something like this:

(Screenshot: the pgAdmin web interface)

The safest way to connect to your database is to use ssh. You don’t need to open up any additional ports this way. Issue the following command on your local computer:

ssh -fNg -L 172.16.1.2:5432:localhost:5432 35.156.108.86

The first IP address and port (172.16.1.2:5432) is the address of your local computer; in your case it will be different, use the command ip addr to look it up. Port 5432 is the port you will be connecting to. The -L option sets up the port forward, -f puts ssh in the background, -N tells it not to run a remote command, and -g allows other hosts, such as the pgAdmin container, to connect to the forwarded port.

The last address (35.156.108.86) is your PostgreSQL database server, to which ssh connects; localhost:5432 means the tunnel ends at PostgreSQL listening locally on that server.
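To check that the tunnel works before configuring pgAdmin, you can connect through it with psql on the local machine (assuming psql is installed there; replace the IP address with your own):

psql -h 172.16.1.2 -p 5432 -U postgres -c "SELECT version();"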

Now go to your browser and right-click on Servers, then choose “Create”, “Server”.

(Screenshot: the Create Server menu in pgAdmin)

Enter a name on the General tab. Then, on the Connection tab, enter the IP address, user name and password, and press Save:

(Screenshot: the Create Server dialog)

Now right-click on the entry you just made and click Connect Server. You should now be connected to your database server.

(Screenshot: the connected database server in pgAdmin)