PostgreSQL Database Backup to S3 and Restore from S3 using a simple shell script

The following topic discusses how to back up and restore PostgreSQL databases directly in S3. It will help you leverage the AWS S3 Standard-IA storage class as backup storage for any number of PostgreSQL databases running under an instance.

This topic covers PostgreSQL instances running on EC2 or on-premises servers.

Step 1:- Create a new mount point or directory on the database server to use as a staging location for the backup files before they are moved to S3.

Example:

 

mkdir /postgresql_bkp

chown postgres:postgres /postgresql_bkp

chmod -R 700 /postgresql_bkp

Step 2:- Now create a bucket in S3 and, inside the bucket, create a directory.

Example: s3://phi-postgresql-backup/postgresql01-pgbackup

Step 3:- Now install the AWS CLI on the DB server under the postgres user, following the AWS recommended guide:-

http://docs.aws.amazon.com/cli/latest/userguide/awscli-install-linux.html

Configure AWS CLI

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
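
For example, after the installation you can configure the CLI as the postgres user with the access keys of an IAM user that has access to the backup bucket, and then verify connectivity (the key values and region below are placeholders):

aws configure
# AWS Access Key ID [None]: <your access key>
# AWS Secret Access Key [None]: <your secret key>
# Default region name [None]: <your bucket's region>
# Default output format [None]: json

# quick check that the postgres user can reach the backup bucket
aws s3 ls s3://phi-postgresql-backup/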

Step 4:- Now configure the password of the postgres user so that when pg_dump or psql is run from the scripts, the password can be picked up automatically from the password file / environment variable.

In the postgres user's home directory, create a file named .pgpass

vi $HOME/.pgpass

localhost:5432:*:postgres:somepassword
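
The .pgpass file is only honoured when it is readable by its owner alone, so restrict its permissions:

chmod 600 $HOME/.pgpass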

In .bash_profile, add the following entries:

vi $HOME/.bash_profile

[ -f /etc/profile ] && source /etc/profile
PGDATA=/var/lib/pgsql/9.6/data
export PGDATA
PATH=$PATH:/usr/pgsql-9.6/bin
export PATH
PGPASSFILE=/var/lib/pgsql/.pgpass
export PGPASSFILE
PGPASSWORD=somepassword
export PGPASSWORD
SCRIPTS=/var/lib/pgsql/9.6/data/scripts
export SCRIPTS
# If you want to customize your settings,
# Use the file below. This is not overridden
# by the RPMS.
[ -f /var/lib/pgsql/.pgsql_profile ] && source /var/lib/pgsql/.pgsql_profile
export PATH=~/.local/bin:$PATH
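
To verify that the environment and password file are picked up, check that psql connects without prompting for a password, for example:

source $HOME/.bash_profile
psql -h localhost -U postgres -c "SELECT version();"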

Step 5:- Now we will create the scripts to back up the PostgreSQL databases.

In Step 4, we configured $SCRIPTS to point to the directory /var/lib/pgsql/9.6/data/scripts, where we will place our scripts.

The backup script will do the following:-

=> Back up all the databases running in the PostgreSQL instance.

=> Move each backup file to the S3 bucket in the S3 Standard-IA storage class and remove it from local storage.

=> Delete backup files older than 7 days from the S3 bucket.

cd $SCRIPTS

vi pg_dump_s3.sh

#!/bin/bash

set -e

# Database credentials
PG_HOST="localhost"
PG_USER="postgres"

# S3
S3_PATH="phi-postgresql-backup/postgresql01-pgbackup"

# Get the list of databases in the instance
dbs=`psql -l -t | cut -d'|' -f1 | sed -e 's/ //g' -e '/^$/d'`

# Vars
NOW=$(date +"%m-%d-%Y-at-%H-%M-%S")
DIR="/postgresql_bkp/pgsql_backup_logical"
SCRIPT="/var/lib/pgsql/9.6/data/scripts"

echo "PostgreSQL Database Logical Backup is started and stored in s3://phi-postgresql-backup/postgresql01-pgbackup"

for db in $dbs; do
  if [ "$db" != "template0" ] && [ "$db" != "template1" ]; then
    # Dump database
    pg_dump -Fc -h $PG_HOST -U $PG_USER $db > $DIR/"$NOW"_"$db".dump
    # Copy to S3 in the Standard-IA storage class
    aws s3 cp $DIR/"$NOW"_"$db".dump s3://$S3_PATH/"$NOW"_"$db".dump --storage-class STANDARD_IA
    echo "* Database:" $db "is archived at timestamp:" $NOW
    # Delete local file
    rm $DIR/"$NOW"_"$db".dump
  fi
done

# Delete old backups from S3
echo "Following backups older than 7 days are deleted from s3://phi-postgresql-backup/postgresql01-pgbackup"
$SCRIPT/s3_autodelete.sh $S3_PATH "7 days"

vi s3_autodelete.sh

#!/bin/bash

# Usage:
# ./s3_autodelete.sh bucket/path "7 days"

set -e

# Maximum date (will delete all files older than this date)
maxDate=`date +%s --date="-$2"`

# Loop through the files in the bucket path
aws s3 ls s3://$1/ | while read -r line; do
  # Get file creation date
  createDate=`echo $line | awk '{print $1" "$2}'`
  createDate=`date -d "$createDate" +%s`

  if [[ $createDate -lt $maxDate ]]
  then
    # Get file name
    fileName=`echo $line | awk '{print $4}'`
    if [[ $fileName != "" ]]
    then
      echo "* Delete $fileName"
      aws s3 rm s3://$1/$fileName
    fi
  fi
done
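
Before scheduling, make sure the staging directory referenced by the scripts exists and that the scripts are executable (run as the postgres user; adjust the paths if you changed them):

mkdir -p /postgresql_bkp/pgsql_backup_logical
chmod 700 $SCRIPTS/pg_dump_s3.sh $SCRIPTS/s3_autodelete.sh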

Step 6:- Now schedule pg_dump_s3.sh in cron to run every day so that it takes the latest backup to S3 and deletes the old backups from S3.
sudo crontab -u postgres -e

0 23 * * * /var/lib/pgsql/9.6/data/scripts/pg_dump_s3.sh > /var/lib/pgsql/9.6/data/log/pg_dump_s3-`date +\%Y\%m\%d\%H\%M\%S` 2>&1
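
The cron entry above writes its log under /var/lib/pgsql/9.6/data/log, so create that directory once if it does not already exist:

mkdir -p /var/lib/pgsql/9.6/data/log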

Step 7:- This step explains the restore from S3. The only assumption made here is that the database name is a single word, which the script derives from the backup file name itself. If the database name contains any special characters, the line in the restore script that extracts the database name from the file name (the dbName= line) needs to be changed accordingly.

The script will do the following:-

=> From the list of available backups in the S3 bucket, the script will identify the database name and the backup file.

=> Fetch the backup from S3 to local storage.

=> Drop the old database and create a new database with the same owner, tablespace, and privileges.

=> Restore the dump to the database.

cd $SCRIPTS

vi pg_restore_s3.sh
#!/bin/bash
set -e

# Usage:
# ./pg_restore_s3.sh backup_file_name

AWS_BUCKET="phi-postgresql-backup/postgresql01-pgbackup"
DUMP_OBJECT=$1
DIR="/postgresql_bkp/pgsql_backup_logical"

echo "Postgres restore from s3 - looking for dump in s3 at s3://phi-postgresql-backup/postgresql01-pgbackup"
if [ -n "${DUMP_OBJECT}" ]; then
  objectSet=$(aws s3 ls s3://${AWS_BUCKET}/${DUMP_OBJECT} | awk '{print $4}')
  if [ -z "$objectSet" ]; then
    echo "Backup file not found in s3 bucket"
  else
    echo "Downloading dump from s3 - ${DUMP_OBJECT}"
    aws s3 cp s3://${AWS_BUCKET}/${DUMP_OBJECT} ${DIR}/${DUMP_OBJECT}
    # Extract the database name from the backup file name (format: <timestamp>_<dbname>.dump)
    dbName=$(echo $DUMP_OBJECT | awk -F _ '{print $2}' | awk -F . '{print $1}')
    echo "dropping old database $dbName"
    # Look up the current owner and tablespace of the database
    dbPgOwner=$(echo "SELECT r.rolname FROM pg_catalog.pg_database d JOIN pg_catalog.pg_roles r ON d.datdba = r.oid JOIN pg_catalog.pg_tablespace t ON d.dattablespace = t.oid WHERE d.datname = '$dbName';" | psql 2>&1)
    dbOwner=$(echo $dbPgOwner | awk '{print $3}')
    dbPgTblspc=$(echo "SELECT t.spcname FROM pg_catalog.pg_database d JOIN pg_catalog.pg_roles r ON d.datdba = r.oid JOIN pg_catalog.pg_tablespace t ON d.dattablespace = t.oid WHERE d.datname = '$dbName';" | psql 2>&1)
    dbTblspc=$(echo $dbPgTblspc | awk '{print $3}')
    PRE_RESTORE_PSQL="GRANT ALL PRIVILEGES ON DATABASE $dbName to $dbOwner; REVOKE connect ON DATABASE $dbName FROM PUBLIC;"
    # Terminate existing sessions and drop the old database
    dropResult=$(echo "SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = '$dbName'; \
    DROP DATABASE $dbName;" | psql 2>&1)
    if echo $dropResult | grep "other session using the database" > /dev/null; then
      echo "RESTORE FAILED - another database session is preventing drop of database $dbName"
      exit 1
    fi
    # Recreate the database with the same owner and tablespace
    createResult=$(echo "CREATE DATABASE $dbName OWNER $dbOwner tablespace = $dbTblspc;" | psql 2>&1)
    echo "postgres restore from s3 - filling target database with dump"
    if [ -n "$PRE_RESTORE_PSQL" ]; then
      echo "postgres restore from s3 - executing pre-restore psql"
      printf %s "$PRE_RESTORE_PSQL" | psql
    fi
    pg_restore -d $dbName $DIR/$DUMP_OBJECT
  fi
  echo "Postgres restore from s3 - complete - $DUMP_OBJECT $dbName"
else
  echo "Please pass the backup dump file name as script argument"
fi
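
Usage example, assuming a backup file produced by pg_dump_s3.sh (the file name below is hypothetical and follows the <timestamp>_<dbname>.dump pattern):

chmod 700 $SCRIPTS/pg_restore_s3.sh
$SCRIPTS/pg_restore_s3.sh 01-31-2021-at-23-00-01_mydb.dump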

Using the scripts above, you can store backups of multiple PostgreSQL databases directly in S3 and also restore a particular dump file to PostgreSQL without any manual effort.

Lambda to Start EC2 Instances on schedule and send instant public IP over email to multiple recipients using SNS

The following article describes how to start EC2 instances on a schedule using Lambda and send the public IPs instantly to multiple recipients over email using SNS. This helps to start EC2 instances at a particular time on weekdays and send the public IPs of the instances to the development team who use those instances for development purposes. Using this method you do not have to allocate Elastic IPs to the EC2 instances, and no human intervention is needed to send the public IPs of the instances to users.

Step 1: Create the following role for the EC2 start/stop action:-


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Action": "ec2:*",
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Step 2: Create a Lambda function with the following Python 2.7 code:-


import boto3

region = 'eu-central-1'
instances = ['i-xxxxxxxxxxxx1', 'i-xxxxxxxxxxxx2', 'i-xxxxxxxxxxxx3']

def lambda_handler(event, context):
    ec2 = boto3.client('ec2', region_name=region)
    ec2.start_instances(InstanceIds=instances)
    print 'started Bastion and Store1 & Store2 servers: ' + str(instances)

Step 3: Schedule the Lambda function from CloudWatch -> Rules -> Schedule to run at 11 AM IST from Monday to Friday.
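
CloudWatch schedule expressions are evaluated in UTC, and 11 AM IST is 05:30 UTC, so a schedule expression like the following can be used for this rule:

cron(30 5 ? * MON-FRI *)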


Step 4: Create another role in IAM with full privileges on EC2 and on SNS (not recommended for production; be specific when granting privileges):-


Policy for EC2

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "ec2:*",
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Policy for SNS

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sns:*"
            ],
            "Resource": "arn:aws:sns:eu-central-1:xxxxxxxxxx:fra_tcs_store_ips"
        }
    ]
}

Step 5: Create a new Lambda function with the above role to fetch the public IPs of the instances and send SNS notifications:-


from __future__ import print_function

import json
import boto3

ec2 = boto3.resource('ec2', region_name='eu-central-1')

# SNS client (credentials masked; the execution role from Step 4 can also provide access)
client = boto3.client(
    "sns",
    aws_access_key_id="**********************",
    aws_secret_access_key="***********************",
    region_name="eu-central-1"
)

# Create (or reuse) the topic and subscribe the recipients
topic = client.create_topic(Name="storeipnotifications")
topic_arn = topic['TopicArn']

client.subscribe(TopicArn=topic_arn, Protocol='email', Endpoint='email.1@xxxxx.com')
client.subscribe(TopicArn=topic_arn, Protocol='email', Endpoint='email.2@xxxxx.com')
client.subscribe(TopicArn=topic_arn, Protocol='email', Endpoint='email.3@xxxxxx.com')
client.subscribe(TopicArn=topic_arn, Protocol='email', Endpoint='email.4@xxxxxx.com')

instances1 = ec2.instances.filter(Filters=[{'Name': 'instance-id', 'Values': ['i-xxxxxxxxxxxx1']}])
instances2 = ec2.instances.filter(Filters=[{'Name': 'instance-id', 'Values': ['i-xxxxxxxxxxxx2']}])

# The public IPs are looked up when the function is loaded
# (the instances are expected to be running by then)
for instance in instances1:
    store1IP = instance.public_ip_address
    print("Store1 public IP - ", instance.public_ip_address)

for instance in instances2:
    store2IP = instance.public_ip_address
    print("Store2 public IP - ", instance.public_ip_address)

print('Loading function')

def lambda_handler(event, context):
    message = 'Store1 IP: ' + store1IP + ' , Store2 IP: ' + store2IP
    print(message)
    client.publish(Message=message, TopicArn=topic_arn)
    return message

Step 6: Now create the schedule to run this Lambda code at 11:05 AM IST from Monday to Friday.
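
Following the same pattern as Step 3 (11:05 AM IST is 05:35 UTC), the schedule expression would be:

cron(35 5 ? * MON-FRI *)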


So once the email subscription is confirmed by the recipients, they will start receiving the IP addresses of the two Store servers at 11:05 AM IST from Monday to Friday.

Schedule Start & Stop RDS Instances and Save Money

Introduction

If RDS instances in AWS keep running without 24×7 usage, they can cost you a lot in your AWS bill.

Take the following example of how an idle RDS instance can cost you:-

Let's take a medium-sized RDS instance, db.m4.xlarge (4 vCPU and 16 GB RAM), running the MySQL engine in the Sydney region, which costs $0.492 per hour.

Using the calculator, running it 24×7 costs on average $360.15 per month.

If we only had it running during business hours (9 hrs a day, MON-FRI), we are talking on average $92.99 per month.

That is a cost saving of $267.16 per month for just one instance, or $3,205.92 per annum.

What if you were working in a large enterprise with tens or hundreds of RDS instances?

Hence it is very important to stop RDS instances when idle to save significant cost in your AWS billing.

Let's use the aws-sdk, a Node.js Lambda function, and CloudWatch to start/stop RDS instances on a schedule.

The following is a step-by-step guide with screenshots.

Download

Download the source code in zip format from the following link and keep the zip file on the local machine for later upload to the Lambda function.

https://s3.ap-south-1.amazonaws.com/rdsstopstartcode/rds-lambda-stop-start.zip

Step 1: Create IAM Policy RDSManagement

First, let's create the policy in the AWS IAM console.

Go to Services -> Policies -> Create Policy -> Create Your Own Policy.
Let's call it "RDSManagement". Put the code given below into the 'Policy Document' field:

{
    "Version": "2012-10-17",
    "Statement": [        
        {
            "Action": [
                "rds:StopDBInstance",
                "rds:StartDBInstance"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Press 'Create policy', choose 'Create Your Own Policy', put the name, description, and JSON into the appropriate fields, and press 'Save'. Eventually, you should get something like the screenshots below:

Screen1:-


Screen2:-


Screen3:-


Screen4:-


Step 2: Create Role LambdaRDSManagement

The second step is to create a role which will be associated with the Lambda function and allow it to manage RDS instances.

Navigate to Services -> Roles -> Create new Role.

Select 'AWS Lambda' in the 'AWS Service Role' section. Search for the policy we created previously, select it, and press 'Next'.

Screen5:-


Put "LambdaRDSManagement" as the Role Name, set a description, and press 'Create Role'.

Screen6:-


Screen7:-


Screen8:-


Step 3: Create Lambda Function ManageRDSInstances

Now we are ready to create the Lambda function which will manage our instances. Navigate to Services -> Lambda -> Create a Lambda function -> Blank function. Let's call it 'ManageRDSInstances' and select the latest Node.js 6.x as the runtime. Ignore the Lambda function code for now, and select 'Choose an existing role' in the 'Role' field. You should be able to find the previously created role in the 'Existing role' field. Press 'Next' -> 'Create function'.

Screen9:-


Now let's upload the zip archive previously downloaded to the local machine into the newly created Lambda function.

Services -> Lambda -> ManageRDSInstances, and change Code entry type to ‘Upload a .ZIP file’. Press ‘Upload’, select your zip file and press ‘Save’.

Screen10:-


Now we need to configure a test event: Actions -> Configure test event.

Screen11:-

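
The test event shown in the screenshot is a JSON document along these lines (a sketch that follows the same input format used later for the CloudWatch rule):

{ "instances": ["tomcatdb01", "mysqlbiz"], "action": "stop" }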

Here "tomcatdb01" and "mysqlbiz" are test RDS instances. After pressing 'Save' and 'Test', you will see that your RDS instances change state to 'Stopping' and soon to 'Stopped'.

After they are stopped, you can run the same test with the action 'start', which will change the state of the RDS instances back to running.

Step 4: Create CloudWatch rule to schedule the event

The last thing is to set up a CloudWatch rule to trigger this function on a schedule.
Services -> CloudWatch -> Rules -> Create Rule.

Screen12:-


Select Schedule instead of the default Event Pattern. Now you need to set up the cron time. Keep in mind that the time must be set in the GMT (UTC) timezone. For instance, to start instances every day at 8 am GMT, the schedule expression will look like this: cron(0 8 * * ? *).
After you set the cron time for waking up your instances, select Lambda function as a Target and pick your newly created Lambda function. Then in the Configure Input section put your JSON into the Constant (JSON text) field:

{ "instances": ["some-test-instance-1",
"some-test-instance-2"], "action":"start" }

Screen13:-


Screen14:-


Now your instances will be woken up every morning at 8 AM GMT. Create a similar rule with the correct cron time for stopping them; do not forget to change the action from start to stop in the JSON:

{ "instances": ["some-test-instance-1",
"some-test-instance-2"], "action":"stop" }

So now we have scheduled the Lambda function to trigger at set intervals to start and stop RDS instances and save significant cost in AWS billing.