Week 2: Automating PostgreSQL Provisioning with ServiceNow and AWS

☁️ AWS Platform Engineering Lab 📅 Week 2 of 52 ⚡ 12-second provisioning 🏗️ 67 Terraform resources 🐘 Aurora PostgreSQL 16

Week 2: I Built a Self-Service Database Platform — A Developer Gets a PostgreSQL Database in 12 Seconds

ServiceNow → API Gateway → Lambda → Step Functions → Aurora Serverless v2 → Secrets Manager

In my 14 years as a DBA, I've seen this scene play out hundreds of times. A developer raises a ticket: "Need a PostgreSQL database for my new service." The ticket sits in the queue. A DBA reviews it, provisions the database, sets up credentials, emails them back. Three days later. If you're lucky.

Week 2 of my 52-week AWS lab eliminates that process entirely. A developer fills out a ServiceNow catalog item — database name, team, purpose — clicks Order Now, and 12 seconds later the ticket closes itself with full connection details, auto-rotating credentials stored in AWS Secrets Manager, and an isolated PostgreSQL database on a shared Aurora Serverless v2 cluster.

Zero emails. Zero DBA involvement. Zero waiting.

12sTicket to Database
67Terraform Resources
0DBA Actions
30dSecret Auto-Rotation
Tenant Databases

Architecture

Week 2 — Aurora Self-Service Database Platform Architecture Week 2 — Aurora Self-Service Database Platform ServiceNow → API Gateway → Lambda → Step Functions → Aurora Serverless v2 → Secrets Manager 🎫 ServiceNow RITM Ticket Catalog Item 🔌 API Gateway POST /provision HTTPS λ webhook_receiver HMAC validate Start execution ⚙️ Step Functions 5 state transitions Standard type λ db_provisioner CREATE DB + USER Store secret 🐘 Aurora PostgreSQL 16 Serverless v2 0.5 → 16 ACUs ⚡ Ticket closed in ~12 seconds · status_updater → ServiceNow 🔑 Secrets Manager Per-tenant · 30-day rotation 📋 SSM Params ServiceNow credentials 📊 CloudWatch Dashboard · 4 Alarms 🌐 VPC 3-tier isolation 🛡️ IAM + OIDC · GitHub Actions Least privilege · No static keys

How It Works — The 6-Step Flow

Every provisioning request follows the same path from ticket to database:

  1. Developer submits ServiceNow catalog item — fills in Database Name, Team, Purpose, clicks Order Now. A RITM ticket is created in sc_req_item.
  2. Business Rule fires automatically on INSERT — server-side JavaScript calls the AWS Outbound REST Message with the ticket payload over HTTPS.
  3. API Gateway receives the POST — routes to the webhook_receiver Lambda which validates the HMAC signature and starts a Step Functions execution named db-{db_name}-{ticket_id}.
  4. Step Functions orchestrates two tasks — ProvisionDatabase → UpdateServiceNow → Done. 5 state transitions, error handling and retry built in.
  5. db_provisioner Lambda runs — fetches master credentials from Secrets Manager, connects to Aurora via pg8000, runs CREATE DATABASE, CREATE USER, GRANT ALL PRIVILEGES, stores the tenant secret, enables 30-day auto-rotation.
  6. status_updater Lambda closes the ticket — posts a work note to ServiceNow with writer host, reader host, port, username, secret name, AWS CLI retrieval command, rotation policy, and monitoring links. Ticket → Closed Complete.

The Demo — Watch It Work

Deploy — 67 Resources, One Command

Terraform plan shows 67 resources to add. The deploy script runs pre-flight checks, deletes any orphaned CloudWatch log groups, then applies the plan. Aurora Serverless v2 takes ~10 minutes to fully warm up.

Terraform plan showing 67 resources to add

Terraform Outputs

All key values are captured as outputs: API Gateway URL, Aurora writer/reader endpoints, CloudWatch dashboard URL, state machine ARN, VPC ID.

Terraform outputs showing all endpoints and ARNs

Manual curl Test First

Before wiring ServiceNow, I verified the full stack with a direct curl call. Ticket RITM0010001, database myapp_db. Instant response: workflow started.

curl POST to API Gateway with successful response

Step Functions — Both Executions Succeeded

Two successful executions visible in the console: db-myapp_db-RITM0010001 (12.4 seconds via curl) and db-jay_test-RITM0010012 (8.7 seconds via ServiceNow). Both Succeeded.

Step Functions showing 2 successful executions
Step Functions execution detail — ProvisionDatabase UpdateServiceNow Done in 12 seconds

Credentials in Secrets Manager

Every provisioned database gets its own secret at /selfservice-db/dev/db/{db_name} — host, port, username, password, engine, dbname. The password is generated fresh by the provisioner Lambda, never stored in Terraform state.

AWS CLI showing secret JSON from Secrets Manager for myapp_db

ServiceNow Ticket Closes Itself

The status_updater Lambda calls back to ServiceNow and closes the RITM with everything a developer needs — no email, no DBA, no waiting.

ServiceNow RITM0010012 Closed Complete with full database provisioning details

Verified in RDS Query Editor

Connected to Aurora via the RDS Query Editor (Data API). SELECT datname FROM pg_database confirms both tenant databases exist on the shared cluster — completely isolated by PostgreSQL user permissions.

RDS Query Editor connected to Aurora showing jay_test and myapp_db databases

Aurora Architecture Deep Dive

Aurora isn't just RDS with a new name. The storage and compute layers are completely decoupled — and that separation is what makes unlimited multi-tenancy work.

Aurora PostgreSQL Architecture — Compute vs Storage COMPUTE LAYER — EC2 instances running PostgreSQL engine Writer Instance aurora-instance-1 All writes + reads · db.serverless Scales 0.5 → 16 ACUs ✓ Writer endpoint Reader Instance aurora-instance-2 Read-only · db.serverless Scales independently ✓ Reader endpoint replication SHARED DISTRIBUTED STORAGE — 6 copies across 3 Availability Zones · auto-repair AZ-1 (us-east-1a) Copy 1 + Copy 2 AZ-2 (us-east-1b) Copy 3 + Copy 4 AZ-3 (us-east-1c) Copy 5 + Copy 6 postgres rdsadmin selfservice myapp_db (curl test tenant) jay_test (ServiceNow tenant) LOGICAL DATABASES — isolated by PostgreSQL user permissions · unlimited tenants on shared cluster

The key insight: In traditional PostgreSQL you'd spin up a new server per tenant. With Aurora Serverless v2, you just run CREATE DATABASE — instant, zero infrastructure cost per tenant. The shared storage pool automatically maintains 6 copies across 3 AZs. If a storage node fails, Aurora self-repairs without any downtime.


Multi-Tenant Isolation Model

ConcernApproachWhy
IsolationPostgreSQL database per tenantComplete schema isolation — no cross-tenant queries possible
CredentialsDedicated user per databaseCredential compromise affects one tenant only
Secret storage/selfservice-db/dev/db/{name}Consistent path — easy for apps to discover at runtime
Rotation30-day auto-rotation per secretEach tenant rotates independently — no shared rotation risk
Connections50 connections per tenant (configurable)Prevents one tenant from exhausting the cluster connection pool
ScaleAurora Serverless v2 auto-scalesNo right-sizing needed — cluster adapts to combined tenant load

Security Patterns Baked In

No public database access. Aurora lives in isolated database subnets with no internet route. The only inbound path is from the Lambda security group on port 5432.

Least-privilege IAM. Lambda roles are scoped to specific secret ARN prefixes. db_provisioner can't access secrets outside its namespace.

Zero-downtime secret rotation. 4-step protocol: createSecret → setSecret → testSecret → finishSecret. The old password stays valid until the new one is confirmed working. Apps never see a connection failure during rotation.

HMAC-signed webhooks. ServiceNow payloads are signed with a shared secret. The webhook_receiver validates the signature before starting any execution — spoofed requests are rejected.

OIDC for CI/CD. GitHub Actions uses OIDC federation — no long-lived AWS credentials stored anywhere. Role assumed per-run using short-lived STS tokens.

⚠️ Never cache passwords. Because Secrets Manager rotates credentials every 30 days, applications must retrieve the secret at connection time — never at startup, never hardcoded. If your app caches the password, it will break 30 days after the first rotation.

Bugs That Taught Me Things

Bug 1: Terraform Circular Dependency

SYMPTOM Terraform refuses to plan — Lambda ↔ Step Functions circular dependency

Cause: Lambda needed the Step Functions state machine ARN. Step Functions needed the Lambda function ARNs. Classic circular dependency — no resolution possible by reference.

Fix: Compute the state machine ARN deterministically using the account ID:

locals {
  state_machine_arn = "arn:aws:states:${var.aws_region}:${data.aws_caller_identity.current.account_id}:stateMachine:${var.project}-${var.environment}-db-provisioning"
}
FIX ARN is predictable — no circular reference needed.

Bug 2: Em-dash Rejected by AWS

SYMPTOM InvalidParameterValue: Character sets beyond ASCII

Cause: Em-dash (, U+2014) in AWS resource description fields. AWS only accepts ASCII characters.

FIX Replace every with - in all Terraform description fields.

Bug 3: CloudWatch Dashboard Missing Region

SYMPTOM InvalidParameterInput: should have required property 'region'

Cause: Every CloudWatch dashboard widget requires an explicit region field in its properties block.

data "aws_region" "current" {}

# In every widget properties block:
region = data.aws_region.current.name
FIX Add region to all widget properties blocks.

Bug 4: autocommit Required for CREATE DATABASE

SYMPTOM CREATE DATABASE cannot run inside a transaction block

Cause: PostgreSQL won't execute CREATE DATABASE within a transaction — and pg8000 opens a transaction by default.

conn = pg8000.connect(...)
conn.autocommit = True   # Must be set BEFORE CREATE DATABASE
conn.run("CREATE DATABASE my_db")
FIX Set conn.autocommit = True before any DDL statement.

Bug 5: Windows Path Mangling with AWS CLI

SYMPTOM Git Bash converts /selfservice-db/... to C:/Program Files/Git/selfservice-db/...

Cause: Git Bash on Windows converts paths starting with / to Windows paths. Breaks any AWS CLI command using secret or parameter paths.

# Wrong — inline form doesn't apply to variable assignments
MSYS_NO_PATHCONV=1 SECRET=$(aws secretsmanager ...)

# Correct — export first, then run
export MSYS_NO_PATHCONV=1
SECRET=$(aws secretsmanager get-secret-value --secret-id /selfservice-db/dev/db/myapp_db)
FIX Always export MSYS_NO_PATHCONV=1 as a separate command before AWS CLI calls with / paths.

Cost Breakdown

ResourceCost/Month (Running)Cost (Destroyed)
Aurora Serverless v2 (0.5 ACU minimum)~$43$0
NAT Gateway~$32$0
Lambda (minimal traffic)~$0$0
Secrets Manager (per secret)~$0.40/secret$0
Total~$75/month$0
💡 Cost tip: Run sh scripts/cleanup.sh when you're done for the day — destroys everything in ~5 minutes. Run sh scripts/deploy.sh next session — rebuilds in ~10 minutes. You'll pay cents instead of $75/month.

Key Takeaways

Database-per-tenant on a shared Aurora cluster is the right multi-tenancy model for most platforms. You get complete isolation at the PostgreSQL level, a dedicated user and secret per tenant, and independent rotation schedules — all without provisioning new infrastructure per customer.

Aurora's storage/compute separation is underrated. Reader replication is near-instant because both instances share the same storage pool. No network replication lag like traditional Postgres streaming replication.

Secrets Manager rotation with RotateImmediately=False. Always set this when enabling rotation programmatically. Immediate rotation on a brand-new database user will fail before the rotation Lambda has been tested against that specific credential.

14 years as a DBA taught me what the old process cost. Three-day turnaround for a database request isn't a staffing problem — it's an architecture problem. This week proved you can eliminate it entirely.


What's Next — Week 3

Two weeks in. The pattern is working: ServiceNow ticket → API Gateway → Lambda → Step Functions → AWS resource → ticket closed with details. Week 3 builds on this foundation with a more production-hardened pattern. Stay tuned.


All code available on GitHub: katta698/AWS-Platform-Engineering-Lab. Questions? Drop a comment below.

Comments

Popular posts from this blog

ASM Integrity check failed with PRCT-1225 and PRCT-1011 errors while creating database using DBCA on Exadata 3 node RAC

Life is beautiful

Lock Tables in MariaDB