Week 1: From Ticket to EC2 in 6 Minutes: Enterprise Self-Service Infrastructure on AWS

☁️ AWS Platform Engineering Lab 📅 Week 1 ⏱️ 6-min provisioning 🏗️ 49 Terraform resources

I Built a Self-Service AWS Infrastructure Platform

ServiceNow → API Gateway → Lambda → Step Functions → GitHub Actions → Terraform → EC2

Every large enterprise has this problem: a developer needs a VM. They open a ticket. It goes to a queue. Someone reads it three days later, asks clarifying questions, manually clicks through the AWS console, and emails back a hostname. The developer has moved on.

This week I built the automated version of that story. A developer fills out a ServiceNow catalog form, clicks Submit, and 6 minutes later a production-grade EC2 instance is running behind an ALB — provisioned by Terraform, tracked in state, tagged with the ticket number, and monitored in CloudWatch. No human in the loop.


Architecture

[Architecture diagram: Week 1, Enterprise Self-Service EC2 Provisioning. ServiceNow (catalog request, Business Rule, HMAC webhook POST) → API Gateway (REST API, 202 Accepted) → Lambda (SN Receiver, Deploy Trigger, Status Updater) → Step Functions (1. ValidateRequest, 2. UpdateTicket, 3. TriggerDeploy with waitForTaskToken, 4. NotifySuccess) → GitHub Actions (OIDC auth with no static keys, terraform plan, terraform apply, send-task-success callback) → Terraform-managed AWS infrastructure (49 resources): VPC 10.0.0.0/16 with public and private subnets across 2 AZs, ALB on HTTP :80, NAT Gateway for egress, Auto Scaling Group (min 1, desired 1, max 4) of t3.micro instances, IAM roles (EC2 instance role, GitHub OIDC role), SSM Parameter Store (ServiceNow credentials, GitHub token, webhook secret), CloudWatch (dashboard, 4 metric alarms, VPC Flow Logs), SNS alert topic with email subscription, and an S3 state bucket with native locking (no DynamoDB), versioning, and encryption at rest.]

What Gets Deployed (49 Resources)

Layer | Resources
Networking | VPC, 2 public subnets, 2 private subnets, IGW, NAT Gateway, Route Tables, VPC Flow Logs
Compute | Launch Template, Auto Scaling Group (min 1 / max 4), EC2 in private subnet
Load Balancing | ALB (public), Target Group, HTTP Listener
Security | 3 Security Groups, IAM Role + Instance Profile, OIDC Provider
Automation | 3 Lambda Functions, Step Functions State Machine, API Gateway REST API
Observability | CloudWatch Dashboard, 4 Metric Alarms, Log Groups, SNS Topic + Email
Configuration | 5 SSM Parameters (ServiceNow creds, GitHub token, webhook secret)

Step 1 — Setting Up the ServiceNow Catalog Item

The first piece is the user-facing form. In ServiceNow Catalog Builder, I created a new catalog item called "AWS EC2 Instance Request" under Technical Catalog → Infrastructure.

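Four questions drive the entire provisioning workflow, and everything Terraform needs comes from these fields: Instance Type (instance_type), Environment (environment), Desired Capacity (desired_capacity), and Cost Center (cost_center).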

⚠️ Critical: The Name field on each question (not the Label) must match what the Business Rule sends. A mismatch silently falls through to default values — your t3.micro request becomes t3.medium and you'll spend an hour debugging.
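
One way to turn that silent fallback into a loud failure is to reject a payload that lacks an expected variable instead of defaulting it. The sketch below is illustrative only: it assumes the receiver gets the four catalog variables as a flat dictionary, and `parse_catalog_variables` is a hypothetical helper, not a function from the repo. The allowed values mirror the dropdown choices listed in Phase 5b.

```python
# Hypothetical validation helper; names and payload shape are assumptions.
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small", "t3.medium", "t3.large"}
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}
REQUIRED_FIELDS = ("instance_type", "environment", "desired_capacity", "cost_center")


def parse_catalog_variables(variables: dict) -> dict:
    """Validate catalog variables and fail loudly on a missing or unknown value."""
    missing = [name for name in REQUIRED_FIELDS if name not in variables]
    if missing:
        # A misnamed ServiceNow question surfaces here instead of becoming a default.
        raise ValueError(f"missing catalog variables: {', '.join(missing)}")

    instance_type = str(variables["instance_type"])
    if instance_type not in ALLOWED_INSTANCE_TYPES:
        raise ValueError(f"unsupported instance_type: {instance_type!r}")

    environment = str(variables["environment"])
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unsupported environment: {environment!r}")

    return {
        "instance_type": instance_type,
        "environment": environment,
        # int() raises ValueError for free-text input that is not a whole number.
        "desired_capacity": int(variables["desired_capacity"]),
        "cost_center": str(variables["cost_center"]),
    }
```

With a check like this, a question named `instanceType` instead of `instance_type` produces an error that names the missing field, rather than a t3.medium nobody asked for.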

Step 2 — Submitting the First Request

After publishing the catalog item under Technical Catalog → Infrastructure, it appeared in the portal. First submission looked like this:


The Real Debugging Journey

This is where blog posts usually skip to "and then it worked." I'm not skipping it — this is where the actual learning happened.

Bug 1: 502 from the ServiceNow REST Message Test

SYMPTOM ServiceNow fires webhook → API Gateway returns 502

Cause: Lambda environment variable mismatch. Terraform creates the Lambda with STATE_MACHINE_ARN but the handler read STEP_FUNCTION_ARN. Python KeyError → Lambda crash → 502.

FIX Align the env var name in the handler to match what Terraform sets.
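
A small guard can make this class of bug easier to spot next time. The following is a sketch, not the repo's handler: `require_env` is a hypothetical helper that raises an error naming the missing variable, so the CloudWatch log points straight at the mismatch instead of showing a bare KeyError.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


# Assumed usage inside the handler module:
#   STATE_MACHINE_ARN = require_env("STATE_MACHINE_ARN")
```

The invocation still fails when the variable is missing, but the failure message now says which name the code expected.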

Bug 2: Runtime.HandlerNotFound on All Three Lambdas

SYMPTOM Lambda invocations fail with Runtime.HandlerNotFound

Cause: Terraform configures handler as handler.lambda_handler but all three files defined def handler(event, context).

FIX Rename all three functions to def lambda_handler.
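
A mismatch like this can be caught before deployment with a check that resolves the handler string the same way the runtime does, as `module.function`. This is a minimal sketch; `resolve_handler` is a hypothetical helper, and running it in a unit test or CI step is an assumption about how you might use it.

```python
import importlib


def resolve_handler(handler_spec: str):
    """Resolve a Lambda-style 'module.function' string to a callable."""
    module_name, _, function_name = handler_spec.rpartition(".")
    module = importlib.import_module(module_name)
    function = getattr(module, function_name, None)
    if not callable(function):
        raise AttributeError(f"{module_name} has no callable named {function_name}")
    return function
```

Calling `resolve_handler("handler.lambda_handler")` from the Lambda source directory would raise AttributeError as long as the file only defines `handler`.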

Bug 3: Step Functions Choice State Failing

SYMPTOM Execution fails at CheckValidation — path $.validation.Payload.valid not found

Cause: The validation step reused the status_updater Lambda, which doesn't return a valid field. The Choice state looked for a field that never existed.

FIX Replace the Lambda Task validation with a Pass state for the lab demo.
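
If the validation step is later restored as a real Lambda Task, its handler would need to return the `valid` field that the Choice state reads. The sketch below shows one possible shape. It assumes the task uses the Lambda invoke integration with its result stored under `$.validation` (which is what places the return value at `$.validation.Payload`), and the required field names are illustrative.

```python
# Hypothetical validator; the event field names are assumptions.
REQUIRED_FIELDS = ("ticket_id", "instance_type", "environment")


def lambda_handler(event, context):
    """Return a 'valid' flag plus the reasons when validation fails."""
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not event.get(name)
    ]
    # The Choice state can branch on $.validation.Payload.valid.
    return {"valid": not errors, "errors": errors}
```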

Bug 4: GitHub API 404 on repository_dispatch

SYMPTOM deployment_trigger Lambda gets GitHub 404 Not Found

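Cause: The GitHub PAT was scoped to public_repo, but the repository is private. Private repositories need the full repo scope on a classic PAT, and GitHub answers 404 rather than 403 when the token cannot see the repository.

Fix: Create a new PAT with repo scope, then update the SSM parameter: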

MSYS_NO_PATHCONV=1 aws ssm put-parameter \
  --name "/selfservice-ec2/dev/github/token" \
  --value "ghp_xxx" --type SecureString --overwrite

Set MSYS_NO_PATHCONV=1 on Windows/Git Bash, or the parameter name gets mangled into a Windows path such as C:/selfservice-ec2/dev/github/token.
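
For reference, the call that the deployment_trigger Lambda makes can be sketched with the standard library. This is not the repo's code: `build_dispatch_request`, the `provision-ec2` event type in the usage note, and the User-Agent value are assumptions. The endpoint and body shape follow GitHub's repository dispatch API.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str,
                           event_type: str, client_payload: dict) -> urllib.request.Request:
    """Build the POST request for GitHub's repository_dispatch endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({
        "event_type": event_type,
        "client_payload": client_payload,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "User-Agent": "selfservice-ec2-deploy-trigger",  # assumed value
        },
    )
```

Passing the result to `urllib.request.urlopen` sends the dispatch, for example with `build_dispatch_request("YOUR-github-username", "week-01-enterprise-ec2-provisioning", token, "provision-ec2", {...})`. GitHub replies with 204 No Content on success.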

Bug 5: Business Rule Fires 5 Times

SYMPTOM 5 parallel GitHub Actions workflows from one catalog request

Cause: The Business Rule had both "Insert" AND "Update" checked. Every RITM state transition re-fired it.

FIX Uncheck "Update" — trigger on Insert only.

Bug 6: Instance Type Not Changing Despite Correct Selection

SYMPTOM ServiceNow request for t3.micro — EC2 launches as t3.medium

Cause: version = "$Latest" in the ASG launch template reference. Terraform doesn't see ASG changes when only the LT version changes — so instance_refresh never fires.

# Before (broken)
launch_template {
  id      = aws_launch_template.app.id
  version = "$Latest"
}

# After (fixed)
launch_template {
  id      = aws_launch_template.app.id
  version = aws_launch_template.app.latest_version
}

FIX Reference latest_version so Terraform detects the change and triggers a rolling instance refresh.

Security Patterns Baked In

No SSH keys anywhere. EC2 instances use SSM Session Manager (AmazonSSMManagedInstanceCore). Connect with aws ssm start-session --target i-xxx. No key pair, no port 22, no bastion host.

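IMDSv2 enforced. The launch template sets http_tokens = "required", so every metadata request needs a session token that must first be obtained with a PUT request. This mitigates most SSRF attacks against the metadata endpoint, because a typical SSRF can only issue simple GET requests.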
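
The two-step IMDSv2 exchange looks roughly like this from code running on the instance. It is a sketch for illustration: the function names are hypothetical, and `fetch_instance_id` only works on an EC2 instance, since the link-local address is not reachable anywhere else.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT request that asks IMDS for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET request that presents the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_instance_id(timeout: float = 2.0) -> str:
    """Run both steps. Only meaningful on an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as response:
        token = response.read().decode()
    request = build_metadata_request("instance-id", token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode()
```

With http_tokens set to required, a GET without the token header is rejected, which is the behavior the launch template setting relies on.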

Private subnets. EC2 lives in private subnets, unreachable from the internet. Only accessible through the ALB.

OIDC for GitHub Actions. No AWS_ACCESS_KEY_ID in GitHub Secrets. GitHub exchanges a JWT for temporary STS credentials scoped to your specific repo.
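
The repo scoping lives in the IAM role's trust policy. As a rough illustration (the project defines this in Terraform, so the Python below only shows the resulting JSON shape), a trust policy for GitHub's OIDC provider typically pins the audience and the `sub` claim. The function name and the wildcard after the repo name are assumptions; a stricter policy could pin a branch instead.

```python
import json


def github_oidc_trust_policy(account_id: str, org: str, repo: str) -> dict:
    """Build a trust policy that lets one GitHub repo assume the role via OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # The token must be issued for AWS STS...
                "StringEquals": {f"{provider}:aud": "sts.amazonaws.com"},
                # ...and must come from a workflow in this repository.
                "StringLike": {f"{provider}:sub": f"repo:{org}/{repo}:*"},
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(github_oidc_trust_policy("123456789012", "my-org", "my-repo"), indent=2))
```

A workflow in any other repository presents a different `sub` claim, so the AssumeRoleWithWebIdentity call is denied.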

Secrets in SSM Parameter Store. ServiceNow credentials and GitHub token stored as SecureString (KMS-encrypted). Never in env vars, never in code, never committed to git.
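
Reading a SecureString at runtime might look like the sketch below. It assumes a boto3 SSM client is passed in (for example `boto3.client("ssm")`), and `get_secret` with its module-level cache is a hypothetical helper rather than the repo's implementation.

```python
# Cache decrypted values for the lifetime of the Lambda execution environment.
_secret_cache: dict = {}


def get_secret(ssm_client, name: str) -> str:
    """Fetch and decrypt a SecureString parameter, caching the result."""
    if name not in _secret_cache:
        response = ssm_client.get_parameter(Name=name, WithDecryption=True)
        _secret_cache[name] = response["Parameter"]["Value"]
    return _secret_cache[name]
```

A call such as `get_secret(boto3.client("ssm"), "/selfservice-ec2/dev/github/token")` would then hit SSM once per warm container instead of once per invocation. The Lambda role needs ssm:GetParameter on the parameter for this to work.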


Known Limitation: ServiceNow PDI Ticket Writeback

The status_updater Lambda can't reach the ServiceNow PDI (dev388443.service-now.com) from AWS Lambda. DNS resolution fails, most likely because free-tier PDI instances hibernate when idle and aren't always reachable from AWS.

This is not an architecture problem. In production with an enterprise ServiceNow instance (dedicated hostname, guaranteed uptime), the ticket writeback should work as designed. The graceful handling we added is the right production pattern anyway, because a ServiceNow outage should never block infrastructure provisioning:

except urllib.error.URLError as e:
    logger.warning("ServiceNow unreachable — skipping update: %s", str(e))
    return {"message": f"Skipped: {str(e)}", "ticket_id": ticket_id}
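
For context, here is how that except clause could sit inside a complete function. This is a sketch under assumptions: the function name, the PATCH method, the `opener` parameter (added so the call can be exercised without a network), and the omitted authentication are illustrative and not taken from the repo.

```python
import json
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def update_ticket_best_effort(url: str, payload: dict, ticket_id: str,
                              timeout: float = 10.0,
                              opener=urllib.request.urlopen) -> dict:
    """Try to write status back to the ticket; never raise on network failure."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",  # assumed; authentication headers omitted for brevity
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            return {"message": f"Updated (HTTP {response.status})", "ticket_id": ticket_id}
    except urllib.error.URLError as e:
        # URLError also covers HTTPError, so DNS failures and 5xx replies both land here.
        logger.warning("ServiceNow unreachable, skipping update: %s", e)
        return {"message": f"Skipped: {e}", "ticket_id": ticket_id}
```

Because the function always returns a dictionary, the Step Functions state that calls it succeeds either way and the provisioning flow continues.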

Cost Breakdown

Resource | Monthly Cost (est.)
t3.micro EC2 (1 instance, always on) | ~$8.50
NAT Gateway | ~$35 ← biggest cost
ALB | ~$18
Lambda / Step Functions / API GW | Free tier
CloudWatch | ~$3
Total | ~$65/month

💡 Cost tip: For a dev lab you use occasionally, run terraform destroy between sessions. The state stays in S3 and you can rebuild everything in 6 minutes. You'll pay cents instead of $65/month.
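
The arithmetic behind the total and the cost tip can be checked with a few lines. The figures are the estimates from the table above. Treating every line item as scaling linearly with uptime over a 730-hour month is a simplifying assumption that ignores data transfer and storage charges.

```python
# Monthly estimates from the cost table (USD).
MONTHLY_COST_USD = {
    "EC2 t3.micro": 8.50,
    "NAT Gateway": 35.00,
    "ALB": 18.00,
    "CloudWatch": 3.00,
}
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_total() -> float:
    """Sum the always-on monthly estimates."""
    return sum(MONTHLY_COST_USD.values())


def prorated_cost(hours_up: float) -> float:
    """Rough cost if the stack only exists for `hours_up` hours."""
    return monthly_total() * hours_up / HOURS_PER_MONTH


if __name__ == "__main__":
    print(f"always on: ${monthly_total():.2f}")
    print(f"one 8-hour session: ${prorated_cost(8):.2f}")
```

The always-on sum is $64.50, which rounds to the ~$65 in the table, and a single 8-hour session comes to about $0.71 under these assumptions.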

Key Takeaways

The waitForTaskToken pattern in Step Functions is underused. Most people either poll (wasteful) or fire-and-forget (lose visibility). The callback pattern gives you synchronous visibility into a long-running async process with zero idle cost.
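
The callback half of that pattern is small. In this project the GitHub Actions workflow sends it with the AWS CLI. The equivalent in Python could look like the sketch below, where `report_deploy_result`, the `DeployFailed` error name, and the injected client (for example `boto3.client("stepfunctions")`) are assumptions for illustration.

```python
import json


def report_deploy_result(sfn_client, task_token: str, succeeded: bool, details: dict) -> None:
    """Resume the paused Step Functions execution with the deploy outcome."""
    if succeeded:
        # The output becomes the result of the waiting task state.
        sfn_client.send_task_success(taskToken=task_token, output=json.dumps(details))
    else:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="DeployFailed",  # hypothetical error name
            cause=json.dumps(details),
        )
```

Until one of these calls arrives, the execution sits in the TriggerDeploy state without consuming compute, which is where the zero idle cost comes from. Reporting failures as well as successes keeps a broken terraform apply from leaving the execution paused until it times out.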

version = "$Latest" in ASG launch templates is a silent footgun. Terraform doesn't see LT version changes as an ASG change. Always use aws_launch_template.app.latest_version to trigger instance refresh automatically.

IMDSv2 + private subnets + SSM is the modern EC2 security baseline. If any deployment in your org still has a key pair and port 22 open, that's technical debt worth addressing.

Design every external call as best-effort. Your infrastructure pipeline should never be blocked by a downstream ITSM system. Graceful degradation is a feature, not a workaround.


How to Implement This Yourself

Everything below references the GitHub repo: katta698 / week-01-enterprise-ec2-provisioning. Fork it, follow these steps, and you'll have the full pipeline running in your own AWS account.

Prerequisites

  • AWS account with admin access
  • GitHub account + fork of the project repo
  • ServiceNow Personal Developer Instance — free at developer.servicenow.com
  • Terraform 1.10+ installed locally
  • AWS CLI configured (aws configure)

Phase 1 — AWS Bootstrap (One-time)

# Create the state bucket (choose a globally unique name)
# Outside us-east-1, also pass --create-bucket-configuration LocationConstraint=<region>
aws s3api create-bucket \
  --bucket YOUR-terraform-state-bucket \
  --region us-east-1
# Keep earlier state versions so a bad apply can be rolled back
aws s3api put-bucket-versioning \
  --bucket YOUR-terraform-state-bucket \
  --versioning-configuration Status=Enabled
# Encrypt state at rest with SSE-S3 (AES256)
aws s3api put-bucket-encryption \
  --bucket YOUR-terraform-state-bucket \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

Phase 2 — Configure Variables

Edit terraform/environments/dev/terraform.tfvars (gitignored — never commit):

project              = "selfservice-ec2"
environment          = "dev"
aws_region           = "us-east-1"
artifact_bucket_name = "YOUR-terraform-state-bucket"
tf_state_bucket      = "YOUR-terraform-state-bucket"
github_org           = "YOUR-github-username"
github_repo          = "week-01-enterprise-ec2-provisioning"
create_github_oidc_provider = true
alert_emails         = ["your@email.com"]
servicenow_instance_url = "https://devXXXXXX.service-now.com"
servicenow_username     = "admin"
servicenow_password     = "YOUR-SN-PASSWORD"
github_token            = "ghp_xxxxxxxxxxxx"  # needs 'repo' scope
webhook_secret          = "your-random-secret"
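
The webhook_secret value is the shared key for the HMAC-signed webhook shown in the architecture. A sketch of generating a strong secret and checking a signature follows. The choice of SHA-256, hex encoding, and signing the raw request body are assumptions about the scheme, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def generate_webhook_secret() -> str:
    """Create a random 256-bit secret, hex-encoded."""
    return secrets.token_hex(32)


def sign_body(secret: str, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

The sender (the ServiceNow Business Rule) and the receiver Lambda must agree on the exact bytes being signed. Any re-serialization of the JSON between signing and verifying changes the signature.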

Phase 3 — Deploy Infrastructure

cd terraform/environments/dev
terraform init \
  -backend-config="bucket=YOUR-terraform-state-bucket" \
  -backend-config="region=us-east-1"
terraform plan   # preview 49 resources
terraform apply  # ~6 minutes
# Save this output — needed for ServiceNow
terraform output api_gateway_url

Phase 4 — Set GitHub Actions Secrets

Repo → Settings → Secrets and variables → Actions → New repository secret:

Secret Name | Value
AWS_ROLE_ARN | From terraform output github_actions_role_arn
TF_STATE_BUCKET | YOUR-terraform-state-bucket
AWS_REGION | us-east-1

Phase 5 — Configure ServiceNow

5a. REST Message (outbound endpoint): System Web Services → Outbound → REST Messages → New. Set Endpoint to your api_gateway_url. Add POST method using the script in servicenow/business_rule_script.js.

5b. Catalog Item: Service Catalog → Catalog Builder → New item named "AWS EC2 Instance Request" under Technical Catalog → Infrastructure. Add 4 questions with these exact Name values:

Label | Name (must match exactly) | Type
Instance Type | instance_type | Dropdown: t3.micro, t3.small, t3.medium, t3.large
Environment | environment | Dropdown: dev, staging, prod
Desired Capacity | desired_capacity | Single line text
Cost Center | cost_center | Single line text

5c. Business Rule: System Definition → Business Rules → New. Table: sc_req_item. When: After. ✅ Insert only — uncheck Update. Paste script from servicenow/business_rule_script.js.

⚠️ Only check Insert, not Update. If Update is checked, the rule fires on every RITM state change → 5+ parallel GitHub Actions runs per request.

Phase 6 — End-to-End Test

  1. ServiceNow → Service Catalog → search "AWS EC2 Instance Request"
  2. Fill in: Instance Type = t3.micro, Environment = dev, Capacity = 1
  3. Submit → watch GitHub Actions start within 30 seconds
  4. Step Functions execution runs ~6 minutes through all 4 states
  5. EC2 Console → new instance selfservice-ec2-dev-app appears ✅

[Screenshots: ServiceNow submissions; EC2 provisioned; Step Functions execution succeeded]

Phase 7 — Teardown

terraform destroy  # destroys all the resources created
# S3 state bucket and OIDC provider are preserved — reusable across all weekly projects

[Screenshot: terraform destroy output]

All code available on GitHub. Questions? Drop a comment below.
