Week 1: From Ticket to EC2 in 6 Minutes: Enterprise Self-Service Infrastructure on AWS
I Built a Self-Service AWS Infrastructure Platform
Every large enterprise has this problem: a developer needs a VM. They open a ticket. It goes to a queue. Someone reads it three days later, asks clarifying questions, manually clicks through the AWS console, and emails back a hostname. The developer has moved on.
This week I built the automated version of that story. A developer fills out a ServiceNow catalog form, clicks Submit, and 6 minutes later a production-grade EC2 instance is running behind an ALB — provisioned by Terraform, tracked in state, tagged with the ticket number, and monitored in CloudWatch. No human in the loop.
Architecture
What Gets Deployed (49 Resources)
| Layer | Resources |
|---|---|
| Networking | VPC, 2 public subnets, 2 private subnets, IGW, NAT Gateway, Route Tables, VPC Flow Logs |
| Compute | Launch Template, Auto Scaling Group (min 1 / max 4), EC2 in private subnet |
| Load Balancing | ALB (public), Target Group, HTTP Listener |
| Security | 3 Security Groups, IAM Role + Instance Profile, OIDC Provider |
| Automation | 3 Lambda Functions, Step Functions State Machine, API Gateway REST API |
| Observability | CloudWatch Dashboard, 4 Metric Alarms, Log Groups, SNS Topic + Email |
| Configuration | 5 SSM Parameters (ServiceNow creds, GitHub token, webhook secret) |
Step 1 — Setting Up the ServiceNow Catalog Item
The first piece is the user-facing form. In ServiceNow Catalog Builder, I created a new catalog item called "AWS EC2 Instance Request" under Technical Catalog → Infrastructure.
Four questions drive the entire provisioning workflow — everything Terraform needs comes from these fields:
Step 2 — Submitting the First Request
After publishing the catalog item under Technical Catalog → Infrastructure, it appeared in the portal. First submission looked like this:
The Real Debugging Journey
This is where blog posts usually skip to "and then it worked." I'm not skipping it — this is where the actual learning happened.
Bug 1: 502 from the ServiceNow REST Message Test
Cause: Lambda environment variable mismatch. Terraform creates the Lambda with STATE_MACHINE_ARN but the handler read STEP_FUNCTION_ARN. Python KeyError → Lambda crash → 502.
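A defensive way to read the variable is sketched below. It would not prevent the naming mismatch, but it turns a bare KeyError into a log line that names the missing variable. The helper name require_env and the return shape are illustrative assumptions, not the code in the repo.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or fail with a readable message."""
    value = os.environ.get(name)
    if not value:
        # Naming the variable in the error makes the mismatch easy to spot
        # in CloudWatch Logs, unlike a bare KeyError.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def lambda_handler(event, context):
    # STATE_MACHINE_ARN is the name Terraform sets on the function.
    state_machine_arn = require_env("STATE_MACHINE_ARN")
    # Hypothetical return value, just to show the lookup succeeded.
    return {"state_machine_arn": state_machine_arn}
```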
Bug 2: Runtime.HandlerNotFound on All Three Lambdas
Cause: Terraform configures handler as handler.lambda_handler but all three files defined def handler(event, context).
Fix: Rename the function in all three files to def lambda_handler.
Bug 3: Step Functions Choice State Failing
Cause: The validation step reused the status_updater Lambda, which doesn't return a valid field. The Choice state looked for a field that never existed.
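A dedicated validation Lambda could look roughly like this. It is a minimal sketch, assuming the Choice state branches on a boolean field called valid and that the input carries the four catalog fields. The allowed values come from the catalog dropdowns and the ASG limits (min 1 / max 4); the output shape is hypothetical.

```python
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small", "t3.medium", "t3.large"}
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def lambda_handler(event, context):
    """Validate a request and always return a 'valid' field for the Choice state."""
    errors = []
    if event.get("instance_type") not in ALLOWED_INSTANCE_TYPES:
        errors.append("instance_type is not an allowed value")
    if event.get("environment") not in ALLOWED_ENVIRONMENTS:
        errors.append("environment is not an allowed value")
    try:
        # The catalog field is single-line text, so it arrives as a string.
        capacity = int(event.get("desired_capacity", ""))
    except (TypeError, ValueError):
        capacity = None
    if capacity is None or not 1 <= capacity <= 4:
        errors.append("desired_capacity must be an integer from 1 to 4")
    if not str(event.get("cost_center", "")).strip():
        errors.append("cost_center is required")
    # Pass the original input through so later states still see every field.
    return {**event, "valid": not errors, "errors": errors}
```

Because the field is present on both the success and failure paths, the Choice state never has to evaluate a path that does not exist.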
Bug 4: GitHub API 404 on repository_dispatch
Cause: GitHub PAT scoped to public_repo but the repository is private. Private repos need full repo scope.
Fix: Create new PAT with repo scope, update SSM:
MSYS_NO_PATHCONV=1 aws ssm put-parameter \
--name "/selfservice-ec2/dev/github/token" \
--value "ghp_xxx" --type SecureString --overwrite
Set MSYS_NO_PATHCONV=1 on Windows/Git Bash or the path gets mangled to C:/selfservice-ec2/dev/github/token.
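For reference, a repository_dispatch call can be built like this with the standard library. This is a sketch, not the Lambda from the repo: the function name, the event type, and the payload keys are assumptions. The endpoint and headers follow GitHub's REST API.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, event_type, client_payload):
    """Build (but do not send) a repository_dispatch request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type, "client_payload": client_payload})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            # For a private repo, a classic PAT needs the full 'repo' scope.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with urllib.request.urlopen(request, timeout=10) returns HTTP 204 on success. If the token cannot see the repository, GitHub answers 404 rather than 403, which is why this bug looked like a wrong URL at first.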
Bug 5: Business Rule Fires 5 Times
Cause: The Business Rule had both "Insert" AND "Update" checked. Every RITM state transition re-fired it.
Bug 6: Instance Type Not Changing Despite Correct Selection
Cause: version = "$Latest" in the ASG launch template reference. Terraform doesn't see ASG changes when only the LT version changes — so instance_refresh never fires.
# Before (broken)
launch_template {
  id      = aws_launch_template.app.id
  version = "$Latest"
}
# After (fixed)
launch_template {
  id      = aws_launch_template.app.id
  version = aws_launch_template.app.latest_version
}
Fix: Reference latest_version so Terraform detects the change and triggers a rolling instance refresh.
Security Patterns Baked In
No SSH keys anywhere. EC2 instances use SSM Session Manager (AmazonSSMManagedInstanceCore). Connect with aws ssm start-session --target i-xxx. No key pair, no port 22, no bastion host.
IMDSv2 enforced. Launch template sets http_tokens = "required". Blocks SSRF attacks from reaching the metadata endpoint.
Private subnets. EC2 lives in private subnets, unreachable from the internet. Only accessible through the ALB.
OIDC for GitHub Actions. No AWS_ACCESS_KEY_ID in GitHub Secrets. GitHub exchanges a JWT for temporary STS credentials scoped to your specific repo.
Secrets in SSM Parameter Store. ServiceNow credentials and GitHub token stored as SecureString (KMS-encrypted). Never in env vars, never in code, never committed to git.
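Reading one of those SecureString parameters at runtime might look like the sketch below. The helper name load_secret is an assumption. The client is passed in so the function can be exercised without AWS access.

```python
def load_secret(ssm_client, name: str) -> str:
    """Fetch a SecureString parameter and return its decrypted value."""
    # WithDecryption=True asks SSM to decrypt with KMS before returning.
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


# Typical use inside a Lambda (requires boto3 and ssm:GetParameter permission):
#   import boto3
#   token = load_secret(boto3.client("ssm"), "/selfservice-ec2/dev/github/token")
```

The function's IAM role also needs permission to decrypt with the KMS key if a customer-managed key is used.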
Known Limitation: ServiceNow PDI Ticket Writeback
The status_updater Lambda can't reach the ServiceNow PDI (dev388443.service-now.com) from AWS Lambda. DNS resolution fails because free-tier PDI instances hibernate and aren't always routable from AWS data centers.
This is not an architecture problem. In production with an enterprise ServiceNow instance (dedicated hostname, guaranteed uptime), the ticket writeback works perfectly. The graceful handling we added is actually the right production pattern — a ServiceNow outage should never block infrastructure provisioning:
except urllib.error.URLError as e:
    logger.warning("ServiceNow unreachable — skipping update: %s", str(e))
    return {"message": f"Skipped: {str(e)}", "ticket_id": ticket_id}
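A fuller, self-contained version of that best-effort pattern is sketched here. The function name update_ticket, the PATCH method, and the injectable opener are assumptions made for illustration, and authentication is left out.

```python
import json
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def update_ticket(url, payload, ticket_id, opener=urllib.request.urlopen):
    """Try to write status back to the ticket; never raise on network failure."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",  # assumed; adjust to whatever the target API expects
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener(request, timeout=10) as response:
            return {"message": f"Updated: HTTP {response.status}", "ticket_id": ticket_id}
    except urllib.error.URLError as e:
        # Covers DNS failures and timeouts; HTTPError is a subclass, so
        # 4xx/5xx responses are also downgraded to a warning here.
        logger.warning("ServiceNow unreachable, skipping update: %s", e)
        return {"message": f"Skipped: {e}", "ticket_id": ticket_id}
```

Returning a normal result on failure keeps the Step Functions execution on its success path, so the ticket update stays informational rather than blocking.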
Cost Breakdown
| Resource | Monthly Cost (est.) |
|---|---|
| t3.micro EC2 (1 instance, always on) | ~$8.50 |
| NAT Gateway | ~$35 ← biggest cost |
| ALB | ~$18 |
| Lambda / Step Functions / API GW | Free tier |
| CloudWatch | ~$3 |
| Total | ~$65/month |
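The arithmetic behind the table, and what a short session costs if the stack is destroyed afterwards, can be checked with a few lines. This is a rough sketch: it assumes every line item accrues evenly per hour and uses a 730-hour month, which is only an approximation.

```python
# Estimated monthly costs in USD, taken from the table above.
MONTHLY_COSTS_USD = {
    "t3.micro EC2": 8.50,
    "NAT Gateway": 35.00,
    "ALB": 18.00,
    "CloudWatch": 3.00,
}
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_total():
    """Sum of the always-on line items."""
    return sum(MONTHLY_COSTS_USD.values())


def session_cost(hours):
    """Pro-rated cost of keeping the stack up for a given number of hours."""
    return monthly_total() * hours / HOURS_PER_MONTH


print(monthly_total())             # 64.5
print(round(session_cost(2), 2))   # 0.18
```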
Tip: Run terraform destroy between sessions. The state stays in S3 and you can rebuild everything in 6 minutes. You'll pay cents instead of $65/month.
Key Takeaways
The waitForTaskToken pattern in Step Functions is underused. Most people either poll (wasteful) or fire-and-forget (lose visibility). The callback pattern gives you synchronous visibility into a long-running async process with zero idle cost.
version = "$Latest" in ASG launch templates is a silent footgun. Terraform doesn't see LT version changes as an ASG change. Always use aws_launch_template.app.latest_version to trigger instance refresh automatically.
IMDSv2 + private subnets + SSM is the modern EC2 security baseline. If any deployment in your org still has a key pair and port 22 open, that's technical debt worth addressing.
Design every external call as best-effort. Your infrastructure pipeline should never be blocked by a downstream ITSM system. Graceful degradation is a feature, not a workaround.
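To make the first takeaway concrete: with waitForTaskToken, the state machine pauses until some component reports back using the token it was handed. A minimal sketch of that callback follows. The function name report_result and the error code are assumptions; send_task_success and send_task_failure are the boto3 Step Functions client calls.

```python
import json


def report_result(sfn_client, task_token, succeeded, details):
    """Resume a paused Step Functions task with a success or failure result."""
    if succeeded:
        # 'output' must be a JSON string; it becomes the task's result.
        sfn_client.send_task_success(taskToken=task_token, output=json.dumps(details))
    else:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="ProvisioningFailed",  # hypothetical error name
            cause=json.dumps(details)[:32768],  # 'cause' has a length limit
        )


# Typical use (requires boto3 and states:SendTaskSuccess / SendTaskFailure):
#   import boto3
#   report_result(boto3.client("stepfunctions"), token, True, {"status": "applied"})
```

While the execution waits for this call there is no polling loop running anywhere, which is where the zero idle cost comes from.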
How to Implement This Yourself
Everything below references the GitHub repo: katta698 / week-01-enterprise-ec2-provisioning. Fork it, follow these steps, and you'll have the full pipeline running in your own AWS account.
Prerequisites
- AWS account with admin access
- GitHub account + fork of the project repo
- ServiceNow Personal Developer Instance — free at developer.servicenow.com
- Terraform 1.10+ installed locally
- AWS CLI configured (aws configure)
Phase 1 — AWS Bootstrap (One-time)
# Create state bucket (choose a unique name)
aws s3api create-bucket \
--bucket YOUR-terraform-state-bucket \
--region us-east-1
aws s3api put-bucket-versioning \
--bucket YOUR-terraform-state-bucket \
--versioning-configuration Status=Enabled
aws s3api put-bucket-encryption \
--bucket YOUR-terraform-state-bucket \
--server-side-encryption-configuration \
'{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
Phase 2 — Configure Variables
Edit terraform/environments/dev/terraform.tfvars (gitignored — never commit):
project = "selfservice-ec2"
environment = "dev"
aws_region = "us-east-1"
artifact_bucket_name = "YOUR-terraform-state-bucket"
tf_state_bucket = "YOUR-terraform-state-bucket"
github_org = "YOUR-github-username"
github_repo = "week-01-enterprise-ec2-provisioning"
create_github_oidc_provider = true
alert_emails = ["your@email.com"]
servicenow_instance_url = "https://devXXXXXX.service-now.com"
servicenow_username = "admin"
servicenow_password = "YOUR-SN-PASSWORD"
github_token = "ghp_xxxxxxxxxxxx" # needs 'repo' scope
webhook_secret = "your-random-secret"
Phase 3 — Deploy Infrastructure
cd terraform/environments/dev
terraform init \
-backend-config="bucket=YOUR-terraform-state-bucket" \
-backend-config="region=us-east-1"
terraform plan # preview 49 resources
terraform apply # ~6 minutes
# Save this output — needed for ServiceNow
terraform output api_gateway_url
Phase 4 — Set GitHub Actions Secrets
Repo → Settings → Secrets and variables → Actions → New repository secret:
| Secret Name | Value |
|---|---|
| AWS_ROLE_ARN | From terraform output github_actions_role_arn |
| TF_STATE_BUCKET | YOUR-terraform-state-bucket |
| AWS_REGION | us-east-1 |
Phase 5 — Configure ServiceNow
5a. REST Message (outbound endpoint): System Web Services → Outbound → REST Messages → New. Set Endpoint to your api_gateway_url. Add POST method using the script in servicenow/business_rule_script.js.
5b. Catalog Item: Service Catalog → Catalog Builder → New item named "AWS EC2 Instance Request" under Technical Catalog → Infrastructure. Add 4 questions with these exact Name values:
| Label | Name (must match exactly) | Type |
|---|---|---|
| Instance Type | instance_type | Dropdown: t3.micro, t3.small, t3.medium, t3.large |
| Environment | environment | Dropdown: dev, staging, prod |
| Desired Capacity | desired_capacity | Single line text |
| Cost Center | cost_center | Single line text |
5c. Business Rule: System Definition → Business Rules → New. Table: sc_req_item. When: After. ✅ Insert only — uncheck Update. Paste script from servicenow/business_rule_script.js.
Phase 6 — End-to-End Test
- ServiceNow → Service Catalog → search "AWS EC2 Instance Request"
- Fill in: Instance Type = t3.micro, Environment = dev, Capacity = 1
- Submit → watch GitHub Actions start within 30 seconds
- Step Functions execution runs ~6 minutes through all 4 states
- EC2 Console → new instance selfservice-ec2-dev-app appears ✅
Phase 7 — Teardown
terraform destroy # destroys all the resources created
# S3 state bucket and OIDC provider are preserved — reusable across all weekly projects
All code available on GitHub. Questions? Drop a comment below.





