SSO for SSH

Integrating Service Workbench and Session Manager


I recently had the opportunity to set up AWS' Service Workbench solution for a client. It's not a bad app! It does what those of us who work with AWS regularly can do through the console, but it does it with a clean and simple UI that makes cloud compute accessible to researchers without needing prior AWS experience.

After some light perusal of the documentation, an engineer won't find Service Workbench difficult to install either. The client I was working with had a number of customisations they were interested in, though, which made for a fun and challenging project!

One of those customisations was the use of Session Manager rather than SSH, for access to any workspaces launched through Service Workbench. Support for Session Manager isn't built-in (yet!) but with a bit of tinkering around the edges, it's possible to retrofit it.

It's worth noting that setting up Session Manager is more complex than SSH, especially given the latter is pretty much just there. But depending on your security posture, the tradeoff can be well worth it. By 'outsourcing' your authentication to AWS, you're:

  • avoiding the need to open ports and manage SSH keys (though EC2 Instance Connect - which Service Workbench uses - does a good job of solving key management), and
  • gaining the bonus of logged and auditable sessions.

Since Session Manager was launched, I find myself rarely using SSH anymore.

Federating identities

Although Session Manager is a bit more work to set up, we of course want to make it as easy as possible for the end user (and for engineers to maintain going forward!), so federated identities are a must. This allows us to use the same identity for logging in to the Service Workbench UI (which, naturally, utilises Cognito) as for connecting via Session Manager (either using the AWS CLI or through the console).

And, crucially, using federated identities allows us to ensure that the user who launches a workspace is the same user who can 'Session Manager' to it. (Really not sure how to verb that - 'SM to a server' just doesn't have the same ring to it as 'SSH' does, right?)

From here we need a couple of pre-requisites:

  • a suitable IAM role - let's call it Okta-ServiceWorkbenchUser - for end users to assume;
  • that role needs to be set up in your identity provider (in my case that's Okta, but you can do this with any suitable provider);
  • Service Workbench will need to be installed - and the same identity provider connected up to the Cognito user pool that is automatically created; and
  • an end user!

End users will need to have the same username in the identity provider as they have in Service Workbench (this will be important shortly), and the identity provider will need to send through the 'username' as the IAM role session name when federating to AWS (Okta's AWS Account Federation integration does this by default).

Figure: The Application Assignment dialog for Okta's AWS Account Federation integration. The username influences the external ID, which becomes the session name; the role to use is in this case set via group membership.

Tags and policies

Over in the Service Workbench UI, once I've launched a workspace, the instance runs with the CreatedBy tag set to my Service Workbench username, tim.malone. This is the key to determining that the user who started a workspace in Service Workbench is the same user who wants to use Session Manager - without having to make any code changes in Service Workbench itself.

Now what we're really here for are the IAM policies that tie this all together, right? The key here is the aws:userid moniker, which is available as both an IAM policy variable and condition key. We're primarily going to use it as a policy variable - but we can also utilise the condition key context to help us build a holistic solution.

There is a gotcha, however (I told you this was more complex). If the aws:userid policy variable was set directly to the session name (remember, that'll be our user's username), we could attach a policy statement like this to our role:

json
{
  "Sid": "NonWorkingExampleDontUseThis",
  "Effect": "Allow",
  "Action": "ssm:StartSession",
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "StringEquals": {
      "ssm:resourceTag/CreatedBy": "${aws:userid}"
    }
  }
}

(As outlined in the Systems Manager docs, we can check any tag assigned to the instance using ssm:resourceTag).

So far, so good. But - that policy variable is going to trip us up. According to the IAM docs, it will return the role ID as well as the username, in the format role-id:username! This is no good, because Service Workbench has no idea what role we're going to use, and therefore no way to have included that role's ID in the CreatedBy tag.
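To make the mismatch concrete, here's a minimal sketch of how that value splits apart. The role ID below is a made-up placeholder and split_user_id is a hypothetical helper (it isn't part of any AWS SDK); it simply assumes the role-id:username format described above.

```python
def split_user_id(user_id):
    """Split an assumed-role aws:userid value into (role_id, session_name)."""
    role_id, separator, session_name = user_id.partition(':')
    if not separator:
        raise ValueError(f'not an assumed-role user ID: {user_id!r}')
    return role_id, session_name


# Placeholder value: a made-up role ID, then the federated username.
example_user_id = 'AROAEXAMPLEROLEID:tim.malone'
role_id, session_name = split_user_id(example_user_id)

# The CreatedBy tag only ever holds the username half of the value.
created_by_tag = 'tim.malone'
print(session_name == created_by_tag)     # the usernames line up...
print(example_user_id == created_by_tag)  # ...but the full value never does
```

A StringEquals condition compares the whole string, so the tag would have to contain the role ID as well for the policy to match.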

Now I said we're not going to be modifying Service Workbench code. While doing so might be simple enough, it'll set up future maintenance issues for whichever engineer gets tasked with upgrading it (AWS are continually iterating on and releasing new versions of the solution). We also can't do anything too fancy in an IAM policy, such as matching against a substring of a policy variable (though that would make a great feature request...).

However, we do know that Service Workbench is going to launch instances, and that gives us an opportunity to hook into its workflow while remaining completely independent.

Enter CloudTrail, EventBridge, and - you guessed it - Lambda. With a small amount of 'Lambda glue', we can turn the CreatedBy tag into our own custom AwsUserId tag:

python
import boto3

role_name = 'Okta-ServiceWorkbenchUser'  # Update this with your role's name.


def handler(event, context):
    ec2 = boto3.client('ec2')
    iam = boto3.client('iam')

    # The CloudTrail RunInstances event lists the instances that were launched.
    instance = event['detail']['responseElements']['instancesSet']['items'][0]
    instance_id = instance['instanceId']

    # Fetch the tags that were applied to the instance at launch.
    instance_data = ec2.describe_instances(
        InstanceIds=[instance_id]
    )['Reservations'][0]['Instances'][0]
    instance_tags = instance_data.get('Tags', [])

    # aws:userid is prefixed with the role's unique ID, so look that up.
    role_data = iam.get_role(RoleName=role_name)['Role']
    role_id = role_data['RoleId']

    username = None
    for tag in instance_tags:
        if tag['Key'] == 'CreatedBy':
            username = tag['Value']

    if username is None:
        # No CreatedBy tag on this instance, so there's nothing to do.
        return

    aws_user_id = f'{role_id}:{username}'
    extra_instance_tags = [{
        'Key': 'AwsUserId',
        'Value': aws_user_id,
    }]

    result = ec2.create_tags(Resources=[instance_id], Tags=extra_instance_tags)
    print(result)

The role on this Lambda function will need a policy statement like the following:

json
{
  "Effect": "Allow",
  "Action": [
    "ec2:CreateTags",
    "ec2:DescribeInstances",
    "iam:GetRole"
  ],
  "Resource": "*"
}

And finally, the function should be hooked up to an EventBridge rule that looks like this:

json
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["ec2.amazonaws.com"],
    "eventName": ["RunInstances"]
  }
}
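If you'd rather wire that rule up in code than in the console, a rough sketch with boto3 might look like the following. The rule name, target ID and function ARN are all placeholders of my own choosing, and this assumes a CloudTrail trail is already recording management events in the region (without one, these 'AWS API Call via CloudTrail' events won't arrive).

```python
import json

# The same event pattern as the rule above.
EVENT_PATTERN = {
    'source': ['aws.ec2'],
    'detail-type': ['AWS API Call via CloudTrail'],
    'detail': {
        'eventSource': ['ec2.amazonaws.com'],
        'eventName': ['RunInstances'],
    },
}


def connect_rule_to_function(rule_name, function_arn):
    """Create the rule, point it at the Lambda function, and allow the invoke."""
    import boto3  # Imported here so the pattern can be inspected without boto3.

    events = boto3.client('events')
    awslambda = boto3.client('lambda')

    rule_arn = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EVENT_PATTERN),
        State='ENABLED',
    )['RuleArn']

    events.put_targets(
        Rule=rule_name,
        Targets=[{'Id': 'tag-instance-function', 'Arn': function_arn}],
    )

    # EventBridge needs a resource-based permission to invoke the function.
    awslambda.add_permission(
        FunctionName=function_arn,
        StatementId=f'{rule_name}-invoke',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule_arn,
    )
    return rule_arn
```

Calling connect_rule_to_function('tag-workbench-instances', '<your function ARN>') once from a deployment script should be enough; infrastructure-as-code tooling would do the same job more repeatably.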

Within a few seconds of Service Workbench starting a new instance, we should now end up with our own AwsUserId tag that exactly matches the value of the ${aws:userid} IAM policy variable. Switch CreatedBy for AwsUserId in the sample IAM policy statement above, and our user should be able to start a Session Manager session to any instance they - and only they - create. (Of course, it's crucial here that our users' role isn't itself given permission to ec2:CreateTags, nor anything that would allow it to modify that Lambda function.)
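Spelled out, the adjusted statement might look like the sketch below, built as a Python dict so it can be dumped straight to JSON. The Sid is a name I've made up, and the Version line is there because IAM only expands policy variables in 2012-10-17 policies.

```python
import json

# The earlier non-working statement, now matched against the AwsUserId tag
# that the Lambda function adds.
start_session_statement = {
    'Sid': 'StartSessionOnOwnWorkspaces',  # Hypothetical name.
    'Effect': 'Allow',
    'Action': 'ssm:StartSession',
    'Resource': 'arn:aws:ec2:*:*:instance/*',
    'Condition': {
        'StringEquals': {
            # A plain string, not an f-string: IAM expands ${aws:userid} itself.
            'ssm:resourceTag/AwsUserId': '${aws:userid}',
        },
    },
}

policy_document = {
    'Version': '2012-10-17',
    'Statement': [start_session_statement],
}

print(json.dumps(policy_document, indent=2))
```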

$ aws ssm start-session --instance-id i-0f123456aa432109b
Starting session with SessionId: tim.malone-0ab1cd9d876e54321
sh-4.2$ sudo su ec2-user
[ec2-user@ip-10-0-6-136 bin]$

Going further

Keen readers will note that starting a Session Manager session only gets us so far. What if the user also wants to:

  • start a session via the AWS console
  • terminate sessions
  • resume 'paused' sessions
  • 'SCP' data to or from the instance

These all require a little more digging, but they're largely solvable problems. For example, to be able to start a session via the console, the role we're giving users access to will also need permissions to actions such as ec2:DescribeInstances and ssm:DescribeInstanceInformation to go via the EC2 console, or ssm:DescribeSessions to go via the Systems Manager console (you could alternatively expand these to ec2:Describe*, ssm:Describe*, and ssm:GetConnectionStatus to give a nicer user experience with what should be a completely 'error-free' console).
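As a starting point, the console-related permissions could be gathered into one extra statement, sketched here as a Python dict. The Sid is a made-up name, and I'm assuming a wildcard resource since these are read-only listing actions; tighten it up if your security posture calls for it.

```python
import json

# Read-only actions named above; the wider ec2:Describe* and ssm:Describe*
# wildcards could be swapped in for a smoother console experience.
console_statement = {
    'Sid': 'BrowseForSessionManagerConsole',  # Hypothetical name.
    'Effect': 'Allow',
    'Action': [
        'ec2:DescribeInstances',
        'ssm:DescribeInstanceInformation',
        'ssm:DescribeSessions',
        'ssm:GetConnectionStatus',
    ],
    'Resource': '*',
}

print(json.dumps(console_statement, indent=2))
```

Note that this only lets users see instances and sessions in the console; starting a session is still gated by the tag-based ssm:StartSession statement.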

Terminating or resuming sessions, though, requires some further work in our Lambda function. Once a session is started, its ID is prefixed with the user's username. But again, because aws:userid doesn't exactly match that username, we need some glue. Adding a policy like this - in our Lambda function, using e.g. iam.put_role_policy() - would allow our user to terminate or resume only the sessions that they had started:

python
# This is for use in Python - not valid JSON ;)
{
    "Effect": "Allow",
    "Action": [
        "ssm:ResumeSession",
        "ssm:TerminateSession"
    ],
    "Resource": f"arn:aws:ssm:*:*:session/{username}-*",
    "Condition": {
        "StringEquals": {
            "aws:userid": f"{aws_user_id}"
        }
    }
}

(This time we're making use of aws:userid as a condition key, rather than a policy variable).
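Here's a rough sketch of how that statement could be attached from the Lambda function. The helper names and the per-instance policy naming scheme (SessionManager-<instance id>) are my own assumptions, not anything Service Workbench or AWS prescribes.

```python
import json


def build_session_policy(username, aws_user_id):
    """Build a policy document covering only sessions started by this user."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['ssm:ResumeSession', 'ssm:TerminateSession'],
            'Resource': f'arn:aws:ssm:*:*:session/{username}-*',
            'Condition': {'StringEquals': {'aws:userid': aws_user_id}},
        }],
    }


def attach_session_policy(role_name, instance_id, username, aws_user_id):
    """Add the document to the users' role as an inline policy."""
    import boto3  # Imported here so the builder above works without boto3.

    iam = boto3.client('iam')
    iam.put_role_policy(
        RoleName=role_name,
        # Hypothetical naming scheme: one inline policy per workspace instance.
        PolicyName=f'SessionManager-{instance_id}',
        PolicyDocument=json.dumps(build_session_policy(username, aws_user_id)),
    )
```

In the handler above, this would be called as attach_session_policy(role_name, instance_id, username, aws_user_id) once the tag has been created. The Lambda function's own role would also need iam:PutRolePolicy on the users' role for this to work.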

This is far from a perfect solution, though. For one, this role is long lived - and if you're dynamically adding a policy, you're gonna need to remove it at some point (another Lambda on instance termination, perhaps?). But there's a bigger issue: roles can only have 10 (up to 20 on request) managed policies attached to them, plus a limited number of characters across inline policies. At a certain number of users - or perhaps users-currently-running-a-workspace, if the policies are removed on termination - new users will no longer be able to connect via Session Manager.
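For the clean-up half of that, a second Lambda function could listen for EC2's instance state-change events and delete the matching inline policy. This is only a sketch: it assumes each policy was added under a name derived from the instance ID (SessionManager-<instance id>, a convention I've made up), and the optional iam argument exists purely so the handler can be exercised with a stub.

```python
# EventBridge pattern for instances reaching the 'terminated' state.
TERMINATION_PATTERN = {
    'source': ['aws.ec2'],
    'detail-type': ['EC2 Instance State-change Notification'],
    'detail': {'state': ['terminated']},
}

ROLE_NAME = 'Okta-ServiceWorkbenchUser'  # Update this with your role's name.


def policy_name_for(instance_id):
    """Hypothetical naming scheme: one inline policy per workspace instance."""
    return f'SessionManager-{instance_id}'


def handler(event, context, iam=None):
    if iam is None:
        import boto3  # Only needed when running for real in Lambda.
        iam = boto3.client('iam')

    instance_id = event['detail']['instance-id']
    try:
        iam.delete_role_policy(
            RoleName=ROLE_NAME,
            PolicyName=policy_name_for(instance_id),
        )
    except iam.exceptions.NoSuchEntityException:
        # No policy was ever added for this instance; nothing to clean up.
        return False
    return True
```

This keeps the role tidy, but it doesn't change the underlying size limits: it only bounds the number of policies by the number of running workspaces.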

Do you have a better way to utilise an imperfect value for aws:userid in IAM policies? Or perhaps another way to integrate Session Manager with Service Workbench that does allow for scale? Get in touch!

For more information on Service Workbench, see: