Linux Server support #7

Open
1 task done
seeker815 opened this issue Nov 27, 2021 · 6 comments

@seeker815

seeker815 commented Nov 27, 2021

Contact Details

What happened?

I have provisioned the infrastructure and added a couple of devices running Ubuntu Linux servers. I added each device to DynamoDB using the Ottr API with the payload below:

data = {
"system_name": "d5.xxxx-prod.com",
"common_name": "d5.xxxx-prod.com",
"certificate_authority": "lets_encrypt",
"data_center": "AWS",
"device_model": "linux",
"host_platform": "Ubuntu",
"ip_address": "10.1.60.173",
"os_version": "18.04",
}

The CloudWatch event trigger runs the Lambda that scans the DynamoDB table; its output is included in the log output below. After that, no Step Function is triggered and no certificate is created. Are network devices the only supported use case, or does Ottr also manage certificates for Ubuntu/Linux servers?

Version

v0.0.1

Relevant Log Output

timestamp,message
1637999047236,"START RequestId: 27429d6c-6e01-478d-95c2-a92cae8afc2b Version: $LATEST
"
1637999049733,"[INFO]	2021-11-27 07:44:09,733.733Z	27429d6c-6e01-478d-95c2-a92cae8afc2b	Scanned Table: {'Items': [
    {'origin': 'API', 'os_version': '18.04', 'device_model': 'linux', 'ip_address': '10.1.50.139', 'subject_alternative_name': ['d1.xxxx-prod.com'], 'common_name': 'd1.xxxx-prod.com', 'host_platform': 'Ubuntu', 'certificate_expiration': 'None', 'system_name': 'd1.xxx-prod.com', 'certificate_authority': 'lets_encrypt', 'data_center': 'AWS', 'certificate_validation': 'True'}, {
           {'origin': 'API', 'os_version': '18.04', 'device_model': 'linux', 'ip_address': '10.1.60.173', 'subject_alternative_name': ['d5.xxxx-prod.com'], 'common_name': 'd5.xxxx-prod.com', 'host_platform': 'Ubuntu', 'certificate_expiration': '2021-11-29T01:00:00', 'system_name': 'd5.xxxx-prod.com', 'certificate_authority': 'lets_encrypt', 'data_center': 'AWS', 'certificate_validation': 'True'}], 
           'Count': 3, 'ScannedCount': 3, 'ResponseMetadata': {'RequestId': 'RMSH4SHEHI2N9HVSBTSG1F1DM7VV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Sat, 27 Nov 2021 07:44:09 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '1390', 'connection': 'keep-alive', 'x-amzn-requestid': 'RMSH4SHEHI2N9HVSBTSG1F1DM7VV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '4200178217'}, 'RetryAttempts': 0}}
"
1637999049733,"[INFO]	2021-11-27 07:44:09,733.733Z	27429d6c-6e01-478d-95c2-a92cae8afc2b	Rotate Certificates: []
"
1637999049761,"END RequestId: 27429d6c-6e01-478d-95c2-a92cae8afc2b
"

Code of Conduct

  • I agree to follow this project's Code of Conduct
@seeker815 seeker815 added the bug Something isn't working label Nov 27, 2021
@seeker815 seeker815 changed the title [Bug]: Linux Server support Nov 27, 2021
@yangkenneth
Collaborator

In order for the rotations to start running you need a few components in place:

  1. Assets will need to be added in the DynamoDB database through the API (this seems to be in place for your case).
  2. Routes will need to be configured in the API (https://github.com/airbnb/ottr/blob/main/api/backend/app/config/route.json) and the Lambda Router (https://github.com/airbnb/ottr/blob/main/otter/router/src/config/route.json) to match the metadata from the database (this seems to be in place for your case).
  3. Route53 mappings from your hosted zone xxxx-prod.com to your DNS Subdelegate Zone need to be set. The Subdelegate Zone is the value you set here: https://github.com/airbnb/ottr/blob/main/infra/otter.tf#L3.

More information on DNS Subdelegation here: https://github.com/airbnb/ottr/tree/main/dns

For each host you have in your infrastructure you will need to create a DNS Module like the examples below. The value for alias_domain_name is the same value you set here: https://github.com/airbnb/ottr/blob/main/infra/otter.tf#L3. Also note that your organization must have ownership over the Subdelegate Zone domain.

module "ubuntu_01" {
  source                  = "./modules/dns"
  certificate_common_name = "d1.xxxx-prod.com"
  alias_domain_name       = "example-acme.com"
}

If you have any hosts that require multiple SANs you can do something like the following:

module "ubuntu_02" {
  source                    = "./modules/dns"
  certificate_common_name   = "e1.xxxx-prod.com"
  subject_alternative_names = [
    "e2.xxxx-prod.com",
    "e3.xxxx-prod.com"
  ]
  alias_domain_name         = "example-acme.com"
}

@seeker815
Author

I set up the DNS modules, after which I see CNAME records mapping _acme-challenge.d1.xxx-prod.com to _acme-challenge.d1.xxxx-acme.com (the sub-delegate domain).

The logs under aws/lambda/otter now show the scanned table and the certificates to rotate:

Rotate Certificates: [{'origin': 'API', 'os_version': '18.04', 'device_model': 'linux', 'ip_address': '10.1.50.139', 'subject_alternative_name': ['d1.xxx-prod.com'], 'common_name': 'd1.xxx-prod.com', 'host_platform': 'Ubuntu', 'certificate_expiration': 'None', 'system_name': 'd1.xxx-prod.com', 'certificate_authority': 'lets_encrypt', 'data_center': 'AWS', 'certificate_validation': 'True'},

I now see a Step Function execution triggered for each Linux device that was scanned and had a Route53 mapping. Although the execution showed as successful, the output of the Platform Task Execution step contained the following error:

{
  "Error": "States.TaskFailed",
  "Cause": "{"Attachments":[{"Details":[{"Name":"subnetId","Value":"subnet-0da64xxxxx"},{"Name":"networkInterfaceId","Value":"eni-090xxxxx"},{"Name":"macAddress","Value":"02:xx:xx:xx"},{"Name":"privateDnsName","Value":"ip-10-1-50-xx.xx.compute.internal"},{"Name":"privateIPv4Address","Value":"10.1.50.74"}],"Id":","Type":"eni"}],"Attributes":[{"Name":"ecs.cpu-architecture","Value":"x86_64"}],"AvailabilityZone":"xxxx-1a","ClusterArn":"axxxx/otter","Connectivity":"CONNECTED","ConnectivityAt":1638112454496,"Containers":[{"ContainerArn":"arn:aws:ecs:xxxx7:container/otter/45xxx","Cpu":"1","GpuIds":[],"Image":"xxx.dkr.ecr.ap-southeast-1.amazonaws.com/otter-linux-aws-ssm:latest","LastStatus":"STOPPED","ManagedAgents":[],"Memory":"512","Name":"otter","NetworkBindings":[],"NetworkInterfaces":[{"AttachmentId":"bfcxxxaa","PrivateIpv4Address":"10.1.50.74"}],"RuntimeId":"45x7","TaskArn":"arn:aws:ecs:ap-xxx-1:265x:task/otter/4596axxx"}],"Cpu":"256","CreatedAt":1638112450933,"DesiredStatus":"STOPPED","EnableExecuteCommand":false,"EphemeralStorage":{"SizeInGiB":20},"ExecutionStoppedAt":1638112463923,"Group":"family:otter-linux-aws-ssm-lets-encrypt","InferenceAccelerators":[],"LastStatus":"STOPPED","LaunchType":"FARGATE","Memory":"2048","Overrides":{"ContainerOverrides":[{"Command":[],"Environment":[{"Name":"SYSTEM_NAME","Value":"d1.xxx-prod.com"},{"Name":"COMMON_NAME","Value":"d1.xxx-prod.com"},{"Name":"VALIDATE_CERTIFICATE","Value":"True"},{"Name":"HOSTED_ZONE_ID","Value":"ZXXXX"},{"Name":"AWS_REGION","Value":"ap-xxx"},{"Name":"DYNAMODB_TABLE","Value":"xxxx"},{"Name":"ACCOUNT_ID","Value":"xxxx"},{"Name":"ACME_DNS","Value":"xxxx-acme.com"},{"Name":"PREFIX","Value":"development"},{"Name":"country","Value":"US"},{"Name":"state","Value":"xx"},{"Name":"locality","Value":"xxx"},{"Name":"email","Value":"xx@xxx.com"},{"Name":"organization","Value":"xxxx, 
inc"},{"Name":"organization_unit","Value":"Security"}],"EnvironmentFiles":[],"Name":"otter","ResourceRequirements":[]}],"InferenceAcceleratorOverrides":[]},"PlatformVersion":"1.4.0","PullStartedAt":1638112463361,"StartedBy":"AWS Step Functions","StopCode":"TaskFailedToStart","StoppedAt":1638112486550,"StoppedReason":"ResourceInitializationError: **failed to validate logger args: create stream has been retried 1 times: failed to create Cloudwatch log stream:** ResourceNotFoundException: The specified log group does not exist. : exit status 1","StoppingAt":1638112473946,"Tags":[],"TaskArn":"arn:aws:ecs:ap-xxxx:task/otter/4xxxe",
"TaskDefinitionArn":"arn:aws:ecs:ap-xx:2xx:task-definition/otter-linux-aws-ssm-lets-encrypt:6","Version":4}"
}
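The useful part of this output is buried inside the `Cause` field, which Step Functions returns as a JSON-encoded string describing the ECS task. A minimal Python sketch for pulling out the fields that explain the failure is below. It assumes you have the raw, properly escaped `Cause` string from the execution output; the helper name `summarize_task_failure` and the trimmed-down example payload are illustrative only and not part of Ottr.

```python
import json


def summarize_task_failure(cause: str) -> dict:
    """Extract the fields that explain why an ECS task stopped.

    `cause` is assumed to be the raw JSON string from the "Cause" field
    of a States.TaskFailed error for an ECS task.
    """
    task = json.loads(cause)
    return {
        "stop_code": task.get("StopCode"),
        "stopped_reason": task.get("StoppedReason"),
        "task_definition": task.get("TaskDefinitionArn"),
    }


# Trimmed-down stand-in for the real Cause string (values are redacted
# or shortened; only the keys mirror the output above).
example_cause = json.dumps(
    {
        "StopCode": "TaskFailedToStart",
        "StoppedReason": (
            "ResourceInitializationError: failed to validate logger args: "
            "failed to create Cloudwatch log stream: "
            "ResourceNotFoundException: The specified log group does not exist."
        ),
        "TaskDefinitionArn": "arn:aws:ecs:ap-xx:2xx:task-definition/otter-linux-aws-ssm-lets-encrypt:6",
    }
)

summary = summarize_task_failure(example_cause)
print(summary["stop_code"])
print(summary["stopped_reason"])
```

Reading `StopCode` and `StoppedReason` first is usually enough to tell whether the container failed to start at all (as here) or started and then failed while running.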

Can you help debug the above issue? My other question: for network devices, the ECS containers on Fargate access the devices using the username/password stored in AWS Secrets Manager under /ottr/keyxxx. For Ubuntu we don't set any credentials, so how does the container access the instance: through an instance role or some other way?

Thank you for the quick reply. As the documentation is limited, I am trying to read through and troubleshoot as much as possible.

@yangkenneth
Collaborator

yangkenneth commented Nov 28, 2021

Regarding your question: for Linux platforms, instead of a username and password, Ottr uses the AWS SSM Agent to run commands on the target system. For Linux distributions the ECS Fargate container has an IAM Role with permissions to SSM, which means the SSM Agent must be running on your Ubuntu systems in order for Ottr to work. Amazon Linux 2 has SSM pre-installed, but for other distributions you may need to install it manually or through your configuration management system (https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-manual-agent-install.html).
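One way to confirm that SSM can actually reach a given instance is to ask SSM for its ping status. The sketch below is only an illustration and is not taken from the Ottr code base: the helper name `ssm_agent_online` is made up, and it takes an already constructed boto3 SSM client so it can be exercised without AWS access. It relies on the SSM `describe_instance_information` call with an `InstanceIds` filter.

```python
def ssm_agent_online(ssm_client, instance_id: str) -> bool:
    """Return True if SSM reports the instance's agent as Online.

    `ssm_client` is expected to behave like boto3.client("ssm").
    An instance that never registered with SSM simply does not appear
    in the response, so this returns False for it.
    """
    response = ssm_client.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    return any(
        info.get("InstanceId") == instance_id
        and info.get("PingStatus") == "Online"
        for info in response.get("InstanceInformationList", [])
    )


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   client = boto3.client("ssm", region_name="ap-southeast-1")
#   print(ssm_agent_online(client, "i-0123456789abcdef0"))
```

If this returns False for an instance where the agent process is running, the instance profile permissions or network path to the SSM endpoints are the usual things to check next.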

If you don't want to use the SSM Agent, you can create your own implementation and drop it in the platforms directory (https://github.com/airbnb/ottr/tree/main/platforms). The only thing you would need to change is the ECS Task Definition referenced in the Routes for both the API and the Lambda Router: https://github.com/airbnb/ottr/blob/main/api/backend/app/config/route.json#L49

Additional information on creating a new module or implementation is here: https://github.com/airbnb/ottr/blob/main/docs/CONTRIBUTE.md

The error could have a number of causes, but I assume it is because SSM isn't installed on your target device. If you need deeper debugging, there are CloudWatch Logs for the ECS execution in the Log Group /ecs/otter, and you should see a log stream in the format /otter-linux-aws-ssm-lets-encrypt/otter/xxx. If you are still running into issues after installing SSM, please let me know and I'll be glad to help look into the logs.

@seeker815
Author

seeker815 commented Nov 29, 2021

I checked the Ubuntu instances that were provisioned, and the 18.04 AMI has the SSM agent set up and running. The output of the Step Function Platform Task step had the log shown earlier; filtered down, the reason appears to be this:

"StoppedReason":"ResourceInitializationError: failed to validate logger args: create stream has been retried 1 times: failed to create Cloudwatch log stream: ResourceNotFoundException: The specified log group does not exist. : exit status 

I also checked the CloudWatch Logs permissions for role/otter-acme-state, the subnets (private), security group egress, etc. Could you suggest what else to look at in this case?

@yangkenneth
Collaborator

@seeker815 I rebuilt the environment in a sandbox account and was able to start the execution for the Ubuntu container without any issues. I noticed that you are running in ap-southeast-1, so I wanted to confirm that you made the corresponding changes in the variables.tf file. The only CloudWatch Log Group that should be present within your environment is /ecs/otter; can you check that it is present?
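If it is easier than looking in the console, the check can be scripted. The following is a rough sketch rather than anything shipped with Ottr: the helper name `log_group_exists` is made up, and it takes a boto3-style CloudWatch Logs client as an argument. It uses `describe_log_groups` with `logGroupNamePrefix` and then requires an exact name match, since the prefix filter also returns groups such as `/ecs/otter-something`.

```python
def log_group_exists(logs_client, name: str) -> bool:
    """Return True if a CloudWatch Log Group with exactly this name exists.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    kwargs = {"logGroupNamePrefix": name}
    while True:
        page = logs_client.describe_log_groups(**kwargs)
        if any(g.get("logGroupName") == name for g in page.get("logGroups", [])):
            return True
        token = page.get("nextToken")
        if not token:
            # No more pages and no exact match found.
            return False
        kwargs["nextToken"] = token


# Usage (requires boto3 and AWS credentials for the account being checked):
#   import boto3
#   client = boto3.client("logs", region_name="ap-southeast-1")
#   print(log_group_exists(client, "/ecs/otter"))
```

The client must be created for the same region the infrastructure was deployed to, otherwise the group will appear to be missing even when it exists.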

@seeker815
Author

@yangkenneth Yes, I made the region change in variables.tf. The other CloudWatch log groups were created, but /ecs/otter was missing. Since this worked for you, I will go ahead and rebuild the setup.
