New module with Github actions #8

Merged
merged 22 commits into from
Jul 14, 2021

jsirianni
Contributor

What does this PR accomplish?

This PR is at the request of @qingling128. This module brings Puppet support for the Google Ops Agent and more (see README.md for details).

Continuous Integration

GitHub Actions tests all agents on Linux and the Ops Agent on Windows.

Screenshot from 2021-06-22 11-29-05

  Service google-cloud-ops-agent.target
     ✔  is expected to be installed
     ✔  is expected to be enabled
     ✔  is expected to be running
  System Package google-cloud-ops-agent
     ✔  is expected to be installed
     ✔  version is expected to match /1.0.4/
  File /etc/google-cloud-ops-agent/config.yaml
     ✔  is expected to exist

A handful of prerequisites are required before CI will finish successfully. I can assist in this area. See test/README.md for specific details.

Joseph Sirianni added 4 commits June 22, 2021 11:07
* replace repo with new puppet module

* update workflows to reflect new branch names
* Add windows ci

* fix branch

* add windows image

* rename because we now support windows

* test windows
* switch from 1.0.4 to 1.0.10 on windows; the agent stop bug was resolved sometime after 1.0.4

* test 1.0.10 in github actions

* now that stopping a service works, re-add custom_config testing

* fix path to source config

* Fix resource name typo

* try running as system user. this command should work but fails in ci. it works under rdp when running 'as administrator'

* revert back to command prompt now that the new agent can stop correctly. powershell syntax keeps failing for some reason

* revert user: SYSTEM. puppet on windows cannot switch users, but I have confirmed that the ci user should be able to stop services without being elevated

* sleep before restarting the agent to avoid a race condition where we restart during the initial agent startup (results in a windows error)

* puppet does not like the >NUL param

* fix inspec test now that puppet windows restarts are working

* checksum seems to change on the windows filesystem, test the checksum for install and custom_install, using the hash returned from inspec failures

* correct checksum for install
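
The checksum commits above note that the config hash "seems to change on the windows filesystem". The PR does not say why. One possible cause, offered here only as an assumption, is line-ending conversion (LF to CRLF) when the file is checked out or copied on Windows. A minimal Python sketch with made-up config content (not the module's real config.yaml) shows how that alone changes a SHA-256 digest:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical config content; not the module's real config.yaml.
unix_config = b"logging:\n  receivers: {}\n"
# The same text after LF -> CRLF conversion, as a Windows checkout might produce.
windows_config = unix_config.replace(b"\n", b"\r\n")

# The bytes differ, so the digests differ even though the YAML is equivalent.
print(sha256_hex(unix_config) == sha256_hex(windows_config))  # False
```

If this is the cause, keeping a separate expected hash per platform, as the commits above do, is one way to keep the tests passing.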
Contributor

@qingling128 left a comment

Overall LGTM. Left some minor documentation / testing related comments. Most of them are not blocking this PR (I tried to make it clear by calling them out).

.github/workflows/linux.yml (outdated, resolved)
.github/scripts/inspec.sh (resolved)
.github/scripts/inspec.sh (resolved)
manifests/agent.pp (resolved)
metadata.json (resolved)
metadata.json (outdated, resolved)

Packer will require access to Google APIs. This [guide provides](https://cloud.google.com/build/docs/building/build-vm-images-with-packer) examples on how to setup your working environment.

### Linux
Contributor

Are these images built once and hosted somewhere, or built from scratch every time during a test run? I guess for Windows it is probably hosted somewhere, since it's built manually, right?

Contributor Author

I pushed a commit clarifying this. Basically, Packer is used to prepare base images for Linux. The same process is done manually for Windows. Terraform deploys these "baked" images for the Linux and Windows tests every time CI runs. The Windows instances are not left behind to be reused; they are created on demand.

The project being used for CI should own these images, or have read access to them if they are in a different project.

@@ -0,0 +1,45 @@
# Copyright 2020 Google LLC
Contributor

Are the test files for 1.0.4 maintained by some automation system (aka auto generated), or entered by hand?

Contributor Author

They are not generated; they are created by hand. This is okay for now, but more complex CI should probably have some sort of generation process.

@@ -0,0 +1,17 @@
module Fluent
Contributor

If we are looking for a sample fluentd config for a 3rd party app (in some cases we call it a plugin, but that's not very accurate), here is an example: https://github.com/GoogleCloudPlatform/fluentd-catch-all-config/blob/master/configs/config.d/apache.conf

Contributor Author

Joseph Sirianni and others added 12 commits June 30, 2021 14:24
* check source header for apache 2.0 license

* use script to check for invalid headers

* on pr and commit to master

* resolve shellcheck concerns

* fix name

* install go and install addlicense with go install

* fix syntax

* add Google license headers

* Fix linux config checksums, they are different because of the license addition
* Add context to inspec and puppet scripts. Remove project and zone variables from inspec / puppet scripts as they are no longer used.

* Add context to setup.sh

* Add context to ssh.sh

* Add context to terraform wrapper scripts

* project is still required for the action
Co-authored-by: Ling Huang <qingling128@users.noreply.github.com>
* rename cloud_ops to google_cloud_ops

* update module path to reflect new module name

* use latest checksum for config
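
The license commits above describe a CI step that checks source headers for the Apache 2.0 license using a script and `addlicense`. The script itself is not shown in this conversation. The idea can be sketched in Python as follows; the marker string, the file suffixes, and both function names are assumptions for illustration and are not taken from the PR:

```python
from pathlib import Path

# Assumption: a file counts as licensed if this marker appears in its first lines.
LICENSE_MARKER = "Licensed under the Apache License, Version 2.0"


def has_license_header(text: str, max_lines: int = 20) -> bool:
    """Return True if the marker appears within the first max_lines lines."""
    head = text.splitlines()[:max_lines]
    return any(LICENSE_MARKER in line for line in head)


def find_unlicensed(root: str, suffixes: tuple = (".pp", ".sh", ".rb")) -> list:
    """List files under root with one of the suffixes that lack the header."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if not has_license_header(path.read_text(errors="replace")):
                missing.append(str(path))
    return missing
```

A CI job could call `find_unlicensed(".")` and fail when the returned list is non-empty. Adding a header changes a file's bytes, which is consistent with the "Fix linux config checksums" commit above.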
Joseph Sirianni and others added 5 commits July 7, 2021 15:38
* try 2.0.0 in ci

* update test cases to support GA 2.0.0 release

* Remove support for pre ga

* update to ga service name

* Remove support for pre ga

* fix windows version typo

* test against new service names

* use new checksum for default config

* use 2.0.0 config

* use correct checksums

* switch to 2.0.1, resolve issue where systemd service is 'not running'

* use 2.0.1 for example version

* use 2.0.1 for example version

* fix config path

* fix config path

* fix checksum due to new path

* fix 2.0.1 service name

* temp disable some runs to save time

* detect service name based on ops agent version

* try 1.0.10 with 2.0.1 upgrade

* Add 1.0.10

* try split builtin

* remove 1.x.x support. test 2.0.1 on windows

* test windows 2.0.1

* enable all
Co-authored-by: Ling Huang <qingling128@users.noreply.github.com>
* pr feedback

* run ci on changes to metadata.json
* use custom config, not plugin

* test logging

* fix checksum for new test config

* re-enable all
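
One of the commits above reads "detect service name based on ops agent version"; a later one removes 1.x.x support, so this logic may not exist in the merged module. As a rough sketch of what such detection could look like: the 1.x name below comes from the CI output in the PR description, while the 2.x name and the function name are assumptions, not confirmed by this conversation.

```python
def ops_agent_service_name(version: str) -> str:
    """Pick a Linux service name from an Ops Agent version string like '2.0.1'."""
    major = int(version.split(".")[0])
    if major < 2:
        # Name shown in the PR description's CI output for 1.0.4.
        return "google-cloud-ops-agent.target"
    # Assumption: the GA (2.x) service drops the '.target' suffix.
    return "google-cloud-ops-agent"


print(ops_agent_service_name("1.0.10"))
print(ops_agent_service_name("2.0.1"))
```

Parsing the major version as an integer avoids string-comparison mistakes such as treating "10.0.0" as older than "2.0.0".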
Contributor

@qingling128 left a comment

LGTM

@qingling128 merged commit ff1be47 into GoogleCloudPlatform:master on Jul 14, 2021
qingling128 pushed a commit that referenced this pull request Jul 14, 2021
…g Agent and Monitoring Agent, and integrate with Github actions. (#8)
@qingling128
Contributor

Just merged this PR.

Looks like some integration tests failed. Might need some investigation.
image

BTW, I just noticed the last commit of the PR did not trigger integration tests:
image

I guess some of the commits that addressed code review feedback introduced a breakage somewhere, while the file is not captured by: https://github.com/GoogleCloudPlatform/puppet-google-logging/blob/ff1be4759a86a4a46c8f2c5b7bf961fce017f0ec/.github/workflows/linux.yml#L5
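
To illustrate the guess above: a workflow with a path filter only runs when a changed file matches one of its patterns. The Python sketch below uses `fnmatchcase` as a stand-in for the matching; the pattern list is hypothetical (the real one is in the linked linux.yml), and GitHub's own glob rules differ from `fnmatch` in some details.

```python
from fnmatch import fnmatchcase

# Hypothetical path filters; the real list lives in .github/workflows/linux.yml.
WATCHED_PATHS = ["manifests/**", "templates/**", ".github/workflows/linux.yml"]


def triggers_ci(changed_files, patterns=WATCHED_PATHS):
    """Return True if any changed file matches at least one watched pattern."""
    return any(fnmatchcase(f, p) for f in changed_files for p in patterns)


print(triggers_ci(["manifests/agent.pp"]))  # True
print(triggers_ci(["metadata.json"]))       # False
```

With a list like this, a commit that touches only an unlisted file would merge without running the integration tests, which matches the behaviour described above.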

On the side, we are also looking for Puppet experts inside Google to see if there are any Puppet conventions we need to follow within Google or in general. If there is any feedback there, we can send follow up PRs to address them.

@jsirianni
Contributor Author

jsirianni commented Jul 14, 2021 via email
