Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds SAML SP #3

Merged
merged 6 commits into from
Apr 5, 2024
Merged

Adds SAML SP #3

merged 6 commits into from
Apr 5, 2024

Conversation

JPrevost
Copy link
Member

@JPrevost JPrevost commented Jan 24, 2024

Why are these changes being introduced:

  • A SAML SP is method we use to interact with MIT Touchstone authentication

Relevant ticket(s):

How does this address that need:

  • Borrowing heavily from our previous application bd_auth, this implements a SAML SP via python3-saml
  • a new make command make app-bash was created to streamline local development of this application while avoiding a problematic dependency
  • To test these changes locally
    • should create an SP cert/key pair and extract them to single lines with no spaces and set in your local .env file (see README for the env values to set these to)
    • start the local IdP (see README)
    • start a local shell for the app with make app-bash
    • start the local dev server in the shell from the previous state with make run-dev
    • access file that should allow for redirects such as http://localhost:5000/?cdn_resource=https://cdn.libraries.mit.edu/fakefile. You should be prompted to authenticate (credentials in README) and then redirect to a "file not found" page for the CDN. You cannot access actually restricted files in the CDN when running locally like this as we pass JWT tokens as domain cookies which cannot be read. You could set a valid domain cookie by modifying /etc/hosts and running your local dev environment on something like totallyrealserver.libraries.mit.edu. You'd need to pull the appropriate JWT secret from SSM.
    • See the tests for other URL patterns you may want to try (like leaving out the ?cdn_resource query param or setting it to a value that we won't redirect to for security reasons, etc)

What is missing:

  • unit tests for SAML interactions. We should add unit tests for the auth.py module, but for now we need to manually test in our staging environment before moving code to production. As unit tests won't test against live Touchstone, this manual test will remain necessary even after we have automated tests in place.

Document any side effects to this change:

  • Development is now expected to be in docker as libxmlsec1 dependency is complicated (at best) to get installed reliably locally (the upstream libraries are slowly working through the bugs and hope to have fixes in April 2024 which should allow us to move back to non-dockerized development if preferred)
  • mypy linter was removed as we aren't using types in this application

Why are these changes being introduced:

* A SAML SP is method we use to interact with MIT Touchstone
  authentication

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/GDT-121

How does this address that need:

* Borrowing heavily from our previous application `bd_auth`, this
  implements a SAML SP via `python3-saml`
* a new make command `make app-bash` was created to streamline local
  development of this application while avoiding a problematic
  dependency

What is missing:

* unit tests. We need to set this app aside for a bit, but want to push
  forward with the Touchstone registration so we will eliminate that
  delay once we return to this. We should add unit tests to this work
  when we move forward with implementing the business logic.

Document any side effects to this change:

* Development is now expected to be in docker as libxmlsec1 dependency
  is complicated (at best) to get installed reliably locally
* mypy linter was removed as we aren't using types in this application
Why are these changes being introduced:

* Users directed to this application for authentication will be
  redirected to Touchstone and then be granted a JWT token and
  redirected back to the originally requested resource

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/GDT-122
* https://mitlibraries.atlassian.net/browse/GDT-101

How does this address that need:

* Accessing cdn auth with no cdn_resource parameter or a cdn_resource
  parameter that is not in one of our CDNs is rejected as invalid
* A user with an invalid, expired, or missing JWT token is redirected
  to Touchstone
* Valid users coming back from Touchstone will have a valid JWT token
  set as a domain cookie
* Users with a valid JWT token and valid cdn_resource are redirected to
  the resource (which is then intercepted by the cdn auth lambda to
  validate the token before sending them the requested file)
* A preconfigued local IdP can be run in a container for development
  purposes (only SP key/cert need to be generated manually and set in
  .env prior to running `make app-bash`)

Document any side effects to this change:

* Adding tests for the SAML reponse handling are still required, but
  tests exist for all of the core business logic. This has been opened
  as https://mitlibraries.atlassian.net/browse/GDT-250. For now, it is
  expected that we will manually confirm SAML functionality in our
  staging environment prior to moving changes into production.
Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think it looks good! Easy to follow and poke around. Also like the blueprints approach.

I left a few comments: a couple of format-y nit picks, some notes on getting it up and running locally for me, and a couple suggestsions of docstrings. But all optional, nothing blocking.

I see in the PR comments that typing is not applied to this repository. If we did want to go back and add typing, that could also be a good time to drop in a couple docstrings.

- Run tests: `make test`
- Run linters: `make lint`
- Run dev server: `make run-dev`
- access localhost:5000 and localhost:5000/ping
- (optional) run prod-mode server locally: `make run-prod`
- access localhost:8000 and localhost:8000/ping

## Required ENV
Copy link

@ghukill ghukill Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit that this markdown renders a little funky (github rendered form). Took me awhile to build all the correct env vars as the newlines appear to be dropped. But seeing it here in the code view, I see the intent and how it would have been easier to discern.

That said, had all the info I needed to get it up and running.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With new render, easy to parse!

- pipenv in your python 3.12 install: `pip install pipenv`
- note: failing to do this may not result in a noticeable problem, but this is still good practice as it will ensure you are using a `pipenv` within the correct version of python instead of one from your default version. Do this every time you install a new python version.
- note: a python virtual environment will be created via `pipenv` when following the next steps if one does not yet exist. If you run into issues at any point, you can remove that virtualenv with `pipenv --rm` from the project root and start over.
This application expects to be developed in docker. [There are dependency issues with libxmlxec1 on macos](https://github.com/SAML-Toolkits/python3-saml/issues/356) at this time and rather than each of us fighting with that, we will develop inside the docker container so it can be solved programatically and repeatably.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I read this and the make app-bash approach, I struggled to get things installed as well. Ultimately I did get it installed, where this github issue workaround worked well for me.

Related, I also needed to pip install setuptools, as it sounds like disutils does not ship with python 3.12+.

With those two things done, make install worked as expected, and I'm able to run make run-dev and fire up the flask app locally (outside of a docker container).

All that said, I think the docker approach also works well! Nice to have options for developer's choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad (sort of) you were able to get it running locally in addition to trying out the docker approach. The upstream requirement issues are starting to resolve themselves and this should be much more easily installable locally soon(TM).

README.md Outdated
Comment on lines 94 to 96
`SP_CERT` = obtained from self signed cert generated for this app. Note: remove all spaces/linebreaks as well as the "BEGIN" and "END" lines from file for ENV setting.
`SP_ENTITY_ID` = domain name of app + /saml
`SP_KEY` = obtained from self signed key generated for this app
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that SP_CERT and SP_KEY were multiline after I ran through the -- very helpful -- instructions in page linked above. I realized I wasn't immediately sure how to set a multiline env var in a .env file. Probably better ways to do it, but I popped them into python as a multiline string, and then could access the string variable which included the \n characters, which allowed me to set them like:

SP_CERT="-----BEGIN CERTIFICATE-----\nabc123...\ndef456...\n-----END CERTIFICATE-----"
SP_KEY="-----BEGIN PRIVATE KEY-----\nabc123...\ndef456...\n-----END PRIVATE KEY-----"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this should likely be clarified better in the docs... I just manually remove the line breaks to create a single line string. I feel like a make stringify_cert_or_key command that can spit out what works from an input key or cert would be useful. I've just been too lazy to write one.

Comment on lines +9 to +12
@bp.route("/")
@login_required
def item():
return render_template("debug.html")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dig it! This is really handy to have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an interesting side effect the way this works. The cookie setting only happens in the cdn module instead of the auth module so while we can debug the session like this, we can't actually debug the cookies because they don't get set. I think eventually we should see if we can move the cookie setting logic into auth. That added enough complexity to not include that refactor in this initial version because you set cookies as part of the response which is not the domain if auth so I left it alone for now.

return json_settings


def prepare_flask_request(request):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feeling like maybe a docstring here would be helpful. I had to poke around a bit to understand that the variable req which is set from the output of this function (around line 76), is a dictionary in the expected form that the python3-saml class OneLogin_Saml2_Auth expects for init.

Otherwise, I wondered a bit why we'd collect things from the request object that are readily available.

cdnauth/auth.py Outdated
Comment on lines 101 to 102
print("Errors: %s", errors)
print("Last error reason: %s", auth.get_last_error_reason())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if it's something we want to tackle now, but we might want to consider moving these to logging statements. Perhaps the overhead here is setting up logging globally for the full Flask app.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh god I have no idea why these are print statements. I suspect I was debugging and forgot to clean up after I got it all working. Oops :)

cdnauth/cdn.py Outdated
Comment on lines 48 to 54
# valid_redirect(url) ensures the redirect is to a domain we trust to prevent
# our app being used to trick users into clicking on malicious links
def valid_redirect(url):
domain = urlparse(url).netloc
if domain in valid_domains():
return True
return False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: these could be """....""" docstrings for the function. Same below for valid_domains().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good input. I just read the PEP on that and agree it makes sense to move this into the method as a docstring . It still feels backwards to document inside the method instead of before the method as Python is not how I think :)

cdnauth/cdn.py Outdated
Comment on lines 57 to 63
# valid_domains() is a list of domains we trust for redirect purposes
def valid_domains():
return [
"cdn.libraries.mit.edu",
"cdn.dev1.mitlibrary.net",
"cdn.stage.mitlibrary.net",
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might propose setting this as a constant at the top of the file, something along the lines of:

VALID_DOMAINS = [...]

Or, even as a property on the Config class (probably only if accessed from multiple files/modules).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like moving it to Config so we could set it but allow overriding via ENV. It's actually convenient in development to allow redirects to example.com (or similar) to allow focusing on business logic and not actually invoking a CDN call.

app.config.from_object("cdnauth.config.TestingConfig")
else:
app.config.from_object("cdnauth.config.Config")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting! Never seen app.config.from_object(...). But I dig it. Seems like it streamlines updating an object with worrying about getter/setter considerations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I copied that from our old bd_auth app (which I also wrote but don't remember doing) and I suspect strongly that it copied it from ebooks https://github.com/MITLibraries/ebooks/blob/main/ebooks/__init__.py#L18-L23. It's convenient that the various configs inherit from the base one so you can read everything from env by default but in testing just hard code the values you want to be consistent in a fairly clean way. You only have to set the differences from the base is another way to say that I guess.

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="5; url={{ cdn_resource }}" />
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a silly question, but does this <meta> tag force a refresh of the page, with that CDN link as the URL, thereby prompting a download? If so, cool!

Also, for people like me unfamiliar with this behavior, might be nice to have a small HTML comment above this that notes this line should force a page refresh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good input. I think everyone in EngX would immediately understand what this does because we live on the HTML side so much more than DataEng. I'll see if I can find a way to better document that (possibly with a comment to link to external docs that explain it clearly). For now it has a 5 second delay before the refresh... once we launch we'll likely set this to an immediate redirect

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. And, while unsure if necessary in code, thanks for the context about this template page serving as a way to navigate away from the Touchstone page; makes sense.

Why are these changes being introduced:

* Code Review feedback

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/GDT-122

How does this address that need:

* Adds docstrings to complex methods
* Adds useful logging, including some errors
* Fixes formating of README
* Moves list of `VALID_DOMAINS` to an ENV overridable config option

Document any side effects to this change:

* There are two known states we don't handle well still. They are noted
  by `TODO`s in the code (not a great practice) and will receive
  a Jira ticket to consider how best to handle these possible but
  unlikely states.
@JPrevost JPrevost requested a review from ghukill April 4, 2024 20:33
Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Thanks for the comments and docstrings, I think helpful for a quick footing jumping in.

My only question -- and non blocking -- was around the /saml route and whether or not pass + default return of None are sufficient. But given testing, most likely is!

- Run tests: `make test`
- Run linters: `make lint`
- Run dev server: `make run-dev`
- access localhost:5000 and localhost:5000/ping
- (optional) run prod-mode server locally: `make run-prod`
- access localhost:8000 and localhost:8000/ping

## Required ENV
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With new render, easy to parse!

Comment on lines +93 to +97
- `SP_CERT` = obtained from self signed cert generated for this app.
- Note: remove all spaces/linebreaks as well as the "BEGIN" and "END" lines from file for ENV setting.
- `SP_ENTITY_ID` = domain name of app + /saml
- `SP_KEY` = obtained from self signed key generated for this app
- Note: remove all spaces/linebreaks as well as the "BEGIN" and "END" lines from file for ENV setting.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these notes.



def prepare_flask_request(request):
"""Return a dictionary in a format that OneLogin_Saml2_Auth uses during init"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to docstring

Comment on lines +83 to +94
"""This route handles two important situations.

The first, "sso" path, redirects a user to the IdP and sets a value of where to redirect
the user to after they come back to this application. "next_page" is the resource in our
CDN they originally requested but were redirected to this auth app because they were not
authenticated. Keeping track of what they wanted is essential to the functionality of
this solution.

The second, "acs" path is handles the response back from the IDP. We should never encounter
the path with "acs" in which a user is not authenticated, as they would not be redirected
back from the IDP to us and not be authenticated.
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super helpful, thanks for adding.

# That said, this should never happen.
# TODO: return a useful error to the user
logging.error("User returned from IdP with no authentication")
logging.error(f"Errors: {errors}")
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this on the first review (and same for IdP response parsing error below): are we okay with a pass for these conditionals, where this route will then effectively return None by default?

I'm assuming a human wouldn't interact with this route, just the authentication handshaking... I think if a Flask route returns None, the actual HTTP response is a 204 - no content response. Assuming that's okay for whatever is interacting with this route?

If not, we could put a fall through return at the end of the route function with a more explicit response; though unsure if that's helpful or if we know what that response should be (e.g. 500 error + explanation for why).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I (sort of) noted that concern myself in the latest commit message. I'll open a ticket before merging this work and link it back to this discussion thread to address this. Your suggestion to maybe fallback to some sort of error page if the more specific expected responses aren't found makes a ton of sense. I stared at it for a bit and wasn't sure how I wanted to handle it other than wanting to handle it better, so I appreciate that option being shared.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +23 to +26
VALID_DOMAINS = os.getenv(
"VALID_DOMAINS",
default="cdn.libraries.mit.edu,cdn.dev1.mitlibrary.net,cdn.stage.mitlibrary.net",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - seeing the associated README docs for this env var as well.

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="5; url={{ cdn_resource }}" />
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. And, while unsure if necessary in code, thanks for the context about this template page serving as a way to navigate away from the Touchstone page; makes sense.

@JPrevost JPrevost merged commit 764c831 into main Apr 5, 2024
1 check passed
@JPrevost JPrevost deleted the GDT-121-saml-sp branch April 5, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants