Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: LibraryCollectionKey and LibraryCollectionLocator created [FC-0062] #335

Merged

Conversation

ChrisChV
Copy link
Contributor

@ChrisChV ChrisChV commented Aug 27, 2024

Description

Creates LibCollectionKey and LibCollectionLocator

Supporting information

Testing instructions

  • Check all the code
  • Verify that the test covers the changes

@openedx-webhooks
Copy link

Thanks for the pull request, @ChrisChV!

What's next?

Please work through the following steps to get your changes ready for engineering review:

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

🔘 Let us know that your PR is ready for review:

Who will review my changes?

This repository is currently maintained by @openedx/axim-engineering. Tag them in a comment and let them know that your changes are ready for review.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Aug 27, 2024
@ChrisChV ChrisChV changed the title feat: CollectionKey and CollectionLocator created feat: CollectionKey and CollectionLocator created [FC-0062] Aug 27, 2024
Copy link

codecov bot commented Aug 27, 2024

Codecov Report

Attention: Patch coverage is 94.73684% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.98%. Comparing base (80ddb73) to head (780e4b2).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
opaque_keys/edx/keys.py 75.00% 1 Missing and 1 partial ⚠️
opaque_keys/edx/locator.py 91.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #335      +/-   ##
==========================================
+ Coverage   93.96%   93.98%   +0.01%     
==========================================
  Files          29       30       +1     
  Lines        2850     2924      +74     
  Branches      640      649       +9     
==========================================
+ Hits         2678     2748      +70     
- Misses        145      148       +3     
- Partials       27       28       +1     
Flag Coverage Δ
unittests 93.98% <94.73%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -1620,3 +1621,55 @@ def html_id(self) -> str:
# HTML5 allows ID values to contain any characters at all other than spaces.
# These key types don't allow spaces either, so no transform is needed.
return str(self)


class CollectionLocator(CheckFieldMixin, CollectionKey):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is specific to libraries, it should be called LibCollectionLocator

@@ -89,6 +89,36 @@ def make_asset_key(self, asset_type: str, path: str) -> AssetKey: # pragma: no
raise NotImplementedError()


class CollectionKey(LearningContextKey):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is specific to libraries, it should be called LibCollectionKey

@bradenmacdonald
Copy link
Contributor

@ormsbee I remember you had some concerns about introducing a new opaque key. Can you let us know your thoughts?

The tagging API currently assumes that anything we apply tags to has an opaque key, so in order to allow tagging of Collections, we need to either implement this key or remove that assumption.

@ormsbee
Copy link
Contributor

ormsbee commented Aug 27, 2024

@ormsbee I remember you had some concerns about introducing a new opaque key. Can you let us know your thoughts?

@bradenmacdonald: This is the Slack thread where I asked the question about whether we really needed them (you, @kdmccormick, and @yusuf-musleh commented).

It boils down to the fact that OpaqueKey machinery is weird and complex for what it does (even explaining to someone why there's a CollectionKey and a CollectionLocator is weird), so do we really need it for this feature?

Also, it's plausible that we'd want to change a library's key at some point, which is more feasible to do now since we're not storing as many references to it. Keys in openedx_learning app models are almost all relative to the LearningPackage you're in. The only top level one that's stored is the library key on the LearningPackage itself (and I'm not sure even that's desirable, since we store it again in edx-platform side).

The tagging API currently assumes that anything we apply tags to has an opaque key, so in order to allow tagging of Collections, we need to either implement this key or remove that assumption.

I was under the impression that any sort of identifier would do, and you mention in the thread that UUIDs would work as well. Is it important that they be humanly readable? If it's just backend machinery at the moment, I'd prefer to use UUIDs for now. Or even the primary key of the thing being tagged if it's easy–since that would be the most compact representation. I know some other tagging libraries actually make a model-per-thing-being-tagged.

In the longer term, I think that tagging functionality will be simpler and easier to adopt into other systems if it does not force a requirement for OpaqueKeys (e.g. forums posts/comments, catalog information, students, etc.). I would expect that openedx_tagging eventually moves into its own, top level repo.

All that being said, I don't know how much work any of this is. If making it more general threatens the Sumac timeline, I'm okay with rolling forward with a CollectionKey for now.

FYI @kdmccormick

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Aug 27, 2024

@ormsbee We can of course use UUIDs for the things that are being tagged. I was thinking that if we use primary keys, or perhaps even if we want to use UUIDs, we'd want some kind of namespacing so we have an idea of what sort of thing is being tagged and/or can avoid PK collisions. And if we did that ("lib-collection:23489" as an example), then whatever idiosyncratic namespacing we implement is not that different from opaque keys. So it would be weird to implement a second type of string key namespacing, when we already have a library for it (although the key vs. locator thing is weird, and it would be nice to get rid of it).

However, I guess if we're strictly using UUIDs, then we don't technically need namespacing at all. It may be hard to debug what thing is actually tagged though, since we'll just see a random UUID and have no idea what type of thing it is. Thoughts?

@ormsbee
Copy link
Contributor

ormsbee commented Aug 27, 2024

Oh right, I'm forgetting that there's a single global tagging table and that we need to namespace. Would it be terrible if we stored UUIDs and had a 1:1 join table that matched those rows with PublishableEntity? Or two tables, one to match against Published and the other Draft? (Sorry, I don't remember how we're distinguishing between those two in the tagging.)

@bradenmacdonald
Copy link
Contributor

Here is the current ID field; it's just a string ID. Tagging is not part of the publishing cycle so doesn't distinguish between published and draft things.

We want to associate that ObjectTag model with some Collection instances, which are themselves not PublishableEntities. So I think we just need some way of putting the Collection's primary key into the ObjectTag.object_id string field. We could also add an additional generic foreign key to ObjectTag but that's getting complicated. Since Collection is also in the same overall package as tagging, we could also create a join table specifically for Collection-ObjectTag, but I'm not sure what benefit that would offer.

@ormsbee
Copy link
Contributor

ormsbee commented Aug 27, 2024

@bradenmacdonald: Please pardon the silly questions stemming from my ignorance of the data model, and my brain being stuck on components and not collections.

If the same tag from the same taxonomy applies to three totally different types of things (e.g. a component, a collection, and a forum comment), does the data model have any way to differentiate them at the database layer? Or does it rely purely on the object_id (and consequently OpaqueKeys) to do the dispatch between them?

Edit: My sense is "no, there isn't", but while I don't think it's good to roll our own opaque-keys-lite and embed that knowledge into the object_id itself, I do think it makes sense for the data model to know there are different types of things being tagged at the database layer and to namespace them with a different column that's a foreign keys to the types of things that exist. Otherwise, it seems like basic filtering by type would be way harder than it should be?

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Aug 27, 2024

@ormsbee

Or does it rely purely on the object_id (and consequently OpaqueKeys) to do the dispatch between them?

That.

it seems like basic filtering by type would be way harder than it should be?

We haven't had any use case for basic filtering by type, although it could currently be done as a string prefix search on object_id (an indexed column), which means it's already extremely efficient even without a dedicated column.

We could add another column that's a foreign key to the contenttype of the thing that's being tagged, certainly - but we'd have to carefully review the code in various places which assumes object_id is the entire reference to the thing being tagged.

@ChrisChV ChrisChV force-pushed the chris/FAL-3785-collections-key branch from a16a3ee to 98ceab5 Compare August 27, 2024 20:29
@ormsbee
Copy link
Contributor

ormsbee commented Aug 27, 2024

Like I said before, I don't want to jeopardize the timeline over this, and I'm not going to block the merging of new key types. As a general rule, I would like more of our data model to be explicitly structured at the database layer, and not rely on Python code to introspect things (or having to know that key type X always starts with one of these two strings). I realize that to a certain extent we have to do that, because so much of our content is still in the ModuleStore and serialized to MongoDB, but I don't want it to be our long term direction.

@ChrisChV ChrisChV changed the title feat: CollectionKey and CollectionLocator created [FC-0062] feat: LibCollectionKey and LibCollectionLocator created [FC-0062] Aug 28, 2024
@mphilbrick211 mphilbrick211 added the FC Relates to an Axim Funded Contribution project label Aug 28, 2024
@rpenido
Copy link

rpenido commented Aug 28, 2024

We haven't had any use case for basic filtering by type, although it could currently be done as a string prefix search on object_id (an indexed column), which means it's already extremely efficient even without a dedicated column.

Also, having some kind of hierarchy (which the OpaqueKey has) could be handy. If we nuke a Library with all the Components and Collections, we can use the object_id to clear all tags without iterating the library's children. It also makes it easier to find orphaned items.

Copy link

@rpenido rpenido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
Thank you for your work, @ChrisChV!

I only have a little to add here as I don't have much context with the OpaqueKeys.

CHECKED_INIT = False

# Allow usage IDs to contian unicode characters
USAGE_ID_REGEXP = re.compile(r'^[\w\-.]+$', flags=re.UNICODE)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the collection id (a number), but I'm unsure if we want to make it stricter here.

Maybe for clarity? What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, it's the ID? As in the primary key? We can't use that as part of the key if there's ever an intention to make it importable into a different instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we don't have any other identifier for collections; we'll need to add either a UUID or a slug (or both). If a slug, it only needs to be unique within a library. I prefer slugs for readability. For now the slug can be auto-generated by slugifying the collection name and appending a number until it's unique.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think a slug is fine, as long as we write it to the database–say, a key field on Collection with a unique constraint on the combo of that and the LearningPackage (like how PublishableEntities work today).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, let's do that ^ @rpenido . Sorry for not realizing the need for a slug sooner. CC @pomegranited

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. Hopefully these are easy to resolve. Other than these and Dave's point about having a slug, looks great.

@@ -89,6 +89,36 @@ def make_asset_key(self, asset_type: str, path: str) -> AssetKey: # pragma: no
raise NotImplementedError()


class LibCollectionKey(LearningContextKey):
Copy link
Member

@kdmccormick kdmccormick Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class LibCollectionKey(LearningContextKey):
class LibraryCollectionKey(OpaqueKey):
  1. My sense is that a collection is not a learning context. More concretely, I can't imagine us wanting to store learner state against a collection, the way that we store learner state against a CourseKey or (for previewing purposes) a LibraryKey. Would it cause any issue to simply subclass OpaqueKey directly?
  2. For the sake of the API, let's be consistent between "Lib" and "Library". We have lib: in the serialized keys, so I like that you to serialize with lib-collection:. But, in the code, we spell out LibraryLocatorV2, so I'd prefer if you spelled out LibraryCollectionKey and LibraryCollectionLocator as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that a collection is not a learning context and it should not subclass LearningContextKey. However, it can have a library_key field if that makes sense, since it does related to a library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald Sure, having a library_key field makes sense.

@ChrisChV Is there any reason we need both a field LibraryCollectionLocator.lib_key and an identical property LibraryCollectionLocator.library_key ? Could there just be a field named LibraryCollectionLocator.library_key ? Or does that play poorly with inheritance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I combined both fields here: ce74081

Comment on lines 114 to 119
@property
def context_key(self) -> LearningContextKey: # pragma: no cover
"""
Get the learning context key (LearningContextKey) for this XBlock usage.
"""
raise NotImplementedError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me that a library is the "collection" of the context--I think that's mixing metaphors. Since we're making the assumption that a collection always belongs to a library, I would just be concrete and use the lib_key field that's already defined on your locator below. And, going along with my other request, I would spell it out as library_key.

Suggested change
@property
def context_key(self) -> LearningContextKey: # pragma: no cover
"""
Get the learning context key (LearningContextKey) for this XBlock usage.
"""
raise NotImplementedError()
@property
def library_key(self) -> LibraryLocatorV2: # pragma: no cover
"""
Get the key of the library that this collection belongs to.
"""
raise NotImplementedError()

@ChrisChV ChrisChV changed the title feat: LibCollectionKey and LibCollectionLocator created [FC-0062] feat: LibraryCollectionKey and LibraryCollectionLocator created [FC-0062] Aug 30, 2024
@ChrisChV
Copy link
Contributor Author

@kdmccormick Thanks for the review! I updated the PR with your suggestions

opaque_keys/edx/keys.py Outdated Show resolved Hide resolved
Copy link

@pomegranited pomegranited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisChV Could you do a version bump and add a changelog entry?
Otherwise this looks great 👍

opaque_keys/edx/locator.py Outdated Show resolved Hide resolved
opaque_keys/edx/locator.py Outdated Show resolved Hide resolved
opaque_keys/edx/locator.py Outdated Show resolved Hide resolved
@ChrisChV ChrisChV force-pushed the chris/FAL-3785-collections-key branch from 168c373 to f2a904f Compare August 31, 2024 04:57
@bradenmacdonald
Copy link
Contributor

I think we just need to address @kdmccormick 's question about combining LibraryCollectionLocator.lib_key and LibraryCollectionLocator.library_key, and then this is good to go from my perspective.

Please also include a version bump so it'll create a new package version.

@ChrisChV
Copy link
Contributor Author

ChrisChV commented Sep 3, 2024

Please also include a version bump so it'll create a new package version.

@bradenmacdonald Done!

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change request, but LGTM otherwise!

opaque_keys/edx/locator.py Outdated Show resolved Hide resolved
opaque_keys/edx/locator.py Outdated Show resolved Hide resolved
@ChrisChV
Copy link
Contributor Author

ChrisChV commented Sep 4, 2024

@bradenmacdonald It's ready for merge

@bradenmacdonald bradenmacdonald merged commit ffa2b90 into openedx:master Sep 4, 2024
13 checks passed
@openedx-webhooks
Copy link

@ChrisChV 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@bradenmacdonald bradenmacdonald deleted the chris/FAL-3785-collections-key branch September 4, 2024 17:06
@ChrisChV
Copy link
Contributor Author

ChrisChV commented Sep 4, 2024

@bradenmacdonald I see the tag for this version, but the pip package is not updated

@bradenmacdonald
Copy link
Contributor

@ChrisChV It seems like I needed to publish a "Release". I've done that now so you should see it on PyPI soon.

@pomegranited
Copy link

Hi @kdmccormick Sorry for pinging you directly, but we're having trouble bumping this requirement in edx-platform, cf openedx/edx-platform#35383. It doesn't seem to be actually constrained in anyone else's .in files that we can find, but it is in their .txt files.

What's the procedure for updating these fundamental packages?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants