Updating the database with the new feature names as they become available. #19

Open
golnazads opened this issue Nov 4, 2024 · 3 comments

Comments

@golnazads

golnazads commented Nov 4, 2024

As new feature names become available, users will extract USGS terms and execute a pipeline command to check for any names that need to be added to the database. The CSV file used in this process must contain the following six columns: Feature_ID, Clean_Feature_Name, Target, Feature_Type, Approval_Date, and Approval_Status.
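A minimal sketch of how the six-column requirement could be checked before the pipeline runs. The column names come from the description above; the function name `missing_columns` and the sample row are hypothetical and are not taken from the pipeline's code.

```python
import csv
import io

# Column names as listed in the issue text.
REQUIRED_COLUMNS = {
    "Feature_ID", "Clean_Feature_Name", "Target",
    "Feature_Type", "Approval_Date", "Approval_Status",
}

def missing_columns(csv_file):
    """Return the sorted list of required columns absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)

# Hypothetical sample that lacks the Approval_Status column.
sample = io.StringIO(
    "Feature_ID,Clean_Feature_Name,Target,Feature_Type,Approval_Date\n"
    '1,Herschel,Mimas,"Crater, craters",\n'
)
print(missing_columns(sample))  # ['Approval_Status']
```

Failing early on a missing column avoids a partial import when the USGS export format changes.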

1- The pipeline reads this file and considers only rows where Approval_Status is "Approved". It then compares each Feature_ID with existing entities in the database to identify which rows need to be added.

2- When adding a feature name, the pipeline first checks for any new targets associated with these names and inserts those targets if they don’t already exist. Next, it verifies if the feature type is present for the specified target, adding it if it’s missing. The pipeline also updates a separate table containing unique feature names without links to targets or feature types. If a feature name has not been used before, it is inserted into this table; if it has been used for another celestial body, it will not be added again.

3- Additional checks are in place for multi-word and ambiguous feature names. Specifically, the pipeline manages cases where one word in a multi-word name may represent another feature name (e.g., “C Herschel,” “C Herschel C”). It also identifies cases where a feature name is used for multiple celestial bodies (e.g., “Herschel Crater” on Mars, the Moon, and Mimas). The pipeline will update the database for these names as needed. However, if a name has other contextual uses (e.g., “Herschel” also refers to an asteroid), the pipeline will generate a log message. In this case, a power user will need to review and identify any additional contexts. The pipeline includes a list of new feature names for the power user to check. Once any new context is identified, the developer should be informed so they can add it to the database.

4- Update the repo with the latest USGS feature names.
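Steps 1 to 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual implementation: plain Python sets stand in for the database tables, all function names are hypothetical, and the sample IDs are placeholders rather than USGS data.

```python
def select_new_rows(rows, existing_ids):
    """Step 1: keep Approved rows whose Feature_ID is not yet in the database."""
    return [
        row for row in rows
        if row["Approval_Status"].strip() == "Approved"
        and row["Feature_ID"] not in existing_ids
    ]

def contains_other_name(name, known_names):
    """Step 3: True if a word of a multi-word name is itself a known feature name."""
    words = name.split()
    return len(words) > 1 and any(word in known_names for word in words)

def apply_rows(new_rows, targets, feature_types, feature_names):
    """Step 2: add targets, (target, feature type) pairs, and unique names.
    Sets make each insert a no-op when the value already exists."""
    for row in new_rows:
        targets.add(row["Target"])
        feature_types.add((row["Target"], row["Feature_Type"]))
        feature_names.add(row["Clean_Feature_Name"])

# Hypothetical sample rows; IDs are placeholders.
rows = [
    {"Feature_ID": "1", "Clean_Feature_Name": "Herschel", "Target": "Mimas",
     "Feature_Type": "Crater, craters", "Approval_Status": "Approved"},
    {"Feature_ID": "2", "Clean_Feature_Name": "C Herschel", "Target": "Moon",
     "Feature_Type": "Crater, craters", "Approval_Status": "Approved"},
    {"Feature_ID": "3", "Clean_Feature_Name": "Example", "Target": "Mars",
     "Feature_Type": "Crater, craters", "Approval_Status": "Not Approved"},
]
existing_ids = {"1"}
targets = {"Mimas"}
feature_types = {("Mimas", "Crater, craters")}
feature_names = {"Herschel"}

new_rows = select_new_rows(rows, existing_ids)
# Flag ambiguous multi-word names before the new names are merged in.
flagged = [r["Clean_Feature_Name"] for r in new_rows
           if contains_other_name(r["Clean_Feature_Name"], feature_names)]
apply_rows(new_rows, targets, feature_types, feature_names)

print([r["Feature_ID"] for r in new_rows])  # ['2']
print(flagged)                              # ['C Herschel']
print(sorted(targets))                      # ['Mimas', 'Moon']
```

Names with other contextual uses (such as an asteroid called "Herschel") cannot be detected from the CSV alone, which is why that case is left to a log message and manual review.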

@golnazads

Steps 1 and 2 completed.

@golnazads

Steps 3 and 4 completed.

@golnazads

While importing the latest USGS feature names, extracted on November 7, 2024, I noticed that the approval status column includes Dropped in addition to Approved and Not Approved. As of that date, 199 feature names were marked as dropped. Their approval dates range from 1958 to 2015, with three entries missing a date. There is no information about when these feature names were dropped, so it is unclear how long they were included. The top four feature types among the dropped names, with the number of feature names in parentheses, are: Satellite Feature (50), Crater, craters (37), Patera, paterae (20), and Rima, rimae (10).
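A summary like the one above could be produced with a short script along these lines. This is a sketch, assuming the rows have already been read into dictionaries keyed by the CSV column names; the function name `summarize_dropped` and the sample rows are hypothetical and do not reproduce the actual USGS data.

```python
from collections import Counter

def summarize_dropped(rows):
    """Count Dropped rows per feature type and how many lack an approval date."""
    dropped = [r for r in rows if r["Approval_Status"].strip() == "Dropped"]
    by_type = Counter(r["Feature_Type"] for r in dropped)
    missing_dates = sum(1 for r in dropped if not r["Approval_Date"].strip())
    return by_type, missing_dates

# Hypothetical rows for illustration only.
rows = [
    {"Feature_Type": "Crater, craters", "Approval_Date": "1958", "Approval_Status": "Dropped"},
    {"Feature_Type": "Crater, craters", "Approval_Date": "", "Approval_Status": "Dropped"},
    {"Feature_Type": "Rima, rimae", "Approval_Date": "2015", "Approval_Status": "Dropped"},
    {"Feature_Type": "Rima, rimae", "Approval_Date": "1976", "Approval_Status": "Approved"},
]
by_type, missing_dates = summarize_dropped(rows)
print(by_type.most_common(1))  # [('Crater, craters', 2)]
print(missing_dates)           # 1
```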

My recommendation is to retain these entries in the database and treat them as historically active, since they were used at some point and may have historical relevance. I suggest adding a column to the feature name table to indicate that these features are currently dropped, and passing this information to Solr so that users are informed of their dropped status.

@aaccomazzi
