Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

G4.76 STEP 1 --Divides trait import value plugin instance into 3 separate instances #77

Merged
merged 50 commits into from
Jun 28, 2024

Conversation

carolyncaron
Copy link
Contributor

@carolyncaron carolyncaron commented May 13, 2024

Issue #76

Motivation

After spending time thinking about/discussing how to organize plugin instances for validating the phenotype share importer, we realized that the plugin instance to validate columns in the trait importer could be split up into 3 separate instances:

  1. Empty values
  2. That the combination of Trait Name + Method Short Name + Unit is not duplicated within the datafile or the database
  3. The column "Type" contains either the value "Qualitative" or "Quantitative"

See issue #53 for the original design of the current validators

What does this PR do?

This PR creates 3 new validators:

  1. EmptyCell - Given values from a single row in an input file, and a list of indices to check for, this validator reports pass if the values at the specified indices are not empty, and status fail if any are.
  2. ValueInList - Given values from a single row in an input file, a list of indices to check for, and a list of valid input values, this validator reports pass if the values at the specified indices contain any one of the specified valid values. Otherwise, a status of fail is returned.
  3. DuplicateTraits - Given values from a single row in an input file, and an array mapping the index for each of: trait name, method name and unit. Checks for duplicates at both the file level and database level, returns pass if no duplicates found, fail if the combination of trait + method + unit is found.

Additionally, 2 new methods were added to the Validator Base class:

  1. validateRow($row_values, $context): The 3 validators above implement this method instead of validate(), in order to pass in the values of the row and any relevant context to that row (ex: indices within the row that the validator should act on).
  2. checkIndices($row_values, $indices): Given values from a single row in an input file, and a list of indices to check for, throws an error if the list of indices does not make sense in the following contexts:
    • If no indices are provided
    • If the number of indices surpasses the number of values in $row_values
    • If any of the keys in indices is outside of the range of keys for $row_values

IMPORTANT NOTE: This PR does NOT remove the class TraitImportValues nor does it change the Trait Importer to call these new validators instead of the TraitImportValues class. This will be addressed in a separate PR. The purpose was to flesh out these validators first, and test them independently before moving on to having them utilized by an importer. Some tests were marked as skipped which will also be addressed in the next PR.

Testing

Automated Testing

  • All validator tests are now located in Kernel/Validators
  • New test classes: ValidatorBaseTest, ValidatorTraitImporterTest and ValidatorFileRowScopeTest

ValidatorBaseTest->testValidatorBaseCheckIndices

  • Provide an empty list of indices
  • Provide too many indices for the number of row values
  • Provide invalid indices (out of bounds for the indices in row_values array)

ValidatorFileRowScopeTest->testValidatorValueInList

  • Check for valid value in a single column (1 index)
  • Check for valid value in multiple columns (3 indices)
  • Check for column with invalid value (1 index)
  • Check for 1 column with a valid value, 1 with invalid (2 indices)

ValidatorFileRowScopeTest->testValidatorEmptyCell

  • Provide a list of indices for cells that are not empty
  • Provide a list of indices for cells that are empty
  • Provide a list of indices for the entire row (mixture of empty and non-empty cells)

ValidatorTraitImporterTest->testValidatorDuplicateTraitsFile

  • Enter a single row and check it is in the $unique_traits array
  • Re-enter the previous row (duplicate)
  • Provide a key in $context['indices'] that is not one of: trait, method, unit
  • Enter a 2nd unique row
  • Enter a third row with same trait + method as row 1, same unit as row 2 (not a duplicate)

ValidatorTraitImporterTest->testValidatorDuplicateTraitsDatabase

  • Validate a single row and check against an empty database
  • Enter a trait into the database and then try to validate the same trait details
  • Validate trait details where trait name and method name already exist in the database, but unit is unique

Manual Testing

  1. Build a docker image + container off of this branch.
  2. Go to localhost (or whichever port you're using for this container) and sign in
  3. Add a new organism by navigating to Tripal > Content > Add Tripal Content and select Organism from the list
  4. Configure the module by going to Tripal > Extensions > Tripal Cultivate Phenotypes Configuration
  5. Open Execute PHP by going to Devel > Execute PHP. You may need to add it as an option by going to Configure and checking PHP Execute in the list first.

The following are templates for testing each validator individually. Play around with different values in $file_row, or $context['indices']. You can also try to trigger an exception by the checkIndices() method!

EmptyCell

// Set the genus variable to a genus you've already created
$genus = 'Tripalus';

$traits_service = \Drupal::service('trpcultivate_phenotypes.traits');
$traits_service->setTraitGenus($genus);

// Create the plugin manager
$plugin_manager = \Drupal::service('plugin.manager.trpcultivate_validator');

// An array that contains the contents of a row in a file for the Trait Importer
$file_row = [
  'My trait',
  '',
  'My method',
  '',
  'My unit',
  ''
];

$validator_id = 'trpcultivate_phenotypes_validator_empty_cell';
$instance = $plugin_manager->createInstance($validator_id);

// We need a context array to specify which indices are NOT allowed to be empty
$context['indices'] = [ 0, 2, 4 ];

// Validate our file_row. Expected validation status = pass
$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

// Set our required columns to be ones that are empty. Expected status = fail
$context['indices'] = [ 1, 3, 5 ];

$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

ValueInList

// Set the genus variable to a genus you've already created
$genus = 'Tripalus';

$traits_service = \Drupal::service('trpcultivate_phenotypes.traits');
$traits_service->setTraitGenus($genus);

// Create the plugin manager
$plugin_manager = \Drupal::service('plugin.manager.trpcultivate_validator');

// An array that contains the contents of a row in a file for the Trait Importer
$file_row = [
  'My trait',
  'My trait description',
  'My method',
  'My method description',
  'My unit',
  'Quantitative'
];

// A list of valid inputs
$valid_inputs = ['Quantitative', 'Qualitative'];

$validator_id = 'trpcultivate_phenotypes_validator_value_in_list';
$instance = $plugin_manager->createInstance($validator_id);

// We need a context array to specify which indices to check for valid input
$context['indices'] = [ 5 ];
// Set the context array of valid inputs
$context['valid_values'] = $valid_inputs;

// Validate our file_row. Expected validation status = pass
$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

// Try checking for valid inputs at other indices. Expected status = fail
$context['indices'] = [ 1, 3 ];
$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

DuplicateTraits: file level

// Set the genus variable to a genus you've already created
$genus = 'Tripalus';

$traits_service = \Drupal::service('trpcultivate_phenotypes.traits');
$traits_service->setTraitGenus($genus);

// Create the plugin manager
$plugin_manager = \Drupal::service('plugin.manager.trpcultivate_validator');

// An array that contains the contents of a row in a file for the Trait Importer
$file_row = [
  'My trait',
  'My trait description',
  'My method',
  'My method description',
  'My unit',
  'Quantitative'
];

// Test the DuplicateTraits validator at the file level
// Each call of validateRow() is treated like a new row within the same file, unless the plugin gets re-instantiated
$validator_id = 'trpcultivate_phenotypes_validator_duplicate_traits';
$instance = $plugin_manager->createInstance($validator_id);
// We need a context array to specify which indices are the name of the trait, method and unit
$context['indices'] = [ 'trait' => 0, 'method' => 2, 'unit' => 4 ];

// Validate our file_row. Expected validation status = pass
$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

// Re-validate the same row. Expected status = fail
$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

DuplicateTraits: database level

// Set the genus variable to a genus you've already created
$genus = 'Tripalus';

$traits_service = \Drupal::service('trpcultivate_phenotypes.traits');
$traits_service->setTraitGenus($genus);

// Create the plugin manager
$plugin_manager = \Drupal::service('plugin.manager.trpcultivate_validator');

// An array that contains the contents of a row in a file for the Trait Importer
// Keys are set to the column headers as the insertTrait() method expects an associative array
$file_row_insert = [
  'Trait Name' => 'My trait',
  'Trait Description' => 'My trait description',
  'Method Short Name' => 'My method',
  'Collection Method' => 'My method description',
  'Unit' => 'My unit',
  'Type' => 'Quantitative'
];

// Test the DuplicateTraits validator at the database level
$validator_id = 'trpcultivate_phenotypes_validator_duplicate_traits';
$instance = $plugin_manager->createInstance($validator_id);
// We need a context array to specify which indices are the name of the trait, method and unit
$context['indices'] = [ 'trait' => 0, 'method' => 2, 'unit' => 4 ];

// Insert the values of our row into the database
$combo_ids = $traits_service->insertTrait($file_row_insert);

// Now validate our row. Expected validation status = fail
$file_row = array_values($file_row_insert);
$validation_status = $instance->validateRow($file_row, $context);
print_r($validation_status);

@carolyncaron carolyncaron changed the title G4.76 divide trait import value G4.76 Divides trait import value plugin instance into 3 separate instances May 15, 2024
…e traits validator test for the database level
@carolyncaron carolyncaron marked this pull request as ready for review June 21, 2024 20:45
@reynoldtan
Copy link
Contributor

reynoldtan commented Jun 24, 2024

Manual Test:
First and foremost, impressive work @carolyncaron

I performed manual test outlined above and the method/logic works as described. Suggestions/Improvements are as follow:

  1. The validation error report references index number of the column header instead of the column header itself. By using the readable column header, saves the task of cross referencing of the index to column header.
    image

Information surrounding the file being uploaded such as the file id, genus, project and the column headers are readily available through the validator base loadAssets() method.

public function loadAssets($project, $genus, $file_id, $headers, $skip = 0) {

In this case, the column headers are available in this property

$this->column_headers[0] // is the Trait Name
$this->column_headers[1] // is the Trait Description
and so on.

There is also no reference what line number the error occurred, but I am assuming this will all be addressed in another PR.

  1. In empty row check, given this row with double spaces for trait name cell, the validator returns a pass status.
$file_row = [
  '  ',
  'My trait description',
  'My method',
  'My method description',
  'My unit',
  'Quantitative'
];

$context['indices'] = [ 0 ];
$validation_status = $instance->validateRow($file_row, $context);

Try the trim() function and trim the value for empty value check.
  1. When an empty row is provided, it triggers an error which is a good catch but the error is about indices more than the empty row.

image

  1. In value in list validator, the validator disregards the case of the valid value, so for example when I set the valid value context array to ['Quantitative'], any case of the word Quantitative (quantitative, QUANTITATIVE, Quantitative) will pass. I think the case is important since in the instruction for the Import file is

Type: One of "Qualitative" or "Quantitative".

Source code:
DuplicateTraits.php

  1. We can omit this line:
  2. This service should be D. Injected
    $service_traits = \Drupal::service('trpcultivate_phenotypes.traits');

I have example here:

$genus_config = $this->service_genus_ontology->getGenusOntologyConfigValues($this->genus);

EmptyCell.php

  1. File class can go
  2. The comment mentions of a unrelated variable $row
    * - indices => an array of indices corresponding to the cells in $row to act on

ValueInList.php

  1. File class can go
  2. This should be using the key valid_values instead of indeces

checkIndices()

  1. Kindly expand more on the @param indices, I am confused since sometime it can be one dimensional array and sometimes an associative array.

@carolyncaron
Copy link
Contributor Author

Thank you Reynold 😊 I have addressed your suggestions specific to the source code in commit ebede3a. Specifically:

  1. I removed the use of the file class in all 3 validators (good catch!) and I also found I was including it unnecessarily in my test files as well (in addition, my test files used TripalLogger which also wasn't needed).
  2. Switched to use drupal injection for the Traits service - thanks for your example, it made it easy!!
  3. Both doc blocs in EmptyCell and ValueInList referred to undeclared variable $row, I changed this instead to be $row_Values
  4. I didn't change the key of 'indices' at line 52 of ValueInList, but I tried to make the wording a bit more clear
  5. I also tried to clarify the param $indices for checkIndices(). I can see how it can get confusing, but I think it does make more sense here that checkIndices() expects a one dimensional array, versus the associative array expected by each of the validators, since checkIndices only ever cares about the indices. Since checkIndices() is only ever called by a validator directly, the indices array gets passed in using the param $context['indices'] given to the validator. Let me know if this doesn't make sense!

@carolyncaron
Copy link
Contributor Author

carolyncaron commented Jun 26, 2024

Here are my comments to @reynoldtan's suggestions based on his manual test:

  1. We have collectively decided to not address this in this PR, instead see Issue G4.82 Revisit the use of scope by the Validator API #82 for the start of our discussion regarding the future of theloadAssets() method.
  2. Addressed in commit ca00acc. Thanks for your suggestion! I also changed my unit test to include one of each in my file row: No whitespace, a single whitespace, and double whitespace.
  3. This one took me a bit of time to think about, but I actually don't think I need to make any changes here. Here are my reasons:
    • I could include a quick check for an empty row in the checkIndices() method, but I think this method should strictly only be checking the indices, it only needs the row values to evaluate whether the indices make sense in context
    • I actually think this case should be handled at the importer level - because then what is the point of calling the validators on it? For example, in the Traits Importer currently, an empty file row gets skipped and thus none of the validators will see it
    • If anyone feels strongly that there should be a check somewhere for empty row (I guess in case the developer forgets to check in an importer 😅 ), then my suggestion is to create a new method for it in ValidatorBase. Potentially there might be other checks we haven't thought about yet specifically for $row_values? I may have talked myself into this option TBH, any thoughts by either @reynoldtan or @laceysanderson ?
  4. Addressed in commit 9472dde . Now, the ValueinList validator will do a straight check for valid values that is case sensitive. If there's no match, it does a second check for any case insensitive matching, which will set a flag for the purpose of formatting the title in validation_status. This was tricky to do well with the knowledge that return values for the validators will soon be changing (see issue G4.86 Update Trait Importer formValidate to use new return value style. #86). Let me know if you have any alternative suggestions! I also changed my unit tests to be case sensitive, and added a new test that checks for a case insensitive match and that the validation_status indeed reflects that.

laceysanderson
laceysanderson previously approved these changes Jun 27, 2024
Copy link
Member

@laceysanderson laceysanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise this looks great 😍
All the suggestions are comment focused.

carolyncaron and others added 3 commits June 27, 2024 11:08
Redeclares protected variables as arrays where it was removed

Co-authored-by: Lacey-Anne Sanderson <las166@mail.usask.ca>
All of Lacey's suggestions within commented parts of the code

Co-authored-by: Lacey-Anne Sanderson <las166@mail.usask.ca>
Added skip test message

Co-authored-by: Lacey-Anne Sanderson <las166@mail.usask.ca>
@carolyncaron
Copy link
Contributor Author

Thanks for your suggestions Lacey! I committed all of them in 3 batches 😄

@reynoldtan
Copy link
Contributor

All good, DI of trait service looks beautiful. One last thing I noticed when I do a file level duplicate check the message indicate the that the duplication is in the database not the in-file.

image

laceysanderson added a commit that referenced this pull request Jun 27, 2024
@carolyncaron
Copy link
Contributor Author

All good, DI of trait service looks beautiful. One last thing I noticed when I do a file level duplicate check the message indicate the that the duplication is in the database not the in-file.

image

Thanks for catching this, Reynold! In this case, if a trait combo has been inserted into the database, ideally the validator should still recognize when a trait combo has been validated before (ie. it is a duplicated row in the input file) and tell the user about that too. I've updated the DuplicateTrait validator to account for this and added a test for this situation in commit 503879e

Copy link
Member

@laceysanderson laceysanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updates look good!

@carolyncaron carolyncaron merged commit fb5a194 into 4.x Jun 28, 2024
8 checks passed
@carolyncaron carolyncaron deleted the g4.76-divideTraitImportValue branch June 28, 2024 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

G4.76 - Divide Trait Import Value plugin instance into 3 separate instances
3 participants