Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

failing validation of matnwb-generated files with extension in cached namespace #351

Open
cechava opened this issue Nov 19, 2021 · 8 comments
Labels
status: need more info unclear what the issue is or what needs to be done

Comments

@cechava
Copy link
Collaborator

cechava commented Nov 19, 2021

pynwb validation of matnwb-generated files fails when an extension is in the cached namespace.
This is and independent issue from issue #349 and is not fixed by PR #350

To replicate:

generateExtension('/Users/cesar/Repositories/ndx-ecog/spec/ndx-ecog.namespace.yaml');

nwb = NwbFile( ...
    'session_description', 'mouse in open exploration', ...
    'identifier', 'Mouse5_Day3', ...
    'session_start_time', datetime(2018, 4, 25, 2, 30, 3) ...
);

cortical_surfaces = types.ndx_ecog.CorticalSurfaces;

parcellations = types.ndx_ecog.Parcellations();
parcellation = types.ndx_ecog.Parcellation( ... 
    'data', randi(5, 1, 20), ... 
    'labels', {'a','b', 'c', 'd', 'e'} ...
);
parcellations.parcellation.set('my_map', parcellation);

nwbExport(nwb, 'ndx-ecog_file_test_matnwb.nwb');
$ python -m pynwb.validate ndx-ecog_file_test_matnwb.nwb 
Validating ndx-ecog_file_test_matnwb.nwb against cached namespace information using namespace 'ndx-ecog'.
Traceback (most recent call last):
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 229, in get_validator
    return self.__validators[dt]
KeyError: 'NWBFile'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 136, in <module>
    main()
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 130, in main
    ret = ret or _validate_helper(io=io, namespace=ns)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/validate.py", line 23, in _validate_helper
    errors = validate(**kwargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 587, in func_call
    return func(**pargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/pynwb/__init__.py", line 198, in validate
    return validator.validate(builder)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 583, in func_call
    return func(args[0], **pargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 247, in validate
    validator = self.get_validator(dt)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/utils.py", line 583, in func_call
    return func(args[0], **pargs)
  File "/Users/cesar/anaconda3/envs/pynwb-dev/lib/python3.9/site-packages/hdmf/validate/validator.py", line 232, in get_validator
    raise ValueError(msg)
ValueError: data type 'NWBFile' not found in namespace ndx-ecog

No issue when cached namespace is not used:

$ python -m pynwb.validate ndx-ecog_file_test_matnwb.nwb --no-cached-namespace
Validating ndx-ecog_file_test_matnwb.nwb against pynwb namespace information using namespace 'core'.
 - no errors found.
@cechava
Copy link
Collaborator Author

cechava commented Nov 19, 2021

Doing some printing. It looks like pynwb is trying to use key 'NWBFile' to access an entry in a dictionary that has the relevant subset of classes as keys (e.g., Data, NWBContainer, Subject). This is an incorrect behavior.
When successfully validating a file without an extension in the cached namespace, the 'NWBFile' keys is used to a access an entry in a dictionary with all core and hdmf-common classes as keys (e.g., NWBFile, Subject, DynamicTable)

@cechava
Copy link
Collaborator Author

cechava commented Nov 19, 2021

Issue seems to be in how matnwb caches the namespace of the extension. Compare how the namespace cached by matnwb:

import h5py
import json
from pprint import pprint
f = h5py.File('ndx-ecog_file_test_matnwb.nwb', "r")
pprint(json.loads(f["/specifications/ndx-ecog/0.3.1/namespace"][()]))


{'namespaces': [{'author': ['Ben Dichter'],
                 'contact': ['ben.dichter@catalystneuro.com'],
                 'doc': 'An NWB extension for storing the cortical surface of '
                        'subjects in ECoG experiments.',
                 'name': 'ndx-ecog',
                 'schema': [{'namespace': 'core',
                             'neurodata_types': ['NWBDataInterface',
                                                 'Subject',
                                                 'Images',
                                                 'NWBData']},
                            {'source': 'ndx-ecog.extensions'}],
                 'version': '0.3.1'}]}

with how the namespace is cached by pynwb

f = h5py.File('ndx-ecog_file_test_pynwb.nwb', "r")
pprint(json.loads(f["/specifications/ndx-ecog/0.3.1/namespace"][()]))

{'namespaces': [{'author': ['Ben Dichter'],
                 'contact': ['ben.dichter@catalystneuro.com'],
                 'doc': 'An NWB extension for storing the cortical surface of '
                        'subjects in ECoG experiments.',
                 'name': 'ndx-ecog',
                 'schema': [{'namespace': 'core'},
                            {'source': 'ndx-ecog.extensions'}],
                 'version': '0.3.1'}]}

Note that pynwb does not cache the neurodata_types key in schema. Modifying matnwb caching to exclude this key resolves the issue. Opening a PR with this change.

@oruebel
Copy link
Contributor

oruebel commented Nov 19, 2021

@rly can you clarify why caching the neurodata_types key is a problem and why PyNWB does not cache it?

@rly
Copy link
Contributor

rly commented Nov 19, 2021

This may be a bug in the PyNWB/HDMF validator. I thought this was resolved in hdmf-dev/hdmf#613 . @cechava can you share that test file that you created?

@rly
Copy link
Contributor

rly commented Nov 19, 2021

Separately, PyNWB/HDMF should cache the neurodata_types key. That is a bug.

@cechava
Copy link
Collaborator Author

cechava commented Nov 19, 2021

Sure. The code to generate both files is below. They both make sure of the ndx-ecog extension

python

from pynwb import load_namespaces, get_class, NWBHDF5IO, NWBFile
from datetime import datetime
import numpy as np
from ndx_ecog import CorticalSurfaces, ECoGSubject, Surface, Parcellations
nwbfile = NWBFile('my first synthetic recording', 'EXAMPLE_ID', datetime.now())
cortex_module = nwbfile.create_processing_module(name='cortex',
                                                 description='description')
with NWBHDF5IO('ndx-ecog_file_test_pynwb.nwb', 'w') as io:
    io.write(nwbfile)

MATLAB

generateExtension('/Users/cesar/Repositories/ndx-ecog/spec/ndx-ecog.namespace.yaml');

nwb = NwbFile( ...
    'session_description', 'mouse in open exploration', ...
    'identifier', 'Mouse5_Day3', ...
    'session_start_time', datetime(2018, 4, 25, 2, 30, 3) ...
);

cortical_surfaces = types.ndx_ecog.CorticalSurfaces;

parcellations = types.ndx_ecog.Parcellations();
parcellation = types.ndx_ecog.Parcellation( ... 
    'data', randi(5, 1, 20), ... 
    'labels', {'a','b', 'c', 'd', 'e'} ...
);
parcellations.parcellation.set('my_map', parcellation);

nwbExport(nwb, 'ndx-ecog_file_test_matnwb.nwb');

@cechava
Copy link
Collaborator Author

cechava commented Nov 22, 2021

@rly Just realized you were asking for the actual files. To make life easy here are links to download:
matnwb-generated file
pynwb-generated file

@ehennestad ehennestad added the status: need more info unclear what the issue is or what needs to be done label Oct 31, 2024
@ehennestad
Copy link
Collaborator

@rly Could you help me understand if this is a bug with the way matnwb embeds an extension. Or was this a bug in pynwb? If the latter, do you know if it was resolved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: need more info unclear what the issue is or what needs to be done
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants