Add data read #85

oruebel · 2024-08-31T09:51:57Z

This PR tries to implement the approach for data read proposed in #83 to see whether it is viable. The PR is experimental right now and should not be merged.

2. Proposed Implementation for reading data arrays

`BaseReadData`

Create new ReadDatasetWrapper and ReadAttributeWrapper classes for reading data. This is modified from the proposal, which suggested a single BaseReadData class for reading any array (datasets, attributes) from a file.
Support conversion to boost multi-dimensional array for convenience
Note I did not update BaseRecordingData to inherit from ReadDataWrapper because the two are not compatible right now. BaseReadData uses the io and the path to access the data, whereas BaseRecordingData leaves that choice to the I/O backend and stores the references to the dataset. We may or may not want to change this.
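
To illustrate what a shape-aware data block for multi-dimensional reads might look like, here is a minimal sketch. It uses only the standard library as a stand-in for the boost multi-dimensional array conversion mentioned above. `ShapedBlock`, `offset`, and `at` are hypothetical names for illustration, not the classes in this PR, and the sketch assumes values are laid out in row-major (C) order.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical typed block: flat values plus the shape of the array.
template<typename T>
struct ShapedBlock {
  std::vector<T> data;             // values in row-major (C) order
  std::vector<std::size_t> shape;  // extent of each dimension

  // Map a multi-dimensional index to the flat row-major offset.
  std::size_t offset(const std::vector<std::size_t>& index) const {
    if (index.size() != shape.size()) {
      throw std::invalid_argument("index rank does not match shape rank");
    }
    std::size_t flat = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
      if (index[d] >= shape[d]) {
        throw std::out_of_range("index exceeds extent of dimension");
      }
      flat = flat * shape[d] + index[d];
    }
    return flat;
  }

  // Bounds-checked element access by multi-dimensional index.
  const T& at(const std::vector<std::size_t>& index) const {
    return data.at(offset(index));
  }
};
```

With a 2 x 3 block, the index (1, 2) maps to the last flat element. A real conversion to a boost multi-dimensional array would hand the same flat buffer and shape to boost instead of computing offsets by hand.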

`BaseIO`

Note I did not add abstract methods for lazy reading objects from a file to BaseIO (or more accurately I removed them) because: 1) I wanted to use the shared_ptr to the I/O rather than a raw pointer, which I can't get from BaseIO, and 2) with the ReadDatasetWrapper this is more appropriately done in the Container class directly.
Add pure virtual method to allow us to get the storage object type (Group, Dataset, Attribute) for a given path
Add pure virtual methods to read data values from a Dataset or Attribute that the BaseReadData can call for read

`HDF5IO`

Note In contrast to the proposal, I did not implement a specific version of the BaseReadData for HDF5 but left the read logic to HDF5IO itself so that the ReadDatasetWrapper can remain generic. To make this more manageable, I defined ReadAttributeWrapper separately.
Implement the methods for reading data values from a Dataset or Attribute that the HDF5ReadDataSet and HDF5ReadAttribute wrappers can call for read
Implement the getObjectType method for getting the storage object type (Group, Dataset, Attribute) for a given path
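
The split between the abstract I/O interface and a concrete backend described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `SketchBaseIO`, `InMemoryIO`, and `GenericBlock` are hypothetical names, and an in-memory map stands in for the HDF5 file so the example needs no HDF5 dependency.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; not the actual aqnwb API.
enum class StorageObjectType { Group, Dataset, Attribute, Undefined };

// Type-erased values plus their shape, as returned by a backend read.
struct GenericBlock {
  std::any data;                   // holds a std::vector<T> of the stored type
  std::vector<std::size_t> shape;  // extent of each dimension
};

// Abstract I/O interface: backends implement the actual reads.
class SketchBaseIO {
public:
  virtual ~SketchBaseIO() = default;
  virtual StorageObjectType getStorageObjectType(
      const std::string& path) const = 0;
  virtual GenericBlock readDataset(const std::string& path) const = 0;
  virtual GenericBlock readAttribute(const std::string& path) const = 0;
};

// In-memory stand-in for a real backend such as HDF5IO.
class InMemoryIO : public SketchBaseIO {
public:
  void addDataset(const std::string& path, std::vector<std::int32_t> values) {
    m_datasets[path] = std::move(values);
  }
  void addAttribute(const std::string& path, std::string value) {
    m_attributes[path] = std::move(value);
  }
  StorageObjectType getStorageObjectType(
      const std::string& path) const override {
    if (m_datasets.count(path)) return StorageObjectType::Dataset;
    if (m_attributes.count(path)) return StorageObjectType::Attribute;
    return StorageObjectType::Undefined;
  }
  GenericBlock readDataset(const std::string& path) const override {
    const auto& values = m_datasets.at(path);  // throws if path is unknown
    return GenericBlock{values, {values.size()}};
  }
  GenericBlock readAttribute(const std::string& path) const override {
    return GenericBlock{std::vector<std::string>{m_attributes.at(path)}, {1}};
  }

private:
  std::map<std::string, std::vector<std::int32_t>> m_datasets;
  std::map<std::string, std::string> m_attributes;
};
```

Keeping the read logic behind the interface is what lets a generic wrapper stay backend-agnostic: it only needs a path and a pointer to the I/O object.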

`Container`

Store the io object on the Container so that we can call io->readDataset and io->readAttribute in the read methods

NWB types: `TimeSeries`, `ElectricalSeries` etc.

Remove storage of properties from the Container classes and replace them with access methods that return BaseReadData objects instead. This allows for reading in both read and write mode and avoids keeping data in memory that we have already written to disk. For example, in TimeSeries, these variables would need to change to properties:

aqnwb/src/nwb/base/TimeSeries.hpp

Lines 91 to 140 in e873d95

    
             /** 
        
              * @brief Base unit of measurement for working with the data. Actual stored 
        
              * values are not necessarily stored in these units. To access the data in 
        
              * these units, multiply ‘data’ by ‘conversion’ and add ‘offset’. 
        
              */ 
        
             std::string unit; 
        
             /** 
        
              * @brief The description of the TimeSeries. 
        
              */ 
        
             std::string description; 
        
             /** 
        
              * @brief Human-readable comments about the TimeSeries. 
        
              */ 
        
             std::string comments; 
        
             /** 
        
              * @brief Size used in dataset creation. Can be expanded when writing if 
        
              * needed. 
        
              */ 
        
             SizeArray dsetSize; 
        
             /** 
        
              * @brief Chunking size used in dataset creation. 
        
              */ 
        
             SizeArray chunkSize; 
        
             /** 
        
              * @brief Scalar to multiply each element in data to convert it to the 
        
              * specified ‘unit’. 
        
              */ 
        
             float conversion; 
        
             /** 
        
              * @brief Smallest meaningful difference between values in data, stored in the 
        
              * specified by unit. 
        
              */ 
        
             float resolution; 
        
             /** 
        
              * @brief Scalar to add to the data after scaling by ‘conversion’ to finalize 
        
              * its coercion to the specified ‘unit’. 
        
              */ 
        
             float offset; 
        
             /** 
        
              * @brief The starting time of the TimeSeries. 
        
              */ 
        
             float startingTime = 0.0;

Add access methods that return BaseReadData for missing fields
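
As a rough sketch of the idea of replacing stored properties with access methods: the container keeps only the I/O pointer and its path, and each accessor builds a lightweight handle that reads on request. All names here (`MemoryBackend`, `LazyDataset`, `SketchTimeSeries`) are hypothetical, and the backend is an in-memory map rather than a file.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal backend; a real one would read from the file.
class MemoryBackend {
public:
  void store(const std::string& path, std::vector<float> values) {
    m_data[path] = std::move(values);
  }
  std::vector<float> readDataset(const std::string& path) const {
    ++m_readCount;  // track how often the "file" is actually touched
    return m_data.at(path);
  }
  int readCount() const { return m_readCount; }

private:
  std::map<std::string, std::vector<float>> m_data;
  mutable int m_readCount = 0;
};

// Lightweight handle: remembers where the data lives, reads only on request.
class LazyDataset {
public:
  LazyDataset(std::shared_ptr<MemoryBackend> io, std::string path)
      : m_io(std::move(io)), m_path(std::move(path)) {}
  std::vector<float> values() const { return m_io->readDataset(m_path); }
  const std::string& path() const { return m_path; }

private:
  std::shared_ptr<MemoryBackend> m_io;
  std::string m_path;
};

// The container stores only path and io; fields are exposed via accessors.
class SketchTimeSeries {
public:
  SketchTimeSeries(std::string path, std::shared_ptr<MemoryBackend> io)
      : m_path(std::move(path)), m_io(std::move(io)) {}
  LazyDataset data() const { return LazyDataset(m_io, m_path + "/data"); }

private:
  std::string m_path;
  std::shared_ptr<MemoryBackend> m_io;
};
```

Creating the handle does not touch the backend; only calling `values()` does, which is why no data has to be kept in memory on the container itself.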

3. Proposed implementation for reading whole `Containers` (e.g., to read an `ElectricalSeries`)

Add access methods on the respective Container that owns the respective objects, e.g., NWBFile owning ElectricalSeries objects to retrieve the object
Add abstract factory method (that is templated on the return type) to Container to create an instance of the specific Container type using only the io and path for the Container as input. The specific Container classes, such as TimeSeries will then need to implement a corresponding constructor that uses io and path as input.

Step 1: Define the Template Factory Method in `Container`

class Container {
public:
   
    template <typename T>
    static std::unique_ptr<T> create(const BaseIO& io, const std::string& path) {
        static_assert(std::is_base_of<Container, T>::value, "T must be a derived class of Container");
        return std::unique_ptr<T>(new T(path, io));
    }
};

Step 2: Implement the constructors on the specific `Container` classes (e.g., `TimeSeries`)

Add the necessary constructor

class TimeSeries : public Container {
public:
    TimeSeries(const std::string& path, const BaseIO& io) {
        // Implementation of TimeSeries constructor
    }
};
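
For reference, a self-contained variant of the two steps above that compiles on its own might look like the sketch below. It makes a few assumptions that differ from the snippets: the I/O object is passed as a `std::shared_ptr` (in line with the note on BaseIO above), `StubIO` is a hypothetical placeholder for the backend, and `create` takes its arguments in the same (path, io) order as the constructor it forwards to.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the I/O backend.
class StubIO {};

class Container {
public:
  Container(std::string path, std::shared_ptr<StubIO> io)
      : m_path(std::move(path)), m_io(std::move(io)) {}
  virtual ~Container() = default;

  // Factory templated on the return type. T must derive from Container and
  // provide a constructor taking (path, io).
  template<typename T>
  static std::unique_ptr<T> create(const std::string& path,
                                   std::shared_ptr<StubIO> io) {
    static_assert(std::is_base_of<Container, T>::value,
                  "T must be a derived class of Container");
    return std::make_unique<T>(path, std::move(io));
  }

  const std::string& getPath() const { return m_path; }

protected:
  std::string m_path;
  std::shared_ptr<StubIO> m_io;
};

class TimeSeries : public Container {
public:
  TimeSeries(const std::string& path, std::shared_ptr<StubIO> io)
      : Container(path, std::move(io)) {}
};
```

A call such as `Container::create<TimeSeries>("/acquisition/ts0", io)` then returns a `std::unique_ptr<TimeSeries>`, and passing a type that does not derive from `Container` fails at compile time through the `static_assert`.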

4. Proposed implementation for reading untyped groups (e.g., `/acquisition`)

I'm not sure we'll need to do this, since a group by itself does not define data. To access the contents we could define access methods on the parent Container class (e.g., NWBFile) that owns the untyped group.

TODO

Next steps

Check that workflows are running the correct code version. The build in the test fails with `/Users/runner/work/aqnwb/aqnwb/src/nwb/ecephys/ElectricalSeries.cpp:61:25: error: use of undeclared identifier 'BaseDataType'; did you mean 'IO::BaseDataType'? m_io->createAttribute(BaseDataType::I32,` but that is not what is on line 61 in this PR: https://github.com/NeurodataWithoutBorders/aqnwb/actions/runs/11023552242/job/30615439516?pr=85#step:6:249
Once the workflow issue is resolved, I propose we merge Add read for neurodata_types, e.g., Container, TimeSeries #91 with this PR to create a single PR
See Add read for neurodata_types, e.g., Container, TimeSeries #91 for open ToDo items that should be addressed before merge

oruebel · 2024-09-01T09:08:33Z

@stephprince when you get a chance, could you please do a first code review of this PR to make sure this is heading in the right direction? I now have a first outline of one possible solution for how we might implement read. There is still a lot more work to be done before this PR is ready, but it would be useful if you could take a look before I go any further with this approach.

I would start by looking at:

tests/examples/test_ecephys_data_read.cpp which shows an example of how read works for the user
BaseIO then defines the main new classes used for reading and HDF5IO then implements the actual reading
Container and ElectricalSeries also have some relevant changes to allow us to construct Container objects for read and how we can get specific datasets/attributes

oruebel · 2024-09-01T22:26:28Z

@stephprince I just added a documentation page as well, which hopefully helps explain the current proposed design for read so we can review and discuss.

stephprince

Finally got a chance to look through this and added some comments / questions! A couple of comments in addition to what I added in code:

I think we want to add tests for reading in string datasets, attributes, and multidimensional datasets. There were a couple of sections in the docs examples, possibly meant for testing these things, that were commented out; not sure if those were WIP or resolved elsewhere.
How do links/references work on read?

I'm partway through looking through the follow up PR, will ping you there when I finish.

stephprince · 2024-09-24T14:51:04Z

.github/workflows/codespell.yml

@@ -6,8 +6,8 @@ on:
    - main

  pull_request:
-    branches:
-    - main
+    #branches:


uncomment workflow triggers before merging

The reason I had modified this is that it wouldn't trigger workflows if they didn't target the main branch. I think it would be useful to run the tests on all PRs, even if they target branches other than main.

Got it, that makes sense. We can remove the branch section entirely then or yes just leave it commented out.

Either way is fine. I just commented them out here, because I wanted to see what tests are working/failing on this PR. We can change workflows also in a separate PR and remove this change here before merging if you prefer.

src/nwb/NWBFile.cpp

src/nwb/hdmf/base/Data.hpp

src/io/hdf5/HDF5IO.hpp

stephprince · 2024-09-24T19:18:34Z

src/nwb/base/TimeSeries.hpp

+  template<typename VTYPE = std::any>
+  inline std::unique_ptr<
+      IO::ReadDataWrapper<AQNWB::Types::StorageObjectType::Dataset, VTYPE>>
+  dataLazy() const


Is there an alternative to appending Lazy to the name of every member variable that is a ReadDataWrapper? I think as a user I would prefer to access data in the same way whether it's in memory or read from the file, and we can specify that data will always be lazy loaded on read. I'm wondering if it's possible to use a single this->data() variable here if we use some approach such as

std::variant<BaseRecordingData, ReadDataWrapper>, or

by using a common base class for those two classes.

I can't remember if we had discussed this approach in the past or not

In #91 I replaced dataLazy (and other read fields) with the DEFINE_FIELD macro to create these access methods. I changed the name of the functions there to use the naming convention read<FieldPath>, e.g., readData or readTimestamps . To avoid confusion for attributes I used the full path in naming the field, e.g., readDataUnit and readTimestampsUnit.

I don't think we want to store the read wrappers on the container. They are lightweight and a user may need to use a data type different than the default, so creating the read wrapper on request is useful. I.e., I don't think it is strictly necessary for BaseRecordingData and ReadDataWrapper to have a common base. We could rename TimeSeries.data to TimeSeries.recordData to make them more consistent, or we could leave it as is and say that attributes without a prefix are for acquisition and with the prefix are for read only.

With all of this said, I think it is worth discussing whether having a common base class for BaseRecordingData and ReadDataWrapper would be useful.
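
To make the `std::variant` option from this thread concrete, here is a small sketch of a single accessor that returns either a write-side or a read-side handle. `RecordingHandle`, `ReadHandle`, `makeDataHandle`, and `describe` are hypothetical placeholders, not BaseRecordingData or ReadDataWrapper themselves, and the sketch assumes C++17.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for the two classes under discussion.
struct RecordingHandle { std::string path; };  // write side (acquisition)
struct ReadHandle { std::string path; };       // read side (lazy read)

using DataHandle = std::variant<RecordingHandle, ReadHandle>;

// A single accessor could return whichever handle matches the file mode.
inline DataHandle makeDataHandle(const std::string& path, bool readMode) {
  if (readMode) {
    return ReadHandle{path};
  }
  return RecordingHandle{path};
}

// Callers dispatch on the alternative that is actually held.
inline std::string describe(const DataHandle& handle) {
  return std::visit(
      [](const auto& h) -> std::string {
        using H = std::decay_t<decltype(h)>;
        if constexpr (std::is_same_v<H, ReadHandle>) {
          return "read:" + h.path;
        } else {
          return "record:" + h.path;
        }
      },
      handle);
}
```

The trade-off is that every caller has to visit the variant, whereas separate accessors (or a common base class) keep the return type explicit at the call site.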

stephprince · 2024-09-24T19:52:10Z

src/io/ReadIO.hpp

+/**
+ * @brief Generic structure to hold type-erased data and shape
+ */
+class DataBlockGeneric


Would there be a way to only expose DataBlock to the user and use DataBlockGeneric under the hood as needed?

I feel it would be simpler if they could use DataBlock for reading data, and if the type is unknown use the default std::any. I might just be missing reasons to keep it separate at the user level, but was wondering if that could simplify things

I like the idea of only using DataBlock but we'll need to think a bit more about how to do that. One key difference is that DataBlockGeneric stores the entire data as std::any so we can use a single cast std::any_cast<std::vector<DTYPE>>(genericData.data) to transform the data to the correct type without having to copy the data. DataBlock, on the other hand, stores the data as a std::vector, and casting a std::vector<std::any> requires copying the data vector and casting each individual element. Similarly, on read we would probably also need to do another copy in order to move the data into the std::vector<std::any>. Happy to chat about how we may be able to simplify this.

Maybe one approach could be to have DataBlock store the data internally as std::any and then access the data via a function instead so that we do the casting when accessing the data. However, when reading data where the user may not know the data type before-hand, the user still needs to then define the type to cast to since we cannot simply cast std::any to std::vector<std::any>, but we need to cast to the correct type.
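
The difference between the two storage strategies discussed here can be sketched as below. `GenericBlock`, `typedView`, and `fromAnyVector` are hypothetical names, not the DataBlockGeneric or DataBlock classes from this PR. Note that the value form of `std::any_cast` returns a copy, so the sketch uses the pointer form to get a typed view without copying.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified version of a type-erased block.
struct GenericBlock {
  std::any data;                   // holds one std::vector<T>
  std::vector<std::size_t> shape;  // extent of each dimension
};

// Typed view: a single cast on the whole vector, no per-element work.
// Returns nullptr when T does not match the stored element type.
template<typename T>
const std::vector<T>* typedView(const GenericBlock& block) {
  return std::any_cast<std::vector<T>>(&block.data);
}

// Element-wise erasure instead needs one cast and one copy per element.
template<typename T>
std::vector<T> fromAnyVector(const std::vector<std::any>& values) {
  std::vector<T> out;
  out.reserve(values.size());
  for (const std::any& v : values) {
    out.push_back(std::any_cast<T>(v));  // throws std::bad_any_cast on mismatch
  }
  return out;
}
```

In both cases the caller still has to name the element type, which matches the point above that an unknown type cannot simply be cast to `std::vector<std::any>`.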

stephprince · 2024-09-24T19:54:35Z

tests/testHDF5IO.cpp

+{
+  std::vector<int32_t> testData = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
+
+  SECTION("read 1D data block of a 1D dataset")


I think we would want to add additional tests for at least one type of multidimensional data here

The unit tests are still very minimal right now. I totally agree, there are a lot more tests that need to be added. However, I wanted to get your take on the overall design first before diving even deeper into testing all the different cases.

stephprince · 2024-09-24T20:01:28Z

src/io/hdf5/HDF5IO.cpp

+  size_t pos = path.find_last_of('/');
+  if (pos == std::string::npos) {
+    return nullptr;
+  }
+
+  std::string parentPath = path.substr(0, pos);
+  std::string attrName = path.substr(pos + 1);


these manipulations as well as the mergePath functions I think are good reasons to switch to using the std::filesystem::path representation?
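
A rough sketch of what the suggested `std::filesystem::path` version of this split could look like is shown below. `splitAttributePath` is a hypothetical helper, not a function in this PR, and it assumes the object paths use `/` separators so that they can be parsed as generic paths.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <utility>

// Split "/parent/path/attrName" into {parentPath, attrName}.
// Returns std::nullopt when there is no parent or no name component.
inline std::optional<std::pair<std::string, std::string>> splitAttributePath(
    const std::string& path) {
  const std::filesystem::path p(path);
  if (!p.has_parent_path() || !p.has_filename()) {
    return std::nullopt;
  }
  // generic_string() keeps '/' separators on every platform.
  return std::make_pair(p.parent_path().generic_string(),
                        p.filename().generic_string());
}
```

One behavioral difference from the `substr` version: for a top-level name such as `/unit`, `parent_path()` yields `/` rather than an empty string.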

stephprince · 2024-09-24T20:04:18Z

src/io/hdf5/HDF5IO.cpp

+  std::string parentPath = path.substr(0, pos);
+  std::string attrName = path.substr(pos + 1);
+
+  try {


could use getH5ObjectType here instead of trying to open object as group or a dataset?

docs/pages/userdocs/read.dox

oruebel · 2024-09-24T21:47:53Z

Thanks for taking a look at the PR.

I think we want to add tests for reading in string datasets, attributes, and multidimensional datasets. There were a couple of sections in the docs examples maybe working on testing these things that were commented out, not sure if those were WIP or resolved elsewhere.

Totally agree. This PR + #91 together outline how read can work, but there are still a lot of tests that need to be added to make sure everything is working as expected. I think in particular on the HDF5IO there are probably still some gremlins hiding. This is part of the todo list in #91. Part of the question for the review is to decide on what items we view as essential before merging (e.g., unit tests) vs. items we want to address in separate PRs. I'd suggest to maybe just add any items to the TODO list in the description of #91 and then we can go through and prioritize.

2. How do links/references work on read?

Links should be handled transparently by HDF5 so I think read should work in the same way as for groups and datasets. I have not yet addressed the case of object references that are stored as values of attributes and datasets. I think this will require additional logic in the ReadDataWrapper and HDF5IO.

oruebel · 2024-09-25T00:02:44Z

I'm a bit confused why the tests are failing after submitting the suggestions from the review. The build in the test fails with `/Users/runner/work/aqnwb/aqnwb/src/nwb/ecephys/ElectricalSeries.cpp:61:25: error: use of undeclared identifier 'BaseDataType'; did you mean 'IO::BaseDataType'? m_io->createAttribute(BaseDataType::I32,` but that is not what is on line 61 in this PR:

https://github.com/NeurodataWithoutBorders/aqnwb/actions/runs/11023552242/job/30615439516?pr=85#step:6:249

oruebel · 2024-10-22T17:51:32Z

@stephprince looking at the workflow details https://github.com/NeurodataWithoutBorders/aqnwb/actions/runs/11023552242/job/31901884513?pr=85 The workflow is checking out:

/opt/homebrew/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +b2fb27fdba42dd5183bce7d2cc44421c50e2f428:refs/remotes/pull/85/merge

However, looking at that commit hash at b2fb27f , GitHub says that "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

Digging around a bit more with ChatGPT, it says:

When a GitHub Action runs for a pull request, it often checks out a temporary merge commit. This commit represents the state of the code if the pull request were to be merged into the target branch (usually main). This allows the CI to test the combined code without actually merging it. [...] The commit b2fb27f is likely a temporary merge commit created by GitHub to test the pull request. These commits are not part of the repository's history and exist only in the context of the CI job.

I didn't know that this is what GitHub was doing, but it makes sense that GitHub wants to simulate the state of the code as if the pull request were merged into the target branch. Looking at the code in b2fb27f it indeed shows the error that the build is showing in line ElectricalSeries.cpp:61:25: error: use of undeclared identifier 'BaseDataType'; did you mean 'IO::BaseDataType'? m_io->createAttribute(BaseDataType::I32, I.e., something seems to be going wrong when git tries to automatically merge the branches.

We could modify the tests.yml to run all the tests with both states, i.e., with the code as is in the PR as well as with the temporary merge. This would make the CI runtime longer (since all tests would run twice) and make the workflow a bit longer but would help with finding merge errors. So it's a tradeoff between runtime and complexity on the one hand and rigor on the other. What do you think? Here is how the modified tests.yml workflow could look, where:

test-pr-state Job: Checks out the specific PR branch state using github.event.pull_request.head.sha.
test-merged-state Job: Checks out the temporary merge commit using refs/pull/${{ github.event.pull_request.number }}/merge.
sanitize Job: Runs after both test-pr-state and test-merged-state jobs
validate Job: Runs after all previous jobs and performs validation.

name: Run tests

on:
  push:
    branches:
      - main
  pull_request:
  workflow_dispatch:

jobs:
  test-pr-state:  # Job to test the state of the PR branch as it is
    if: github.event_name == 'pull_request'  # Run only on pull request events
    name: Test PR State
    defaults:
      run:
        shell: bash
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-pr-state-${{ matrix.os }}  # job-specific group so the two test jobs do not cancel each other
      cancel-in-progress: true
    strategy:
      fail-fast: false
      matrix:
        os: [macos-latest, ubuntu-latest]

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout PR branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}  # Checkout the PR branch directly

      - name: Install dependencies - ubuntu
        if: matrix.os == 'ubuntu-latest'
        run: |
          sudo apt-get update
          sudo apt-get install -y libhdf5-dev libboost-all-dev
          git clone https://github.com/catchorg/Catch2.git
          cd Catch2
          git checkout "v3.5.3"
          cmake -Bbuild -H. -DBUILD_TESTING=OFF
          sudo cmake --build build/ --target install

      - name: Install dependencies - macos
        if: matrix.os == 'macos-latest'
        run: brew install hdf5 boost catch2

      - name: Configure
        shell: pwsh
        run: cmake "--preset=ci-$("${{ matrix.os }}".split("-")[0])"

      - name: Build
        run: cmake --build build --config Release -j 2

      - name: Install
        run: cmake --install build --config Release --prefix prefix

      - name: Test
        working-directory: build
        run: ctest --output-on-failure --no-tests=error -C Release -j 2

      - name: Upload artifacts
        uses: actions/upload-artifact@v3
        with:
          name: test-files-${{ matrix.os }}
          path: |
            build/tests/data/*.nwb

  test-merged-state:  # Job to test the state of the code after merging the PR into the target branch
    if: github.event_name == 'pull_request'  # Run only on pull request events
    name: Test Merged State
    defaults:
      run:
        shell: bash
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-merged-state-${{ matrix.os }}  # job-specific group so the two test jobs do not cancel each other
      cancel-in-progress: true
    strategy:
      fail-fast: false
      matrix:
        os: [macos-latest, ubuntu-latest]

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout merged state
        uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ github.event.pull_request.number }}/merge  # Checkout the merge commit

      - name: Install dependencies - ubuntu
        if: matrix.os == 'ubuntu-latest'
        run: |
          sudo apt-get update
          sudo apt-get install -y libhdf5-dev libboost-all-dev
          git clone https://github.com/catchorg/Catch2.git
          cd Catch2
          git checkout "v3.5.3"
          cmake -Bbuild -H. -DBUILD_TESTING=OFF
          sudo cmake --build build/ --target install

      - name: Install dependencies - macos
        if: matrix.os == 'macos-latest'
        run: brew install hdf5 boost catch2

      - name: Configure
        shell: pwsh
        run: cmake "--preset=ci-$("${{ matrix.os }}".split("-")[0])"

      - name: Build
        run: cmake --build build --config Release -j 2

      - name: Install
        run: cmake --install build --config Release --prefix prefix

      - name: Test
        working-directory: build
        run: ctest --output-on-failure --no-tests=error -C Release -j 2

      - name: Upload artifacts
        uses: actions/upload-artifact@v3
        with:
          name: test-files-${{ matrix.os }}
          path: |
            build/tests/data/*.nwb

  sanitize:  # Job to run sanitization tests on the merged state
    if: github.event_name == 'pull_request'  # Run only on pull request events
    needs: [test-pr-state, test-merged-state]

    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ github.event.pull_request.number }}/merge  # Checkout the merge commit

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libhdf5-dev libboost-all-dev
          git clone https://github.com/catchorg/Catch2.git
          cd Catch2
          git checkout "v3.5.3"
          cmake -Bbuild -H. -DBUILD_TESTING=OFF
          sudo cmake --build build/ --target install

      - name: Configure
        run: cmake --preset=ci-sanitize

      - name: Build
        run: cmake --build build/sanitize -j 2

      - name: Test
        working-directory: build/sanitize
        env:
          ASAN_OPTIONS: "strict_string_checks=1:\
            detect_stack_use_after_return=1:\
            check_initialization_order=1:\
            strict_init_order=1:\
            detect_leaks=1:\
            halt_on_error=1"
          UBSAN_OPTIONS: "print_stacktrace=1:\
            halt_on_error=1"
        run: ctest --output-on-failure --no-tests=error -j 2

  validate:  # Job to validate the results using the test files
    needs: [test-pr-state, test-merged-state, sanitize]
    defaults:
      run:
        shell: bash
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os }}
      cancel-in-progress: true
    strategy:
      fail-fast: false
      matrix:
        os: [macos-latest, ubuntu-latest]

    runs-on: ${{ matrix.os }}

    steps:
      - name: Download test files
        uses: actions/download-artifact@v3
        with:
          name: test-files-${{ matrix.os }}
          path: nwb_files

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install pynwb and run validation
        run: |
          python -m pip install --upgrade pip
          python -m pip install nwbinspector
          nwbinspector nwb_files --threshold BEST_PRACTICE_VIOLATION

oruebel · 2024-10-23T00:21:55Z

We could modify the tests.yml to run all the tests with both states, i.e., with the code as is in the PR as well as with the temporary merge. This would make the CI runtime longer (since all tests would run twice) and make the workflow a bit longer but would help with finding merge errors.

We decided not to do this and to continue testing only the merged version. We also decided to add a note in the developer docs to clarify this behavior.

oruebel added 15 commits August 30, 2024 22:36

Define base classes for reading

1e1f041

First draft of reading datasets and attributes

e58f50f

Split reading of attribute and dataset to separate functions

250a66b

Remove debug print

e1bfa2e

Add functions to construct ReadDatasetWrapper and ReadAttributeWrappe…

4aedefd

…r from BaseIO

Fix formatting

4dae171

Add test for using the ReadDatasetWrapper

140809e

Start refactor containers for read

4d7b1ea

Fix format

2db4bfe

Read ElectricalSeries.data example working

b2959d2

Revert change to tests used for debugging

37d30d3

Fix codespell issue

6cfe416

Move read example to an example file

3d86aff

Fix bug in Container::create

3fc1d7a

Add example for data read

b09a4fa

oruebel requested a review from stephprince September 1, 2024 08:45

Fix spelling error

970f4c8

Add user docs for data read

26fbae9

oruebel added 6 commits September 1, 2024 16:35

Update read user docs

4299f99

Add toc

9c9301c

Add read software design figure and more details on the design

447337a

Some adjustment to the edges in the fig

e3d4330

Some adjustment to the edges in the fig

77f0861

Add intro for data read page

5190731

oruebel mentioned this pull request Sep 3, 2024

Propose refactor of I/O class organization #88

Open

oruebel added 3 commits September 2, 2024 22:33

Make Container::create inline

f08e42f

Make DataBlock::fromGeneric inline

1a754b9

Implement function to allow us to get the storage object type for a g…

ebfbdf1

…iven path

oruebel and others added 4 commits September 6, 2024 01:05

Some minor fixes

b7cc149

Merge branch 'main' into add_read

b9c6c9b

Update ElectrodGroup to provide standard Container constructor and mo…

69c3c4e

…ve args to initalize

Fix formatting

e75c7ad

oruebel mentioned this pull request Sep 8, 2024

Add read for neurodata_types, e.g., Container, TimeSeries #91

Open

52 tasks

oruebel and others added 6 commits September 8, 2024 23:43

Merge branch 'main' into add_read

5ab8464

Fix ElectricalSeries

803dfcf

Run test workflows on all PRs

cdffc74

Fix static assert for ReadDataWrapper

89f6d29

Sync add_read with main branch (#101)

5608c5a

* Merged main into add_read * Fix docs build for SpikeEventSeries * Fix code formatting * Fix segfault due to duplicate declaration of NWBFile.io parameter

Merge branch 'main' into add_read

c7ced4e

oruebel mentioned this pull request Sep 19, 2024

Update member variable names to avoid shadowing #102

Merged

17 tasks

oruebel added 2 commits September 20, 2024 13:50

Merge branch 'main' into add_read

dd01ad3

Fix formatting and docs error after merge with main

968fcf0

oruebel mentioned this pull request Sep 20, 2024

neurodata_type attribute not set correctly for subtypes of TimeSeries #108

Open

stephprince reviewed Sep 24, 2024

oruebel and others added 2 commits September 24, 2024 14:58

Update docs/pages/userdocs/read.dox

bfaabb8

Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com>

Update src/io/ReadIO.hpp

a439212

Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com>

oruebel commented Sep 24, 2024

oruebel and others added 5 commits September 24, 2024 15:47

Update src/io/ReadIO.hpp

5f28fb0

Rename ReadDataWrapper.is_dataset

a9725b2

Update src/io/hdf5/HDF5IO.hpp

e70949c

Co-authored-by: Steph Prince <40640337+stephprince@users.noreply.github.com>

Remove duplicate code in NWBFile

d1da5d1

Remove outdated ToDo item

12a86a5

oruebel added 2 commits October 22, 2024 10:53

Merge branch 'main' into add_read

ae66f33

Fix build error after merge

69df3e0
