MONAI Datasets for TCIA #4227

AHarouni · 2022-05-04T19:20:22Z

AHarouni
May 4, 2022

I have seen the code to download the Decathlon dataset. Great work! I would like to do the same for The Cancer Imaging Archive (TCIA). Justin Kirby and I created a Colab notebook to walk users through the process. The main steps are:

Use REST API calls to find all datasets that have segmentation data as DICOM SEG or RT Struct.
Pick a dataset and download it after accepting the user agreement.
Open the DICOM tags to find the referenced original DICOM series.
Download the DICOM imaging series.
Convert the DICOM images and the DICOM SEG or DICOM RT to NIfTI.

The notebook is a bit long for a user who just wants to get the data and start training. What would be the steps to add a MONAI core API call that does this?
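A minimal sketch of step 1 (finding datasets that contain DICOM SEG or RTSTRUCT series), using only the Python standard library. It assumes the NBIA v1 endpoints `getCollectionValues` and `getModalityValues`, and their JSON keys `Collection` and `Modality`, behave as they did when this thread was written; only `getSeries` and `getImage` appear in the code posted later in the thread. The HTTP call is passed in as a parameter so the filtering logic can be checked offline.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the code posted later in this thread.
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1/"

# Modalities that carry segmentations: DICOM SEG and RT Structure Sets.
SEG_MODALITIES = {"SEG", "RTSTRUCT"}


def build_query(endpoint, **params):
    """Return a full REST URL such as getSeries?Collection=...&Modality=..."""
    query = urllib.parse.urlencode(params)
    return BASE_URL + endpoint + ("?" + query if query else "")


def fetch_json(url):
    """GET a URL and decode the JSON body; an empty body becomes []."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else []


def collections_with_segmentations(fetch=fetch_json):
    """Map each collection that has SEG or RTSTRUCT series to those modalities.

    Assumption: getCollectionValues returns [{"Collection": ...}, ...] and
    getModalityValues returns [{"Modality": ...}, ...].
    """
    found = {}
    for item in fetch(build_query("getCollectionValues")):
        name = item.get("Collection")
        if name is None:  # skip entries without the expected key
            continue
        listing = fetch(build_query("getModalityValues", Collection=name))
        modalities = {m.get("Modality") for m in listing}
        hits = sorted(modalities & SEG_MODALITIES)
        if hits:
            found[name] = hits
    return found
```

This issues one request per collection, so scanning all of TCIA is slow; caching the resulting dictionary before picking a dataset is a reasonable design choice.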

Nic-Ma · 2022-05-05T00:19:32Z

Nic-Ma
May 5, 2022
Maintainer

Hi @AHarouni ,

Thanks for your notebook.
MONAI can load DICOM images directly, so maybe you don't need to convert them to NIfTI format?

Thanks.

2 replies

AHarouni May 5, 2022
Author

Hi @Nic-Ma
I know MONAI supports DICOM images. Does it also support DICOM SEG and DICOM RT?

Could you also comment on the performance of reading a series of DICOM files vs. a single NIfTI file?
We can easily skip the conversion step if the performance is the same.
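One way to answer the performance question empirically is to time MONAI's `LoadImage` on the same scan stored both ways. The sketch below is a hypothetical benchmarking helper, not a MONAI utility; it imports MONAI lazily so the timing logic works on its own, and the paths in the usage comments are placeholders.

```python
import statistics
import time


def time_loader(load, path, repeats=5):
    """Call load(path) `repeats` times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load(path)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def make_monai_loader(reader=None):
    """Build a MONAI LoadImage transform (imported lazily).

    reader=None lets MONAI choose a reader; "ITKReader" can read a folder
    containing one DICOM series.
    """
    from monai.transforms import LoadImage

    return LoadImage(reader=reader, image_only=True)


# Usage (placeholder paths):
# dicom_s = time_loader(make_monai_loader("ITKReader"), "path/to/dicom_series_dir")
# nifti_s = time_loader(make_monai_loader(), "path/to/image.nii.gz")
# print(f"DICOM series: {dicom_s:.3f}s, NIfTI: {nifti_s:.3f}s")
```

Repeated reads benefit from the OS file cache, so the median over several repeats reflects warm-cache loading; the first call alone is closer to a cold read during the first training epoch.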

Nic-Ma May 5, 2022
Maintainer

Hi @AHarouni ,

OK, I haven't tested DICOM SEG and DICOM RT before; maybe you could try them first?

Thanks.
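When trying DICOM SEG and RTSTRUCT, a first practical step (step 3 in the original list) is pairing each segmentation with the image series it annotates. A minimal sketch, assuming the object is read with pydicom, for example `pydicom.dcmread(path, stop_before_pixels=True)`; the helper name `referenced_series_uids` is hypothetical, while the attribute names are the standard DICOM keywords for the Referenced Series Sequence (SEG) and the RT referenced frame-of-reference/study/series chain (RTSTRUCT).

```python
def referenced_series_uids(ds):
    """Return SeriesInstanceUIDs of the image series a SEG or RTSTRUCT refers to.

    `ds` is expected to behave like a pydicom Dataset: DICOM keywords as
    attributes, sequences as lists. Missing attributes yield an empty list.
    """
    modality = getattr(ds, "Modality", None)
    uids = []
    if modality == "SEG":
        # DICOM SEG: Referenced Series Sequence (0008,1115)
        for item in getattr(ds, "ReferencedSeriesSequence", []):
            uids.append(item.SeriesInstanceUID)
    elif modality == "RTSTRUCT":
        # RTSTRUCT: Referenced Frame of Reference -> RT Referenced Study
        # -> RT Referenced Series
        for frame in getattr(ds, "ReferencedFrameOfReferenceSequence", []):
            for study in getattr(frame, "RTReferencedStudySequence", []):
                for series in getattr(study, "RTReferencedSeriesSequence", []):
                    uids.append(series.SeriesInstanceUID)
    return uids
```

Each returned UID can then be passed to `getImage?SeriesInstanceUID=` in the download loop posted later in this thread.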

wyli · 2022-05-05T05:24:03Z

wyli
May 5, 2022
Collaborator

Thanks, let's discuss at Friday's meeting (perhaps a relevant inference use case: #4136).


kirbyju · 2022-05-09T20:11:40Z

kirbyju
May 9, 2022
Collaborator

Please use the following notebook instead of the one originally posted: https://colab.research.google.com/drive/18Xh2WPEzQP7Ly-TvuKOyXQG5E79-3iU1?usp=sharing. The code is mostly the same, but things are slightly reorganized and presented in a clearer way with additional description of what's going on in the code.


Nic-Ma · 2022-05-17T15:55:04Z

Nic-Ma
May 17, 2022
Maintainer

Hi @AHarouni ,

I think this could be a very useful enhancement for MONAI. Could you please help submit a pull request for a TCIADataset?

Thanks in advance.


wyli · 2022-05-23T17:46:23Z

wyli
May 23, 2022
Collaborator

Hi @kirbyju,

@AHarouni and I were trying the REST APIs and realised we could fetch the images with queries built in pure Python, without any command-line tools or other user interaction. Do you think this is the correct way to use the TCIA REST API? We are trying to wrap this as a MONAI module to allow easy, Pythonic downloading for users:

# imports for TCIA API calls (in case you didn't run the imports in the optional section above)

import os

import requests
from monai.apps import download_and_extract

baseurl = "https://services.cancerimagingarchive.net/nbia-api/services/v1/"


def restCall(url, itemName):
    response = requests.get(baseurl + url, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    if len(response.text) == 0:  # some calls return an empty response
        return []
    # some calls return dict items without the requested key, so skip those
    return [d[itemName] for d in response.json() if itemName in d]


pickedCollection, pickedModality = "C4KC-KiTS", "SEG"  # works correctly
seriesLst = restCall("getSeries?Collection=" + pickedCollection + "&Modality=" + pickedModality, "SeriesInstanceUID")

# print the total number of segs in the manifest
print(len(seriesLst))

# reduce to the first 5 segs for demo purposes
seriesLst2Download = seriesLst[:5]


data_dir = os.path.join("./", f"{pickedCollection}-{pickedModality}")
os.makedirs(data_dir, exist_ok=True)
for series in seriesLst2Download:
    url = baseurl + "getImage?SeriesInstanceUID=" + series
    subfolder = os.path.join(data_dir, series)
    # one zip per series: download_and_extract skips downloading when `filepath`
    # already exists, so a shared zip name would re-extract the first series every time
    download_and_extract(url=url, filepath=os.path.join(data_dir, series + ".zip"), output_dir=subfolder)


kirbyju · 2022-05-23T18:08:14Z

kirbyju
May 23, 2022
Collaborator

Sure, you can do it that way if you like. I used the command-line utility because it takes care of organizing and naming the files and directories in a useful way, and it automatically retries downloads that fail midway, without the need to write a bunch of additional code. However, if you're trying to avoid relying on external tools or dependencies, I can see why you might want to write your own solution in Python. It's probably also worth noting that there is another option for downloading full series via the REST API, which also includes a spreadsheet in the zip file that lets you check the MD5 hash of each image: https://services.cancerimagingarchive.net/nbia-api/services/v1/getImageWithMD5Hash?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.6919.4624.313514201353787659031503464798

Justin Kirby (contractor), Technical Project Manager, Frederick National Laboratory for Cancer Research; Technical Director, Cancer Imaging Informatics Lab. ORCiD: https://orcid.org/0000-0003-3487-8922
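The auto-retry and MD5 checks mentioned above can be approximated in plain Python. This is a sketch under assumptions: `with_retries` and `find_mismatches` are hypothetical helpers, and the layout of the spreadsheet inside the `getImageWithMD5Hash` zip is not described here, so building the `{file_path: expected_md5}` mapping from it is left out. `requests` exceptions derive from `OSError`, so the default `exceptions=(OSError,)` also covers a failed `requests.get`; other download functions may need that tuple widened.

```python
import hashlib
import time


def with_retries(func, attempts=3, delay=2.0, exceptions=(OSError,)):
    """Call func() up to `attempts` times in total.

    Sleeps `delay` seconds between failed attempts and re-raises the last error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """Given {file_path: expected_md5}, return the paths whose MD5 differs."""
    return [path for path, md5 in expected.items() if md5_of_file(path) != md5.lower()]
```

For example, a single series request could be wrapped as `with_retries(lambda: requests.get(url, timeout=60))`, and an empty list from `find_mismatches` means every listed image matched its hash.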


2 replies

wyli May 24, 2022
Collaborator

thanks for confirming!

AHarouni Jun 2, 2022
Author

Hi @wyli
As you requested, I have updated the Colab notebook to use the API you pointed to. Please check section 2.3, "Download Both (Seg/RT and Img) using native python code".

Please use this section to download the data. It downloads the SEG/RTSTRUCT and the referenced image series in the same loop.
You can then continue with the rest of the notebook to convert the DICOM images, DICOM SEG, and RTSTRUCT files to NIfTI using plastimatch.

Let me know if you face any issues.
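Once each segmentation sits next to its referenced image series, the plastimatch conversion can be scripted. A sketch, assuming the `plastimatch convert` options `--input`, `--output-img`, `--referenced-ct`, `--output-prefix` and `--prefix-format` (check `plastimatch convert --help` for your version); the function names and paths are hypothetical, and DICOM SEG is not covered here.

```python
import shlex
import subprocess


def dicom_to_nifti_cmd(image_dir, out_file):
    """plastimatch command converting a DICOM image series folder to one NIfTI file."""
    return ["plastimatch", "convert", "--input", image_dir, "--output-img", out_file]


def rtstruct_to_nifti_cmd(rtstruct_file, image_dir, out_dir):
    """plastimatch command writing one NIfTI mask per RTSTRUCT structure into
    out_dir, on the grid of the referenced image series."""
    return [
        "plastimatch", "convert",
        "--input", rtstruct_file,
        "--referenced-ct", image_dir,
        "--output-prefix", out_dir,
        "--prefix-format", "nii.gz",
    ]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (needs plastimatch on PATH)."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Usage (placeholder paths):
# run(dicom_to_nifti_cmd("C4KC-KiTS/ct_series", "case0/image.nii.gz"), dry_run=False)
# run(rtstruct_to_nifti_cmd("case0/rtss.dcm", "C4KC-KiTS/ct_series", "case0/masks"), dry_run=False)
```

Building argument lists instead of shell strings keeps UIDs and paths with unusual characters safe, and the dry run shows the exact commands before anything is written.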
