-
Hi @AHarouni, thanks for your notebook.
-
Thanks, let's discuss at Friday's meeting (perhaps a relevant inference use case: #4136).
-
Please use the following notebook instead of the one originally posted: https://colab.research.google.com/drive/18Xh2WPEzQP7Ly-TvuKOyXQG5E79-3iU1?usp=sharing. The code is mostly the same, but it is slightly reorganized and presented more clearly, with additional descriptions of what the code does.
-
Hi @AHarouni, I think this can be a very useful enhancement for MONAI. Could you please help submit a pull request for the … Thanks in advance.
-
Hi @kirbyju, @AHarouni and I were trying the REST APIs, and we realised we could fetch the images with queries built in pure Python, without any command-line tools or other user interaction. Do you think this is the correct way to use the TCIA REST API? We are trying to wrap this as a MONAI module to allow easy Pythonic downloading for users:

```python
# imports for TCIA API calls (in case you didn't run the imports in the optional section above)
import os

import requests
from monai.apps import download_and_extract

baseurl = "https://services.cancerimagingarchive.net/nbia-api/services/v1/"


def restCall(url, itemName):
    response = requests.get(baseurl + url)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    if len(response.text) == 0:  # some calls return an empty response
        return []
    # some calls return empty dict items, so only keep items that contain itemName
    # retList = set(item[itemName] for item in response.json())
    retList = []
    for d in response.json():
        if itemName in d:
            retList.append(d[itemName])
    return retList


pickedCollection, pickedModality = "C4KC-KiTS", "SEG"  # works correctly
seriesLst = restCall("getSeries?Collection=" + pickedCollection + "&Modality=" + pickedModality, "SeriesInstanceUID")
# print the total number of segs in the manifest
print(len(seriesLst))
## reduce to the first 5 segs for demo purposes
seriesLst2Download = seriesLst[:5]
data_dir = os.path.join("./", f"{pickedCollection}-{pickedModality}")
for series in seriesLst2Download:
    url = baseurl + "getImage?SeriesInstanceUID=" + series
    subfolder = os.path.join(data_dir, series)
    # use one zip per series: download_and_extract skips downloading when filepath
    # already exists, so a shared zip path would re-extract the first series every time
    download_and_extract(url=url, filepath=os.path.join(data_dir, series + ".zip"), output_dir=subfolder)
```
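The getSeries and getImage queries above are assembled by string concatenation, which works as long as the values never contain characters that need escaping (spaces, `&`, and so on). A minimal sketch of letting the standard library encode the query string instead, assuming the same base URL; `build_query_url` is a hypothetical helper, not part of MONAI or the TCIA API. With requests, passing a dict as `params=` to `requests.get` performs the same encoding.

```python
from urllib.parse import urlencode

# base URL taken from the snippet above
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1/"


def build_query_url(endpoint: str, **params: str) -> str:
    """Build a TCIA REST URL, percent-encoding each query parameter value.

    Hypothetical helper for illustration only.
    """
    return BASE_URL + endpoint + "?" + urlencode(params)


# the same two queries as in the snippet above
series_query = build_query_url("getSeries", Collection="C4KC-KiTS", Modality="SEG")
image_query = build_query_url("getImage", SeriesInstanceUID="1.2.3")
print(series_query)
print(image_query)
```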
-
Sure, you can do it that way if you like. I used the command line utility because it takes care of organizing and naming the files/directories in a useful way and provides auto-retry for downloads that fail in the middle without the need to write a bunch of additional code. However, if you're trying to keep from relying on any external tools/dependencies I can see why you might want to write your own solution in python.
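The auto-retry mentioned above can be approximated in pure Python. A minimal sketch, assuming network failures surface as `OSError` (requests' exceptions derive from `IOError`, an alias of `OSError`, as does urllib's `URLError`); `with_retries`, its attempt count, and its backoff schedule are hypothetical choices, not part of MONAI or the TCIA tooling.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(), retrying on OSError with exponential backoff.

    Hypothetical helper; attempts and base_delay are illustrative defaults.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2**attempt)  # wait 1x, 2x, 4x, ... base_delay
    raise ValueError("attempts must be at least 1")
```

Wrapping the `download_and_extract` call from the loop above in a `lambda` and passing it to `with_retries` would retry each series download; whether a partially written file is left behind after a failure depends on the downloader, so that is worth checking before relying on the retry.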
It's probably also worth noting that there is another option for downloading full series via the REST API that also includes a spreadsheet in the zip file which lets one check the md5 hash of each image:
https://services.cancerimagingarchive.net/nbia-api/services/v1/getImageWithMD5Hash?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.6919.4624.313514201353787659031503464798
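The MD5 hashes returned by getImageWithMD5Hash could be checked after extraction along these lines. A minimal sketch: it assumes the hashes have already been read from the spreadsheet into a dict mapping each file name to its hex MD5; the spreadsheet's column layout is not described here, so that parsing step is left out. `md5_of` and `find_md5_mismatches` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_md5_mismatches(folder: Path, expected: dict[str, str]) -> list[str]:
    """Return names from `expected` whose file is missing or whose MD5 differs.

    `expected` maps a file name (relative to folder) to its hex MD5 hash;
    building it from the downloaded spreadsheet is assumed, not shown.
    """
    bad = []
    for name, md5 in expected.items():
        path = folder / name
        if not path.is_file() or md5_of(path) != md5.lower():
            bad.append(name)
    return bad
```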
Justin Kirby (contractor)
Technical Project Manager, Frederick National Laboratory for Cancer Research
Technical Director, Cancer Imaging Informatics Lab
ORCiD: https://orcid.org/0000-0003-3487-8922
240-276-6016
From: Wenqi Li
Sent: Monday, May 23, 2022 1:46 PM
To: Project-MONAI/MONAI
Subject: Re: [Project-MONAI/MONAI] MONAI Datasets for TCIA (Discussion #4227)
-
Hi all, @Nic-Ma, @wyli, @ericspod
I have seen the code to download the Decathlon dataset. Great work! I would like to do the same for The Cancer Imaging Archive (TCIA). Justin Kirby and I created a Colab notebook to walk users through the process. The main steps are:
The notebook is a bit long for a user who just wants to get the data and start training. What would be the steps to add a MONAI core API call that does this?
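One possible shape for such a call is to separate planning (which series go where) from downloading. The sketch below is hypothetical: `plan_tcia_download` and `SeriesDownload` are not MONAI APIs. It assumes the SeriesInstanceUIDs have already been fetched from the getSeries endpoint and only computes one URL, zip path, and output folder per series, mirroring the layout used in the download snippet elsewhere in this thread.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from urllib.parse import urlencode

BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1/"


@dataclass(frozen=True)
class SeriesDownload:
    """One planned download (hypothetical type): source URL and local paths."""

    url: str
    zip_path: str
    output_dir: str


def plan_tcia_download(
    collection: str, modality: str, series_uids: list[str], root: str = ".", limit: int | None = None
) -> list[SeriesDownload]:
    """Turn SeriesInstanceUIDs into per-series download jobs (hypothetical helper)."""
    data_dir = os.path.join(root, f"{collection}-{modality}")
    uids = series_uids if limit is None else series_uids[:limit]
    return [
        SeriesDownload(
            url=BASE_URL + "getImage?" + urlencode({"SeriesInstanceUID": uid}),
            zip_path=os.path.join(data_dir, uid + ".zip"),  # one zip per series
            output_dir=os.path.join(data_dir, uid),
        )
        for uid in uids
    ]
```

Each job could then be passed to `download_and_extract(url=job.url, filepath=job.zip_path, output_dir=job.output_dir)`. Keeping the plan separate from the network calls makes the directory layout easy to test offline.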