Tuesday, June 15, 2021


Freely available COVID-19 medical imaging datasets to train deep learning algorithms



The world has been living with the COVID-19 pandemic since January of last year. Even today, 17 months later, the pandemic remains a major global concern. Recently, the development of vaccines at a record pace has offered light at the end of the tunnel. Leading global health practitioners, doctors, health executives and researchers have advocated testing, contact tracing and isolation of infected individuals to combat the pandemic. The gold standard for detecting COVID-19 is the reverse transcription polymerase chain reaction (RT-PCR) test. However, a substantial body of research suggests that medical imaging tools powered by artificial intelligence, such as deep learning models, can detect COVID-19 from chest X-rays and computed tomography (CT) scans. Such machine learning models perform better when larger, more diverse annotated datasets are available. The lack of diverse annotated datasets has limited the performance and generalizability of existing deep learning models.

In this article, we explore publicly available X-ray and CT scan datasets that the research community can use to develop tools to address COVID-19. The list is compiled from the European Institute for Biomedical Imaging Research (EIBIR), the Stanford University Center for Artificial Intelligence in Medicine & Imaging, Kamrul et al., and other publicly available sources. The datasets are listed in the table below and described briefly in the following paragraphs.

| Name/Compiler | Size and Modality | Country |
|---|---|---|
| British Society of Thoracic Imaging | 59 patients X-rays | UK |
| AIforCOVID imaging | 983 X-rays | Italy |
| COVID-19 Open Initiative | 16352 X-rays, 201103 CT slices and 12943 ultrasound images | Global |
| Radiopaedia | 101 X-rays and CT patients | Global |
| Eurorad database | 50 X-rays and CT patients | Global |
| BIMCV-COVID19+ Dataset | 2265 X-rays and 163 CT volumes | Spain |
| Società Italiana di Radiologia Medica | 68 X-rays and CT patients | Italy |
| Cohen et al. | 931 X-rays and 20 CT volumes | Global |
| MosMed COVID-19 Chest CT | 110 CT volumes | Russia |
| Zhao et al. | 349 CT slices | Global |
| Coronacases.org | 10 CT patients | China |
| medicalsegmentation.com | 100 CT slices and 9 CT volumes | Global |
| Ma Jun et al. | 20 CT patients | Global |
| Chest CT COVID+ (MIDRC-RICORD-1a) | 120 CT patients | Global |
| Zhang et al. | 90 CT volumes | China |
| Soares, Eduardo et al. | 2482 CT slices | Brazil |
| Yang et al. | 812 CT slices | China |
| iCTCF | 256356 CT slices | China |
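For readers who want to work with this list programmatically, the sketch below shows one possible way to hold it as a small data structure and filter it by country or modality. The `Dataset` type and the helper names `by_country` and `with_modality` are illustrative assumptions, not part of any of the datasets' own tooling; the entries are copied from the table in this article.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    """One row of the dataset table (hypothetical helper type)."""
    name: str
    size_and_modality: str
    country: str


# Entries copied from the table in this article.
CATALOG = [
    Dataset("British Society of Thoracic Imaging", "59 patients X-rays", "UK"),
    Dataset("AIforCOVID imaging", "983 X-rays", "Italy"),
    Dataset("COVID-19 Open Initiative",
            "16352 X-rays, 201103 CT slices and 12943 ultrasound images", "Global"),
    Dataset("Radiopaedia", "101 X-rays and CT patients", "Global"),
    Dataset("Eurorad database", "50 X-rays and CT patients", "Global"),
    Dataset("BIMCV-COVID19+ Dataset", "2265 X-rays and 163 CT volumes", "Spain"),
    Dataset("Società Italiana di Radiologia Medica", "68 X-rays and CT patients", "Italy"),
    Dataset("Cohen et al.", "931 X-rays and 20 CT volumes", "Global"),
    Dataset("MosMed COVID-19 Chest CT", "110 CT volumes", "Russia"),
    Dataset("Zhao et al.", "349 CT slices", "Global"),
    Dataset("Coronacases.org", "10 CT patients", "China"),
    Dataset("medicalsegmentation.com", "100 CT slices and 9 CT volumes", "Global"),
    Dataset("Ma Jun et al.", "20 CT patients", "Global"),
    Dataset("Chest CT COVID+ (MIDRC-RICORD-1a)", "120 CT patients", "Global"),
    Dataset("Zhang et al.", "90 CT volumes", "China"),
    Dataset("Soares, Eduardo et al.", "2482 CT slices", "Brazil"),
    Dataset("Yang et al.", "812 CT slices", "China"),
    Dataset("iCTCF", "256356 CT slices", "China"),
]


def by_country(catalog, country):
    """Return the names of datasets whose country column matches exactly."""
    return [d.name for d in catalog if d.country == country]


def with_modality(catalog, keyword):
    """Return the names of datasets whose size/modality text mentions keyword."""
    return [d.name for d in catalog if keyword in d.size_and_modality]


print(by_country(CATALOG, "China"))
print(len(with_modality(CATALOG, "X-ray")), "datasets include X-rays")
print(len(with_modality(CATALOG, "CT")), "datasets include CT")
```

The modality filter is a plain case-sensitive substring match on the table text, which is enough for this short list but would need a proper modality field for anything larger.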

British Society of Thoracic Imaging

The database was compiled by the British Society of Thoracic Imaging (BSTI). It contains medical imaging data from 59 patients in the UK, along with clinical data that includes PCR results. The data is available online and is free to view and use for educational purposes.

AIforCOVID imaging

The AIforCOVID imaging dataset was obtained from CDI Centro Diagnostico Italiano and Bracco Imaging (Milan). It hosts 983 DICOM chest X-rays of COVID-19 patients from Italy, together with related clinical data. After registering on the website, users can download the data and use it for commercial, scientific and educational purposes.

COVID-19 Open Initiative

The data is compiled by Darwin AI Corp., Canada, the Vision and Image Processing Research Group at the University of Waterloo, Canada, and others. It is collected from various publicly available datasets such as Cohen et al., MIDRC-RICORD-1a, the RSNA pneumonia dataset on Kaggle, the COVID-19 radiography database on Kaggle, and so on. As of the latest data update on March 19, 2021, there are 16352 images, including 2358 positive COVID-19 cases. The data download and project information are provided on the GitHub page of COVID-NET. The data is free to use for research and educational purposes.

Radiopaedia 

The dataset is compiled by a global team of radiologists and other health professionals and is available at radiopaedia.org. It contains axial chest CTs of 101 patients from around the world. Clinical data and PCR test results are available for some patients. The data is provided under a modified Creative Commons license.

Eurorad database

The dataset is collected by Eurorad, a peer-reviewed educational tool of the European Society of Radiology. It contains chest X-rays and CT scans of 50 COVID-19 patients from around the world. The images are in JPG/PDF format, and clinical information with PCR status is provided. The data is licensed under Creative Commons Attribution Noncommercial Share-Alike 4.0 (CC BY-NC-SA 4.0).

BIMCV-COVID19+ Dataset

The data is compiled by the BIMCV Medical Imaging Databank of the Valencia Region, Antonio Pertusa and Maria de la Iglesia Vaya. The dataset contains 2265 images and 163 CT studies of COVID-19 patients, along with their radiographic findings. The dataset is free to use for research purposes.

Società Italiana di Radiologia Medica (SIRM)

The dataset is collected by the Italian Society of Radiology. The database hosts axial CT images of 68 COVID-19 patients from Italy. The images are in JPG format, and the data can be used for non-commercial research.

Cohen et al.

The dataset is collected by Joseph Paul Cohen (Université de Montréal, CA). As of today, it contains 931 images from 461 patients and 20 CT volumes, covering diagnoses such as bacterial pneumonia, viral pneumonia, COVID-19, fungal pneumonia, SARS, and so on. The dataset is compiled from different sources, including Eurorad, Radiopaedia, SIRM and various publications. The data is in JPG and NIfTI format and can be downloaded from the GitHub repository. The data is free to use for non-commercial purposes.

MosMed COVID-19 Chest CT 

MosMedData is collected by the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department (MosMed). The dataset contains CT images of 110 patients from Russia diagnosed with COVID-19. The data is available in Neuroimaging Informatics Technology Initiative (NIfTI) format. Use of the data is governed by the Creative Commons Attribution Noncommercial No Derivatives 4.0 (CC BY-NC-ND 4.0) license.

Zhao et al.

The dataset is collected by Jinyu Zhao, Yichen Zhang, Xuehai He and Pengtao Xie, who are associated with the University of California San Diego, US. The dataset has 349 CT images from 216 patients, collected from preprint articles. The images are in PNG format, and medical history and PCR status are available for some cases. The data can be downloaded from a GitHub link and is free to use for non-commercial research.
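Because this dataset has more images than patients (349 images from 216 patients), images from the same patient could end up in both the training and test sets if the data is split image by image. A common precaution is to split by patient instead. The sketch below illustrates that idea under stated assumptions: the `(patient_id, image_path)` record layout, the function name `split_by_patient` and the example records are hypothetical and do not reflect how the dataset is actually organized.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) records so no patient is in both sets."""
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patient IDs (not images) with a fixed seed for repeatability.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Hold out at least one patient for testing.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [(p, img) for p, img in records if p not in test_ids]
    test = [(p, img) for p, img in records if p in test_ids]
    return train, test


# Hypothetical records: 12 images spread over 5 patients.
records = [(f"patient_{i % 5}", f"scan_{i}.png") for i in range(12)]
train, test = split_by_patient(records)

train_patients = {p for p, _ in train}
test_patients = {p for p, _ in test}
print(len(train), "training images,", len(test), "test images")
print("patients in both sets:", train_patients & test_patients)
```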

Coronacases.org

The dataset is collected by Radiology and Artificial Intelligence One-Step Shop (RAIOSS) and Livon Saúde (Brazil), and Rodrigo Caruso Chate (Hospital Israelita Albert Einstein, Brazil). It contains CT images of 10 patients with COVID-19 obtained from Wenzhou Medical University, China. The data is available online and is free for everyone to use.

Medicalsegmentation.com

Dataset I 

The dataset is compiled by Håvard Bjørke Jenssen (University Hospital of Oslo, NO) and is available on medicalsegmentation.com. It contains 100 axial CT images from more than 40 Italian patients with COVID-19, converted to NIfTI format from the JPG images of the Italian Society of Radiology. Clinical information, including PCR status, is available for some cases. The data is free to use for non-commercial purposes.

Dataset II

The dataset is compiled by Håvard Bjørke Jenssen (University Hospital of Oslo, NO) and is available on medicalsegmentation.com. It contains segmented axial volumetric CTs of 9 patients from around the world, obtained from Radiopaedia. Clinical information, including PCR status, is available for some cases. The data is provided under a modified Creative Commons license.

Ma Jun et al.

The dataset is collected by Ma Jun (Nanjing University of Science and Technology, China) et al. It contains labeled COVID-19 CT scans of 20 patients from around the world. The data is in NIfTI format and is licensed under Creative Commons Attribution Noncommercial Share-Alike 2.0 (CC BY-NC-SA 2.0).

Chest CT COVID+ (MIDRC-RICORD-1a)

The dataset is collected by the Radiological Society of North America (RSNA) and the Society of Thoracic Radiology (STR). Called the RSNA International COVID-19 Open Radiology Database (RICORD), it contains 120 thoracic CT scans obtained globally, with detailed segmentation and diagnostic labels. The images are in DICOM format and can be downloaded from The Cancer Imaging Archive. The data is free to use for non-commercial purposes.

Zhang et al. 

The dataset is compiled by the China Consortium of Chest CT Image Investigation (CC-CCII). It contains 90 CT volumes and some segmentation masks of background, lung field, ground-glass opacity (GGO) and consolidation. The images are classified into COVID-19, common pneumonia and normal. The dataset can be downloaded from the China National Center for Bioinformation website and is free to use for research purposes.

Soares, Eduardo et al.

The dataset, compiled by Soares, Eduardo et al., contains 2482 CT slices obtained from hospitals in Sao Paulo, Brazil. Of these, 1252 CT scans are positive for COVID-19 infection and 1230 are negative for the virus. The dataset can be downloaded for free from the Kaggle website.

Yang et al. 

The dataset is collected from 760 preprints about COVID-19 on medRxiv and bioRxiv, posted between January 19 and March 25, 2020. The images are extracted from the PDF files of the preprints. The dataset contains 349 COVID-19 CT images and 463 non-COVID-19 CT images. The data can be downloaded from the GitHub repository and is free to use for research purposes.
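The class counts quoted for the Soares et al. and Yang et al. datasets can be used to check how balanced each dataset is before training a classifier. The sketch below computes the totals and a common "balanced" inverse-frequency class weight, total / (number of classes × class count). Applying this heuristic here is an assumption made for illustration, not something the dataset authors prescribe, and the function name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count) per class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts as quoted in this article.
soares = {"covid": 1252, "non_covid": 1230}
yang = {"covid": 349, "non_covid": 463}

for name, counts in [("Soares et al.", soares), ("Yang et al.", yang)]:
    total = sum(counts.values())
    weights = balanced_class_weights(counts)
    rounded = {label: round(w, 3) for label, w in weights.items()}
    print(name, "total:", total, "weights:", rounded)
```

A weight above 1 marks the less frequent class. The Soares et al. counts are close to balanced, while the Yang et al. dataset has fewer COVID-19 images than non-COVID-19 images.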

iCTCF 

The dataset is collected from Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China. The dataset contains 256356 chest CT images and some clinical features from 1170 patients. The dataset is available to download on the iCTCF website at http://ictcf.biocuckoo.cn/ and is available under a CC BY-NC 4.0 license.

Fig. The image, taken from the Radiopaedia website, shows chest X-rays of a patient with a positive COVID-19 PCR test. Findings similar to those of pneumonia are seen on chest radiographs and CT scans of patients diagnosed with COVID-19. Such abnormalities in chest X-rays can be detected by a deep learning-based algorithm. However, for patients early in the course of the disease, chest radiographs appear normal in most cases. Moreover, deep learning algorithms are notorious for their limited ability to generalize. Nevertheless, chest radiograph findings can be vital in tracking the progression of the disease.

References:

  1. https://www.eibir.org/covid-19-imaging-datasets/
  2. https://aimi.stanford.edu/resources/covid19
  3. https://www.medrxiv.org/content/10.1101/2020.11.07.20227504v1.full.pdf
  4. https://bit.ly/BSTICovid19_Teaching_Library
  5. https://aiforcovid.radiomica.it/
  6. https://medium.com/@sheldon.fernandez/covid-net-an-open-source-neural-network-for-covid-19-detection-48b8a55e6d44
  7. https://radiopaedia.org/articles/covid-19-3
  8. https://www.eurorad.org/advanced-search?search=COVID
  9. https://osf.io/nh7g8/
  10. https://www.sirm.org/en/category/articles/covid-19-database/
  11. https://github.com/ieee8023/covid-chestxray-dataset
  12. https://mosmed.ai/datasets/covid19_1110
  13. https://github.com/UCSD-AI4H/COVID-CT
  14. https://coronacases.org/
  15. http://medicalsegmentation.com/covid19/
  16. https://zenodo.org/record/3757476
  17. https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742
  18. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196900/
  19. https://doi.org/10.1101/2020.04.24.20078584
  20. https://arxiv.org/abs/2003.13865
  21. https://www.researchsquare.com/article/rs-21834/v1
  22. https://josephpcohen.com/
  23. https://www.cancerimagingarchive.net/
  24. https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data
  25. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
  26. https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md
  27. http://ncov-ai.big.ac.cn/download?lang=en
  28. https://www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset
  29. https://github.com/UCSD-AI4H/COVID-CT
  30. http://ictcf.biocuckoo.cn/
  31. https://creativecommons.org/licenses/by-nc/4.0/
  32. https://alexswong.github.io/COVID-Net/