
Figure: Examples of colon tissue images acquired with a multiphoton microscope
The dataset consists of 44 lesion samples obtained by colonoscopies and colectomies carried out between 2012 and 2017 at the Digestive Service of Basurto University Hospital (OSI Bilbao Basurto). They comprise 23 malignant neoplasms (adenocarcinomas), 19 preneoplastic lesions (adenomas) and 2 hyperplasias, obtained from 23 men and 18 women. The samples were diagnosed by the Pathological Anatomy Department of OSI Bilbao Basurto, and the FFPE (Formalin-Fixed Paraffin-Embedded) blocks were stored in the Basque Biobank. Samples were processed after Informed Consent had been signed and following standard operating procedures. Their use in the PICCOLO project was approved by the Ethics Committee of the Basque Country on August 22, 2019. Samples were sent to Florence and scanned with a multiphoton microscope (MPM) between November 2019 and January 2020. The dataset is provided as a .zip file.
The dataset includes the two-photon fluorescence (TPF) images acquired with the multiphoton microscope and converted to PNG file format. Scanning each sample produces several image tiles, each corresponding to a tissue area of 511 × 511 µm². The images are grouped into two classes: benign, for images of tissue samples with hyperplasia or adenoma; and malignant, for images of tissue samples with adenocarcinoma.
The resulting dataset comprises 14,712 images of 1024 × 1024 px (in the ./images folder). It is well balanced, with 6,985 images (47.5%) from benign lesions (in the /benign subfolder) and 7,727 images (52.5%) from malignant lesions (in the /malignant subfolder).
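As a quick sanity check after unpacking the .zip file, the per-class counts above can be compared against what is actually on disk. The following is a minimal sketch, not part of the dataset distribution: the function names `count_images` and `class_fractions` are hypothetical, the path passed to `count_images` is assumed to be the unpacked ./images folder, and the extension is matched case-insensitively because the exact case of the file extension is not stated here.

```python
from pathlib import Path

# Image counts per class as published in the dataset description.
EXPECTED = {"benign": 6985, "malignant": 7727}


def count_images(images_root):
    """Count PNG files under <images_root>/benign and <images_root>/malignant.

    Files are searched recursively because each sample has its own folder
    inside the class subfolder. A missing class folder counts as 0.
    """
    root = Path(images_root)
    counts = {}
    for label in EXPECTED:
        class_dir = root / label
        if not class_dir.is_dir():
            counts[label] = 0
            continue
        counts[label] = sum(
            1
            for p in class_dir.rglob("*")
            if p.is_file() and p.suffix.lower() == ".png"
        )
    return counts


def class_fractions(counts):
    """Return the share of each class, or 0.0 for every class if empty."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    found = count_images("./images")  # assumed location of the unpacked data
    for label, n in found.items():
        status = "ok" if n == EXPECTED[label] else "MISMATCH"
        print(f"{label}: found {n}, expected {EXPECTED[label]} ({status})")
```

Calling `class_fractions(EXPECTED)` gives the class shares implied by the published counts, which is useful when choosing class weights or a stratified split.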
The image files of each sample are located in a folder (within the /benign or /malignant subfolder) named PICCOLO_PATIENT_ID_LESION_ID_MODALITY. The files follow this naming convention:
PICCOLO_PATIENT_ID_LESION_ID_MODALITY_SECTION_ID_TILE_ID.EXTENSION where
- PATIENT_ID = a different number per patient
- LESION_ID = a number that identifies the lesion; in general there is only one per patient, but there can be two
- MODALITY = TPF (two-photon fluorescence)
- SECTION_ID = a number that identifies the scanned tissue section the image belongs to; in general, only one tissue section per sample has been scanned, but for a few samples several sections have been scanned
- TILE_ID = tile identifier in the format p00z000yNNxMM, where NN and MM are the y and x coordinates of the tile in the slide, starting at y01x01 in the top-left corner of the slide
- EXTENSION = PNG
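The naming convention above can be parsed with a regular expression, for example to group tiles by patient (so that tiles from the same patient do not end up in both training and test sets) or to recover each tile's position in the slide. This is a hedged sketch: the `TileName` type, the `parse_tile_name` function and the example file name are hypothetical, and the pattern assumes that all ID fields are purely numeric and that the `p` and `z` parts of TILE_ID are digit runs, which the description does not state explicitly.

```python
import re
from typing import NamedTuple


class TileName(NamedTuple):
    patient_id: str
    lesion_id: str
    modality: str
    section_id: str
    tile_row: int  # NN in yNN, counted from 1 at the top of the slide
    tile_col: int  # MM in xMM, counted from 1 at the left of the slide
    extension: str


# Assumption: numeric IDs of any width; extension matched as written.
_TILE_PATTERN = re.compile(
    r"^PICCOLO_(?P<patient>\d+)_(?P<lesion>\d+)_(?P<modality>[A-Za-z]+)_"
    r"(?P<section>\d+)_p\d+z\d+y(?P<row>\d+)x(?P<col>\d+)\.(?P<ext>\w+)$"
)


def parse_tile_name(filename: str) -> TileName:
    """Split a tile file name into its fields; raise ValueError if it does not match."""
    match = _TILE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return TileName(
        patient_id=match["patient"],
        lesion_id=match["lesion"],
        modality=match["modality"],
        section_id=match["section"],
        tile_row=int(match["row"]),
        tile_col=int(match["col"]),
        extension=match["ext"],
    )


# Hypothetical file name, made up for illustration only.
example = parse_tile_name("PICCOLO_012_01_TPF_01_p00z000y03x07.png")
```

IDs are kept as strings so that any leading zeros survive, while the tile coordinates are converted to integers so that neighbouring tiles can be found by simple arithmetic.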
To access these images, fill out this form: https://forms.office.com/r/ycdFxGaMEE or contact the Basque Biobank at solicitudes.biobancovasco@bioef.eus, which will inform you of the conditions for access.
Please cite the following reference when using this collection:
Terradillos E., Saratxaga C. L., Mattana S., Cicchi R., Pavone F. S., Andraka N., Glover B. J., Arbide del Rio N., Velasco J., Etxezarraga M. C., Picon A. Analysis on the characterization of multiphoton microscopy images for malignant neoplastic colon lesion detection under deep learning methods. 2021.
Use of the dataset is restricted to research and educational purposes; commercial use is forbidden without prior written permission.