QGdata

PCOS-XAI Ultrasound: Real-World Training Dataset

Model Explainability, Computer Vision

$900

Sold 0
372.28MB

Data Identifier: D17527419222559798

Publish Time: 2025/07/17

Data Description

File Information

This dataset simulates the complexities of real-world medical data to provide hands-on experience in data cleaning and preprocessing for AI in healthcare. It deliberately retains common data issues found in real clinical settings, serving as a practical sandbox for:

  • 🚮 Data Cleaning Mastery: Tackle duplicates, inconsistent resolutions, and naming heterogeneity.
  • ⚖️ Class Imbalance Solutions: Experiment with techniques like SMOTE or augmentation.
  • 🎯 Medical AI Readiness: Prepare raw clinical data for XAI (Explainable AI) models.
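A simple baseline to compare against SMOTE or augmentation is random oversampling, which repeats minority-class samples until the classes are the same size. The sketch below is only an illustration: the function name `oversample_to_balance` and the `seed` parameter are assumptions, not part of the dataset or of any library.

```python
import random


def oversample_to_balance(samples_by_class, seed=0):
    """Randomly repeat items so every class matches the largest class.

    samples_by_class maps a label to a non-empty list of items
    (for example, image file paths). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    target = max(len(items) for items in samples_by_class.values())
    balanced = {}
    for label, items in samples_by_class.items():
        # Draw the missing number of items with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced[label] = list(items) + extra
    return balanced


# Toy example with placeholder file names.
toy = {"infected": ["a", "b", "c", "d"], "noninfected": ["x", "y"]}
balanced = oversample_to_balance(toy)
```

Oversampling is usually applied to the training split only, and after duplicates have been removed, so that repeated images do not leak into validation data.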

Content

11,784 ultrasound images, intentionally left uncleaned:

  • infected/: 6,784 images (PCOS-positive cases)
  • noninfected/: 5,000 images (Healthy ovaries)
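A quick first check after download is to count the images in each class folder and compare them with the figures above. This is a minimal sketch: the set of file extensions is an assumption (the listing does not state the image format), and the function names are illustrative.

```python
from pathlib import Path

# Assumed extensions; the listing does not say which image formats are used.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(root):
    """Return {class_folder_name: image_count} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for p in class_dir.rglob("*")
                if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


def class_shares(counts):
    """Convert raw counts into percentage shares of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Shares implied by the counts stated in the listing.
listed = {"infected": 6784, "noninfected": 5000}
shares = class_shares(listed)
```

Running `count_images` on the extracted dataset folder and comparing the result with `listed` shows whether the download is complete.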

🚩 Real-World Challenges Included:

  1. Duplicates: 1,956+ groups of identical images (intra-class & cross-class)
  2. Multi-Resolution Mix: Images from 255x247px to 984x848px
  3. Metadata Gaps: No clinical patient data (simulating HIPAA-restricted scenarios)
  4. Class Imbalance: roughly 57.6% PCOS-positive vs 42.4% healthy (6,784 vs 5,000 images)
  5. Noise Artifacts: Blurring, rotation variants, and naming inconsistencies
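A starting point for the duplicate challenge is to group files by a hash of their contents. The sketch below assumes "identical" means byte-for-byte identical files; blurred or rotated variants would not be caught and would need a perceptual or pixel-level comparison instead. The function names are illustrative.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root):
    """Group files under root by content hash; keep groups with 2+ files."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]


def is_cross_class(group, root):
    """True if a group's files sit in more than one top-level class folder."""
    classes = {p.relative_to(root).parts[0] for p in group}
    return len(classes) > 1
```

Cross-class groups deserve special attention: the same image filed under both infected/ and noninfected/ is a label conflict, not just redundancy.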

Sources & Methodology

  • Curated from: Retrospective ultrasound studies across 3 clinics (2018-2022)
  • Ethical Compliance: Patient identifiers removed, DICOM metadata stripped
  • Annotated: By radiology residents under consultant supervision

Inspiration

Born from the need to bridge the gap between:

  • 📊 Clean Tutorial Datasets (MNIST, CIFAR)
  • 🏥 Messy Real Clinical Data

Use this to practice the unglamorous but critical 80% of ML work: data wrangling!

Potential Applications

  • 🚮 Data cleaning pipelines for medical imaging
  • 🔍 Duplicate detection algorithms
  • ⚖️ Benchmarking class-balancing techniques
  • 📏 Resolution standardization methods
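For resolution standardization, one common approach is letterboxing: scale each image so its longer side fits a fixed square, then pad the shorter side. The sketch below computes only the geometry, so it needs no imaging library. The 224-pixel target and the function name are assumptions chosen for illustration, not a requirement of the dataset.

```python
def letterbox_geometry(width, height, target=224):
    """Compute the resized size and padding to fit an image in a square.

    Returns (new_width, new_height, pad_left, pad_top, pad_right, pad_bottom).
    The aspect ratio is preserved; padding is split as evenly as possible.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


# The smallest and largest resolutions mentioned in the listing.
smallest = letterbox_geometry(255, 247)
largest = letterbox_geometry(984, 848)
```

The returned values can be passed to whichever image library performs the actual resize and padding. Padding rather than stretching avoids distorting the anatomy shown in the scan.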

Note to Practitioners

"The true test of an ML engineer isn't model architecture choice, but transforming messy data into trainable gold." Use these imperfections as your training ground.
