Publicly Available Dataset Sources
A concise overview of public datasets and repositories across government, finance, healthcare, NLP, and computer vision, along with general-purpose repositories such as AWS Public Datasets.
Popular Dataset Repositories:
Google Dataset Search: A search engine to find datasets across the web.
Kaggle: A platform for data science competitions with a vast collection of datasets.
UCI Machine Learning Repository: A collection of datasets for machine learning research.
AWS Public Datasets (now the Registry of Open Data on AWS): A registry of datasets hosted on Amazon Web Services.
Datasets for Specific Domains:
Computer Vision: ImageNet, CIFAR-10, MNIST
Natural Language Processing: Wikipedia, Common Crawl, Gutenberg Corpus
Healthcare: MIMIC-III, PhysioNet
Finance: Yahoo Finance, Quandl (now Nasdaq Data Link)
Other Resources:
Papers With Code: A website that links research papers with their corresponding code and datasets.
Awesome Public Datasets: A curated list of datasets on GitHub.
Accessing Datasets in Colab:
You can load these datasets into a Colab notebook in several ways:
Downloading: Download the dataset directly from the source and upload it to your Colab environment.
Mounting Google Drive: Mount your Google Drive to Colab and access datasets stored there.
Using APIs: Many platforms provide APIs to access their datasets directly within Colab.
Using Libraries: Some libraries, like TensorFlow Datasets, provide pre-built functions to load popular datasets.
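The access methods above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive recipe: the helper names (download_file, read_csv_rows, mount_drive, load_with_tfds) are hypothetical, the mount point /content/drive is the conventional Colab location, and "mnist" is assumed to be the dataset wanted from TensorFlow Datasets. The Drive and TensorFlow Datasets helpers import their libraries lazily, so they only work where google.colab and tensorflow_datasets are installed.

```python
import csv
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Download `url` to the local path `dest` and return it as a Path."""
    path = Path(dest)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve writes the response body straight to disk.
    urllib.request.urlretrieve(url, str(path))
    return path


def read_csv_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive; only works inside a Colab runtime."""
    from google.colab import drive  # available in Colab only

    drive.mount(mount_point)
    return mount_point


def load_with_tfds(name="mnist", split="train"):
    """Load a dataset by name through TensorFlow Datasets."""
    import tensorflow_datasets as tfds  # requires tensorflow-datasets

    # as_supervised=True yields (input, label) pairs.
    return tfds.load(name, split=split, as_supervised=True)
```

In a notebook, a call such as read_csv_rows(download_file(url, "/content/data/sample.csv")) would fetch and parse a CSV, where url stands for whatever direct download link the dataset's source provides. After mount_drive(), files stored in Drive typically appear under /content/drive/MyDrive.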