IncluSet: A data surfacing repository for accessibility datasets. (Summary)
Kacorri, H., Dwivedi, U., Amancherla, S., Jha, M., & Chanduka, R. (2020). IncluSet: A data surfacing repository for accessibility datasets. In T. Guerreiro, H. Nicolau, & K. Moffatt (Eds.), ASSETS ’20: The 22nd International ACM SIGACCESS Conference on Computers and Accessibility (pp. 1-4, No. 72). New York: ACM. DOI: https://doi.org/10.1145/3373625.3418026 PMCID: PMC8375514
Data is at the heart of technologies that use artificial intelligence (AI) and machine learning. Because data is used to train models and to capture the variability of real-world phenomena, leaving out data generated by people who represent the full range of human diversity can produce AI and machine learning models that recognize and understand some aspects of the real world but not others.
One challenge is the scarcity of large datasets generated by people with disabilities and older adults that can be used for AI-infused technologies. There are a number of reasons for this scarcity: smaller populations of users, wide variability of user characteristics even within one disability group, lack of expertise for data annotation, and privacy concerns. Even when data is collected and made publicly available, it is often difficult for researchers and developers of new technologies to locate it. It is important to make the datasets that do exist more visible and to facilitate dataset sharing.
This paper describes the creation of IncluSet, an accessibility data sharing repository, where researchers and developers can discover and link to datasets that include data generated by people with disabilities and older adults and that can be used for machine learning models.
Launched in July 2020, IncluSet stores metadata about the datasets, rather than the datasets themselves. Each dataset entry has an image, a title, the dataset creators, a short description of the dataset, the year it was created or released, the number of people who contributed to it, one or more disability categories describing the population of interest, and a list of data types. Users can reach the dataset through a direct link when it is available for download, read the paper where the data is described, or contact the dataset creators if contact information is available.
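To make the shape of an entry concrete, the metadata fields listed above could be modeled roughly as follows. This is only an illustrative sketch, not IncluSet's actual schema or code: the class name `DatasetRecord`, all field names, the helper `find_by_category`, and the choice to treat the image, download link, paper link, and contact details as optional are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are assumptions, not IncluSet's schema."""
    title: str
    creators: List[str]
    description: str
    year: int                         # year the dataset was created or released
    num_contributors: int             # number of people who contributed data
    disability_categories: List[str]  # one or more populations of interest
    data_types: List[str]
    image_url: Optional[str] = None
    download_url: Optional[str] = None  # present only if the data can be downloaded
    paper_url: Optional[str] = None     # paper where the data is described
    contact: Optional[str] = None       # present only if creators list contact info

    def access_routes(self) -> List[str]:
        """Return the ways a user could reach this dataset, in a fixed order."""
        routes = []
        if self.download_url:
            routes.append("download")
        if self.paper_url:
            routes.append("paper")
        if self.contact:
            routes.append("contact")
        return routes


def find_by_category(records: List[DatasetRecord], category: str) -> List[DatasetRecord]:
    """Case-insensitive filter on disability category (hypothetical helper)."""
    wanted = category.strip().lower()
    return [
        r for r in records
        if any(c.lower() == wanted for c in r.disability_categories)
    ]
```

Because only metadata is stored, a record like this points to the data rather than containing it, which is why the access routes depend on which links the creators chose to provide.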
The repository is pre-populated with a total of 139 existing accessibility datasets that were manually located between 2018 and 2020. In addition to searching the datasets listed in IncluSet, researchers can link to their own datasets or can submit information about a dataset they find, helping surface datasets produced by others. Submitted datasets are reviewed by IncluSet moderators for completeness.
Update for 2022: IncluSet now includes 191 datasets. Researchers from both academia and industry have started using IncluSet to list their datasets, either directly by creating an account or by reaching out to us and pointing us to their datasets (e.g., the ORBIT dataset by Microsoft Research is now included in IncluSet). As more researchers, including those in industry, use IncluSet, it is increasing the availability of data generated by people with disabilities and older adults for accessibility research and engineering.
Learn more about IncluSet and other publications and presentations related to this work on the Inclusive AI project page. This research project is funded by the Inclusive Information and Communications Technology RERC (90REGE0008) from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), Administration for Community Living (ACL), Department of Health and Human Services (HHS). Learn more about the work of the Inclusive ICT RERC.