Designing a language-specific data collection platform with gamified results and open data

Erin M. Buchanan

Outline

  • Linguistic Data Collection
  • ManyLanguages Community
  • The Platform
  • Test Case

The World’s Languages

Science’s Languages

ManyLanguages

  • A big team science community supported by the Psychological Science Accelerator
  • Our mission is to facilitate connections among language science researchers in order to diversify the languages, participants, researchers, and projects represented in the language sciences.

The Platform

  • Open architecture / infrastructure for data collection and sharing
  • Dockerized platform that connects the participants, researchers, data collection, and sharing components

The Platform: Users

  • Participants:
    • Sign up and complete the studies
    • Review their results and the feedback on those results
  • Researchers:
    • Launch studies
    • Review results / download the data
    • Export anonymized datasets
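
One way the "export anonymized datasets" step could work is sketched below. This is a minimal illustration, not the platform's actual export code: the column names (`participant_id`, `name`, `email`, `ip_address`) and the `anonymize_rows` helper are hypothetical, and the slides do not describe the real schema. Note that salted hashing is strictly pseudonymization; a real export would also need a review of indirect identifiers.

```python
import hashlib
import secrets

# Hypothetical direct-identifier columns; the real schema is not given in the slides.
DIRECT_IDENTIFIERS = {"name", "email", "ip_address"}


def anonymize_rows(rows, salt):
    """Return copies of rows with direct identifiers dropped and
    participant IDs replaced by a salted SHA-256 pseudonym."""
    anonymized = []
    for row in rows:
        # Drop direct identifiers without modifying the input row.
        clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        raw = (salt + str(row["participant_id"])).encode("utf-8")
        # Same participant + same salt -> same pseudonym, so trials stay linkable.
        clean["participant_id"] = hashlib.sha256(raw).hexdigest()[:16]
        anonymized.append(clean)
    return anonymized


# The salt is kept private by the platform and never exported with the data.
salt = secrets.token_hex(16)
rows = [
    {"participant_id": 1, "email": "a@example.org", "language": "tr", "rt_ms": 512},
    {"participant_id": 1, "email": "a@example.org", "language": "tr", "rt_ms": 498},
    {"participant_id": 2, "email": "b@example.org", "language": "ko", "rt_ms": 610},
]
export = anonymize_rows(rows, salt)
```

Keeping the pseudonym stable within one export lets researchers group trials by participant without seeing who the participant is.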

The Platform: Users

  • Administrators:
    • Review uploaded studies for adherence to platform requirements
    • Ensure data sharing anonymity
    • Assist in feedback programming

The Platform: Tech Specs

The Platform: Example Screens

The Platform: Sustainability

  • The platform is “pay to close”: posting a study is free for researchers who use JATOS and share their data
  • Researchers can use other software and/or not share data by paying a small fee to post the study
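
The "pay to close" rule in the two bullets above can be written as a single check. This is only a sketch: the `StudySubmission` type and `posting_fee_required` function are hypothetical names, and the slides do not state the fee amount.

```python
from dataclasses import dataclass


@dataclass
class StudySubmission:
    # Hypothetical fields describing what a researcher declares when posting.
    uses_jatos: bool
    shares_data: bool


def posting_fee_required(study: StudySubmission) -> bool:
    """Posting is free only when the study both runs on JATOS and shares its data."""
    return not (study.uses_jatos and study.shares_data)


open_study = StudySubmission(uses_jatos=True, shares_data=True)
closed_study = StudySubmission(uses_jatos=True, shares_data=False)
```

Under this reading, `open_study` posts for free, while `closed_study` would pay the fee because it withholds its data.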

The Platform: Test Case

  • 45 languages across the globe including multiple writing systems
  • Collect priming data to “finish” languages based on previous power estimates
  • Collect subjective data: age of acquisition, familiarity, valence, arousal, concreteness, and imageability

Final Thoughts

  • Linguistics is ripe for studies in this area to increase the breadth of languages, researchers, and participants included
  • Digital infrastructure can advance open, global, and interdisciplinary science
  • Citizen science can engage the public, given the growing interest in language driven by AI / LLMs

Final Thoughts

  • Go visit Addie’s poster to learn more!
  • Join us for the study:
    • Authorship on three research papers as part of dissertation
    • Other side projects expected
    • Funding is available for low-resource languages
  • Expected to start in 2026, with data collection continuing throughout the year based on translation / lab timelines