A Collection of Tools to Help Shape the Digital Future of the Manx Language

Speech Transcription Data

This page brings together information and links related to Manx speech transcription resources, with a particular emphasis on open evaluation sets and research-ready data. The goal is to support the development of speech technologies for Manx, including automatic speech recognition (ASR), text-to-speech (TTS), and machine translation (MT).

The original audio files are not hosted here, both because of file size constraints and because some recordings may be subject to copyright or restrictive licensing conditions. Where the original recordings are publicly available, a link to the source is provided alongside the transcriptions, usually in a metadata.tsv file.
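As a rough illustration of how a metadata.tsv file might be used to find the source recordings, here is a minimal Python sketch. The column names (utterance_id, transcription, source_url) and the sample rows are assumptions made for this example only; check the header row of each dataset's own metadata file for the actual layout.

```python
import csv
import io

# Hypothetical layout: these column names and rows are placeholders for
# illustration, not the documented schema of any dataset listed on this page.
SAMPLE_TSV = (
    "utterance_id\ttranscription\tsource_url\n"
    "utt_0001\texample transcription one\thttps://example.org/recording-1\n"
    "utt_0002\texample transcription two\t\n"
)


def read_metadata(handle):
    """Parse a tab-separated metadata file into a list of dicts, one per row."""
    # QUOTE_NONE keeps quotation marks inside transcriptions untouched.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def rows_with_source(rows, url_column="source_url"):
    """Keep only the rows that carry a non-empty link to the original audio."""
    return [row for row in rows if (row.get(url_column) or "").strip()]


rows = read_metadata(io.StringIO(SAMPLE_TSV))
linked = rows_with_source(rows)
for row in linked:
    print(row["utterance_id"], row["source_url"])
```

To read a real file instead of the in-memory sample, pass an open file handle, for example open("metadata.tsv", encoding="utf-8", newline=""), to read_metadata.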


Why Build Manx Speech Datasets?

Manx is a low-resource language, making the development of reliable speech and language technologies difficult without carefully curated data. Publicly available transcription datasets help by:


Loayr is the first segmented speech corpus for Manx, offering manually validated and automatically segmented transcriptions across a range of domains. It supports robust evaluation of ASR systems and is split into training, development, and test sets with consistent metadata and formatting.

For details on the data format, corpus statistics, and experimental results, see the Loayr repository.
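For readers new to ASR evaluation, the sketch below shows word error rate (WER), a metric commonly used to score a system's output against a reference transcription from a test set. It is a generic illustration and not necessarily the scoring procedure used in the Loayr experiments; the function name and the whitespace tokenisation are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Tokenisation is a plain whitespace split, with no case folding or
    punctuation handling. That is a simplifying assumption for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

In practice, scores are usually reported over a whole test set by summing the edit counts across all utterances and dividing by the total number of reference words, rather than by averaging per-utterance rates.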


Contributing or Requesting Data

Due to size and licensing restrictions, this repository does not host the audio files directly. For data access or to contribute new speech recordings, please contact:

📧 csjbartley1@sheffield.ac.uk

We welcome:


Licensing and Attribution

All data referenced in this project has been sourced from publicly available materials and is distributed in accordance with the original creators’ licensing terms. For detailed attribution, consult the metadata files in each dataset.