Expressive Anechoic Recordings of Speech (EARS) https://github.com/facebookresearch/ears_dataset

EARS Dataset

We release the Expressive Anechoic Recordings of Speech (EARS) dataset.

If you use the dataset or any derivative of it, please cite our paper:

@inproceedings{richter2024ears,
  title={{EARS}: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation},
  author={Richter, Julius and Wu, Yi-Chiao and Krenn, Steven and Welker, Simon and Lay, Bunlong and Watanabe, Shinji and Richard, Alexander and Gerkmann, Timo},
  booktitle={Interspeech},
  year={2024}
}

For audio samples or scripts to generate the speech enhancement benchmarks, please visit the project page.

Highlights

  • 100 h of speech data from 107 speakers
  • high-quality recordings at 48 kHz in an anechoic chamber
  • high speaker diversity, with speakers of different ethnicities and ages ranging from 18 to 75 years
  • full dynamic range of human speech, ranging from whispering to yelling
  • 18 minutes of freeform monologues per speaker
  • sentence reading in 7 different reading styles (regular, loud, whisper, high pitch, low pitch, fast, slow)
  • emotional reading and freeform tasks covering 22 different emotions for each speaker

Download EARS Dataset

using bash

for X in $(seq -w 001 107); do
  curl -L https://github.com/facebookresearch/ears_dataset/releases/download/dataset/p${X}.zip -o p${X}.zip
  unzip p${X}.zip
  rm p${X}.zip
done

using python

Run the EARS download script from the repository root:

python download_ears.py
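
For reference, the per-speaker archive URLs requested by the bash loop above can also be generated in Python. This is only a minimal sketch: the helper name `ears_archive_urls` is hypothetical, it performs no download, and it is not necessarily how `download_ears.py` is implemented.

```python
# Base URL taken from the bash loop above.
BASE_URL = "https://github.com/facebookresearch/ears_dataset/releases/download/dataset"


def ears_archive_urls(first=1, last=107):
    """Return the archive URLs for speakers p<first> .. p<last> (hypothetical helper)."""
    # Speaker IDs are zero-padded to three digits, matching `seq -w 001 107`.
    return [f"{BASE_URL}/p{i:03d}.zip" for i in range(first, last + 1)]


if __name__ == "__main__":
    urls = ears_archive_urls()
    print(len(urls))
    print(urls[0])
    print(urls[-1])
```

Passing a smaller range, for example `ears_archive_urls(1, 5)`, is a convenient way to fetch only a few speakers for a quick experiment.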

Download Blind Testset with Noisy Speech

using bash

curl -L https://github.com/facebookresearch/ears_dataset/releases/download/blind_testset/blind_testset.zip -o blind_testset.zip
mkdir blind_testset
unzip blind_testset.zip -d blind_testset
rm blind_testset.zip

using python

Run the blind test set download script from the repository root:

python download_blind_testset.py
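
The bash steps above (download, extract into a folder, delete the archive) can be approximated with the Python standard library alone. The function `download_and_extract` below is a hypothetical sketch under that assumption and does not claim to reproduce `download_blind_testset.py`.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the bash command above.
BLIND_TESTSET_URL = (
    "https://github.com/facebookresearch/ears_dataset/releases/download/"
    "blind_testset/blind_testset.zip"
)


def download_and_extract(url, dest_dir):
    """Download a zip archive, extract it into dest_dir, and discard the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The archive lives in a temporary directory, so it is removed automatically.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive.zip"
        urllib.request.urlretrieve(url, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    # Return the names of the top-level entries now present in dest_dir.
    return sorted(p.name for p in dest.iterdir())


# Example call (starts a real download, so it is left commented out):
# download_and_extract(BLIND_TESTSET_URL, "blind_testset")
```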

Statistics and Transcripts

The speaker statistics (age, ethnicity, gender, weight, height, native language) for the 107 speakers are collected in speaker_statistics.json.

Transcripts of the reading portions of the dataset are available in transcripts.json.
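
Both files are plain JSON and can be read with Python's `json` module. The sketch below makes no assumption about their internal layout: the hypothetical helper `describe_json` only reports the top-level type and the number of entries, which is a reasonable first step before writing code against specific keys.

```python
import json
from pathlib import Path


def describe_json(path):
    """Load a JSON file and return (top-level type name, entry count or None)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Only containers have a meaningful entry count.
    size = len(data) if isinstance(data, (dict, list)) else None
    return type(data).__name__, size


if __name__ == "__main__":
    # Assumes the script is run from the repository root, where both files live.
    for name in ("speaker_statistics.json", "transcripts.json"):
        if Path(name).exists():
            print(name, *describe_json(name))
```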

License

The code and dataset are released under the CC BY-NC 4.0 International license.