Open Data - Data Baker Technology (标贝科技)


Chinese Mandarin Female Corpus (10,000 Sentences)

For non-commercial use only. If you have any problems, please contact us: Fbd-data@data-baker.com

Speech synthesis is a technology that generates artificial speech by mechanical and electronic means. Text-to-speech (TTS) is a branch of speech synthesis that converts text, whether generated by the computer itself or supplied as external input, into intelligible and fluent spoken output.

TTS is one of the key technologies for spoken human-computer communication, and giving computers a human-like ability to speak is an important competitive market in today's information industry. Compared with automatic speech recognition (ASR), speech synthesis is relatively more mature and has a wider range of applications.

With the rapid development of the artificial intelligence industry, speech synthesis systems have become more widely used. Beyond the initial requirements of clarity and intelligibility, users have increasingly high expectations for the naturalness, rhythm, and sound quality of synthesized speech. The quality of the speech corpus is also a key factor in the quality of the synthesis.

[Chinese Mandarin Female Corpus] The speaker's voice is gentle and warm, in standard Mandarin, and conveys a positive feeling to listeners. All audio was recorded with professional equipment in a professional studio, with the recording environment and equipment kept unchanged throughout and an SNR of no less than 35 dB. Recordings are mono, 48 kHz, 16-bit, in PCM or WAV format.
The corpus texts are drawn from a variety of domains, including news, novels, science and technology, entertainment, and dialogue. The text design is based on comprehensive linguistic data and aims to cover syllables and phonemes, their types, tones, liaison, and prosody as fully as possible. The prosodic hierarchy is also annotated.
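
As a quick sanity check after download, the stated audio format (mono, 48 kHz, 16-bit) can be compared against each file's WAV header. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the layout of the returned dictionary are illustrative assumptions, not part of the dataset's tooling.

```python
import wave

# Expected values come from the corpus description above.
# The key names are hypothetical and chosen only for this sketch.
EXPECTED = {"channels": 1, "sample_rate_hz": 48000, "sample_width_bytes": 2}


def check_wav_format(path):
    """Return {field: (expected, actual)} for every header field that differs."""
    with wave.open(path, "rb") as wav:
        actual = {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16-bit
        }
    return {
        key: (EXPECTED[key], actual[key])
        for key in EXPECTED
        if actual[key] != EXPECTED[key]
    }
```

An empty dictionary means the header matches the stated format; any entry shows the expected and actual value for the field that differs.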

Research

Navigation

Smart Technology

Education

Technical Parameters

Content: Chinese Mandarin Female Database
Source: A comprehensive corpus covering syllables and phonemes, their types, tones, liaison, and prosody
Duration: 12 hours
Average sentence length: 16 characters
Language: Mandarin
Speaker: Female, aged 20-30, with an elegant and optimistic voice
Recording environment: Professional recording studio: 1) in line with professional standards; 2) unchanged recording environment and equipment; 3) SNR of no less than 35 dB
Recording equipment: Professional recording equipment and software
Recording format: Uncompressed PCM or WAV, 48 kHz sampling rate, 16-bit
Annotation content: Audio-text proofreading, prosody labeling, boundary labeling of Chinese initials and finals
Annotation format: Tagged text saved as .txt; boundary labels saved as .interval
Quality standards:
1. Audio files are saved in WAV format at 48 kHz, 16-bit, with consistent timbre, volume, and speed, and without zero drift or waveform clipping.
2. The character accuracy of the annotated text is higher than 99.8%.
3. Fewer than 1% of phoneme boundaries have an error greater than 10 ms; syllable boundary accuracy is higher than 98%.
Storage: FTP
File formats: Audio file: WAV; Text annotation file: TXT; Boundary annotation file: INTERVAL
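
The quality standards above can be expressed as simple measurements. The sketch below is illustrative only: it assumes "zero drift" means a non-zero mean sample value (DC offset), treats samples at the 16-bit limits as clipped, and compares labeled boundary times against a hypothetical second set of reference times, which the corpus itself does not provide. The function names are assumptions made for this example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clipping_and_dc_offset(samples):
    """Return (clipped_sample_count, mean_sample_value) for signed 16-bit samples."""
    samples = list(samples)
    if not samples:
        return 0, 0.0
    # Samples sitting exactly at the 16-bit limits are counted as clipped.
    clipped = sum(1 for s in samples if s <= INT16_MIN or s >= INT16_MAX)
    # A mean far from zero suggests zero drift (assumed here to mean DC offset).
    return clipped, sum(samples) / len(samples)


def boundary_error_rate(reference_s, labeled_s, tolerance_s=0.010):
    """Proportion of boundaries deviating from the reference by more than tolerance_s.

    Both inputs are equal-length sequences of boundary times in seconds.
    The 10 ms default mirrors the tolerance quoted in the standards.
    """
    if len(reference_s) != len(labeled_s):
        raise ValueError("boundary lists must have the same length")
    if not reference_s:
        return 0.0
    errors = sum(
        1 for ref, lab in zip(reference_s, labeled_s) if abs(ref - lab) > tolerance_s
    )
    return errors / len(reference_s)
```

Under these assumptions, a file meets the audio standard when the clipped count is zero and the mean is close to zero, and an annotation set meets the phoneme boundary standard when `boundary_error_rate` returns a value below 0.01.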

