Details on AURORA Databases
The notebook below shows the detailed description, downloadable licenses and link to the ELRA catalogue (when relevant) for each of the 9 AURORA Databases.
AURORA Project Database - Subset of SpeechDat-Car Italian database (AURORA/CD0003-05)
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Italian digits spoken in the following noise and driving conditions inside a car:
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use only: EUR 1000
AURORA Project Database - Subset of SpeechDat-Car German database (AURORA/CD0003-03)
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
This database is a subset of the German SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected German digits spoken in the following noise and driving conditions inside a car:
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000
AURORA Project Database - Subset of SpeechDat-Car Finnish database (AURORA/CD0003-01)
This database is a subset of the Finnish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Finnish digits spoken in the following driving conditions inside a car:
The database also contains the software needed to run simulations using Entropic's HTK, which has been adopted as the "standard" HMM recogniser for the Aurora standard evaluation.
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system.
The AURORA-5 database has been developed mainly to investigate how hands-free speech input in noisy room environments affects the performance of automatic speech recognition. Furthermore, two test conditions are included to study the influence of transmitting the speech in a mobile communication system.
The three earlier Aurora experiments focused on additive noise and the influence of some telephone frequency characteristics. Aurora-5 tries to cover all effects as they occur in realistic application scenarios. The focus was put on two scenarios. The first is hands-free speech input in a noisy car environment, with the intention of either controlling devices in the car itself or retrieving information from a remote speech server over the telephone. The second covers hands-free speech input in an office-type or living-room-type environment, to control, for example, a telephone device or some audio/video equipment.
The AURORA-5 database contains the following data:
Further information is also available at the following address: http://aurora.hsnr.de
Two original copies of the contract (word | pdf) must be sent to ELDA. To be valid these contracts must be initialled and signed. The user should annex to the contract the proof that he obtained the right to use the TI digits from LDC (ref. LDC93S10). This may be a signed licence agreement or a proof of membership payment for 1993.
Price for research use by academic organisations: Free
Price for research use by commercial organisations: EUR 250
An additional database has been released. It contains noisy versions of the Nov'92 WSJ0 development set.
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use by academic organisations: Free
Price for research use by commercial organisations: EUR 1000
AURORA Project Database - Subset of SpeechDat-Car Danish database (AURORA/CD0003-04)
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
This database is a subset of the Danish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Danish digits spoken in the following noise and driving conditions inside a car:
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000
AURORA Project Database - Subset of SpeechDat-Car Spanish database (AURORA/CD0003-02)
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
This database is a subset of the Spanish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Spanish digits spoken in the following noise and driving conditions inside a car:
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use by academic organisations: EUR 200
Price for research use by commercial organisations: EUR 1000
AURORA Project Database 2.0 (AURORA/CD0002)
The Aurora project is releasing a revised version of the Noisy TI digits database to follow on the work of ETSI. This CD set replaces the previous set (version 1.0 consisted of 2 CDs, while version 2.0 consists of 4 CDs).
This database is intended for the evaluation of front-end feature extraction algorithms in background noise, but it may also be used more widely by speech researchers to evaluate and compare the performance of noise-robust speech recognition algorithms.
Compared to version 1.0 the changes are as follows:
Two original copies of the contract (word | pdf) must be sent to ELDA. To be valid these contracts must be initialled and signed. The user should annex to the contract the proof that he obtained the right to use the TI digits from LDC (ref. LDC93S10). This may be a signed licence agreement or a proof of membership payment for 1993.
Price for research use by academic organisations: Free
Price for research use by commercial organisations: EUR 250
The Aurora project is now releasing a number of list files for performing training and testing on the Wall Street Journal (WSJ0) data at two sampling rates: 8 kHz and 16 kHz. The Aurora 4a database is based on WSJ0 with noise artificially added over a range of signal-to-noise ratios. It contains both clean and multicondition training sets and 14 evaluation sets with different noise types and microphones.
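To illustrate what artificially adding noise at a chosen signal-to-noise ratio means in general, here is a minimal sketch. It is not the Aurora tooling: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, and SNR is assumed to be the simple ratio of average sample power over the whole utterance. The actual database preparation may define speech and noise levels differently.

```python
import math

def average_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR (in dB).

    Assumption: SNR is the ratio of average powers over the whole signal.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = average_power(speech)
    p_noise = average_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose the gain so that p_speech / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(average_power(speech) / average_power(residual))
```

Calling `mix_at_snr(speech, noise, 10.0)` on two equal-length sample lists returns a mixture whose noise component sits 10 dB below the speech in average power; `measured_snr_db` can be used to check this.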
Two original copies of the contract (word | pdf) must be sent to ELDA.
Price for research use by academic organisations: Free
Price for research use by commercial organisations: EUR 1000