Downloads - SPEAR Challenge

Scripts

The following scripts are made available to participants to gauge their performance throughout the challenge. More details can be found in the README of the repository.

Datasets

The working datasets are split, for convenience, into Train, Development and Evaluation sets. To avoid downloading unnecessary data, each set is further divided into Core, Extra and Tascar files. Only the Core files are needed to run the challenge; the rest provide extra information about the scenes and how they were generated.

It is advised to decompress the archives on a drive with at least 100 GB of free disk space, running the command tar -xzf <desired-tar-file-to-decompress> from the desired output folder. This way, each new tar file is unpacked into the same directory. The complete unpacked structure is shown below. Be aware that unpacking takes some time.
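Where a command-line tar is not available, the unpacking step can also be sketched in Python. This is a minimal sketch and not part of the SPEAR scripts: the function name extract_archive is hypothetical, and it assumes the downloaded files are tar archives that Python's standard tarfile module can read (the compression is auto-detected).

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, output_dir):
    """Unpack one tar archive into output_dir (hypothetical helper).

    Calling this for every downloaded archive with the same output_dir
    mirrors running `tar -xzf` repeatedly from one output folder.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # "r:*" lets tarfile detect the compression (gzip, none, ...) by itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        # Recent Python versions provide extraction filters; use the safe
        # "data" filter when it is available, otherwise fall back silently.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(path=output_dir, **kwargs)
    return output_dir
```

A call such as extract_archive("desired-tar-file-to-decompress", "desired-output-folder") would then play the role of the tar command above; both arguments are placeholders.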

Download links:

Core files
- Core Train Dataset 1 (14 GB)
- Core Train Dataset 2 (20 GB)
- Core Train Dataset 3 (20 GB)
- Core Train Dataset 4 (24 GB, 35 GB decompressed)
- Core Dev set (24 GB, 34 GB decompressed)
Optional files
- Optional Train Metadata (155 MB)
- Optional Train Audio Dataset 2 (44 GB)
- Optional Train Audio Dataset 3 (44 GB)
- Optional Train Audio Dataset 4 (43 GB)
- Optional Dev set (43 GB)
Tascar simulation files
- Tascar Train set (5 GB)
- Tascar Dev set (2 GB)
ATFs file
- Device ATFs (63 MB)
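The sizes listed above can be added up to estimate how much space a given selection of files needs before starting a download. The sketch below only uses the compressed Core sizes from the list (decompressed sizes are given for two archives only, so the result is a lower bound on the space needed once unpacked). The names CORE_ARCHIVES_GB, total_size_gb and has_enough_space are hypothetical.

```python
import shutil

# Compressed sizes in GB, copied from the Core files list above.
CORE_ARCHIVES_GB = {
    "Core Train Dataset 1": 14,
    "Core Train Dataset 2": 20,
    "Core Train Dataset 3": 20,
    "Core Train Dataset 4": 24,
    "Core Dev set": 24,
}


def total_size_gb(archives):
    """Sum the listed archive sizes (in GB)."""
    return sum(archives.values())


def has_enough_space(path, required_gb):
    """Return True if the drive holding `path` has at least required_gb free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb
```

For example, has_enough_space(".", total_size_gb(CORE_ARCHIVES_GB)) checks whether the current drive can hold all compressed Core archives at once; how much more is needed depends on whether the archives are deleted after unpacking.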

For Windows users, the PowerShell command (new-object System.Net.WebClient).DownloadFile('desired-download-link','desired-download-location-pathfile.tar') can help bypass any download folder restriction.

Mac and Linux users can use the command curl 'desired-download-link' --create-dirs -o 'desired-download-location-pathfile.tar'.
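As a platform-independent alternative to the two commands above, the download can be sketched with Python's standard library. This is only an illustration under assumptions: the function name download_file is hypothetical, the URL and destination are placeholders exactly as in the commands above, and the sketch assumes the link can be fetched without authentication.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, destination, chunk_size=1024 * 1024):
    """Stream `url` to `destination`, creating parent folders if needed.

    Streaming in chunks avoids holding a multi-gigabyte archive in memory.
    """
    destination = Path(destination)
    # Equivalent in spirit to curl's --create-dirs option.
    destination.parent.mkdir(parents=True, exist_ok=True)

    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out, length=chunk_size)
    return destination
```

A call would look like download_file('desired-download-link', 'desired-download-location-pathfile.tar').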

Complete SPEAR data structure

┬── Main (directory): Contains all relevant information for SPEAR
│ ├── Train (directory): Contains the train datasets
│ │ ├── Dataset 1 (directory): EasyCom original
│ │ │ ├── Microphone_Array_Audio (directory): Contains the
│ │ │ │ │ microphone array audio for the AR glasses wearer
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── array_D#_S##_M##.wav (files): Six channels audio
│ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ ├── Array_Orientation (directory): Orientation of the array
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ori_D#_S##_M##.csv (files): Quaternions
│ │ │ │ Sampled 50ms/20Hz
│ │ │ └── DOA_target (directory): Direction of all sources
│ │ │ │ relative to the array
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── doa_D#_S##_M##_ID#.csv (files): Azimuth and elevation
│ │ │ Sampled 50ms/20Hz
│ │ │
│ │ ├── Dataset 2 (directory): EasyCom reproduced with Tascar
│ │ │ ├── Microphone_Array_Audio (directory): Contains the
│ │ │ │ │ microphone array audio for the AR glasses wearer.
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── array_D#_S##_M##.wav (files): Six channels audio
│ │ │ │ Sampled 48kHz, 32 bits float using HOA order 15
│ │ │ ├── Array_Orientation (directory): Orientation of the array
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ori_D#_S##_M##.csv (files): Quaternions
│ │ │ │ Sampled 50ms/20Hz
│ │ │ └── DOA_target (directory): Direction of all sources
│ │ │ │ relative to the array
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── doa_D#_S##_M##_ID#.csv (files): Azimuth and elevation
│ │ │ Sampled 50ms/20Hz
│ │ │
│ │ ├── Dataset 3 (directory): EasyCom augmented with Tascar
│ │ │ ├── Microphone_Array_Audio (directory): Contains the
│ │ │ │ │ microphone array audio for the AR glasses wearer.
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── array_D#_S##_M##.wav (files): Six channels audio
│ │ │ │ Sampled 48kHz, 32 bits float using HOA order 15
│ │ │ ├── Array_Orientation (directory): Orientation of the array
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ori_D#_S##_M##.csv (files): Quaternions
│ │ │ │ Sampled 50ms/20Hz
│ │ │ └── DOA_target (directory): Direction of all sources
│ │ │ │ relative to the array
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── doa_D#_S##_M##_ID#.csv (files): Azimuth and elevation
│ │ │ Sampled 50ms/20Hz
│ │ │
│ │ └── Dataset 4 (directory): EasyCom artificial with Tascar
│ │ ├── Microphone_Array_Audio (directory): Contains the
│ │ │ │ microphone array audio for the AR glasses wearer.
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── array_D#_S##_M##.wav (files): Six channels audio
│ │ │ Sampled 48kHz, 32 bits float using HOA order 15
│ │ ├── Array_Orientation (directory): Orientation of the array
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── ori_D#_S##_M##.csv (files): Quaternions
│ │ │ Sampled 50ms/20Hz
│ │ └── DOA_target (directory): Direction of single target source
│ │ │ relative to the array
│ │ └── Session_# (directories): The session directories
│ │ └── doa_D#_S##_M##_ID#.csv (files): Azimuth and elevation
│ │ Sampled 50ms/20Hz
│ │
│ └── Dev (directory): Contains the development datasets
│ └── Same as Train directory
│
├── Extra (directory): Contains additional information to understand
│ │ the datasets but not immediately necessary for the challenge
│ ├── Train (directory): Information on the train set
│ │ ├── Dataset 1 (directory): EasyCom original
│ │ │ ├── Reference_Audio (directory): Time aligned close mic
│ │ │ │ │ audio used as reference (monaural)
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ## (directories): Minute of the current session
│ │ │ │ └── ref_D#_S##_M##_ID#.wav (files): One channel audio
│ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ ├── Reference_PosOri (directory): Reference positions and orientations
│ │ │ │ │ of all participants
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ## (directories): Minute of the current session
│ │ │ │ ├── refPos_D#_S##_M##_ID#.csv (files): Cartesian position of ID#
│ │ │ │ └── refOri_D#_S##_M##_ID#.csv (files): Quaternion orientation of ID#
│ │ │ └── VAD (directory): Voice Activity Detection
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── vad_D#_S##_M##.csv (files): All sources VAD
│ │ │ Sampled 50ms/20Hz
│ │ │
│ │ ├── Dataset 2 (directory): EasyCom reproduced with Tascar
│ │ │ ├── Reference_Audio (directory): Direct path (anechoic scene)
│ │ │ │ │ of denoised close mic audio (binaural)
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ## (directories): Minute of the current session
│ │ │ │ ├── cedar_D#_S##_M##_ID#.wav (files): Two channels audio
│ │ │ │ │ Sampled 48kHz, 32 bits float of the close mics recordings
│ │ │ │ │ denoised using CEDAR Audio
│ │ │ │ ├── array_full_ID#.wav (files): Six channels simulated audio
│ │ │ │ │ of a single speaker in full simulated room condition
│ │ │ │ │ using HOA order 15, level matched with Dataset 1
│ │ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ │ ├── array_full_Ls.wav (files): Six channels simulated audio
│ │ │ │ │ of background noise only in full simulated room condition
│ │ │ │ │ using HOA order 15, level matched with Dataset 1
│ │ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ │ └── ref_D#_S##_M##_ID#.wav (files): Binaural audio
│ │ │ │ of all speakers extracted from the corresponding array ref audio
│ │ │ ├── Reference_PosOri (directory): Reference positions and orientations
│ │ │ │ │ of all participants
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ## (directories): Minute of the current session
│ │ │ │ ├── refPos_D#_S##_M##_ID#.csv (files): Cartesian position of ID#
│ │ │ │ └── refOri_D#_S##_M##_ID#.csv (files): Quaternion orientation of ID#
│ │ │ ├── VAD (directory): Voice Activity Detection
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── vad_D#_S##_M##.csv (files): All sources VAD
│ │ │ │ Sampled 50ms/20Hz
│ │ │ └── TASCAR (directory): All files necessary for Tascar simulations
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── ## (directories): Minute of the current session
│ │ │ ├── Tascar_scenes.tsc (files): Tascar scenes for HOA simulations
│ │ │ ├── audio_ID#.wav (files): Audio used in the scene
│ │ │ │ for source #
│ │ │ ├── pos_ID#.csv (files): Position used
│ │ │ │ for source #
│ │ │ └── ori_ID#.csv (files): Orientation used
│ │ │ for source #
│ │ │
│ │ ├── Dataset 3 (directory): EasyCom augmented with Tascar
│ │ │ ├── Reference_Audio (directory): Direct path (anechoic scene)
│ │ │ │ │ of denoised close mic audio (binaural)
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ## (directories): Minute of the current session
│ │ │ │ ├── cedar_D#_S##_M##_ID#.wav (files): Two channels audio
│ │ │ │ │ Sampled 48kHz, 32 bits float of the close mics recordings
│ │ │ │ │ denoised using CEDAR Audio
│ │ │ │ ├── array_full_ID#.wav (files): Six channels simulated audio
│ │ │ │ │ of a single speaker in full simulated room condition
│ │ │ │ │ using HOA order 15, level matched with Dataset 1
│ │ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ │ ├── array_full_Ls.wav (files): Six channels simulated audio
│ │ │ │ │ of background noise only in full simulated room condition
│ │ │ │ │ using HOA order 15, level matched with Dataset 1 with added variation
│ │ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ │ └── ref_D#_S##_M##_ID#.wav (files): Two channels audio
│ │ │ │ Sampled 48kHz, 32 bits float of all speakers
│ │ │ ├── Reference_PosOri (directory): Reference positions and orientations
│ │ │ │ │ of all participants
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── ## (directories): Minute of the current session
│ │ │ │ ├── refPos_D#_S##_M##_ID#.csv (files): Cartesian position of ID#
│ │ │ │ └── refOri_D#_S##_M##_ID#.csv (files): Quaternion orientation of ID#
│ │ │ ├── VAD (directory): Voice Activity Detection
│ │ │ │ └── Session_# (directories): The session directories
│ │ │ │ └── vad_D#_S##_M##.csv (files): All sources VAD
│ │ │ │ Sampled 50ms/20Hz
│ │ │ └── TASCAR (directory): All files necessary for Tascar simulations
│ │ │ ├── Tascar_augmentation.csv (file): List of the modifications
│ │ │ │ in the Tascar scenes of Dataset 3 compared to Dataset 2
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── ## (directories): Minute of the current session
│ │ │ ├── Tascar_scenes.tsc (files): Tascar scenes for HOA simulations
│ │ │ ├── audio_ID#.wav (files): Audio used in the scene
│ │ │ │ for source #
│ │ │ ├── pos_ID#.csv (files): Position used
│ │ │ │ for source #
│ │ │ └── ori_ID#.csv (files): Orientation used
│ │ │ for source #
│ │ │
│ │ └── Dataset 4 (directory): EasyCom artificial with Tascar
│ │ ├── Reference_Audio (directory): Direct path (anechoic scene)
│ │ │ │ of denoised close mic audio (binaural)
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── ## (directories): Minute of the current session
│ │ │ ├── cedar_D#_S##_M##_ID#.wav (files): Two channels audio
│ │ │ │ Sampled 48kHz, 32 bits float of the close mics recordings
│ │ │ │ denoised using CEDAR Audio
│ │ │ ├── array_full_ID#.wav (files): Six channels simulated audio
│ │ │ │ of a single speaker in full simulated room condition
│ │ │ │ using HOA order 15, level matched with Dataset 1
│ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ ├── array_full_Ls.wav (files): Six channels simulated audio
│ │ │ │ of background noise only in full simulated room condition
│ │ │ │ using HOA order 15, level matched with Dataset 1 with added variation
│ │ │ │ Sampled 48kHz, 32 bits float
│ │ │ └── ref_D#_S##_M##_ID#.wav (files): Two channels audio
│ │ │ Sampled 48kHz, 32 bits float of all speakers
│ │ ├── Reference_PosOri (directory): Reference positions and orientations
│ │ │ │ of all participants
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── ## (directories): Minute of the current session
│ │ │ ├── refPos_D#_S##_M##_ID#.csv (files): Cartesian position of ID#
│ │ │ └── refOri_D#_S##_M##_ID#.csv (files): Quaternion orientation of ID#
│ │ ├── VAD (directory): Voice Activity Detection
│ │ │ └── Session_# (directories): The session directories
│ │ │ └── vad_D#_S##_M##.csv (files): All sources VAD
│ │ │ Sampled 50ms/20Hz
│ │ └── TASCAR (directory): All files necessary for Tascar simulations
│ │ ├── Tascar_artificial.csv (file): Description of all Tascar scenes
│ │ └── Session_# (directories): The session directories
│ │ └── ## (directories): Minute of the current session
│ │ ├── Tascar_scenes.tsc (files): Tascar scenes for HOA simulations
│ │ ├── audio_ID#.wav (files): Audio used in the scene
│ │ │ for source #
│ │ ├── pos_ID#.csv (files): Position used
│ │ │ for source #
│ │ └── ori_ID#.csv (files): Orientation used
│ │ for source #
│ │
│ ├── Dev (directory): Contains the development datasets
│ │ └── Same as Train directory
│ │
│ └── Modif_D3D4.csv (file): Master file containing all simulation modifications
│ applied from D2 to D3. D4 uses the same architecture as D3 with different audio.
│
└── README.md (file): Complementary information
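
The file names in the structure above follow patterns such as array_D#_S##_M##.wav and doa_D#_S##_M##_ID#.csv. The sketch below parses such names into their fields. It assumes that D is the dataset number, S the session, M the minute of the session and ID the source or participant; this reading is inferred from the directory descriptions and is not confirmed by the page. The names FILENAME_PATTERN and parse_spear_filename are hypothetical.

```python
import re

# <kind>_D<dataset>_S<session>_M<minute>[_ID<source>].<wav|csv>
FILENAME_PATTERN = re.compile(
    r"^(?P<kind>[A-Za-z]+)"
    r"_D(?P<dataset>\d+)"
    r"_S(?P<session>\d+)"
    r"_M(?P<minute>\d+)"
    r"(?:_ID(?P<source>\d+))?"
    r"\.(?P<ext>wav|csv)$"
)


def parse_spear_filename(name):
    """Split a file name into its fields, or return None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields to int; `source` stays None when absent.
    for key in ("dataset", "session", "minute", "source"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields
```

Files that do not carry the D/S/M fields, such as array_full_ID#.wav or Tascar_scenes.tsc, are deliberately not matched and yield None.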

The Evaluation set is now available upon request by sending a message to spear.challenge@gmail.com.

┬── Main (directory): Contains all relevant information for SPEAR
│ └── Eval (directory): Contains the Evaluation datasets
│ └── Same as Train directory
│
└── README.md (file): Complementary information
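
In the structures above, the orientation, DOA and VAD files are sampled every 50 ms (20 Hz) while the audio is sampled at 48 kHz, so one metadata frame spans 48000 / 20 = 2400 audio samples. The sketch below converts between the two time bases. It assumes that frame k starts at k × 50 ms from the beginning of the audio file; the page does not state the exact alignment convention, so this should be checked against the README. All names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000   # audio sampling rate given in the structure
METADATA_RATE_HZ = 20     # one orientation/DOA/VAD row every 50 ms
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // METADATA_RATE_HZ  # 2400


def frame_to_sample_range(frame_index):
    """Return (start, stop) audio sample indices covered by a metadata frame.

    Assumes frame k covers the half-open interval [k * 2400, (k + 1) * 2400).
    """
    start = frame_index * SAMPLES_PER_FRAME
    return start, start + SAMPLES_PER_FRAME


def sample_to_frame(sample_index):
    """Return the index of the metadata frame containing an audio sample."""
    return sample_index // SAMPLES_PER_FRAME
```

Under this assumption, a one-minute file corresponds to 60 × 20 = 1200 metadata rows.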
