Alex Loiko 14fc998497 Boxplot for APM-QA
A script for producing boxplots by parsing data generated by the
apm_quality_assessment.py tool.

The script groups data by the values of one or several audioproc_f
parameters. For every such subgroup it draws a boxplot. All boxplots
are shown next to each other with the parameter value as the x axis.
It is similar to this matplotlib example:
https://matplotlib.org/mpl_examples/pylab_examples/boxplot_demo_06.png

The script
1. reads config file names from the pandas dataframe generated by
   quality_assurance.collect_data
2. parses the (JSON) config files to read the parameter values
3. groups data with matching param values together
4. draws a boxplot for each group using matplotlib

TBR=alessiob@webrtc.org # reviewed already in old gerrit https://chromium-review.googlesource.com/c/external/webrtc/+/660559

BUG: webrtc:7218
Change-Id: I380a1363d26721feb975fad1506835c622e9d926
Reviewed-on: https://webrtc-review.googlesource.com/6340
Reviewed-by: Alex Loiko <aleloi@webrtc.org>
Commit-Queue: Alex Loiko <aleloi@webrtc.org>
Cr-Commit-Position: refs/heads/master@{#20139}
2017-10-04 12:49:54 +00:00

APM Quality Assessment tool

Python wrapper for APM simulators (e.g., audioproc_f) that automates quality assessment. The tool simulates different noise conditions, input signals, and APM configurations, and computes several scores. Once the scores are computed, the results can be exported to an HTML page for listening to the APM input and output signals, as well as the reference signal used for evaluation.

Dependencies

  • OS: Linux
  • Python 2.7
  • Python libraries: enum34, numpy, scipy, pydub (0.17.0+), pandas (0.20.1+)
  • It is recommended that a dedicated Python environment is used
    • install virtualenv
    • $ sudo apt-get install python-virtualenv
    • set up a new Python environment (e.g., my_env)
    • $ cd ~ && virtualenv my_env
    • activate the new Python environment
    • $ source ~/my_env/bin/activate
    • add dependencies via pip
    • (my_env)$ pip install numpy pydub scipy pandas
  • PolqaOem64 (see http://www.polqa.info/)
    • Tested with POLQA Library v1.180 / P863 v2.400
  • Aachen Impulse Response (AIR) Database
  • Input probing signals and noise tracks (you can make your own dataset - *1)

Build

  • Compile WebRTC
  • Go to out/Default/py_quality_assessment and check that apm_quality_assessment.py exists

Unit tests

  • Compile WebRTC
  • Go to out/Default/py_quality_assessment
  • Run python -m unittest discover -p "*_unittest.py"

First time setup

  • Deploy PolqaOem64 and set the POLQA_PATH environment variable
    • e.g., $ export POLQA_PATH=/var/opt/PolqaOem64
  • Deploy the AIR Database and set the AECHEN_IR_DATABASE_PATH environment variable
    • e.g., $ export AECHEN_IR_DATABASE_PATH=/var/opt/AIR_1_4
  • Deploy probing signal tracks into
    • out/Default/py_quality_assessment/probing_signals (*1)
  • Deploy noise tracks into
    • out/Default/py_quality_assessment/noise_tracks (*1, *2)

(*1) You can use custom files as long as they are mono tracks sampled at 48 kHz and encoded as signed 16-bit PCM (converting and exporting the tracks with Audacity is recommended).

(*2) Adapt EnvironmentalNoiseTestDataGenerator._NOISE_TRACKS accordingly in out/Default/py_quality_assessment/quality_assessment/test_data_generation.py.
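The two environment variables from the setup steps above can be checked before running the tool. The following is a minimal sketch, not part of the tool itself: the helper name find_setup_problems is hypothetical, and it assumes that both variables are meant to point to directories, as the example values suggest.

```python
import os

# Variable names as spelled in the setup steps above.
REQUIRED_ENV_VARS = ("POLQA_PATH", "AECHEN_IR_DATABASE_PATH")


def find_setup_problems(environ=None):
    """Return a list of human-readable problems (empty if none were found).

    Assumption: each variable should point to an existing directory.
    """
    environ = os.environ if environ is None else environ
    problems = []
    for name in REQUIRED_ENV_VARS:
        value = environ.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif not os.path.isdir(value):
            problems.append(
                "%s does not point to a directory: %s" % (name, value))
    return problems


# Check the current shell environment and report anything suspicious.
for problem in find_setup_problems():
    print(problem)
```

Passing a dictionary instead of relying on os.environ makes the check easy to try without touching the real environment.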

Usage (scores computation)

  • Go to out/Default/py_quality_assessment
  • See apm_quality_assessment.sh for an example script that parallelizes the experiments
  • Adjust the script according to your preferences (e.g., output path)
  • Run apm_quality_assessment.sh
  • The script will end by opening the browser and showing ALL the computed scores

Usage (export reports)

Showing all the results at once can be confusing. You therefore may want to export separate reports. In this case, you can use the apm_quality_assessment_export.py script as follows:

  • Set --output_dir, -o to the same value used in apm_quality_assessment.sh
  • Use regular expressions to select/filter out scores by
    • APM configurations: --config_names, -c
    • capture signals: --capture_names, -i
    • render signals: --render_names, -r
    • echo simulator: --echo_simulator_names, -e
    • test data generators: --test_data_generators, -t
    • scores: --eval_scores, -s
  • Assign a suffix to the report name using -f <suffix>

For instance:

$ ./apm_quality_assessment_export.py \
  -o output/ \
  -c "(^default$)|(.*AE.*)" \
  -t \(white_noise\) \
  -s \(polqa\) \
  -f echo
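To preview what a regular expression such as the one passed to -c selects, the pattern can be tried on a list of names in Python. This is only a sketch: the configuration names are made up, the helper select_names is hypothetical, and matching with re.match is an assumption about how the export script applies the expressions.

```python
import re


def select_names(names, pattern):
    """Keep the names matched by the regular expression `pattern`."""
    regex = re.compile(pattern)
    # Assumption: names are matched from the start of the string (re.match).
    return [name for name in names if regex.match(name)]


# Hypothetical APM configuration names, for illustration only.
config_names = ["default", "default_2", "AEC_on", "no_AE_here", "ns_only"]

selected = select_names(config_names, r"(^default$)|(.*AE.*)")
print(selected)
```

With these names, the pattern keeps the exact name default and every name containing AE, and drops the rest.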

Usage (boxplot)

After generating stats, it can help to visualize how a score depends on a certain audioproc_f parameter. The apm_quality_assessment_boxplot.py script helps with that, producing plots similar to this matplotlib example: https://matplotlib.org/mpl_examples/pylab_examples/boxplot_demo_06.png

Suppose some POLQA scores come from running audioproc_f with or without the intelligibility enhancer: --ie=1 or --ie=0. Then two boxplots side by side can be generated with

$ ./apm_quality_assessment_boxplot.py \
      -o /path/to/output \
      -v polqa \
      -n /path/to/dir/with/apm_configs \
      -z ie
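The grouping idea behind the boxplot script can be sketched in a few lines of Python. This is not the script's actual code: the function names are hypothetical, the scores are made up, and the config layout (a flat JSON object mapping parameter names to values) is an assumption. Plotting needs matplotlib, which is imported only when draw_boxplots is called.

```python
import json
from collections import defaultdict


def group_scores_by_param(records, param_name):
    """Group scores by the value of one parameter read from each config.

    `records` is an iterable of (config_json_text, score) pairs.
    Assumption: each config is a flat JSON object of parameter values.
    """
    groups = defaultdict(list)
    for config_text, score in records:
        config = json.loads(config_text)
        # Configs that do not set the parameter are collected under None.
        groups[config.get(param_name)].append(score)
    return dict(groups)


def draw_boxplots(groups, param_name, score_name):
    """Draw one boxplot per group, side by side (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    keys = sorted(groups, key=str)
    fig, ax = plt.subplots()
    ax.boxplot([groups[key] for key in keys])
    ax.set_xticklabels([str(key) for key in keys])
    ax.set_xlabel(param_name)
    ax.set_ylabel(score_name)
    return fig


# Made-up records: two runs with "ie" set to 0 and two with it set to 1.
records = [
    ('{"ie": 0}', 3.1),
    ('{"ie": 1}', 3.6),
    ('{"ie": 0}', 2.9),
    ('{"ie": 1}', 3.8),
]
groups = group_scores_by_param(records, "ie")
print(groups)
```

Calling draw_boxplots(groups, "ie", "polqa") would then produce one box per parameter value, with the parameter value on the x axis.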

Troubleshooting

The input wav file must be:

  • sampled at a sample rate that is a multiple of 100 (required by POLQA)
  • in the 16 bit format (required by audioproc_f)
  • encoded in the Microsoft WAV signed 16 bit PCM format (Audacity default when exporting)
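The requirements above can be checked with Python's standard wave module before starting a long run. The sketch below is only an illustration: check_wav and write_dummy_wav are hypothetical helpers, and the two temporary files are generated just to make the example self-contained. Note that wave.open raises wave.Error for WAV files that are not PCM-encoded.

```python
import os
import tempfile
import wave


def check_wav(path):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    wav = wave.open(path, "rb")
    try:
        if wav.getsampwidth() != 2:
            problems.append(
                "expected 16-bit samples, got %d-bit" % (8 * wav.getsampwidth()))
        if wav.getframerate() % 100 != 0:
            problems.append(
                "sample rate %d Hz is not a multiple of 100" % wav.getframerate())
    finally:
        wav.close()
    return problems


def write_dummy_wav(path, sample_rate, sample_width, num_frames=480):
    """Write a short mono PCM WAV file filled with zero bytes."""
    wav = wave.open(path, "wb")
    try:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (sample_width * num_frames))
    finally:
        wav.close()


# Create one compliant and one non-compliant file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.wav")
bad_path = os.path.join(tmp_dir, "bad.wav")
write_dummy_wav(good_path, sample_rate=48000, sample_width=2)
write_dummy_wav(bad_path, sample_rate=22050, sample_width=1)

print(check_wav(good_path))
print(check_wav(bad_path))
```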

Depending on the license, the POLQA tool may take “breaks” as a way to limit the throughput. When this happens, the APM Quality Assessment tool is slowed down. For more details about this limitation, check Section 10.9.1 in the POLQA manual v.1.18.

In case of issues with the POLQA score computation, check py_quality_assessment/eval_scores.py and adapt PolqaScore._parse_output_file(). The code can also be fixed directly in the build directory (namely, out/Default/py_quality_assessment/eval_scores.py).