Analyzing Audio



The Short Version

This is how analysis works: select a WAV file, and presto! Here is your animation.

The Long Version

FaceFX Studio pre-analysis pipeline

  1. Analysis begins by running through the create animation wizard, or with a call to the analyze command.
  2. The following operations are performed for each audio file (for batch operations, these steps are repeated for every file):
    1. The preloadaudio Python callback is called as the audio is loaded a first time to verify that it is valid.
    2. For batch analysis, the preloadtext Python callback is called so that it can change where the text is loaded from.
    3. The preloadaudio Python callback is called a second time as the audio is loaded for real.
    4. The analysistextpreprocessor Python callback is called to modify the text before it is sent to analysis.
    5. The audio is resampled to 16-bit 16 kHz.
    6. The audio and text are sent to FxAnalysis along with the language and the analysis actor settings.
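
The order of the callbacks above can be sketched in Python. This is only an illustration of the sequence described in this list, not FaceFX Studio code: run_pre_analysis, the handlers dictionary, and read_text are hypothetical names invented for the sketch, and real callbacks are registered through FaceFX Studio rather than passed in like this.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical handler type: takes a string (path or text), returns a string.
Handler = Callable[[str], str]


def run_pre_analysis(audio_path: str, text_path: str, batch: bool,
                     handlers: Dict[str, Handler],
                     read_text: Callable[[str], str]) -> Tuple[List[str], str]:
    """Fire the pre-analysis callbacks for one audio file in the documented order.

    Returns the list of callback names that fired and the text that would be
    sent on to analysis.
    """
    order: List[str] = []

    def fire(name: str, value: str) -> str:
        # Record the callback, then let an optional handler transform the value.
        order.append(name)
        handler = handlers.get(name)
        return handler(value) if handler is not None else value

    fire("preloadaudio", audio_path)                # first pass: validity check
    if batch:
        text_path = fire("preloadtext", text_path)  # may redirect the text location
    fire("preloadaudio", audio_path)                # second pass: the real load
    text = fire("analysistextpreprocessor", read_text(text_path))
    # Resampling to 16-bit 16 kHz and the hand-off to FxAnalysis would follow here.
    return order, text
```

In this sketch, preloadtext fires only when batch is True, which matches the list above; a single-file run fires preloadaudio twice and then analysistextpreprocessor.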

FxAnalysis pipeline

  1. The text is analyzed for chunking tags.  For each text chunk (or for the entire text file if no chunk tags exist):
    1. The text chunk is stripped of punctuation and valid text tags.
    2. Text tags that have opening and closing markers but are not in the expected format are also removed, so the system effectively ignores them.
    3. The text chunk is analyzed with the corresponding audio chunk to get phoneme and word times.
    4. The phoneme and word times are appended to the results from prior chunks.
      • Gaps in the list due to non-contiguous chunks are filled with the SIL phoneme.
  2. Coarticulation is run on the final phoneme list, adding curves specified in the mapping to the output animation.
  3. Punctuation is stripped from the text prior to text tag processing.
  4. Curve text tags are analyzed, inserting curves into the output animation.
  5. Event text tags are analyzed, but the resulting events are not yet inserted into the output animation.  They are sent to the analysis actor in the next step.
  6. The analysis actor is loaded with a copy of the output animation (we'll call it the gesture animation).  Events generated from text tags are added to the gesture animation.
  7. Analysis events are generated from the audio and added to the gesture animation.
  8. An event take is performed on the gesture animation.
  9. The gesture animation and its events are baked, and analysis curves are generated for all nodes in the analysis actor whose names do not start with an underscore.  The gesture curves are then copied from the gesture animation to the output animation.
  10. Some events from the gesture animation take are copied to the output animation event template.  Specifically, events from groups whose names begin with an underscore are left behind, and all other events are copied.
  11. The gesture animation's event template is cached and stored in the output animation's editor-only data for generating new analysis takes.
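
Two of the steps above lend themselves to a short Python sketch: appending each chunk's phoneme times while filling gaps with SIL (step 1), and leaving behind event groups whose names begin with an underscore (step 10). The Phoneme type, the function names, and the sample group names are assumptions made for illustration; they are not part of FxAnalysis.

```python
from typing import List, NamedTuple, Tuple


class Phoneme(NamedTuple):
    label: str
    start: float  # seconds
    end: float    # seconds


def append_chunk(results: List[Phoneme], chunk: List[Phoneme]) -> List[Phoneme]:
    """Append one chunk's phonemes to the results from prior chunks.

    If the chunk starts after the previous results end, the gap between
    them is filled with a SIL phoneme.
    """
    merged = list(results)
    if merged and chunk and chunk[0].start > merged[-1].end:
        merged.append(Phoneme("SIL", merged[-1].end, chunk[0].start))
    merged.extend(chunk)
    return merged


def split_event_groups(groups: List[str]) -> Tuple[List[str], List[str]]:
    """Separate group names into (copied, left_behind) by the underscore rule."""
    copied = [g for g in groups if not g.startswith("_")]
    left_behind = [g for g in groups if g.startswith("_")]
    return copied, left_behind
```

For example, appending a chunk that starts at 0.5 seconds to results that end at 0.3 seconds would insert a SIL phoneme spanning 0.3 to 0.5 seconds before the new chunk's phonemes.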

FaceFX Studio post-analysis pipeline

  1. The output animation is added to the currently loaded actor in FaceFX Studio.
  2. Curves in the output animation are marked as owned by analysis.
  3. A new take is generated from the output animation event template.
  4. The posteventtake Python callback is called.
  5. The postanalysis Python callback is called.
  6. The animation is selected in FaceFX Studio, which loads the audio and calls the preloadaudio Python callback.
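
The post-analysis steps can be modeled the same way. The sketch below is a toy model, assuming hypothetical Curve and Animation classes and a hypothetical handlers dictionary; it only mirrors the order of operations listed above and does not call any FaceFX Studio API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Curve:
    name: str
    owned_by_analysis: bool = False


@dataclass
class Animation:
    name: str
    curves: List[Curve] = field(default_factory=list)


def run_post_analysis(actor_animations: List[Animation], animation: Animation,
                      handlers: Dict[str, Callable[[Animation], None]]) -> List[str]:
    """Apply the post-analysis steps and return the callbacks fired, in order."""
    fired: List[str] = []

    def fire(name: str) -> None:
        fired.append(name)
        handler = handlers.get(name)
        if handler is not None:
            handler(animation)

    actor_animations.append(animation)   # step 1: add to the loaded actor
    for curve in animation.curves:       # step 2: mark curves as owned by analysis
        curve.owned_by_analysis = True
    # Step 3: a new take would be generated from the event template here.
    fire("posteventtake")                # step 4
    fire("postanalysis")                 # step 5
    fire("preloadaudio")                 # step 6: selecting the animation loads its audio
    return fired
```

One consequence worth noting from the lists on this page: preloadaudio fires once more after postanalysis, so a handler for it is called three times in total for a single analyzed file.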

Version Number: 2010