MODULE: TotalLipSync v0.6

Started by Snarky, Tue 18/04/2017 19:22:17


Snarky

TotalLipSync


(Character head/animation by Preston Blair. King Graham sprite ripped from King's Quest II VGA by AGDI. Background from AGS Awards Ceremony by Ali.)


TotalLipSync is a module for voice-based lip sync. (It does not currently support text-based lip sync.) It allows you to play back speech animations that have been synchronized with voice clips.

Note: TotalLipSync does not do this synchronization (a.k.a. "tracking") for you! It only plays it back. So for each line you want to lip sync in-game, you need to provide a lip sync tracking file, generated by an external tool.

TotalLipSync has all the same capabilities as the voice-based lip sync that is built into AGS, and offers the following additional advantages:
  • It works with LucasArts-style speech (as well as with Sierra-style and full-screen speech modes).
  • In addition to the Pamela (.pam) and Papagayo/Moho Switch (.dat) formats supported by AGS, it also reads Annosoft SAPI 5.1 (.anno) and Rhubarb (.tsv) lip sync files.
  • In particular, Rhubarb support means that lip syncing can be 100% automated (with decent results): no manual tracking of the speech clips is required.
  • It is more flexible: you can switch speech styles mid-game, change the phoneme mapping, use files with different data formats, etc.
  • You don't have to do the phonemes-to-frames mapping manually: The module comes with a default auto-mapping.
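For example, reconfiguring the module mid-game might look something like this (a sketch only; eLipSyncMoho is my guess at the enum name for the Papagayo/Moho format, so check the module header for the actual values):

```ags
// Sketch: switch from Rhubarb sync files to Papagayo/Moho ones mid-game,
// then override a single mapping. eLipSyncMoho is an assumed enum name.
TotalLipSync.Init(eLipSyncMoho);
TotalLipSync.AutoMapPhonemes();
TotalLipSync.AddPhonemeMapping("WQ", 6);  // e.g. send Moho's WQ to frame 6 instead of the default 7
```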

How to use
  • Create the lip sync tracking data files for the speech clips. Several tools can do this (personally I would recommend Papagayo for manual tracking and Rhubarb for automatic lip syncing, but the Lip Sync Manager plugin is good too).
    The filename of each sync file should be the same as the speech clip except for the extension, following the AGS convention: for the line cRoger.SaySync("&8 Hi!"); you might have the voice audio file ROGE8.OGG and the corresponding sync file ROGE8.DAT. You also need to bundle the sync files with the game: by default they go in a folder named "sync/" inside the project folder, and you add this folder to the "Package custom data folder(s)" setting under General Settings | Compiler.
  • Create the speech animation for your character(s), with different animation frames for the different phonemes (see below), and set it as their speech view. Frame 0 should be the "silent" frame used when not speaking (like in the default AGS setup).
  • Download and import the TotalLipSync module into your AGS project.
  • Make sure your game settings are correct: the AGS built-in lip sync (in the project tree under "Lip sync") should be set to "disabled".
  • If you are going to use Sierra-style (or full-screen) speech for your lip sync animations, you must create a dummy view. Make sure to give it exactly one loop and one frame. If you name the view TLS_DUMMY it will automatically be used by the module. Otherwise you can set the view to use with TotalLipSync.SetSierraDummyView().
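If you go the SetSierraDummyView() route, the call is a one-liner (a sketch; VSPEECHDUMMY is a hypothetical view name, use whatever you called your one-loop, one-frame dummy view):

```ags
// In game_start(), before any synced speech is played.
// VSPEECHDUMMY is a placeholder name for your dummy view.
TotalLipSync.SetSierraDummyView(VSPEECHDUMMY);
```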

You are now ready to use the module. Add the code to initialize TotalLipSync on startup, making sure to select the format corresponding to the sync files you've created:

Code: ags
//In GlobalScript.asc

function game_start() 
{
  TotalLipSync.Init(eLipSyncRhubarb);    // Or whatever lip sync format you're using
  TotalLipSync.AutoMapPhonemes();
}

Or if you don't want to use the default phonemes-to-frames mapping, you can set up a custom mapping. In this case you have to make sure you use the correct phoneme codes for the particular lip sync data format you are using (as shown in the table behind a spoiler further down):

Code: ags
//In GlobalScript.asc

function game_start() 
{
  TotalLipSync.Init(eLipSyncPamelaIgnoreStress);    // Or whatever lip sync format you're using

  // Make sure you're using the correct phoneme codes for the format you're using,
  // and match them to the frames in your speech view
  TotalLipSync.AddPhonemeMapping("None",0); // Strongly recommend putting the silent frame as frame 0
  TotalLipSync.AddPhonemeMappings("B/M/P",1);
  TotalLipSync.AddPhonemeMappings("S/Z/IH/IY/SH/T/TH/D/DH/JH/N/NG/ZH",2);
  TotalLipSync.AddPhonemeMappings("EH/CH/ER/EY/G/K/R/Y/HH",3);
  TotalLipSync.AddPhonemeMappings("AY/AA/AH/AE",4);
  TotalLipSync.AddPhonemeMappings("AO/AW/UH",5);
  TotalLipSync.AddPhonemeMappings("W/OW/OY/UW",6);
  TotalLipSync.AddPhonemeMappings("F/V",7);
  TotalLipSync.AddPhonemeMapping("L",8);
}

To speak a line with lip syncing, you simply call the extender functions Character.SaySync() or Character.SayAtSync(), using a speech clip prefix:

Code: ags
  cGraham.SaySync("&1 This line will be animated with lip sync");
  cGraham.SayAtSync(320, 100, 240, "&2 ... and so will this");    // x_left, y_top, width, message

And that's all there is to it! (If you don't use a speech clip prefix, or if there is no matching sync file, the speech animation won't play at all.)

Phoneme-to-frame mappings
The principle of lip syncing is that different speech sounds (phonemes) correspond to different mouth shapes. If we display an animation frame with the right mouth shape at the same time as that sound appears in the voice clip being played, the animation will seem to match the speech. The first step, then, is to identify the phonemes and timing of them in the speech (that's what the tools listed above are for), and the second step is to choose an appropriate animation frame for each phoneme. We usually don't use different animation frames for all the different phonemes, so we combine phonemes into groups that are all mapped to a single frame. The different tools have different sets of phonemes (or phoneme groups), so we have to define different mappings from phonemes to frames.
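To make this concrete, a Rhubarb .tsv sync file is just a plain-text list of timestamps (in seconds) paired with mouth-shape codes, which the module then maps to frames (illustrative values, not taken from a real clip):

```
0.00	X
0.21	B
0.39	C
0.60	A
0.85	F
1.10	X
```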

So here is the default mapping for each data format used by TotalLipSync. It has been set up for a speech animation with ten different frames, each representing a different mouth position. (This is a fairly standard setup.) If you stick to these frames and these mappings, you can use the same speech view no matter what lip sync tool or data format you use:

Spoiler
Frame | Description                                           | Rhubarb phoneme ID | Moho phoneme | Pamela phonemes
0     | Mouth closed (or slack; can be the same as 1)         | X                  | rest         | None
1     | M, B, P                                               | A                  | MBP          | M/B/P
2     | Various consonants (Rhubarb: Ee-type sounds)          | B                  | etc          | K/S/T/D/G/DH/TH/R/HH/CH/Y/N/NG/SH/Z/ZH/JH
3     | Eh-type sounds (non-Rhubarb: Ee-type sounds)          | C                  | E            | IH/IY/EH/AH/EY/AW/ER
4     | Ah-type and I-type sounds                             | D                  | AI           | AA/AE/AY
5     | Aww-type and Ow-type sounds (can also go in 6)        | E                  | O            | AO/OW
6     | U-type and Oo-type sounds (non-Moho: W)               | F                  | U            | UW/OY/UH
7     | W (Rhubarb: unused, same as 6)                        | [same as 6]        | WQ           | W
8     | F, V                                                  | G                  | FV           | F/V
9     | L (Th-type sounds can also go here, rather than in 2) | H                  | L            | L
[close]
Where to get it
TotalLipSync and further documentation is hosted on Github:
https://github.com/messengerbag/TotalLipSync

You can download the current release from there.


Known bugs
None

Change log
v0.6
- Fixed: parsing of .anno files did not close the file after reading
- Fixed: TotalLipSync.Init() would reset the data directory setting
- Changed sync timing to use AudioChannel.PositionMs for greater accuracy, where available
- Wrapped sync functions in Game.SkippingCutscene checks to improve performance when skipping cutscenes
- Set to use packaged data directory ($DATA$) by default
- Added TotalLipSync.SetFileCasing() to API to support case-sensitive file systems
- Added TotalLipSync.SetDefaultFrame() to API to support arbitrary speech view setups
- Added optional file extension argument to TotalLipSync.Init() for convenience
- Reorganized code for improved readability
- Updated documentation

v0.5
- Added APIs to get the currently lip syncing character, the current phoneme and current frame

v0.4 (initial release)
- Fixed support for Sierra-style speech
- Minor bug fixes for edge-cases
- Documentation

0.2 (pre-release)
- Added support for Papagayo/Moho Switch (.dat), Annosoft SAPI 5.1 LipSync (.anno) and Rhubarb (.tsv)

0.1 (pre-release)
- Pamela support for LucasArts-style speech

Originally based on code by Calin Leafshade (though very little of it remains in the current version).
Thanks to Grundislav for providing a speech view used in development and testing of the module!

Snarky

A couple more things:

Pamela and Annosoft SAPI 5.1 LipSync use almost exactly the same phoneme sets, with only minor variations (TotalLipSync is not case-sensitive). Pamela can tag vowels with three levels of stress (0-2), e.g. AY0, UW1. This is not particularly useful in AGS, and should be ignored (by calling TotalLipSync.Init(eLipSyncPamelaIgnoreStress)) unless there's a good reason not to. Anyway, here's the full list and what the codes represent (the rows where Annosoft differs are the ones with lowercase Anno codes):

Spoiler
IPA       | Example             | Pamela | Anno
[silence] | -                   | None   | x
ɑ:, ɒ     | father, box (AmEng) | AA     | AA
æ         | at, snack           | AE     | AE
ʌ, ə      | hut, alone          | AH     | AH
ɑ, ɔ      | thaw, dog           | AO     | AO
aʊ        | cow, out            | AW     | AW
aɪ        | hide, guy           | AY     | AY
b         | bang                | B      | b
tʃ        | cheese              | CH     | CH
d         | damn                | D      | d
ð         | these, bathe        | DH     | DH
ɛ, ɛə     | bed, bear           | EH     | EH
ɜ:, ɚ     | hurt, butter        | ER     | ER
eɪ        | ate, bait           | EY     | EY
f         | fine                | F      | f
g         | good                | G      | g
h         | house               | HH     | h
ɪ, ɪə     | it, fear            | IH     | IH
i:        | eat, free           | IY     | IY
dʒ        | gee, jaw            | JH     | j
k         | key, crush          | K      | k
l         | lip                 | L      | l
m         | monkey              | M      | m
n         | no                  | N      | n
ŋ         | ping, pong          | NG     | NG
əʊ        | oak, slow           | OW     | OW
ɔɪ        | toy                 | OY     | OY
p         | put                 | P      | p
r         | read                | R      | r
s         | sap                 | S      | s
ʃ         | sharp               | SH     | SH
t         | total               | T      | t
θ         | thin                | TH     | TH
ʊ, ʊə     | good, poor          | UH     | UH
u:        | you                 | UW     | UW
v         | viking              | V      | v
w         | we, question        | W      | w
j         | yield               | Y      | y
z         | zoo                 | Z      | z
ʒ         | seizure, genre      | ZH     | ZH
(Based on the in-app list in PAMELA and the Annosoft list here.)
[close]
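Concretely, with the ignore-stress mode a single mapping covers all the stress variants of a vowel (a sketch using the API from the first post; the frame number is just an example):

```ags
TotalLipSync.Init(eLipSyncPamelaIgnoreStress);
// With stress ignored, this one mapping matches AY, AY0, AY1 and AY2 in the .pam file
TotalLipSync.AddPhonemeMapping("AY", 4);
```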

Also, if you use Rhubarb to do the lip syncing, this could be helpful:

(Edit: Note that Rhubarb has changed its command-line API since these instructions were written, so the syntax of the command in the example is out of date for the latest version. Consult the Rhubarb documentation on Github to see how to adapt it to the current API.)

Quote from: Snarky on Sat 15/04/2017 12:30:44
You don't need to compile Rhubarb, just get the latest release for Windows or OSX: https://github.com/DanielSWolf/rhubarb-lip-sync/releases

The script doesn't call Rhubarb, you'll have to do all of that. Take all the speech clips, convert them to .wav if necessary, copy them into the Rhubarb directory, and for each one, call "rhubarb.exe myclip.wav > myclip.tsv" (where "myclip" is the name of the clip). You can also put the text corresponding to each clip in individual .txt files to assist with the speech recognition, and then you'd call "rhubarb.exe myclip.wav -d myclip.txt > myclip.tsv". Then once that's done, copy all the .tsv files over into the directory of your compiled AGS game, and the module will read them.

Obviously this process is tedious, and Rhubarb also takes quite a while to process each clip, so if you have more than a couple of dozen clips you'll definitely want to automate it (you could write a batch file to go through and process each .wav file in the directory).

In fact, I wrote a very simple version of such a batch file:

Code: batch
for %%F in (clips/*.wav) do (
    rhubarb.exe clips/%%~nxF -d guide/%%~nF.txt > sync/%%~nF.tsv
)

This assumes that the voice clips are in a folder called "clips/" inside the Rhubarb directory, that the text files are in a folder called "guide/", and that there is a folder called "sync/" where the .tsv files will be written. It also requires a .txt file for each .wav file. So there are a lot of possible improvements. Save this in a text file in the Rhubarb directory and name it something like agsbatch.bat, and you can run it to process all the speech clips in one go (which might take a while!).
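For anyone working on macOS/Linux (or in Git Bash on Windows), a bash equivalent might look like the sketch below. It assumes the same layout (clips/, guide/, sync/) and is written as a dry run: it only prints the commands it would execute, so delete the "echo" to actually invoke rhubarb.

```shell
#!/bin/bash
# Dry-run sketch of the batch file above for bash shells.
# Assumes voice clips in clips/, transcript files in guide/,
# and writes .tsv sync files into sync/.
mkdir -p sync
for clip in clips/*.wav; do
    name=$(basename "$clip" .wav)
    # Remove "echo" to actually run rhubarb.
    echo "rhubarb \"$clip\" -d \"guide/$name.txt\" > \"sync/$name.tsv\""
done
```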

Mehrdad

Hey Snarky. It's a great module. Nice job!!
My official site: http://www.pershaland.com/

Cassiebsg

I may give it a test try at some point, if I can figure out how to properly animate mouths.  (laugh)
Looks like an awesome module though, thanks for the hard work.  (nod)
There are those who believe that life here began out there...

Snarky

Thanks! I hope it turns out to be useful to someone.

As for figuring out how to animate mouths, if you look in the second spoiler-hidden section of the post, there are three references which might help.

Crimson Wizard

The easiest way to test this, I think, is to draw said letters on sprites instead of mouth animation :).

Cassiebsg

Oh, nice! Just what I needed!  (nod) and the drawing one is just perfect reference for me to try and "copy" into the Blender models.  :-D
I've been doing 6 to 8 frames lipsync, so jumping to 10 doesn't feel that scary.  (laugh)

Grundislav

This is a great module, thanks so much for making it!

It's made me reeeeeeeally tempted to use it...

bx83

Okay, I've converted all the files to TSV etc.
One last question: where do I put the TSV files?
My game files are in K:\
The speech (as .OGG files) is in K:\gamename\Speech\
The WAV files for speech are in K:\gamename\Speech\WAV (I use .OGG files)

Do I put the TSV's in the Speech dir?...

I notice '$INSTALLDIR$/sync' - is this "C:\Program Files (x86)\Adventure Game Studio 3.4.0\sync" or "my installed game\sync" dir, or "my currently un-installed un-compiled game directory\sync", or...?

Snarky

While you're working on the game, it's (in your case) "K:\gamename\Compiled\sync" or "K:\gamename\Compiled\Windows\sync" (I believe AGS will read both directories). "K:\gamename\Compiled\Windows" is probably the directory you will ultimately distribute once the game is finished (unless you're aiming for another platform), so that's where I would put it.

bx83

Excellent, thank you for all you've done :D

bx83

May I suggest changing
#define TLS_PHONEMES_LINE_MAX 50
To
#define TLS_PHONEMES_LINE_MAX 5000

This had me running out of space on a 13-second audio file.

Crimson Wizard

I seem to be late for this, but I'd just make a small note -

Quote from: bx83 on Sat 22/04/2017 04:26:26
I notice '$INSTALLDIR$/sync' - is this "C:\Program Files (x86)\Adventure Game Studio 3.4.0\sync" or "my installed game\sync" dir, or "my currently un-installed un-compiled game directory\sync", or...?

Quote from: Snarky on Sat 22/04/2017 07:50:22
While you're working on the game, it's (in your case) "K:\gamename\Compiled\sync" or "K:\gamename\Compiled\Windows\sync" (I believe AGS will read both directories). "K:\gamename\Compiled\Windows" is probably the directory you will ultimately distribute once the game is finished (unless you're aiming for another platform), so that's where I would put it.

$INSTALLDIR$ is the directory where the main game data file (the *.exe or equivalent) is located at the moment you run it.

AGS does not normally check more than one directory; there are special rules only when running from under the Editor (debugger mode): in that case the Editor passes a couple of alternative paths to the engine (one of which includes AudioCache, for instance).

bx83

I have a strange problem.

I'm using a different movement view and speech view for my character. Before this, no bug; now the game will randomly pick a point and then crash with a runtime error:



But then, if I toggle breakpoints in one of the sections leading to the error (line 649), I can do the same thing with no crash:




This would appear to be a timing issue, as I've checked everything else; all frames, movement and speech are the same as for the original character, and nothing is missing. No idea how the module works, though; it appears to crash on speechstyle=lucasarts...?

P.S. My game is LucasArts-style/Rhubarb.

Please help.

bx83

False alarm, it was to do with the number of frames in the speech view (I added 9, but it needs 10 frames... even though there are only 9 frame descriptors, X and A..H. So who knows). Should have checked everything first, sorry :(

Snarky

Ah, good! As I was just about to write, there are two things to check first:

-What version of the module are you using?
-Does your speech view have enough frames (in each direction)? It should be 10 for the default mapping.

Rhubarb only has 9 different phonemes/mouth positions (it uses the same frame for W as for U/OO sounds), but the auto-mapping still assumes a 10-frame speech view (W, frame 7, will never be used, so you can leave it blank or make it the same as frame 6) for consistency with the other formats. Also note that the order of the frames is not as listed on the Rhubarb page, but as in the table (behind spoiler tags) in the first post in this thread.

However, I'll see about adding some checking so the module can give a more informative error message if this happens.

bx83

I am curious - I haven't yet tried a compiled game...
If my sync/ folder is in Compiled/, but my actual game is in Compiled/Windows/ - does this mean I should include the sync directory in Windows/ too? Or are the sync/ files just compiled into the game EXE?

Snarky

The sync files are not compiled in (I don't think there's really any way to do that), so you need to include the sync directory when you distribute the game.

bx83

But what if the game searches for sync/ and not Windows/sync/?...
Oh well, I'll change it.

Crimson Wizard

AFAIK, when you run the game from the Editor (with F5), the game does not look into Compiled at all; it gets the files and subfolders that are located right in the project's root folder.

But when you order "Build EXE", it builds to Compiled/Windows. It won't put extra files there automatically, though, so you would need to copy them over when preparing the package.
(EDIT: actually, plain files from Compiled do get copied into Compiled/Windows, but not subfolders.)
