The Unaware's Guide to UTAU Part 2 (REUPLOAD)

DISCLAIMER: This guide was originally written nearly three years ago and is slightly outdated by now. Among other things, it predates both the rise in popularity of the CV-VC recording method for Japanese and the release of a plugin that made the method easier to use. Also, the footnotes redirect to the original page where this guide was posted.

NOTE: There’s no “4” footnote because I substantially edited the section explaining CV-VC after copying the text from my word processor. I doubt anyone would have noticed the missing footnote otherwise, though.


There are several recording methods to use in UTAU. At first, there was only CV (consonant – vowel, called tandokuon in Japanese), but the options have since expanded to VCV (vowel – consonant – vowel, called renzokuon in Japanese) and CV-VC (consonant – vowel – vowel – consonant). CV, as the name suggests, is made up of consonant-vowel recordings that would be named, for example, “ka.wav.” CV voicebanks with file names in romaji1 are usually aliased in hiragana. The “ka.wav” example would generally be aliased as “か” in the program. The advantage of CV is that it’s relatively easy to record and configure, making it good for beginners. The drawback is that it can sound choppy unless configured well. This can be partially remedied by recording VV (vowel-vowel) samples, which make vowel transitions sound smoother. CV banks that include these samples are often called CV-VV voicebanks.


Example of a CV romaji voicebank aliased with hiragana.
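To make the aliasing idea concrete, here is a minimal sketch of how a romaji file name maps to a hiragana alias. The lookup table and the cv_alias function are made up for illustration and only cover a handful of syllables; this is not how UTAU itself stores aliases.

```python
# Hypothetical, partial romaji-to-hiragana table (illustration only).
# A real CV voicebank covers many more syllables.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
}


def cv_alias(filename):
    """Return the hiragana alias for a romaji CV file name such as 'ka.wav'."""
    stem = filename.rsplit(".", 1)[0]  # drop the '.wav' extension
    return ROMAJI_TO_KANA[stem]


print(cv_alias("ka.wav"))
```

With this table, “ka.wav” comes out as “か,” matching the example above.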

The second type of voicebank recording method is VCV. VCV is recorded in strings of syllables. Typically, the files are named to indicate this, such as “ka_ka_ki_ka_ku.wav.” This file would be aliased as “- か,” “a か,” “a き,” “i か,” and “a く” in the program, and split up accordingly.2 There are two types of VCV: VCV lite and VCV full. VCV lite requires fewer recordings and, when configured correctly, can sound nearly as smooth as VCV full.

The upside of VCV is that it can sound smoother than CV when used and configured properly. When it is not configured properly, it can sound like a slurry mess. Slur is a problem that has plagued VCV in the past; however, it can be partially remedied by using consonant velocity. In addition, VCV takes longer to record and configure than CV. This type of voicebank is not recommended for beginners, except for the most ambitious. Nonetheless, better understanding of the method among overseas users and better tutorials have led to it being used more and more recently. The general perception of the method among overseas users has gone from something considered nearly impossible except for advanced users to something that can be done by anyone with patience.

One final note about VCV: there are two widely used ways to record it. 5 mora has five syllables in a single recording, for example, “ka_ka_ki_ka_ku.wav.” On the other hand, 7 mora has seven syllables in a recording, like “ka_ka_ki_ka_ku_ke_ka.” The advantage of 5 mora is that the recordings are shorter; with 7 mora, the advantage is that there are fewer recordings.


Example of an aliased VCV voicebank.
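The way a VCV string breaks down into aliases can be sketched in a few lines. This is only an illustration under simple assumptions: the lookup table and the vcv_aliases function are hypothetical, the table covers just a few syllables, and each syllable is assumed to end in its vowel.

```python
# Hypothetical, partial romaji-to-hiragana table (illustration only).
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
}


def vcv_aliases(filename):
    """List the VCV aliases contained in a string recording like 'ka_ka_ki_ka_ku.wav'."""
    stem = filename.rsplit(".", 1)[0]  # drop the '.wav' extension
    aliases = []
    previous_vowel = "-"  # '-' marks the start of the string (no vowel before it)
    for syllable in stem.split("_"):
        aliases.append(f"{previous_vowel} {ROMAJI_TO_KANA[syllable]}")
        previous_vowel = syllable[-1]  # assumes the syllable ends in its vowel
    return aliases


print(vcv_aliases("ka_ka_ki_ka_ku.wav"))
# ['- か', 'a か', 'a き', 'i か', 'a く']
```

A 7 mora file name such as “ka_ka_ki_ka_ku_ke_ka.wav” goes through the same function and simply yields seven aliases instead of five.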

Finally, CV-VC is a method usually used for non-Japanese languages. Languages recorded with this method include Korean, Chinese, and English, though more languages are possible, provided a good reclist3 is made and used. For example, the UTAU CRINA has a Romanian voicebank. CV-VC is also used for Japanese. The advantage of this method for Japanese is that it’s not as recording-intensive as VCV, while still maintaining smoothness. For most non-Japanese languages, CV-VC is arguably the best recording choice. The main disadvantage of CV-VC is that CV-VC USTs are relatively scarce, so heavy UST editing is required in most cases. This goes for both Japanese and non-Japanese languages. For these reasons, CV-VC is not recommended for beginners.
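To give a rough idea of why CV-VC USTs take so much editing, here is a sketch that expands a plain list of romaji syllables into CV notes plus the VC transition notes between them. It assumes a notation where a VC note is written as the previous vowel, a space, and the next consonant (for example “a k”); the function name is made up, and actual reclists and voicebanks may label things differently.

```python
VOWELS = "aiueo"


def cvvc_sequence(syllables):
    """Interleave CV notes with VC transition notes (assumed 'vowel consonant' notation)."""
    notes = []
    for index, syllable in enumerate(syllables):
        notes.append(syllable)
        if index + 1 < len(syllables):
            # Leading consonant of the next syllable, e.g. 'k' from 'ki'.
            consonant = syllables[index + 1].rstrip(VOWELS)
            if consonant:  # no VC note when the next syllable is a bare vowel
                notes.append(f"{syllable[-1]} {consonant}")
    return notes


print(cvvc_sequence(["ka", "ki", "ku"]))
# ['ka', 'a k', 'ki', 'i k', 'ku']
```

Three syllables turn into five notes here, which is the kind of extra work a CV UST needs before it suits a CV-VC voicebank.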

If you’re simply recording an UTAU for somebody else, however, there are slightly different considerations. For one, CV-VC is quite a feasible option because it needs fewer recordings; however, a good number of users are hesitant to use the method for Japanese at the moment. VCV, while more difficult to record for someone new to recording in UTAU, is easier to use than CV-VC. CV-VV is also a good option in that it gives relatively smooth results with only a couple more recordings than CV. No matter which method you decide to record, it’s important that you learn the basics of Japanese pronunciation.


If you’re going to work in UTAU, one thing you need to know is how to convert USTs from CV to VCV and vice versa. To convert a VCV UST to CV, navigate to Tools > Built-In Tools > Suffix Broker, and select “Remove Prefix.” Next, go to the right-hand corner and select the button that resets envelopes. Then, go to Built-In Tools again and select “Crossfade.” This smooths out the vowels.5 Then, select all of the notes, right click, go to “properties,” and clear the boxes labeled “preutter” and “overlap.” Finally, use ACPT and either P2P3 or P1P4 to fit the UST to the voicebank you are using.

Converting to VCV is a little trickier. First, you need to download and install a plugin.6 Once you’ve installed the plugin, go into UTAU and select all the notes. Then, go to “Plug-Ins,” select whatever you’ve named the CV to VCV plugin7, and press “OK.” Skip resetting the envelopes and fixing the crossfade in this case, and go ahead and clear the preutter and overlap. Also, add consonant velocity if the voicebank needs it. You can do this by filling in the box under the “Overlap” box. Enter the recommended value, which can usually be found in the UTAU’s readme. In some cases, you may have to convert a UST to hiragana before converting to VCV. You need a separate plugin to do this, as well.


The ACPT, P2P3, P1P4, and envelope resetting buttons.
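As far as the lyrics go, the two conversions boil down to removing or adding the vowel prefix on each note. The sketch below only illustrates that idea; it is not the actual code of the Suffix Broker or the plugin. The function names and the partial kana table are hypothetical, and it assumes rests are written as “R.” It does not touch envelopes, preutter, or overlap.

```python
# Hypothetical, partial table giving the vowel each kana ends in (illustration only).
KANA_VOWEL = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "a", "き": "i", "く": "u", "け": "e", "こ": "o",
}
REST = "R"  # assumed rest marker


def vcv_to_cv(lyrics):
    """Strip the 'a ', 'i ', '- ' etc. prefix from each VCV lyric."""
    return [lyric.split(" ", 1)[-1] for lyric in lyrics]


def cv_to_vcv(lyrics):
    """Prefix each CV lyric with the previous note's vowel ('-' at a phrase start)."""
    converted = []
    previous_vowel = "-"
    for lyric in lyrics:
        if lyric == REST:
            converted.append(lyric)
            previous_vowel = "-"  # the note after a rest starts a new phrase
            continue
        converted.append(f"{previous_vowel} {lyric}")
        previous_vowel = KANA_VOWEL[lyric]
    return converted


print(vcv_to_cv(["- か", "a き", "i く"]))   # ['か', 'き', 'く']
print(cv_to_vcv(["か", "き", "R", "く"]))    # ['- か', 'a き', 'R', '- く']
```

Note that cv_to_vcv works on hiragana lyrics, which is why a romaji UST may need to be converted to hiragana first.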

1Term for romanized Japanese.

2Note: romaji is not generally used in VCV.

3Recording list.

5Note: If you are using a romaji UST, do not crossfade “n”, as this will affect “na,” “ne,” et al.

6The plugin and a tutorial on how to install it are in “Additional Materials.”

7Note: the plugin comes with a text file labeled “plugin,” which contains the name of the plugin. To change the name, delete the existing one, type in whatever you want it to be, and save the file.

