The Unaware's Guide to UTAU Part 3 (REUPLOAD)

DISCLAIMER: This guide was originally written nearly three years ago and is slightly outdated by now. Among other things, it predates the CV-VC recording method becoming more popular for Japanese, and the release of a plugin that made it easier to use. Also, the footnotes redirect to the original page this was on.


One of the first questions you may be asking yourself, especially if you weren’t aware of UTAU or Vocaloid until recently, is “Why do I have to record Japanese?” Well, there are several reasons why the vast majority of UTAU users and voicers start with Japanese.

First, recording more complex languages, such as English, would be a nightmare for beginners.[1] One glance at the few available English reclists would make most people shy away from them. Secondly, Japanese is relatively easy to record, and it is one of the few languages that sounds decent using just CV/VV recordings, save for, say, Hawaiian. Also, while Korean and Chinese don’t need that many recordings either, they are decidedly harder to pronounce if you’re unfamiliar with them. Thirdly, there are tons and tons of readily available USTs for Japanese-language songs, mostly Vocaloid originals, but also a few other songs.


While Japanese is relatively easy to pronounce, there are a few things to keep in mind. Major pronunciation errors are very obvious, especially the dreaded “American accent”! Accents in voicebanks aren’t necessarily bad; most overseas voicebanks are not going to have 100% perfect pronunciation, and in some cases, an accent can add to the uniqueness of a voice. This section will point out the most egregious errors.


The Japanese language only has five vowels; six, if you count “n.” You especially need to watch your pronunciation on two of them. However, I’ll go through all five here, and link to clips of them being pronounced.

“a” – Similar to the “a” in “father.”

“e” – Similar to the “e” in “elephant” or “egg.”

“o” – This is the first vowel that English speakers may have trouble with. It’s best to listen to it to get the hang of the pronunciation; the mistake some make here is to pronounce it like “oh.”

“u” – Another vowel that some may have trouble pronouncing. Again, listen to it to get the hang of it. When pronouncing this vowel, it’s best to keep your lips neither rounded, as they are for “oo,” nor spread; another mistake some make is to pronounce it like the English “oo.”

“i” – Long “e” sound, like the one in “meet.”


In this section, I’m not going to go over every consonant, just the ones that need notes on how to pronounce them.

“t” – Make sure not to pronounce it too strongly; Japanese “t’s” are softer than English “t’s.” The same is true for “d,” to an extent.

“r” – By far, this is the consonant most native English speakers have trouble with. The Japanese “r” is best described as a cross between an English “l,” “r,” and the “d” sound in words like “pudding.”[2] This is one of those sounds where practice really does make perfect. You need to listen to it to get the hang of it. If you can’t get the hang of it, English L’s are okay as long as they’re not too obvious.


  • Recordings should be about 1 to 1.5 seconds long. Anything shorter gets noticeably stretched out by the resampler, while anything longer has no noticeable effect on quality.
  • ALWAYS SING YOUR SAMPLES. This is one of the most important things to remember. UTAU handles sung samples better, and the resulting UTAU is less likely to sound bored. Singing for an UTAU doesn’t take too much effort, even if you can’t sing; besides, UTAU’s pitch correction will handle that for you.
  • Be consistent. Try to sing at the same pitch and with the same intonation. Don’t sing one sample softly and the next loudly. Very subtle pitch differences are okay, as long as they are subtle, and not, say, in completely different octaves.
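
If you'd rather not check the length of every sample by ear, the 1 to 1.5 second guideline above can be tested with a short script. What follows is only a rough sketch, not part of UTAU itself: it assumes your samples are ordinary uncompressed WAV files sitting in one folder, and the folder name "voicebank" and the function names are made up for illustration. Keep in mind that it measures the length of the whole file, including any silence before and after the sung sound, so treat the result as a hint rather than a verdict.

```python
import wave
from pathlib import Path

MIN_SECONDS = 1.0  # lower bound suggested in the guide
MAX_SECONDS = 1.5  # upper bound suggested in the guide


def wav_duration(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify(seconds, low=MIN_SECONDS, high=MAX_SECONDS):
    """Label a duration as 'short', 'ok', or 'long' relative to the range."""
    if seconds < low:
        return "short"
    if seconds > high:
        return "long"
    return "ok"


def check_folder(folder):
    """Map each .wav file name in `folder` to (duration, label)."""
    report = {}
    for path in sorted(Path(folder).glob("*.wav")):
        seconds = wav_duration(path)
        report[path.name] = (round(seconds, 2), classify(seconds))
    return report


if __name__ == "__main__":
    # "voicebank" is a hypothetical folder name; point it at your own samples.
    for name, (seconds, label) in check_folder("voicebank").items():
        print(f"{name}: {seconds}s ({label})")
```

Samples labelled "short" are the ones worth re-recording first, since those are the ones the resampler would have to stretch. Samples labelled "long" are harmless; they just don't gain you anything.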

[1] You would cry. No, seriously.

[2] In fact, the Japanese word for pudding is purin.

