A likelihood-ratio-based forensic speaker discrimination was conducted using the mean formant frequencies of Standard Chinese /i/ and /y/ tokens produced by 64 male speakers. The speech data were relatively forensically realistic in that they were relatively extemporaneous, were recorded over the telephone, and were from three non-contemporaneous recording sessions. A multivariate-kernel-density formula was used to calculate cross-validated likelihood ratios comparing all possible same-speaker and different-speaker combinations across sessions. Results were comparable with those previously obtained with laboratory speech in other languages. In general, greater strength of evidence was obtained for recording sessions separated by one week than for recording sessions separated by one month.