Milestones:Line spectrum pair (LSP), an essential technology for high-compression speech coding, 1975

From IEEE Milestones Wiki
Revision as of 20:16, 27 May 2015 by Administrator1

Title

Line Spectrum Pair (LSP) for high-compression speech coding, 1975

Citation

Line Spectrum Pair, invented at NTT in 1975, is an important technology for speech synthesis and coding. A speech synthesizer chip was designed based on Line Spectrum Pair in 1980. In the 1990s, this technology was adopted in almost all international speech coding standards as an essential component and has contributed to the enhancement of digital speech communication over mobile channels and the Internet worldwide.

Street address(es) and GPS coordinates of the Milestone Plaque Sites

NTT Musashino R&D Center, 9-11, Midori-cho 3-Chome, Musashino-Shi, Tokyo 180-8585, Japan

Details of the physical location of the plaque

The plaque will be placed near the reception area in the ground floor entrance hall. All visitors have free access to this hall.

How the intended plaque site is protected/secured

NTT’s receptionists are always near the plaque, and the plaque will be displayed in a transparent hard case.

Historical significance of the work

The line spectrum pair (LSP), invented in 1975, is one of the most efficient feature representation technologies for speech signals. Owing to its many practical merits for high-compression speech coding, it is used worldwide in speech coding standards for cellular and IP phones, including 3GPP AMR (3G cellular in Europe and Japan), 3GPP2 EVRC (3G cellular in the USA and Japan), ITU-T G.723.1 and G.729 (IP phones), IETF SILK (software IP phones, Skype), and PDC half-rate (2G cellular in Japan). Together these cover almost all high-compression telephone communication systems that are in wide use around the world today and are expected to remain so.

Features that set this work apart from similar achievements

The prediction coefficients of linear predictive coding (LPC) can be transmitted directly. Quantizing them, however, requires many bits to maintain the LPC spectral shape, and it is difficult to rule out the risk of the coding system becoming unstable. Partial autocorrelation (PARCOR), invented by Dr. F. Itakura and Dr. S. Saito in 1972, enables an easy stability check but still needs many quantization bits to maintain the LPC spectral shape. The bit consumption of PARCOR quantization can be reduced by applying adaptive bit allocation and variable-length coding schemes, but both schemes are extremely sensitive to transmission channel errors. LSP, an alternative representation of the prediction coefficients, was invented by Dr. F. Itakura in 1975 [2]. It enables a simple stability check and can maintain the LPC spectral shape with around 30% fewer bits than PARCOR, even without adaptive bit allocation or variable-length coding [3]-[7]. This is because quantization distortion in LSP has a smaller and more natural influence on the LPC spectral shape than quantization distortion in PARCOR. Small LPC spectral distortion is thus achieved by efficiently coding LSP in combination with prediction, interpolation, and vector quantization.

Significant references

[1] B. S. Atal, “The History of Linear Prediction,” IEEE Signal Processing Magazine, pp. 154-157, March 2006.

[2] F. Itakura, “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals,” J. Acoust. Soc. Am., 57, 533(A), 1975.

[3] JP Patent 56051116 - ALL POLE TYPE DIGITAL FILTER invented by F. Itakura http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&II=8&ND=6&adjacent=true&locale=en_EP&FT=D&date=19810508&CC=JP&NR=56051116A&KC=A

[4] US Patent 4,393,272 Sound synthesizer invented by F. Itakura and N. Sugamura http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=4,393,272.PN.&OS=PN/4,393,272&RS=PN/4,393,272

[5] F. Itakura, “Statistical Methods for Speech Analysis and Synthesis – from ML Vocoder to LSP through PARCOR –,” IEICE Fundamental Review, Vol. 3, No. 3, 2010. (in Japanese) Abstract: The invention process of the line spectrum pair (LSP), one of the most important analysis technologies for speech signals, is described. Partial autocorrelation (PARCOR) and LSP are alternative representation methods for a speech spectrum shape or a vocal tract shape. The two methods were invented at NTT Labs in 1972 and 1975, respectively. This paper covers the processes behind these inventions, starting from the original invention of a speech analysis method based on maximum likelihood estimation in 1966.

[6] F. Itakura, T. Kobayashi and M. Honda, “A Hardware implementation of a new narrow and medium band speech coding,” Proc. ICASSP 82, pp. 1964-1967, 1982.

[7] F. Soong and B. H. Juang, “Line spectrum pair (LSP) and speech data compression,” Proc. ICASSP 84, Vol. 9, pp. 37-40, 1984.

Supporting materials

[8] ITU-T G.723.1 (Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s) http://www.itu.int/rec/T-REC-G.723.1-200605-I, sections 2.4-2.7

[9] ITU-T G.729 (Coding of speech at 8 kbit/s using conjugate structure algebraic-code-excited linear prediction (CS-ACELP)) http://www.itu.int/rec/T-REC-G.729-200701-S, section 3.2

[10] 3GPP AMR http://www.etsi.org/deliver/etsi_ts/126000_126099/126090/10.01.00_60/ts_126090v100100p.pdf, section 5.2

[11] 3GPP2 EVRC http://www.3gpp2.org/public_html/specs/C.S0014-0_v1.0_revised.pdf, section 4.2

[12] IETF SILK http://tools.ietf.org/html/draft-vos-silk-01, page 261

[13] Example of VoIP gateway supporting G.729 http://www.cisco.com/en/US/docs/ios/solutions_docs/voip_solutions/CAC.html