Universal Serial Bus Device Class Definition for Audio Data Formats
Release 2.0 May 31, 2006 16
| Offset | Field | Size | Value | Description |
| 5 | bBitResolution | 1 | Number | The number of effectively used bits from the available bits in an audio subslot. |
2.3.1.7 Type I Supported Formats
The following paragraphs list all currently supported Type I Audio Data Formats. The bit allocations in the bmFormats field of the class-specific AS interface descriptor for the different Type I Audio Data Formats can be found in Appendix A.2.1, “Audio Data Format Type I Bit Allocations.”
2.3.1.7.1 PCM Format
The PCM (Pulse Code Modulation) format is the most commonly used format for representing audio data streams. The audio data is uncompressed and uses a signed two's-complement fixed-point format. It is left-justified (the sign bit is the Msb), and the data is padded with trailing zeros to fill the remaining unused bits of the subslot. The binary point is located to the right of the sign bit, so all values lie within the range [-1, +1).
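The following minimal C sketch illustrates the left-justified layout by converting one PCM subslot to a value in [-1, +1). It makes two assumptions: the subslot bytes are stored least significant byte first (little-endian), and bSubslotSize is between 1 and 4. The function name pcm_subslot_to_double is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one left-justified, two's-complement PCM subslot to a double in [-1, +1).
   Assumption: subslot bytes are little-endian (least significant byte first). */
static double pcm_subslot_to_double(const uint8_t *subslot, unsigned subslot_size,
                                    unsigned bit_resolution) {
    assert(subslot_size >= 1 && subslot_size <= 4);
    assert(bit_resolution >= 1 && bit_resolution <= 8 * subslot_size);

    /* Assemble the subslot, then shift it so the sign bit lands in bit 31. */
    uint32_t word = 0;
    for (unsigned i = 0; i < subslot_size; i++)
        word |= (uint32_t)subslot[i] << (8 * i);
    word <<= 8 * (4 - subslot_size);

    /* Keep only the bBitResolution most significant bits; the rest is padding. */
    uint32_t mask = (bit_resolution == 32) ? 0xFFFFFFFFu
                                           : ~(0xFFFFFFFFu >> bit_resolution);
    word &= mask;

    /* Interpret as two's complement without relying on implementation-defined casts. */
    double v = (double)word;
    if (word & 0x80000000u)
        v -= 4294967296.0;          /* subtract 2^32 for negative values */
    return v / 2147483648.0;        /* binary point right of the sign bit: scale by 2^-31 */
}
```

The sign bit always lands in bit 31, so the same scale factor (2^-31) works for every subslot size. For instance, the 16-bit subslot bytes 0x00 0x40 decode to 0.5.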
2.3.1.7.2 PCM8 Format
The PCM8 format is provided for compatibility with the legacy 8-bit wave format. Audio data is uncompressed and uses 8 bits per sample (bBitResolution = 8). In this case, the data is unsigned fixed-point, left-justified in the audio subslot, Msb first, with a range of [0, 255].
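The text fixes the PCM8 range at [0, 255] but does not state which code represents silence. The sketch below assumes the legacy 8-bit wave convention, in which 128 is the zero level. Under that assumption it maps PCM8 samples onto the signed scale used by the PCM format. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128 (0x80) is the zero level, as in legacy 8-bit wave data. */
static int8_t pcm8_to_signed(uint8_t sample) {
    return (int8_t)((int)sample - 128);   /* result is always within [-128, 127] */
}

/* Map a PCM8 sample onto the [-1, +1) scale of the PCM format. */
static double pcm8_to_double(uint8_t sample) {
    return ((int)sample - 128) / 128.0;
}
```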
2.3.1.7.3 IEEE_FLOAT Format
The IEEE_FLOAT format is based on the ANSI/IEEE-754 floating-point standard. Audio data is represented using the basic single-precision format. The basic single-precision number is 32 bits wide and has an 8-bit exponent and a 24-bit mantissa. Both mantissa and exponent are signed numbers, but neither is represented in two's-complement format. The mantissa is stored in sign-magnitude format and the exponent in biased form (also called excess-n form).
In biased form, a positive integer (called the bias) is subtracted from the stored number to get the actual number. For an eight-bit exponent, the bias is 127: to represent 0, the number 127 is stored, and to represent -100, 27 is stored. An exponent of all zeros and an exponent of all ones are both reserved for special cases, so in an eight-bit field, exponents of -126 to +127 are possible. In the basic floating-point format, the mantissa is assumed to be normalized so that its most significant bit is always one. That bit is therefore not stored; only the fractional part is stored. Denormalized values (stored exponent = 0) are considered to be zero.
The 32-bit IEEE-754 floating-point word is broken into three fields. The most significant bit stores the sign of the mantissa, the next group of 8 bits stores the exponent in biased form, and the remaining 23 bits store the magnitude of the fractional portion of the mantissa. For further information, refer to the ANSI/IEEE-754 standard.
The data is conveyed over USB using 32 bits per sample (bBitResolution = 32; bSubslotSize = 4).
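The field layout can be made concrete with a short C sketch. It splits a 32-bit IEEE_FLOAT subslot into its sign, biased exponent, and fraction, then rebuilds the value with the implied leading one. It follows the rule above that a stored exponent of zero decodes as zero. As an assumption of this sketch, it does not handle the all-ones exponent (infinities and NaNs). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned sign;      /* 1 bit: sign of the mantissa */
    unsigned exponent;  /* 8 bits: exponent in biased form (bias 127) */
    uint32_t fraction;  /* 23 bits: stored fractional part of the mantissa */
} ieee754_fields;

static ieee754_fields split_float_word(uint32_t word) {
    ieee754_fields f;
    f.sign = (word >> 31) & 0x1u;
    f.exponent = (word >> 23) & 0xFFu;
    f.fraction = word & 0x7FFFFFu;
    return f;
}

/* Decode a 32-bit word; a stored exponent of 0 (zero or denormal) decodes as zero.
   This sketch does not support exponent 255 (infinities/NaN) and asserts on it. */
static double decode_float_word(uint32_t word) {
    ieee754_fields f = split_float_word(word);
    assert(f.exponent != 0xFFu);
    if (f.exponent == 0)
        return 0.0;                                       /* denormals treated as zero */
    double value = 1.0 + (double)f.fraction / 8388608.0;  /* implied 1 plus fraction / 2^23 */
    int e = (int)f.exponent - 127;                        /* remove the bias */
    for (; e > 0; e--) value *= 2.0;                      /* scale by 2^e exactly */
    for (; e < 0; e++) value *= 0.5;
    return f.sign ? -value : value;
}
```

For example, the word 0x3FC00000 has a stored exponent of 127 (actual exponent 0) and a fraction of 0x400000, so it decodes to 1.5.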
2.3.1.7.4 ALaw Format and μLaw Format
Starting from 12- or 16-bit linear PCM samples, simple compression down to 8 bits per sample (one byte per sample) can be achieved by using logarithmic companding. The compressed audio data uses 8 bits per sample (bBitResolution = 8). Data is signed fixed point, left-justified in the subslot, Msb first. The compressed range is [-128, 127]. The difference between ALaw and μLaw compression lies in the formulae used to achieve the compression. Refer to the ITU G.711 standard for further details.
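G.711 remains the normative reference for the companding laws. As an illustration only, the following C sketch expands one μLaw byte into a linear sample. It uses the widely used bit-manipulation form of G.711 μ-law decoding: bit inversion, a 3-bit segment number, a 4-bit step, and a bias of 132. Scaling the output to the 16-bit range is an assumption of this sketch, and the function name is hypothetical. ALaw expansion uses a different formula and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Expand one G.711 mu-law byte to a linear sample scaled to the 16-bit range. */
static int16_t ulaw_to_linear(uint8_t code) {
    const int bias = 0x84;                 /* 132: bias added during encoding */
    code = (uint8_t)~code;                 /* mu-law bytes are stored bit-inverted */
    int exponent = (code >> 4) & 0x07;     /* 3-bit segment number */
    int mantissa = code & 0x0F;            /* 4-bit step within the segment */
    int magnitude = (((mantissa << 3) + bias) << exponent) - bias;
    assert(magnitude >= 0 && magnitude <= 32124);
    return (int16_t)((code & 0x80) ? -magnitude : magnitude);  /* bit 7 is the sign */
}
```

With this mapping, 0xFF decodes to 0, while 0x80 and 0x00 decode to the largest positive and negative magnitudes, +32124 and -32124.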
2.3.1.7.5 Type I Raw Data
This audio format is included to allow transport of data (audio or other) over a USB AudioStreaming interface in the form of PCM-like audio slots when the actual format or even the meaning of the transported data is unknown. The USB pipe simply acts as a pass-through. As a consequence, such data can never be interpreted inside the audio function and can only be routed from an Input Terminal to one or more Output Terminals. From a USB standpoint, the data is packed as if it were Type I formatted audio data, but the data is never to be interpreted as being audio data.
2.3.2 Type II Formats
Type II formats are used to transmit non-PCM encoded audio data as bit streams that consist of a sequence of encoded audio frames.
2.3.2.1 Encoded Audio Frames
An encoded audio frame is a sequence of bits that contains an encoded representation of one or more physical audio channels. The encoding takes place over a fixed number of audio slots. Each encoded audio frame contains enough information to reconstruct all the audio samples encoded in it (albeit not losslessly); no information from adjacent encoded audio frames is needed during decoding. The number of audio slots used to construct one encoded audio frame depends on the encoding scheme. (For MPEG, the number of slots per encoded audio frame (nf) is 384 for Layer I or 1152 for Layer II. For AC-3, the number of slots is 1536.)
In most cases, the encoded audio frame represents multiple physical audio channels. The number of bits per encoded audio frame may be variable. The content of the encoded audio frame is defined according to the implemented encoding scheme. Where applicable, the bit ordering shall be MSB first, relative to existing standards of serial transmission or storage of that encoding scheme. An encoded audio frame represents an interval longer than the USB (micro)frame. This is typical of audio compression algorithms that use psycho-acoustic or vocal tract parametric models.
Note: It is important to make a clear distinction between a USB frame and an encoded audio frame. The overloaded use of the term frame could cause confusion. Therefore, this specification will always use the qualifier ‘encoded audio’ to refer to MPEG or AC-3 encoded audio frames.
2.3.2.2 Audio Bit Streams
An encoded audio bit stream is a concatenation of a potentially very large number of encoded audio frames, ordered according to ascending time. Subsequent encoded audio frames are independent and can be decoded separately.
2.3.2.3 USB Packets
Encoded audio bit streams are packetized when transported over an isochronous pipe. Each virtual frame packet potentially contains only part of a single encoded audio frame. Packet sizes are determined according to the short-packet protocol. The encoded audio frame is broken down into a number of packets, each containing wMaxPacketSize bytes except for the last packet, which may be smaller and contains the remainder of the encoded audio frame. If the MaxPacketsOnly bit D7 in the bmAttributes field of the class-specific endpoint descriptor is set, the last (short) packet must be padded with zero bytes to wMaxPacketSize length. No virtual frame packet may contain bits belonging to different encoded audio frames. If the encoded audio frame length is not a multiple of 8 bits, the last byte in the last packet is padded with zero bits. The decoder must ignore all padded extra bits and bytes. Consecutive encoded audio frames are separated by at least one Transfer Delimiter. A Transfer Delimiter must be sent in all virtual frames until the next encoded audio frame is due. The above rules guarantee that a new encoded audio frame always starts on a virtual frame packet boundary.
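The packetization rules above can be sketched as a planning function. It fills in the packet length for each (micro)frame of one encoded audio frame period. The sketch rests on three assumptions:

- A Transfer Delimiter is modeled as a zero-length packet.
- The encoded frame size has already been rounded up to whole bytes (frame_bytes).
- plan_packets and its parameters are hypothetical names; max_packets_only stands for the MaxPacketsOnly bit D7.

```c
#include <assert.h>
#include <stddef.h>

/* Plan packet lengths for one encoded audio frame spread over `slots` consecutive
   (micro)frames; `lengths` must hold at least `slots` entries. Returns the number of
   data packets, or -1 if the frame does not fit while leaving room for at least one
   Transfer Delimiter before the next encoded audio frame. */
static int plan_packets(size_t frame_bytes, size_t max_packet_size, int max_packets_only,
                        size_t *lengths, size_t slots) {
    assert(max_packet_size > 0 && lengths != NULL);
    size_t full = frame_bytes / max_packet_size;   /* packets of wMaxPacketSize bytes */
    size_t rest = frame_bytes % max_packet_size;   /* remainder for the short packet */
    size_t data_packets = full + (rest ? 1 : 0);
    if (data_packets + 1 > slots)
        return -1;
    for (size_t i = 0; i < full; i++)
        lengths[i] = max_packet_size;
    if (rest)  /* with MaxPacketsOnly set, the short packet is padded with zero bytes */
        lengths[full] = max_packets_only ? max_packet_size : rest;
    for (size_t i = data_packets; i < slots; i++)
        lengths[i] = 0;  /* Transfer Delimiter, modeled as a zero-length packet */
    return (int)data_packets;
}
```

For a 250-byte encoded audio frame with wMaxPacketSize = 100 over five (micro)frames, the plan is 100, 100, 50, 0, 0. With MaxPacketsOnly set, the third packet is padded to 100 bytes.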
2.3.2.4 Bandwidth Allocation
The encoded audio frame time tf equals the number of audio slots per encoded audio frame nf divided by the sampling rate fs of the original audio samples.
The allocated bandwidth for the pipe must accommodate the largest possible encoded audio frame to be transmitted within an encoded audio frame time. This should take into account the Transfer Delimiter requirement and any differences between the time base of the stream and the USB (micro)frame timer. The device may choose to consume more bandwidth than necessary (by increasing the reported wMaxPacketSize) to minimize the time needed to transmit an entire encoded audio frame. This can be used to enable early decoding and therefore minimize system latency.
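As a worked example of tf = nf / fs, MPEG Layer II at 48 kHz gives 1152 / 48000 = 24 ms, and AC-3 at 48 kHz gives 1536 / 48000 = 32 ms. The C sketch below computes tf and a lower bound on wMaxPacketSize for a given largest encoded frame. It assumes a simplified model: one (micro)frame per encoded audio frame time is reserved for the Transfer Delimiter, and clock differences between the stream and the USB (micro)frame timer are ignored. A real device would therefore add margin. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encoded audio frame time t_f = n_f / f_s, in microseconds (truncated). */
static uint64_t frame_time_us(uint32_t slots_per_frame, uint32_t sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return (uint64_t)slots_per_frame * 1000000u / sample_rate_hz;
}

/* Smallest wMaxPacketSize that lets max_frame_bytes be sent within one t_f while
   keeping one (micro)frame for a Transfer Delimiter; returns 0 if impossible.
   interval_us: 1000 for full-speed frames, 125 for high-speed microframes. */
static uint32_t min_max_packet_size(uint32_t max_frame_bytes, uint32_t slots_per_frame,
                                    uint32_t sample_rate_hz, uint32_t interval_us) {
    assert(sample_rate_hz > 0 && interval_us > 0);
    /* Whole (micro)frames that fit in one encoded audio frame time. */
    uint64_t intervals = (uint64_t)slots_per_frame * 1000000u /
                         ((uint64_t)sample_rate_hz * interval_us);
    if (intervals < 2)
        return 0;
    uint64_t usable = intervals - 1;                              /* one kept for the delimiter */
    return (uint32_t)((max_frame_bytes + usable - 1) / usable);  /* ceiling division */
}
```

Consider a hypothetical 2560-byte largest AC-3 frame at 48 kHz. At full speed, 32 frames fit in tf and 31 remain after the delimiter, so the bound is 83 bytes per packet. At high speed (256 microframes), it drops to 11 bytes.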
2.3.2.5 Timing
The timing reference point is the beginning of an encoded audio frame. Therefore, the USB packet that contains the first bits (usually the encoded audio frame sync word) of the encoded audio frame is used as a timing reference in USB space. This USB packet is called the reference packet. The transmission of the reference packet of an encoded audio frame should begin at the target playback time of that frame (minus the endpoint’s reported delay) rounded to the nearest USB (micro)frame time. This guarantees that, at the receiving end, the arrival of subsequent reference packets matches the encoded audio frame time tf as closely as possible.
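The rounding rule for the reference packet can be sketched as follows. The sketch assumes that the target playback time and the endpoint's reported delay are already expressed in microseconds on the USB (micro)frame time base, and that the delay does not exceed the target. It rounds halves upward, a choice the text does not specify. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (Micro)frame number in which the reference packet of an encoded audio frame should
   start: (target playback time - endpoint delay), rounded to the nearest interval. */
static uint64_t reference_packet_frame_number(uint64_t target_playback_us,
                                              uint64_t endpoint_delay_us,
                                              uint32_t interval_us) {
    assert(interval_us > 0);
    assert(target_playback_us >= endpoint_delay_us);
    uint64_t start_us = target_playback_us - endpoint_delay_us;
    return (start_us + interval_us / 2) / interval_us;  /* round half up */
}
```

Consider MPEG frames at 44.1 kHz (tf ≈ 26.12 ms) with zero delay. Successive target times of roughly 26122, 52245, 78367, 104490, and 130612 µs map to frames 26, 52, 78, 104, and 131. The spacing is usually 26 ms with an occasional 27 ms, so it tracks tf on average.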
2.3.2.6 Type II Format Type Descriptor
The Type II Format Type descriptor starts with the usual three fields bLength, bDescriptorType and bDescriptorSubtype.
The bFormatType field indicates this is a Type II descriptor. The wMaxBitRate field contains the maximum bit rate this interface can handle, expressed in kbit/s. It is a measure of the buffer size available in the interface. The wSlotsPerFrame field contains the number of PCM audio slots contained within a single encoded audio frame.
Table 2-3: Type II Format Type Descriptor
| Offset | Field | Size | Value | Description |
| 0 | bLength | 1 | Number | Size of this descriptor, in bytes: 8 |
| 1 | bDescriptorType | 1 | Constant | CS_INTERFACE descriptor type. |
| 2 | bDescriptorSubtype | 1 | Constant | FORMAT_TYPE descriptor subtype. |
| 3 | bFormatType | 1 | Constant | FORMAT_TYPE_II. Constant identifying the Format Type the AudioStreaming interface is using. |
| 4 | wMaxBitRate | 2 | Number | Indicates the maximum number of bits per second this interface can handle. Expressed in kbits/s. |
| 6 | wSlotsPerFrame | 2 | Number | Indicates the number of PCM audio slots contained in one encoded audio frame. |
2.3.2.7 Rate feedback
If the isochronous data endpoint needs explicit rate feedback (adaptive source, asynchronous sink), the feedback pipe must report the number of equivalent PCM audio slots. The host will accumulate this data and start transmission of an encoded audio frame whenever the current number of audio slots exceeds the number of slots per encoded audio frame. The remainder is kept in the accumulator.
2.3.2.8 Type II Supported Formats
The following sections list all currently supported Type II Audio Data Formats. The bit allocations in the bmFormats field of the class-specific AS interface descriptor for the different Type II Audio Data Formats can be found in Appendix A.2.2, “Audio Data Format Type II Bit Allocations.”
2.3.2.8.1 MPEG Format
Refer to the ISO/IEC 11172-3:1993 “Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s -- Part 3: Audio” and the ISO/IEC 13818-3:1998 “Information technology -- Generic coding of moving pictures and associated audio information -- Part 3: Audio” specifications for detailed format information.
2.3.2.8.2 AC-3 Format
Refer to the Digital Audio Compression Standard (AC-3), ATSC A/52A Aug. 20, 2001 for detailed format information.
2.3.2.8.3 WMA Format
2.3.2.8.4 DTS Format
2.3.2.8.5 Type II Raw Data
This audio format is included to allow transport of data (audio or other) over a USB AudioStreaming interface in the form of a bit stream when the actual format or even the meaning of the transported data is unknown. The USB pipe simply acts as a pass-through. As a consequence, such data can never be interpreted inside the audio function and can only be routed from an Input Terminal to one or more Output Terminals. From a USB standpoint, the data is packed as if it were Type II formatted audio data, but the data is never to be interpreted as being audio data.
2.3.3 Type III Formats
These formats are based upon the IEC61937 standard. The IEC61937 standard describes a method to transfer non-PCM encoded audio bit streams over an IEC60958 digital audio interface, together with the transfer of the accompanying “Channel Status” and “User Data.”
The IEC60958 standard specifies a widely used method of interconnecting digital audio equipment with two-channel linear PCM audio. The IEC61937 standard describes a way in which the IEC60958 interface must be used to convey non-PCM encoded audio bit streams for consumer applications.
The same basic techniques used in IEC61937 are reused here to convey non-PCM encoded audio bit streams over a Type III formatted audio stream. From a USB transfer standpoint, the data streaming over the interface looks exactly like two-channel 16-bit PCM audio data.
2.3.3.1 Type III Format Type Descriptor
The bFormatType field indicates this is a Type III descriptor. The bSubslotSize field indicates how many bytes are used to transport an audio subslot. The bBitResolution field indicates how many bits of the total number of available bits in the audio subslot are truly used by the audio function to convey audio information.
Table 2-4: Type III Format Type Descriptor
| Offset | Field | Size | Value | Description |
| 0 | bLength | 1 | Number | Size of this descriptor, in bytes: 6 |
| 1 | bDescriptorType | 1 | Constant | CS_INTERFACE descriptor type. |
| 2 | bDescriptorSubtype | 1 | Constant | FORMAT_TYPE descriptor subtype. |
| 3 | bFormatType | 1 | Constant | FORMAT_TYPE_III. Constant identifying the Format Type the AudioStreaming interface is using. |
| 4 | bSubslotSize | 1 | Number | The number of bytes occupied by one audio subslot. Must be set to two. |
| 5 | bBitResolution | 1 | Number | The number of effectively used bits from the available bits in an audio subslot. |
2.3.3.2 Type III Supported Formats
Refer to the IEC60958 and IEC61937 (several parts) specifications for detailed format information. The bit allocations in the bmFormats field of the class-specific AS interface descriptor for the different Type III Audio Data Formats can be found in Appendix A.2.3, “Audio Data Format Type III Bit Allocations.”
The following is a list of formats that are covered or will be covered by the above specifications.
• IEC61937_AC-3
• IEC61937_MPEG-1_Layer1
• IEC61937_MPEG-1_Layer2/3 or IEC61937_MPEG-2_NOEXT
• IEC61937_MPEG-2_EXT
• IEC61937_MPEG-2_AAC_ADTS
• IEC61937_MPEG-2_Layer1_LS
• IEC61937_MPEG-2_Layer2/3_LS
• IEC61937_DTS-I
• IEC61937_DTS-II
• IEC61937_DTS-III
• IEC61937_ATRAC
• IEC61937_ATRAC2/3
In addition, the WMA audio compression format as defined by Microsoft is supported.
2.3.4 Type IV Formats
Type IV formats can only be used on external connections to the audio function that do not use a USB pipe for their data transport but that do need an AudioStreaming interface to control an encoder or decoder process in one or more of its Alternate Settings. A typical example of such a connection is an S/PDIF connector that is capable of handling both PCM stereo audio data streams (IEC60958) in one Alternate