USB Device Class Definition for Audio Devices
Release 1.0 March 18, 1998 31
Table 3-1: Status Word Format
| Offset | Field | Size | Value | Description |
| 0 | bStatusType | 1 | Bitmap | D7: Interrupt Pending; D6: Memory Contents Changed; D5..4: Reserved; D3..0: Originator (0 = AudioControl interface, 1 = AudioStreaming interface, 2 = AudioStreaming endpoint, 3..15 = Reserved) |
| 1 | bOriginator | 1 | Number | ID of the Terminal, Unit, interface, or endpoint that reports the interrupt. |
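As an illustration of Table 3-1, the following is a minimal sketch of how Host software might decode the two-byte status word. The names `StatusWord` and `parse_status_word` are hypothetical and are not defined by this specification; only the bit positions come from the table.

```python
from typing import NamedTuple

# Originator codes from bits D3..0 of bStatusType (Table 3-1).
ORIGINATORS = {
    0: "AudioControl interface",
    1: "AudioStreaming interface",
    2: "AudioStreaming endpoint",
}


class StatusWord(NamedTuple):
    """Hypothetical container for a decoded status word."""
    interrupt_pending: bool
    memory_contents_changed: bool
    originator_type: str
    originator_id: int


def parse_status_word(data: bytes) -> StatusWord:
    """Decode the 2-byte status word: bStatusType at offset 0, bOriginator at offset 1."""
    if len(data) != 2:
        raise ValueError("status word must be exactly 2 bytes")
    b_status_type, b_originator = data[0], data[1]
    originator_code = b_status_type & 0x0F  # D3..0
    return StatusWord(
        interrupt_pending=bool(b_status_type & 0x80),        # D7
        memory_contents_changed=bool(b_status_type & 0x40),  # D6
        originator_type=ORIGINATORS.get(originator_code, "Reserved"),
        originator_id=b_originator,
    )
```

For example, `parse_status_word(bytes([0x80, 0x05]))` reports a pending interrupt that originates from the AudioControl interface, with ID 5.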
3.7.2 AudioStreaming Interface
AudioStreaming interfaces are used to interchange digital audio data streams between the Host and the audio function. They are optional. An audio function can have zero or more AudioStreaming interfaces associated with it, each possibly carrying data of a different nature and format. Each AudioStreaming interface can have at most one isochronous data endpoint. This construction guarantees a one-to-one relationship between the AudioStreaming interface and the single audio data stream, related to the endpoint. In some cases, the isochronous data endpoint is accompanied by an associated isochronous synch endpoint for synchronization purposes. The isochronous data endpoint is required to be the first endpoint in the AudioStreaming interface. The synch endpoint always follows its associated data endpoint.
An AudioStreaming interface can have alternate settings that can be used to change certain characteristics of the interface and underlying endpoint. A typical use of alternate settings is to provide a way to change the bandwidth requirements an active AudioStreaming interface imposes on the USB. By incorporating a low-bandwidth or even zero-bandwidth alternate setting for each AudioStreaming interface, a device offers to the Host software the option to temporarily relinquish USB bandwidth by switching to this low-bandwidth alternate setting. If such an alternate setting is implemented, it must be the default alternate setting (alternate setting zero). A zero-bandwidth alternate setting can be implemented by specifying zero endpoints in the standard AudioStreaming interface descriptor. All other interface and endpoint descriptors (both standard and class-specific) need not be specified in this case.
The AudioStreaming interface is essentially used to provide an access point for the Host software (drivers) to manipulate the behavior of the physical interface it represents. Therefore, even external connections to the audio function (S/PDIF interface, analog input, etc.) can be represented by an AudioStreaming interface so that the Host software can control certain aspects of those connections. This type of AudioStreaming interface has no associated USB endpoints. The related audio data stream is not using USB as a transport medium.
In addition, the concepts of dynamic interfaces as described in the Universal Serial Bus Class Specification can be used to notify the Host software that changes have occurred on the external connection. This is analogous to switching alternate settings on an AudioStreaming interface with USB endpoints, except that the switch is now device-initiated instead of Host-initiated.
As an example, consider an S/PDIF connection to an audio function. If nothing is connected to this external S/PDIF interface, the AudioStreaming interface is idle and reports itself as being dynamic and non-configured (bInterfaceClass=0x00). If the user connects a standard IEC958 signal to the audio function, the S/PDIF receiver inside the audio function detects this and notifies the Host that the AudioStreaming interface has switched to its IEC958 mode (alternate setting x). If, on the other hand, an IEC1937 signal carrying MPEG-encoded audio is connected, the AudioStreaming interface switches to the appropriate setting (alternate setting y) to handle the MPEG decoding process.
For every isochronous OUT or IN endpoint defined in any of the AudioStreaming interfaces, there must be a corresponding Input or Output Terminal defined in the audio function. For the Host to fully understand the nature and behavior of the connection, it must take into account the interface- and endpoint-related descriptors as well as the Terminal-related descriptor.
3.7.2.1 Isochronous Audio Data Stream Endpoint
In general, the data streams that are handled by an isochronous audio data endpoint do not necessarily map directly to the logical channels that exist within the audio function. As an example, consider a “stereo” audio data stream that contains audio data, encoded in Dolby Prologic format. Although there is only one data stream, carrying interleaved samples for Left and Right (or more precisely LT and RT), these two channels carry information for four logical channels (Left, Right, Center, and Surround). Other examples include cases in which multiple logical audio channels are compressed into a single data stream. The format of such a data stream can be entirely different from the native format of the logical channels (for example, 256 Kbits/s MPEG1 stereo audio as opposed to 176.4 Kbytes/s 16 bit stereo 44.1 kHz audio). Therefore, to describe the data transfer at the endpoint level correctly, the notion of logical channel is replaced by the notion of audio data stream. It is the responsibility of the AudioStreaming interface which contains the OUT endpoint to convert between the audio data stream and the embedded logical channels before handing the data over to the Input Terminal. In many cases, this conversion process involves some form of decoding. Likewise, the AudioStreaming interface which contains the IN endpoint must convert logical channels from the Output Terminal into an audio data stream, often using some form of encoding.
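The two data rates quoted in the example above can be checked with a short calculation. This is only an arithmetic sketch; the helper name `pcm_byte_rate` is hypothetical, and it assumes that "K" means 1000 in both figures.

```python
def pcm_byte_rate(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Bytes per second of an uncompressed PCM stream."""
    return sample_rate_hz * channels * bits_per_sample // 8


# Native format of the logical channels: 16-bit stereo at 44.1 kHz.
native_rate = pcm_byte_rate(44_100, 2, 16)  # 176400 bytes/s = 176.4 Kbytes/s

# Compressed stream: 256 Kbits/s MPEG1 stereo, assuming K = 1000.
mpeg_rate = 256_000 // 8                    # 32000 bytes/s

# How much smaller the compressed stream is than the native format.
compression_ratio = native_rate / mpeg_rate
```

Under these assumptions the compressed stream needs a little under one fifth of the bandwidth of the native format, which is why the endpoint is described in terms of the audio data stream rather than the logical channels.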
Consequently, requests to control properties that exist within an audio function, such as volume or mute, cannot be sent to the endpoint in an AudioStreaming interface. An AudioStreaming interface operates on audio data streams and is unaware of the number of logical channels it eventually serves. Instead, these requests must be directed to the proper audio function’s Units or Terminals via the AudioControl interface.
As already mentioned, an AudioStreaming interface can have zero or one isochronous audio data endpoint. If multiple synchronous audio channels must be communicated between Host and audio function, they must be clustered into one audio channel cluster by interleaving the individual audio data, and the result can be directed to the single endpoint. Furthermore, a single synch endpoint, if needed, can service the entire cluster. In this way, a minimum number of endpoints are consumed to transport related data streams.
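The clustering by interleaving described above can be sketched as follows. This is a simplified illustration that works on lists of sample values; the function names are hypothetical, and the actual on-the-wire packing of samples depends on the audio data format in use.

```python
from typing import List, Sequence


def interleave(channels: Sequence[Sequence[int]]) -> List[int]:
    """Merge per-channel sample lists into one stream: ch1[0], ch2[0], ..., ch1[1], ch2[1], ..."""
    lengths = {len(ch) for ch in channels}
    if len(lengths) != 1:
        raise ValueError("need at least one channel, all of the same length")
    return [sample for frame in zip(*channels) for sample in frame]


def deinterleave(stream: Sequence[int], nr_channels: int) -> List[List[int]]:
    """Split an interleaved stream back into per-channel sample lists."""
    if nr_channels < 1 or len(stream) % nr_channels:
        raise ValueError("stream length must be a multiple of the channel count")
    return [list(stream[i::nr_channels]) for i in range(nr_channels)]
```

For instance, `interleave([[1, 2, 3], [10, 20, 30]])` yields `[1, 10, 2, 20, 3, 30]`, and `deinterleave` with two channels recovers the original lists.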
If an audio function needs more than one cluster to operate, each cluster is directed to the endpoint of a separate AudioStreaming interface, belonging to the same Audio Interface Collection (all servicing the same audio function). If there is a need to manipulate a number of AudioStreaming interfaces as a whole, these interfaces can be tied together. The techniques for associating interfaces, described in the Universal Serial Bus Class Specification, should be used to create the binding.
3.7.2.2 Isochronous Synch Endpoint
For adaptive audio source endpoints and asynchronous audio sink endpoints, an explicit synch mechanism is needed to maintain synchronization during transfers. For details about synchronization, see Section 5, “USB Data Flow Model,” in the USB Specification and the relevant parts of the Universal Serial Bus Class Specification.
The information carried over the synch path consists of a 3-byte data packet. These three bytes contain the Ff value in a 10.14 format as described in Section 5.10.4.2, “Feedback” of the USB Specification. Ff represents the average number of samples the endpoint must produce or consume per frame to match the desired sampling frequency Fs exactly.
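To make the 10.14 format concrete, here is a minimal sketch that converts a sampling frequency into the 3-byte Ff packet and back. It assumes one frame lasts 1 ms and that the three bytes are sent least significant byte first, following the usual USB convention for multi-byte values; the function names are hypothetical.

```python
FRACTION_BITS = 14  # 10.14 format: 10 integer bits, 14 fractional bits


def encode_ff(sample_rate_hz: float) -> bytes:
    """Encode the average number of samples per 1 ms frame as a 3-byte 10.14 value."""
    samples_per_frame = sample_rate_hz / 1000.0  # assumption: 1 frame = 1 ms
    raw = round(samples_per_frame * (1 << FRACTION_BITS))
    if not 0 <= raw < (1 << 24):
        raise ValueError("Ff does not fit in 24 bits")
    return raw.to_bytes(3, "little")  # assumption: little-endian byte order


def decode_ff(packet: bytes) -> float:
    """Decode a 3-byte 10.14 packet into samples per frame."""
    if len(packet) != 3:
        raise ValueError("Ff packet must be exactly 3 bytes")
    return int.from_bytes(packet, "little") / (1 << FRACTION_BITS)
```

With these assumptions, 48 kHz encodes to the bytes `00 00 0C` (exactly 48.0 samples per frame), while 44.1 kHz encodes to `66 06 0B`, which decodes to roughly 44.09998 samples per frame because of the 14-bit fractional resolution.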
A new Ff value is available every $2^{(10 - P)}$ ms (frames), where P can range from 1 to 9, inclusive. The sample clock Fs is always derived from a master clock Fm in the device. P is related to the ratio between those clocks through the following relationship:
$$F_m = F_s \cdot 2^{(P - 1)}$$
In worst case conditions, only Fs is available and Fm = Fs, giving P = 1 because one can always use phase information to resolve the estimation of Fs within half a clock cycle.
An adaptive audio source IN endpoint is accompanied by an associated isochronous synch OUT endpoint that carries Ff. An asynchronous audio sink OUT endpoint is accompanied by an associated isochronous synch IN endpoint.
For adaptive IN endpoints and asynchronous OUT endpoints, the standard endpoint descriptor provides the bSynchAddress field to establish a link to the associated synch endpoint. It contains the address of the synch endpoint. The bSynchAddress field of the synch standard endpoint descriptor must be set to zero.
As indicated earlier, a new Ff value is available every $2^{(10 - P)}$ frames with P ranging from 1 to 9. The bRefresh field of the synch standard endpoint descriptor is used to report the exponent (10 - P) to the Host. It can range from 9 down to 1 (512 ms down to 2 ms).
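The link between the clock ratio, P, and bRefresh can be sketched as below. The sketch assumes the relationship Fm = Fs · 2^(P − 1), which agrees with the worst case described above (Fm = Fs gives P = 1), and it assumes Fm is an exact power-of-two multiple of Fs. All function names are hypothetical.

```python
def p_from_clocks(fm_hz: int, fs_hz: int) -> int:
    """Derive P from the master clock Fm and sample clock Fs, assuming Fm = Fs * 2**(P - 1)."""
    ratio, remainder = divmod(fm_hz, fs_hz)
    if remainder or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("Fm must be a power-of-two multiple of Fs")
    p = ratio.bit_length()  # ratio == 2**(P - 1), so P == log2(ratio) + 1
    if not 1 <= p <= 9:
        raise ValueError("P must lie in the range 1..9")
    return p


def b_refresh_from_p(p: int) -> int:
    """bRefresh reports the exponent (10 - P)."""
    if not 1 <= p <= 9:
        raise ValueError("P must lie in the range 1..9")
    return 10 - p


def refresh_period_ms(b_refresh: int) -> int:
    """A new Ff value is available every 2**bRefresh frames (1 ms each)."""
    return 1 << b_refresh
```

For example, a device with Fm = 256 · Fs gets P = 9, bRefresh = 1, and a new Ff value every 2 ms; a device with only Fs available (Fm = Fs) gets P = 1, bRefresh = 9, and a new value every 512 ms.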
3.7.2.3 Audio Channel Cluster Format
An audio channel cluster is a grouping of logical audio channels that share the same characteristics, such as sampling frequency and bit resolution. Channel numbering in the cluster starts with channel one and runs up to the number of channels in the cluster. The virtual channel zero is used to address a master Control in a Unit, effectively influencing all the channels at once. The maximum number of independent channels in an audio channel cluster is limited to 254, because channel zero is used to reference the master channel and code 0xFF (255) is used in requests to indicate that the request parameter block holds values for all available addressed Controls. For further details, refer to Section 5.2.2, “AudioControl Requests” and the sections that follow, describing the second form of requests.
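The channel numbering rules above can be summarized in a small helper. This is an illustrative sketch only; the constant and function names are hypothetical and do not appear in this specification.

```python
MASTER_CHANNEL = 0x00        # virtual channel zero: master Control
ALL_CONTROLS = 0xFF          # parameter block holds values for all addressed Controls
MAX_CLUSTER_CHANNELS = 254   # 256 codes minus the two reserved codes above


def classify_channel_number(channel_number: int, nr_channels: int) -> str:
    """Interpret a channel number against a cluster that has nr_channels logical channels."""
    if not 0 <= nr_channels <= MAX_CLUSTER_CHANNELS:
        raise ValueError("a cluster can hold at most 254 logical channels")
    if channel_number == MASTER_CHANNEL:
        return "master"
    if channel_number == ALL_CONTROLS:
        return "all controls"
    if 1 <= channel_number <= nr_channels:
        return f"channel {channel_number}"
    raise ValueError("channel number is outside the cluster")
```

For a stereo cluster, `classify_channel_number(2, 2)` returns `"channel 2"`, while channel number 3 is rejected.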
In many cases, each channel in the audio cluster is also tied to a certain location in the listening space. A trivial example of this is a cluster that contains Left and Right logical audio channels. To be able to describe more complex cases in a manageable fashion, this specification imposes some limitations and restrictions on the ordering of logical channels in an audio channel cluster.
There are twelve predefined spatial locations:
· Left Front (L)
· Right Front (R)
· Center Front (C)
· Low Frequency Enhancement (LFE) [Super woofer]
· Left Surround (LS)
· Right Surround (RS)
· Left of Center (LC) [in front]
· Right of Center (RC) [in front]
· Surround (S) [rear]
· Side Left (SL) [left wall]
· Side Right (SR) [right wall]
· Top (T) [overhead]
If there are logical channels present in the audio channel cluster that correspond to some of the previously defined spatial positions, then they must appear in the order specified in the above list. For instance, if a cluster contains logical channels Left, Right and LFE, then channel 1 is Left, channel 2 is Right, and channel 3 is LFE.
To characterize an audio channel cluster, a cluster descriptor is introduced. This descriptor is embedded within one of the following descriptors:
· Input Terminal descriptor
· Mixer Unit descriptor
· Processing Unit descriptor
· Extension Unit descriptor
The cluster descriptor contains the following fields:
· bNrChannels: a number that specifies how many logical audio channels are present in the cluster.
· wChannelConfig: a bit field that indicates which spatial locations are present in the cluster. The bit allocations are as follows:
§ D0: Left Front (L)
§ D1: Right Front (R)
§ D2: Center Front (C)
§ D3: Low Frequency Enhancement (LFE)
§ D4: Left Surround (LS)
§ D5: Right Surround (RS)
§ D6: Left of Center (LC)
§ D7: Right of Center (RC)
§ D8: Surround (S)
§ D9: Side Left (SL)
§ D10: Side Right (SR)
§ D11: Top (T)
§ D15..12: Reserved
· Each bit set in this bit map indicates that there is a logical channel in the cluster that carries audio information destined for the indicated spatial location. The channel ordering in the cluster must correspond to the ordering imposed by the above list of predefined spatial locations. If there are more channels in the cluster than there are bits set in the wChannelConfig field (i.e. bNrChannels > [Number_Of_Bits_Set]), then the first [Number_Of_Bits_Set] channels take the spatial positions indicated in wChannelConfig. The remaining channels have ‘non-predefined’ spatial positions (positions that do not appear in the predefined list). If none of the bits in wChannelConfig are set, then all channels have non-predefined spatial positions. If one or more channels have non-predefined spatial positions, their spatial location description can optionally be derived from the iChannelNames field.
· iChannelNames: index to a string descriptor that describes the spatial location of the first non-predefined logical channel in the cluster. The spatial locations of all remaining logical channels must be described by string descriptors with indices that immediately follow the index of the descriptor of the first non-predefined channel. Therefore, iChannelNames inherently describes an array of string descriptor indices, ranging from iChannelNames to (iChannelNames + (bNrChannels - [Number_Of_Bits_Set]) - 1).
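The interpretation of the three cluster descriptor fields can be sketched as follows. The function name `describe_cluster` and the output labels are hypothetical, and the sketch assumes that an iChannelNames value of 0 means that no string descriptors are provided.

```python
from typing import List

# Predefined spatial locations in bit order D0..D11 of wChannelConfig.
SPATIAL_LOCATIONS = ["L", "R", "C", "LFE", "LS", "RS", "LC", "RC", "S", "SL", "SR", "T"]


def describe_cluster(b_nr_channels: int, w_channel_config: int, i_channel_names: int) -> List[str]:
    """Return one label per logical channel, with channel 1 first."""
    if w_channel_config & 0xF000:
        raise ValueError("bits D15..12 of wChannelConfig are reserved")
    # Predefined channels come first, in the order of the predefined list.
    labels = [name for bit, name in enumerate(SPATIAL_LOCATIONS)
              if w_channel_config & (1 << bit)]
    extra = b_nr_channels - len(labels)
    if extra < 0:
        raise ValueError("more bits set in wChannelConfig than channels in the cluster")
    # Remaining channels are non-predefined; their names use consecutive string indices.
    for k in range(extra):
        if i_channel_names:
            labels.append(f"string {i_channel_names + k}")
        else:
            labels.append("unnamed")  # assumption: index 0 means no string descriptor
    return labels
```

For the Left, Right and LFE cluster mentioned earlier, `describe_cluster(3, 0x000B, 0)` returns `["L", "R", "LFE"]`, so channel 3 is LFE.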
Example 1:
An audio channel cluster that carries Dolby Prologic logical channels has the following cluster descriptor:
Table 3-2: Dolby Prologic Cluster Descriptor
| Offset | Field | Size | Value | Description |
| 0 | bNrChannels | 1 | 4 | There are 4 logical channels in the cluster. |
| 1 | wChannelConfig | 2 | 0x0107 | Left, Right, Center and Surround are present. |
| 3 | iChannelNames | 1 | Index | Because there are no non-predefined logical channels, this index must be set to 0. |
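As a rough illustration, the Dolby Prologic cluster descriptor in Table 3-2 can be serialized with Python's `struct` module. The sketch assumes the three fields are laid out back to back at the offsets shown in the table and that the 2-byte wChannelConfig field is little-endian, as is conventional for multi-byte USB descriptor fields. The helper names are hypothetical.

```python
import struct

# bNrChannels (1 byte), wChannelConfig (2 bytes, little-endian), iChannelNames (1 byte).
CLUSTER_FORMAT = "<BHB"


def pack_cluster(b_nr_channels: int, w_channel_config: int, i_channel_names: int) -> bytes:
    """Serialize the cluster descriptor fields into their 4-byte layout."""
    return struct.pack(CLUSTER_FORMAT, b_nr_channels, w_channel_config, i_channel_names)


def unpack_cluster(data: bytes) -> tuple:
    """Parse 4 bytes back into (bNrChannels, wChannelConfig, iChannelNames)."""
    return struct.unpack(CLUSTER_FORMAT, data)


# Values from Table 3-2: 4 channels, L + R + C + S, no channel name strings.
prologic = pack_cluster(4, 0x0107, 0)
```

Under these assumptions `prologic` is the byte sequence `04 07 01 00`, and `unpack_cluster(prologic)` returns `(4, 0x0107, 0)`.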
Example 2:
A hypothetical audio channel cluster inside an audio function could carry Left, Left Surround, Left of Center, and two auxiliary channels that each contain a different weighted mix of the Left, Left Surround and Left of Center channels. The corresponding cluster descriptor would be:
Table 3-3: Left Group Cluster Descriptor
| Offset | Field | Size | Value | Description |
| 0 | bNrChannels | 1 | 5 | There are 5 logical channels in the cluster. |
| 1 | wChannelConfig | 2 | 0x0051 | Left, Left Surround, Left of Center and two undefined channels are present. (bNrChannels > [Number_Of_Bits_Set]) |
| 3 | iChannelNames | 1 | Index | Optional index of the first non-predefined string descriptor |
Optional string descriptors:
String (Index) = ‘Left Down Mix 1’
String (Index+1) = ‘Left Down Mix 2’
3.7.2.4 Audio Data Format
The format used to transport audio data over the USB is entirely determined by the code, located in the wFormatTag field of the class-specific interface descriptor. Therefore, each defined Format Tag must document in detail the audio data format it uses. Consequently, format-specific descriptors are needed to fully describe the format. For details about the predefined Format Tags and associated data formats and descriptors, see the separate document, USB Audio Data Formats, that is considered part of this specification. Vendor-specific protocols must be fully documented by the manufacturer.