USB Device Class Definition for Audio Devices
Release 1.0 March 18, 1998 16
Terminal: Addressable logical object inside an audio function that represents a connection to the audio function’s outside world.
Unit: Addressable logical object inside an audio function that represents a certain audio subfunctionality.
XUD: Acronym for Extension Unit Descriptor.
2 Management Overview
The USB is very well suited for transport of audio (voice and sound). PC-based voice telephony is one of the major drivers of USB technology. In addition, the USB has more than enough bandwidth for sound, even high-quality audio. Many applications related to voice telephony, audio playback, and recording can take advantage of the USB.
In principle, a versatile bus specification like the USB provides many ways to propagate and control digital audio. For the industry, however, it is very important that audio transport mechanisms be well defined and standardized on the USB. Only in this way can interoperability be guaranteed among the many possible audio devices on the USB. Standardized audio transport mechanisms also help to keep software drivers as generic as possible. The Audio Device Class described in this document satisfies those requirements. It is written and revised by experts in the audio field. Other device classes that address audio in some way should refer to this document for their audio interface specification.
An essential issue in audio is synchronization of the data streams. Indeed, the smallest artifacts are easily detected by the human ear. Therefore, a robust synchronization scheme on isochronous transfers has been developed and incorporated in the USB Specification. The Audio Device Class definition adheres to this synchronization scheme to transport audio data reliably over the bus.
This document contains all necessary information for a designer to build a USB-compliant device that incorporates audio functionality. It specifies the standard and class-specific descriptors that must be present in each USB audio function. It further explains the use of class-specific requests that allow for full audio function control. A number of predefined data formats are listed and fully documented. Each format defines a standard way of transporting audio over USB. However, provisions have been made so that vendor-specific audio formats and compression schemes can be handled.
3 Functional Characteristics
In many cases, audio functionality does not exist as a standalone device. It is one capability that, together with other functions, constitutes a “composite” device. A perfect example of this is a CD-ROM player, which can incorporate video, audio, data storage, and transport control. The audio function is thus located at the interface level in the device class hierarchy. It consists of a number of interfaces grouping related pipes that together implement the interface to the audio function.
Audio functions are addressed through their audio interfaces. Each audio function has a single AudioControl interface and can have several AudioStreaming and MIDIStreaming interfaces. The AudioControl (AC) interface is used to access the audio Controls of the function whereas the AudioStreaming (AS) interfaces are used to transport audio streams into and out of the function. The MIDIStreaming (MS) interfaces are used to transport MIDI data streams into and out of the audio function. The collection of the single AudioControl interface and the AudioStreaming and MIDIStreaming interfaces that belong to the same audio function is called the Audio Interface Collection (AIC). A device can have multiple Audio Interface Collections active at the same time. These Collections are used to control multiple independent audio functions located in the same composite device.
Note: All MIDI-related information is grouped in a separate document, Universal Serial Bus Device Class Definition for MIDIStreaming Interfaces, that is considered part of this specification.
3.1 Audio Interface Class
The Audio Interface class groups all functions that can interact with USB-compliant audio data streams. All functions that convert between analog and digital audio domains can be part of this class. In addition, those functions that transform USB-compliant audio data streams into other USB-compliant audio data streams can be part of this class. Even analog audio functions that are controlled through USB belong to this class.
In fact, for an audio function to be part of this class, the only requirement is that it exposes one AudioControl interface. No further interaction with the function is mandatory, although most functions in the audio interface class will support one or more optional AudioStreaming interfaces for consuming or producing one or more isochronous audio data streams.
The Audio Interface class code is assigned by the USB. For details, see Section A.1, “Audio Interface Class Code.”
3.2 Audio Interface Subclass and Protocol
The Audio Interface class is divided into Subclasses that can be further qualified by the Interface Protocol code. However, at this moment, the Interface Protocol is not used and must be set to 0x00. All audio functions are part of a certain Subclass. The following three Subclasses are currently defined in this specification:
· AudioControl Interface Subclass
· AudioStreaming Interface Subclass
· MIDIStreaming Interface Subclass
The assigned codes can be found in Sections A.2, “Audio Interface Subclass Codes,” and A.3, “Audio Interface Protocol Codes,” of this specification. All other Subclass codes are unused and reserved, except code 0xFF, which is by specification reserved for vendor-specific extensions.
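As a rough illustration of how host software might apply these codes, the sketch below classifies an interface from its class, subclass, and protocol fields. The numeric values (class 0x01; subclasses 0x01, 0x02, and 0x03) are assumptions quoted from Appendix A and should be checked against it. The function name classify_audio_interface is hypothetical.

```python
# Assumed code values; verify against Sections A.1 through A.3.
AUDIO_CLASS = 0x01
PROTOCOL_UNDEFINED = 0x00
VENDOR_SPECIFIC = 0xFF

SUBCLASS_NAMES = {
    0x01: "AudioControl",
    0x02: "AudioStreaming",
    0x03: "MIDIStreaming",
}


def classify_audio_interface(b_class, b_subclass, b_protocol):
    """Return the Subclass name, or None if the codes are not usable.

    None is returned for a non-audio class, a non-zero protocol code
    (the Interface Protocol must be 0x00), or a reserved Subclass code.
    """
    if b_class != AUDIO_CLASS:
        return None
    if b_protocol != PROTOCOL_UNDEFINED:
        return None
    if b_subclass == VENDOR_SPECIFIC:
        return "vendor-specific"
    return SUBCLASS_NAMES.get(b_subclass)


print(classify_audio_interface(0x01, 0x02, 0x00))
```

A driver built this way would ignore interfaces for which the function returns None rather than guess at reserved codes.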
3.3 Audio Synchronization Types
Each isochronous audio endpoint used in an AudioStreaming interface belongs to a synchronization type as defined in Section 5 of the USB Specification. The following sections briefly describe the possible synchronization types.
3.3.1 Asynchronous
Asynchronous isochronous audio endpoints produce or consume data at a rate that is locked either to a clock external to the USB or to a free-running internal clock. These endpoints cannot be synchronized to a start of frame (SOF) or to any other clock in the USB domain.
3.3.2 Synchronous
The clock system of synchronous isochronous audio endpoints can be controlled externally through SOF synchronization. Such an endpoint must do one of the following:
· Slave its sample clock to the 1 ms SOF tick.
· Control the rate of USB SOF generation so that its data rate becomes automatically locked to SOF.
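To make the first option concrete: an endpoint slaved to the 1 ms SOF tick must deliver, on average, exactly one thousandth of its sample rate in every frame. When the rate is not a multiple of 1000 Hz, the per-frame sample count has to vary. The sketch below shows one possible way to distribute the samples by carrying a remainder forward. The function name and the accumulator approach are illustrative assumptions, not something this specification prescribes.

```python
def samples_per_frame(sample_rate_hz, num_frames):
    """Return the number of samples to place in each of num_frames frames.

    A running remainder (in units of samples * 1000) is carried from
    frame to frame so the long-term average matches the sample rate.
    """
    counts = []
    remainder = 0
    for _ in range(num_frames):
        remainder += sample_rate_hz
        n = remainder // 1000      # whole samples due in this 1 ms frame
        remainder -= n * 1000      # keep the fractional part for later
        counts.append(n)
    return counts


# 44.1 kHz: nine frames of 44 samples followed by one frame of 45.
print(samples_per_frame(44100, 10))
```

At 48 kHz every frame carries 48 samples and the remainder stays at zero.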
3.3.3 Adaptive
Adaptive isochronous audio endpoints are able to source or sink data at any rate within their operating range. This implies that these endpoints must run an internal process that allows them to match their natural data rate to the data rate that is imposed at their interface.
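The synchronization type of an endpoint is reported in its standard endpoint descriptor. The sketch below decodes it, assuming the usual bmAttributes layout: bits 1..0 hold the transfer type (01 for isochronous) and bits 3..2 hold the synchronization type (01 asynchronous, 10 adaptive, 11 synchronous). That layout is an assumption taken from the endpoint descriptor definition and should be confirmed there. The function name is hypothetical.

```python
ISOCHRONOUS = 0b01

SYNC_TYPES = {
    0b00: "none",
    0b01: "asynchronous",
    0b10: "adaptive",
    0b11: "synchronous",
}


def decode_sync_type(bm_attributes):
    """Return the synchronization type name of an isochronous endpoint."""
    transfer_type = bm_attributes & 0b11
    if transfer_type != ISOCHRONOUS:
        raise ValueError("not an isochronous endpoint")
    return SYNC_TYPES[(bm_attributes >> 2) & 0b11]


print(decode_sync_type(0x09))
```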
3.4 Inter Channel Synchronization
An important issue when dealing with audio, and 3-D audio in particular, is the phase relationship between different physical audio channels. Indeed, the virtual spatial position of an audio source is directly related to and influenced by the phase differences that are applied to the different physical audio channels used to reproduce the audio source. Therefore, it is imperative that USB audio functions respect the phase relationship among all related audio channels. However, the responsibility for maintaining the phase relation is shared among the USB host software, hardware, and all of the audio peripheral devices or functions.
To provide a manageable phase model to the host, an audio function is required to report its internal delay for every AudioStreaming interface. This delay is expressed as a number of frames (1 ms each) and arises because the audio function must buffer at least one frame’s worth of samples to effectively remove packet jitter within a frame. Furthermore, some audio functions introduce extra delay because they need time to correctly interpret and process the audio data streams (for example, compression and decompression). However, an audio function is required to introduce only an integer number of frames of delay. In the case of an audio source function, this implies that the audio function must guarantee that the first sample it fully acquires after SOFn (start of frame n) is the first sample of the packet it sends over USB during frame (n+d), where d is the audio function’s internal delay expressed in ms. The same rule applies to an audio sink function: the first sample in the packet received over USB during frame n must be the first sample that is fully reproduced during frame (n+d).
By following these rules, phase jitter is limited to ±1 audio sample. It is up to the host software to synchronize the different audio streams by scheduling the correct packets at the correct moment, taking into account the internal delays of all audio functions involved.
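The host-side scheduling described above can be sketched as follows. Given the reported delay d of each sink function, the host picks a common frame in which all sinks should start reproducing audio and sends each sink's first packet d frames earlier. The function name, the dictionary layout, and the sink names are hypothetical and serve only to illustrate the arithmetic.

```python
def schedule_send_frames(delays_ms, first_send_frame=0):
    """Map each sink to the frame in which its first packet is sent.

    delays_ms maps a sink name to its internal delay d in frames.
    The sink with the largest delay is served in first_send_frame;
    all others are served later so that every sink starts
    reproducing in the same frame (send frame + d).
    """
    if not delays_ms:
        return {}
    play_frame = first_send_frame + max(delays_ms.values())
    return {name: play_frame - d for name, d in delays_ms.items()}


plan = schedule_send_frames({"front": 1, "rear": 3}, first_send_frame=100)
print(plan)  # {'front': 102, 'rear': 100}
```

Both sinks in this example begin reproduction in frame 103, so the remaining phase error is bounded by the per-function jitter of one sample.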
3.5 Audio Function Topology
To be able to manipulate the physical properties of an audio function, its functionality must be divided into addressable Entities. Two types of such generic Entities are identified and are called Units and Terminals.
Units provide the basic building blocks to fully describe most audio functions. Audio functions are built by connecting together several of these Units. A Unit has one or more Input Pins and a single Output Pin, where each Pin represents a cluster of logical audio channels inside the audio function. Units are wired together by connecting their I/O Pins according to the required topology.
In addition, the concept of a Terminal is introduced. There are two types of Terminals. An Input Terminal (IT) is an Entity that represents a starting point for audio channels inside the audio function. An Output Terminal (OT) represents an ending point for audio channels. From the audio function’s perspective, a USB endpoint is a typical example of an Input or Output Terminal. It either provides data streams to the audio function (IT) or consumes data streams coming from the audio function (OT). Likewise, a Digital to Analog converter built into the audio function is represented as an Output Terminal in the audio function’s model. Connection to a Terminal is made through its single Input or Output Pin.
Input Pins of a Unit are numbered starting from one up to the total number of Input Pins on the Unit. The Output Pin number is always one. Terminals only have one Input or Output Pin that is always numbered one.
The information traveling over I/O Pins is not necessarily of a digital nature. It is perfectly possible to use the Unit model to describe fully analog or even hybrid audio functions. The mere fact that I/O Pins are connected together is a guarantee (by construction) that the protocol and format used over these connections (analog or digital) are compatible on both ends.
Every Unit in the audio function is fully described by its associated Unit Descriptor (UD). The Unit Descriptor contains all necessary fields to identify and describe the Unit. Likewise, there is a Terminal Descriptor (TD) for every Terminal in the audio function. In addition, these descriptors provide all necessary information about the topology of the audio function. They fully describe how Terminals and Units are interconnected.
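One way to picture the topology information carried by these descriptors is as a directed graph in which every Entity records which Entities feed its Input Pins. The sketch below walks such a graph backward from an Output Terminal to find the Input Terminals that can reach it. The Entity class, its field names, and the kind strings are hypothetical simplifications and do not mirror the actual descriptor layouts of Section 4.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: int
    kind: str  # "IT", "OT", or a Unit type such as "Mixer" or "Feature"
    # IDs of the Entities wired to Input Pins 1..n, in pin order.
    sources: list = field(default_factory=list)


def trace_inputs(entities, output_terminal_id):
    """Return the IDs of all Input Terminals that feed the given Entity."""
    by_id = {e.entity_id: e for e in entities}
    found, seen, stack = set(), set(), [output_terminal_id]
    while stack:
        eid = stack.pop()
        if eid in seen:
            continue
        seen.add(eid)
        entity = by_id[eid]
        if entity.kind == "IT":
            found.add(eid)
        stack.extend(entity.sources)
    return found


# Hypothetical function: a USB stream and a microphone mixed to a speaker.
topology = [
    Entity(1, "IT"),                 # USB OUT endpoint
    Entity(2, "IT"),                 # microphone
    Entity(3, "Mixer", [1, 2]),      # two Input Pins
    Entity(4, "Feature", [3]),       # volume and mute
    Entity(5, "OT", [4]),            # speaker
]
print(trace_inputs(topology, 5))
```

Storing only the upstream connections keeps each Entity consistent with the rule that a Unit has several Input Pins but a single Output Pin.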
This specification describes the following seven different types of standard Units and Terminals that are considered adequate to represent most audio functions available today and in the near future:
· Input Terminal
· Output Terminal
· Mixer Unit
· Selector Unit
· Feature Unit
· Processing Unit
· Extension Unit
The ensemble of UDs and TDs provides a full description of the audio function to the Host. A generic audio driver should be able to fully control the audio function, except for the functionality represented by Extension Units. Those require vendor-specific extensions to the audio class driver.
The descriptors are further detailed in Section 4, “Descriptors” of this document.
Inside a Unit, functionality is further described through audio Controls. A Control typically provides access to a specific audio property. Each Control has a set of attributes that can be manipulated or that present additional information on the behavior of the Control. A Control can have the following four attributes:
· Current setting attribute
· Minimum setting attribute
· Maximum setting attribute
· Resolution attribute
Last updated: May 22, 2011, 10:28