
Audio Data Formats 1.0(6-10)


USB Device Class Definition for Audio Data Formats
Release 1.0 March 18, 1998 6

1 Introduction

The intention of this document is to describe in detail all the Audio Data Formats that are supported by the Audio Device Class. This document is considered an integral part of the Audio Device Class Specification, although subsequent revisions of this document are independent of the revision evolution of the main USB Audio Specification. This separation makes it easy to add new Audio Data Formats without impeding the core USB Audio Specification.

1.1 Related Documents

· Universal Serial Bus Specification, 1.0 final draft revision (also referred to as the USB Specification). In particular, see Chapter 9, “USB Device Framework.”
· Universal Serial Bus Device Class Definition for Audio Data Formats (referred to in this document as USB Audio Data Formats).
· Universal Serial Bus Device Class Definition for Terminal Types (referred to in this document as USB Audio Terminal Types).
· ANSI S1.11-1986 standard.
· MPEG-1 standard ISO/IEC 11172-3 1993.
· MPEG-2 standard ISO/IEC 13818-3 Feb. 20, 1997.
· Digital Audio Compression Standard (AC-3), ATSC A/52 Dec. 20, 1995. (available from http://www.atsc.org)
· ANSI/IEEE-754 floating-point standard.
· ISO/IEC 958 International Standard: Digital Audio Interface and Annexes.
· ISO/IEC 1937 standard.
· ITU G.711 standard.

1.2 Terms and Abbreviations

This section defines terms used throughout this document. For additional terms that pertain to the Universal Serial Bus, see Chapter 2, “Terms and Abbreviations,” in the USB Specification.

Audio Frame: A collection of audio subframes, each containing a PCM audio sample of a different physical audio channel, taken at the same moment in time.
Audio Stream: A concatenation of a potentially very large number of audio frames, ordered according to ascending time.
Audio Subframe: Holds a single PCM audio sample.
DVD: Acronym for Digital Versatile Disc.
Encoded Audio Bitstream: A concatenation of a potentially very large number of encoded audio frames, ordered according to ascending time.
Encoded Audio Frame: A sequence of bits that contains an encoded representation of one or more physical audio channels.
MPEG: Acronym for Moving Picture Experts Group.
PCM: Acronym for Pulse Code Modulation.
Transfer Delimiter: A unique token that indicates an interruption in an isochronous data packet stream. It can be either a zero-length data packet or the absence of an isochronous transfer in a given USB frame.

2 Audio Data Formats

Audio Data Formats can be divided into three main groups according to type.

The first group, Type I, deals with audio data streams that are constructed on a sample-by-sample basis. Each audio sample is represented by a single independent symbol and the data stream is built up by concatenating those symbols. Different compression schemes may be used to transform the audio samples into symbols. If multiple physical audio channels are formatted into a single audio channel cluster, then samples at time x of subsequent channels are transmitted interleaved, according to the cluster channel ordering as described in the main USB Audio Specification, followed by samples at time x+1, interleaved in the same fashion and so on. The notion of physical channels is explicitly preserved during transmission. A typical example of Type I formats is the standard PCM audio data.
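The interleaving rule above (all channels at time x, then all channels at time x+1, and so on) can be sketched as follows. This is a minimal illustration, assuming the per-channel sample lists are already arranged in the cluster channel ordering defined by the main USB Audio Specification; the function name `interleave` is hypothetical.

```python
from typing import Sequence


def interleave(channels: Sequence[Sequence[int]]) -> list[int]:
    """Interleave per-channel sample lists into a Type I sample sequence.

    channels[c][t] is the sample of channel c at time t. The outer order of
    `channels` is assumed to already follow the cluster channel ordering.
    """
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    out: list[int] = []
    for t in range(length):      # time x, then x+1, ...
        for ch in channels:      # one sample per channel at the same instant
            out.append(ch[t])
    return out


# Two channels, three instants: left/right pairs are emitted in time order.
print(interleave([[1, 2, 3], [10, 20, 30]]))
```

Each consecutive group of `len(channels)` values in the result corresponds to one audio frame as defined in section 2.2.3.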

The second group, Type II, deals with those formats that do not preserve the notion of physical channels during the transmission. Typically, all non-PCM encoded audio data streams belong to this group. A number of audio samples, often originating from multiple physical channels, are encoded into a number of bits in such a way that, after transmission, the original audio samples can be reconstructed to a certain degree of accuracy. The number of bits used for transmission is typically one or more orders of magnitude smaller than the number of bits needed to represent the original PCM audio samples, effectively realizing a considerable bandwidth reduction during transmission.

The third group, Type III, contains special formats that do not fit in either of the previous groups. In fact, they mix characteristics of the Type I and Type II groups to transmit audio data streams. One or more non-PCM encoded audio data streams are packed into “pseudo-stereo samples” and transmitted as if they were real stereo PCM audio samples. The sampling frequency of these pseudo samples matches the sampling frequency of the original PCM audio data streams. Therefore, clock recovery at the receiving end is easier than it is in the case of Type II formats. The drawback is that, unless multiple non-PCM encoded streams are packed into one pseudo-stereo stream, more bandwidth than necessary is consumed.

Section A.1, “Audio Data Format Codes” summarizes the Audio Data Formats that are currently supported in the Audio Device Class. The following sections explain those formats in more detail.

2.1 Transfer Delimiter

Isochronous data streams are continuous in nature, although the actual number of bytes sent per packet may vary throughout the lifetime of the stream (for rate adaptation purposes for instance). To indicate a temporary stop in the isochronous data stream without closing the pipe (and thus relinquishing the USB bandwidth), an in-band Transfer Delimiter needs to be defined. This specification considers two situations to be a Transfer Delimiter. The first is a zero-length data packet and the second is the absence of an isochronous transfer in a particular USB frame. Both situations are considered equivalent and the audio function is expected to behave the same. However, the second type consumes less isochronous USB bandwidth (i.e. zero bandwidth). In both cases, this specification considers a Transfer Delimiter to be an entity that can be sent over the USB.
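Because both kinds of Transfer Delimiter must be treated identically, a receiver can normalize them to a single condition. The sketch below is illustrative only: it assumes a hypothetical host-side model in which each USB frame yields either `None` (no isochronous transfer in that frame) or a `bytes` payload (where `b""` is a zero-length packet). `is_transfer_delimiter` and `split_bursts` are hypothetical names.

```python
from typing import Iterable, Optional


def is_transfer_delimiter(packet: Optional[bytes]) -> bool:
    """None models 'no transfer in this USB frame'; b'' models a zero-length packet."""
    return packet is None or len(packet) == 0


def split_bursts(received: Iterable[Optional[bytes]]) -> list[bytes]:
    """Group consecutive non-empty payloads; any Transfer Delimiter ends a burst."""
    bursts: list[bytes] = []
    current = bytearray()
    for packet in received:
        if is_transfer_delimiter(packet):
            if current:                      # close the running burst
                bursts.append(bytes(current))
                current = bytearray()
        else:
            current += packet
    if current:
        bursts.append(bytes(current))
    return bursts


# Both delimiter kinds interrupt the stream in the same way.
print(split_bursts([b"ab", b"cd", None, b"ef", b"", b"", b"gh"]))
```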

2.2 Type I Formats

The following sections describe the Audio Data Formats that belong to Type I. A number of terms and their definitions are presented.

2.2.1 USB Packets

Audio data streams that are inherently continuous must be packetized when sent over the USB. The quality of the packetizing algorithm directly influences the amount of effort needed to reconstruct a reliable sample clock at the receiving side. The goal must be to keep the instantaneous number of samples per frame ($n_i$) as close as possible to the average number of samples per frame ($n_{av}$). The average $n_{av}$ should be calculated as a sliding average over a period of 256 frames.

If the sampling rate is constant, the allowable variation on $n_i$ is limited to one sample, that is, $\Delta n_i = 1$. This implies that all packets must contain either $\mathrm{INT}(n_{av})$ (small packet) or $\mathrm{INT}(n_{av}) + 1$ (large packet) samples. For all $i$:

$$n_i \in \{\mathrm{INT}(n_{av}),\ \mathrm{INT}(n_{av}) + 1\}$$

Note: In the case where $n_{av} = \mathrm{INT}(n_{av})$ (that is, $n_{av}$ is an integer), $n_i$ may vary among $\mathrm{INT}(n_{av}) - 1$ (small packet), $\mathrm{INT}(n_{av})$ (medium packet), and $\mathrm{INT}(n_{av}) + 1$ (large packet).

To keep the required buffer depths within acceptable bounds, this specification limits the cumulative difference between $n_{av}$ and $n_i$ to ±1.5 samples.
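One common way to meet these constraints at a constant rate is an integer accumulator that carries the fractional remainder from frame to frame. The sketch below is not a method prescribed by this specification: it assumes 1 ms USB frames (1000 frames per second) and takes $n_{av}$ as the nominal rate divided by the frame rate instead of a 256-frame sliding average. `packet_sizes` and `max_cumulative_drift` are hypothetical helper names.

```python
from fractions import Fraction
from typing import Iterable, Iterator


def packet_sizes(sample_rate_hz: int, num_frames: int,
                 frames_per_second: int = 1000) -> Iterator[int]:
    """Yield n_i, the number of samples to send in each USB frame."""
    acc = 0
    for _ in range(num_frames):
        acc += sample_rate_hz                      # add one frame's worth of samples
        n, acc = divmod(acc, frames_per_second)    # emit whole samples, keep remainder
        yield n


def max_cumulative_drift(sizes: Iterable[int], sample_rate_hz: int,
                         frames_per_second: int = 1000) -> Fraction:
    """Largest |sum(n_av - n_i)| seen over the sequence (exact arithmetic)."""
    n_av = Fraction(sample_rate_hz, frames_per_second)
    drift = worst = Fraction(0)
    for n in sizes:
        drift += n_av - n
        worst = max(worst, abs(drift))
    return worst


# 44.1 kHz: nine small (44) packets followed by one large (45) packet.
print(list(packet_sizes(44100, 10)))
```

With this scheme every $n_i$ is either $\mathrm{INT}(n_{av})$ or $\mathrm{INT}(n_{av}) + 1$, and the cumulative difference stays below one sample (at most 0.9 samples for 44.1 kHz), inside the ±1.5-sample limit.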

If the sampling rate can be varied (to implement pitch control), the allowable pitch shift is 1 kHz/ms. That is, the allowable variation on $n_i$ is limited to one sample per frame. For all $i$:

$$n_{i+1} = n_i \pm 1$$
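Reading "limited to one sample per frame" as "changes by at most one sample between consecutive frames", a pitch change can be realized as a ramp of packet sizes. This is a minimal sketch under that reading and assumes 1 ms frames, so one sample per frame corresponds to the 1 kHz/ms limit; `ramp` and `respects_pitch_limit` are hypothetical names.

```python
from typing import Sequence


def ramp(start: int, target: int) -> list[int]:
    """Packet sizes moving from start to target by one sample per frame."""
    step = 1 if target >= start else -1
    return list(range(start, target + step, step))


def respects_pitch_limit(sizes: Sequence[int]) -> bool:
    """True if consecutive packet sizes differ by at most one sample."""
    return all(abs(b - a) <= 1 for a, b in zip(sizes, sizes[1:]))


# Slowing from 48 to 44 samples per frame takes four frame-to-frame steps.
print(ramp(48, 44))
```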

Pitch control is restricted to adaptive endpoints only. AudioStreaming interfaces that support pitch control on their isochronous endpoint are required to report this in the class-specific endpoint descriptor. In addition, a Set/Get Pitch Control request is required to enable or disable the pitch control functionality.

2.2.2 Audio Subframe

The basic structure used to represent audio data is the audio subframe. An audio subframe holds a single audio sample. An audio subframe always contains an integer number of bytes.

This specification limits the possible audio subframe sizes (bSubframeSize) to 1, 2, 3, or 4 bytes per audio subframe. An audio sample is represented using a number of bits (bBitResolution) less than or equal to the total number of bits available in the audio subframe, that is, $\text{bBitResolution} \le \text{bSubframeSize} \times 8$.

AudioStreaming endpoints must be constructed in such a way that a valid transfer can take place as long as the reported audio subframe size (bSubframeSize) is respected during transmission. If the reported bits per sample (bBitResolution) do not correspond with the number of significant bits actually used during transfer, the device will either discard trailing significant bits ([actual_bits_per_sample] > bBitResolution) or interpret trailing zeros as significant bits ([actual_bits_per_sample] < bBitResolution).
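The subframe rules above can be illustrated by packing and unpacking a single sample. This sketch assumes signed two's-complement samples, stored little-endian with the significant bits left-justified (so the unused trailing bits are zero, matching the "trailing bits" wording above); those layout details are assumptions here, and `pack_subframe` and `unpack_subframe` are hypothetical names.

```python
def pack_subframe(sample: int, bit_resolution: int, subframe_size: int) -> bytes:
    """Place a bit_resolution-bit signed sample in a subframe_size-byte subframe."""
    if subframe_size not in (1, 2, 3, 4):
        raise ValueError("bSubframeSize must be 1, 2, 3 or 4")
    if not 1 <= bit_resolution <= 8 * subframe_size:
        raise ValueError("bBitResolution must not exceed bSubframeSize * 8")
    limit = 1 << (bit_resolution - 1)
    if not -limit <= sample < limit:
        raise ValueError("sample does not fit in bBitResolution bits")
    shift = 8 * subframe_size - bit_resolution       # unused trailing bits stay zero
    return (sample << shift).to_bytes(subframe_size, "little", signed=True)


def unpack_subframe(data: bytes, bit_resolution: int) -> int:
    """Recover a sample, discarding bits beyond bit_resolution (assumed layout)."""
    shift = 8 * len(data) - bit_resolution
    return int.from_bytes(data, "little", signed=True) >> shift


# A 20-bit sample in a 3-byte subframe: the 4 trailing bits are zero.
print(pack_subframe(-1, 20, 3).hex())
```

Unpacking a subframe that actually carries more significant bits than bBitResolution drops the extra trailing bits, which mirrors the device behavior described in the preceding paragraph.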

2.2.3 Audio Frame

An audio frame consists of a collection of audio subframes, each containing an audio sample of a different physical audio channel, taken at the same moment in time. The number of audio subframes in an audio frame equals the number of logical audio channels in the audio channel cluster. The ordering of the audio subframes in the audio frame obeys the rules set forth in the USB Audio Specification. All audio subframes must have the same audio subframe size.

2.2.4 Audio Streams

An audio stream is a concatenation of a potentially very large number of audio frames, ordered according to ascending time. Streams are packetized when transported over USB, and each USB packet can only contain an integer number of audio frames. Each packet always starts with the same channel, and the channel order is respected throughout the entire transmission. If, for any reason, there are no audio frames available to construct a USB packet, a Transfer Delimiter must be sent instead.
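Because all subframes have the same size, one audio frame occupies bNrChannels × bSubframeSize bytes, and cutting only at multiples of that size keeps every packet starting on the first channel. The sketch below is hypothetical (`packetize` and its parameters are illustrative names). It assumes the caller supplies the desired number of audio frames per USB frame, and it emits a zero-length packet as the Transfer Delimiter when no whole audio frames remain.

```python
from typing import Iterable


def packetize(stream: bytes, frame_bytes: int,
              frames_per_packet: Iterable[int]) -> list[bytes]:
    """Cut an interleaved Type I byte stream into packets of whole audio frames."""
    if frame_bytes <= 0 or len(stream) % frame_bytes:
        raise ValueError("stream must hold a whole number of audio frames")
    packets: list[bytes] = []
    pos = 0
    for wanted in frames_per_packet:
        available = (len(stream) - pos) // frame_bytes
        take = min(wanted, available)          # never split an audio frame
        end = pos + take * frame_bytes
        packets.append(stream[pos:end])        # b"" acts as a zero-length Transfer Delimiter
        pos = end
    return packets


# Stereo, 2-byte subframes: 4 bytes per audio frame, 5 audio frames available.
print([len(p) for p in packetize(bytes(range(20)), 4, [2, 2, 2, 2])])
```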

2.2.5 Type I Format Type Descriptor

The Type I format type descriptor starts with the usual three fields: bLength, bDescriptorType, and bDescriptorSubtype.

The bFormatType field indicates this is a Type I descriptor. The bNrChannels field contains the number of physical channels in the audio data stream. The bSubframeSize field indicates how many bytes are used to transport an audio subframe. The bBitResolution field indicates how many bits of the total number of available bits in the audio subframe are truly used by the audio function to convey audio information.

The sampling frequency capabilities of the isochronous data endpoint of the AudioStreaming Interface are reported as well. Depending on the bSamFreqType field, the length of the descriptor varies and the interpretation of the trailing fields differs. Sampling frequencies occupy three bytes and are expressed in Hz to support over-sampled, reduced bit-resolution systems (the range is from 0 to 16,777,215 Hz).
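A three-byte frequency field can be encoded and decoded directly. USB descriptor fields are little-endian, so 44,100 Hz (0x00AC44) is stored as the bytes 44 AC 00. The helper names `encode_sam_freq` and `decode_sam_freq` are hypothetical.

```python
def encode_sam_freq(freq_hz: int) -> bytes:
    """Encode a sampling frequency in Hz as a 3-byte little-endian field."""
    if not 0 <= freq_hz <= 0xFFFFFF:
        raise ValueError("sampling frequency must be 0..16,777,215 Hz")
    return freq_hz.to_bytes(3, "little")


def decode_sam_freq(data: bytes) -> int:
    """Decode a 3-byte little-endian sampling frequency field."""
    if len(data) != 3:
        raise ValueError("sampling frequency fields are exactly three bytes")
    return int.from_bytes(data, "little")


print(encode_sam_freq(44100).hex())
```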

Table 2-1: Type I Format Type Descriptor

| Offset | Field | Size | Value | Description |
|---|---|---|---|---|
| 0 | bLength | 1 | Number | Size of this descriptor, in bytes: 8 + (ns × 3) |
| 1 | bDescriptorType | 1 | Constant | CS_INTERFACE descriptor type. |
| 2 | bDescriptorSubtype | 1 | Constant | FORMAT_TYPE descriptor subtype. |
| 3 | bFormatType | 1 | Constant | FORMAT_TYPE_I. Constant identifying the Format Type the AudioStreaming interface is using. |
| 4 | bNrChannels | 1 | Number | Indicates the number of physical channels in the audio data stream. |
| 5 | bSubframeSize | 1 | Number | The number of bytes occupied by one audio subframe. Can be 1, 2, 3, or 4. |
| 6 | bBitResolution | 1 | Number | The number of effectively used bits from the available bits in an audio subframe. |
| 7 | bSamFreqType | 1 | Number | Indicates how the sampling frequency can be programmed. 0: continuous sampling frequency. 1..255: the number of discrete sampling frequencies supported by the isochronous data endpoint of the AudioStreaming interface (ns). |
| 8... | | | | See the sampling frequency tables below. |
Depending on the value in the bSamFreqType field, the layout of the next part of the descriptor is as shown in the following tables.
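For the discrete case, the fixed eight-byte header of Table 2-1 is followed by ns three-byte frequencies. The sketch below is hypothetical: the numeric values of CS_INTERFACE (0x24), FORMAT_TYPE (0x02), and FORMAT_TYPE_I (0x01) are taken from the main USB Audio Specification and are not defined in this section, and the continuous case is left out because its layout is not reproduced here. Since bLength is a single byte, 8 + 3 × ns ≤ 255 in practice limits ns to 82.

```python
from typing import Sequence

CS_INTERFACE = 0x24   # assumed value, from the main USB Audio Specification
FORMAT_TYPE = 0x02    # assumed value: FORMAT_TYPE descriptor subtype
FORMAT_TYPE_I = 0x01  # assumed value: Type I format type code


def type_i_descriptor(nr_channels: int, subframe_size: int, bit_resolution: int,
                      sam_freqs: Sequence[int]) -> bytes:
    """Build a Type I format type descriptor with a discrete frequency table."""
    ns = len(sam_freqs)
    if ns < 1:
        raise ValueError("discrete tables need at least one frequency")
    if 8 + 3 * ns > 255:
        raise ValueError("bLength would not fit in one byte")
    if subframe_size not in (1, 2, 3, 4):
        raise ValueError("bSubframeSize must be 1, 2, 3 or 4")
    if bit_resolution > 8 * subframe_size:
        raise ValueError("bBitResolution exceeds the subframe size")
    header = bytes([8 + 3 * ns, CS_INTERFACE, FORMAT_TYPE, FORMAT_TYPE_I,
                    nr_channels, subframe_size, bit_resolution, ns])
    freqs = b"".join(f.to_bytes(3, "little") for f in sam_freqs)  # 3 bytes each
    return header + freqs


# Stereo, 16-bit in 2-byte subframes, 44.1 kHz and 48 kHz.
print(type_i_descriptor(2, 2, 16, [44100, 48000]).hex())
```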

Table 2-2: Continuous Sampling Frequency

| Offset | Field | Size | Value | Description |
|---|---|---|---|---|
