The SMIL <par> element has a child element named <audio>. This is an empty element that holds only a reference to an external media object. The element itself contains no audio data.
Typical syntax of the <audio> element is:
<audio src="audiofile.mp3"
clip-begin="npt=1.000s"
clip-end="npt=10.392s"
id="id-value" />
The src attribute contains a URI that points to the audio file being synchronized.
If this synchronization point does not cover the audio file in its entirety, it is possible to specify that only a certain segment of the audio file should be played. This is done using the clip-begin and clip-end attributes, which specify the begin time and end time within the audio file respectively.
The id attribute contains a unique identifier for the <audio> element itself.
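As an illustration of how the clip-begin and clip-end values might be interpreted, the following Python sketch computes the length of the segment they delimit. It assumes the values always take the "npt=<seconds>s" form shown in the syntax example above; other clock-value notations are not handled. The function names parse_npt_seconds and clip_duration are hypothetical and are not part of SMIL or any production tool.

```python
import re

# Assumption: clip values use the "npt=<seconds>s" form from the example above.
_NPT_SECONDS = re.compile(r"^npt=(\d+(?:\.\d+)?)s$")


def parse_npt_seconds(value: str) -> float:
    """Return the number of seconds encoded in a value such as 'npt=1.000s'."""
    match = _NPT_SECONDS.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported clip value: {value!r}")
    return float(match.group(1))


def clip_duration(clip_begin: str, clip_end: str) -> float:
    """Return the length in seconds of the segment between the two clip values."""
    begin = parse_npt_seconds(clip_begin)
    end = parse_npt_seconds(clip_end)
    if end < begin:
        raise ValueError("clip-end lies before clip-begin")
    return end - begin
```

Called with the values from the syntax example, clip_duration("npt=1.000s", "npt=10.392s") gives a segment of roughly 9.392 seconds.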
If phrase detection has been activated in the production tool, the <par> element will have a <seq> child that contains several <audio> elements as children.
<seq>
<audio [...] />
<audio [...] />
<audio [...] />
</seq>
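A reading application has to handle both layouts: a single <audio> child directly under <par>, and several <audio> elements wrapped in a <seq>. The Python sketch below shows one way this could be done with the standard xml.etree.ElementTree module. It assumes the elements carry no XML namespace, and the function name audio_clips and the sample attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET


def audio_clips(par_xml: str) -> list:
    """Return (src, clip-begin, clip-end, id) for each <audio> in a <par>.

    Element.iter() walks all descendants in document order, so <audio>
    elements are found whether they sit directly under <par> or in a <seq>.
    Assumption: the elements are not namespace-qualified.
    """
    par = ET.fromstring(par_xml)
    clips = []
    for audio in par.iter("audio"):
        clips.append((
            audio.get("src"),
            audio.get("clip-begin"),
            audio.get("clip-end"),
            audio.get("id"),
        ))
    return clips


# Hypothetical fragment with phrase detection active (values are made up).
SAMPLE = """<par>
  <seq>
    <audio src="audiofile.mp3" clip-begin="npt=0.000s" clip-end="npt=1.000s" id="a1" />
    <audio src="audiofile.mp3" clip-begin="npt=1.000s" clip-end="npt=10.392s" id="a2" />
  </seq>
</par>"""

for clip in audio_clips(SAMPLE):
    print(clip)
```

Collecting the clips in document order preserves the playback order that the <seq> element implies.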