Call Transcripts Data Formats
About Call Transcripts Data Formats
XM Discover enables you to upload call transcripts (i.e., transcripts of audio conversations) in CSV, Excel, JSON, or WebVTT format. Call transcripts identify the participants in a conversation and attribute each message to a participant.
Typically, call transcripts contain a number of structured and unstructured data fields that represent a conversation between a customer and an entity at your company (for example, a transcript of a call between a customer and your automated phone service, or between a customer and a live support representative). Structured fields may contain dates, numbers, or text data with a high degree of organization (such as names of brands, participant names, and products). Unstructured fields contain notes, comments, and other open-ended text.
You can upload call transcripts in the following formats:
- CSV
- XLS or XLSX (Microsoft Excel)
- JSON
- WebVTT
CSV and Excel Formatting for Call Transcripts
This section covers call transcript formatting for CSV and Excel files. The formatting and requirements for both file types are the same.
In CSV and Excel files, call transcripts are defined using multiple rows. Here’s how it works:
- Each row contains an individual line of dialogue in a conversation along with participant data and a timestamp.
- Separate rows are rolled into a single conversation by sharing the same conversation ID.
- Conversation-wide field values (such as Document Date or custom attributes) are taken from the first row of the conversation.
| Element | Description | 
| conversationId (Required) | A unique ID for the entire conversation. Rows that share the same ID are treated as separate lines within a single conversation. You can map this field to the natural_id attribute to use it as the document’s Natural ID. | 
| conversationTimestamp (Required) | The date and time of the entire conversation. Use the ISO 8601 format with seconds precision. You can map this field to the document_date attribute to use it as Document Date. | 
| participantId (Required) | The ID of the participant. Must be unique per conversation (document). | 
| participantType (Required) | The type of the participant. Possible values: 
 These values are passed through to the CB Participant Type attribute for reporting and participants visualization. If unspecified, CB Participant Type will have no reportable value. | 
| is_ivr (Optional) | A Boolean field that indicates whether a participant is an Interactive Voice Response (IVR) bot or a person. 
 These values are passed through to the CB Kind of Participant attribute for reporting and participants visualization. If unspecified, CB Kind of Participant will have no reportable value. | 
| text (Required) | Speech transcript. Attention: The combined length of all text elements in a conversation may not exceed 100,000 characters. If it does, the document is skipped. | 
| start (Required) | The time the speech starts (in milliseconds elapsed since the beginning of the conversation). | 
| end (Required) | The time the speech ends (in milliseconds elapsed since the beginning of the conversation). | 
| contentSegmentType (Required) | This parameter identifies the transcript format, which allows the Natural Language Processing (NLP) engine to process data correctly. Possible values: 
 | 
| custom fields (Optional) | You can provide multiple fields to add structured attributes to the conversation. | 
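The row-per-line layout described above can be sketched in Python as follows. This is a minimal illustration, not an official template: the column headers are assumed to match the element names in the table, the values AGENT, CLIENT, and TURN are borrowed from the JSON example later on this page, the lowercase true/false serialization of is_ivr is an assumption, and city is a hypothetical custom field.

```python
import csv
import io
from datetime import datetime, timezone

# Column order is arbitrary here; "city" is a hypothetical custom field.
FIELDS = [
    "conversationId", "conversationTimestamp", "participantId",
    "participantType", "is_ivr", "text", "start", "end",
    "contentSegmentType", "city",
]

def conversation_rows(conversation_id, timestamp, turns, segment_type, **custom):
    """Yield one row per line of dialogue, repeating the conversation-wide values."""
    # ISO 8601 with seconds precision, e.g. 2020-07-30T10:15:45+00:00
    stamp = timestamp.isoformat(timespec="seconds")
    for participant_id, participant_type, is_ivr, text, start_ms, end_ms in turns:
        yield {
            "conversationId": conversation_id,
            "conversationTimestamp": stamp,
            "participantId": participant_id,
            "participantType": participant_type,
            "is_ivr": str(is_ivr).lower(),  # assumed Boolean serialization
            "text": text,
            "start": start_ms,  # milliseconds since the start of the call
            "end": end_ms,
            "contentSegmentType": segment_type,
            **custom,
        }

turns = [
    ("1", "AGENT", False, "This is Emily, how may I help you?", 22000, 32000),
    ("2", "CLIENT", False, "Hi, I have a couple of questions.", 32000, 42000),
]
rows = list(conversation_rows(
    "46289", datetime(2020, 7, 30, 10, 15, 45, tzinfo=timezone.utc),
    turns, "TURN", city="Boston",
))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Both rows carry the same conversationId, so they would be rolled into one conversation. The conversation-wide values are repeated on every row for simplicity, although only the first row's values are used.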
JSON Formatting for Call Transcripts
This section contains JSON formatting for call transcripts.
Top-Level Objects
The following table describes the top-level objects of a document node.
| Element | Description | 
| conversationId | A unique ID for the entire conversation. You can map this field to the natural_id attribute to use it as the document’s Natural ID. | 
| conversationTimestamp | The date and time of the entire conversation. Use the ISO 8601 format with seconds precision. You can map this field to the document_date attribute to use it as Document Date. | 
| content | An object that contains the content of the conversation. Includes these nested objects: 
 | 
| custom fields (attributes) | You can provide multiple key-value pairs to add structured attributes to the conversation. | 
content Object
The following table describes the objects nested inside the content object.
| Element | Description | 
| participants | An array of objects that provides information about the participants of the conversation. Includes these fields: 
 | 
| conversationContent | An array of objects that contains the lines of the conversation. Includes these fields: 
 | 
| contentSegmentType (Required) | This parameter identifies the transcript format, which allows the Natural Language Processing (NLP) engine to process data correctly. Possible values: 
 | 
participants Object
The following table describes the fields nested inside the participants object.
| Element | Description | 
| participant_id (Required) | The ID of the participant. Must be unique per conversation (document). | 
| type (Required) | The type of the participant. Possible values: 
 These values are passed through to the CB Participant Type attribute for reporting and participants visualization. If unspecified, CB Participant Type will have no reportable value. | 
| is_ivr (Optional) | A Boolean field that indicates whether a participant is an Interactive Voice Response (IVR) bot or a person. 
 These values are passed through to the CB Kind of Participant attribute for reporting and participants visualization. If unspecified, CB Kind of Participant will have no reportable value. | 
conversationContent Object
The following table describes the fields nested inside the conversationContent object.
| Element | Description | 
| participant_id (Required) | The ID of the participant who is speaking. Must match one of the IDs provided in the participants array. | 
| text (Required) | Speech transcript. Attention: The combined length of all text elements in a conversation may not exceed 100,000 characters. If it does, the document is skipped. | 
| start (Required) | The time the speech starts (in milliseconds elapsed since the beginning of the conversation). | 
| end (Required) | The time the speech ends (in milliseconds elapsed since the beginning of the conversation). | 
Example
Here is an example of a call transcript between an agent and a client.
[
  {
    "conversationId": "46289",
    "conversationTimestamp": "2020-07-30T10:15:45.000Z",
    "content": {
      "participants": [
        {
          "participant_id": "1",
          "type": "AGENT",
          "is_ivr": false
        },
        {
          "participant_id": "2",
          "type": "CLIENT",
          "is_ivr": false
        }
      ],
      "conversationContent": [
        {
          "participant_id": "1",
          "text": "This is Emily, how may I help you?",
          "start": 22000,
          "end": 32000
        },
        {
          "participant_id": "2",
          "text": "Hi, I have a couple of questions.",
          "start": 32000,
          "end": 42000
        }
      ],
      "contentSegmentType": "TURN"
    },
    "city": "Boston",
    "source": "Call Center"
  }
]
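Before uploading, it can help to check a file against the constraints listed in the tables above. The following Python sketch is one possible pre-upload check, not part of XM Discover: check_conversation is a hypothetical helper, and counting characters with len() is an assumption about how the 100,000-character limit is measured. The sample deliberately omits participant "2" from the participants array to show a failing case.

```python
import json

MAX_TEXT_CHARS = 100_000  # combined text limit stated in the field tables

def check_conversation(doc):
    """Return a list of problems found in one conversation document."""
    problems = []
    content = doc.get("content", {})
    ids = [p.get("participant_id") for p in content.get("participants", [])]
    if len(ids) != len(set(ids)):
        problems.append("participant_id values are not unique")
    if "contentSegmentType" not in content:
        problems.append("contentSegmentType is missing")
    total_chars = 0
    for index, line in enumerate(content.get("conversationContent", [])):
        for field in ("participant_id", "text", "start", "end"):
            if field not in line:
                problems.append(f"line {index}: {field} is missing")
        # Each speaker must be declared in the participants array.
        if line.get("participant_id") not in ids:
            problems.append(f"line {index}: participant_id not in participants")
        total_chars += len(line.get("text", ""))
    if total_chars > MAX_TEXT_CHARS:
        problems.append("combined text exceeds 100,000 characters")
    return problems

sample = json.loads("""
[{"conversationId": "46289",
  "conversationTimestamp": "2020-07-30T10:15:45.000Z",
  "content": {
    "participants": [{"participant_id": "1", "type": "AGENT", "is_ivr": false}],
    "conversationContent": [
      {"participant_id": "1", "text": "This is Emily, how may I help you?",
       "start": 22000, "end": 32000},
      {"participant_id": "2", "text": "Hi, I have a couple of questions.",
       "start": 32000, "end": 42000}
    ],
    "contentSegmentType": "TURN"
  }}]
""")

for doc in sample:
    print(doc["conversationId"], check_conversation(doc))
```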
WebVTT Formatting for Call Transcripts
You can upload call transcripts using WebVTT formatting.
The Document Date is automatically taken from the file name if available. To set the Document Date automatically, make sure the file name starts with the following prefix:
<Timezone><YYYY><MM><DD>-
If the file names use a different format, apply a date transformation to the Document Date field in the mappings step. For details, see Setting a Specific Document Date.
Example
Here is an example of a Zoom call transcript in WebVTT format.
WEBVTT

1
00:00:00.599 --> 00:00:02.280
John Smith: Alright so let me

2
00:00:04.230 --> 00:00:05.339
John Smith: start sharing

3
00:00:12.809 --> 00:00:13.469
John Smith: My screen.

4
00:00:15.750 --> 00:00:18.119
John Smith: Can everybody see it.

5
00:00:19.050 --> 00:00:28.890
Paul Jones: Yes I can see it.
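To inspect a WebVTT file before uploading it, the cues can be read into the same speaker, text, start, and end shape used by the other formats. The Python sketch below is illustrative only and does not describe how XM Discover parses WebVTT: to_ms and parse_cues are hypothetical helpers, the file is assumed to be well formed with a blank line between cues, and the "Name: text" speaker prefix is assumed from the Zoom example above.

```python
def to_ms(timestamp):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to milliseconds."""
    clock, millis = timestamp.split(".")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:  # the hours component is optional in WebVTT
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)

def parse_cues(vtt_text):
    """Return a list of cues with speaker, text, and start/end in milliseconds."""
    cues = []
    lines = [line.strip() for line in vtt_text.splitlines()]
    for i, line in enumerate(lines):
        if "-->" not in line:
            continue
        start, _, rest = line.partition("-->")
        end = rest.split()[0]  # ignore any cue settings after the end time
        # The cue text runs until the next blank line.
        payload = []
        for following in lines[i + 1:]:
            if not following:
                break
            payload.append(following)
        text = " ".join(payload)
        speaker, sep, speech = text.partition(": ")
        if not sep:  # no "Name: " prefix found
            speaker, speech = None, text
        cues.append({
            "speaker": speaker,
            "text": speech,
            "start": to_ms(start.strip()),
            "end": to_ms(end),
        })
    return cues

sample = """WEBVTT

1
00:00:00.599 --> 00:00:02.280
John Smith: Alright so let me

5
00:00:19.050 --> 00:00:28.890
Paul Jones: Yes I can see it.
"""

cues = parse_cues(sample)
for cue in cues:
    print(cue)
```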