What is text mining?


Our world has been transformed by the ability of computers to process vast quantities of data. But although machines can quantify, itemise and analyse text data, they have historically struggled to work out what the people who wrote it are thinking and feeling.

That is, until recently. Advances in technology have made it possible for machines to extract complex, semantically meaningful conclusions from vast amounts of text using a range of tools and technologies. Welcome to the world of text mining and text analytics.

Text mining definition

So what is text mining?

Text mining is the process of turning natural language into something that can be manipulated, stored, and analysed by machines. It’s all about giving computers, which have historically worked with numerical data, the ability to work with linguistic data, a capability often called natural language understanding.

There are three key concepts in text mining: structured, semi-structured, and unstructured data.

  • Unstructured data is language in its natural form, as created for and by human beings. This article is an example of unstructured data. As well as written content, unstructured data can take the form of video or audio files.
  • Structured data is information presented in a consistent format so that it’s easy for computers to analyse and store. A list of phone numbers is an example of structured data.
  • Semi-structured data is somewhere between the two: it is organised to some degree (an email, for example, has sender, date, and subject fields) but its main content lacks the consistent structure computers need to analyse it.

Image: Unstructured data (Source: Tech Target)

Text mining is the process of turning unstructured data or semi-structured data into structured data.

Although you can apply text mining technology to video and audio, it’s most commonly used on text. Text mining is sometimes described as text data mining.
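
As a rough illustration of turning unstructured text into structured data, the sketch below pulls a few fields out of a free-text customer message using Python's standard `re` module. The message, the field names, and the patterns are all hypothetical and chosen purely for illustration; real text mining tools rely on far more robust techniques than hand-written patterns.

```python
import re

# Hypothetical free-text message (unstructured data).
message = (
    "Hi, my name is Dana Lee. My order 48213 arrived late. "
    "Please call me on 555-0142 or email dana@example.com."
)

# Illustrative patterns only; real-world formats vary far more than this.
PATTERNS = {
    "order_id": r"order (\d+)",
    "phone": r"\b(\d{3}-\d{4})\b",
    "email": r"\b([\w.+-]+@[\w-]+\.[\w.]+)\b",
}


def to_record(text: str) -> dict:
    """Return a structured record with one field per pattern (None if absent)."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


record = to_record(message)
print(record)
```

The resulting dictionary has the same keys for every message, which is what makes it easy to store in a database table or spreadsheet.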

Text mining vs. text analysis

What’s the difference between text mining and text analysis? Well, the two terms are often used interchangeably, but they do have subtly different meanings.

Both text mining and text analysis describe several methods for extracting valuable information from large quantities of human language. The practices are closely related and often work together, resulting in a significant overlap in how people use the two terms.

  • Text mining focuses on turning natural language data into a structured format suitable for computers.
  • Text analysis describes the aspect of text mining that looks at patterns and trends in the data, and produces insights that aren’t apparent from just looking at the language itself.

In this article, we’ll use the term text mining to cover both these bases.

How is text mining different from using a search engine?

Search engines are powerful tools that make huge quantities of information available to us. However, the level of text analysis a search engine uses when crawling the web is basic compared to the way text mining techniques work.

Rather than looking for keywords and other signals of quality and relevance as search engines do, text mining software can parse and assess every word of a piece of content. Text mining algorithms may also take into account semantic and syntactic features of language to draw conclusions about the topic, the author’s emotions, and their intent in writing or speaking.
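
To make the contrast concrete, here is a minimal sketch comparing a plain keyword match with a check that looks at the words around the keyword. The negation list, the three-word window, and the function names are assumptions made for this example; production text mining software uses much richer syntactic and semantic analysis.

```python
import re

# Small, illustrative list of negation words (an assumption, not exhaustive).
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't", "won't"}


def keyword_match(text: str, keyword: str) -> bool:
    """Keyword-style check: does the keyword appear anywhere in the text?"""
    return keyword in text.lower()


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (keeping apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def mentions_without_negation(text: str, keyword: str, window: int = 3) -> bool:
    """True if the keyword occurs with no negation word in the preceding tokens."""
    tokens = tokenize(text)
    for i, token in enumerate(tokens):
        if token == keyword:
            preceding = tokens[max(0, i - window):i]
            if not NEGATIONS.intersection(preceding):
                return True
    return False


wants_refund = "I want a refund for this order"
no_refund = "I don't need a refund, thanks"

# Both messages contain the keyword, so a keyword match cannot tell them apart.
print(keyword_match(wants_refund, "refund"), keyword_match(no_refund, "refund"))

# Looking at the surrounding words separates the two intents.
print(mentions_without_negation(wants_refund, "refund"),
      mentions_without_negation(no_refund, "refund"))
```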

Text mining and text analysis in action

So what are the applications of these technologies and what are some typical text mining tasks? Here are a few examples:

●      Customer experience

Text mining allows a business to monitor how and when its products and brand are being talked about. Using sentiment analysis, the company can detect positive or negative emotion, intent and strength of feeling as expressed in different kinds of voice and text data. Then, if certain criteria are met, it can automatically take action to benefit the customer relationship, e.g. by sending a promotion to help prevent customer churn.

●      Customer service

Text mining plays a central role in building customer service tools like chatbots. Using training data from previous customer conversations, text mining software can help train a model capable of natural language understanding and natural language generation.

●      Market research

By analysing social media, chat messages, and customer reviews, text mining can help paint a picture of how a brand is perceived in relation to its competitors, the level of brand familiarity among the target audience, and what its perceived strengths and weaknesses are.

●      Product development and design

Product teams can get an at-a-glance summary of how customers feel about an existing product in order to make it better, or use text mining tools to find out where there are promising gaps in the market.

●      Fraud prevention

Text mining is useful in finance and insurance. It can flag inconsistencies and potential fraud situations — for example, by combing the unstructured text data entered in application documents.

●      Content selection

Content publishing and social media platforms can also use text mining to analyse user-generated information such as profile details and status updates. The service can then automatically serve relevant content and targeted ads to its users.
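
The customer experience example above, where sentiment is scored and an action is triggered once a criterion is met, can be sketched in a few lines. This is a deliberately simple lexicon-based approach: the word scores, the threshold, and the action names are hypothetical, and real sentiment analysis usually relies on trained models rather than a hand-made word list.

```python
import re

# Tiny hand-made lexicon, purely illustrative. The scores are assumptions.
LEXICON = {
    "love": 2, "great": 2, "good": 1, "helpful": 1,
    "slow": -1, "bad": -1, "terrible": -2, "cancel": -2,
}


def sentiment_score(text: str) -> int:
    """Sum the lexicon scores of the words in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def next_action(text: str, churn_threshold: int = -2) -> str:
    """Pick a follow-up action; the threshold and action names are hypothetical."""
    if sentiment_score(text) <= churn_threshold:
        return "send_retention_offer"
    return "no_action"


unhappy = "The app is terrible and support was slow. I want to cancel."
happy = "Great service, really helpful team"

print(sentiment_score(unhappy), next_action(unhappy))
print(sentiment_score(happy), next_action(happy))
```

A word-list approach like this misses sarcasm, negation, and context, which is one reason practical systems combine several techniques.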

The business benefits of text mining

Typical businesses now deal with vast amounts of data from all kinds of sources. The amount of data produced, collected, and processed has increased by approximately 5000% since 2010.

As well as the traditional information, like accounting and record-keeping, customer details, HR records, and marketing lists, brands must now contend with a whole new layer of data.

On the external side, there are social media, emails and instant messages, reviews, news media, and IoT data. On the internal side, there are metadata and analytics, customer profiles used for personalisation, compliance check data, and many more sources. All of these produce streams of information 24 hours a day. Some of it is structured, but much of it is unstructured text data.

Image: Data grid chart (Source: Demand Planning)

Dealing with this much information manually has become impossible, even for the largest and most successful businesses. It’s no longer a human task.

All of this means companies have become much more selective and sophisticated when it comes to navigating data that are related to their activities. They must choose what sorts of information they capture and plan strategically to filter out the noise and arrive at the insights that will have the most impact.

Text mining, with its advanced ability to assimilate, summarise and extract insights from high-volume unstructured data, is an ideal tool for the task.

Text mining technologies

To get from a heap of unstructured text data to a condensed, accurate set of insights and actions takes multiple text mining techniques working together, some in sequence and some simultaneously. The text data has to be selected, sorted, organised, parsed and processed, and then analysed in the way that’s most useful to the end-user. Finally, the information can be presented and shared using tools like dashboards and data visualisation.

Here are a few of the technologies involved in text mining:

Image: Visual of text mining processes (Source: Tech Target)

1.   Natural language processing (NLP)

Natural language processing is a kind of AI (artificial intelligence). It focuses on giving machines human-like abilities in processing human voices or written communications.

Image: Natural language processing (Source: Devopedia)

It blends techniques from computer science, such as machine learning, and the field of linguistics to create a product that not only understands the semantic meanings of words but can also infer the sentiment and intent of the speaker or writer (an aspect of NLP known as sentiment analysis).

Natural language processing is used in all kinds of contexts, including familiar ones like customer service chatbots, satnavs, and voice assistants.

2.   Information retrieval

Information retrieval means identifying and collecting the relevant parts of a large quantity of unstructured data. Using a search engine is a form of information retrieval.

3.   Information extraction

Information extraction is the element of text mining that separates and sorts unstructured data into structured data that can be processed and edited. It identifies the entities (items like people, things, or companies), their attributes, and the relationships between them, and stores the information in a database where it can easily be accessed.

4.   Data mining

Data mining is the process of finding trends, patterns, correlations, and other kinds of emergent information in a large body of data. It uses structured data – which means that in a text mining context, it happens once the information has been retrieved and extracted.
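
The order of the last three steps (retrieve, then extract, then mine) can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the sample comments, the issue vocabulary, and the function names are hypothetical, and each step stands in for what would be a much more capable component in real text mining software.

```python
import re
from collections import Counter

# Hypothetical customer comments standing in for a large body of unstructured text.
documents = [
    "Delivery was late and the box was damaged.",
    "Great price, but delivery took two weeks.",
    "The checkout page crashed twice.",
    "Late delivery again. Very disappointed.",
]

# Hypothetical issue vocabulary used by the extraction step.
ISSUE_TERMS = {"late", "damaged", "crashed", "disappointed"}


def retrieve(docs: list[str], query: str) -> list[str]:
    """Information retrieval: keep only documents that mention the query term."""
    return [doc for doc in docs if query in doc.lower()]


def extract(doc: str) -> dict:
    """Information extraction: turn one document into a structured record."""
    words = re.findall(r"[a-z]+", doc.lower())
    return {"text": doc, "issues": sorted(ISSUE_TERMS.intersection(words))}


def mine(records: list[dict]) -> Counter:
    """Data mining: count how often each issue appears across the records."""
    return Counter(issue for record in records for issue in record["issues"])


relevant = retrieve(documents, "delivery")
records = [extract(doc) for doc in relevant]
issue_counts = mine(records)
print(issue_counts.most_common())
```

Because the mining step only ever sees structured records, it can be swapped for any standard analysis without touching the raw text again.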

XM Discover: Text mining the easy way

But what if there were an easier way to extract meaning from text and utilise it to generate valuable insights? With XM Discover, there is.

XM Discover is a powerful listening tool that harnesses the power of text mining to tell you at a glance how your customers feel about your company, products, and services. It monitors multiple channels simultaneously to give you a big-picture overview of what your customers most want from you, why they feel the way they do, and how strongly they feel. It highlights when ‘red flag’ moments are occurring so that you can take action swiftly.

XM Discover offers a human-level understanding of conversational language, so it can go to work on straightforward tasks, freeing up agents to use their time on more complex tasks. It’s a unique blend of technologies that makes real-time listening and powerful insights accessible to all levels of your business, requiring no specialist knowledge.

XM Discover is part of the wider Qualtrics platform, which means the insights you generate can be added to other sources of data to provide a richer and more complete understanding of what’s happening in and around your business. This allows you to make predictions and act with confidence. With XM Discover at your fingertips, you have the foundation for the most complete data repository in existence.
