Text Mining Opportunities: White Paper


Project Line:
Methods and Guidelines
Project Number:
Final Biosimilar Summary Dossier Issued:

Information specialists undertake a wide variety of literature searches to inform rapid reviews, systematic reviews, health technology assessments (HTAs), and economic evaluations. Searches range from extensive, sensitive searches that inform HTAs to more focused, pragmatic rapid searches for products with shorter time frames. Producing efficient search strategies presents many challenges for information specialists, particularly when time is short, search topics are complex, or the vocabulary is vague.

Text mining applications (TMAs) offer opportunities to introduce efficiencies into some information retrieval tasks. For example, TMAs can analyze bibliographic citations and report on the terms and concepts within them, which might help with search strategy development, and some may be able to automate or speed up parts of that process. Text mining covers a wide variety of techniques that use computers to analyze words and their relationships within text. These range from simple counts of the number of times that words appear in texts (frequency analysis), to machine learning that can distinguish texts by content following a training exercise, to semantic analyses that interpret words according to their meaning within texts.
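As an illustration of the simplest technique mentioned above, frequency analysis, the following minimal Python sketch counts how often each term appears across a small set of citation titles and in how many citations it appears. It is not drawn from any of the TMAs assessed in this white paper. The function name `term_frequencies`, the short stop-word list, and the three sample titles are all hypothetical and were invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def term_frequencies(citations):
    """Return (term_counts, doc_counts) for a list of citation texts.

    term_counts: total occurrences of each term across all citations.
    doc_counts:  number of citations in which each term appears at least once.
    """
    term_counts = Counter()
    doc_counts = Counter()
    for text in citations:
        # Lowercase the text and keep runs of letters as tokens.
        tokens = [t for t in re.findall(r"[a-z]+", text.lower())
                  if t not in STOP_WORDS]
        term_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each term once per citation
    return term_counts, doc_counts


# Invented sample titles, for illustration only.
citations = [
    "Biosimilar infliximab in rheumatoid arthritis: a randomized trial",
    "Switching to biosimilar infliximab in inflammatory bowel disease",
    "Cost-effectiveness of infliximab for rheumatoid arthritis",
]

term_counts, doc_counts = term_frequencies(citations)
for term, count in term_counts.most_common(5):
    print(f"{term}: {count} occurrence(s) in {doc_counts[term]} citation(s)")
```

Terms that occur in many of the known relevant citations are candidates for inclusion in a search strategy, although an information specialist would still need to judge whether each term is specific enough to be useful.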

This white paper explores a range of TMAs to identify whether there are any practical, ready-to-use tools that might help information specialists now with their literature searching tasks. The TMAs have been assessed with a focus on specific HTA products and stages within projects where possible. This white paper also provides some insights into the challenges of using more sophisticated TMAs in daily information retrieval practice.