Natural Language Processing and Systematic Reviews Information Extraction
Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review
Abstract
Objectives Unstructured free-text patient feedback contains rich information, and analysing these data manually would require significant personnel resources that are not available in most healthcare organisations. To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.
Methods Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.
Results Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was most often used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.
Conclusion NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their place depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.
- BMJ health informatics
- computer methodologies
- information management
- patient care
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made are indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Summary
What is already known?
- The ability to analyse and interpret free-text patient experience feedback falls short due to the resource intensity required to manually extract crucial information.
- A semiautomated process to rapidly identify and categorise comments from free-text responses may overcome some of the barriers encountered, and this has proven successful in other industries.
What does this paper add?
- Natural language processing and machine learning (ML) have emerged as important tools for processing unstructured free text from patient experience feedback.
- Comments extracted from social media were commonly analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach.
- Healthcare organisations can use the various ML approaches depending on the source of patient experience free-text data, that is, solicited or unsolicited (social media), to gain near real-time insight into patient experience.
Background
Over the last decade, there has been a renewed effort focusing on patient experiences, demonstrating the importance of integrating patients' perceptions and needs into care delivery.1 2 As healthcare providers continue to become patient-centric, it is essential that stakeholders are able to measure, report and improve the experience of patients under their care. Policy discourse has progressed from being curious about patients' feedback, to actually collecting and using the output to drive quality improvement (QI).
In the English National Health Service (NHS), the USA and many European health systems, patient experience data are abundant and publicly available.3 4 NHS England commissions the Friends and Family Test (FFT), a continuous improvement tool allowing patients and people who use NHS services to feed back on their experience.5 It asks users to rate services, or experiences, on a numerical scale such as the Likert scale. In addition to quantitative metrics, experience surveys such as the FFT also include qualitative data in the form of patient narratives. Evidence suggests that when staff are presented with both patient narratives and quantitative data, they tend to pay more attention to the narratives.6 Patient narratives can even complement quantitative data by providing information on experiences not covered by quantitative data,7 8 and give more detail that may help contextualise responses to structured questions. These free-text comments can be especially valuable if they are reported and analysed with the same scientific rigour already accorded to closed questions.9 10 However, this process is limited by human resources and the lack of a systematic way to extract the useful insights from patient free-text comments to facilitate QI.11 12
Natural language processing (NLP) and machine learning (ML)
A potential solution to mitigate the resource constraints of qualitative analysis is NLP. NLP is currently the most widely used 'big data' analytical technique in healthcare,13 and is defined as 'any computer-based algorithm that handles, augments and transforms natural language so that it can be represented for computation.'14 NLP is used to extract information (ie, convert unstructured text into a structured form), perform syntactic processing (eg, tokenisation), capture meaning (ie, ascribe a concept to a word or group of words) and identify relationships (ie, ascribe relationships between concepts) from natural language free text through the use of defined language rules and relevant domain knowledge.14–16 With regards to text analytics, the term ML refers to the application of a combination of statistical techniques in the form of algorithms that are able to complete various computation tasks,17 including detecting patterns such as sentiment, entities, parts of speech and other phenomena within a text.18
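As a concrete illustration of the syntactic-processing step mentioned above, the sketch below tokenises a comment into lowercase word tokens. It is a minimal regex-based example (the comment text is invented), not the pipeline of any reviewed study.

```python
import re

def tokenise(text):
    """Split free text into lowercase word tokens (a minimal syntactic-processing step)."""
    return re.findall(r"[a-z']+", text.lower())

# An invented free-text patient comment for illustration.
comment = "The nurses were wonderful, but the waiting time was far too long."
tokens = tokenise(comment)
```

Structured tokens like these are the usual starting point for the information-extraction and classification steps that follow.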
Text analysis
Topic or text analysis is a method used to analyse large quantities of unstructured data, and the output reveals the principal topics of each text.19 20 ML enables topic analysis through automation using various algorithms, which largely fall under two principal approaches, supervised and unsupervised.21 The difference between these two main classes is the existence of labels in the training data subset.22 Supervised ML involves a predetermined output attribute as well as the use of input attributes.23 The algorithms attempt to predict and classify the predetermined attribute, and their accuracy and misclassification rate, alongside other performance measures, depend on the counts of the predetermined attribute correctly predicted or classified or otherwise.22 In healthcare, Doing-Harris et al24 identified the most common topics in free-text patient comments collected by healthcare services by designing automatic topic classifiers using a supervised approach. Conversely, unsupervised learning involves pattern recognition without the involvement of a target attribute.22 Unsupervised algorithms identify inherent groupings within the unlabelled data and subsequently assign a label to each data value.25 Topics within a text can be detected using topic analysis models, simply by counting words and grouping similar words. Besides discovering the most frequently discussed topics in a given narrative, a topic model can be used to generate new insights within the free text.26 Other studies have scraped patient experience data within comments from social media to detect topics using an unsupervised approach.27 28
Sentiment analysis
Sentiment analysis, also known as opinion mining, helps determine the emotive context within free-text data.29 30 Sentiment analysis looks at users' expressions and in turn associates emotions with the analysed comments.31 In patient feedback, it uses patterns among words to classify a comment as a complaint or praise. This automated process benefits healthcare organisations by providing quick results when compared with a manual approach and is mostly free of human bias; however, reliability depends on the method used.27 32 33 Studies have measured the sentiment of comments on the main NHS website (NHS Choices) over a two-year period.27 34 They found a strong agreement between the quantitative online rating of healthcare providers and analysis of sentiment using their individual automated approach.
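To make the idea concrete, here is a deliberately simple lexicon-based sentiment sketch: it counts hits against hand-picked positive and negative word lists. Both lists are invented for illustration and are far cruder than the classifiers used in the reviewed studies.

```python
# Illustrative word lists; real systems use large lexicons or trained classifiers.
POSITIVE = {"excellent", "kind", "helpful", "wonderful", "clean"}
NEGATIVE = {"rude", "dirty", "slow", "painful", "unhelpful"}

def sentiment(comment):
    """Classify a comment as praise, complaint or neutral by counting lexicon hits."""
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Note that this assigns a single polarity per comment; mixed sentiments within one comment need finer-grained handling.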
NLP and patient experience feedback
Patient experience feedback is generally in natural language and in narrative free text. Most healthcare organisations hold large datasets pertaining to patient experience. In the English NHS almost 30 million pieces of feedback have been collected, and the total rises by over a million a month, which according to NHS England is the 'biggest source of patient opinion in the world'.5 Analysing these data manually would require significant personnel resources which are not available in most healthcare organisations.5 35 Patient narratives contain multiple sentiments and may be about more than one care aspect; therefore, it is a challenge to extract information from such comments.36 The advent of NLP and ML makes it far more feasible to analyse these data and can provide useful insights and complement structured data from surveys and other quality indicators.37 38
Outside of a healthcare organisation, there is an abundance of patient feedback on social media platforms such as Facebook, Twitter, and in the UK, NHS Choices and Care Opinion and other patient networks. This type of feedback gives information on non-traditional metrics, highlighting what patients truly value in their experiences by offering nuance that is often lacking in structured surveys.39 Sentiment analysis has been applied ad hoc to online sources, such as blogs and social media,7 27 33 34 demonstrating in principle the utility of sentiment analysis for patient experience. There appears to be an appetite to explore the possibilities offered by NLP and ML within healthcare organisations to turn patient experience data into insight that can drive care delivery.40 41 Nonetheless, healthcare services need to be cognizant of what NLP methodology to use depending on the source of patient experience feedback.5 To date, no systematic review related to the automated extraction of data from patient experience feedback using NLP has been published. In this paper, we sought to review the body of literature and report the state of the science on the use of NLP and ML to process and analyse data from patient experience free-text feedback.
The aim of this study is to systematically review the literature on the use of NLP and ML to process and analyse free-text patient experience data. The objectives were to describe: (1) purpose and data source; (2) information (patient experience theme) extraction and sentiment analysis; (3) NLP methodology and performance metrics and (4) assess the studies for indicators of quality.
Methods
Search strategy
The following databases were searched from January 2000 to December 2019: MEDLINE, EMBASE, PsycINFO, The Cochrane Library (Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, Cochrane Methodology Register), Global Health, Health Management Information Consortium, CINAHL and Web of Science. Grey literature and Google Scholar were used to extract articles that were not retrieved in the databases searched. Owing to the variety of terms used to infer patient experience, combinations of search terms were used. The search terms, derived from the Medical Subject Headings vocabulary (US National Library of Medicine) for the database queries, can be found below. A review protocol was not published.
"natural language processing" OR "NLP" OR "text mining" OR "sentiment analysis" OR "opinion mining" OR "text classification" OR "document classification" OR "topic modelling" OR "machine learning" OR "supervised machine learning" OR "unsupervised machine learning" AND "feedback" OR "surveys and questionnaires" OR "data collection" OR "health care surveys" OR "assessment" OR "evaluation" AND "patient centred care" OR "patient satisfaction" OR "patient experience".
Inclusion criteria
To be eligible for inclusion in the review, the principal requirement was that the article needed to focus on the description, evaluation or use of an NLP algorithm or pipeline to process or analyse patient experience data. The review included randomised controlled trials, non-randomised controlled trials, case–control studies, prospective and retrospective cohort studies and qualitative studies. Queries were limited to English language but had no date constraints. We excluded studies that gathered patient-reported outcome measures, symptom monitoring, symptom information, quality of life measures and ecological momentary assessment without patient experience data. Conference abstracts were excluded, as there was limited detail in the methodology to score against quality indicators.
Study selection
The research adhered to the guideline presented in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 checklist.42 The initial search returned 1007 papers; after removing duplicates, 241 papers were retained. The titles and abstracts were screened by two reviewers (MK and PA) independently, and discrepancies were resolved by a third reviewer (EM). Thirty-one articles were identified as potentially eligible for inclusion. Full-text articles were retrieved and assessed for inclusion by the same reviewers, of which 19 were retained for final inclusion. The main reason for exclusion was that the articles reported other patient-reported feedback and not patient experience. Figure 1 illustrates the PRISMA flowchart representing the study selection process and reasons for exclusion.
Data collection process
We developed a data collection tool with the following information fields: department of corresponding authors, country of study, study purpose, data source, solicited feedback, time period, information extraction method, data processing, ML classifiers, text analysis approach, software, performance, key findings and limitations. Two reviewers (MK and PA) independently completed the data collection and met to compare the results, and discrepancies were resolved by a third reviewer (EM).
Data synthesis
Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. A formal quality assessment was not conducted, as relevant reporting standards have not been established for NLP articles. Instead, we report indicators of quality guided by elements reported in previous NLP-focused systematic reviews.43–46 We included information related to the study purpose, corpus (eg, data source and number of comments) and NLP (eg, methodology and software used and performance metrics). Two reviewers (MK and PA) independently evaluated indicators of quality in each study; disagreements in evaluation were resolved by discussion with a third reviewer (EM). Inter-rater agreement (Cohen's kappa) was calculated. In the reviewed studies, we assessed the NLP methodology and the rationale for its use. The key NLP approaches were summarised based on text analysis incorporating either text classification or topic modelling depending on the corpus available, and evaluation was done as to whether sentiment analysis was performed using existing or bespoke software.
Performance metrics
To understand how well an automatic ML algorithm performs, there are a number of statistical values that help determine its performance with the given data.18 Algorithm performance is measured as recall (the proportion of all actual positive observations that are correctly identified, that is, true positives/(true positives+false negatives)), precision (the ratio of correctly predicted positive observations to the total predicted positive observations) and the F-score, which describes overall performance, representing the harmonic mean of precision and recall.43 K-fold cross-validation is a technique to evaluate predictive models by partitioning the original sample into a training set to train the model and a test set to evaluate it. This ensures that the results are not by chance, and therefore supports the validity of the algorithm's performance. We report all the recorded performance metrics in each of the included studies in order to gain a better understanding of how the data and ML approach can influence the performance.
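The definitions above can be made concrete with a toy worked example; the confusion-matrix counts and the fold layout below are invented for illustration.

```python
# Toy confusion-matrix counts (invented): 40 true positives, 10 false positives,
# 20 false negatives.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)                          # 40/50 = 0.8
recall = tp / (tp + fn)                             # 40/60 = 2/3
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 8/11

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation (interleaved folds)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test
```

Each of the k folds serves once as the test set, so every observation is evaluated exactly once across the k rounds.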
Results
Study characteristics
Year of publication ranged from 2012 to 2020, with almost 80% (15/19) of articles published in the last five years. The study purpose of the 19 articles was similar, in that they applied language analysis techniques on free-text patient experience feedback to extract information, which included themes or topics and sentiment. The feedback was either solicited24 47–50 or unsolicited.6 26–28 32 34 51–58 Six studies were from the UK,26–28 48 49 55 two from Spain,58 of which one included Dutch reviews,54 and the rest were conducted in the USA,6 24 32 34 47 50 52 53 56 57 of which one51 looked at Chinese reviews translated into English. The authors of all except one study47 were from a healthcare informatics department.
Data source
The majority (15/19) of the feedback used for language analysis was extracted from social media sites, such as Twitter,28 52 Facebook6 and healthcare-specific forums, for example, NHS Choices,26 27 55 Yelp,56 57 RateMDs,32 34 53 Haodf,51 Masquemedicos54 58 and Zorgkaart Nederland.54 RateMDs and Yelp are US platforms that provide data, reviews and ratings on everything from cleanliness of hospital and care centre facilities, to clinician knowledge, as well as giving patients the ability to share personal experiences of care. NHS Choices is a UK-based platform that allows patients, carers and friends to comment on their experience of care received in any NHS institution. Haodf, Masquemedicos and Zorgkaart Nederland are platforms that contain patient experiences in Chinese, Spanish and Dutch, respectively. Five studies used the accompanying free text from structured patient feedback surveys: Press Ganey,24 50 vendor supplied (HCAHPS and comments),47 a bespoke cancer experience survey with free-text comments48 and the Cancer Patient Experience Survey.49 The initial dataset in terms of number of reviews captured to perform language analysis varied significantly from 734 reviews58 to 773 279 reviews.51 Where provided, the number of words, characters or sentences within the reviews varied. Table 1 gives an overview of the length of comments provided as either range, mean or median.
Table 1
The length of comments provided in five of the 19 studies, arranged in descending order according to the total number of comments
Software
The most common coding environment, sometimes used in combination, was Python (n=5),24 49 50 52 53 followed by R (n=3),26 48 55 Waikato Environment for Knowledge Analysis (n=2),27 34 Machine Learning for Language Toolkit (n=2),53 56 RapidMiner (n=2)6 58 and C++ (n=1).54
Language analysis approach
Studies used a variety of approaches to develop their language analysis methodology. The two most common approaches were supervised (n=9)6 27 28 34 47 48 50 52 54 and unsupervised learning (n=6),24 26 51 53 55 56 followed by a combination, that is, semisupervised (n=3),32 57 58 rule-based (n=1)49 and dictionary look-up (n=1)54 (figure 2). Sentiment analysis with a combination of text analysis was performed in ten studies,24 26 28 32 47–49 52 53 57 sentiment analysis alone was performed in four6 28 50 54 and text analysis alone in four studies.51 55 56 58 We describe the details of the two approaches, sentiment analysis and text analysis, which incorporated text classification and topic modelling, categorised as supervised and unsupervised learning, respectively.
Supervised learning
Manual classification into topics or sentiment was performed in those studies that used a supervised approach. The most common approach was manual classification of a subset of comments as the training set. The percentage of the total number of comments used for manual classification varied in each study, as did the number of raters. Sentiment was generally expressed as positive, negative and neutral. Five studies did not perform manual classification and employed existing software to perform the sentiment analysis, that is, TheySayLtd,28 TextBlob,52 SentiWordNet,57 Diction53 and Keras.50 We divide the supervised approach based on sentiment analysis (table 2A) and text classification (table 2B), where we document the percentage of total comments manually classified into categories for sentiment and topics for text classification, the number of raters including the inter-rater agreement and the classifier(s) used for ML. In addition, where reported, we also highlight the configuration employed during the data processing steps. Support vector machine (SVM) was the most commonly used classifier (n=6) followed by Naïve Bayes (NB) (n=5).
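The supervised workflow described above, that is, manually label a subset, train a classifier, then predict the rest, can be sketched end to end with a hand-rolled Naïve Bayes over a few invented labelled comments. The reviewed studies used library implementations; this standalone sketch only illustrates the principle.

```python
import math
from collections import Counter, defaultdict

# Invented manually labelled comments standing in for a training set.
TRAIN = [
    ("the staff were kind and helpful", "positive"),
    ("excellent care and a clean ward", "positive"),
    ("the waiting time was far too long", "negative"),
    ("rude reception and a dirty waiting room", "negative"),
]

def train(examples):
    """Count words per label and documents per label from labelled comments."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label maximising log prior + log likelihood with add-one smoothing."""
    vocab = {w for counter in word_counts.values() for w in counter}
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_docs / sum(label_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on the four labelled comments, the classifier can then assign a label to unseen feedback, mirroring how a manually coded subset informs predictions over the full dataset.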
Table 2A
Studies that performed sentiment analysis using a supervised approach, including the number of raters and associated inter-rater agreement expressed as Cohen's kappa (κ), classifiers and configuration applied where reported. Studies are reported in chronological order
Table 2B
Studies that performed text classification using a supervised approach, including the number of raters and associated inter-rater agreement expressed as Cohen's kappa (κ), classifiers and configuration applied where reported. Studies are reported in chronological order
Unsupervised learning
Topic modelling is an approach that automatically detects topics within a given comment. Seven studies24 26 32 51 53 55 56 used this approach and the majority of the studies (n=6)24 26 51 53 55 56 used latent Dirichlet allocation (LDA). One study32 used a variation, factorial LDA; however, this was a semisupervised approach as it involved some manual coding. LDA is a generative model of text that assumes words in a document reflect a mixture of latent topics (each word is associated with a single topic). For the output to be understandable, the number of topics has to be chosen, and table 3 demonstrates the variation in topics determined while employing LDA.
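For readers unfamiliar with how LDA infers topics, the sketch below implements collapsed Gibbs sampling for LDA over a toy corpus. It is a pedagogical sketch (the documents are invented and the hyperparameters arbitrary); the reviewed studies used library implementations.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic word-count tables."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})              # vocabulary size
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]  # token-topic assignments
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]
                # remove the token, resample its topic, then add it back
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [
                    (ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw
```

After sampling, the highest-count words in each topic-word table characterise that topic; choosing `n_topics` up front is exactly the judgement call whose variation table 3 illustrates.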
Table 3
The number of topics determined in each study using latent Dirichlet allocation as a type of unsupervised learning approach, arranged in descending order
Performance
Seven studies did not report performance of the NLP algorithm or pipeline.28 32 47 51 53 56 57 The remaining 12 studies reported one or more evaluation metrics such as accuracy, sensitivity, recall, specificity, precision and F-measure. The higher the F1 score the better, with 0 being the worst possible and 1 being the best. In the studies that employed a supervised approach, SVM and NB were the preferred classifiers as they produced better results compared with other classifiers, demonstrated by the F1 score for sentiment analysis and text classification. Table 4 demonstrates the performance measure, reported as F-measure or accuracy, of the best performing classifiers for sentiment and text analysis using only a supervised approach, and the k-fold cross-validation where reported in 12 studies, of which only five studies reported multiple-fold validation.
Table 4
Performance metrics in the studies that used supervised learning (sentiment analysis and text classification). SVM and NB were the preferred classifiers as they produced better results, demonstrated by the F1 score. Only five studies reported multiple-fold validation
Indicators of quality
Inter-rater agreement (Cohen's kappa) was calculated as 0.91, suggesting almost perfect agreement. The individual evaluation with a description of each domain is detailed in table 5: specifically, clarity of the study purpose statement, and presence of information related to the dataset, the number of comments analysed, information extraction and data processing, adequate description of NLP methodology and evaluation metrics. All studies had at least four of the seven quality indicators. Twelve studies addressed all seven indicators of quality,6 24 26 27 34 48–50 52 54 55 58 and three studies addressed only four.28 47 57
Table 5
Evaluation of studies and performance metrics
Discussion
In this systematic review, we identified 19 studies that evaluated various NLP and ML approaches to analyse free-text patient experience data. The majority of the studies dealt with documents written in English, perhaps because platforms for expressing emotions, opinions or comments related to health issues are mainly orientated towards Anglophones.58 Three studies51 54 58 were conducted using non-English free-text comments; however, Hao et al51 and Jimenez-Zafra et al54 translated comments to English that were initially written in Chinese and Spanish, respectively. Accurate and automated analysis is challenging due to the subjectivity, complexity and creativity of the language used, and translating into another language may lose these subtleties. The type of patient feedback data used and selection of ML algorithm can affect the outcome of language analysis and classification. We show how studies used various ML approaches.
The two most common approaches were supervised and unsupervised learning for text and sentiment analysis. Briefly, text analysis identifies the topic mentioned within a given comment, whereas sentiment analysis identifies the emotion conveyed. Of the two approaches, the most common approach used was supervised learning, involving manual classification of a subset of data by themes24 27 34 48 52 and sentiment.6 24 26 27 34 48 52 54 Comprehensive reading of all comments within the dataset remains the 'gold standard' method for analysing free-text comments, and is currently the only way to ensure all relevant comments are coded and analysed.48 This demonstrates that language analysis via an ML approach is only as good as the learning set that is used to inform it. The studies that used a supervised approach in this review demonstrated that there were at least two independent reviewers involved in manual coding; however, there was no consistency in the percentage of total comments coded, how the data were split into training and test sets, and the k-fold cross-validation used. Within supervised learning, the most common classifier was SVM followed by NB. SVM and NB have been widely used for document classification, and consistently yield good classification performance.
NLP has issues processing noisy data, reducing overall accuracy.18 59 Pre-processing of textual data is the first and an important step in processing of text, and has been proven to improve the performance of text classification models. The goal of pre-processing is to standardise the text.59 We noted that pre-processing varied in the studies in this review. In addition to the standard pre-processing steps, that is, conversion to lowercase, stemming and stop word elimination, Alemi et al34 used the sparsity rule and information gain, Greaves et al27 used information gain and prior polarity and Bahja et al26 used the sparsity rule alone. Plaza-del-Arco et al58 used a combination of stopper and stemmer, and found that the accuracy was best (87.88%) with stemmer alone; however, the F-measure was best (71.35%) when no stemmer or stopper was applied. However, despite these pre-processing steps, no consensus could be found over a preferred supervised ML classification method to use for sentiment or text classification in the patient feedback domain.
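The standard steps named above can be sketched in a few lines; the stop-word list and the crude suffix-stripping stemmer below are invented simplifications (real pipelines typically use a Porter-style stemmer and curated stop-word lists).

```python
# Illustrative stop-word list; production pipelines use curated lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "were", "in"}

def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, then stem."""
    tokens = (w.strip(".,!?") for w in text.lower().split())
    return [crude_stem(w) for w in tokens if w and w not in STOP_WORDS]
```

Even this crude normalisation collapses inflected variants ('waiting', 'waits') onto one term, which reduces the sparsity that the reviewed studies' configuration choices also target.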
The most interesting finding in this review was that the ML approach employed corresponded to the data source. The choice of approach is based on the performance metrics of the algorithm results, which depend on three factors.21 First, identifying patterns is dependent on the quality of the data available. In text classification or sentiment analysis, the diversity of comments affects the accuracy of the machine prediction. More diversity decreases the ability of the ML algorithm to accurately classify the comment.6 Second, each ML algorithm is governed by different sequential sets of rules for classifying semantic or syntactic relationships within the given text, and certain algorithms may accommodate some datasets better than others. Third, the larger the training sets used, the higher the accuracy of the algorithms at identifying similar comments within the wider dataset, but trade-offs with time and human coding are necessary to ensure the method is resource-efficient.21 We found that comments extracted from social media were commonly analysed using an unsupervised approach26 32 51 53 55 56 and free-text comments held within structured surveys were analysed using a supervised approach.6 27 28 34 47 48 50 52 54
There is little evidence in the literature on the statistical properties for the minimum text size needed to perform language analysis, principally because of the difficulty of natural language understanding and the content and context of a text corpus.6 The studies that reported text size demonstrate that the average comment length was around 40 words. The domain of patient feedback from free text complementing structured surveys appears fixed in its nature, making it attractive data for supervised learning.31 Just as the domain is fixed, the perspective of a patient feedback document is also fixed31: there is a limited vocabulary that is useful for commenting about a health service, and therefore it is possible to anticipate the meaning of various phrases and automatically classify the comments.34 Rastegar-Mojarad et al57 also observed that a small (25%) vocabulary set covered a majority (92%) of the content of their patients' comments, consistent with a study60 exploring consumer health vocabulary used by consumers and healthcare professionals. This suggests that patients use certain vocabulary when expressing their experience within free-text comments.
The overall domain of patient feedback is the healthcare organisation,31 and this review revealed that the content of reviews tends to focus on a small collection of aspects associated with this, as demonstrated by the topics used for text classification in the studies.24 27 34 48 52 In contrast, the studies26 32 51 53 55 56 that performed topic modelling did so on the premise that patient feedback comments contain a multitude of different topics. Topic modelling can be useful in evaluating how close results come to what humans with domain knowledge have determined the topics to be, and whether this unsupervised approach finds new topics not identified by humans.49 LDA was used to extract a number of topics from the free-text reviews as they occur in the data without any prior assumption about what patients care about. The topics identified by the six studies that used LDA did not generate any new topics, which is in keeping with the earlier finding that consumer healthcare reporting has a limited vocabulary. This finding was supported by Doing-Harris et al,24 who showed that their topic modelling results echo topic classification results, demonstrating that no unexpected topics were found in topic modelling.
Other factors should be taken into account when employing LDA. LDA is mainly based on the frequency of co-occurrence of words under similar topics.51 Topics discovered using LDA may not match the true topics in the data, and short documents, such as free-text comments, may result in poor performance of LDA.49 In addition to the short comments, studies in this review also demonstrate that the majority of the comments on social media tend to be positive, in contrast to the negative reviews, which are longer but less frequent. Wagland et al48 found that the content of positive comments was usually much less specific than that of negative comments. Therefore an unsupervised approach to short positive reviews may not find new topics, and the low frequency of negative reviews may not highlight new topics either. To mitigate this, there is a role for using a supervised approach to identify subcategories for negative reviews.48
Choice of the number of topics for the LDA model also affects the quality of the output.25 56 If topics are too few, their content gives insight into only very general patterns in the text, which are not very useful. Too many topics, on the other hand, make it difficult to find common themes. An LDA topic model with an optimal number of topics should demonstrate meaningful patterns without producing many insignificant topics. The number of topics identified in the studies reviewed26 32 51 53 55 56 was not consistent and ranged from six to 60, demonstrating that deciding on the optimal number is challenging. Performance of LDA models is affected by semantic coherence (the rate at which a topic's most common words tend to occur together in the same reviews) and exclusivity (the rate at which the most common terms are exclusive to individual topics). Both measures are useful guidance on which model to choose;55 however, of the six studies that used LDA, only one study55 reported LDA performance measures.
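Of the two measures, exclusivity is simple to compute once each topic's top words are known. A minimal sketch over hypothetical topic word lists (this is one plausible formulation, not the metric implementation used in the reviewed studies):

```python
def exclusivity(topic_top_words):
    """Average fraction of each topic's top words that appear
    in no other topic's top-word list (1.0 = fully exclusive)."""
    scores = []
    for i, words in enumerate(topic_top_words):
        others = {w for j, ws in enumerate(topic_top_words) if j != i for w in ws}
        scores.append(sum(w not in others for w in words) / len(words))
    return sum(scores) / len(scores)

# Invented top-word lists for three candidate topics.
topics = [
    ["staff", "kind", "caring", "helpful"],
    ["wait", "parking", "appointment", "long"],
    ["staff", "wait", "rude", "reception"],  # overlaps with both above
]
print(round(exclusivity(topics), 2))
```

Comparing such scores across models fitted with different numbers of topics gives one rough signal for choosing between them.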
Sentiment analysis was commonly conducted using a supervised approach (n=8).6 24 26 27 34 47 48 54 Even though pre-classified, understanding what the comments, both negative and positive, are specifically talking about still requires reading through them. NLP makes this process efficient by identifying trends in the comments by sentiment. This review identified that the most common approach to sentiment classification was to categorise the comment into a single category, that is, positive or negative. However, this implies that there must be a polarity associated with a document, which is not always the case. This fails to capture the mixed or neutral sentiments which could provide useful insights into patient experience. Nawab et al50 demonstrated that splitting the mixed sentiments by sentence revealed distinct sentiments. Therefore, although the percentage of mixed or neutral sentiment is low compared with the overall dataset, analysis of comments with these mixed and neutral sentiments can provide useful information and therefore should not be discarded.
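The value of sentence-level splitting can be illustrated with a toy lexicon scorer; the word lists and example comment below are invented for illustration and are not the method used by Nawab et al:

```python
POSITIVE = {"kind", "helpful", "excellent", "caring", "clean"}
NEGATIVE = {"rude", "long", "dirty", "cancelled", "poor"}

def sentence_sentiments(comment):
    """Score each sentence separately so a mixed comment is not
    collapsed into a single overall polarity."""
    results = []
    for sentence in comment.split("."):
        if not sentence.strip():
            continue
        words = set(sentence.lower().split())
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.append((sentence.strip(), label))
    return results

mixed = "The nurses were kind and helpful. The wait was long and the ward dirty."
for sentence, label in sentence_sentiments(mixed):
    print(f"{label}: {sentence}")
```

A document-level classifier would force this comment into one class; the per-sentence view preserves both the praise and the complaint.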
Greaves et al27 and Bahja et al26 used the associated star rating within the NHS Choices data to directly train the sentiment tool. This approach makes use of the implicit notion that if patients say they would recommend a hospital based on the star rating, they are implying a positive sentiment, and conversely if not, a negative sentiment, thereby automatically extracting a nominal categorisation. This automated classification removes the need for manual classification and eliminates potential biases of reviewer assignment of comments, but it assumes that star ratings correlate with the sentiment. This is supported by Kowalski,55 who demonstrated intuitive relationships between topics' meanings and star ratings across the analysed NHS Choices dataset. In contrast, Alemi et al34 found that sentiment in comments from RateMDs is not always reflected in the overall rating; for example, 6% of the patients who gave the highest overall rating still included a complaint in their comments, and 33% of patients who gave the lowest overall rating included praise. This suggests that the sentiment may not always correlate with the star rating, and therefore researchers need to recognise that the approach used for classification may have implications for validity.
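The rating-as-label approach can be sketched as follows, with invented (comment, rating) pairs and a scikit-learn Naïve Bayes pipeline standing in for the tools the studies actually used; here ratings of 4-5 are assumed positive, 1-2 negative, and 3 is discarded as ambiguous:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical (comment, star-rating) pairs for illustration only.
data = [
    ("excellent care from friendly staff", 5),
    ("doctor listened and explained clearly", 5),
    ("clean ward and kind nurses", 4),
    ("rude receptionist and a very long wait", 1),
    ("appointment cancelled twice, poor communication", 2),
    ("waited hours and nobody apologised", 1),
]
texts = [t for t, r in data if r != 3]
labels = ["positive" if r >= 4 else "negative" for t, r in data if r != 3]

# Naïve Bayes, one of the best performing classifiers in the review.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["friendly staff and clear explanation"])[0])
```

The appeal is that no manual annotation is needed; the caveat, as Alemi et al show, is that the derived labels are only as trustworthy as the assumed rating-sentiment correlation.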
With regard to sentiment analysis of Twitter datasets, Greaves et al28 found no association when comparing Twitter data with conventional metrics such as patient experience, and Hawkins et al52 found no correlation between Twitter sentiment and HCAHPS scores, suggesting that Twitter sentiment must be treated cautiously in understanding quality. Therefore, although star ratings can be informative and in line with quantitative measures of quality, they may not be sufficiently granular to help evaluate service quality based solely on the star rating without considering the textual content.53
Studies in this review demonstrate that NLP and ML have emerged as an important tool for processing patient experience unstructured free-text data and generating structured output. However, most of the work has been done on extracting information from social media.6 26–28 32 34 51–58 Healthcare organisations have raised concerns about the accuracy of comments expressed on social media,61 making policymakers reluctant to endorse narrative data as a legitimate tool. Even though most administrators remove malicious messages manually, anyone can comment on the website and intentionally distort how potential patients evaluate healthcare services. The validity and reliability of NLP is further limited by the fact that most patients do not post reviews online. Kowalski55 found that healthcare services in England received fewer than 20 reviews over a period of three and a half years. For a limited amount of data, NLP may not be very expedient, and with a smaller number of comments the results may not be as fruitful and there may not be enough raw information to detect a specific pattern.50 Furthermore, ratings posted in social media reviews are not adjusted for user characteristics or medical risk, whereas structured survey scores are patient-mix adjusted.6
Limitations
We focused on indicators of quality of the included articles rather than assessing the quality of the studies, as relevant formal standards have yet to be established for NLP articles. Due to the heterogeneous nature of the studies, and the diverse approaches taken with regard to pre-processing, manual classification and performance of classifiers, it is challenging to make any comparative statements.
Conclusion
Studies in this review demonstrate that NLP and ML have emerged as an important tool for processing unstructured free-text patient experience data. Both supervised and unsupervised approaches have their role in language analysis depending on the data source. Supervised learning is time consuming due to the manual coding required, and is beneficial in analysing free-text comments commonly found in structured surveys. As the volume of comments posted on social media continues to rise, manual classification for supervised learning may not be feasible due to time constraints, and topic modelling may be a satisfactory approach. To ensure that every patient's voice is heard, healthcare organisations must react and mould their language analysis strategy in line with the diverse patient feedback platforms.
Acknowledgments
We thank Jacqueline Cousins (Library Director and Liaison Librarian at Imperial College London) for the support in improving the composition of the search terms and procedural aspects of the search strategy.
Copyright information:
© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Source: https://informatics.bmj.com/content/28/1/e100262