Angie Nguyen

Data scientist

PhD Thesis

Contribution to the valorization of free textual data in the health sector (2022)

In recent years, the healthcare industry has faced numerous challenges (epidemic management, demand volatility, compressed care times, etc.), creating a growing need for useful information to support decision-making. At the same time, most existing health data is available as free text (clinical notes, messages on social networks, etc.). In this context, recent breakthroughs in natural language processing (NLP), especially language models based on deep learning, have created opportunities to unlock this information and improve the overall management of the healthcare sector. These technologies can enhance health databases, smooth information flows between stakeholders, and improve multiple processes ranging from demand forecasting to epidemic management. This thesis therefore focused on how to leverage the massively available unstructured textual data in the healthcare sector. First, two literature reviews identified the opportunities and challenges of applying NLP to leverage available textual data and improve management processes. The challenges include the high variability and implicit nature of natural language expressions and the scarcity of training and evaluation data. To address them, a methodology using recent transformer-based language models was developed to perform contextualized health information extraction (negations or suspicions of diseases, etc.) from various health-related texts, in a context of scarce French-language data. Finally, a second contribution developed a methodology to combine structured medical data with unstructured textual data from news media and validated it on two real cases in the pharmaceutical industry.

Research Fields

Natural language processing
Health
Machine learning
Deep learning
Data science

Main publications

Nguyen, Angie & Lamouri, Samir & Pellerin, Robert & Tamayo Giraldo, Simon & Lekens, Béranger. (2021). Data analytics in pharmaceutical supply chains: state of the art, opportunities, and challenges. International Journal of Production Research. 60. 10.1080/00207543.2021.1950937.
Nguyen, Angie & Usuga Cadavid, Juan Pablo & Lamouri, Samir & Grabot, Bernard & Pellerin, Robert. (2021). Understanding Data-Related Concepts in Smart Manufacturing and Supply Chain Through Text Mining. 10.1007/978-3-030-69373-2_37.
Nguyen, Angie & Pellerin, Robert & Lamouri, Samir & Lekens, Béranger. (2022). Managing demand volatility of pharmaceutical products in times of disruption through news sentiment analysis. International Journal of Production Research. 61. 10.1080/00207543.2022.2070044.
Eteve-Pitsaer, C. & Marty, T. & Nguyen, A. & Le Priol, E. & Paris, C. & Mebarki, A. & Texier, N. & Schück, S. (2022). Psoriasis et altérations de la qualité de vie au travail: une étude avec des données issues de la base THIN® France croisées avec les contenus des réseaux sociaux analysés par l’outil Detec’t. Revue d’Épidémiologie et de Santé Publique. 10.1016/j.respe.2022.09.015.
Nguyen, Angie & Bougacha, Omar & Lekens, Béranger & Lamouri, Samir & Pellerin, Robert & Couvreur, Christophe. (2023). On the use of logistics data to anticipate drugs shortages through data mining. Procedia Computer Science. 219. 949-956. 10.1016/j.procs.2023.01.371.