Smart Text, Stronger Systems: Large Language Models for the Performance of Veterinary Services

Published on 09/03/2025

written by

Lead writer

Jennifer Lasley

Jennifer is Senior Programme Coordinator for the PVS Pathway and WHO/IHR Connections in WOAH’s Capacity-Building Department. At WOAH since 2010, her most recent work consists of the digitalisation of the PVS Pathway and the development of the PVS Initial and Follow-up Evaluation Database, where natural language processing and machine learning are used to extract added value and insights from data on the performance of animal health systems globally.

Barbara Alessandrini

Head of WOAH’s Capacity-Building Department, Barbara leads the WOAH capacity-building programme, which includes the PVS Pathway Programme and the Training Platform, promoting innovation in country-led services and in learner-centred training, to empower both institutions and individuals to welcome cultural and operational change, including the One Health approach.

Wasif Raza Mirza

Wasif, Director of SysReforms International, is a technology leader with over 22 years of experience in software architecture and digital transformation. His expertise spans legacy systems to modern AI-driven solutions, including enterprise large language model (LLM) implementation and qualitative data analysis platforms. A hands-on architect, he specialises in cloud-native technologies, full-stack development and large-scale system integration for global organisations, including the World Health Organization (WHO), International Labour Organization (ILO), United Nations Development Programme (UNDP), United States Agency for International Development (USAID) and the World Organisation for Animal Health (WOAH).

Hana Abdelsattar

Hana is Senior Training Programme Manager in the Capacity-Building Department of the World Organisation for Animal Health (WOAH), with extensive experience of capacity building in international organisations. She holds a Master’s degree in Management and Communication, and leads the development of WOAH’s eLearning platform, designed to meet diverse learning needs and uphold the Organisation’s standards. Dedicated to accessible, learner-centred training systems that build global capacity, Hana works to strengthen global Veterinary Services, equipping them to address both current and emerging challenges.

Abstract

This article explores the integration of large language models (LLMs) in enhancing the World Organisation for Animal Health’s (WOAH) Performance of Veterinary Services (PVS) Pathway, which supports improvements in animal health systems globally. LLMs, trained on vast datasets, are powerful AI tools capable of understanding and generating human-like text, offering opportunities for more efficient analysis of text data. WOAH’s PVS Evaluation Reports, which contain comprehensive assessments of national Veterinary Services, are rich in text but difficult to analyse manually due to their volume. LLMs can process these reports, extracting actionable insights from recommendations, strengths and weaknesses identified across 142 countries. The article details how Microsoft Azure OpenAI was used to process text from PVS reports, applying clustering and semantic similarity techniques to categorise key concepts and generate trends. Challenges in applying this off-the-shelf LLM, including issues with input limits, hallucination and resource demands, led WOAH to develop an in-house, specialised LLM based on Mistral 7B, which is in the process of being fine-tuned with WOAH- and PVS-specific knowledge. The article highlights the potential for LLMs to improve data-driven decision-making and enhance global veterinary training and competency mapping. The future of this technology promises more efficient, data-backed policy development and the continued strengthening of animal health systems worldwide.

Artificial intelligence (AI) has made significant strides, with applications now spanning nearly every domain. Central to this progress are large language models (LLMs) – powerful AI systems capable of understanding and generating human-like text [1]. These models are transforming how we search for information, create content and generate knowledge. Understanding how LLMs can be applied is key to navigating our rapidly changing world.

The World Organisation for Animal Health (WOAH) serves as the global authority on animal health, committed to transparent reporting on animal diseases and the improvement of animal health worldwide, ultimately contributing to a safer, healthier and more sustainable future [2]. Since 2006, its Performance of Veterinary Services (PVS) Pathway has supported the lasting improvement of national Animal Health Services across the globe [3]. Given the volume of unstructured text and quantitative data in PVS Evaluation Reports, generative AI – powered by LLMs – offers an opportunity for decision-makers and policy-makers to extract trends and insights more effectively, enhancing the evaluation and performance of Animal Health Services [4,5]. 

What are Large Language Models?

LLMs are advanced algorithms trained to predict and generate text. They are termed ‘large’ because they are trained on vast datasets – often billions or even trillions of words – and use enormous numbers of parameters (mathematical weights) to process information. These parameters enable models to ‘learn’ patterns in language, including grammar, factual knowledge, reasoning strategies, and nuances in style and tone [4]. LLMs can also be trained to handle specialised subject-matter domains and topics.

Why Use Generative Artificial Intelligence and Large Language Models to Understand Animal Health Systems?

PVS reports [6] provide a comprehensive assessment of a country’s Animal Health Services, documenting their performance and capacity in detail. They include findings and recommendations designed to inform and support governments, investors and partners [6-8]. However, the sheer volume of text makes them difficult to transform, analyse and summarise effectively (Figure 1). As a result, most analyses have focused solely on the Levels of Advancement (LoA)⁽ᵃ⁾ assigned to each Critical Competency (CC)⁽ᵇ⁾ in the PVS Tool, which evaluates performance in line with WOAH’s international standards [9-11].

Figure 1. View of a Critical Competency chapter from a PVS Evaluation report, demonstrating the large amount of unstructured yet rich text data available but previously unexploitable

While LoA data are valuable and regularly used by partners such as the World Bank, World Health Organization, Pandemic Fund and donors to monitor performance, they capture only a fraction of the insights contained in PVS Reports. LoA scores can indicate the capacity of Animal Health Systems, but they are less sensitive to change over time due to the many variables factored into their determination.

PVS Programme staff hypothesised that quantitative text analysis could provide a more sensitive measure of progress by focusing on the recommendations made and the actions taken in response. To test this, the team began developing a global database compiling the most frequent recommendations, strengths and weaknesses of Animal Health Services from PVS Evaluations conducted at the request of national authorities.

Processing Text for Insight on Animal Health Systems

Using Microsoft (MS) Azure OpenAI, 25,656 recommendations, 17,234 strengths and 19,930 weaknesses were processed from each chapter of 236 PVS Evaluation Reports across 142 countries. These reports were written in English (n = 152), French (n = 52) and Spanish (n = 32).

First, each recommendation was summarised in a ‘key concept’. Statistical methods such as the Jaccard similarity⁽ᶜ⁾ and BERT semantic similarity⁽ᵈ⁾ were then applied to sort, cluster and categorise similar concepts [12,13]. These groups were then refined into ‘umbrella concepts’ (n = 2125), allowing trends to emerge regarding the type and frequency of recommendations made across CCs in both global and regional contexts (Figure 2). Finally, all raw text, key concepts and umbrella concepts were translated into the other two languages, removing previous limitations on cross-language global and regional analysis [14].
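The grouping step described above can be illustrated with a minimal sketch. The article does not publish its pipeline, so everything below is an assumption made for illustration: the function names (`tokens`, `jaccard`, `cluster_concepts`), the 0.5 threshold and the sample key concepts are hypothetical, and the greedy single-pass grouping stands in for whichever clustering method was actually used. Only the lexical (Jaccard) measure is shown; a semantic measure such as BERT-based similarity would compare sentence embeddings rather than word sets.

```python
def tokens(text: str) -> set[str]:
    """Lower-case a phrase and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    set_a, set_b = tokens(a), tokens(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty phrases: treat as no similarity
    return len(set_a & set_b) / len(union)


def cluster_concepts(concepts: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy grouping: attach each concept to the first cluster whose
    first member is similar enough, otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for concept in concepts:
        for cluster in clusters:
            if jaccard(concept, cluster[0]) >= threshold:
                cluster.append(concept)
                break
        else:
            clusters.append([concept])
    return clusters


# Hypothetical key concepts, invented for illustration only.
key_concepts = [
    "develop a continuing education programme",
    "develop a national continuing education programme",
    "review veterinary legislation",
    "review and update veterinary legislation",
]

for group in cluster_concepts(key_concepts):
    print(len(group), group)
```

With these sample phrases, the two education-related concepts fall into one group and the two legislation-related concepts into another. Word overlap alone would miss paraphrases that share no vocabulary, which is one reason a semantic measure is a useful complement to a lexical one.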

Figure 2. Most frequent recommendations appearing in PVS Evaluation Reports (n = 236 PVS Evaluation reports)

Lessons Learned Through Experimentation and Use of Large Language Models

While the initial application of an LLM to the PVS dataset produced outputs of publishable quality, several challenges led to the adoption of an in-house, PVS-specialised LLM approach. These included input size limitations (i.e. the context window⁽ᵉ⁾), the need for iterative testing and repeated prompt re-engineering, inconsistent responses across subject areas and languages, extensive manual intervention, and high resource demands for correcting results. Scaled over time and data volume, these demands made the outputs costly to produce. ‘Hallucinations’ (convincing but inaccurate content) further complicated the processing of the large PVS dataset.
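One common way to work within a context window is to split a long report chapter into pieces that each fit a token budget and to process them separately. The sketch below illustrates that idea only; it is not the method used on the PVS dataset. The function name `chunk_text`, the budget of 8 tokens and the use of whitespace-separated words as a stand-in for real model tokens are all assumptions (actual tokenisers count differently).

```python
def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Words are only a rough proxy for model tokens (assumption of this sketch).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# Hypothetical excerpt, invented for illustration only.
chapter = (
    "The Veterinary Services should develop a national plan for "
    "continuing education and allocate a dedicated budget to it "
    "so that field staff receive regular training"
)

for i, chunk in enumerate(chunk_text(chapter, max_tokens=8), start=1):
    print(i, chunk)
```

Fixed-size splitting is the simplest option but can cut a recommendation in half; splitting on sentence or section boundaries would usually preserve meaning better, at the cost of uneven chunk sizes.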

An evaluation of five open-source LLM base models identified Mistral 7B as the best performer compared to MS Azure OpenAI, based on cost–benefit–risk evaluation criteria suitable for intergovernmental use. Developing an in-house WOAH knowledge-based LLM enables specialisation that is currently not achievable with standard, off-the-shelf models. Because Mistral 7B has already been extensively pre-trained by its original developers on general language patterns drawn from enormous volumes of books, websites, articles and other publicly available sources, it can be advanced quickly through fine-tuning with PVS-specific and WOAH-specific knowledge.

This fine-tuning is currently underway, incorporating contextual knowledge about WOAH’s mandate, international standards, guidelines, cooperative frameworks and country-specific facts, among other sources. Developers will adjust the behaviour of WOAH’s PVS LLM by giving it more targeted data and instructions. This involves both supervised learning, where the model is trained on example questions and answers, and reinforcement learning, where it is rewarded for giving useful responses. Feedback from human reviewers will further align the model to WOAH’s requirements, improving accuracy and compliance with its international standards and PVS context.
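Supervised training data of the kind described above are often stored as question-and-answer pairs, one JSON object per line. The sketch below shows that general shape only; the field names (`prompt`, `response`), the helper `to_jsonl` and the sample pair are hypothetical and do not describe WOAH’s actual training data or tooling.

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialise (question, answer) pairs as JSON Lines, one example per line."""
    lines = [
        json.dumps({"prompt": question, "response": answer}, ensure_ascii=False)
        for question, answer in pairs
    ]
    return "\n".join(lines)


# Hypothetical example, paraphrasing definitions given in this article.
examples = [
    (
        "What does LoA stand for in the PVS Tool?",
        "Level of Advancement, a score from 1 to 5 assigned to each Critical Competency.",
    ),
]

print(to_jsonl(examples))
```

Keeping one example per line makes it straightforward for human reviewers to add, correct or remove individual pairs as the model is aligned with the Organisation’s requirements.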

Applications of Large Language Models in Building the Capacity of the Global Veterinary Workforce

A major benefit of applying LLMs within the PVS Pathway and continuing education is the ability to leverage specialised knowledge on WOAH and PVS to support core areas of the Organisation’s work. The WOAH PVS LLM will assist PVS Experts in drafting reports, summarising country progress across time, analysing past reports, developing surveys, extracting data from unstructured text and supporting data analysis.

New approaches are being developed to connect WOAH Members’ learning needs with the Organisation’s knowledge systems, with the WOAH PVS LLM playing a strategic role in this transformation. This innovation streamlines the conversion of unstructured data into actionable intelligence and expands the possibilities for using generative AI to support dynamic, learner-centred training across the entire learning cycle – from needs assessment to design, delivery and the evaluation of progress.

Recent literature highlights the potential of LLMs when used in competency mapping and curriculum design for veterinary and health education. LLM-integrated analytics have also proven effective in tracking learner engagement and informing training priorities [14,15]. By embedding LLMs into the Learning Needs Assessment methodology and the broader training cycle, WOAH can generate real-time, adaptive learning recommendations, enhance competency mapping and strengthen alignment with its Competency-Based Training Framework. Once consolidated, an in-house system will also assure the reliability and quality of AI-generated feedback, providing immediate and appropriate support to reinforce learning outcomes.

The Way Forward

As the WOAH PVS LLM pilot progresses, scaling and extending training to other areas of expertise within WOAH will maximise the impact of AI-driven insights and enhance efficiency in the Organisation’s core activities.

The opportunities presented by generative AI are a clear call to action, but they demand responsible and critical use, with strong safeguards for data confidentiality, validity and reliability. Building on the secure environment of the PVS Pathway Information System [6], fine-tuning the WOAH PVS LLM with high-quality data will create a powerful and unique WOAH knowledge ecosystem. This ecosystem will integrate information from WOAH and beyond, comprising PVS, eLearning resources, publications, international standards, guidelines and scientific literature.

The WOAH PVS LLM can help ensure that policy decisions are firmly grounded in data and science. Data-driven, evidence-based policies must be the north star to guide decision-makers, and data must support applied approaches with demonstrated and documented results. Armed with new insights, Veterinary Services will have more powerful tools for targeted action and investment, better-informed decision-making, improved implementation of WOAH’s international standards, enhanced advocacy capabilities and strengthened competencies to meet citizens’ needs for safe food, healthy animals and improved nutrition.

Main image: ©Krot Studio

DOI: https://doi.org/10.20506/woah.3640

Acknowledgements

The authors would like to acknowledge the contributions of the project team members, whose efforts since the project’s ideation have been integral. Listed in alphabetical order: Mario Ignacio Algüerno, Emmanuel Appiah, Maud Carron, François Caya, Oshin Dhand, Amy Hammond, Hasan Irtaza Mirza, Aminata Niang, Paul Nicoullaud, Roycelynne Reyes, Taimour Taj Shami, Valentyna Sharandak and Sania Zeb.

(a) The Level of Advancement (LoA) refers to the capacity observed and attributed to a country’s Animal Health Services by an independent expert, in a particular area (i.e. Critical Competency), represented as a semi-qualitative score: 1 = No or little capacity, 2 = Some capacity, 3 = Minimal capacity, 4 = Good capacity, 5 = Excellent capacity.

(b) The Critical Competency (CC) is a specific domain that is considered critical to the good functioning of the Animal Health Services in a country, according to WOAH’s international standards.

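(c) Jaccard similarity is a measure used to determine the similarity between two sets of data, in this case text data. It is calculated as the size of the intersection of the two sets divided by the size of their union, giving a value between 0 (no shared elements) and 1 (identical sets).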

(d) BERT semantic similarity is a measure used to determine how closely two sentences are related in meaning. BERT (Bidirectional Encoder Representations from Transformers) is a powerful transformer-based model that excels at capturing contextual relationships between words.

(e) ‘A context window… is the amount of text, in tokens, that the model can consider or “remember” at any one time. A larger context window enables an AI model to process longer inputs and incorporate a greater amount of information into each output.’ For more information, see https://www.ibm.com/think/topics/context-window.

References

[1] Kasneci E, Sessler K, Küchemann S, Bannert M, Dementieva D, Fischer F, et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 2023;103:102274. https://doi.org/10.1016/j.lindif.2023.102274

[2] World Organisation for Animal Health (WOAH). Home page. Paris (France): WOAH; 2025. Available at: https://www.woah.org/en/home (accessed on 14 August 2025).

[3] World Organisation for Animal Health (WOAH). PVS Pathway. Paris (France): WOAH; 2025. Available at: https://www.woah.org/en/what-we-offer/improving-veterinary-services/pvs-pathway (accessed on 14 August 2025).

[4] Massachusetts Institute of Technology (MIT) News. Explained: generative AI. Cambridge (United States of America): MIT; 2023. Available at: https://news.mit.edu/2023/explained-generative-ai-1109 (accessed on 14 August 2025).

[5] OpenAI. Research. San Francisco (United States of America): OpenAI, L.L.C.; 2025. Available at: https://openai.com/research (accessed on 14 August 2025).

[6] World Organisation for Animal Health (WOAH). Performance of Veterinary Services Information System (PVSIS): PVS Reports. Paris (France): WOAH; 2025. Available at: https://pvs.woah.org/documents (accessed on 14 August 2025).

[7] World Organisation for Animal Health. Strengthening Veterinary Services through the OIE PVS Pathway: the case for engagement and investment. Paris (France): World Organisation for Animal Health; 2019. 40 p. Available at: https://doc.woah.org/dyn/portal/index.xhtml?page=alo&aloId=41673 (accessed on 14 August 2025).

[8] Bastiaensen P, Abernethy D, Etter E. Assessing the extent and use of risk analysis methodologies in Africa, using data derived from the Performance of Veterinary Services (PVS) Pathway. Rev. Sci. Tech. 2017;36(1):163-74. https://doi.org/10.20506/rst.36.1.2619

[9] World Organisation for Animal Health (WOAH). Evaluation of the Performance of Veterinary Services: PVS Tool, Terrestrial 2019. 7th ed. Paris (France): WOAH; 2023. 67 p. https://doi.org/10.20506/PVS.3428

[10] World Organisation for Animal Health (WOAH). Codes and Manuals. Paris (France): WOAH; 2025. Available at: https://www.woah.org/en/what-we-do/standards/codes-and-manuals (accessed on 30 April 2025).

[11] Zhou C. Understanding Jaccard similarity: a powerful tool for data analysis [Internet]. Medium; 2023. Available at: https://medium.com/@conniezhou678/understanding-jaccard-similarity-a-powerful-tool-for-data-analysis-42abaaafd782 (accessed on 30 April 2025).

[12] Data604. Semantic textual similarity using BERT [Internet]. Medium; 2023. Available at: https://medium.com/@Mustafa77/semantic-textual-similarity-with-bert-e10355ed6afa (accessed on 30 April 2025).

[13] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv. 2013;1301.3781. https://doi.org/10.48550/arXiv.1301.3781

[14] Choudhary OP, Saini J, Challana A. ChatGPT for veterinary anatomy education: an overview of the prospects and drawbacks. Int. J. Morphol. 2023;41(4):1198-202. https://doi.org/10.4067/S0717-95022023000401198

[15] Al-Ismail MS, Naseralallah LM, Hussain TA, Stewart D, Alkhiyami D, Abu Rasheed HM, et al. Learning needs assessments in continuing professional development: a scoping review. Med. Teach. 2023;45(2):203-11. https://doi.org/10.1080/0142159X.2022.2126756
