
Research

Developing the next generation of AI inference built on Large Language Models

Core Research Themes

 

Building NLP models capable of complex and formal inference. How can we develop AI models that deliver complex, expert-level (e.g. scientific) reasoning and explanations? In this area, we focus on building models capable of encoding complex and abstract inference, with a particular emphasis on scientific reasoning and discovery. Our contributions strike a balance between the flexibility provided by contemporary Large Language Models (LLMs) and the explicit inference controls delivered by symbolic methods. This includes exploring the complementary properties of LLMs and symbolic inference paradigms (mathematical and logic solvers) to support more rigorous and controlled reasoning in problem spaces such as ethical reasoning, reasoning in Mathematics and Physics, and Oncology.
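The interplay between LLM flexibility and symbolic control can be made concrete with a small generate-and-verify loop: a language model proposes a candidate answer and a symbolic solver accepts or rejects it. The sketch below is illustrative only; llm_propose is a hypothetical stand-in for an LLM call, and SymPy is used as a generic mathematical solver rather than the group's actual toolchain.

```python
# Minimal generate-and-verify sketch: an LLM proposes, a symbolic solver checks.
import sympy as sp

def llm_propose(question: str) -> str:
    """Hypothetical stand-in for an LLM call returning a candidate solution."""
    return "3"  # e.g. the model answers "x = 3" for "Solve 2*x + 1 = 7"

def symbolic_verify(equation: str, symbol: str, candidate: str) -> bool:
    """Accept or reject the candidate with a symbolic solver (SymPy here)."""
    x = sp.Symbol(symbol)
    lhs, rhs = equation.split("=")
    solutions = sp.solveset(sp.Eq(sp.sympify(lhs), sp.sympify(rhs)), x)
    return sp.sympify(candidate) in solutions

candidate = llm_propose("Solve 2*x + 1 = 7")
print("accepted" if symbolic_verify("2*x + 1 = 7", "x", candidate) else "rejected")
```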

 

Explainable & controlled inference over LLMs. How can we develop inference models that balance flexibility, rigour and transparency? We have extended the capabilities of language models to encode and reason over complex conceptual representations, in a more controlled and explainable manner, using variational auto-encoders and hyperbolic embeddings in conjunction with language models. These models allow for the construction of latent spaces which are better localised, disentangled and separated, enabling a research programme around the 'geometrization of inference' within latent spaces. This paradigm has been extended to explanatory abductive inference, where scientific explanations and inference steps are encoded within the same geometrical framework. Additionally, we have investigated new methods to integrate symbolic inference constraints into neural natural language inference (NLI) models. Our contributions to this area include the analysis and improvement of abstract logical feature modelling within neural NLI models. To establish the internal linguistic and inference consistency properties of neural NLI models, we are also exploring probing methods for NLI.
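As a rough illustration of the latent-space view behind the 'geometrization of inference', the sketch below encodes sentence embeddings into a Gaussian latent space with a VAE-style encoder and treats an inference-like operation as simple vector geometry (here, interpolation). Module names, dimensions and the use of PyTorch are assumptions made for the example, not the group's architecture.

```python
# VAE-style encoder over sentence embeddings; inference-like operations
# become geometric operations (e.g. interpolation) in the latent space.
import torch
import torch.nn as nn

class SentenceVAEEncoder(nn.Module):
    def __init__(self, input_dim: int = 768, latent_dim: int = 32):
        super().__init__()
        self.mu = nn.Linear(input_dim, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(input_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, sentence_embedding: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.mu(sentence_embedding), self.logvar(sentence_embedding)
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)  # reparameterisation trick

encoder = SentenceVAEEncoder()
premise, hypothesis = torch.randn(1, 768), torch.randn(1, 768)  # stand-ins for LM embeddings
z_p, z_h = encoder(premise), encoder(hypothesis)
midpoint = 0.5 * (z_p + z_h)  # a simple geometric operation between encodings
print(midpoint.shape)         # torch.Size([1, 32])
```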

 

Mathematical Language Processing. How can we develop NLP models that represent and reason over mathematical text? A significant part of scientific discourse is expressed as mathematics. We are pioneering new AI methodologies for automated inference over mathematical text, where models need to encode both natural language and mathematical expressions. Our research has established new modelling mechanisms and evaluation methods for mathematical inference, including methods which assess the ability of language models to interpret mathematical expressions and derivations. Our investigation has extended to the domain of Physics, where we developed the first corpus for natural language inference over Physics texts.
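One ingredient of evaluating mathematical inference is checking whether a derivation step asserted in text actually holds between the expressions it mentions. The sketch below shows this kind of check with SymPy on a made-up example; it is not drawn from the group's corpora or evaluation protocols.

```python
# Checking whether a textual derivation step is algebraically valid.
import sympy as sp

def step_is_valid(before: str, after: str) -> bool:
    """True if the rewritten expression is algebraically equivalent to the original."""
    return sp.simplify(sp.sympify(before) - sp.sympify(after)) == 0

# Claim in the text: "Expanding (x + 1)**2 gives x**2 + 2*x + 1"
print(step_is_valid("(x + 1)**2", "x**2 + 2*x + 1"))  # True: the derivation step holds
print(step_is_valid("(x + 1)**2", "x**2 + 1"))        # False: the step does not follow
```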


NLP in Oncology (Biomarker Discovery and Clinical Trials Management). We are working in close collaboration with major cancer research centres in the UK (CRUK Manchester Institute) and Europe (Karolinska Institutet, VHIO) on the development of NLP models specialised for the oncology domain. Areas of collaboration include the development of NLP models to support treatment recommendation, toxicity prediction and the development of complex multi-omics biomarkers. Current projects include the use of natural language inference to support cancer clinical trials (CCTs) design and management, multi-omics biomarker integration supported by deep learning architectures, and LLMs to support scientific discovery in oncology. Recently, we developed a meta-analysis-informed ML model to predict Cytokine Release Syndrome (CRS) events in the context of CAR-T cell therapies, conducted a systematic analysis of biologically-informed deep learning models, and participated in discussions on the role of AI in supporting scientific discovery and disease management in oncology (in collaboration with EORTC and ESMO).
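For the multi-omics integration strand, a common pattern is late fusion: each omics modality gets its own small encoder and the fused representation feeds a prediction head. The PyTorch sketch below illustrates that pattern only; the modalities, dimensions and architecture are assumptions for the example and do not reproduce the group's biomarker models.

```python
# Late-fusion sketch for multi-omics integration (illustrative only).
import torch
import torch.nn as nn

class MultiOmicsFusion(nn.Module):
    def __init__(self, expr_dim: int = 1000, mut_dim: int = 300, hidden: int = 64):
        super().__init__()
        self.expr_encoder = nn.Sequential(nn.Linear(expr_dim, hidden), nn.ReLU())
        self.mut_encoder = nn.Sequential(nn.Linear(mut_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)  # e.g. a biomarker / response score

    def forward(self, expression: torch.Tensor, mutations: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.expr_encoder(expression), self.mut_encoder(mutations)], dim=-1)
        return torch.sigmoid(self.head(fused))

model = MultiOmicsFusion()
expression = torch.randn(8, 1000)  # synthetic gene-expression batch
mutations = torch.randn(8, 300)    # synthetic mutation-profile batch
print(model(expression, mutations).shape)  # torch.Size([8, 1])
```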


Recent Applications & Industrial Collaborations

 

The group works in close collaboration with industrial partners to support the adaptation and technology transfer of state-of-the-art NLP into applied settings. Recent collaborations include:

 

Antibiotic discovery. Natural language inference over LLMs to support the discovery of new antibiotic substances (in collaboration with Inflamalps, funded by the Ark).


Biomarker discovery. Development of explainable ML and transformer-based models to support the discovery of new biomarkers in oncology, and of complex LLM-based pipelines which integrate with specialised databases and analytical tools in oncology (in collaboration with Cancer Core Europe, funded by the European Union).


Cancer clinical trials management. Development of specialised LLM-based systems to support clinical trials pipelines, including clinical decision support systems for molecular tumour boards, patient-clinical trial matching, mapping of protocol deviations and evidence-based clinical trial design (in collaboration with The Christie NHS Foundation, funded by CRUK and UpSMART).
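As a rough sketch of how patient-clinical trial matching can be framed as natural language inference, the snippet below scores a synthetic patient summary against a few eligibility criteria using an off-the-shelf NLI model through the Hugging Face zero-shot pipeline (facebook/bart-large-mnli). This is an illustrative stand-in with fabricated data, not the deployed clinical system.

```python
# Eligibility screening framed as NLI: patient summary = premise,
# each trial criterion = hypothesis, scored independently.
from transformers import pipeline

matcher = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

patient_summary = "62-year-old patient with stage III non-small cell lung cancer, ECOG performance status 1."
criteria = [
    "The patient has non-small cell lung cancer.",
    "The patient has an ECOG performance status of 0 or 1.",
    "The patient is under 50 years old.",
]

# hypothesis_template="{}" makes each criterion the hypothesis verbatim.
result = matcher(patient_summary, candidate_labels=criteria,
                 hypothesis_template="{}", multi_label=True)
for criterion, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {criterion}")
```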

 

Social analytics & fact-checking. Controlling LLMs to understand social trends and perform complex fact checking (in collaboration with Bloom, funded by Innosuisse).


LLMs to support journalistic analyses. Assessing the capabilities of LLMs to deliver complex fact checking and facilitate journalistic discovery (funded by EPSRC iCASE).


ESG analysis. Automating the development of Environmental, Social and Governance (ESG) reports with the support of LLMs (in collaboration with Ascentys).


Financial risk analysis. Use of controlled inference over LLMs to support more comprehensive risk-analysis models (in collaboration with Basinghall, funded by Innosuisse).


Explainable QA for intelligence analysis. Development of an explainable QA platform which queries over integrated multimodal data (text, tables and graphs) (in collaboration with Fujitsu).
