Department of Computational Linguistics

Text Type-Sensitive Controllable Text Simplification with LLMs

Introduction

Controllable Text Generation (CTG) refers to the ability to guide or influence the output of a language model to meet specific requirements or constraints. When integrated with Automatic Text Simplification (ATS), these constraints typically include compression ratio (the length of the simplified text divided by the length of the original text), lexical complexity, semantic similarity, and syntactic richness. By utilizing parallel complex-simple text pairs, ATS systems can be trained to generate simplifications that adhere to these specified requirements.
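As a minimal sketch of the control attributes mentioned above, the snippet below computes the compression ratio for a complex-simple pair and prepends it as a control token, in the spirit of ACCESS-style control tokens (Martin et al., 2020). The token format `<RATIO_x.xx>` is an illustrative assumption, not the exact scheme used in any cited system.

```python
# Sketch: deriving a control attribute from a complex-simple pair.
# The "<RATIO_x.xx>" token format is a hypothetical illustration.

def compression_ratio(complex_text: str, simple_text: str) -> float:
    """Length of the simplification divided by the length of the
    original, measured here in whitespace tokens."""
    src, tgt = complex_text.split(), simple_text.split()
    return len(tgt) / len(src) if src else 0.0

def control_prefix(complex_text: str, simple_text: str) -> str:
    """Prepend a control token encoding the observed compression ratio,
    which a controllable ATS model could condition on during training."""
    ratio = compression_ratio(complex_text, simple_text)
    return f"<RATIO_{ratio:.2f}> {complex_text}"

pair = ("The committee deliberated extensively before reaching a verdict.",
        "The committee decided after long talks.")
print(compression_ratio(*pair))   # 6 tokens / 8 tokens = 0.75
print(control_prefix(*pair))
```

At inference time, such a token lets the user request a target length by choosing the ratio value rather than measuring it from a reference.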

Problem Statement

Despite the advancements made in controllable ATS systems, an important aspect has been overlooked: the text type. Texts of different types should be simplified differently because of their distinct conventions and levels of formality. For instance, legal and scientific texts have lexical preferences and syntactic structures that differ significantly from those of news articles.

Challenges

A significant challenge in controllable ATS systems is that they are typically trained to simplify text of a single type. This approach leads to two main issues:

  1. The systems are not sensitive to variations across different text types, and 
  2. They perform poorly when applied to text from types other than the one they were trained on. 

Consequently, multiple distinct ATS systems must be developed, which complicates access to simplified information. This project aims to address these limitations by developing a unified ATS system that is responsive to various input text types, thereby enhancing accessibility to diverse textual information.

Research Questions

The primary objective of this project is to develop a type-sensitive ATS system that can effectively distinguish between different text types and generate type-specific text simplifications. In pursuit of this goal, the project will explore several key research questions:

  • Text Type Modeling: How can we effectively model text type features to enhance the sensitivity of ATS systems? Should we perform statistical analysis directly on complex-simple pairs, or should we utilize a reference corpus?
  • Evaluation Metrics: Which evaluation metrics are most appropriate for assessing type-specific text simplifications?
  • Alignment Analysis: Do the automatic (sentence or paragraph) alignments that feed into the ATS system exhibit characteristics that are specific to particular text types?
  • Feature Generalization: Once we can model the features of an unseen text type, can we guide the ATS system to generate simplifications for this unseen type without supervised fine-tuning?
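One way to approach the text type modeling question is with simple surface statistics computed directly from the texts. The feature set below (average sentence length, type-token ratio, average word length) is an illustrative assumption, not a proposal for the final feature inventory:

```python
# Sketch: simple surface statistics as candidate text-type features.
# The chosen features are illustrative, not an exhaustive design.
import re

def type_features(text: str) -> dict:
    """Compute a few surface-level statistics that tend to differ
    across text types (e.g. legal vs. news)."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return {
        "avg_sentence_len": len(tokens) / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "avg_word_len": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
    }

legal = "The lessee shall indemnify the lessor. Liability accrues upon breach."
news = "The renter must cover damages. They pay if rules are broken."
print(type_features(legal))
print(type_features(news))
```

Even on these toy sentences, the legal fragment shows a higher average word length than the news paraphrase, hinting at the kind of signal a type-sensitive system could exploit.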

By addressing these questions, this project aims to advance the capabilities of current ATS systems and open new avenues for enhancing information accessibility.

Project Objectives

The project comprises the following objectives:

  • Survey the literature on controllable ATS systems, compare their methodologies and application scenarios to understand the task, and suggest appropriate datasets for performance testing.
  • Perform extensive experiments, analyze the results, and conduct ablation studies.
  • Propose evaluation approaches for type-specific text simplifications.
  • Write a thesis and potentially contribute to a publication.

Requirements

This project requires knowledge of:

  • Generative models in NLP
  • Evaluation of machine learning models
  • Deep learning with PyTorch and Huggingface (experience with multithreaded parallel computing is a plus)

Contact

We are looking for highly motivated students majoring in computer science, data science, mathematics, or electrical engineering. Please send your CV and transcript to yingqiang.gao@uzh.ch (cc Prof. Dr. Sarah Ebling, ebling@cl.uzh.ch). This project will be co-supervised by Dr. Nianlong Gu (nianlong.gu@uzh.ch) at the Linguistics Research Infrastructure (LiRI) of the University of Zurich.

References

  1. Martin, Louis, et al. "Controllable Sentence Simplification." Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020.
  2. Maddela, Mounica, Fernando Alva-Manchego, and Wei Xu. "Controllable Text Simplification with Explicit Paraphrasing." Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
  3. Yang, Kevin, and Dan Klein. "FUDGE: Controlled Text Generation With Future Discriminators." Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
  4. Scarton, Carolina, and Lucia Specia. "Learning simplifications for specific target audiences." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.
  5. Cemri, Mert, Tolga Çukur, and Aykut Koç. "Unsupervised simplification of legal texts." arXiv preprint arXiv:2209.00557 (2022).
  6. Garimella, Aparna, et al. "Text Simplification for Legal Domain: Insights and Challenges." Proceedings of the Natural Legal Language Processing Workshop 2022 (2022): 296-304.
  7. Justo, Jenel M., and Reginald Neil C. Recario. "Text Simplification System for Legal Contract Review." Future of Information and Communication Conference. Cham: Springer Nature Switzerland, 2024.
  8. Engelmann, Björn, et al. "Text Simplification of Scientific Texts for Non-Expert Readers." (2023).
  9. Devaraj, Ashwin, et al. "Paragraph-level Simplification of Medical Texts." Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
  10. Kim, Yea-Seul, et al. "SimpleScience: Lexical simplification of scientific terminology." Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.
  11. Vásquez-Rodríguez, Laura, et al. "Investigating Text Simplification Evaluation." Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021.