The use of large language models for the annotation of discourse relations and its means of signalling: an empirical test

Authors

  • Juliano Desiderato Antonio, Universidade Estadual de Maringá

DOI:

https://doi.org/10.11606/issn.2176-9419.v28i1e-240890

Keywords:

RST, Large Language Models, Discourse relations, Automatic annotation, ChatGPT

Abstract

This study investigates the use of large language models (LLMs) to identify rhetorical relations and the signalling means that enable their recognition. Grounded in Rhetorical Structure Theory (RST), which conceives of text as a network of functional relations between discourse units, the research evaluates the potential of artificial intelligence tools for the automatic annotation of discourse corpora. The methodology consisted of submitting to ChatGPT ten excerpts from a spoken-language corpus that had been administered to university professors in a previous investigation, keeping the same questions about the identification of discourse relations and the cues that signal them. The results show that in seven of the ten cases ChatGPT produced analyses matching those of the previous study, using labels established in RST and justifying them with semantic, formal, and pragmatic signals. In the remaining three cases, the model's responses diverged from the earlier analyses but were theoretically plausible. We conclude that ChatGPT performs satisfactorily in identifying discourse relations and is a promising tool for the automatic annotation of corpora, provided that its analyses are validated by human experts.
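As a rough illustration of how a "seven out of ten" result can be expressed as an agreement rate, the sketch below computes simple percent agreement between a model's relation labels and reference labels. The helper `percent_agreement` and the placeholder labels are hypothetical; the study itself compares the analyses qualitatively, and this is not the author's procedure.

```python
def percent_agreement(model_labels: list[str], reference_labels: list[str]) -> float:
    """Return the share of excerpts whose model label matches the reference label.

    Hypothetical helper: labels are plain strings (e.g. RST relation names)
    compared by exact match, so near-synonymous labels get no partial credit.
    """
    if len(model_labels) != len(reference_labels):
        raise ValueError("both label lists must cover the same excerpts")
    if not reference_labels:
        raise ValueError("at least one excerpt is required")
    matches = sum(m == r for m, r in zip(model_labels, reference_labels))
    return matches / len(reference_labels)


# Placeholder labels only: ten excerpts, with the model matching the reference on seven.
reference = [f"relation_{i}" for i in range(10)]
model = reference[:7] + ["divergent"] * 3
print(f"{percent_agreement(model, reference):.0%}")  # prints 70%
```

Percent agreement does not correct for agreement expected by chance; with a larger annotated sample, a chance-corrected measure such as Cohen's kappa would usually be preferred.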


References

ANTONIO, J. D. Mecanismos utilizados pelos destinatários do discurso para identificação de relações de coerência não sinalizadas por conectores. DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada, v. 33, n. 1, p. 79-108, 2017.

ANTONIO, J. D.; SANTOS, J. A. A estrutura retórica do gênero resposta argumentativa. Signum: Estudos da Linguagem, v. 17, n. 2, p. 193-223, 2014.

BAREZ, F. et al. Chain-of-thought is not explainability. Preprint, alphaXiv, 2025. Available at: https://www.alphaxiv.org/overview/2025.02v1. Accessed: 18 Mar. 2026.

BRAGA, M. L. Processos de redução: o caso das orações de gerúndio. In: KOCH, I. G. V. (org.). Gramática do português falado: desenvolvimentos. Campinas: Ed. da Unicamp, 2002. v. 6, p. 239-258.

CARDOSO, P. C. F. et al. CSTNews: a discourse-annotated corpus for single and multi-document summarization in Brazilian Portuguese. NILC Technical Reports, University of São Paulo, 2011.

CARLSON, L.; MARCU, D. Discourse tagging reference manual. Los Angeles: University of Southern California, 2001.

CARLSON, L.; MARCU, D.; OKUROWSKI, M. E. RST Discourse Treebank. Philadelphia: Linguistic Data Consortium, 2001.

CORTES, E.; VIEIRA, R.; BARONE, D. Perguntas e respostas. In: CASELI, H. M.; NUNES, M. G. V. (org.). Processamento de linguagem natural: conceitos, técnicas e aplicações em português. 3. ed. São Carlos: BPLN, 2023. p. 416-439.

CUNHA, I. da; TORRES-MORENO, J.-M.; SIERRA, G. On the development of the RST Spanish treebank. In: LINGUISTIC ANNOTATION WORKSHOP, 5., 2011. Proceedings […]. Stroudsburg: Association for Computational Linguistics, 2011. p. 1-10.

DAS, D.; TABOADA, M. RST signalling corpus: a corpus of signals of coherence relations. Language Resources and Evaluation, Dordrecht, v. 52, n. 1, p. 149-184, 2018.

FÁVERO, L. L.; ANDRADE, M. L. C. V. O.; AQUINO, Z. G. O. O par dialógico pergunta–resposta. In: JUBRAN, C. C. A. S.; KOCH, I. G. V. (org.). Gramática do português culto falado no Brasil: construção do texto falado. Campinas: Ed. da Unicamp, 2006. v. 1, p. 133-166.

GÓMEZ-GONZÁLEZ, M. A.; TABOADA, M. Coherence relations in functional discourse grammar. In: MACKENZIE, J. L.; GÓMEZ-GONZÁLEZ, M. A. (ed.). Studies in functional discourse grammar. Berne: Peter Lang, 2005. p. 227-259.

GRIMES, J. The thread of discourse. The Hague: Mouton, 1975.

HILGERT, J. G. Parafraseamento. In: JUBRAN, C. C. A. S.; KOCH, I. G. V. (org.). Gramática do português culto falado no Brasil: construção do texto falado. Campinas: Ed. da Unicamp, 2006. v. 1, p. 255-273.

HOBBS, J. R. On the coherence and structure of discourse. Stanford: CSLI, 1985. (Report n. 35-87).

IRUSKIETA, M. et al. The RST Basque TreeBank: an online search interface to check rhetorical relations. In: RST AND DISCOURSE STUDIES WORKSHOP, 4., 2013. Proceedings […]. s.l.: s.n., 2013. p. 40-49.

JUBRAN, C. C. A. S. Parentetização. In: JUBRAN, C. C. A. S.; KOCH, I. G. V. (org.). Gramática do português culto falado no Brasil: construção do texto falado. Campinas: Ed. da Unicamp, 2006. v. 1, p. 301-357.

MANN, W. C.; THOMPSON, S. A. Relational propositions in discourse. Marina del Rey: ISI, 1983. (ISI/RR-83-115).

MANN, W. C.; THOMPSON, S. A. Rhetorical structure theory: toward a functional theory of text organization. Text, v. 8, n. 3, p. 243-281, 1988.

MANN, W. C.; MATTHIESSEN, C. M. I. M.; THOMPSON, S. A. Rhetorical structure theory and text analysis. In: MANN, W. C.; THOMPSON, S. A. (ed.). Discourse description: diverse linguistic analyses of a fund-raising text. Amsterdam: John Benjamins, 1992. p. 39-77.

MARCUSCHI, L. A. Repetição. In: JUBRAN, C. C. A. S.; KOCH, I. G. V. (org.). Gramática do português culto falado no Brasil: construção do texto falado. Campinas: Ed. da Unicamp, 2006. v. 1, p. 219-254.

MATTHIESSEN, C.; THOMPSON, S. The structure of discourse and ‘subordination’. In: HAIMAN, J.; THOMPSON, S. (ed.). Clause combining in grammar and discourse. Amsterdam: John Benjamins, 1988. p. 275-329.

NEVES, M. H. M. Gramática de usos do português. São Paulo: Ed. da Unesp, 2000.

PAES, A.; VIANNA, D.; RODRIGUES, J. Modelos de linguagem. In: CASELI, H. M.; NUNES, M. G. V. (org.). Processamento de linguagem natural: conceitos, técnicas e aplicações em português. 3. ed. São Carlos: BPLN, 2023. p. 385-414.

REDEKER, G. et al. Multi-layer discourse annotation of a Dutch text corpus. Paris: ELRA, 2012.

SANDERS, T. J. M.; SPOOREN, W. P. M.; NOORDMAN, L. G. M. Toward a taxonomy of coherence relations. Discourse Processes, v. 15, n. 1, p. 1-35, 1992.

SHAHMOHAMMADI, M. et al. PrunedRST: a large-scale RST treebank for Persian with an optimized annotation scheme. arXiv, Ithaca, 2021. Preprint. Available at: https://arxiv.org/abs/2102.03003. Accessed: 18 Mar. 2026.

SOUZA, J. W. C.; CARDOSO, P. C. F.; RODRIGUES, R. Systematic review of studies on rhetorical structure theory (RST). Revista de Estudos da Linguagem, v. 31, n. 3, p. 1643-1675, 2024.

STEDE, M.; NEUMANN, A. Potsdam Commentary Corpus 2.0: annotation for discourse research. In: LANGUAGE RESOURCES AND EVALUATION CONFERENCE, 9., 2014. Proceedings […]. Paris: ELRA, 2014. p. 925-929.

TABOADA, M. Discourse markers as signals (or not) of rhetorical relations. Journal of Pragmatics, v. 38, n. 4, p. 567-592, 2006.

TABOADA, M. Implicit and explicit coherence relations. In: RENKEMA, J. (ed.). Discourse, of course. Amsterdam: John Benjamins, 2009. p. 127-140.

TABOADA, M.; DAS, D. Annotation upon annotation: adding signalling information to a corpus of discourse relations. Dialogue & Discourse, v. 4, n. 2, p. 249-281, 2013.

TOLDOVA, S. et al. Rhetorical relations markers in Russian RST Treebank. In: RECENT ADVANCES IN RST AND RELATED FORMALISMS, 6., 2017. Proceedings […]. s.l.: s.n., 2017. p. 29-33.

TÖRNBERG, P. Best practices for text annotation with large language models. arXiv, 2024. Available at: https://arxiv.org/abs/2402.05129. Accessed: 18 Mar. 2026.

ZELDES, A. rstWeb: a browser-based annotation interface for Rhetorical Structure Theory and discourse relations. In: NAACL-HLT 2016. Proceedings […]. San Diego, 2016. p. 1-5.

ZELDES, A. The GUM corpus: creating multilayer resources in the classroom. Language Resources and Evaluation, v. 51, n. 3, p. 581-612, 2017.

Published

2026-05-08

Issue

v. 28 n. 1

Section

Papers

How to Cite

Antonio, J. D. (2026). The use of large language models for the annotation of discourse relations and its means of signalling: an empirical test. Filologia e Linguística Portuguesa, 28(1), e-240890. https://doi.org/10.11606/issn.2176-9419.v28i1e-240890