Mitigação de viés de datasets multimodais em um classificador de categorias urbano-sociais [Bias mitigation of multimodal datasets in an urban-social category classifier]

Luciano C. Lugli; Daniel Abujabra Merege; Rafael Pillon Almeida

doi:10.1590/s0103-4014.202438111.019

Authors

Luciano C. Lugli, Universidade de São Paulo, Escola de Engenharia de São Carlos, Daoura Research, São Paulo, Brasil. ORCID: https://orcid.org/0000-0002-9065-9639
Daniel Abujabra Merege, Instituto de Pesquisas Tecnológicas do Estado de São Paulo, Daoura Research, São Paulo, Brasil. ORCID: https://orcid.org/0000-0002-9232-9270
Rafael Pillon Almeida, Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Daoura Research, São Paulo, Brasil. ORCID: https://orcid.org/0000-0003-0558-276X

Keywords:

Bias mitigation, Social sensing, Transformers, NLP text analysis, Text classification

Abstract

This research draws on the relational implications of sociomoral development in Piaget’s psychogenetic theory for the cognitive construction of ethics in personal biases, and on references to discursive dialectics in linguistics. Functional training and testing data were parameterized in an urban-social category classifier that takes a textual analysis approach, using Natural Language Processing (NLP) and an adapted Transformer attention mechanism. From this perspective, a bias mitigation methodology was developed to restructure the convergence criteria, under which the multimodal datasets were retrained, retested, and reevaluated. Finally, the heterogeneity of common collective human ethics was verified and validated against interpretive inferences, insights, and real social trends, in which the city/citizen relation serves as “social sensing” for identifying public-social problems.

Author Biographies

Luciano C. Lugli, Universidade de São Paulo, Escola de Engenharia de São Carlos, Daoura Research, São Paulo, Brasil

holds a bachelor's degree in Computer Engineering (2008), a master's degree in Mechanical/Mechatronics Engineering (2011), and a doctorate in Mechanical/Mechatronics Engineering (2016) from the Escola de Engenharia de São Carlos, Universidade de São Paulo. Senior Data Engineer (since 2021) at Daoura Research, São Paulo, SP, Brazil.

Daniel Abujabra Merege, Instituto de Pesquisas Tecnológicas do Estado de São Paulo, Daoura Research, São Paulo, Brasil

holds a bachelor's degree in Information Systems from the Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (2010), and a master's degree in Computer Engineering from the Instituto de Pesquisas Tecnológicas do Estado de São Paulo (2016). Co-founder and CEO (since 2016) of Daoura Research, São Paulo, SP, Brazil.

Rafael Pillon Almeida, Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Daoura Research, São Paulo, Brasil

holds a bachelor's degree in Computer Science from the Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo (2012). Head of Technology (since 2018) at Daoura Research, São Paulo, SP, Brazil.
