Bias mitigation of multimodal datasets in an urban-social category classifier

Luciano C. Lugli; Daniel Abujabra Merege; Rafael Pillon Almeida

doi:10.1590/s0103-4014.202438111.019

Authors

Luciano C. Lugli, Universidade de São Paulo, Escola de Engenharia de São Carlos, Daoura Research, São Paulo, Brazil. https://orcid.org/0000-0002-9065-9639
Daniel Abujabra Merege, Instituto de Pesquisas Tecnológicas do Estado de São Paulo, Daoura Research, São Paulo, Brazil. https://orcid.org/0000-0002-9232-9270
Rafael Pillon Almeida, Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Daoura Research, São Paulo, Brazil. https://orcid.org/0000-0003-0558-276X


Keywords:

Bias mitigation, Social sensing, Transformers, NLP text analysis, Text classification

Abstract

This project examines the relational implications of sociomoral development, as described in Piaget's psychogenetic theory, for the cognitive construction of ethics in personal biases, together with frames of reference drawn from discursive dialectics in linguistics. Functional training and test data were parameterized for an urban-social category classifier, using a textual analysis approach based on Natural Language Processing (NLP) and on the adapted attention mechanism of Transformers.
From this perspective, a bias mitigation methodology was developed to restructure the screening and criteria under which multimodal datasets are retrained, retested, and re-evaluated. Finally, the heterogeneity of common collective human ethics was verified and validated with respect to interpretive inferences, insights, and real social trends, in which the city/citizen relationship draws on "social sensing" to identify public-social problems.


Author Biographies

Luciano C. Lugli, Universidade de São Paulo, Escola de Engenharia de São Carlos, Daoura Research, São Paulo, Brazil

holds a bachelor's degree in Computer Engineering (2008), a master's degree in Mechanical/Mechatronics Engineering (2011), and a doctorate in Mechanical/Mechatronics Engineering (2016) from the Escola de Engenharia de São Carlos of the Universidade de São Paulo. Senior Data Engineer (since 2021) at Daoura Research, São Paulo, SP, Brazil.

Daniel Abujabra Merege, Instituto de Pesquisas Tecnológicas do Estado de São Paulo, Daoura Research, São Paulo, Brazil

holds a bachelor's degree in Information Systems from the Escola de Artes, Ciências e Humanidades of the Universidade de São Paulo (2010) and a master's degree in Computer Engineering from the Instituto de Pesquisas Tecnológicas do Estado de São Paulo (2016). Co-founder and CEO (since 2016) of Daoura Research, São Paulo, SP, Brazil.

Rafael Pillon Almeida, Universidade de São Paulo, Instituto de Ciências Matemáticas e de Computação, Daoura Research, São Paulo, Brazil

holds a bachelor's degree in Computer Science from the Instituto de Ciências Matemáticas e de Computação of the Universidade de São Paulo (2012). Head of Technology (since 2018) at Daoura Research, São Paulo, SP, Brazil.

