Information, Data and Technology

Organizers

Guilherme Ataíde Dias

Federal University of Paraíba (UFPB) | guilhermeataide@ccsa.ufpb.br | https://orcid.org/0000-0001-6576-0017 | https://lattes.cnpq.br/9553707435669429

Bachelor's degree in Computer Science from the Federal University of Paraíba (UFPB), Campus II (1990); Bachelor of Laws from the University Center of João Pessoa (UNIPE) (2010); Master's degree in Organization & Management from Central Connecticut State University (CCSU) (1995); PhD in Information Science (Communication Sciences) from the University of São Paulo (USP) (2003); and postdoctoral studies at UNESP (2011). He is currently Associate Professor III at the Federal University of Paraíba, where he holds a degree in Information Science. He is involved in graduate education through the Graduate Program in Information Science and the Graduate Program in Administration, both at UFPB. His research interests include: Knowledge Representation; Information Architecture; Information Security; Information and Communication Technologies; Health Information; Social Networks; Free Software; Law, Ethics and Intellectual Property in Cyberspace; Scientific Data Management; and Legal Information. He is currently a Research Productivity Fellow (PQ) at CNPq.

Moisés Lima Dutra

Federal University of Santa Catarina (UFSC) | moises.dutra@ufsc.br | https://orcid.org/0000-0003-1000-5553 | https://lattes.cnpq.br/1973469817655034

Professor in the Department of Information Science at the Federal University of Santa Catarina. PhD in Computing from the University of Lyon 1, France (2009). Master's degree in Electrical Engineering, subarea Automation and Systems (2005), and Bachelor's degree in Computing (1998), both from the Federal University of Santa Catarina. His current lines of research are related to Applied Artificial Intelligence (Machine Learning, Deep Learning, Semantic Web, Linked Data) and Data Science (Big Data, IoT). He is a member of the research group ITI-RG (Intelligence, Technology and Information - Research Group).

Fábio Mosso Moreira

São Paulo State University (UNESP) | fabio.moreira@unesp.br | https://orcid.org/0000-0002-9582-4218 | https://lattes.cnpq.br/1614493890723021

Bachelor's degree in Business Administration from the Faculty of Sciences and Engineering (UNESP / Tupã). Master's degree in Information Science (UNESP / Marília). PhD student in the Graduate Program in Information Science (UNESP / Marília). Member of the research groups GPNTI (UNESP / Marília) and GPTAD (UNESP / Tupã). Collaborator on the project Digital Skills for Family Farming (CoDAF). Content editor of the Electronic Journal Digital Skills for Family Farming (RECoDAF). Technical qualification in Informatics from ETEC Massuyuki Kawano - Centro Paula Souza de Tupã. Professional experience with ERP information systems for logistics operations. His research in Information Science studies the use of digital resources for accessing government data on public policies in the context of small farmers.

Fernando de Assis Rodrigues

Federal University of Pará (UFPA) | fernando@rodrigues.pro.br | https://orcid.org/0000-0001-9634-1202 | https://lattes.cnpq.br/5556499513805582

Professor at the Federal University of Pará. Ph.D. and M.S. in Information Science, Post-bachelor in Internet Systems and Bachelor of Science in Information Systems. Most of his experience comes from work as a full-stack developer and database administrator, especially with the Python, Java and PHP programming languages and the MySQL, MariaDB, SQLite3 and PostgreSQL databases. He has also taught Computer Science courses to undergraduate and graduate students at UNESP. He currently works as a postdoctoral researcher at UNESP labs, working in data studies.

Ricardo César Gonçalves Sant'Ana

São Paulo State University (UNESP) | ricardo.santana@unesp.br | https://orcid.org/0000-0003-1387-4519 | https://lattes.cnpq.br/1022660730972320

Associate Professor at São Paulo State University (UNESP), Faculty of Sciences and Engineering (FCE), Tupã campus, on a full-time exclusive basis, where he is Chairman of the Monitoring and Evaluation Committee of the Graduate Courses (CAACG), Local Coordinator of the Center for Studies and Pedagogical Practices (CENEPP) and Local Ombudsman. Professor in the Graduate Program in Information Science of São Paulo State University, Marília campus. He holds degrees in Mathematics and Pedagogy, a Master's degree in Information Science (2002), a Doctorate in Information Science (2008) and a Livre-Docência (habilitation) in Management Information Systems from UNESP (2017). He has specializations in Object Orientation (1996) and Management of Information Systems (1998). Ad hoc reviewer for journals and funding agencies. Member of the research group New Technologies in Information (GPNTI-UNESP). He has experience in Computer Science and currently conducts research in information science and information technology, investigating issues related to the Data Life Cycle, Transparency and Information Flow in Productive Chains. He worked as a professor at Faccat Faculdade de Ciências Contábeis e Administração de Tupã, where he coordinated the Administration course with Qualification in Systems Analysis for ten years, as well as the teaching degree (Licenciatura) in Computing. He worked in the private sector as a consultant, integrator and researcher of new information technologies from 1988 to 2004.


Named entity recognition in financial intelligence reports

Pages: 291 - 302

Authors

Jairo Santana

Federal University of Santa Catarina (UFSC) | jairo.santana@gmail.com

Diefferson K. Moro

Federal University of Santa Catarina (UFSC) | differson.moro@gmail.com

Bachelor's degree in Information and Communication Technologies from UFSC.

Rogério de Aquino Silva

Federal University of Santa Catarina (UFSC) | rogerriomp@gmail.com

Vinicius Faria Culmant Ramos

Federal University of Santa Catarina (UFSC) | v.ramos@ufsc.br | https://orcid.org/0000-0002-8319-743X | https://lattes.cnpq.br/0442142220296336

Professor at the Federal University of Santa Catarina (UFSC), Araranguá campus. He holds a Bachelor's degree in Computer Science from UFRJ, a Master's degree in Systems and Computer Engineering from COPPE / UFRJ and a Doctorate in Systems and Computer Engineering under a cotutelle agreement between COPPE / UFRJ and the Eindhoven University of Technology (Netherlands). He currently works on the research and development of methodologies and technological tools for processing and analyzing large amounts of data (Big Data) in social networks. His research also focuses on the teaching of computer programming and on the development of constructivist learning environments, both in-person and distance, using new digital information and communication technologies. He works mainly on the following subjects: educational technology, distance education, new Web technologies, adaptive systems, evaluation of adaptive systems, programming teaching, and big data processing and analysis.

Gustavo Medeiros de Araujo

Federal University of Santa Catarina (UFSC) | gustavo.araujo@ufsc.br | https://orcid.org/0000-0003-0572-6997 | https://lattes.cnpq.br/2609254559240670

PhD in Automation and Systems Engineering from UFSC (2013) and Master's degree in Computer Science from UFSC (2007). He has experience in Computer Science and Automation, with emphasis on Data Science, Machine and Deep Learning and Cyber-Physical Systems. He has two lines of research: i) applications of Data Science, Machine and Deep Learning and ii) Wireless Sensor Networks (WSNs) and MANET protocols. In addition to his academic background, he has experience in the software industry, developing information systems for the federal government and systems for the automation industry. He is currently Associate Professor A2 at the Federal University of Santa Catarina, a member of the Laboratory of Software and Hardware Integration (LISHA) and a member of the Laboratory of Engineering and Data Science (LECID).

Abstract

Named entity recognition is a subarea of natural language processing, text mining, and machine learning. These areas belong to the broader field of artificial intelligence, which is widely applied to practical everyday problems. One of the responsibilities of the Brazilian Federal Police is to investigate financial crimes, especially money laundering. Among the problems encountered in police investigations, we highlight the analysis of Financial Intelligence Reports (RIF, from the Portuguese Relatório de Inteligência Financeira), which are written in Brazilian Portuguese and generated by the Council for Financial Activities Control (Conselho de Controle de Atividades Financeiras). The aim of this analysis is to identify the actors involved in money laundering schemes. Depending on the complexity of the scheme, however, identifying these actors and their relationships in a report (business partnerships, kinship, straw men known as "laranjas", "ghost" companies, etc.) may demand significant time from the officer involved in the investigation. This paper therefore presents initial results on automating the recognition of named entities relevant to police investigations in Financial Intelligence Reports. We identified a large gap in the literature for this type of solution for Portuguese texts. Our results, still preliminary, show that the tools and the training data need further work before the results become more meaningful. With a small amount of training data, we increased the accuracy of entity recognition from 14% to 27%, and in a test with the Rasa NLU framework we reached a precision of 60.98% of entities correctly recognized, well below the 90% reported in the literature for other languages.

Keywords: Named Entity Recognition. Text Mining. Financial Intelligence Report. Natural Language Processing.


Support

Federal University of Paraíba (UFPB)

São Paulo State University (UNESP)

Federal University of Santa Catarina (UFSC)

Electronic Journal Digital Skills for Family Farming (RECoDAF)

National Council for Scientific and Technological Development (CNPq)