Sentiment Analysis: identifying sentiment in comments on Humaniza SUS network

Guilherme Ataíde Dias; Moisés Lima Dutra; Fábio Mosso Moreira; Fernando de Assis Rodrigues; Ricardo César Gonçalves Sant'Ana

Information, Data and Technology

Guilherme Ataíde Dias

Federal University of Paraíba (UFPB) | guilhermeataide@ccsa.ufpb.br | https://orcid.org/0000-0001-6576-0017 | https://lattes.cnpq.br/9553707435669429

Undergraduate degree in Computer Science from the Federal University of Paraíba (UFPB), Campus II (1990), Bachelor of Law from the University Center of João Pessoa (UNIPE) (2010), Master in Organization & Management from Central Connecticut State University (CCSU) (1995), PhD in Information Science (Communication Sciences) from the University of São Paulo (USP) (2003) and postdoctoral studies at UNESP (2011). He is currently Associate Professor III at the Federal University of Paraíba, where he holds a degree in Information Science. He is involved in graduate education through the Graduate Program in Information Science and the Graduate Program in Administration, both at UFPB. His research interests include: Knowledge Representation; Information Architecture; Information Security; Information and Communication Technologies; Health Information; Social Networks; Free Software; Law, Ethics and Intellectual Property in Cyberspace; Scientific Data Management; Legal Information. He is currently a Research Productivity Scholar (PQ) at CNPq.

Moisés Lima Dutra

Federal University of Santa Catarina (UFSC) | moises.dutra@ufsc.br | https://orcid.org/0000-0003-1000-5553 | https://lattes.cnpq.br/1973469817655034

Professor, Federal University of Santa Catarina, Department of Information Science. PhD in Computing from the University of Lyon 1, France (2009). Master in Electrical Engineering, subarea Automation and Systems (2005), and Bachelor in Computing (1998) from the Federal University of Santa Catarina. His current lines of research are related to Applied Artificial Intelligence (Machine Learning, Deep Learning, Semantic Web, Linked Data) and Data Science (Big Data, IoT). He is affiliated with the research group ITI-RG (Intelligence, Technology and Information - Research Group).

Fábio Mosso Moreira

São Paulo State University (UNESP) | fabio.moreira@unesp.br | https://orcid.org/0000-0002-9582-4218 | https://lattes.cnpq.br/1614493890723021

Undergraduate degree in Business Administration from the Faculty of Sciences and Engineering (UNESP / Tupã). Master's degree in Information Science (UNESP / Marília). PhD student in the Graduate Program in Information Science (UNESP / Marília). Member of the research groups GPNTI (UNESP / Marília) and GPTAD (UNESP / Tupã). Collaborator on the project Digital Skills for Family Farming (CoDAF). Content editor of the Electronic Journal Digital Skills for Family Farming (RECoDAF). Professional technical qualification in Informatics from ETEC Massuyuki Kawano - Centro Paula Souza de Tupã. Professional experience with ERP information systems for logistics operations. Conducts research in Information Science, studying the use of digital resources for access to government data on public policies in the context of the small farmer.

Fernando de Assis Rodrigues

Federal University of Pará (UFPA) | fernando@rodrigues.pro.br | https://orcid.org/0000-0001-9634-1202 | https://lattes.cnpq.br/5556499513805582

Professor at the Federal University of Pará. Ph.D. and M.S. in Information Science, Post-bachelor in Internet Systems and Bachelor of Science in Information Systems. Most of his experience comes from work as a full stack developer and database administrator, especially with the Python, Java and PHP programming languages and the MySQL, MariaDB, SQLite3 and PostgreSQL databases. He has also taught Computer Science courses to undergraduate and graduate students at UNESP. He currently works as a postdoctoral researcher at UNESP labs, working in data studies.

Ricardo César Gonçalves Sant'Ana

São Paulo State University (UNESP) | ricardo.santana@unesp.br | https://orcid.org/0000-0003-1387-4519 | https://lattes.cnpq.br/1022660730972320

Associate Professor at São Paulo State University (UNESP), Faculty of Sciences and Engineering (FCE), Tupã campus, on an exclusive-dedication basis, where he is Chairman of the Monitoring and Evaluation Committee of the Graduate Courses (CAACG), Local Coordinator of the Center for Studies and Pedagogical Practices (CENEPP) and Local Ombudsman. Professor in the Graduate Program in Information Science at São Paulo State University, Marília campus. Graduated in Mathematics and Pedagogy, Master in Information Science (2002), Doctorate in Information Science (2008) and Livre-Docente (habilitation) in Management Information Systems from UNESP (2017). He has specializations in Object Orientation (1996) and Management of Information Systems (1998). Ad hoc reviewer for journals and funding agencies. Member of the research group New Technologies in Information (GPNTI-UNESP). He has experience in Computer Science and currently conducts research in information science and information technology, investigating issues related to the Data Life Cycle, Transparency and Information Flow in Productive Chains. He worked as a professor at Faccat Faculdade de Ciências Contábeis e Administração de Tupã, where he coordinated the Administration course with Qualification in Systems Analysis for ten years, as well as the teaching degree (Licenciatura) in Computing. He worked in the private sector as a consultant, integrator and researcher of new information technologies from 1988 to 2004.

Sentiment Analysis: identifying sentiment in comments on Humaniza SUS network

Pages: 491 - 500

Authors

Eduardo Alves Silva

New University of Lisboa (NOVAIMS) | easilva91@gmail.com

Luis Felipe Rosa de Oliveira

Federal University of Goiás (UFGO) | luisprf@gmail.com | https://lattes.cnpq.br/6498992926514286

Master in Social Communication from the Faculty of Information and Communication of the Federal University of Goiás, CAPES - DS. Bachelor in Information Management from the Federal University of Goiás. Conducts research in the areas of Digital Social Media, Social Networks, the Python programming language and Statistical Data Analysis.

Video Transcription

My name is Eduardo. This video is a presentation of the work submitted to the Workshop on Information, Data and Technology, held at the University of Paraíba from November 27 to 29, 2018.

The work is titled “ANÁLISE DE SENTIMENTOS: Identificando sentimentos em comentários da Rede Humaniza SUS” (Sentiment Analysis: identifying sentiment in comments on Humaniza SUS network). It was produced by me, Eduardo Silva, and by my colleague Luis Felipe Rosa.

The goal of the work is to identify sentiment in the comments of Rede Humaniza SUS, a social network that has been active since 2008 and covers topics related to the Brazilian Unified Health System, that is, SUS.

The network currently has about 30 thousand users, 14 thousand posts and roughly 35 thousand to 40 thousand comments. To analyze the sentiment of the network's comments we used data mining, or text mining in this case, applying some natural language processing concepts, all in the Python programming language. The data were collected from a database of the network. After collection, we treated the data using some text analysis and natural language processing methods.

The text itself came with some dirt. In this context, dirt means, for example, characters that were not correctly recognized, such as Ç and ÃO, among others.
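The talk does not say how the badly recognized characters were repaired. A minimal sketch, assuming the damage came from UTF-8 text that was decoded as Latin-1 (a common cause of broken Ç and ÃO, but only an assumption here); the function name `repair_mojibake` is hypothetical:

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Latin-1 (assumed failure mode)."""
    try:
        # Recover the original bytes, then decode them with the right codec.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Not this kind of damage, or the text is already correct: leave it alone.
        return text


# "ação" as it looks after the wrong decoding step (written with escapes).
damaged = "a\u00c3\u00a7\u00c3\u00a3o"
print(repair_mojibake(damaged))
```

Text that is already correct raises a decoding error inside the function and is returned unchanged, so the repair can be applied to every comment without checking first.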

In addition, some comments came with HTML tags, that is, they contained tags for paragraphs, line breaks and so on. Using the Python programming language, we cleaned and normalized these data, so that the comments were free of dirt and all in lowercase.
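The cleaning step described above can be sketched with the Python standard library alone. This is an illustration rather than the code used in the work: the function name `normalize_comment` and the regular expressions are assumptions, and a real HTML parser would be more robust than a regex for malformed markup.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")        # anything that looks like an HTML tag
WHITESPACE_RE = re.compile(r"\s+")     # runs of spaces, tabs, newlines, nbsp


def normalize_comment(raw: str) -> str:
    """Strip HTML tags, decode entities, collapse whitespace and lowercase."""
    text = TAG_RE.sub(" ", raw)        # drop tags such as <p> and <br/>
    text = html.unescape(text)         # turn &amp; or &uacute; into characters
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text.lower()


print(normalize_comment("<p>Gostei MUITO<br/>da postagem!</p>"))
```

Tags are removed before entities are decoded so that an escaped `&lt;` in a comment is not mistaken for the start of a tag.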

To identify sentiment, or in some cases to classify it, which is not quite the case here, one can use machine learning or a sentiment lexicon. Since the aim of the work was identification rather than classification, we used a sentiment lexicon. A lexicon is a dictionary: a set of words or expressions, each with an assigned polarity, which can be positive, negative or neutral. In this study we used a sentiment lexicon called OpLexicon, produced by PUC-RS and based on journalistic text and some other sources its authors used.

OpLexicon contains about 32,191 items, that is, 32,191 words, among them 24,485 adjectives and 6,889 verbs.
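A lexicon of this kind can be held in a plain dictionary that maps each word to its polarity, and the per-category totals quoted above are a simple tally. The sketch below assumes a hypothetical comma-separated layout with the word, a part-of-speech tag and the polarity in the first three fields; the actual OpLexicon file layout should be checked before reusing it, and the four sample rows are made up for illustration.

```python
from collections import Counter

# Illustrative rows only, in the assumed "word,pos,polarity" layout.
SAMPLE_LEXICON = """\
bom,adj,1
ruim,adj,-1
apoiar,vb,1
normal,adj,0
"""


def parse_lexicon(lines):
    """Return (word -> polarity, part-of-speech -> number of entries)."""
    polarity = {}
    pos_counts = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue                    # skip blank or malformed lines
        word, pos, value = fields[0], fields[1], fields[2]
        try:
            polarity[word] = int(value)
        except ValueError:
            continue                    # skip rows without a numeric polarity
        pos_counts[pos] += 1
    return polarity, pos_counts


polarity, pos_counts = parse_lexicon(SAMPLE_LEXICON.splitlines())
print(pos_counts)
```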

In this case adjectives and verbs predominate, since in Portuguese these are the two kinds of words normally used to express a sentiment, but the lexicon also has hashtags, determiners, prepositions and emoticons.

To continue with the methodology, starting from the sentiment lexicon we created a method for comparing the words in a comment with the words in the lexicon. That is, if a sentence has 20 words and 10 of them appear in the sentiment lexicon, we store the polarity values of those words, which are found both in the lexicon and in the comment in question. From there we used a verification and calculation method to define which sentiment the comment presents. For this we built on studies by authors who created an opinion lexicon for opinion mining in English. Starting from code that uses that lexicon, we made an adaptation for OpLexicon. With that we could check the polarity of the words and, from it, the polarity of the comment, that is, whether the comment was positive, negative or neutral according to the number of positive, negative or neutral words that appeared in it.
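The comparison described above can be sketched as follows. The exact calculation adapted from the English opinion-lexicon code is not given in the talk, so the decision rule here (more positive than negative words means positive, the reverse means negative, anything else neutral) is an assumption, as are the names `comment_polarity` and `TOY_LEXICON` and the toy entries themselves.

```python
import re
from collections import Counter

# Toy stand-in for the lexicon: word -> polarity (1, -1 or 0).
TOY_LEXICON = {"bom": 1, "gostei": 1, "ótimo": 1,
               "ruim": -1, "péssimo": -1, "normal": 0}

WORD_RE = re.compile(r"\w+")


def comment_polarity(comment, lexicon):
    """Label a comment by counting its words that appear in the lexicon."""
    words = WORD_RE.findall(comment.lower())
    # Keep only the polarity values of words found in both places.
    scores = [lexicon[w] for w in words if w in lexicon]
    positives = sum(1 for s in scores if s > 0)
    negatives = sum(1 for s in scores if s < 0)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"                    # ties and comments with no hits


comments = ["Gostei muito, ótimo texto", "Atendimento ruim", "Texto normal"]
tally = Counter(comment_polarity(c, TOY_LEXICON) for c in comments)
print(tally)
```

Running the same function over every comment and tallying the labels, as in the last lines, gives the overall split between positive, negative and neutral comments.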

We then repeated this process for all of the nearly 40 thousand comments, thereby identifying the sentiment of each one.

From the identified polarity we could see that there are more positive and neutral comments than negative ones.

This may be due to the type of social network we are dealing with. It is a somewhat more controlled network focused on a specific theme, so people tend to be more measured in their comments, posting not negative comments but positive, supportive ones, or simpler messages such as "good morning", "I really liked the post" and comments along those lines.

The final result, besides the polarity of the sentiments, therefore included a check of the words most commonly used across all comments. The word "saúde" (health), as expected, appears most often, along with "SUS" and other words related to the health context.
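Counting the most common words is a short exercise with `collections.Counter`. The sketch below is an assumption about how such a count could be done, not the code from the work; the function name `top_words`, the optional stopword set and the three sample comments are hypothetical.

```python
import re
from collections import Counter


def top_words(comments, n=10, stopwords=frozenset()):
    """Return the n most frequent words across all comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"\w+", comment.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample = ["A saúde do SUS", "Saúde para todos", "O SUS"]
print(top_words(sample, n=2, stopwords={"a", "o", "do", "para"}))
```

Without a stopword list the top of the ranking is usually dominated by articles and prepositions, which is why the sketch lets the caller filter them out.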

It was also possible to examine some finer-grained questions of a rather descriptive nature, such as the frequency of comments per month, per year and so on. But the final result that interested us, and that was achieved, was to define or identify the possible sentiment of each comment. In a way, we are creating a dataset that can later be used to classify new comments. It is therefore important to stress that the work is not about classification using machine learning but about identification using a lexicon. From this point it is possible to compare against other Portuguese sentiment lexicons to check which of them has a higher hit rate in identifying sentiment. In this work, however, only OpLexicon was used.
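The descriptive counts mentioned above, such as comments per month or per year, reduce to grouping timestamps by a formatted key. This sketch assumes each comment carries a `datetime` timestamp, which the talk does not state; the function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import datetime


def comments_per_month(timestamps):
    """Count comments by calendar month, keyed as 'YYYY-MM'."""
    return Counter(ts.strftime("%Y-%m") for ts in timestamps)


def comments_per_year(timestamps):
    """Count comments by year."""
    return Counter(ts.year for ts in timestamps)


stamps = [datetime(2018, 3, 1), datetime(2018, 3, 15), datetime(2017, 11, 2)]
print(comments_per_month(stamps))
print(comments_per_year(stamps))
```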

As our final result we therefore have only these sentiment identification data. Thank you for your attention, whether you read the article or are just watching this presentation video. Thank you very much and see you next time.
