Use this identifier to cite or link to this record: http://hdl.handle.net/10071/23218
Authors: Vicente, M.
Batista, F.
Carvalho, J. P.
Editors: Kóczy, L. and Medina, J.
Date: 2016
Title: Improving Twitter gender classification using multiple classifiers
Pages: 121-127
Event title: 8th European Symposium on Computational Intelligence and Mathematics
ISBN: 978-84-617-5119-8
Keywords: Gender classification
Twitter users
Gender database
Text mining
Abstract: User profile information is important for many studies, but essential attributes such as gender and age are not provided when a Twitter account is created. However, clues about the user profile, such as age, gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the user name, screen name, description and picture, or from user-generated content. Our experiments use a labelled English dataset containing 6.5M tweets from 65K users, and a labelled Portuguese dataset containing 5.8M tweets from 58K users. We use supervised approaches, considering four groups of features extracted from different sources: user name and screen name, user description, content of the tweets, and profile picture. A final classifier that combines the predictions of the four partial classifiers achieves 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
Peer reviewed: yes
Access: Open Access
Appears in collections: CTI-CRI - Communications at international conferences
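
The abstract describes a final classifier that combines the outputs of four partial classifiers, one per feature group. The record does not say how the combination is done, so the following is only a minimal sketch under stated assumptions: each partial classifier is assumed to emit a probability that the user is female, and the combiner is a simple weighted soft vote, which may differ from the method used in the paper. The source names, the `combine` function, and the weights are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical source names mirroring the four feature groups in the abstract.
SOURCES = ("name", "description", "tweets", "picture")


def combine(scores: Dict[str, Optional[float]],
            weights: Optional[Dict[str, float]] = None) -> Optional[str]:
    """Weighted soft vote over the partial classifiers' P(female) scores.

    A missing key or a score of None means that source was unavailable
    for the user (for example, no profile picture).
    Returns "female", "male", or None when no source produced a score.
    """
    # Assumption: equal weights unless the caller supplies others.
    weights = weights or {s: 1.0 for s in SOURCES}
    total = 0.0
    weight_sum = 0.0
    for source in SOURCES:
        p = scores.get(source)
        if p is None:
            continue  # skip sources with no prediction
        total += weights[source] * p
        weight_sum += weights[source]
    if weight_sum == 0.0:
        return None
    return "female" if total / weight_sum >= 0.5 else "male"


if __name__ == "__main__":
    user = {"name": 0.9, "description": 0.6, "tweets": 0.4, "picture": None}
    print(combine(user))
```

Skipping unavailable sources rather than treating them as 0.5 keeps a user with few profile fields from being pulled toward an undecided score; a trained meta-classifier over the four scores would be a natural alternative.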

Files in this record:
File: conferenceobject_30885.pdf
Description: Publisher version
Size: 289.12 kB
Format: Adobe PDF


All records in the repository are protected by copyright law, with all rights reserved.