Evaluation of the author disambiguation algorithm of AMiner in an academic metasearcher of Computer Science
Abstract
Author disambiguation is a problem of considerable relevance for academic information retrieval systems. The name disambiguation algorithm implemented in AMiner is currently one of the most influential Machine Learning-based approaches. This work evaluates that algorithm for author disambiguation in the context of an academic metasearcher for Computer Science. Experimental results with data generated by the metasearcher indicate an average performance similar to the reference. The results also revealed special cases of author names for which performance is low compared with the average. These cases point to an apparent association between low algorithm performance and names shared by several authors with a low number of publications.
The articles published in the journal Ciencia y Tecnología are the exclusive property of their authors. Their opinions and content belong to their authors, and the Universidad de Palermo declines all responsibility for the rights that may arise from reading and/or interpreting the content of the published articles.
Reproduction, use, or exploitation of the published articles by any third party is not authorized. Their use is authorized only for academic and/or research purposes.