Mapping of Disease Names to Disease Codes based on Natural Language Processing Techniques
Authors
Zečević, Anđelka
Kovačević, Jovana
Davidović, Radoslav
Other contributors
Morić, Ivana
Đorđević, Valentina
Conference contribution (Published version)
© 2023 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade
Metadata
Abstract
Aggregating information from various gene, disease, and gene-disease databases, such
as DisGeNET, COSMIC, HumsaVar, Orphanet, ClinVar, HPO, and Diseases, into a single
database would enable researchers to analyze and compare valuable domain findings
in a more convenient and systematic way. However, the aggregation poses numerous
challenges because information is annotated non-uniformly across the databases. In this work,
we address the problem of mapping a disease name, when needed, to a standardized
disease code (DOID) using Natural Language Processing text representation
techniques. We examine the benefits and limitations of using off-the-shelf embeddings
such as Med2vec, and language models such as BioBERT, UmlsBERT, and PubMedBERT,
in retrieval scenarios, compared with standard full-text search. In addition to qualitative
improvements, we elaborate on the technical requirements and computational
complexity that come with adopting language models and semantic search.
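To make the retrieval setting concrete, the following is a minimal sketch of mapping a free-text disease name to a DOID by nearest-neighbour search over vector representations, contrasted with an exact-match lookup. It is not the authors' implementation: character-trigram count vectors stand in for Med2vec or BERT-style embeddings so that the example runs without model downloads, and the names `DOID_TERMS`, `char_ngrams`, `exact_lookup`, and `nearest_doid` are hypothetical. The three name-to-DOID pairs are illustrative and should be verified against the Disease Ontology before any real use.

```python
import math
from collections import Counter

# Illustrative name -> DOID pairs (assumption: verify against the Disease Ontology).
DOID_TERMS = {
    "breast cancer": "DOID:1612",
    "type 2 diabetes mellitus": "DOID:9352",
    "Alzheimer's disease": "DOID:10652",
}


def char_ngrams(text, n=3):
    """Stand-in 'embedding': a sparse vector of character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_lookup(name):
    """Baseline: case-insensitive exact match on the ontology term name."""
    lowered = {term.lower(): code for term, code in DOID_TERMS.items()}
    return lowered.get(name.lower())


def nearest_doid(name, embed=char_ngrams):
    """Return (DOID, matched term, similarity) for the most similar term.

    `embed` can be swapped for any function that maps text to a vector
    compatible with `cosine`, for example a wrapper around a language model.
    """
    query = embed(name)
    scored = [(cosine(query, embed(term)), term) for term in DOID_TERMS]
    score, term = max(scored)
    return DOID_TERMS[term], term, score


if __name__ == "__main__":
    for raw in ["Diabetes mellitus type 2", "Alzheimer disease"]:
        print(raw, "| exact:", exact_lookup(raw), "| nearest:", nearest_doid(raw))
```

In this sketch the exact-match baseline fails on reordered or slightly varied names, while the similarity search still returns a candidate code. A real system would replace `char_ngrams` with a learned embedding and would need a similarity threshold to reject poor matches.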
Source:
4th Belgrade Bioinformatics Conference, 2023, 4, 37-37
Publisher:
- Belgrade : Institute of molecular genetics and genetic engineering
Note:
- Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023
Collections
Institution/group
Institut za molekularnu genetiku i genetičko inženjerstvo
TY  - CONF
AU  - Zečević, Anđelka
AU  - Kovačević, Jovana
AU  - Davidović, Radoslav
PY  - 2023
UR  - https://belbi.bg.ac.rs/
UR  - https://imagine.imgge.bg.ac.rs/handle/123456789/1975
PB  - Belgrade : Institute of molecular genetics and genetic engineering
C3  - 4th Belgrade Bioinformatics Conference
T1  - Mapping of Disease Names to Disease Codes based on Natural Language Processing Techniques
EP  - 37
SP  - 37
VL  - 4
UR  - https://hdl.handle.net/21.15107/rcub_imagine_1975
ER  -
@conference{
  author = "Zečević, Anđelka and Kovačević, Jovana and Davidović, Radoslav",
  year = "2023",
  publisher = "Belgrade : Institute of molecular genetics and genetic engineering",
  journal = "4th Belgrade Bioinformatics Conference",
  title = "Mapping of Disease Names to Disease Codes based on Natural Language Processing Techniques",
  pages = "37-37",
  volume = "4",
  url = "https://hdl.handle.net/21.15107/rcub_imagine_1975"
}
Zečević, A., Kovačević, J., & Davidović, R. (2023). Mapping of Disease Names to Disease Codes based on Natural Language Processing Techniques. In 4th Belgrade Bioinformatics Conference, 4, 37. Belgrade: Institute of molecular genetics and genetic engineering. https://hdl.handle.net/21.15107/rcub_imagine_1975
Zečević A, Kovačević J, Davidović R. Mapping of Disease Names to Disease Codes based on Natural Language Processing Techniques. In: 4th Belgrade Bioinformatics Conference. 2023;4:37. https://hdl.handle.net/21.15107/rcub_imagine_1975
Zečević, Anđelka, Kovačević, Jovana, Davidović, Radoslav, "Mapping of Disease Names to Disease Codes based on Natural Language Processing Techniques," in 4th Belgrade Bioinformatics Conference, 4 (2023): 37, https://hdl.handle.net/21.15107/rcub_imagine_1975