Assistant Professor @ DSAI BOUN
Hi! I am an assistant professor at the Institute for Data Science and Artificial Intelligence at Boğaziçi University, specializing in natural language processing and machine learning.
I received my bachelor’s and master’s degrees from the Department of Computer Engineering at Boğaziçi University, where I also completed my PhD under the supervision of Dr. Arzucan Özgür and Dr. Tunga Güngör. Previously, I worked as a researcher at the Institute of Natural Language Processing at the University of Stuttgart. I was also a postdoctoral research fellow at the KUIS AI Center, where I worked on automatically learning procedural language from natural language instructions.
For my publication record, please see my Google Scholar profile.
Extracted from the Turkish National Corpus (TNC), the BOUN Treebank consists of 9,761 syntactically annotated sentences (121,214 tokens) drawn from five text types: biographical texts, national newspapers, instructional texts, popular culture articles, and essays. treebank, paper.
The OTA-BOUN Treebank is the first Ottoman Turkish dependency treebank annotated in the Universal Dependencies (UD) style. It currently includes 1,743 manually annotated sentences from twelve texts by ten writers, all drawn from literature published between 1880 and 1928. treebank.
The Ottoman Text Corpus (OTC) is a clean corpus of transliterated historical Turkish texts spanning a wide range of historical periods. It currently contains a total of 11 million tokens. corpus.
HisTR is the first named entity recognition (NER) dataset for historical Turkish, comprising 812 sentences from the 17th to the 19th centuries. The sentences are manually annotated with PERSON, LOCATION, and ORGANIZATION entities. dataset.
The IMST Treebank consists of 5,635 syntactically annotated sentences collected from daily news reports and novels. treebank, paper.
This is the most up-to-date version of the UD_Turkish-PUD treebank, which was originally part of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. It consists of 1,000 sentences annotated in parallel across 18 languages. treebank, paper.
BOUN-Pars is a Turkish dependency parser that produces parse trees of Turkish sentences in CoNLL-U format. It combines an LSTM-based model with linguistically oriented rules and the morphological information of words. code, demo, paper.
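CoNLL-U, the output format mentioned above, is the tab-separated format used by Universal Dependencies treebanks: one token per line with ten columns, comment lines starting with `#`, and a blank line between sentences. As a rough illustration of how such output can be consumed, the minimal Python sketch below reads CoNLL-U text into per-sentence token dictionaries. It is not part of BOUN-Pars; the function name `read_conllu` and the hand-written sample sentence ("Ali eve gitti.", roughly "Ali went home.") are hypothetical, and the reader simply skips multiword-token and empty-node lines.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order, as defined by the Universal Dependencies format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry metadata such as sent_id and text.
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical, hand-written sample (not actual parser output).
SAMPLE = (
    "# text = Ali eve gitti.\n"
    "1\tAli\tAli\tPROPN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\teve\tev\tNOUN\t_\tCase=Dat\t3\tobl\t_\t_\n"
    "3\tgitti\tgit\tVERB\t_\tTense=Past\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

for tok in read_conllu(SAMPLE)[0]:
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

Each printed line gives a token's position, its surface form, the position of its head (0 marks the root), and the dependency relation, which is the information a dependency tree encodes.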
This dependency parser uses a semi-supervised learning approach, DCST, together with auxiliary tasks to parse code-switched (CS) language pairs and, more generally, low-resource languages. It comes in two versions: one LSTM-based and one XLM-R-based. code, paper.
These methods use the dependency grammar representations of sentences to compute sentence similarity for extractive multi-document summarization. code, paper.
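The general idea of comparing sentences through their dependency structure can be illustrated with a small sketch. The Python code below is a generic illustration, not the method from the paper. It assumes each sentence is already parsed into (id, lemma, head, deprel) tuples, represents the sentence as a set of (head lemma, relation, dependent lemma) arcs, measures similarity as the Jaccard overlap of those sets, and ranks sentences by their average similarity to the others as a simple centrality heuristic for extractive selection. The functions `arcs`, `similarity`, and `rank_by_centrality` and the toy sentences are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# A dependency arc: (head lemma, relation, dependent lemma).
Arc = Tuple[str, str, str]


def arcs(tokens: List[Tuple[int, str, int, str]]) -> FrozenSet[Arc]:
    """Build the set of dependency arcs from (id, lemma, head, deprel) tuples."""
    lemma_of = {tid: lemma for tid, lemma, _, _ in tokens}
    lemma_of[0] = "ROOT"  # head 0 denotes the artificial root node
    return frozenset((lemma_of[head], rel, lemma) for _, lemma, head, rel in tokens)


def similarity(a: FrozenSet[Arc], b: FrozenSet[Arc]) -> float:
    """Jaccard overlap of two arc sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_centrality(sents: List[FrozenSet[Arc]]) -> List[int]:
    """Order sentence indices by mean similarity to all other sentences, highest first."""
    n = len(sents)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = similarity(sents[i], sents[j])
        totals[i] += s
        totals[j] += s
    denom = max(n - 1, 1)
    return sorted(range(n), key=lambda i: totals[i] / denom, reverse=True)


# Toy, hand-written parses (lemmas only), not real parser output.
s1 = arcs([(1, "ali", 3, "nsubj"), (2, "ev", 3, "obl"), (3, "git", 0, "root")])
s2 = arcs([(1, "ali", 3, "nsubj"), (2, "okul", 3, "obl"), (3, "git", 0, "root")])
s3 = arcs([(1, "kedi", 2, "nsubj"), (2, "uyu", 0, "root")])

print(similarity(s1, s2))  # 0.5
print(rank_by_centrality([s1, s2, s3]))  # [0, 1, 2]
```

Here `s1` and `s2` share two of their four distinct arcs, so their similarity is 0.5, while `s3` shares none with either and is ranked last. Arc-based overlap rewards shared grammatical relations rather than mere shared words; a real summarizer would also need a redundancy check before adding a sentence to the summary.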