Şaziye Betül Özateş

Hi! I am an assistant professor at the Institute for Data Science and Artificial Intelligence at Boğaziçi University, specializing in natural language processing and machine learning.

I received my bachelor’s and master’s degrees from the Department of Computer Engineering at Boğaziçi University. I completed my PhD in the same department under the guidance of Dr. Arzucan Özgür and Dr. Tunga Güngör. Previously, I worked as a researcher at the Institute of Natural Language Processing at the University of Stuttgart. I was also a post-doctoral research fellow at KUIS AI Center, where I worked on automatic learning of procedural language from natural language instructions.

Publications

To see my publication record, please check out my Google Scholar profile.

Software and Data

BOUN Treebank:

Extracted from the Turkish National Corpus (TNC), BOUN Treebank consists of 9,761 syntactically annotated sentences (121,214 tokens) from five different text types: biographical texts, national newspapers, instructional texts, popular culture articles, and essays. treebank, paper.

IMST Treebank:

IMST Treebank consists of 5,635 syntactically annotated sentences collected from daily news reports and novels. treebank, paper.

PUD Treebank:

This is the most up-to-date version of the UD_Turkish-PUD treebank, which was originally part of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. It consists of 1,000 sentences that were annotated in parallel for 18 languages. treebank, paper.

BOUN-Pars:

BOUN-Pars is a Turkish dependency parser that creates parse trees of Turkish sentences in CoNLL-U format. It employs an LSTM-based model and uses linguistically oriented rules and morphological information of words. code, demo, paper.
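For readers unfamiliar with CoNLL-U, the sketch below shows one way to read such output in Python. It is not part of BOUN-Pars: the function names `parse_conllu` and `dependency_edges` are hypothetical, and the sample sentence is a hand-written toy annotation rather than real parser output. It relies only on the standard ten tab-separated CoNLL-U columns.

```python
from typing import Dict, List, Tuple

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        token = dict(zip(CONLLU_FIELDS, columns))
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def dependency_edges(sentence: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) triples for one sentence."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    forms["0"] = "ROOT"  # HEAD value 0 marks the root of the tree
    return [(forms[tok["head"]], tok["deprel"], tok["form"]) for tok in sentence]


# Hand-written toy sample ("Kitap okudum." = "I read a book."), not parser output.
SAMPLE = "\n".join([
    "# text = Kitap okudum.",
    "\t".join(["1", "Kitap", "_", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "\t".join(["2", "okudum", "_", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", ".", "_", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for head, relation, dependent in dependency_edges(parse_conllu(SAMPLE)[0]):
        print(f"{head} --{relation}--> {dependent}")
```

The same reader should work on any file that follows the CoNLL-U column layout, including the treebanks listed above, although it ignores comment lines and multiword-token ranges for simplicity.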

Semi-supervised Deep Dependency Parser:

This dependency parser employs the semi-supervised learning approach “DCST” and utilizes auxiliary tasks for dependency parsing of code-switched (CS) language pairs (and low-resource languages in general). The parsing model comes in two versions: one LSTM-based and the other XLM-R-based. code, paper.

Sentence Similarity Kernels:

These methods use the dependency grammar representations of sentences to compute sentence similarity for extractive multi-document summarization. code, paper.