Publications

SukatWika: An Analysis Software for Linguistic Properties of Texts

Published in International Conference on Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, 2019

There is a lack of understanding of the qualities of texts that children can read, especially in Philippine languages. Text quality should be informed by an analysis of text difficulty, which can be measured through linguistic properties of a text such as word density, concept load, and phonological weight. The SukatWika analysis tool was developed to automate the extraction of this information for texts written in Filipino, English, Sinugbuanong Binisaya, and Ilokano. The results obtained from this software can aid the creation of instructional materials that support reading development among learners.

Recommended citation: Kathrina Lorraine Lucasan, Angelina Aquino, Francis Paolo Santelices, and Dina Ocampo. 2019. SukatWika: An Analysis Software for Linguistic Properties of Texts. In Proceedings of the International Conference on Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, Paris, France. https://github.com/AngelAquino/SukatWika/blob/master/sukatwika2019.pdf

G2P and ASR techniques for low-resource phonetic transcription of Tagalog, Cebuano, and Hiligaynon

Published in 9th International Symposium on Multimedia and Communication Technology (ISMAC), 2019

Philippine linguists are tasked with documenting over 170 indigenous languages. A key part of this documentation is the phonetic transcription of recorded speech, which is typically done by hand and is often expensive and time-consuming. Automated phonetic transcription (APT) systems provide a faster and cheaper alternative to manual transcription, but no such system has yet been developed for most Philippine languages. In this paper, we implement three APT methods for transcribing small speech corpora in Tagalog, Cebuano, and Hiligaynon: grapheme-to-phoneme (G2P) conversion, automatic speech recognition (ASR), and adaptive alignment. We show that the G2P, adaptive, and select ASR models perform on par with human transcribers while greatly reducing total time and costs. These systems serve as a competent baseline for future developments in APT for Philippine languages and are expected to facilitate further research and advancements in Philippine linguistics and speech technology.

Recommended citation: Angelina Aquino, Joshua Lijandro Tsang, Crisron Rudolf Lucas, and Franz de Leon. 2019. G2P and ASR techniques for low-resource phonetic transcription of Tagalog, Cebuano, and Hiligaynon. In Proceedings of the 9th International Symposium on Multimedia and Communication Technology (ISMAC), Quezon City, Philippines. IEEE. https://ieeexplore.ieee.org/document/8836168