Bengali Text-to-IPA Conversion

1st in DataVerse Challenge - ITVerse 2023

Bengali is among the most widely spoken native languages in the world, yet Bengali text-to-IPA (International Phonetic Alphabet) transcription remains underdeveloped compared with that of other widely spoken languages. In this project, we designed a model that converts Bengali text into its corresponding IPA representation.

We fine-tuned the ByT5 model, which operates directly on UTF-8 bytes rather than on a learned subword vocabulary. This byte-level tokenization is language-independent, which makes it well suited to Bengali script. Our model achieved a Word Error Rate (WER) of 0.01420, securing first place on the competition leaderboard.
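To make the two technical ideas above concrete, here is a minimal Python sketch; it is an illustration, not the project's training or evaluation code. The function names (`byte_level_ids`, `edit_distance`, `word_error_rate`) are hypothetical. The ID scheme (UTF-8 byte value plus an offset of 3, with end-of-sequence ID 1) follows the published ByT5 vocabulary layout. The WER formula (word-level edit distance divided by the number of reference words, pooled over the corpus) is a common definition and is assumed here; the competition's exact scoring script may differ.

```python
from typing import List, Sequence


def byte_level_ids(text: str, offset: int = 3, eos_id: int = 1) -> List[int]:
    """Map text to ByT5-style IDs: each UTF-8 byte shifted by `offset`, then EOS.

    IDs 0, 1 and 2 are reserved (pad, end-of-sequence, unknown), so byte
    value b becomes ID b + 3. No language-specific vocabulary is involved.
    """
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER: total word edits / total reference words (assumed definition)."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        total_edits += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


if __name__ == "__main__":
    # Each Bengali code point occupies 3 bytes in UTF-8, so this 5-code-point
    # word becomes 15 byte IDs plus one end-of-sequence ID.
    print(len(byte_level_ids("বাংলা")))
    # Placeholder tokens, not real IPA: one substitution in four reference words.
    print(word_error_rate(["a b c d"], ["a x c d"]))
```

Under this definition, a WER of 0.01420 corresponds to roughly 1.4 word-level errors per 100 reference words.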

Paper Slides
