
Publications

A big list of research I've worked on

Theses (1)

Title | Authors | Venue | Year
Automatic Subtitle Generation for Bengali Multimedia using Deep Learning

For audio or video material to be more inclusive and accessible, automatic subtitle generation is essential. Nevertheless, implementing this technology for Bengali presents significant challenges due to scarce resources and linguistic complexity. In this study, a new deep-learning-based system for automatically creating subtitles for Bengali multimedia is introduced. The suggested approach makes use of the Wav2vec2 model and the Common Voice Bengali Dataset, a large collection of Bengali audio recordings. This study uses the Common Voice Bengali Dataset to train and tune the Wav2vec2 model in order to accurately convert Bengali audio into text. Current automatic speech recognition (ASR) approaches are combined with Bengali language-specific factors in the created system to give accurate and reliable transcriptions. The transcribed text is synced with the matching audio segments throughout the subtitle production process. The produced subtitles are enhanced using post-processing approaches, such as capitalization and punctuation restoration, to ensure readability and consistency. The findings of this study might greatly improve the usability and availability of Bengali-language media across a range of sectors. The created subtitles may enhance the viewing experience for Bengali multimedia by easing understanding and expanding availability. The study demonstrates the potential of using deep learning and ASR methods to overcome the difficulties of automated subtitle production in the Bengali language, advancing multimedia availability and inclusion.

Ehsanur Rahman Rhythm, Shafakat Sowroar Arnob, Rajvir Ahmed Shuvo, Annajiat Alim Rasel, Sifat E Jahan | B.Sc, Brac University | 2023

Conference Papers (11)

Title | Authors | Venue | Year
Contrail Analysis through Advanced Neural Network Architectures: Image Segmentation and Classification

The aviation industry’s immense expansion is having an impact on global warming and has resulted in some significant environmental issues. When an airplane passes overhead, tiny crisscross patterns, often known as contrails, may be visible. They are to blame for this effect. Contrails are really just airborne particles that have been compressed with water. They are uncommon since ice can only form under particular climatic circumstances, such as extremely cold, hot, humid, and saturated air. Even worse, in the cooler climate at night, contrails are more harmful because they persist longer. They gather heat from the sun and store it, then release it into the atmosphere. Some experts have also warned the public that the radiation these contrails produce may be more damaging to the atmosphere than previously predicted. For this reason, scientists are looking for methods to reduce these contrails by understanding their behaviors and patterns. The proposed study segments and classifies images of contrails acquired from satellite data. In this study, complex neural network architectures, including U-Net, DeepLab, Attention Mechanism, and ResNet50 with CNN, are used to segment those images and perform binary classification on them. These architectural frameworks will aid this research in effectively classifying and segmenting contrails in the satellite images so that further research can observe and understand their patterns and behaviors.

Ashraful Alam Nirob, Shahriar Ahmed, Tahmidul Karim Takee, Adnan Rahman Eshan, Shadik Ul Haque, Ehsanur Rahman Rhythm, Annajiat Alim Rasel | ICCIT | 2023
ReSkipNet: Skip Connected Convolutional Autoencoder for Original Document Denoising

Data pre-processing, data analysis, and Optical Character Recognition need a huge amount of clean data, and document images are usually a good source for this. However, document images frequently exhibit blurring and various other forms of noise, which can pose challenges in their manipulation and analysis. To denoise and deblur such document images, autoencoders have been used for a long time. For this task, we propose a novel Convolutional Autoencoder Network which is composed of multiple skip-connected residual blocks and other layers for supporting the encoder and decoder parts. This model not only uses less computational power to denoise existing document image datasets but also performs well. While prior research primarily concentrates on optimizing evaluation metrics, our approach additionally prioritizes larger resolution input sizes. This characteristic of using larger image sizes enhances its practicality and usability as real-world documents are typically characterized by a higher word density. Moreover, in order to further advance the development of our model, we produced an original dataset and proceeded to train our model on this dataset, resulting in satisfactory outcomes.

Mohammad Muhibur Rahman*, Anushua Ahmed*, Mohammad Rakibul Hasan Mahin, Fahmid Bin Kibria, Waheed Moonwar, Ehsanur Rahman Rhythm, Annajiat Alim Rasel | ICCIT | 2023
Siamese-Transformer Network for Offline Handwritten Signature Verification using Few-shot

Handwritten signature verification is a crucial task with applications spanning authentication, financial transactions, and legal documents. In scenarios where only a single reference signature is available, the challenge of accurate verification becomes pronounced due to variations in writing styles, distortions, and limited labeled data. In this paper, we propose a novel Siamese-Transformer network tailored for handwritten signature verification using few-shot learning. By synergizing Siamese neural networks and Transformer architectures, our model excels in capturing contextual relationships and discerning genuine from forged signatures. A triplet loss function facilitates discriminative feature learning. Convolution layers extract local features from an image, while the transformer component utilizes these local features to capture global dependencies within signatures. Experimental results on benchmark datasets showcase the model’s superior performance in few-shot verification scenarios, marking it as a promising advancement in signature verification and few-shot learning techniques.

Prattoy Majumder, AFM Mohimenul Joaa, Ehsanur Rahman Rhythm, Md Humaion Kabir Mehedi, Annajiat Alim Rasel | ICCIT | 2023
Comparative Analysis of Traditional and Contextual Embedding for Bangla Sarcasm Detection in Natural Language Processing

Sarcasm, a sort of sentiment characterized by a disparity between the apparent and intended meanings of the text, is a key component of sentiment analysis, opinion extraction, and social media analytics. However, sarcasm detection in Bangla has not received sufficient research attention yet. Moreover, there hasn’t been a significant amount of study done comparing traditional and contextual word embeddings for the Bengali language. This study aims to address this gap by comparing traditional embedding, using the Bidirectional Gated Recurrent Unit (BiGRU) model, and contextual embedding, using Bidirectional Encoder Representations from Transformers (BERT), for sarcasm detection in Bangla. The dataset of Bangla text was collected from social media platforms and contains instances labelled as sarcastic or non-sarcastic. Pre-trained word embeddings, i.e. GloVe and FastText, are used as the traditional embeddings for this study. The performance of both models was measured using precision, recall, and F1-score. When the two traditional word embedding approaches are compared, GloVe embedding with BiGRU outperformed FastText embedding, with a macro-averaged F1 score of 0.9395. On the other hand, contextual word embedding using BERT outperformed both traditional approaches, with a better macro-averaged F1 score of 0.9572 and greater class-wise performance than traditional embedding for both non-sarcastic (96%) and sarcastic (96%) text detection. In our findings, contextual word embedding, i.e. BERT, performed better than the two traditional word embeddings for this specific Bangla sarcasm detection binary classification task.

Kaji Mehedi Hasan Fahim, Mithila Moontaha, Mashrur Rahman, Ehsanur Rahman Rhythm, Annajiat Alim Rasel | COMNETSAT | 2023
Detecting Derogatory Comments on Women using Transformer-Based Models

Natural Language Processing (NLP) is a field of growing interest nowadays, as it helps AI to understand and interpret human languages. To facilitate advancement in this field, in this paper we propose research on the detection of derogatory comments against women with the help of transformer-based models. Here, our main focus is to detect misogynistic comments, as the women of our country are mainly harassed through such texts. This paper aims to make a comparative study of how efficient transformer models are in detecting gender-biased slander in languages such as English and Bengali. To carry out this research, we used datasets in English and Bengali, on which the following transformer models were trained: BanglaBERT, XLM-RoBERTa, m-BERT, and DistilBERT. To enrich the paper further, the Bengali and English datasets were created by combining multiple different datasets in these languages. The datasets were extracted from various papers related to this or a similar field of research to help reduce biases and improve language understanding capability. Upon training the mentioned models on our datasets, Bangla-BERT-Base performed the best on the Bengali dataset with an F1 score of 94%, and m-BERT scored the best on the English dataset with an F1 score of 86.1%. In addition, since the paper mostly focuses on the Bengali language, it will encourage further research on low-resourced languages.

Sara Jerin Prithila, Fariha Hasan Tonima, Tahsina Tajrim Oishi, Md. Nazrul Islam, Ehsanur Rahman Rhythm, Annajiat Alim Rasel | COMNETSAT | 2023
A Comparative Analysis of Customer Service Chatbots: Efficiency, Usability and Application

A computer programme that imitates and processes human interaction, through either voice or text communication, is known as a chatbot. Its purpose is to assist in finding a solution to a problem. The transformation brought on by advances in technology has had an effect on every industry. Chatbots provide assistance with a wide variety of tasks, including reservations, customer service, and a great number of other services. The fast development of technologies relating to artificial intelligence and natural language processing has resulted in an increase in the use of chatbots in a variety of fields, most notably in the field of customer service. Through chatbots, customers can receive advice that is prompt, accurate, and personalised, which has the potential to completely transform customer service. Because chatbots can automate customer service and reduce the amount of work that needs to be done by humans, they have gained a lot of popularity in the business world and can help businesses improve the experience they provide for their customers. The purpose of this research is to undertake a comparative review of customer service chatbots, with a particular emphasis on their efficiency, usability, and application across a variety of business sectors. The research will uncover best practices, difficulties, and potential for improvement by analysing a variety of chatbot solutions.

Kefaiat Lamia Ehsani, Ehsanur Rahman Rhythm, Md Humaion Kabir Mehedi, Annajiat Alim Rasel | CATS | 2023
Text-based Q&A: Automated Question Generation and Answering for Enhanced Data Processing

In the age of information overload, where the world is getting increasingly digital, traditional methods of learning are becoming tedious and outdated. Imagine a system that, with just a few clicks, can quickly generate perplexing questions and enlightening solutions from a given text. This paper presents a groundbreaking system that uses state-of-the-art natural language processing techniques to analyze subject-specific chapters and create questions and corresponding solutions of varying lengths. The system’s versatility as an ideal tool for a wide range of users, including students, researchers, and educators, is a result of its capability to handle a wide range of domains. By offering insightful questions along with appropriate answers, this system demonstrated its extraordinary accuracy and competency in our studies on an array of informational datasets. Altering the learning process and promoting knowledge discovery, this program is flexible in delivering brief or comprehensible solutions to inquiries, and has the potential to completely change how individuals interact with written material, whether they are reading for quick reference or conducting in-depth research. Our suggested framework establishes a dynamic platform for immediate information that enables people to learn substantially more and comprehend any topic in depth. With this leading-edge method, bid goodbye to exhausting manual question generation and get ready to embrace a new era of seamless and fruitful learning.

Ehsanur Rahman Rhythm, Abdul Halim Hosain, Nusrat Zaman Raya, Kazi Al Refat Pranta, Tonusree Talukder Trina, Md Sabbir Hossain, Md Humaion Kabir Mehedi, Annajiat Alim Rasel | CATS | 2023
Advancements in Optical Character Recognition for Bangla Scripts

Optical Character Recognition (OCR) systems are very powerful tools that are used to convert handwritten texts or digital data on an image to machine-readable text. The importance of Optical Character Recognition for handwritten documents cannot be overstated due to its widespread use in human transactions. OCR technology allows for the conversion of various types of documents or images into machine-understandable data that can be analyzed, edited, and searched. In earlier years, manually crafted feature extraction techniques were used on comparatively small datasets, which were not good enough for practical use. With the advent of deep learning, it became possible to perform OCR tasks more efficiently and accurately than ever before. In this paper, several OCR techniques have been reviewed. We mostly reviewed works on Bangla scripts and also gave an overview of contemporary works and recent progress in OCR technology (e.g. TrOCR, transformer with CNN). It was found that for Bangla handwritten texts, CNN models like DenseNet121, ResNet50, MobileNet, etc. are the commonly adopted techniques because of their state-of-the-art performance in object recognition tasks. Using an RNN layer like LSTM or GRU alongside the base CNN-based architecture, the accuracy can be further improved. TrOCR is a fairly new technique in this field that shows promise. Experimental results show that on the synthetic IAM handwriting dataset it achieved a Character Error Rate (CER) of 2.89. The goal of this paper is to provide a summary of the research conducted on character recognition of handwritten documents in Bangla scripts and suggest future research directions.

Md Tanjim Mostafa, Ehsanur Rahman Rhythm, Md Humaion Kabir Mehedi, Annajiat Alim Rasel | ASYU | 2023
Large Scale Web Crawling and Distributed Search Engines: Techniques, Challenges, Current Trends, and Future Prospects

The heart of any substantial search engine is a crawler. A crawler is a program that collects web pages by following links from one web page to the next. Due to our complete dependence on search engines for finding information and insights into every aspect of human endeavors, from finding cat videos to the deep mysteries of the universe, we tend to overlook the enormous complexities of today’s search engines powered by the web crawlers to index and aggregate everything found on the internet. The sheer scale and technological innovation that enabled the vast body of knowledge on the internet to be indexed and easily accessible upon queries is constantly evolving. In this paper, we look at the current state of the massive apparatus of crawling the internet, specifically focusing on deep web crawling, given the explosion of information behind an interface that cannot be extracted from raw text. We also explore distributed search engines and the way forward for finding information in the age of large language models like ChatGPT or Bard. Our primary goal is to explore the junction of large-scale web crawling and search engines in an integrative approach to identify the emerging challenges and scopes in massive data where recent advancements in AI upend traditional means of information retrieval. Finally, we present the design of a new asynchronous crawler that can extract information from any domain into a structured format.

Asadullah Al Galib, Md Humaion Kabir Mehedi, Ehsanur Rahman Rhythm, Annajiat Alim Rasel | ICOCI | 2023
Sentiment Analysis of Restaurant Reviews from Bangladeshi Food Delivery Apps

In this study, we conducted sentiment analysis on restaurant reviews from Bangladeshi food delivery apps using natural language processing techniques. Food delivery apps have become increasingly popular in Bangladesh, and understanding the sentiment of customer reviews can provide valuable insights for restaurant owners and food delivery app companies. In this research, we have created a dataset named “Bangladeshi Restaurant Reviews” by gathering customer reviews of restaurants available on Foodpanda and Hungrynaki, which are two popular food delivery apps in Bangladesh. We used Robustly Optimized BERT Pretraining Approach (RoBERTa), AFINN, and DistilBERT, a distilled version of Bidirectional Encoder Representations from Transformers (BERT), to perform the sentiment analysis. Overall, this research paper highlights the importance of sentiment analysis in the food delivery industry and demonstrates the effectiveness of different models in performing this task. It also provides insights for businesses looking to use sentiment analysis to improve their services and products. The accuracies of the evaluated models, RoBERTa, AFINN, and DistilBERT, were 74%, 73%, and 77%, respectively.

Ehsanur Rahman Rhythm, Rajvir Ahmed Shuvo, Md Sabbir Hossain, Md. Farhadul Islam, Annajiat Alim Rasel | ESCI | 2023
Bengali Speech Recognition: An Overview

This study outlines the notable efforts to create automatic speech recognition (ASR) systems for Bengali. It describes data from the Bengali language's existing voice corpora and the major reports that have contributed to the recent research scenario. It provides an overview of the datasets or corpora that have been created for Bengali ASR, the challenges faced in creating Bengali ASR, and the techniques used to build Bengali ASR systems. ASR techniques for the Bengali language have made significant progress in recent years. Our article covers studies from 2016 through 2020. We examined the results of these investigations, as well as the strategies used to accomplish this goal, for automated speech recognition. We have examined these publications to get a sense of the present state of Bengali ASR. Across these studies, we have observed a lack of sufficient datasets, which are important for any automated system. Due to the language's abundance of consonant clusters, the Machine Learning (ML) system has difficulty interpreting Bengali words. As a result of these modifications, the system now confronts a new set of difficulties in terms of effectiveness and efficiency. Additionally, numerous words have nearly identical pronunciations. These are only some of the issues that the papers we examined face. This research makes use of a variety of techniques, including linear prediction coding, Mel Frequency Cepstral Coefficient, Hidden Markov Model, Neural Network, and Fuzzy logic. Bengali ASR will require further investigation shortly. While recent research is encouraging, ASR of other languages, such as English, is far from perfect and efficient.

Mashuk Arefin Pranjol, Farhin Rahman, Ehsanur Rahman Rhythm, Rajvir Ahmed Shuvo, Tanjib Ahmed, Bushra Yesmeen Anika, Md. Abdullah Al Masum Anas, Jahidul Hasan, Saiadul Arfain, Shadab Iqbal, Md Humaion Kabir Mehedi, Annajiat Alim Rasel | IICAIET | 2022

Preprints (1)

Title | Authors | Venue | Year
Distributed Computing for Big Data Analytics: Challenges and Opportunities

This paper explores the application of distributed computing systems for the processing and analysis of large data sets, also referred to as big data. The paper outlines the various challenges that can arise when working with big data, including issues with data storage, data processing, and data management. The report also explores the opportunities distributed computing systems present for overcoming these challenges and enabling efficient and effective big data analytics. Overall, the paper provides a detailed overview of distributed computing for big data analytics and offers insights into the potential benefits and drawbacks of using these systems for big data analysis.

Ehsanur Rahman Rhythm, Rajvir Ahmed Shuvo, Md Humaion Kabir Mehedi, Md Sabbir Hossain, Annajiat Alim Rasel | ResearchGate | 2022

The * (asterisk) denotes equal contribution by both authors to the research.