Amanpreet Kaur Pawa

CAPSTONE PROJECT

Real-time Translation of Indian Sign Language to Hindi/Kannada

Final Year Capstone Project @ PES University with 2 Publications

Overview

Indian Sign Language (ISL) is a visual-gestural language used by the deaf community in India, with its own grammar and syntax. This project addresses the communication gap faced by the deaf and mute community through an application that translates ISL into Hindi and Kannada text in real time, making it accessible to non-English speakers.

The system uses MediaPipe for video preprocessing and implements CNN, RNN-LSTM, and Transformer-Encoder models, with the LSTM and Transformer models reaching 95-97% training accuracy. An ensemble model combines their strengths, and the output text is further processed into coherent Hindi or Kannada sentences.

This affordable, portable solution tackles the lack of interpreters in India, enabling better communication for the hearing-impaired population.

The project was undertaken as a capstone initiative at PES University in collaboration with Attili Subha Vidisha, Anubuthi Kottapalli, and Anshula Aithal.

Roles and Responsibilities

Researcher
Data Analyst
AI Developer

Tools used

Python
Google Colab Notebook
VS Code

Project Context

Final Year Capstone project
@ PES University, Bengaluru
Under mentorship of Dr. Ashwini M Joshi

ISL Translation Model Results

CNN: 75% training accuracy
RNN-LSTM: 83.1% validation accuracy
Transformer-Encoder: 87% testing accuracy

Publications

“Analysis of Vision based Techniques for the Translation of Indian Sign Language”
- Presented @ International Conference of Engineering and Technology (ICET 2023)
- Published in International Journal on Recent and Innovation Trends in Computing and Communication
“Real-time Translation of Indian Sign Language to Hindi and Kannada”
- Presented @ ICC Robins 2024 (International Conference on Cognitive Robotics and Intelligent Systems)
- Available on IEEE Xplore Digital Library

Defining the Problem

Deaf and Mute Population in India:

The WHO estimates that approximately 63 million Indians have a hearing disability, yet India has only 300 certified interpreters, which severely limits communication between non-verbal individuals and the rest of the population.

Existing Solutions:

Widespread solutions available for American Sign Language unfortunately do not extend to Indian Sign Language because ASL requires only the hands, while ISL combines hands, facial expressions and body language to deliver contextual meaning.

Implementation

Data Acquisition:

A dataset of 4292 videos was obtained from zenodo.org. Each category contained multiple words, and each word was signed multiple times by 3 or 4 different signers. The dataset did not include the verbs needed for basic conversation, so these had to be recorded. We selected 48 of the most commonly used verbs (Fall, Run, Sleep, Break, Smell, Think, Suggest, Walk, Want, Watch, etc.) and recorded 9 videos for each of the 48 signs, adding a total of 432 videos to the obtained dataset.

Pre-processing:

Frames were generated from the input video and then annotated using the MediaPipe framework, which assigns coordinates to 21 landmarks on each hand and 33 pose landmarks. With three coordinates (x, y, z) per landmark, this gives (21 x 2 + 33) x 3 = 225 values per frame.

Frames captured from the dataset

Annotated Frames using Mediapipe
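
As an illustration of the per-frame feature layout described above, here is a minimal sketch that flattens one frame's landmarks into a 225-value vector. It is not the project's actual code: the function names, the pose/left hand/right hand ordering, and the zero-filling of undetected hands are all assumptions made for the example, and landmarks are passed in as plain (x, y, z) tuples rather than MediaPipe result objects.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) for a single landmark

POSE_POINTS = 33  # pose landmarks per frame
HAND_POINTS = 21  # landmarks per hand


def _flatten(points: Optional[Sequence[Point]], expected: int) -> List[float]:
    """Flatten one landmark group into [x0, y0, z0, x1, y1, z1, ...]."""
    if points is None:
        # Assumption: a group that was not detected is filled with zeros
        # so that every frame has the same length.
        return [0.0] * (expected * 3)
    if len(points) != expected:
        raise ValueError(f"expected {expected} landmarks, got {len(points)}")
    return [float(value) for point in points for value in point]


def frame_to_features(
    pose: Optional[Sequence[Point]],
    left_hand: Optional[Sequence[Point]],
    right_hand: Optional[Sequence[Point]],
) -> List[float]:
    """Build one feature vector: (33 + 21 + 21) landmarks x 3 = 225 values."""
    return (
        _flatten(pose, POSE_POINTS)
        + _flatten(left_hand, HAND_POINTS)
        + _flatten(right_hand, HAND_POINTS)
    )
```

A video then becomes a sequence of such vectors, one per frame, which is the kind of input a sequence model such as an LSTM or Transformer encoder expects.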

SLR Model Training and Evaluation:

All models were built with TensorFlow and Keras, which provide layers such as LSTM, Dense, Dropout, Conv3D, MaxPooling1D and Flatten.

CNN: Trained for over 300 epochs and reached 75% training accuracy, but performed poorly on unseen data, with testing accuracy of 16-21% and validation accuracy of 15-16%. Despite extensive training, the model did not generalize.
RNN-LSTM: Achieved 95-97% training accuracy and 82-85% testing accuracy after 150-200 epochs. It also performed well on the validation dataset (83.1%).
Transformer Encoder: Trained for 30-40 epochs, reaching 97% training accuracy and 87% testing accuracy. The model performed consistently on both the training and validation datasets.
Ensemble Model: Combines the LSTM and Transformer models. For each gloss, it compares the probabilities output by the two models and selects the label predicted with higher confidence.
DeepTranslate Modules: Translated the final output sequence from English to Hindi and Kannada using two modules, each containing Deep Translate layers, to produce coherent, meaningful sentences from the derived labels. In the future, this could be extended to other languages, in particular all Indian languages, so that the system can serve users across the nation.
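
The ensemble rule described in the list above (pick the label from whichever model is more confident) can be sketched as follows. This is a simplified illustration, not the project's implementation: the function name, the plain-list probability format, and the choice to favor the LSTM on ties are assumptions.

```python
from typing import Sequence, Tuple


def ensemble_predict(
    lstm_probs: Sequence[float],
    transformer_probs: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, float, str]:
    """Return (label, confidence, source model) for one gloss.

    Each model contributes its top-scoring label; the ensemble keeps the
    one predicted with the higher probability.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    if not (len(lstm_probs) == len(transformer_probs) == len(labels)):
        raise ValueError("probability vectors and labels must have equal length")

    indices = range(len(labels))
    lstm_best = max(indices, key=lambda i: lstm_probs[i])
    transformer_best = max(indices, key=lambda i: transformer_probs[i])

    # Assumption: on an exact tie the LSTM prediction is kept.
    if transformer_probs[transformer_best] > lstm_probs[lstm_best]:
        return labels[transformer_best], transformer_probs[transformer_best], "transformer"
    return labels[lstm_best], lstm_probs[lstm_best], "lstm"


if __name__ == "__main__":
    glosses = ["run", "sleep", "walk"]
    print(ensemble_predict([0.2, 0.7, 0.1], [0.05, 0.05, 0.9], glosses))
```

Comparing raw probabilities like this assumes both models are similarly calibrated; if one model is systematically overconfident, its predictions would dominate the ensemble.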

Usability & Accuracy Considerations

Throughout the project, several design decisions were made with user experience, accessibility and ethics in mind.

Removal of Z-coordinate: Since 2D image frames were used, the Z-coordinate was excluded to avoid inaccuracies and reduce unnecessary computational load.
Bias Mitigation: MediaPipe was chosen for its neutrality. Because the models consume landmark coordinates rather than raw pixel colors, it avoids the skin tone bias that affects traditional computer vision approaches, which matters for diverse Indian skin tones.
Multilingual Output: The system offers multilingual support (Hindi and Kannada) to cater to non-English speaking users. English serves as an intermediate language for translation, with potential to extend to all local Indian languages.
Minimal Latency: To keep the conversation flowing, latency was kept to a minimum through stream processing using Flask, so that users interact with the system in real time.
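
To make the Z-coordinate removal concrete, the sketch below drops every third value from a flattened landmark vector. It assumes the values are stored interleaved as x, y, z per landmark; the function name and that layout are illustrative assumptions, not details confirmed by the project.

```python
from typing import List, Sequence


def drop_z(features: Sequence[float]) -> List[float]:
    """Remove the z value of every landmark from an [x, y, z, x, y, z, ...] vector."""
    if len(features) % 3 != 0:
        raise ValueError("feature vector length must be a multiple of 3")
    # Positions 2, 5, 8, ... hold z values under the assumed layout.
    return [value for index, value in enumerate(features) if index % 3 != 2]
```

Applied to the 75 landmarks of one frame, this shrinks the input from 225 to 150 values, a one-third reduction in the size of each frame's feature vector.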

Application Design

The design aims for a seamless user experience. The screen displays the camera input for real-time sign language recognition and, alongside it, the sentence currently being translated, so users can follow the translation as it happens. Users can scroll back through previous sentences, giving them control over the conversation flow. Because recognition is computationally heavy, the video input stream can be paused, letting users decide when input is processed and when to take a break from recognition. Only a single screen was developed, as the project focused on implementing the core concept rather than a full-scale design.

Challenges & Limitations

Several challenges fell outside the scope of this project, resulting in unresolved issues:

Contextual Limitations: The system processes individual sentences without tracking conversation context, leading to potential inaccuracies when coherence depends on prior sentences.
Sign Language Variations: Differences in regional and institutional sign language practices, such as varied symbols for punctuation, were not addressed, leading to potential ambiguities.
Dataset Constraints: The limited and non-diverse dataset impacted the system's ability to generalize across different users, regional variations, and linguistic contexts.

These limitations highlight areas for improvement and opportunities for future iterations of the project.

Future Work

Several areas have been identified for future improvement to enhance the system's performance and usability:

Context Awareness: Implement methods to track conversational context, ensuring coherence and continuity across multiple sentences.

Optimized Efficiency: Explore resource optimization techniques to reduce computational intensity, enabling real-time scalability.

Expanded Dataset: Build a more diverse and expansive dataset to improve generalization across varied users, regions, and linguistic contexts.

Sign Language Variations: Incorporate regional and institutional differences in sign language practices to accommodate diverse user needs.

Conference Presentations & Publications

Real-time Sign Language Translation using Computer Vision and Machine Learning

I presented my paper at ICC Robins 2024 (International Conference on Cognitive Robotics and Intelligent Systems), and it is available on the IEEE Xplore Digital Library. The paper documented our implementation, advancing research in the field by addressing existing gaps and furthering the practical application of the concept.

Analysis of Vision based Techniques for the Translation of Indian Sign Language

I presented my paper at the International Conference of Engineering and Technology (ICET 2023), and it was subsequently published in the International Journal on Recent and Innovation Trends in Computing and Communication. The paper is a comprehensive literature survey documenting advancements in Indian Sign Language (ISL) recognition, covering both hardware-based and software-based approaches for app development.

Contact me

akpawa@utexas.edu

Let's Connect

Resume

Made with 🫶🏻
