Created a multi-modal Large Language Model (LLM) capable of processing and integrating text, image, and audio inputs and generating output in text format.
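A minimal sketch of the fusion step such a model needs: each modality's features are projected into a shared embedding space and concatenated as a prefix token sequence for the language model. All dimensions and the projection scheme here are assumptions for illustration, not the project's actual architecture.

```python
import numpy as np

# Hypothetical feature sizes -- the real model's dimensions are not stated.
TEXT_DIM, IMAGE_DIM, AUDIO_DIM, SHARED_DIM = 32, 64, 48, 16

rng = np.random.default_rng(0)

# One (here random, normally learned) linear projection per modality maps
# its features into the shared space the language model consumes.
projections = {
    "text": rng.standard_normal((TEXT_DIM, SHARED_DIM)),
    "image": rng.standard_normal((IMAGE_DIM, SHARED_DIM)),
    "audio": rng.standard_normal((AUDIO_DIM, SHARED_DIM)),
}

def fuse(text_emb, image_emb, audio_emb):
    """Project each modality and stack the results as prefix tokens."""
    tokens = [
        text_emb @ projections["text"],
        image_emb @ projections["image"],
        audio_emb @ projections["audio"],
    ]
    return np.stack(tokens)  # shape: (3, SHARED_DIM)

fused = fuse(rng.standard_normal(TEXT_DIM),
             rng.standard_normal(IMAGE_DIM),
             rng.standard_normal(AUDIO_DIM))
print(fused.shape)
```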
Emotion recognition on the 'FRIENDS' audio dataset. Preprocessed the audio for outliers, data distribution, augmentation, etc., then extracted and visualized various audio features.
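Two of the classic audio features such a pipeline typically extracts are frame-wise RMS energy and zero-crossing rate. This sketch computes both from scratch on a synthetic sine wave standing in for a FRIENDS clip; the frame/hop sizes are assumed, and the project may have used different features entirely.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def rms_energy(frames):
    """Per-frame root-mean-square energy (loudness proxy)."""
    return np.sqrt((frames ** 2).mean(axis=1))

def zero_crossing_rate(frames):
    """Fraction of sample pairs per frame where the sign flips."""
    return (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)

# Synthetic 1-second, 16 kHz, 440 Hz tone in place of a real utterance.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

frames = frame_signal(signal)
print(rms_energy(frames)[0], zero_crossing_rate(frames)[0])
```

In practice a library such as librosa computes these (plus MFCCs, spectral features, etc.) directly, but writing them out shows what the features measure.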
Food recommendations based on the ingredients at hand, using a Graph Neural Network built from scratch and integrated into an MLOps pipeline on AWS.
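A from-scratch GNN usually reduces to a message-passing layer like the one below: symmetrically normalized neighbor aggregation followed by a linear map and ReLU (the GCN formulation). The tiny ingredient graph and all dimensions are assumptions for illustration, not the project's actual data.

```python
import numpy as np

# Toy ingredient co-occurrence graph (assumed structure).
# Nodes: 0=tomato, 1=basil, 2=pasta, 3=garlic
adj = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

def gcn_layer(A, H, W):
    """One graph-convolution step: A_norm @ H @ W with ReLU."""
    A_hat = A + np.eye(len(A))                # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

rng = np.random.default_rng(1)
H0 = rng.standard_normal((4, 8))  # initial ingredient feature vectors
W0 = rng.standard_normal((8, 4))  # layer weights (normally learned)
H1 = gcn_layer(adj, H0, W0)
print(H1.shape)
```

Stacking a few such layers gives each ingredient node an embedding informed by its neighborhood, which a recommender can then score against candidate recipes.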
Language identification and validation using an LSTM deep learning model; this work is part of my research paper 'An architecture of machine translation'.
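A character-level LSTM for language identification can be sketched as below: one-hot characters are fed through an LSTM cell step by step, and the final hidden state is mapped to language probabilities via softmax. The weights here are random (untrained) and all sizes are assumptions; the paper's actual architecture may differ.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gates stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    n = len(h)
    i = 1 / (1 + np.exp(-z[:n]))          # input gate
    f = 1 / (1 + np.exp(-z[n:2 * n]))     # forget gate
    g = np.tanh(z[2 * n:3 * n])           # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * n:]))      # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(2)
VOCAB, HIDDEN, N_LANGS = 26, 16, 3        # assumed sizes
W = rng.standard_normal((4 * HIDDEN, VOCAB)) * 0.1
U = rng.standard_normal((4 * HIDDEN, HIDDEN)) * 0.1
b = np.zeros(4 * HIDDEN)
W_out = rng.standard_normal((N_LANGS, HIDDEN)) * 0.1

def identify(word):
    """Run the word through the LSTM and return language probabilities."""
    h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
    for ch in word.lower():
        x = np.zeros(VOCAB)
        x[ord(ch) - ord('a')] = 1.0       # one-hot character encoding
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h
    return np.exp(logits) / np.exp(logits).sum()  # softmax

p = identify("hello")
print(p, p.sum())
```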