AVESA

(Audio-Visual Event Sentiment Analysis)

Contributors

License

The infrastructure part of AVESA is released under the MIT License.

Check Our Previous Work:

B. Karakaya, E.B. Boztepe, and B. Karasulu, “Development of a Deep Learning Based Model for Recognizing the Environmental Sounds in Videos,” in The SETSCI Conference Proceedings Book, vol. 5, no. 1, pp. 53-58, 2022.

Link

Backends (Used Frameworks and Tools)

Framework & Tool	Used For
Keras	Deep learning API
Tensorflow-io & Opencv-Python	Extracting spectrograms and applying data augmentation techniques such as multiple masking and CLAHE
h5py	Saving the trained deep learning model
Numpy & Pandas & Matplotlib	General purpose
Moviepy & Pydub	Applying operations to frames and videos
NLTK & Zeyrek & Jellyfish	Computing the similarity score
NLTK & Spacy	Applying NLP techniques
Vosk	Speech-to-text transcription of the video
BERT	Pre-trained models for sentiment analysis
Transformers	Applying BERT models to the text for sentiment analysis
Librosa	Audio processing, Gaussian noise, and some data augmentation techniques
Gradio	Building the GUI (Apache License 2.0)
Torch	Backend for the BERT and Vosk models

Gradio reference:

A. Abid, A. Abdalla, A. Abid, D. Khan, A. Alfozan, and J. Zou, “Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild,” ICML HILL 2019, arXiv preprint arXiv:1906.02569, 2019.

Methodological Flow Diagram of our AVESA system

How to Use AVESA:

To use this GUI, first download this repo using this link.

Download

You need to run the requierements.py file. After running it, we recommend restarting the kernel; otherwise, you might get errors related to Gradio. If a library fails to install, you can use requirements.txt to install the libraries individually.

Run this command in your IDE's console. The leading ! is IPython shell syntax, so drop it if you are using a regular terminal:
```
!python -m spacy download en
```
Run project.py. The output will give you a local IP address. Copy it and paste it into the address bar of your browser. Do not forget to check the current folder in your editor: it must be the folder where you downloaded the files from the Google Drive link.
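
The folder check described above can be automated before launching the GUI. The following is a minimal sketch, not part of the AVESA code: the helper name enter_project_dir is hypothetical, and the assumption that the project folder can be recognized by the presence of project.py is ours.

```python
import os
from pathlib import Path


def enter_project_dir(project_dir, required=("project.py",)):
    """Change into project_dir after confirming the expected files exist.

    `required` is an assumption: adjust it to the files you expect
    to find in your download folder.
    """
    root = Path(project_dir).expanduser().resolve()
    missing = [name for name in required if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    os.chdir(root)  # same effect as `cd` in the console
    return root
```

Calling enter_project_dir with the path of your download folder before running project.py raises an error early if you are in the wrong folder, instead of failing later inside the model.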

You can try the model on the web page that opens in your browser. Errors sometimes occur with files that are not MP4, so we suggest using .MP4 files as input.
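
If you are unsure whether a file is really an MP4, a quick check before uploading can save a failed run. The sketch below is only a heuristic and is not part of AVESA: it assumes the file has an .mp4 extension and that, as in most MP4 files, bytes 4 to 8 contain the box type ftyp. The function name looks_like_mp4 is hypothetical.

```python
from pathlib import Path


def looks_like_mp4(path):
    """Heuristic check: .mp4 extension plus an 'ftyp' box at the file start."""
    p = Path(path)
    if p.suffix.lower() != ".mp4":
        return False
    with p.open("rb") as f:
        header = f.read(12)
    # In a typical MP4 file, bytes 4..8 hold the box type b"ftyp".
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

A file that fails this check may still be playable, and a file that passes may still use a codec the model cannot read, so treat the result as a hint only.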

You can also start with the examples that were provided to you.

Known Issues and Fixing the Possible Errors

1) Make sure that your code runs in the same folder where the project files are. To check, you can use this code.
```
import os
print(os.getcwd())
```
If it is not the same folder, paste this line into the console or terminal of your IDE.
```
cd path\to\your\folder
```
2) If you encounter an error while waiting for the output of a video, or if you exit while the model is running, delete models/video_sum.h5 and Frames/name_of_the_video/ before running the model again. Otherwise you might get errors like these:
```
models/video_sum.h5 is running by other process or unable to create a file, the file models/video_sum.h5 already exists
```
You need to restart the kernel before you can delete this .h5 file. After restarting the kernel, delete the .h5 file and run the model again.
3) Keep in mind that the Frames folder has to be empty. It contains a folder only while the model is running. Before running the model again, or after the model gives an output, check that the Frames folder is empty.
4) The model cannot run if video_sum.h5 already exists in the models folder, so delete this .h5 file first. The file is created only while the model is running and should be deleted once the output is given. Check that it has actually been deleted.
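
The manual cleanup in items 2 to 4 can be scripted. This is a minimal sketch under the assumption that the folder layout is exactly the one named above (models/video_sum.h5 and a Frames folder inside the project folder); the helper name clean_previous_run is hypothetical and not part of AVESA.

```python
import shutil
from pathlib import Path


def clean_previous_run(project_dir="."):
    """Delete models/video_sum.h5 and empty the Frames folder.

    Returns the list of paths that were removed.
    """
    root = Path(project_dir)
    removed = []

    h5_file = root / "models" / "video_sum.h5"
    if h5_file.exists():
        h5_file.unlink()
        removed.append(h5_file)

    frames_dir = root / "Frames"
    if frames_dir.is_dir():
        # Remove the contents but keep the (now empty) Frames folder itself.
        for entry in frames_dir.iterdir():
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry)
    return removed
```

Run it after restarting the kernel: if the .h5 file is still held open by the old process, deleting it can fail with a permission error on some systems.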

Some Notes:

Extracting every frame and scoring each one by action rate takes a long time, and it takes longer as the video gets longer. For English, if the NER-generated video contains a lot of speech, transcribing it with the Vosk model takes a long time. For Turkish, transcription is quick, but the proposed algorithm for Turkish NER makes the process slow. So please be patient while the model is running.
You can also follow the progress, or see the cause of an error, in the PyCharm or Spyder console if you are using one of them.
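
As a rough illustration of why longer videos take longer, the number of frames to extract and score grows in proportion to the video length. The sketch below assumes a frame rate of 30 frames per second, which is our assumption and not a value taken from AVESA; the function name estimated_frames is hypothetical.

```python
import math


def estimated_frames(duration_seconds, fps):
    """Approximate number of frames in a video of the given length."""
    return math.ceil(duration_seconds * fps)


# Assumed frame rate of 30 fps; check your own video's frame rate.
for minutes in (1, 5, 10):
    print(minutes, "min ->", estimated_frames(minutes * 60, fps=30), "frames")
# 1 min -> 1800 frames
# 5 min -> 9000 frames
# 10 min -> 18000 frames
```

Every one of those frames is scored, so a video ten times longer means about ten times as many frames to score.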

Enjoy AVESA!

To learn more about the model, you can read the journal article about AVESA here: http://saucis.sakarya.edu.tr/tr/pub/issue/72246/1139765