PVTHust

🏠 Working from home

Viet Tien Pham PVTHust

3 followers · 22 following

Ha Noi

Pinned

Speech_project_Vin Public

Multimodal speech emotion recognition using ViT (AST) as the audio encoder and Multiscale Attention Net (MANet) as the visual encoder

Python 3 2

project_NLP_final Public

A group project in the Vin program: Modality Balance for Multimodal Conversational Emotion Recognition

Python 1 1

HySonLab/LightMed Public

Light-weight Medical Image Segmentation

Python 3

ichigo Public

Forked from homebrewltd/ichigo

Llama3.1 learns to Listen

Python

LLaMA-Omni Public

Forked from ictnlp/LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python

mini-omni Public

Forked from gpt-omni/mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python