"Think, Then Answer": Developing Kanana-v-4b-hybrid, Kakao's Hybrid Multimodal Language Model - tech.kakao.com
This article describes how Kakao developed Kanana-v-4b-hybrid, a hybrid multimodal language model. The team set out to build a model that understands multiple modalities, including text, images, and voice, and that reasons about its inputs instead of merely describing them. User feedback showed demand for an AI that can comprehend and verify information on its own, which led to a hybrid design: fast, intuitive responses for straightforward queries and logical reasoning for complex problems. The model scored 92.8 on the KoNET evaluation, which the article cites as evidence of its Korean-language capability. Key goals include improving performance on multi-image and video inputs, supporting external function calls for more sophisticated tasks, and having the model decide automatically when to use reasoning versus non-reasoning mode. The article also stresses the importance of keeping performance consistent while managing operational costs across applications.
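To make the reasoning versus non-reasoning split concrete, here is a minimal sketch of a mode selector placed in front of a hybrid model. It is not Kakao's method: the article describes automatic mode selection only as a goal. The `Request` type, the `choose_mode` function, the cue list, and the word-count threshold are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical cue phrases that suggest a query needs multi-step reasoning.
# A production system would more likely learn this decision than hard-code it.
REASONING_CUES = ("why", "prove", "calculate", "compare", "step by step", "how many")


@dataclass
class Request:
    text: str            # the user's question
    num_images: int = 0  # number of attached images


def choose_mode(req: Request, max_direct_words: int = 12) -> str:
    """Return "reasoning" for queries that look complex, otherwise "direct"."""
    text = req.text.lower()
    has_cue = any(cue in text for cue in REASONING_CUES)
    is_long = len(text.split()) > max_direct_words
    multi_image = req.num_images > 1
    return "reasoning" if (has_cue or is_long or multi_image) else "direct"


if __name__ == "__main__":
    print(choose_mode(Request("What is in this picture?", num_images=1)))
    print(choose_mode(Request("Calculate the total on this receipt", num_images=1)))
```

A rule-based selector like this is cheap to run, which matters for the operational-cost concern above, but it misclassifies any complex query that happens to lack a cue phrase. That limitation is one reason to prefer having the model itself make the decision.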