Edge 431: Meet the Multimodal State Space Models

Extending SSMs beyond language.


The efficiencies of state space models (SSMs) were initially positioned as an alternative to transformer-based LLMs. A constant question in that space is whether SSMs can scale to other modalities. That is the goal of a novel SSM model known as Cobra (you know, we need to keep the snake names coming 😊). In recent years, multimodal large language models (MLLMs) have seen significant advancements in various fields.

These models often rely on the well-known Transformer architecture, which, despite its popularity, suffers from quadratic computational complexity. To address this inefficiency, Cobra has been introduced as a solution with linear computational complexity, achieved by extending the efficient Mamba language model to the visual modality.
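
To make the idea concrete, here is a minimal, hedged sketch (not Cobra's actual code) of the general multimodal pattern described above: image features from a vision encoder are projected into the language model's embedding space and prepended to the text tokens, and the combined sequence is processed by a linear-time SSM-style backbone. All class and parameter names (`ToyMultimodalSSM`, `SimpleSSMBlock`, `projector`, etc.) are illustrative placeholders, and the SSM block is a simplified stand-in for a real Mamba layer.

```python
import torch
import torch.nn as nn


class SimpleSSMBlock(nn.Module):
    """Toy linear-time recurrent block standing in for a Mamba layer."""

    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # per-channel state decay
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); a single pass over the sequence -> O(L) cost.
        u = self.in_proj(x)
        state = torch.zeros(x.size(0), x.size(2), device=x.device)
        outputs = []
        for t in range(x.size(1)):
            state = self.decay * state + u[:, t]  # simple linear recurrence
            outputs.append(state)
        return self.out_proj(torch.stack(outputs, dim=1))


class ToyMultimodalSSM(nn.Module):
    """Vision features -> projector -> SSM language backbone -> token logits."""

    def __init__(self, vision_dim=512, lm_dim=256, vocab_size=32000, depth=2):
        super().__init__()
        self.projector = nn.Linear(vision_dim, lm_dim)  # align visual features with LM space
        self.embed = nn.Embedding(vocab_size, lm_dim)
        self.backbone = nn.Sequential(*[SimpleSSMBlock(lm_dim) for _ in range(depth)])
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, image_feats: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_patches, vision_dim) from a (typically frozen) vision encoder
        # text_ids:    (batch, text_len) token ids of the prompt
        visual_tokens = self.projector(image_feats)
        text_tokens = self.embed(text_ids)
        sequence = torch.cat([visual_tokens, text_tokens], dim=1)  # prepend visual tokens
        hidden = self.backbone(sequence)
        return self.lm_head(hidden)  # next-token logits over the full multimodal sequence


if __name__ == "__main__":
    model = ToyMultimodalSSM()
    logits = model(torch.randn(1, 16, 512), torch.randint(0, 32000, (1, 8)))
    print(logits.shape)  # torch.Size([1, 24, 32000])
```

The key design point the sketch illustrates is that the sequence mixer scans the tokens once, so cost grows linearly with the number of visual plus text tokens, unlike the quadratic attention of a Transformer backbone.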
