GeneChat is a newly developed multi-modal large language model that predicts gene function by generating free-form, natural language descriptions directly from nucleotide sequences and textual prompts. It combines three components: a DNABERT-2-based gene encoder optimized for capturing long-range genomic context, an adaptor that aligns gene representations with the input space of a large language model, and Vicuna-13B, a fine-tuned variant of LLaMA-2, which produces coherent functional narratives. The model aims to bridge the gap between biological sequence models and user-friendly conversational interfaces for genomics research. GeneChat builds on advances in multimodal large language models that combine protein sequence-structure representations with natural language understanding, work that could advance protein function prediction beyond traditional protein language models. Related tools include ChatNT, a conversational agent trained on a dataset spanning 27 biological functions and species and capable of optimizing multiple tasks simultaneously in conversational English, and CellNEST, a method that uses graph attention networks and contrastive learning to analyze cell-cell communication patterns in spatial transcriptomics.
GeneChat integrates a DNABERT-2-based gene encoder optimized for long-range genomic context, an adaptor that aligns gene representations with the input space of a large language model, and Vicuna-13B, a fine-tuned LLaMA-2 variant used to produce coherent functional narratives.
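The encoder, adaptor, and language model pipeline described above can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration, not GeneChat's actual code: `toy_gene_encoder` is a hypothetical stand-in for the DNABERT-2-based encoder (it only counts nucleotide frequencies per window), `adaptor` is assumed to be a single linear projection, and `build_llm_input` assumes the projected gene embeddings are simply placed in front of the prompt token embeddings before being passed to the language model.

```python
from typing import List

Vector = List[float]

NUCLEOTIDES = "ACGT"


def toy_gene_encoder(sequence: str, window: int = 4) -> List[Vector]:
    """Hypothetical stand-in for the gene encoder.

    Splits the sequence into fixed-size windows and returns one 4-dim
    vector of nucleotide frequencies per window. A real encoder would
    return learned contextual embeddings instead.
    """
    vectors = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        vectors.append([chunk.count(n) / len(chunk) for n in NUCLEOTIDES])
    return vectors


def adaptor(gene_vectors: List[Vector],
            weight: List[Vector],
            bias: Vector) -> List[Vector]:
    """Assumed linear projection from encoder space to LLM embedding space.

    `weight` has one row per output dimension; each row has one entry
    per encoder dimension.
    """
    return [
        [sum(w * x for w, x in zip(row, vec)) + b
         for row, b in zip(weight, bias)]
        for vec in gene_vectors
    ]


def build_llm_input(gene_embeddings: List[Vector],
                    prompt_embeddings: List[Vector]) -> List[Vector]:
    """Assumed fusion step: gene embeddings first, then prompt embeddings."""
    return gene_embeddings + prompt_embeddings


# Toy dimensions: encoder output is 4-dim, the (hypothetical) LLM uses 3-dim embeddings.
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
bias = [0.0, 0.0, 0.0]

gene_vectors = toy_gene_encoder("ACGTAACC")
projected = adaptor(gene_vectors, weight, bias)

# Placeholder embeddings for a two-token text prompt.
prompt_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
llm_input = build_llm_input(projected, prompt_embeddings)
```

The point of the sketch is the shape bookkeeping: the adaptor changes the width of each gene vector so that it matches the language model's embedding width, which is what lets gene-derived vectors and text-derived vectors sit in the same input sequence.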
GeneChat is a multi-modal large language model designed to generate free-form, natural language descriptions of gene functions directly from nucleotide sequences and textual prompts.
GeneChat: A Multi-Modal Large Language Model for Gene Function Prediction https://t.co/PuUrQq0Y4q https://t.co/cZfJaf5HKD