Tonic.ai launched a secure data lakehouse for LLMs, Tonic Textual, to enable AI developers to securely leverage unstructured data for retrieval-augmented generation (RAG) systems and large language model (LLM) fine-tuning. Tonic Textual is a data platform designed to eliminate integration and privacy challenges ahead of RAG ingestion or LLM training bottlenecks. Leveraging its expertise in data management and realistic synthesis, Tonic.ai has developed a solution to tame and protect siloed, messy, and complex unstructured data into AI-ready formats ahead of embedding, fine-tuning, or vector database ingestion. With Tonic Textual: 

  1. Build, schedule, and automate unstructured data pipelines that extract and transform data into a standardized format convenient for embedding, ingesting into a vector database, or pre-training and fine-tuning LLMs. Textual supports TXT, PDF, CSV, TIFF, JPG, PNG, JSON, DOCX and XLSX out-of-the-box.
  2. Detect, classify, and redact sensitive information in unstructured data, and re-seed redactions with synthetic data to maintain the semantic meaning. Textual leverages proprietary named entity recognition (NER) models trained on a diverse data set spanning domains, formats, and contexts to ensure sensitive data is identified and protected.
  3. Enrich your vector database with document metadata and contextual entity tags to improve retrieval speed and context relevance in RAG systems.

https://www.tonic.ai/textual