IBM has open-sourced part of the IBM Deep Search Experience as a new toolkit, Deep Search for Scientific Discovery (DS4SD), aimed at scientific researchers and businesses, with the goal of accelerating scientific discovery.

To help achieve this goal, we’re now publicly releasing a key component of the Deep Search Experience: our automatic document conversion service. It lets users upload documents interactively and inspect the quality of each conversion. DS4SD has a simple drag-and-drop interface, which makes it easy for non-experts to use. We’re also releasing deepsearch-toolkit, a Python package that lets users upload and convert documents programmatically and in bulk.
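
To give a feel for what converting "in bulk" involves on the client side, here is a minimal sketch that gathers the PDFs under a folder and hands them to a converter in fixed-size batches. It is an illustration only: `convert_batch` is a hypothetical placeholder for whatever upload-and-convert call you use, not a function from deepsearch-toolkit, and the batch size of 10 is an arbitrary assumption.

```python
from pathlib import Path
from typing import Callable, Iterator, List


def find_pdfs(root: Path) -> List[Path]:
    """Return every PDF under root (recursively), sorted for stable batches."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def batched(items: List[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def convert_in_bulk(
    root: Path,
    convert_batch: Callable[[List[Path]], None],  # hypothetical placeholder
    batch_size: int = 10,                         # arbitrary assumption
) -> int:
    """Pass every PDF under root to convert_batch, one batch at a time.

    Returns the number of files handed to the converter.
    """
    submitted = 0
    for batch in batched(find_pdfs(root), batch_size):
        convert_batch(batch)
        submitted += len(batch)
    return submitted
```

For a dry run, `convert_in_bulk(Path("papers"), print)` would print each batch of paths instead of uploading anything.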

Deep Search uses AI to collect, convert, curate, and ultimately search huge document collections for information that is too specific for common search tools to handle. It collects data from public, private, structured, and unstructured sources and uses state-of-the-art AI methods to convert PDF documents into an easily decipherable JSON format with a uniform schema, which is well suited to data science workflows. It then applies dedicated natural language processing and computer vision machine-learning algorithms to these documents and ultimately creates searchable knowledge graphs.
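
The step from uniform JSON to a searchable knowledge graph can be pictured with a toy example. The sketch below is not how Deep Search works internally: the JSON layout (`id`, `title`, `paragraphs`) is a made-up stand-in for the real schema, and a fixed keyword list replaces the NLP models. It only shows why a uniform schema makes the later steps simple: every document can be walked the same way, and the extracted terms become graph nodes linked to documents and to each other.

```python
import json
from collections import defaultdict
from itertools import combinations

# Hypothetical converted documents; the real DS4SD schema is different.
DOCS_JSON = """
[
  {"id": "doc-1", "title": "Battery materials",
   "paragraphs": ["Lithium cobalt oxide is a common cathode.",
                  "Graphite serves as the anode."]},
  {"id": "doc-2", "title": "Anode survey",
   "paragraphs": ["Silicon and graphite are both studied as anode materials."]}
]
"""

# Stand-in for an entity-recognition model: a fixed vocabulary of terms.
VOCABULARY = {"lithium cobalt oxide", "graphite", "silicon", "cathode", "anode"}


def extract_terms(text):
    """Return the vocabulary terms that occur in text (case-insensitive)."""
    lowered = text.lower()
    return {term for term in VOCABULARY if term in lowered}


def build_graph(docs):
    """Build two lookup tables from the converted documents.

    mentions: term -> ids of documents that mention it
    edges:    term -> terms that share a paragraph with it
    """
    mentions = defaultdict(set)
    edges = defaultdict(set)
    for doc in docs:
        for paragraph in doc["paragraphs"]:
            terms = extract_terms(paragraph)
            for term in terms:
                mentions[term].add(doc["id"])
            for a, b in combinations(sorted(terms), 2):
                edges[a].add(b)
                edges[b].add(a)
    return mentions, edges


docs = json.loads(DOCS_JSON)
mentions, edges = build_graph(docs)

print(sorted(mentions["graphite"]))  # documents that mention graphite
print(sorted(edges["anode"]))        # terms that co-occur with "anode"
```

With this sample data, the first query returns both documents and the second returns `graphite` and `silicon`, the kind of specific lookup that plain keyword search over raw PDFs handles poorly.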

https://research.ibm.com/blog/deep-search-toolkit