10. Getting Started#
10.1. Installation#
You can install the latest version of StructSense directly from PyPI using pip:
pip install structsense
Alternatively, you can install the latest version of StructSense from the source code on GitHub:
git clone https://github.com/sensein/structsense.git
cd structsense
pip install -e .
Or, if you prefer not to install anything, use StructSense directly from the BrainKB website at https://beta.brainkb.org/.
10.1.1. Python Version#
StructSense supports Python >=3.10,<3.13.
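A quick way to confirm your interpreter falls inside the supported range is a standard-library one-liner (this check is a convenience sketch, not part of StructSense itself):

```shell
# Fails with an AssertionError if the active interpreter is outside >=3.10,<3.13
python3 -c 'import sys; assert (3, 10) <= sys.version_info[:2] < (3, 13), f"Unsupported: {sys.version}"'
```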
10.2. Requirements#
10.2.1. PDF Extraction with Grobid#
StructSense supports PDF extraction using Grobid (default) or an external API service.
10.2.1.1. Default: Grobid#
By default, StructSense uses Grobid for PDF extraction. You can install and run Grobid either with Docker or in a non-Docker setup.
We recommend using Docker for easier setup and dependency management.
10.2.1.1.1. Run Grobid with Docker#
docker pull lfoppiano/grobid:0.8.0
docker run --init -p 8070:8070 -e JAVA_OPTS="-XX:+UseZGC" lfoppiano/grobid:0.8.0
Note: The JAVA_OPTS="-XX:+UseZGC" flag helps prevent a macOS-specific error.
10.2.1.2. Alternative: Remote Service (e.g., Remote Grobid)#
If you prefer to use a remote service, set the environment variable as follows:
export GROBID_SERVER_URL_OR_EXTERNAL_SERVICE=http://your-remote-grobid:PORT
10.2.2. External PDF Extraction API#
If you are using a non-Grobid extraction API, set both of the following variables:
export GROBID_SERVER_URL_OR_EXTERNAL_SERVICE=https://api.SOMEAPIENDPOINT.com/api/extract
export EXTERNAL_PDF_EXTRACTION_SERVICE=True
The external API is assumed public (no auth) for now.
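These variables can also live in the env file that is later passed to the CLI; a minimal sketch (the endpoint URL is a placeholder, not a real service):

```shell
# .env — PDF extraction via an external (non-Grobid) API
GROBID_SERVER_URL_OR_EXTERNAL_SERVICE=https://api.SOMEAPIENDPOINT.com/api/extract
EXTERNAL_PDF_EXTRACTION_SERVICE=True
```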
10.2.3. LLM#
10.2.3.1. LLM for Agents#
We use OpenRouter to access models such as GPT for the agents. Ollama can serve as a substitute for OpenRouter when you work with open-source models such as Llama.
10.2.3.2. Embedding configuration#
In our default setup, Ollama is used for embedding generation. You can also use other models via OpenRouter for this purpose.
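If you take the Ollama route for either role, the models must be pulled locally before StructSense can use them; the model names below are illustrative choices, not StructSense requirements:

```shell
# Pull an open-source chat model for the agents (example choice)
ollama pull llama3.1

# Pull an embedding model for embedding generation (example choice)
ollama pull nomic-embed-text
```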
10.3. Running#
10.3.1. Using OpenRouter#
structsense-cli extract \
--source somefile.pdf \
--api_key <YOUR_API_KEY> \
--config someconfig.yaml \
--env_file .env \
--save_file result.json # optional
10.3.2. Using Ollama (Local)#
structsense-cli extract \
--source somefile.pdf \
--config someconfig.yaml \
--env_file .env_file \
--save_file result.json # optional
10.3.3. Chunking#
Disabled by default. Enable with:
--chunking True
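Combining the flags shown above, a full invocation with chunking enabled might look like the following (file names and the API key are placeholders):

```shell
structsense-cli extract \
  --source somefile.pdf \
  --api_key <YOUR_API_KEY> \
  --config someconfig.yaml \
  --env_file .env \
  --chunking True \
  --save_file result.json
```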
10.4. Using Docker Compose#
The docker/ directory contains Docker Compose files for running the following components:
Grobid – for PDF extraction
Weaviate – the vector database in the StructSense architecture; it stores the ontology and thus serves as the ontology database
These Compose files let you quickly stand up a complete local StructSense stack. If you prefer not to install dependencies system-wide, the containerized setup isolates each service and keeps environment management minimal.
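A typical workflow, assuming the Compose files in docker/ follow the standard docker-compose.yml naming:

```shell
# Start Grobid and Weaviate in the background
cd docker
docker compose up -d

# Check that both services are running
docker compose ps

# Tear the stack down when finished
docker compose down
```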