10. Getting Started#

10.1. Installation#

You can install the latest version of StructSense directly from PyPI using pip:

pip install structsense

Alternatively, you can install StructSense from source on GitHub:

git clone https://github.com/sensein/structsense.git
cd structsense
pip install -e .

Or, if you prefer not to install anything, use StructSense directly from the BrainKB website at https://beta.brainkb.org/.

10.1.1. Python Version#

StructSense supports Python >=3.10,<3.13.
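If your system default Python falls outside this range, a virtual environment pinned to a supported interpreter avoids version conflicts. A minimal sketch (the `.venv` directory name is arbitrary):

```shell
# Create and activate a virtual environment with a supported interpreter
python3 -m venv .venv
source .venv/bin/activate
python --version   # confirm it reports 3.10.x, 3.11.x, or 3.12.x
```

Then run `pip install structsense` inside the activated environment.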

10.2. Requirements#

10.2.1. PDF Extraction with Grobid#

StructSense supports PDF extraction using Grobid (default) or an external API service.

10.2.1.1. Default: Grobid#

By default, StructSense uses Grobid for PDF extraction. You can install and run Grobid either with Docker or in a non-Docker setup.
We recommend using Docker for easier setup and dependency management.

10.2.1.1.1. Run Grobid with Docker#
docker pull lfoppiano/grobid:0.8.0
docker run --init -p 8070:8070 -e JAVA_OPTS="-XX:+UseZGC" lfoppiano/grobid:0.8.0

Note: The JAVA_OPTS="-XX:+UseZGC" flag helps prevent a macOS-specific error.
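Once the container is up, Grobid's standard `isalive` endpoint can confirm the service is reachable (the port matches the `docker run` mapping above; the fallback `echo` is only for illustration):

```shell
# Query Grobid's health endpoint; prints "true" when the service is ready
curl -s http://localhost:8070/api/isalive || echo "Grobid is not reachable on port 8070"
```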

10.2.1.2. Alternative: Remote Service (e.g., Remote Grobid)#

If you prefer to use a remote service, set the environment variable as follows:

export GROBID_SERVER_URL_OR_EXTERNAL_SERVICE=http://your-remote-grobid:PORT

10.2.2. External PDF Extraction API#

If you are using a non-Grobid extraction API, set both of the following environment variables:

export GROBID_SERVER_URL_OR_EXTERNAL_SERVICE=https://api.SOMEAPIENDPOINT.com/api/extract
export EXTERNAL_PDF_EXTRACTION_SERVICE=True

For now, the external API is assumed to be public (no authentication required).
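Because the CLI accepts an `--env_file` argument (see the run examples below), these settings can also live in a `.env` file instead of shell exports. A sketch using only the variables named above (the endpoint is a placeholder):

```shell
# .env — external PDF extraction service
GROBID_SERVER_URL_OR_EXTERNAL_SERVICE=https://api.SOMEAPIENDPOINT.com/api/extract
EXTERNAL_PDF_EXTRACTION_SERVICE=True
```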

10.2.3. LLM#

10.2.3.1. LLM for Agents#

We use OpenRouter to access models such as GPT for the agents. Ollama can also serve as a substitute for OpenRouter when using open-source models such as Llama.

10.2.3.2. Embedding configuration#

In the default setup, Ollama is used for embedding generation. Other models can also be used for this purpose via OpenRouter.
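To use Ollama for both roles, pull a chat model and an embedding model first. A sketch with illustrative model names (any Ollama chat and embedding models would do; these specific names are our choice, not a StructSense requirement):

```shell
# Pull example models; skips with a message if Ollama is not installed
if command -v ollama >/dev/null 2>&1; then
  ollama pull llama3.1           # open-source chat model for the agents
  ollama pull nomic-embed-text   # embedding model for embedding generation
  ollama list                    # confirm both models are available
else
  echo "ollama is not installed; see https://ollama.com"
fi
```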

10.3. Running#

10.3.1. Using OpenRouter#

structsense-cli extract \
  --source somefile.pdf \
  --api_key <YOUR_API_KEY> \
  --config someconfig.yaml \
  --env_file .env \
  --save_file result.json  # optional

10.3.2. Using Ollama (Local)#

structsense-cli extract \
  --source somefile.pdf \
  --config someconfig.yaml \
  --env_file .env_file \
  --save_file result.json  # optional

10.3.3. Chunking#

Disabled by default. Enable with:

--chunking True
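Putting it together, chunking is enabled by appending the flag to an extract invocation (file names are the same placeholders as above):

```shell
structsense-cli extract \
  --source somefile.pdf \
  --config someconfig.yaml \
  --env_file .env \
  --chunking True
```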

10.4. Using Docker Compose#

The docker/ directory contains Docker Compose files for running the following components:

  • Grobid – for PDF extraction

  • Weaviate – the vector database that stores the ontology (i.e., the Ontology database) in the StructSense architecture

These Compose files allow you to quickly stand up a complete local StructSense stack.

If you prefer not to install dependencies system-wide, this Compose setup runs everything in containers, isolating the services and keeping environment management minimal.
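A typical way to bring the stack up from the repository root (assuming a standard Compose file layout in `docker/`; exact file and service names may differ in your checkout):

```shell
cd docker
docker compose up -d   # start Grobid and Weaviate in the background
docker compose ps      # verify both containers are running
```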