Sosse (Selenium Open Source Search Engine) is an open source web archiving tool, crawler, and search engine. It’s designed to handle dynamic web content, and is built for transparency, reproducibility, and long-term usability.
It is well suited to monitoring, archiving, or indexing web pages, including those rendered with JavaScript. Sosse supports scheduled crawling, advanced querying, feed generation, and private search features.
Sosse is written in Python. It supports browser-based crawling via Selenium, with Firefox or Chromium, and faster browserless crawling using the Requests HTTP library. PostgreSQL is used as the primary database backend.
Current Version: Sosse {{ settings.SOSSE_VERSION_TAG }}
{% if settings.SOSSE_VERSION_TAG != 'dev' %}
(commit {{ settings.SOSSE_VERSION_COMMIT }})
{% endif %}
© 2022–2025 Laurent Defert.