{% extends 'base.html' %} {% block title %}Ingest Document{% endblock %} {% block content %}

Ingest Document

Parse files (PDF, HTML, DOCX, PPTX, TXT), web pages, and YouTube videos into clean text for content generation.

{{ form.csrf_token }}
{{ form.input_type(class="form-select", id="input_type_selector") }}
Select how you want to provide your document.
{{ form.upload_file(class="form-control") }} {% if form.upload_file.errors %}
{% for error in form.upload_file.errors %} {{ error }} {% endfor %}
{% endif %}
Supported formats: PDF, HTML, DOCX, PPTX, TXT.
{{ form.output_name(class="form-control", placeholder="Leave blank to use original name") }} {% if form.output_name.errors %}
{% for error in form.output_name.errors %} {{ error }} {% endfor %}
{% endif %}
Optional. Specify a custom filename for the output text file.
{{ form.submit(class="btn btn-primary") }}

Supported Document Types

File Types
  • PDF Files
    Extract text from PDF documents
    .pdf
  • Word Documents
    Parse Microsoft Word documents
    .docx
  • PowerPoint
    Extract text from presentations
    .pptx
  • Text Files
    Plain text documents
    .txt
Web Content
  • Web Pages
    Extract content from HTML web pages
    URL
  • YouTube Videos
    Extract transcripts from YouTube videos
    YouTube
Example URLs
  • YouTube: https://www.youtube.com/watch?v=example
  • Web Page: https://example.com/article
{% endblock %} {% block scripts %} {% endblock %}