Text Extraction
Overview
The Text Extraction node uses AI to extract text and metadata from different content sources such as PDFs, web pages, images, or audio files. It converts raw files or URLs into structured, readable text that can be used by other nodes in your workflow.
Description
Extract text and metadata from documents, websites, images, and audio.
This node is designed to automatically pull text and metadata from a variety of input formats. Depending on the selected extraction type, it can process different kinds of media.
PDF Extraction
Read and extract text from PDF documents.
Web Extraction
Capture and process the textual content and metadata of a webpage.
Image Extraction
Identify and read text from images (OCR-based).
Audio Extraction
Convert speech in audio files to text.
It helps automate content analysis, data gathering, and transcription workflows with minimal setup.
Input Parameters
The Text Extraction node accepts flat key-value inputs that determine the source and method of text extraction.

| Parameter | Type | Required | Example |
| --- | --- | --- | --- |
| extractionType | string | Required | "pdf-extraction", "web-extraction", "image-extraction", "audio-extraction" |
| webUrl | string | Optional | "https://example.com/article" |
| attachments | string | Optional | "file1.pdf,file2.pdf" or "{{nodeId.output.file1}}" |

Provide flat key-value pairs for all input parameters.
- For multiple files, separate entries with commas.
- You can dynamically reference data from previous nodes using:
{{nodeId.input.<key>}}
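As a rough illustration of the flat key-value shape described above, the following Python sketch assembles an input object and joins multiple files with commas. The helper name `build_inputs` and its arguments are hypothetical and not part of the node; only the keys and the extraction type values come from this page.

```python
from typing import Dict, Iterable, Optional

# Extraction type values listed for the extractionType parameter.
EXTRACTION_TYPES = {
    "pdf-extraction",
    "web-extraction",
    "image-extraction",
    "audio-extraction",
}


def build_inputs(
    extraction_type: str,
    attachments: Optional[Iterable[str]] = None,
    web_url: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: assemble the node's flat key-value inputs."""
    if extraction_type not in EXTRACTION_TYPES:
        raise ValueError(f"unknown extractionType: {extraction_type!r}")
    inputs = {"extractionType": extraction_type}
    if web_url:
        inputs["webUrl"] = web_url
    if attachments:
        # Multiple files travel as one comma-separated string.
        inputs["attachments"] = ",".join(attachments)
    return inputs


print(build_inputs("pdf-extraction", attachments=["file1.pdf", "file2.pdf"]))
```

Every value stays a plain string, so a reference such as "{{nodeId.output.file1}}" can be passed as an attachment entry in the same way as a file name.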
Output Parameters
After execution, the Text Extraction node returns the extracted text along with processing information and detailed metadata. The available outputs depend on the extractionType.
Common Outputs (All Types)
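
| Parameter | Type | Required | Example |
| --- | --- | --- | --- |
| content | string | Optional | "Extracted text content..." |
| processingCount | number | Optional | 1 |
| processingTime | string | Optional | "2025-10-27T12:05:43Z" |
| processingId | string | Optional | "proc_67892" |

PDF Extraction Outputs

| Parameter | Type | Required | Example |
| --- | --- | --- | --- |
| markdown | string | Optional | "**Title**\n\nBody text..." |
| metadata | object | Optional | { "title": "Report.pdf", "pages": 5 } |

Web Extraction Outputs

| Parameter | Type | Required | Example |
| --- | --- | --- | --- |
| markdown | string | Optional | "# Page Title\n\nArticle content..." |
| linksOnPage[] | array | Optional | ["https://example.com/link1", "https://example.com/link2"] |
| metadata | object | Optional | { "title": "Page Title", "description": "...", "ogImage": "..." } |

Image Extraction Outputs

| Parameter | Type | Required | Example |
| --- | --- | --- | --- |
| content | string | Optional | "Invoice #12345..." |

Audio Extraction Outputs

| Parameter | Type | Required | Example |
| --- | --- | --- | --- |
| speaker_labels[] | array | Optional | [{ "speaker": "Speaker 1", "text": "Hello" }] |
| timestamps[] | array | Optional | [{ "word": "Hello", "startTime": 0.5, "endTime": 1.0 }] |

Access output values using variables like: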
{{nodeId.output.content}} → Extracted text
{{nodeId.output.metadata.title}} → Page or document title
{{nodeId.output.linksOnPage[0]}} → First link on the page
{{nodeId.output.speaker_labels[0].text}} → First speaker’s text segment
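To make the path syntax concrete, here is a small Python sketch that resolves such expressions against a nested dictionary. It only illustrates how dotted keys and `[n]` indexes walk the output object; the `resolve` function and the sample `context` are assumptions for illustration, and the platform's own resolver may behave differently. The sample combines fields from several extraction types purely for demonstration.

```python
import re
from typing import Any, Dict

# Matches either a key name or a bracketed list index such as [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(expression: str, context: Dict[str, Any]) -> Any:
    """Resolve a '{{node.output.key[0].sub}}' expression against a nested dict."""
    path = expression.strip()
    if path.startswith("{{") and path.endswith("}}"):
        path = path[2:-2].strip()
    value: Any = context
    for name, index in _TOKEN.findall(path):
        # An index token selects a list element; a name token selects a dict key.
        value = value[int(index)] if index else value[name]
    return value


# Sample data mirroring the example values in the output tables.
context = {
    "nodeId": {
        "output": {
            "content": "Extracted text content...",
            "metadata": {"title": "Page Title"},
            "linksOnPage": ["https://example.com/link1", "https://example.com/link2"],
            "speaker_labels": [{"speaker": "Speaker 1", "text": "Hello"}],
        }
    }
}

print(resolve("{{nodeId.output.metadata.title}}", context))
print(resolve("{{nodeId.output.linksOnPage[0]}}", context))
```

Because every output is marked Optional, a lookup for a key that the selected extraction type does not return raises `KeyError` in this sketch.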
Output Type
Output Type: text
The output type must always be exactly:
"text"
This ensures the node consistently provides extracted text output, regardless of the extraction source (PDF, web, image, or audio).
Example Usage
Example 1: Extract Text from a PDF
Input:
{ "extractionType": "pdf-extraction", "attachments": "file123.pdf" }
Output:
{ "content": "This is the text extracted from the PDF document.", "processingId": "proc_67892", "processingTime": "2025-10-27T12:05:43Z", "processingCount": 1 }
Example 2: Extract Text from Image
Input:
{ "extractionType": "image-extraction", "attachments": "image123.png" }
Output:
{ "content": "Here is the article content extracted from the page...", "processingTime": "2025-10-27T12:06:11Z", "processingId": "output-1761738927697", "processingCount": 41 }
Example 3: Extract Text from Audio
Input:
{ "extractionType": "audio-extraction", "attachments": "nature_audio.mp3" }
Output:
{ "content": "The evening sky was painted with shades of orange and violet...", "processingTime": "2025-10-27T12:05:43Z", "processingId": "dc9509af-3783-48bd-8819-85126a8e66a8", "processingCount": 188 }
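If a result like the ones in these examples is handled in custom code, it might be parsed along the following lines. This is a minimal Python sketch, assuming the output arrives as a JSON string; `parse_result` is a hypothetical helper, and the sample string is the output from Example 1.

```python
import json
from datetime import datetime

# Output from Example 1, as a JSON string.
raw = (
    '{ "content": "This is the text extracted from the PDF document.", '
    '"processingId": "proc_67892", '
    '"processingTime": "2025-10-27T12:05:43Z", '
    '"processingCount": 1 }'
)


def parse_result(raw_json: str) -> dict:
    """Parse a node result and convert processingTime to a datetime."""
    result = json.loads(raw_json)
    stamp = result.get("processingTime")  # every output field is optional
    if stamp:
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
        # so the suffix is rewritten as an explicit UTC offset.
        result["processingTime"] = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return result


parsed = parse_result(raw)
print(parsed["content"])
print(parsed["processingTime"].isoformat())
```

Using `.get()` for each field keeps the code tolerant of outputs that omit an optional key.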
How to Use in a No-Code Workflow
Add the Text Extraction Node
Drag and drop the node into your workflow.
Select Extraction Type
Choose the appropriate extraction type:
- "pdf-extraction" for PDF files
- "web-extraction" for web pages
- "image-extraction" for scanned images or screenshots
- "audio-extraction" for speech-to-text conversion
Provide Input
- For PDFs, images, or audio: add file references under attachments.
- For webpages: enter the target URL in webUrl.
Run the Node
The node will extract text, metadata, or transcription automatically.
Access Results
Use variable syntax to retrieve outputs such as:
{{textExtraction.output.content}}
{{textExtraction.output.metadata.title}}
Connect to Next Nodes
You can pass the extracted text to other nodes like Text Generation, AI Analysis, or Summarization for further automation.
Best Practices
- Ensure the correct extraction type is selected before execution.
- When extracting from multiple files, separate file IDs with commas.
- For web extractions, make sure the URL is publicly accessible.
- Use OCR-friendly images (clear text visibility) for best results.
- For large audio files, split them into smaller segments to improve performance.
- Always review extracted content for accuracy before using it in production workflows.
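Some of these checks can be automated before the node runs. The Python sketch below is one possible pre-flight check, not part of the product. It assumes that web extraction needs webUrl and that the file-based types need attachments, which this page implies but does not state outright. The `preflight` function is hypothetical, and its messages reuse names from the Common Errors section where one applies. It cannot tell whether a URL is publicly accessible; it only checks the scheme.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Extraction types that read from attachments (assumption based on this page).
FILE_TYPES = {"pdf-extraction", "image-extraction", "audio-extraction"}


def preflight(inputs: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the flat inputs; empty means OK."""
    problems: List[str] = []
    kind = inputs.get("extractionType", "")
    if not kind:
        problems.append("Missing extractionType")
    elif kind == "web-extraction":
        url = inputs.get("webUrl", "")
        if not url:
            problems.append("Missing webUrl")
        elif urlparse(url).scheme not in ("http", "https"):
            problems.append("webUrl is not an http(s) URL")
    elif kind in FILE_TYPES:
        ids = [part.strip() for part in inputs.get("attachments", "").split(",")]
        if not any(ids):
            problems.append("Missing attachments")
        elif not all(ids):
            # Catches stray commas such as "a.pdf,,b.pdf".
            problems.append("Empty entry in attachments")
    else:
        problems.append(f"Unknown extractionType: {kind}")
    return problems


print(preflight({"extractionType": "web-extraction"}))
```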
Common Errors
- Missing extractionType
- Missing webUrl
- Missing attachments
- Unsupported file format
- Empty output
Example Workflow Integration
Use Case 1: Summarize a PDF Report
Automatically extract text from an uploaded PDF and generate a summary.
- File Upload Node – Receives the PDF file.
- Text Extraction Node – Extracts text using pdf-extraction.
- Text Generation Node – Summarizes the extracted content.
- Email Node – Sends the summary to the user.
Workflow Data Flow:
{{fileUpload.output.fileId}} → {{textExtraction.input.attachments}}
{{textExtraction.output.content}} → {{textGeneration.input.prompt}}
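The arrows in this data flow can be read as "copy the value on the left into the field on the right". The Python sketch below models that reading with plain dictionaries. It is a simplified illustration, not the platform's wiring engine: `apply_mappings` is a hypothetical helper, the file ID is borrowed from Example 1, and the Text Extraction output is filled in by hand where a real run would produce it.

```python
from typing import Any, Dict, List, Tuple

State = Dict[str, Dict[str, Dict[str, Any]]]


def apply_mappings(state: State, mappings: List[Tuple[str, str]]) -> State:
    """Copy each source value to its target, e.g. 'a.output.x' -> 'b.input.y'."""
    for source, target in mappings:
        s_node, s_side, s_key = source.split(".")
        t_node, t_side, t_key = target.split(".")
        value = state[s_node][s_side][s_key]
        # Create the target node and side on first use.
        state.setdefault(t_node, {}).setdefault(t_side, {})[t_key] = value
    return state


# Step 1: the upload node has produced a file ID.
state: State = {"fileUpload": {"output": {"fileId": "file123.pdf"}}}
apply_mappings(state, [("fileUpload.output.fileId", "textExtraction.input.attachments")])

# Stand-in for running the Text Extraction node (hand-written output).
state["textExtraction"]["output"] = {
    "content": "This is the text extracted from the PDF document."
}

# Step 2: the extracted text becomes the prompt of the next node.
apply_mappings(state, [("textExtraction.output.content", "textGeneration.input.prompt")])

print(state["textGeneration"]["input"]["prompt"])
```

The same two-step pattern applies to the other use cases; only the source of the first mapping and the extraction type change.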
Use Case 2: Web Scraping & Analysis
Extract content from a competitor's blog post and analyze the sentiment.
- REST API Trigger – Receives the blog URL.
- Text Extraction Node – Extracts text and metadata using web-extraction.
- Text Generation Node – Analyzes the sentiment of the article.
- Database Node – Stores the analysis results.
Workflow Data Flow:
{{trigger.input.url}} → {{textExtraction.input.webUrl}}
{{textExtraction.output.content}} → {{textGeneration.input.prompt}}
Use Case 3: Receipt Processing (Image)
Extract data from a photo of a receipt for expense tracking.
- WhatsApp Trigger – User sends a photo of a receipt.
- Text Extraction Node – Extracts text using image-extraction.
- Text Generation Node – Extracts structured JSON (date, amount, vendor) from the text.
- Google Sheets Node – Adds a new row with the expense details.
Workflow Data Flow:
{{whatsapp.output.mediaId}} → {{textExtraction.input.attachments}}
{{textExtraction.output.content}} → {{textGeneration.input.prompt}}
Use Case 4: Meeting Transcription (Audio)
Transcribe a meeting recording and email the minutes.
- File Upload Node – User uploads an MP3 recording.
- Text Extraction Node – Transcribes audio using audio-extraction.
- Text Generation Node – Generates meeting minutes from the transcription.
- Email Node – Sends the minutes to all attendees.
Workflow Data Flow:
{{fileUpload.output.fileId}} → {{textExtraction.input.attachments}}
{{textExtraction.output.content}} → {{textGeneration.input.prompt}}
