This function extracts text from web pages. It sends a request to the Web Extraction AI Model and returns the extracted content. The model identifies key components such as headlines, paragraphs, images, and tables, and returns them in a structured format such as JSON. The extracted data is also cleaned and normalized so that it is easier to analyze and process. Read more at: https://docs.worqhat.com/ai-models/text-extraction/web-extraction

Parameters

Parameter   | Type    | Description                                              | Default
----------- | ------- | -------------------------------------------------------- | -------
code_blocks | boolean | Indicates whether to extract code blocks.                | true
headline    | boolean | Indicates whether to extract headlines.                  | true
inline_code | boolean | Indicates whether to extract inline code.                | true
references  | boolean | Indicates whether to extract references.                 | true
url_path    | string  | Represents the URL of the web page to extract text from. | N/A
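The defaults in the table can be made explicit on the client side. The sketch below uses a hypothetical helper, `buildWebExtractionOptions` (not part of the worqhat SDK), that fills in the documented `true` default for any flag the caller omits and rejects a missing `url_path`, since that parameter has no default. Restricting `url_path` to http(s) URLs is an assumption of this sketch, not a documented rule.

```javascript
// Hypothetical helper (not part of the worqhat SDK): merges caller options
// with the defaults listed in the Parameters table and validates url_path.
const WEB_EXTRACTION_DEFAULTS = Object.freeze({
  code_blocks: true,
  headline: true,
  inline_code: true,
  references: true,
});

function buildWebExtractionOptions(userOptions = {}) {
  const { url_path } = userOptions;

  // url_path has no default, so reject a missing or empty value early.
  if (typeof url_path !== "string" || url_path.trim() === "") {
    throw new TypeError("url_path must be a non-empty string");
  }

  // Reject strings that cannot be parsed as an absolute URL.
  let parsed;
  try {
    parsed = new URL(url_path);
  } catch {
    throw new TypeError(`url_path is not a valid URL: ${url_path}`);
  }

  // Assumption for this sketch: only http(s) pages are accepted.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new TypeError("url_path must use http or https");
  }

  // Caller-supplied flags override the documented defaults.
  return { ...WEB_EXTRACTION_DEFAULTS, ...userOptions };
}

// Usage: extract only headlines and references from a page.
const pageOptions = buildWebExtractionOptions({
  url_path: "https://worqhat.com",
  code_blocks: false,
  inline_code: false,
});
console.log(pageOptions);
```

The helper only makes the defaults visible before the request is sent; the resulting object has the same shape as the one passed to `ai.textExtraction.web(...)` in the Implementation example.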

Initialize AI Modules

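// Import the WorqHat SDK.
const worqhat = require('worqhat');

// Configure the SDK; replace the placeholder with your own API key.
const config = new worqhat.Configuration({
    apiKey: "your-api-key",
    debug: true,
    max_retries: 3,
});

// Initialize the SDK, then get a handle to the AI modules.
worqhat.initializeApp(config);

const ai = worqhat.ai();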
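Hard-coding the API key, as in the snippet above, is convenient for a quick test but risky in shared code. A minimal sketch, assuming you keep the key in an environment variable: the hypothetical helper `configOptionsFromEnv` builds the same options object from `WORQHAT_API_KEY`, `WORQHAT_DEBUG`, and `WORQHAT_MAX_RETRIES`. These variable names are illustrative, not an SDK convention, and `debug` stays off unless `WORQHAT_DEBUG` is set to `"true"`.

```javascript
// Hypothetical helper (not part of the worqhat SDK): builds the options for
// new worqhat.Configuration(...) from environment variables.
// The variable names WORQHAT_API_KEY, WORQHAT_DEBUG and WORQHAT_MAX_RETRIES
// are assumptions chosen for this example.
function configOptionsFromEnv(env = process.env) {
  const apiKey = env.WORQHAT_API_KEY;
  if (!apiKey) {
    throw new Error("Set WORQHAT_API_KEY before initializing the SDK");
  }

  // Fall back to 3 retries when the variable is unset or is not a
  // non-negative integer.
  const retries = Number.parseInt(env.WORQHAT_MAX_RETRIES || "3", 10);

  return {
    apiKey,
    debug: env.WORQHAT_DEBUG === "true", // off unless explicitly enabled
    max_retries: Number.isInteger(retries) && retries >= 0 ? retries : 3,
  };
}

// Usage (requires the worqhat package):
// const config = new worqhat.Configuration(configOptionsFromEnv());
// worqhat.initializeApp(config);
```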

Implementation

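// Extract text from a web page; each boolean flag defaults to true.
ai.textExtraction.web({
    url_path: "https://worqhat.com", // page to extract text from
    code_blocks: true,
    headline: true,
    inline_code: true,
    references: true,
}).then((response) => {
    // Content returned by the Web Extraction AI Model
    console.log(response);
}).catch((error) => {
    // Any failed request rejects the promise and is handled here
    console.error(error);
});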
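The same call can be written with `async`/`await`. This is a sketch rather than an SDK feature: `extractWebText` is a hypothetical wrapper that takes the `ai` object as a parameter, applies the same flags as the example above (overridable per call), and re-throws failures with the URL attached. It makes no assumption about the structure of the response and returns it unchanged.

```javascript
// Hypothetical async/await wrapper around ai.textExtraction.web(...).
// The ai client is injected so the function does not depend on how it was
// created; the response is returned exactly as the SDK resolves it.
async function extractWebText(ai, urlPath, flags = {}) {
  const request = {
    code_blocks: true,
    headline: true,
    inline_code: true,
    references: true,
    ...flags,          // per-call overrides, e.g. { references: false }
    url_path: urlPath, // the explicit argument always wins
  };

  try {
    return await ai.textExtraction.web(request);
  } catch (error) {
    // Attach the URL so a failure in a batch of pages is easy to trace.
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Web extraction failed for ${urlPath}: ${reason}`, { cause: error });
  }
}

// Usage with the ai object created during initialization:
// extractWebText(ai, "https://worqhat.com", { references: false })
//   .then((response) => console.log(response))
//   .catch((error) => console.error(error.message));
```

Injecting `ai` instead of creating it inside the function keeps the wrapper easy to test with a stub object and lets one initialized client serve many calls.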