Text Extraction
Web Extraction AI
Extract data, content, tables and images from any website with ease
This function extracts text from web pages. It sends a request to the Web Extraction AI model and returns the extracted content. The model identifies key components such as headlines, paragraphs, images, and tables, and returns them in a structured format such as JSON. The extracted data is also cleaned and normalized so it is easier to analyze and process. Read more at: https://docs.worqhat.com/ai-models/text-extraction/web-extraction
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| code_blocks | boolean | Whether to extract code blocks. | true |
| headline | boolean | Whether to extract headlines. | true |
| inline_code | boolean | Whether to extract inline code. | true |
| references | boolean | Whether to extract references. | true |
| url_path | string | URL of the web page to extract text from. | N/A |
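Because `url_path` is the only parameter without a default, it can help to check it locally before sending a request. The sketch below is a hypothetical helper and is not part of the worqhat package: `buildWebExtractionOptions` and `WEB_EXTRACTION_DEFAULTS` are illustrative names, and the default values are copied from the parameter table.

```javascript
// Hypothetical helper (not part of the worqhat package): builds the options
// object for ai.textExtraction.web(). The defaults copy the parameter table.
const WEB_EXTRACTION_DEFAULTS = {
  code_blocks: true,
  headline: true,
  inline_code: true,
  references: true,
};

function buildWebExtractionOptions(urlPath, overrides = {}) {
  // url_path has no default, so reject missing or empty values early.
  if (typeof urlPath !== "string" || urlPath.length === 0) {
    throw new TypeError("url_path must be a non-empty string");
  }
  // new URL() throws a TypeError for strings that are not valid absolute URLs.
  new URL(urlPath);
  // Later spreads win: caller overrides replace the defaults,
  // and url_path always comes from the first argument.
  return { ...WEB_EXTRACTION_DEFAULTS, ...overrides, url_path: urlPath };
}
```

For example, `buildWebExtractionOptions("https://worqhat.com", { references: false })` keeps the other three flags at `true`, sets `references` to `false`, and adds the `url_path`.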
Initialize AI Modules
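```javascript
// Import the WorqHat SDK
const worqhat = require('worqhat');

// Configure the client; replace "your-api-key" with your own API key
const config = new worqhat.Configuration({
  apiKey: "your-api-key",
  debug: true,      // debug mode
  max_retries: 3,   // maximum number of retries
});

// Initialize the app and get the AI module
worqhat.initializeApp(config);
const ai = worqhat.ai();
```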
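Hard-coding the API key, as in the initialization example, is convenient for a quick test; in shared code you may prefer to read it from the environment. This is a minimal sketch that assumes two hypothetical variable names, `WORQHAT_API_KEY` and `WORQHAT_DEBUG`; neither is defined by the worqhat package. The helper only builds the plain options object.

```javascript
// Hypothetical helper: reads configuration from environment variables
// instead of hard-coding the API key. WORQHAT_API_KEY and WORQHAT_DEBUG
// are assumed names, not ones defined by the worqhat package.
function configOptionsFromEnv(env = process.env) {
  const apiKey = env.WORQHAT_API_KEY;
  if (!apiKey) {
    throw new Error("Set WORQHAT_API_KEY before initializing worqhat");
  }
  return {
    apiKey,
    // Debug mode is opt-in here: enabled only when WORQHAT_DEBUG is "true".
    debug: env.WORQHAT_DEBUG === "true",
    // Same retry count as the initialization example.
    max_retries: 3,
  };
}
```

You would then pass the result to the constructor from the initialization example, e.g. `new worqhat.Configuration(configOptionsFromEnv())`.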
Implementation
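```javascript
// Extract content from a web page; each flag below defaults to true
ai.textExtraction.web({
  url_path: "https://worqhat.com", // page to extract from
  code_blocks: true,
  headline: true,
  inline_code: true,
  references: true,
}).then((response) => {
  // Structured extraction result
  console.log(response);
}).catch((error) => {
  console.error(error);
});
```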
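If you prefer `async`/`await` over a Promise chain, the same call can be wrapped in a function. This is a sketch under assumptions: `extractWebPage` is a hypothetical name, and the `ai` object is passed in rather than created inside, so the wrapper does not depend on how the SDK was initialized and can be exercised with a stub.

```javascript
// Hypothetical async/await wrapper around ai.textExtraction.web().
// The `ai` client is injected so this function can be tested with a stub.
async function extractWebPage(ai, urlPath, options = {}) {
  try {
    // Flags not given in `options` are omitted from the request,
    // leaving them to the documented defaults.
    return await ai.textExtraction.web({ ...options, url_path: urlPath });
  } catch (error) {
    // Re-throw with the URL attached so the caller knows which page failed.
    const reason = error && error.message ? error.message : String(error);
    throw new Error(`Web extraction failed for ${urlPath}: ${reason}`, {
      cause: error,
    });
  }
}
```

Inside an async function, `const result = await extractWebPage(ai, "https://worqhat.com");` sends the same request as the Promise-based example, leaving the four boolean flags to their documented defaults.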