MultiModal Demonstration

WorqHat AI Multimodal Content Generator: Documentation

This document provides an overview of the WorqHat AI Multimodal Content Generator, a Streamlit application that enables users to create comprehensive, multimodal content by combining text, audio, video, and images.

Purpose:

The application uses the WorqHat AI API to analyze text, audio, video, and image inputs and generate a single, cohesive response. This simplifies multimodal content creation by combining the distinctive elements of each input type.

Key Features:

Text Input: Users can enter custom text that serves as the basis for the AI-generated content.
File Upload: Users can upload multiple files, including audio, video, and image formats, relevant to the content creation process.
AI-Powered Analysis: The WorqHat AI API analyzes all input files and generates a multimodal response that cohesively integrates each input type's unique aspects.
Multimodal Content Generation: Based on the input, the API generates a JSON response containing:
- Multimodal Response (Markdown): A comprehensive response that combines text, audio, video, and image elements cohesively.
- Subheadings: Structured content with clear subheadings for easy readability.
User-Friendly Display: The application presents the generated content in markdown format, ensuring clarity and ease of use.
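
As a rough illustration of the File Upload feature, the sketch below sorts uploaded file names into the audio, video, and image categories the application accepts. It is not taken from the application's source: the helper name `group_by_modality` and the "unsupported" bucket are assumptions made for this example, and the classification relies only on file extensions via Python's standard `mimetypes` module.

```python
import mimetypes
from collections import defaultdict

# Modalities the documentation lists for file uploads.
SUPPORTED = ("audio", "video", "image")


def group_by_modality(filenames):
    """Group file names by the top-level MIME type guessed from the extension.

    Hypothetical helper: names whose type is unknown or not in SUPPORTED
    are collected under "unsupported".
    """
    groups = defaultdict(list)
    for name in filenames:
        mime, _ = mimetypes.guess_type(name)
        kind = mime.split("/")[0] if mime else "unsupported"
        if kind not in SUPPORTED:
            kind = "unsupported"
        groups[kind].append(name)
    return dict(groups)
```

A check like this could run before the API call so that files outside the three supported categories are flagged to the user early.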

Benefits:

Streamlined Content Creation: Automates the creation of multimodal content by combining various input types.
Data-Driven Insights: Analyzes all types of inputs to create insightful and targeted content.
Versatility: Supports text, audio, video, and image inputs to offer a flexible content generation process.
Consistent Quality: The API is designed to return responses of consistent quality, regardless of the type of input provided.

How it Works:

Input Text and Files: Users provide a text description and upload any combination of audio, video, and image files.
API Call: The application sends the text and files to the WorqHat AI API along with the API key.
Multimodal Analysis: The API analyzes each input type (text, audio, video, and images) to extract key elements.
Content Generation: The API generates a JSON response containing a detailed multimodal response.
Display Results: The application displays the generated content in markdown format.
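
The "API Call" step needs the uploaded files in a shape that the `requests` library can send as multipart form data. The sketch below shows one way to build that list. The helper name `build_file_array` and the form field name "files" are assumptions for illustration; the field name the WorqHat API actually expects should be checked against its documentation.

```python
def build_file_array(uploads, field_name="files"):
    """Convert (filename, raw_bytes, mime_type) triples into the list of
    tuples that requests accepts for its `files=` argument.

    field_name is an assumed form field name, not confirmed by the source.
    """
    return [(field_name, (name, data, mime)) for name, data, mime in uploads]


# In a Streamlit app the triples could come from st.file_uploader results,
# for example (f.name, f.getvalue(), f.type) for each uploaded file f.
example = build_file_array([("photo.png", b"\x89PNG", "image/png")])
```

Repeating the same field name for every file lets a single request carry any combination of audio, video, and image uploads.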

Procedure to Follow:

Enter Text and Upload Files: Input a text description and upload relevant audio, video, or image files that align with the content you want to generate.
API Key: To use the application, you need an API key, which you can obtain at https://app.worqhat.com/. Enter the API key and press Send.
Generate Content: The AI will analyze the input and generate cohesive content by merging all the modalities into a single response.
Review Content: The generated multimodal content will be displayed in markdown format, allowing you to review or share it with others.

Example Code Snippet:

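import time

import requests


def callapi(fileArray, api_key, text_input):
    """Send the prompt and uploaded files to the WorqHat AI API.

    Returns a tuple: (parsed JSON response or None, elapsed seconds).
    """
    start_time = time.time()
    url = "https://api.worqhat.com/api/ai/content/v4"
    # Form fields sent alongside the files. The prompt is elided here;
    # text_input is presumably interpolated into the elided part.
    payload = {
        "question": f"Please analyze the provided text, audio, video, and image files. Based on these inputs, generate a multimodal response...",
        "training_data": "...",
        "response_type": "json",
        "model": "aicon-v4-nano-160824",
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
    }
    # Passing files= makes requests encode the body as multipart/form-data.
    response = requests.post(url, headers=headers, data=payload, files=fileArray)
    duration = time.time() - start_time
    if response.status_code == 200:
        return response.json(), round(duration, 2)
    # Any non-200 status is reported to the caller as None.
    return None, round(duration, 2)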

This code snippet shows the core function. It sends the prompt and the uploaded audio, video, and image files to the WorqHat AI API and returns the parsed JSON response together with the request duration in seconds. If the status code is not 200, it returns None in place of the response.
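
Because the function returns None on failure, the caller has to handle both outcomes before displaying anything. The following is a minimal sketch of that handling, not the application's actual display code. The helper name `render_result` is hypothetical, and the "content" key is an assumption about where the response stores the generated markdown.

```python
import json


def render_result(result, duration, key="content"):
    """Turn the (result, duration) pair into text suitable for display.

    key="content" is an assumed field name, not confirmed by the source.
    """
    if result is None:
        return f"Request failed after {duration} s."
    content = result.get(key)
    if content is None:
        return f"No '{key}' field in the response (took {duration} s)."
    if not isinstance(content, str):
        # Structured JSON is pretty-printed rather than shown as a raw object.
        content = json.dumps(content, indent=2)
    return content
```

In a Streamlit application, the returned string could then be passed to `st.markdown` for display.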

Conclusion:

The WorqHat AI Multimodal Content Generator is a powerful tool for creating engaging, comprehensive content that incorporates multiple media types. By leveraging the WorqHat AI API, users can generate content that cohesively integrates text, audio, video, and images, ensuring a streamlined and efficient content creation process.

EndNote:

WorqHat is here to help you innovate and create by simplifying complex tasks. Our AI ensures that every aspect of your content creation process is covered, allowing you to focus on new ideas and creativity.

Try it Yourself

Generate your own multimodal content at: ``

Check out the code on our GitHub: https://github.com/WorqHat/worqhat-ai-examples/tree/main/Multimodal-Input
