Introducing Vidscribe AI: The Future of Effortless Content Creation
Effortlessly convert your multimedia content into engaging written blogs
Content creation is exhausting. Recording videos, editing podcasts, then writing blogs? It's like running three marathons before breakfast. But what if there was a way to cut that workload by 90%?
Well, this all started with a personal problem. You spend hours planning, recording, and editing a video or podcast. Your content is packed with insights and value. But then comes the most tedious task: turning that multimedia into a written blog that can reach a wider audience.
Imagine a tool that understands the true value of your content and helps you speed up your content creation process.
Vidscribe AI: Built for the future
Meet VidScribe AI: Not just another tool, but a content creation revolution that turns hours of work into minutes of magic. ✨
Vidscribe AI lets you turn your videos and audio into captivating, SEO-optimized blog posts. Vidscribe AI is designed for content creators, podcasters, writers, etc., who want to maximize their content reach. The AI understands the context of your content and creates a readable, SEO-optimized article that captures the essence of your original media.
You might be wondering how all of this works. Let's see how Vidscribe AI does its magic.
How Vidscribe AI works
Well, it's quite simple:
Upload your multimedia content. Whether it's a passionate podcast discussion, an informative YouTube video, or a detailed video tutorial, VidScribe AI can handle it.
Next up, let Vidscribe AI do its magic. It doesn't just transcribe; it understands context, tone, and key messages.
Done! Congratulations on generating your blog. Your multimedia content is now transformed into an engaging, SEO-optimised blog post. 🚀
Features
Now, let's explore some amazing features that VidScribe AI offers.
Video and audio transcription 📝
Converts spoken words from videos or audio into written text, with high accuracy through AI-powered speech recognition.
AI-Powered blog structuring ✍️
Formats the extracted content into blog-ready structures with headlines, subheadings, and paragraphs.
Customization Options 🎨
Users can adjust the length and depth of the generated blogs, choosing between brief summaries and detailed articles. This feature also offers manual editing and the ability to add personal modifications to AI-generated content.
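One way a length option like this could feed into the generation step is as an extra instruction appended to the prompt. The sketch below is only an illustration: the `BlogLength` type, the word counts, and the function names are assumptions, not VidScribe AI's actual settings.

```typescript
// Hypothetical option type; the names are illustrative only.
type BlogLength = "brief" | "detailed";

// Extra prompt instruction for each length option (word counts are made up).
const LENGTH_INSTRUCTIONS: Record<BlogLength, string> = {
  brief: "Write a brief summary of about 300 words with at most three sections.",
  detailed:
    "Write a detailed article of about 1,200 words with several sections and subheadings.",
};

// Appends the length instruction to a base prompt.
function withLength(basePrompt: string, length: BlogLength): string {
  return `${basePrompt}\n\n${LENGTH_INSTRUCTIONS[length]}`;
}
```

Keeping the options in a lookup table means a new length only needs one new entry, and the compiler flags any option that has no instruction.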
SEO-Optimised Blogs 🌐
Identifies trending, SEO-friendly keywords based on the content of the video or audio, and incorporates them naturally into the blog to boost discoverability on search engines. No more hassle with drafting SEO-optimized blogs. VidScribe AI has already done it for you!
Instant Extraction ⬇️
Extract generated blogs in the blink of an eye. You don't need to manually convert them into any format. VidScribe AI already provides them in Markdown format.
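Since the output is already Markdown, exporting it mostly comes down to picking a sensible file name. Here is a minimal sketch of that step; `markdownFileName` is a hypothetical helper written for this post, not part of the VidScribe AI codebase.

```typescript
// Turns a blog title into a safe Markdown file name, e.g. for a download button.
// Note: non-ASCII characters are dropped, which is a simplification.
function markdownFileName(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything that is not a letter or digit
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `${slug || "untitled"}.md`;
}

console.log(markdownFileName("10 Tips for Better Podcasts!"));
// prints "10-tips-for-better-podcasts.md"
```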
From concept to creation
Every groundbreaking idea begins with a concept, and the same goes for VidScribe AI. We started with a simple concept; let's see how we turned this idea into reality.
Ideation
I began with a simple yet powerful observation: people often spend more time converting their content than creating it. Sometimes, they need to write as well as create. That is when I decided to build something to solve this problem.
Design
As Steve Jobs said:
"Design is not just what it looks like and feels like. Design is how it works."
– Steve Jobs
Keeping this philosophy in mind, I began designing the interface for VidScribe AI by researching how to make it both creative and simple. Many existing AI tools seem too complicated for new users, so I set out to design an interface that newcomers can use without any hassle.
Development
After the design, it was time to get to work. I began by selecting the best technology to meet the project's needs, such as high-accuracy speech-to-text conversion and scalability.
In the initial phase, I went through multiple iterations and overcame various issues. The early stages of development are full of obstacles. Each iteration brought its own set of challenges, but I tackled them one by one.
As the iterations began to take shape, each test felt like a milestone, and each resolved issue was a small victory. This was my first time integrating Modus into an AI app. There were different types of issues, but referring to the Modus documentation made my process easier. This development process was not just about writing code; it was about continuously evolving the solution and applying critical thinking.
Next came the AI part, where I had to refine the models so that they do their job well.
- First, I decided to use OpenAI's Whisper model for speech-to-text conversion. Whisper stands out in the crowded field of speech-to-text technologies for its remarkable accuracy and multilingual support. The model's deep learning architecture, trained on a massive, diverse dataset, allows it to handle various accents, background noises, and linguistic nuances with exceptional precision.
- Here's a sneak peek of the Whisper model integration:
const transcriptions = await openai.audio.transcriptions.create({
  model: "whisper-1",
  file: file,
});
The integration process involved several steps:
- Carefully configuring the model's parameters to optimize performance.
- Implementing robust error handling and fallback mechanisms.
- Fine-tuning the model to our specific use case and audio input characteristics.
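To give an idea of what the error handling and fallback step can look like, here is a minimal sketch. It is not the actual VidScribe AI code: the `withRetry` helper, its default values, and the idea of falling back to a default value are all assumptions made for illustration.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff,
// then returns a fallback value if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  fallback: T,
  maxAttempts: number = 3,
  baseDelayMs: number = 500,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch {
      // Give up after the last attempt and use the fallback instead.
      if (attempt === maxAttempts) break;
      // Wait 500 ms, 1000 ms, 2000 ms, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return fallback;
}
```

A helper like this could wrap the Whisper call shown earlier, for example `withRetry(() => openai.audio.transcriptions.create({ model: "whisper-1", file }).then((res) => res.text), "")`, so that a temporary network error does not immediately fail the whole upload.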
Secondly, for blog generation I went with Meta's Llama 3.1 8B Instruct model. This powerful large language model brought sophisticated natural language understanding and generation capabilities to the project. The 8B-parameter model strikes a good balance between efficiency and generation quality, enabling us to create contextually relevant content.
The integration of this model included:
- Developing a precise prompting strategy to guide the model's output.
- Implementing context management to ensure consistent and relevant content generation.
- Creating safeguards to maintain content quality and originality.
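One common form of context management is splitting a long transcription into smaller pieces so that each request stays within the model's context window. The sketch below assumes that approach; the `chunkTranscript` function and the 4,000-character default are hypothetical and not taken from the VidScribe AI backend.

```typescript
// Splits a long transcription into chunks of roughly maxChars characters,
// breaking on sentence boundaries. A single sentence longer than maxChars
// is kept whole rather than cut in the middle.
function chunkTranscript(text: string, maxChars: number = 4000): string[] {
  // Each match is one sentence plus its trailing punctuation and whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}

console.log(chunkTranscript("One. Two. Three.", 10));
// prints [ 'One. Two.', 'Three.' ]
```

Each chunk can then be sent to the model separately, at the cost of the model not seeing the full transcription at once.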
Here's what the model configuration looks like:
{
  "$schema": "https://schema.hypermode.com/modus.json",
  "endpoints": {
    "default": {
      "type": "graphql",
      "path": "/graphql",
      "auth": "bearer-token"
    }
  },
  "models": {
    "text-generator": {
      "sourceModel": "meta-llama/Meta-Llama-3.1-8B-Instruct",
      "provider": "hugging-face",
      "connection": "hypermode"
    }
  }
}
Next, I used the Modus SDK to invoke the models dynamically. Then, I gave the models instructions and prompts tuned to get the best results. Modus makes it much easier to get any AI model up and running quickly.
import { models } from "@hypermode/modus-sdk-as";
import {
  OpenAIChatModel,
  ResponseFormat,
  SystemMessage,
  UserMessage,
} from "@hypermode/modus-sdk-as/models/openai/chat";

// this model name should match the one defined in the modus.json manifest file
const modelName: string = "text-generator";

export function generateBlogContent(transcriptions: string): string {
  const instruction =
    "You are a skilled content writer that converts audio transcriptions into well-structured, engaging blog posts in Markdown format. Create a comprehensive blog post with a catchy title, introduction, main body with multiple sections, and a conclusion. Analyze the user's writing style from their previous posts and emulate their tone and style in the new post. Keep the tone casual and professional.";

  const prompt = `Please convert the following transcription into a well-structured blog post using Markdown formatting. Follow this structure:
1. Start with a SEO friendly catchy title on the first line.
2. Add two newlines after the title.
3. Write an engaging introduction paragraph.
4. Create multiple sections for the main content, using appropriate headings (##, ###).
5. Include relevant subheadings within sections if needed.
6. Use bullet points or numbered lists where appropriate.
7. Add a conclusion paragraph at the end.
8. Ensure the content is informative, well-organized, and easy to read.
9. Emulate my writing style, tone, and any recurring patterns you notice from my previous posts.
Here's the transcription to convert: ${transcriptions}`;

  const model = models.getModel<OpenAIChatModel>(modelName);
  const input = model.createInput([
    new SystemMessage(instruction),
    new UserMessage(prompt),
  ]);

  // this is one of many optional parameters available for the OpenAI chat interface
  input.temperature = 0.7;

  const output = model.invoke(input);
  return output.choices[0].message.content.trim();
}
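Because the prompt asks for the title on the first line and for `##` or `###` section headings, the frontend can run a quick sanity check on what comes back. The following is a rough sketch in plain TypeScript; `parseGeneratedPost` and the `ParsedPost` shape are hypothetical names, and the real app may handle this differently.

```typescript
// Hypothetical shape for a parsed blog post.
interface ParsedPost {
  title: string;
  body: string;
  hasSections: boolean;
}

// Splits the generated Markdown into a title and a body, and checks
// whether the body contains the section headings the prompt asked for.
function parseGeneratedPost(markdown: string): ParsedPost {
  const lines = markdown.trim().split("\n");
  // The title is expected on the first line; strip any leading "#" markers.
  const title = (lines[0] ?? "").replace(/^#+\s*/, "").trim();
  const body = lines.slice(1).join("\n").trim();
  // Look for at least one "##" or "###" heading at the start of a line.
  const hasSections = /^#{2,3}\s+\S/m.test(body);
  return { title, body, hasSections };
}
```

If `title` comes back empty or `hasSections` is false, the app could ask the model to try again instead of showing a badly structured post.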
To be honest, none of this would have been possible without the help of Modus. Its model invocation API proved to be a transformative and easy solution for the AI integration process. By using Modus, I got flexibility and ease in working with these complex AI models.
The collaboration between Modus, Whisper, and Llama 3.1 created a powerful ecosystem that transformed this initial concept into a fully functional, intelligent content generation tool.
Deployment 🎉
Last but not least, I hit that "Deploy" button on Vercel with a working tool that streamlines the content creation process.
Tech Stack
Frontend: NextJS, TailwindCSS, Shadcn UI
Backend: Hypermode (https://hypermode.com)
Useful Links
Live at: https://vidscribe-ai.vercel.app
Vidscribe AI repository: https://github.com/Darshancodes/Vidscribe-ai
Hypermode model instance (Vidscribe Backend): https://github.com/Darshancodes/vidscribe-modus-backend
Conclusion
The rapid growth of AI is revolutionizing the way we create content. No more manual writing, creation, and conversion - AI has got it covered!
This project won't stop here. Multiple exciting features are planned that will take Vidscribe to the next level. Stay tuned!
Special thanks to Hypermode and Hashnode for organizing this amazing hackathon. Here's to innovation and creativity. I learned a lot! 🚀