Three Prompts: Speech To Slides

Introduction
Have you ever been struck by inspiration and wished you could quickly turn your thoughts into a presentation without the hassle of typing everything out? The Speech-to-Slides app solves this problem by letting you create presentations using just your voice. This innovative web application leverages the Web Speech API to convert spoken words into beautifully formatted Reveal.js slide decks in real-time.
What makes this project particularly interesting is that it was created using only three AI prompts, demonstrating how complex functionality can be implemented efficiently with the right guidance. The app combines speech recognition technology with modern web development practices to create a seamless experience for users who prefer speaking over typing when creating presentations.
The Three Prompts
Here are the exact prompts that were used to create this project:
Prompt 1
@projects/speech-to-slides Speech-to-Slides Hit record, speak bullet points, get an auto-generated Reveal.js deck. Web Speech API → Markdown → Reveal; 2: add voice commands (“next slide”, “theme X”); 3: fix mis-captured punctuation.
Prompt 2
This error is showing in the console App.jsx:73 Uncaught ReferenceError: Cannot access 'processVoiceCommand' before initialization at App (App.jsx:73:20)
Prompt 3
All the text in the inputs and presenation are white on white. Fix this. Also remove the ability to changes themes by saying it. It's broken. 1. fix the input colors 2. Fix the presentation colors, the text in the presentation is white on white... 2. Fix the UI, 1 background colors full width, keep it simple 3. Remove the text 'Say "theme [name]" to change the theme.'
Let's analyze how these prompts guided the development process:
- Initial Concept and Core Functionality: The first prompt outlined the basic concept of the application—converting speech to slides using the Web Speech API, markdown conversion, and Reveal.js for presentation. It established the core functionality and technology stack.
- Bug Fix: The second prompt addressed a specific JavaScript error related to function initialization order, highlighting the importance of proper code organization in React components.
- UI and UX Improvements: The final prompt focused on fixing visual issues with the application, including text color problems, simplifying the UI, and removing a problematic feature (theme voice commands). This demonstrates the iterative refinement process that's essential in software development.
These prompts show a natural progression from concept to implementation to refinement, covering the full development lifecycle of the application.
Technologies Used
The Speech-to-Slides application leverages several modern web technologies:
- React: The UI is built using React (v19.1.0), providing a component-based architecture for the application
- Web Speech API: The core speech recognition functionality uses the browser's native Web Speech API
- Reveal.js: This powerful presentation framework (v4.5.0) renders the final slides with professional transitions and styling
- Marked: The markdown parsing library (v9.1.0) converts the processed speech text into HTML for the slides
- Vite: The project uses Vite as the build tool and development server for fast iteration
Key Features and Functionality
Speech Recognition
The application uses the Web Speech API's continuous recognition mode to capture speech in real-time. The speech recognition engine processes spoken words and converts them into text that's displayed in the transcript area. This happens seamlessly as you speak, allowing you to see your words appear on screen instantly.
Voice Commands
One of the most powerful features is the ability to control the presentation structure using voice commands. Saying "next slide" automatically creates a new slide break in your presentation, allowing you to organize your content without touching the keyboard.
Automatic Formatting
The app doesn't just transcribe your words—it intelligently formats them into proper presentation content:
- Spoken sentences are converted into bullet points
- Punctuation is automatically corrected and standardized
- First letters of sentences are capitalized
- Periods are added where needed
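These formatting rules can be illustrated with a small standalone helper. Note that `formatPoint` is a hypothetical extraction for illustration—in the app itself the equivalent logic lives inline in the transcript-processing code:

```javascript
// Hypothetical helper illustrating the formatting rules above;
// the app applies the same regex-based fixes inline.
function formatPoint(point) {
  let fixed = point
    .replace(/\s+([.,;:!?])/g, '$1')           // remove spaces before punctuation
    .replace(/([.,;:!?])([A-Za-z])/g, '$1 $2') // add a space after punctuation

  // Capitalize the first letter if it is lowercase
  if (fixed && /[a-z]/.test(fixed[0])) {
    fixed = fixed[0].toUpperCase() + fixed.slice(1)
  }

  // Ensure the point ends with punctuation
  if (fixed && !/[.,;:!?]$/.test(fixed)) {
    fixed += '.'
  }
  return fixed
}
```

For example, `formatPoint('hello ,world')` normalizes the stray space and missing capital to produce `'Hello, world.'`.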
Live Preview
As you speak or edit the transcript, you can see a live preview of how your slides will look. Each slide is displayed as a card, giving you immediate feedback on your presentation's structure and content.
Presentation Mode
When you're ready to view your presentation, the app transforms into a full-screen Reveal.js deck with professional styling and navigation controls. You can navigate through slides using arrow keys and return to the editor when needed.
Responsive Design
The application features a responsive design that works well on both desktop and mobile devices, with special attention paid to touch-friendly controls and appropriate sizing for smaller screens.
Implementation Details
Speech Recognition Implementation
The speech recognition functionality is implemented using React's useEffect and useRef hooks. The Web Speech API is initialized when the component mounts, and the recognition process is configured to be continuous, allowing for uninterrupted speech capture:
useEffect(() => {
  // Initialize Web Speech API
  if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
    recognitionRef.current = new SpeechRecognition()
    recognitionRef.current.continuous = true
    recognitionRef.current.interimResults = true

    // Event handlers for speech recognition
    // ...
  }
}, [isRecording, processVoiceCommand])
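The elided event handlers mostly deal with separating final results from interim ones. Here is a sketch of the core `onresult` logic, written as a standalone function (a hypothetical extraction—the real handler would feed these strings into React state rather than return them) so it can be exercised with a mock event:

```javascript
// Sketch of SpeechRecognition onresult handling: walk the results
// list starting at event.resultIndex and split final text from
// interim text. Returned as plain strings here for testability.
function splitResults(event) {
  let finalText = ''
  let interimText = ''
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript
    if (event.results[i].isFinal) {
      finalText += transcript
    } else {
      interimText += transcript
    }
  }
  return { finalText, interimText }
}
```

In the browser, each entry in `event.results` is an array-like `SpeechRecognitionResult` with an `isFinal` flag, which is why the mock results below are arrays carrying that property.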
Voice Command Processing
Voice commands are processed in real-time using a callback function that analyzes the speech transcript for specific phrases:
const processVoiceCommand = useCallback((command) => {
  // Handle "next slide" command
  if (command.includes('next slide')) {
    setTranscript(prev => prev + '\n\nnext slide\n\n')
    return
  }
}, [setTranscript])
Transcript Processing
The raw transcript is processed into formatted slides using a series of string manipulations and regular expressions:
const processTranscript = () => {
  // Split the transcript on "next slide" to create individual slides
  const slideTexts = transcript.split(/next slide/i).map(text => text.trim()).filter(Boolean)

  const processedSlides = slideTexts.map(slideText => {
    // Split by periods or line breaks to create bullet points
    const points = slideText
      .split(/\.\s+|\n/)
      .map(point => point.trim())
      .filter(Boolean)
      .map(point => {
        // Fix common punctuation issues
        let fixedPoint = point
          .replace(/\s+([.,;:!?])/g, '$1') // Remove spaces before punctuation
          .replace(/([.,;:!?])([A-Za-z])/g, '$1 $2') // Add space after punctuation if missing

        // Capitalize the first letter if it isn't already
        if (fixedPoint && /[a-z]/.test(fixedPoint[0])) {
          fixedPoint = fixedPoint[0].toUpperCase() + fixedPoint.slice(1)
        }

        // Add a period at the end if missing
        if (fixedPoint && !fixedPoint.match(/[.,;:!?]$/)) {
          fixedPoint += '.'
        }

        return fixedPoint
      })
      .filter(Boolean)

    // Convert points to markdown
    const markdown = points.map(point => `* ${point}`).join('\n')
    return markdown
  })

  setSlides(processedSlides)
  setStatus('Ready')
}
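The end-to-end transformation is easiest to see with the splitting logic extracted into a standalone function (a hypothetical extraction that omits the React state updates and the punctuation fixes shown above, keeping only the slide and bullet splitting):

```javascript
// Standalone sketch of the transcript-to-slides split:
// "next slide" separates slides, then sentences and line breaks
// become markdown bullet points.
function transcriptToSlides(transcript) {
  return transcript
    .split(/next slide/i)
    .map(text => text.trim())
    .filter(Boolean)
    .map(slideText =>
      slideText
        .split(/\.\s+|\n/)
        .map(point => point.trim())
        .filter(Boolean)
        .map(point => `* ${point}`)
        .join('\n')
    )
}
```

Given the input `'first point. second point next slide third point'`, this produces two slides: one with two bullets and one with a single bullet.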
Reveal.js Integration
The presentation mode uses Reveal.js, which is initialized when the user enters presentation mode and destroyed when they exit:
useEffect(() => {
  if (showPresentation && presentationRef.current) {
    if (revealRef.current) {
      revealRef.current.destroy()
    }
    revealRef.current = new Reveal(presentationRef.current, {
      controls: true,
      progress: true,
      center: true,
      hash: true,
    })
    revealRef.current.initialize()
  }
  return () => {
    if (revealRef.current) {
      revealRef.current.destroy()
      revealRef.current = null
    }
  }
}, [showPresentation, slides])
Challenges and Solutions
Challenge 1: Function Reference Error
Problem: The application initially had a ReferenceError where the processVoiceCommand function was being used before it was defined.
Solution: The function was moved before its usage and wrapped in a useCallback hook to maintain proper dependency management in React:
// Define processVoiceCommand before using it in useEffect
const processVoiceCommand = useCallback((command) => {
  // Function implementation
}, [setTranscript])

useEffect(() => {
  // Now the function can be safely used here
}, [isRecording, processVoiceCommand])
Challenge 2: Text Visibility Issues
Problem: The application had text visibility issues where text in inputs and the presentation was white on a white background, making it unreadable.
Solution: The CSS was updated to explicitly set text colors and ensure proper contrast:
textarea {
  background-color: #ffffff;
  color: var(--text-color);
}

.reveal {
  color: #333333;
}

.reveal h1,
.reveal h2,
.reveal h3,
.reveal h4,
.reveal h5,
.reveal h6 {
  color: #333333;
}

.reveal ul li,
.reveal ol li,
.reveal p {
  color: #333333;
}
Challenge 3: Voice Command Reliability
Problem: The theme voice commands were unreliable and causing confusion.
Solution: The theme voice command functionality was removed, and the UI was simplified to use a fixed theme, focusing on the core functionality of speech-to-slides conversion.
Conclusion
The Speech-to-Slides application demonstrates how modern web technologies can be combined to create a powerful tool that transforms the way we create presentations. By leveraging the Web Speech API, React, and Reveal.js, the app provides a seamless experience for converting spoken words into professional-looking slides.
The development process, guided by just three prompts, showcases the efficiency of iterative development—starting with core functionality, addressing bugs, and refining the user experience. The result is a practical tool that could be valuable for anyone who prefers speaking over typing when creating presentations.
Potential future enhancements could include additional voice commands for text formatting, export options to various formats, image insertion capabilities, and improved punctuation handling. The foundation laid by this project provides a solid base for these extensions.