Tutorial · 6 min read

Build a Real-Time AI Chat in Next.js with Streaming Responses (Step by Step)

Youness Haji

January 5, 2025

Every AI chat tutorial shows you the same thing: send a message, wait for the full response, display it. That's not how ChatGPT works. That's not how users expect AI to work. They expect streaming — tokens appearing in real time, word by word.

Here's how to build a production-ready AI chat with streaming in Next.js, including the parts tutorials usually skip: error handling, rate limiting, and a polished UI.

What Streaming Means and Why It Matters

Without streaming, your AI chat flow looks like this:

  1. User sends message
  2. Spinner for 3-5 seconds
  3. Full response appears at once

With streaming:

  1. User sends message
  2. First words appear in 100ms
  3. Response types out in real time

The perceived latency drops from seconds to milliseconds. Users feel like the AI is "thinking" alongside them rather than disappearing into a black box.

Setting Up the API Route

We'll use Server-Sent Events (SSE) for streaming. Create an API route:

```typescript
// app/api/chat/stream/route.ts
import Groq from 'groq-sdk';

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

export async function POST(request: Request) {
  const { message } = await request.json();

  const stream = await groq.chat.completions.create({
    model: 'llama3-70b-8192',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: message },
    ],
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content || '';
          if (text) {
            controller.enqueue(
              encoder.encode(`data: ${JSON.stringify({ text })}\n\n`)
            );
          }
        }
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
        controller.close();
      } catch (error) {
        controller.enqueue(
          encoder.encode(`data: ${JSON.stringify({ error: 'Stream failed' })}\n\n`)
        );
        controller.close();
      }
    },
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}
```

Key points:

  • stream: true tells Groq to return chunks instead of a complete response
  • We use ReadableStream to pipe chunks to the client as SSE events
  • The [DONE] sentinel tells the client the stream is complete
  • Error handling inside the stream prevents hanging connections
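On the wire, the route produces a sequence of SSE events, each a "data:" line followed by a blank line. For a two-token reply, the raw response body looks roughly like this:

```text
data: {"text":"Hello"}

data: {"text":" world"}

data: [DONE]
```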

Building the Chat UI Component

The client needs to consume the SSE stream and render tokens as they arrive:

```typescript
// hooks/useStreamingChat.ts
import { useState, useCallback } from 'react';

export function useStreamingChat() {
  const [messages, setMessages] = useState<Array<{
    role: 'user' | 'assistant';
    content: string;
  }>>([]);
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = useCallback(async (content: string) => {
    setMessages((prev) => [...prev, { role: 'user', content }]);
    setMessages((prev) => [...prev, { role: 'assistant', content: '' }]);
    setIsStreaming(true);

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: content }),
      });

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();

      if (!reader) throw new Error('No reader available');

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // stream: true buffers multi-byte characters that are split
        // across network chunks instead of emitting replacement chars
        const chunk = decoder.decode(value, { stream: true });
        const lines = chunk.split('\n').filter((l) => l.startsWith('data: '));

        for (const line of lines) {
          const data = line.replace('data: ', '');
          if (data === '[DONE]') break;

          try {
            const parsed = JSON.parse(data);
            if (parsed.text) {
              setMessages((prev) => {
                const updated = [...prev];
                const last = updated[updated.length - 1];
                updated[updated.length - 1] = {
                  ...last,
                  content: last.content + parsed.text,
                };
                return updated;
              });
            }
          } catch {
            // Skip malformed chunks
          }
        }
      }
    } catch (error) {
      setMessages((prev) => {
        const updated = [...prev];
        updated[updated.length - 1] = {
          role: 'assistant',
          content: 'Sorry, something went wrong. Please try again.',
        };
        return updated;
      });
    } finally {
      setIsStreaming(false);
    }
  }, []);

  return { messages, sendMessage, isStreaming };
}
```
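One weakness of the loop above: a "data:" line can be split across two network chunks, in which case JSON.parse fails and that token is silently dropped. The parsing logic can be factored into a small buffering helper that carries partial lines across chunks (the names here are my own, not from the hook):

```typescript
// Parses SSE "data:" lines from raw text chunks, carrying any trailing
// partial line over to the next call so split tokens aren't lost.
export function createSSEParser() {
  let buffer = '';
  return function parse(chunk: string): { texts: string[]; done: boolean } {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the incomplete trailing line

    const texts: string[] = [];
    let done = false;
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice('data: '.length);
      if (data === '[DONE]') {
        done = true;
        continue;
      }
      try {
        const parsed = JSON.parse(data);
        if (parsed.text) texts.push(parsed.text);
      } catch {
        // Skip malformed payloads
      }
    }
    return { texts, done };
  };
}
```

Inside the hook, each `decoder.decode(...)` result would go through `parse()`, and the returned texts get appended to the assistant message as before.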

Handling Errors Gracefully

Production AI chat needs to handle three failure modes:

1. Network Errors

The fetch can fail entirely. Wrap it in try/catch and show a user-friendly message.

2. Rate Limiting

Groq and OpenAI both rate limit. Check for 429 responses and show a "please wait" message:

```typescript
if (response.status === 429) {
  setMessages((prev) => {
    const updated = [...prev];
    updated[updated.length - 1] = {
      role: 'assistant',
      content: 'I\'m receiving too many requests right now. Please try again in a few seconds.',
    };
    return updated;
  });
  return;
}
```

3. Partial Stream Failures

The stream can start successfully but fail midway. The error handler inside ReadableStream.start() catches this and sends an error event so the client knows to stop waiting.
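The hook above only reads parsed.text, so the server's error event would currently be ignored. A minimal helper (my own naming) that also checks parsed.error lets a mid-stream failure bubble up to the hook's outer try/catch:

```typescript
// Extracts the text from a parsed SSE payload, or throws if the server
// reported a mid-stream failure so the hook's catch block can handle it.
export function textFromEvent(parsed: { text?: string; error?: string }): string {
  if (parsed.error) {
    throw new Error(parsed.error); // bubbles to the hook's catch block
  }
  return parsed.text ?? '';
}
```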

Rate Limiting on Your End

Don't rely solely on the AI provider's rate limits. Implement your own:

```typescript
// Simple in-memory rate limiter
const rateLimit = new Map<string, number[]>();
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 10;

function isRateLimited(ip: string): boolean {
  const now = Date.now();
  const timestamps = rateLimit.get(ip) || [];
  const recent = timestamps.filter((t) => now - t < WINDOW_MS);
  rateLimit.set(ip, recent);

  if (recent.length >= MAX_REQUESTS) return true;
  recent.push(now);
  return false;
}
```
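Wiring the limiter into the route is a short guard before the Groq call. A sketch (the limiter is repeated so the snippet stands alone; x-forwarded-for is the header Vercel sets with the client IP, and the guard function name is my own):

```typescript
// In-memory sliding-window limiter, as above.
const rateLimit = new Map<string, number[]>();
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 10;

export function isRateLimited(ip: string): boolean {
  const now = Date.now();
  const recent = (rateLimit.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  rateLimit.set(ip, recent);
  if (recent.length >= MAX_REQUESTS) return true;
  recent.push(now);
  return false;
}

// Called at the top of the POST handler: returns a 429 Response for
// over-limit clients, or null to proceed to the model call.
export function guard(request: Request): Response | null {
  const ip = request.headers.get('x-forwarded-for') ?? 'unknown';
  if (isRateLimited(ip)) {
    return new Response(JSON.stringify({ error: 'Too many requests' }), {
      status: 429,
      headers: { 'Content-Type': 'application/json' },
    });
  }
  return null;
}
```

Note that an in-memory map resets on every deployment and isn't shared across serverless instances; for stricter guarantees you'd back it with something like Redis.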

The Typewriter Effect

For polish, add a CSS animation that makes the streaming text feel like typing:

```css
@keyframes blink {
  0%, 50% { opacity: 1; }
  51%, 100% { opacity: 0; }
}

.streaming-cursor::after {
  content: '▋';
  animation: blink 1s infinite;
  color: #00F5FF;
}
```

Add the streaming-cursor class to the last message while isStreaming is true. Remove it when the stream completes.
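In the message list renderer, that conditional can live in a tiny helper (a sketch with my own names; 'message' stands in for whatever base class your list items use):

```typescript
// Returns the CSS classes for the message at `index` out of `total`.
// Only the last message shows the cursor, and only while streaming.
export function messageClassName(
  index: number,
  total: number,
  isStreaming: boolean
): string {
  const isLast = index === total - 1;
  return isStreaming && isLast ? 'message streaming-cursor' : 'message';
}
```

Used as `className={messageClassName(i, messages.length, isStreaming)}` inside the map over messages, the cursor disappears automatically when isStreaming flips to false.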

Deploying to Vercel

Two things to configure:

  1. Environment variables: Add GROQ_API_KEY in the Vercel dashboard
  2. Runtime: long-lived SSE connections and the groq-sdk client are most reliable on the Node.js runtime, not Edge. Add to your route:
```typescript
export const runtime = 'nodejs';
```

Edge runtime doesn't support streaming with all providers. Stick with Node.js for reliability.

The Complete Flow

  1. User types a message and hits send
  2. Message appears in the chat UI immediately
  3. Empty assistant message placeholder appears with blinking cursor
  4. Fetch POST to /api/chat/stream
  5. API creates Groq streaming completion
  6. SSE chunks flow back to the client
  7. Each chunk appends to the assistant message in real time
  8. [DONE] event removes the cursor and enables the input
  9. Error at any point shows a friendly message

Conclusion

Streaming transforms AI chat from a "request-response" interaction into a conversation. The implementation is more complex than a simple fetch, but the UX improvement is dramatic.

The code above is production-tested — it's the foundation of the AI chat on my portfolio. Every response you see there streams in real time using this exact pattern.


Building a custom AI chat for your product? I've done it multiple times and can help you ship faster. Let's talk.