I tested speech-to-text by searching YouTube for a video of a budget speech and downloading it as an MP3 via an online service.
I then transcribed it with OpenAI Whisper. The transcription took roughly as long as the audio file itself. The result is quite good, but it definitely needs to be reviewed and corrected.
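A minimal sketch of such a transcription, assuming the open-source openai-whisper package (pip install openai-whisper); the model choice and file name are placeholders:
import whisper

# Load a Whisper model; larger models are more accurate but slower
model = whisper.load_model("medium")
# Transcribe the downloaded MP3 (German speech)
result = model.transcribe("haushaltsrede.mp3", language="de")
print(result["text"])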
According to the OpenAI documentation, this is sample code to generate speech from text:
from pathlib import Path
from openai import OpenAI

client = OpenAI()

speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!"
)
response.stream_to_file(speech_file_path)
To execute this sample I have to install openai first:
pip install openai
To play the MP3 file I have to install ffmpeg first:
sudo apt install ffmpeg
Create the MP3 and play it:
# run sample code
python sample.py
# play soundfile
ffplay speech.mp3
Play MP3 with Python
Install pygame:
pip install pygame
from pathlib import Path
import pygame

def play_mp3(file_path):
    pygame.mixer.init()
    pygame.mixer.music.load(file_path)
    pygame.mixer.music.play()
    # Keep the program running while the music plays
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Usage
speech_file_path = Path(__file__).parent / "speech.mp3"
play_mp3(speech_file_path)
python playmp3.py
Read Heise Article
from dotenv import load_dotenv
from pathlib import Path
from openai import OpenAI
import selenium.webdriver as webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pygame

def scrape_website(website):
    print("Launching chrome browser...")
    service = Service()
    options = Options()
    options.add_argument("--headless")  # enable headless mode to keep the browser invisible
    driver = webdriver.Chrome(service=service, options=options)
    try:
        driver.get(website)
        print("Page loaded...")
        html = driver.page_source
        return html
    finally:
        driver.quit()

def split_dom_content(dom_content, max_length=6000):
    return [
        dom_content[i : i + max_length] for i in range(0, len(dom_content), max_length)
    ]

def scrape_heise_website(website):
    html = scrape_website(website)
    # Use BeautifulSoup to parse the HTML
    soup = BeautifulSoup(html, 'html.parser')
    # Extract the article header and content
    # The header title and lead are found in <h1> and <p> tags with these classes
    header_title = soup.find('h1', {'class': 'a-article-header__title'}).get_text().strip()
    header_lead = soup.find('p', {'class': 'a-article-header__lead'}).get_text().strip()
    # The actual article content lives in a <div> tag with the class 'article-content'
    article_div = soup.find('div', {'class': 'article-content'})
    paragraphs = article_div.find_all('p') if article_div else []
    # Remove the 'redakteurskuerzel' (editor's initials)
    for para in paragraphs:
        spans_to_remove = para.find_all('span', {'class': 'redakteurskuerzel'})
        for span in spans_to_remove:
            span.decompose()  # removes the tag completely from the tree
    article_content = "\n".join([para.get_text().strip() for para in paragraphs])
    return article_content
    # Output header and article content
    #result = "Header Title:" + header_title + "\nHeader Lead:" + header_lead + "\nContent:" + article_content
    #return result

def article_to_mp3(article_content):
    client = OpenAI()
    speech_file_path = Path(__file__).parent / "speech.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=article_content
    )
    response.stream_to_file(speech_file_path)

def play_mp3():
    speech_file_path = Path(__file__).parent / "speech.mp3"
    pygame.mixer.init()
    pygame.mixer.music.load(speech_file_path)
    pygame.mixer.music.play()
    # Keep the program running while the music plays
    while pygame.mixer.music.get_busy():
        pygame.time.Clock().tick(10)

# Load the .env file
load_dotenv()

article_content = scrape_heise_website("https://www.heise.de/news/Streit-ueber-Kosten-Meta-kappt-Leitungen-zur-Telekom-9953162.html")
article_to_mp3(article_content)
play_mp3()
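Run the script (the file name is hypothetical):
python readarticle.py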
To start with an easier example, I will use PrivateGPT with OpenAI/ChatGPT as the AI. Of course, the chat will then not be private, which is the main reason to use PrivateGPT in the first place, but it is a good start to get things up and running, and in a next step I will add a local AI.
OpenAI API key
To use ChatGPT we need an OpenAI API key. The key itself is free, but I had to fund my account with $5 to get it working.
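A quick sanity check that the key is accepted (a minimal sketch; it just lists a few of the available models):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
for model in client.models.list().data[:5]:
    print(model.id)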
For testing, a Playground is available.
Before funding my account:
After funding my account with the minimum of $5:
Docker
The OpenAI API key is stored in a .env file, which provides its contents to Docker Compose as environment variables.
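For example (OPENAI_API_KEY is the variable name the OpenAI client expects; the value is a placeholder):
OPENAI_API_KEY=sk-...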
In docker-compose we set the API key and the profile openai as environment variables for our Docker container:
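environment:
  - OPENAI_API_KEY=${OPENAI_API_KEY}   # passed through from .env (assumed wiring)
  - PGPT_PROFILES=openai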
In Docker image we configure installation for openai:
RUN poetry install --extras "ui llms-openai vector-stores-qdrant embeddings-openai"
PrivateGPT will download language model files during its setup, so we provide a mounted volume for these model files and execute the setup at container start rather than at image build:
volumes:
  - ../models/cache:/app/privateGPT/models/cache
command: /bin/bash -c "poetry run python scripts/setup && make run"
Here are the complete files, you can also find them on my GitHub:
# Use the specified Python base image
FROM python:3.11-slim
# Set the working directory in the container
WORKDIR /app
# Install necessary packages
RUN apt-get update && apt-get install -y \
git \
build-essential
# Clone the PrivateGPT repository
RUN git clone https://github.com/imartinez/privateGPT
WORKDIR /app/privateGPT
# Install poetry
RUN pip install poetry
# Lock and install dependencies using poetry
RUN poetry lock
RUN poetry install --extras "ui llms-openai vector-stores-qdrant embeddings-openai"
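The docker-compose file ties the pieces together; here is a sketch assembled from the snippets above (the service name and port mapping are my assumptions; PrivateGPT listens on port 8001 by default):
services:
  privategpt:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}   # passed through from .env (assumed wiring)
      - PGPT_PROFILES=openai
    ports:
      - "8001:8001"
    volumes:
      - ../models/cache:/app/privateGPT/models/cache
    command: /bin/bash -c "poetry run python scripts/setup && make run"
Build the image and start the container:
docker compose up -d --build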
Open http://localhost:8001 in your browser to open PrivateGPT and run a simple test:
Have a look at the logs to see that there is communication with OpenAI servers:
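The logs can be followed with the standard Docker Compose command:
docker compose logs -f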
Chat with a document
To "chat" with a document we first need a public available one, because right now we are using ChatGPT where we must not upload internal project documents.
So first ask PrivateGPT/ChatGPT to help us to find a document:
That worked fine; we could easily find and download a PDF:
The upload of the PDF (The Go to Guide for Healthy Meals and Snacks.pdf, 160 pages, 24 MB) into PrivateGPT took nearly two minutes. In the logs we can see that the file was sent to OpenAI:
Let's chat with the book:
Uh, was that question too hard? Give it another try:
OK, sounds better. In the logs we can see the traffic to OpenAI:
Local, Ollama-powered setup
Now we want to go private, baby.
Copy the configuration to a new folder; it can also be found on my GitHub.
In docker-compose we change the profile to ollama:
environment:
- PGPT_PROFILES=ollama
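Ollama itself must be reachable from the container; one possibility is to add it as a second service in docker-compose (a sketch only, using the official ollama/ollama image, which listens on port 11434; the volume path is an assumption):
ollama:
  image: ollama/ollama
  ports:
    - "11434:11434"
  volumes:
    - ../models/ollama:/root/.ollama   # persist downloaded models (assumed host path)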
In Docker image we configure installation for ollama:
RUN poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
As before we can build the image, start the container and watch the logs:
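# assumed commands, mirroring the OpenAI variant above
docker compose build
docker compose up -d
docker compose logs -f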
I did not use the large ~24 MB file I had tried with ChatGPT, but a much smaller one of ~297 KB that I randomly found on the internet. It is written in German, but it seems that Ollama understands German.
Well, then I tried the 24 MB file and ... it worked pretty well; the answer to the first question was even better than the one from ChatGPT!