Merge commit 'b5a2b4e6d1b958dbb3ad702026889172514c1fd6'

ForeverPyrite, 2025-05-08 01:36:25 -04:00
18 changed files with 599 additions and 567 deletions


@@ -4,9 +4,12 @@ __pycache__
 *.pyd
 *.env
 *venv/
-*.git
+.git
 .gitignore
 Dockerfile
 docker-compose.yml
 log.md
+<<<<<<< HEAD
 .vscode
+=======
+>>>>>>> b5a2b4e6d1b958dbb3ad702026889172514c1fd6

.vscode/launch.json vendored

@@ -1,26 +1,26 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Flask",
            "type": "debugpy",
            "request": "launch",
            "cwd": "./app",
            "module": "flask",
            "env": {
                "FLASK_APP": "./app.py",
                "FLASK_DEBUG": "1"
            },
            "args": [
                "run",
                "--debug",
                "--no-reload"
            ],
            "jinja": true,
            "autoStartBrowser": false
        }
    ]
}


@@ -1,5 +1,5 @@
{
    "html.autoClosingTags": true,
    "html.format.enable": true,
    "html.autoCreateQuotes": true
}

Dockerfile

@@ -1,5 +1,5 @@
 # Use an official Python runtime as a parent image
-FROM python:3.11-slim
+FROM python:3.13-slim

 # Set environment variables
 ENV PYTHONDONTWRITEBYTECODE=1

@@ -18,7 +18,7 @@ COPY requirements.txt .
 RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt

 # Copy application files
-COPY . /app
+COPY /app /app

 # Make start.sh executable
 RUN chmod +x /app/start.sh


@@ -1,18 +1,18 @@
This is a simple web application made for a very specific purpose: to spite my 10th grade social studies teacher.
See, basically he didn't teach us anything; he wanted us to watch 10+ year old lectures on [his youtube channel](https://www.youtube.com/@mikebardonaro3227).
Yes, he's been doing this for so long that he could probably monetize his channel.
For each lecture, he also wanted us to take notes and create 5 Questions and Answers meeting a few criteria.
Now here's what I thought about this:
If he isn't actually going to teach us the content of the class that I have to physically attend, then why should I do anything but match the effort he's putting forth?
So I made a Python script that took a video ID, got the YouTube transcript for it, and fed it to an OpenAI assistant to do the work for me.
This was pretty pointless, since you can also just copy the transcript and paste it into the assistant threads on OpenAI's playground platform, but I still did it for the hell of it.
That got me through the year just fine.
However, the next year I had a friend who also got into his class, and instead of having him repeatedly ask me for my old work, I figured why not let him create some..."original" work himself?
So I spent a few nights developing a web application with a very simple task, and here it is.
It is some pretty bad code. Like, actually "minimum to make it work" code. However, I've decided to use this as an opportunity to still learn some things and hopefully be able to do more dedicated things.
I still occasionally revisit it to try to make it a little better, and I might even scale up the website a bit and make it so that anyone can use it. Of course, this would come at a cost, but I feel like it would be relatively deserved for the teacher after more than a dozen years of not doing anything.
If I ever make this repository public, judge the hell out of me. Just know that, unfortunately for everyone who would be looking for it, I never commit any hardcoded API keys, or `.env`...sorry.

app/app.py (new version)

@@ -1,68 +1,90 @@
import logging
import os
import uuid
from flask import Flask, render_template, Response, request, session
from main import yoink, process, user_streams, stream_lock

app = Flask(__name__, static_folder="website/static", template_folder="website")
app.secret_key = os.urandom(24)  # Necessary for using sessions

# Configure logging
logging.basicConfig(
    filename='./logs/app.log',
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

def create_session():
    """
    Create a new session by generating a UUID and ensuring it does not collide
    with an existing session in the user_streams global dictionary.

    Returns:
        str: A unique session ID.
    """
    session_id = str(uuid.uuid4())
    # Even though collisions are unlikely, we check for safety.
    try:
        if user_streams[session_id]:
            session_id = create_session()
    except KeyError:
        pass
    return session_id

@app.route('/')
def home():
    """
    Render the home page and initialize a session.

    Returns:
        Response: The rendered home page with a unique session id.
    """
    session_id = create_session()
    session['id'] = session_id
    logging.info(f"Home page accessed. Assigned initial session ID: {session_id}")
    return render_template('index.html', session_id=session_id)

@app.route('/process_url', methods=['POST'])
def process_url():
    """
    Accept a YouTube URL (from a form submission), initialize the session if necessary,
    and trigger the transcript retrieval and AI processing.

    Returns:
        Response: Text response indicating start or error message.
    """
    session_id = session.get('id')
    if not session_id:
        session_id = create_session()
        session['id'] = session_id
        logging.info(f"No existing session. Created new session ID: {session_id}")
    url = request.form['url']
    logging.info(f"Received URL for processing from session {session_id}: {url}")
    success, msg, status_code = process(url, session_id)
    if success:
        logging.info(f"Processing started successfully for session {session_id}.")
        return Response("Processing started. Check /stream_output for updates.", content_type='text/plain', status=200)
    else:
        logging.error(f"Processing failed for session {session_id}: {msg}")
        return Response(msg, content_type='text/plain', status=status_code)

@app.route('/stream_output')
def stream_output():
    """
    Stream the AI processing output for the current session.

    Returns:
        Response: A streaming response with text/plain content.
    """
    session_id = session.get('id')
    if not session_id or session_id not in user_streams:
        logging.warning(f"Stream requested without a valid session ID: {session_id}")
        return Response("No active stream for this session.", content_type='text/plain', status=400)
    logging.info(f"Streaming output requested for session {session_id}.")
    return Response(yoink(session_id), content_type='text/plain', status=200)

if __name__ == '__main__':
    logging.info("Starting Flask application.")
    # Running with threaded=True to handle multiple requests concurrently.
    app.run(debug=True, threaded=True)
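
For reference, a minimal client-side sketch (my addition, not part of the repository) of how these two endpoints could be exercised once the dev server is running; the base URL, the example video link, and the use of the `requests` library are assumptions, not something the project ships.

import requests

BASE = "http://localhost:5000"   # assumed address of the Flask dev server

client = requests.Session()       # a Session keeps the Flask session cookie between calls
client.get(f"{BASE}/")            # hitting the home page assigns the session id

# Submit a lecture URL (placeholder video link) exactly as the HTML form would.
resp = client.post(f"{BASE}/process_url", data={"url": "https://youtu.be/dQw4w9WgXcQ"})
print(resp.status_code, resp.text)

if resp.ok:
    # /stream_output returns plain text incrementally; read it chunk by chunk.
    with client.get(f"{BASE}/stream_output", stream=True) as stream:
        for chunk in stream.iter_content(chunk_size=None, decode_unicode=True):
            print(chunk, end="", flush=True)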

app/main.py (new version)

@@ -1,244 +1,331 @@
"""
Main module that handles processing of YouTube transcripts and connecting to the AI service.
Each user session has its own output stream and thread to handle the asynchronous AI response.
"""

import re
import threading
import asyncio
from asyncio import sleep
from datetime import datetime
import pytz
import os
import logging
import uuid

# Youtube Transcript imports
import youtube_transcript_api._errors
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

# OpenAI API imports
from openai import AssistantEventHandler
from openai import OpenAI

from dotenv import load_dotenv
load_dotenv()

# Global dict for per-user session streams.
user_streams = {}
# Lock to ensure thread-safe operations on shared memory.
stream_lock = threading.Lock()

# For running async code in non-async functions.
awaiter = asyncio.run

# Configure logging
try:
    logging.basicConfig(
        filename='./logs/main.log',
        level=logging.INFO,
        format='%(asctime)s %(levelname)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
except FileNotFoundError as e:
    with open("./logs/main.log", "x"):
        pass
    logging.basicConfig(
        filename='./logs/main.log',
        level=logging.INFO,
        format='%(asctime)s %(levelname)s: %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    logging.info(f"No main.log file was found ({e}), so one was created.")


class StreamOutput:
    """
    Class to encapsulate a session's streaming output.

    Attributes:
        delta (str): Last delta update.
        response (str): Cumulative response from the AI.
        done (bool): Flag indicating if streaming is complete.
        buffer (list): List of output delta strings pending streaming.
    """

    def __init__(self):
        self.delta: str = ""
        self.response: str = ""
        self.done: bool = False
        self.buffer: list = []

    def reset(self):
        """
        Reset the stream output to its initial state.
        """
        self.delta = ""
        self.response = ""
        self.done = False
        self.buffer = []

    def send_delta(self, delta):
        """
        Process a new delta string. This method is a synchronous wrapper that calls the async
        method process_delta.

        Args:
            delta (str): The delta string to process.
        """
        awaiter(self.process_delta(delta))

    async def process_delta(self, delta):
        """
        Process a new delta chunk asynchronously to update buffering.

        Args:
            delta (str): The delta portion of the response.
        """
        self.delta = delta
        self.response += delta

        def get_index(lst):
            return 0 if not lst else len(lst) - 1

        if self.buffer:
            try:
                if self.delta != self.buffer[get_index(self.buffer)]:
                    self.buffer.append(delta)
            except IndexError as index_error:
                logging.error(f"Caught IndexError: {str(index_error)}")
                self.buffer.append(delta)
        else:
            self.buffer.append(delta)
        return


# OpenAI Client configuration
client = OpenAI(
    organization='org-7ANUFsqOVIXLLNju8Rvmxu3h',
    project="proj_NGz8Kux8CSka7DRJucAlDCz6",
    api_key=os.getenv("OPENAI_API_KEY")
)

asst_screw_bardo_id = "asst_JGFaX6uOIotqy5mIJnu3Yyp7"  # Assistant ID for processing


class EventHandler(AssistantEventHandler):
    """
    Event handler for processing OpenAI assistant events.

    Attributes:
        output_stream (StreamOutput): The output stream to write updates to.
    """

    def __init__(self, output_stream: StreamOutput):
        """
        Initialize the event handler with a specific output stream.

        Args:
            output_stream (StreamOutput): The session specific stream output instance.
        """
        super().__init__()
        self.output_stream = output_stream

    def on_text_created(self, text) -> None:
        """
        Event triggered when text is first created.

        Args:
            text (str): The initial response text.
        """
        self.output_stream.send_delta("Response Received:\n\nScrew-Bardo:\n\n")
        logging.info("Text created event handled.")

    def on_text_delta(self, delta, snapshot):
        """
        Event triggered when a new text delta is available.

        Args:
            delta (Any): Object that contains the new delta information.
            snapshot (Any): A snapshot of the current output (if applicable).
        """
        self.output_stream.send_delta(delta.value)
        logging.debug(f"Text delta received: {delta.value}")

    def on_tool_call_created(self, tool_call):
        """
        Handle the case when the assistant attempts to call a tool.
        Raises an exception as this behavior is unexpected.

        Args:
            tool_call (Any): The tool call info.

        Raises:
            Exception: Always, since tool calls are not allowed.
        """
        error_msg = "Assistant shouldn't be calling tools."
        logging.error(error_msg)
        raise Exception(error_msg)


def create_and_stream(transcript, session_id):
    """
    Create a new thread that runs the OpenAI stream for a given session and transcript.

    Args:
        transcript (str): The transcript from the YouTube video.
        session_id (str): The unique session identifier.
    """
    logging.info(f"Starting OpenAI stream thread for session {session_id}.")
    event_handler = EventHandler(user_streams[session_id]['output_stream'])
    try:
        with client.beta.threads.create_and_run_stream(
            assistant_id=asst_screw_bardo_id,
            thread={
                "messages": [{"role": "user", "content": transcript}]
            },
            event_handler=event_handler
        ) as stream:
            stream.until_done()
        with stream_lock:
            user_streams[session_id]['output_stream'].done = True
        logging.info(f"OpenAI stream completed for session {session_id}.")
    except Exception as e:
        logging.exception(f"Exception occurred during create_and_stream for session {session_id}.")


def yoink(session_id):
    """
    Generator that yields streaming output for a session.

    This function starts the AI response thread, then continuously yields data from the session's output buffer
    until the response is marked as done.

    Args:
        session_id (str): The unique session identifier.

    Yields:
        bytes: Chunks of the AI generated response.
    """
    logging.info(f"Starting stream for session {session_id}...")
    with stream_lock:
        user_data = user_streams.get(session_id)
    if not user_data:
        logging.critical(f"User data not found for session id {session_id}?")
        return
    output_stream: StreamOutput = user_data.get('output_stream')
    thread: threading.Thread = user_data.get('thread')
    thread.start()
    while True:
        if not output_stream or not thread:
            logging.error(f"No output stream/thread for session {session_id}.")
            break
        # Stop streaming when done and there is no pending buffered output.
        if output_stream.done and not output_stream.buffer:
            break
        try:
            if output_stream.buffer:
                delta = output_stream.buffer.pop(0)
                yield bytes(delta, encoding="utf-8")
            else:
                # A short sleep before looping again
                asyncio.run(sleep(0.018))
        except Exception as e:
            logging.exception(f"Exception occurred during streaming for session {session_id}: {e}")
            break
    logging.info(f"Stream completed successfully for session {session_id}.")
    logging.info(f"Completed Assistant Response for session {session_id}:\n{output_stream.response}")
    with stream_lock:
        thread.join()
        # Clean up the session data once done.
        del user_streams[session_id]
    logging.info(f"Stream thread joined and resources cleaned up for session {session_id}.")


def process(url, session_id):
    """
    Process a YouTube URL: parse the video id, retrieve its transcript, and prepare the session for AI processing.

    Args:
        url (str): The YouTube URL provided by the user.
        session_id (str): The unique session identifier.

    Returns:
        tuple: (success (bool), message (str or None), status_code (int or None))
    """
    current_time = datetime.now(pytz.timezone('America/New_York')).strftime('%Y-%m-%d %H:%M:%S')
    logging.info(f"New Entry at {current_time} for session {session_id}")
    logging.info(f"URL: {url}")
    video_id = get_video_id(url)
    if not video_id:
        logging.warning(f"Could not parse video id from URL: {url}")
        return (False, "Couldn't parse video ID from URL. (Are you sure you entered a valid YouTube.com or YouTu.be URL?)", 400)
    logging.info(f"Parsed Video ID: {video_id}")
    transcript = get_auto_transcript(video_id)
    if not transcript:
        logging.error(f"Error: could not retrieve transcript for session {session_id}. Assistant won't be called.")
        return (False, "Successfully parsed video ID from URL, however the transcript was disabled by the video owner or invalid.", 200)
    # Initialize session data for streaming.
    user_streams[session_id] = {
        'output_stream': None,
        'thread': None
    }
    with stream_lock:
        user_streams[session_id]['output_stream'] = StreamOutput()
        thread = threading.Thread(
            name=f"create_stream_{session_id}",
            target=create_and_stream,
            args=(transcript, session_id)
        )
        user_streams[session_id]['thread'] = thread
    logging.info(f"Stream preparation complete for session {session_id}, sending reply.")
    return (True, None, None)


def get_video_id(url):
    """
    Extract the YouTube video ID from a URL.

    Args:
        url (str): The YouTube URL.

    Returns:
        str or None: The video ID if found, otherwise None.
    """
    youtu_be = r'(?<=youtu.be/)([A-Za-z0-9_-]{11})'
    youtube_com = r'(?<=youtube\.com\/watch\?v=)([A-Za-z0-9_-]{11})'
    id_match = re.search(youtu_be, url)
    if not id_match:
        id_match = re.search(youtube_com, url)
    if not id_match:
        logging.warning(f"Failed to parse video ID from URL: {url}")
        return None
    return id_match.group(1)


def get_auto_transcript(video_id):
    """
    Retrieve and format the transcript from a YouTube video.

    Args:
        video_id (str): The YouTube video identifier.

    Returns:
        str or None: The formatted transcript if successful; otherwise None.
    """
    trans_api_errors = youtube_transcript_api._errors
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'], proxies=None, cookies=None, preserve_formatting=False)
    except trans_api_errors.TranscriptsDisabled as e:
        logging.exception(f"Exception while fetching transcript: {e}")
        return None
    formatter = TextFormatter()
    txt_transcript = formatter.format_transcript(transcript)
    logging.info("Transcript successfully retrieved and formatted.")
    return txt_transcript


# Initialize a global output_stream just for main module logging (not used for per-session streaming).
output_stream = StreamOutput()

logging.info(f"Main initialized at {datetime.now(pytz.timezone('America/New_York')).strftime('%Y-%m-%d %H:%M:%S')}. Application starting.")

app/start.sh Normal file

@@ -0,0 +1,2 @@
#!/bin/bash
exec gunicorn -b 0.0.0.0:1986 -w 4 --threads 2 --log-level debug app:app --timeout 120 --worker-class gthread --access-logfile - --error-logfile - --capture-output

app/website/index.html

@@ -7,19 +7,23 @@
    <title>Screw You Bardo</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
    <link rel="icon" type="image/x-icon" href="https://www.foreverpyrite.com/favicon.ico">
    <script src="https://unpkg.com/htmx.org@2.0.4" integrity="sha384-HGfztofotfshcF7+8n44JQL2oJmowVChPTg48S+jvZoztPfvwD79OC/LTtG6dMp+" crossorigin="anonymous"></script>
    <script defer src="{{ url_for('static', filename='script.js') }}"></script>
</head>

<body class="font-sans flex justify-center items-center h-screen bg-[#1F1F1F] text-white">
    <main class="flex flex-col w-11/12 h-[90vh] bg-[#2E2E2E] rounded-lg shadow-lg overflow-hidden">
        <section id="response-section" class="flex-1 p-5 bg-[#1E1E1E] overflow-y-auto text-base leading-relaxed scroll-smooth">
            <pre id="response-area" class="whitespace-pre-wrap">Response will appear here.</pre>
        </section>
        <section class="py-4 px-5 bg-[#3A3A3A]">
            <form id="url-form" hx-post="/process_url" hx-swap="none" class="flex gap-2">
                <input id="url_box" type="url" name="url" placeholder="Paste the lecture URL here." required autofocus
                    class="flex-1 py-2 px-3 bg-[#4A4A4A] text-white text-base rounded-md focus:outline-none placeholder-[#B0B0B0]">
                <button type="submit" id="submit" class="py-2 px-5 bg-[#5A5A5A] text-white text-base rounded-md hover:bg-[#7A7A7A] disabled:bg-[#3A3A3A] disabled:cursor-not-allowed">
                    Submit
                </button>
            </form>
        </section>
    </main>

app/website/static/script.js (new version)

@@ -1,88 +1,71 @@
document.addEventListener("DOMContentLoaded", () => {
    const responseArea = document.getElementById('response-area');
    const responseSection = document.getElementById('response-section');
    const submitButton = document.getElementById('submit');
    const urlBox = document.getElementById('url_box');

    // Before sending HTMX request, prepare UI and handle empty input
    document.body.addEventListener('htmx:beforeRequest', function(evt) {
        if (evt.detail.elt.id === 'url-form') {
            const url = urlBox.value.trim();
            if (!url) {
                evt.detail.shouldCancel = true;
                responseArea.innerText = 'Please enter a URL.';
                return;
            }
            urlBox.value = '';
            submitButton.disabled = true;
            responseArea.innerText = 'Processing...';
        }
    });

    document.body.addEventListener('htmx:afterRequest', function(evt) {
        if (evt.detail.elt.id === 'url-form') {
            const text = evt.detail.xhr.responseText.trim();
            if (text === "Processing started. Check /stream_output for updates.") {
                streamOutput(responseArea, responseSection, submitButton);
            } else {
                responseArea.innerText = text;
                submitButton.disabled = false;
            }
        }
    });

    function streamOutput(responseArea, responseSection, submitButton) {
        // Fetch the streaming output
        fetch('/stream_output')
            .then(response => {
                if (!response.ok) {
                    throw new Error('Network response was not ok');
                }
                const reader = response.body.getReader();
                const decoder = new TextDecoder("utf-8");

                responseArea.innerHTML = "";

                function readStream() {
                    reader.read().then(({ done, value }) => {
                        if (done) {
                            submitButton.disabled = false;
                            return;
                        }
                        const chunk = decoder.decode(value, { stream: true });
                        responseArea.innerHTML += chunk;
                        responseSection.scrollTop = responseSection.scrollHeight;
                        readStream();
                    }).catch(error => {
                        console.error('Error reading stream:', error);
                        responseArea.innerText = 'Error reading stream: ' + error.message;
                        submitButton.disabled = false;
                    });
                }
                readStream();
            })
            .catch(error => {
                console.error('Error fetching stream:', error);
                responseArea.innerText = 'Error fetching stream: ' + error.message;
                submitButton.disabled = false;
            });
    }
});

app/website/static/style.css

@@ -1,109 +1,3 @@
The hand-written stylesheet was removed and replaced by the minified Tailwind CLI build output (three long lines containing the NimbusSansD @font-face, the Tailwind preflight reset, and the utility classes referenced from index.html). The removed hand-written rules follow, starting with:

@font-face {
    font-family: 'NimbusSansD';
    src: url('font-files/nimbus-sans-d-ot-light.woff2') format('woff2'),
        url('font-files/nimbus-sans-d-ot-light.woff') format('woff');
    font-weight: normal;
    font-style: normal;
}

* {
    box-sizing: border-box;
    margin: 0;
    padding: 0;
    font-family: 'NimbusSansD', sans-serif;
    color: #FFFFFF;
}
body {
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
background-color: #1F1F1F;
}
.container {
display: flex;
flex-direction: column;
width: 85vw;
height: 90vh;
background-color: #2E2E2E;
border-radius: 10px;
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
overflow: hidden;
}
#response-section {
flex: 1;
padding: 20px;
background-color: #1E1E1E;
overflow-y: auto;
font-size: 1rem;
line-height: 1.5;
scroll-behavior: smooth;
}
.form-section {
padding: 15px 20px;
background-color: #3A3A3A;
}
#response-area {
white-space: pre-wrap;
}
#url-form {
display: flex;
gap: 10px;
}
#url_box {
flex: 1;
padding: 10px 15px;
border: none;
border-radius: 5px;
background-color: #4A4A4A;
color: #FFFFFF;
font-size: 1rem;
outline: none;
}
#url_box::placeholder {
color: #B0B0B0;
}
#submit {
padding: 10px 20px;
border: none;
border-radius: 5px;
background-color: #5A5A5A;
color: #FFFFFF;
font-size: 1rem;
cursor: pointer;
transition: background-color 0.3s ease;
}
#submit:hover {
background-color: #7A7A7A;
}
#submit:disabled {
background-color: #3A3A3A;
cursor: not-allowed;
}
/* Responsive Adjustments */
@media (max-width: 600px) {
.container {
height: 95vh;
}
#url_box {
font-size: 0.9rem;
}
#submit {
font-size: 0.9rem;
padding: 10px;
}
}

docker-compose.yml

@@ -3,10 +3,17 @@ services:
     build: .
     container_name: screw-bardo
     ports:
+<<<<<<< HEAD
       - "$PORT:1986"
     env_file:
       - .env
     volumes:
       - ./app/logs:/app/app/logs/:rw
     restart: unless-stopped
+=======
+      - "1986:1986"
+    volumes:
+      - ./app/logs:/app/logs
+    restart: unless-stopped
+>>>>>>> b5a2b4e6d1b958dbb3ad702026889172514c1fd6

src/build-css.sh Normal file

@@ -0,0 +1,3 @@
#!/bin/bash
cd "$(dirname "$0")"
./tailwindcss -i input.css -o ../app/website/static/style.css --minify

src/input.css Normal file

@@ -0,0 +1,14 @@
@font-face {
font-family: 'NimbusSansD';
src: url('/static/font/nimbus-sans-d-ot-light.woff2') format('woff2'),
url('/static/font/nimbus-sans-d-ot-light.woff') format('woff');
font-weight: normal;
font-style: normal;
font-display: swap;
}
@tailwind base;
@tailwind components;
@tailwind utilities;

src/tailwind.config.js Normal file

@@ -0,0 +1,13 @@
/** @type {import('tailwindcss').Config} */
module.exports = {
content: ["../app/website/**/*.html"],
theme: {
extend: {
fontFamily: {
sans: ['NimbusSansD', 'sans-serif'],
},
},
},
plugins: [],
}

src/tailwindcss Normal file

Binary file not shown.