Caption-gen is an image caption generation web application using Hugging Face transformers. It provides a Flask web interface for uploading images and generating descriptive captions using the Salesforce BLIP model.
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
pip install flask pillow torch gunicorn
# Development mode (Flask development server on port 5000)
python app.py
# Standalone caption generation (processes selfie.jpg)
python main.py
# Production mode (SystemD service)
sudo systemctl start caption-gen
sudo systemctl status caption-gen
sudo systemctl stop caption-gen
# Run specific model tests
python main2.py # DeepSeek model test
python main3.py # Llama model test
- Web Application (app.py): Flask server with two routes:
  - / - Serves the HTML interface
  - /generate_caption - REST API endpoint accepting base64-encoded images
- Frontend (index.html): Simple JavaScript-based interface for image upload and caption display
- Model Pipeline: Uses Salesforce/blip-image-captioning-large with configured generation parameters (beam search, repetition penalty, etc.)
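As a rough illustration of the model pipeline, the sketch below loads the BLIP checkpoint with the processor and model classes from Hugging Face transformers and passes generation parameters to `generate()`. The source does not show which API app.py actually uses, and the parameter values here (`num_beams`, `repetition_penalty`, `max_new_tokens`) are illustrative assumptions, as are the helper names `load_blip` and `generate_caption`.

```python
from functools import lru_cache

MODEL_NAME = "Salesforce/blip-image-captioning-large"

# Illustrative values only (assumptions); the real settings live in app.py.
GENERATION_KWARGS = {
    "num_beams": 5,             # beam search width
    "repetition_penalty": 1.5,  # discourage repeated phrases
    "max_new_tokens": 50,       # cap on caption length
}


@lru_cache(maxsize=1)
def load_blip():
    """Load the processor and model once and reuse them across calls."""
    # Imported lazily so this module loads without the heavy dependencies.
    from transformers import BlipForConditionalGeneration, BlipProcessor
    from transformers.utils import logging as hf_logging

    hf_logging.set_verbosity_error()  # log errors only
    processor = BlipProcessor.from_pretrained(MODEL_NAME)
    model = BlipForConditionalGeneration.from_pretrained(MODEL_NAME)
    return processor, model


def generate_caption(image_path: str) -> str:
    """Caption one image file (hypothetical helper, not taken from app.py)."""
    from PIL import Image

    processor, model = load_blip()
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_caption("selfie.jpg")` would download the checkpoint on first use, so the first call can be slow.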
The application is configured as a SystemD service (caption-gen.service) that:
- Runs Gunicorn with Unix socket at /home/pabrubio/.caption-gen.pabrubio.hackclub.app.webserver.sock
- Uses 1 worker with 600-second timeout
- Automatically restarts on failure
- Waits for network connectivity before starting
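The Gunicorn settings listed above could also be written as a Gunicorn configuration file, which is plain Python. This is only a sketch: the file name `gunicorn.conf.py` is an assumption, and the actual service may pass the same values as command-line flags in caption-gen.service. Automatic restarts and the network dependency are handled by the service unit itself, so they do not appear here.

```python
# gunicorn.conf.py (hypothetical file name; not confirmed by the source)

# Listen on the Unix socket instead of a TCP port.
bind = "unix:/home/pabrubio/.caption-gen.pabrubio.hackclub.app.webserver.sock"

# One worker process, so presumably only one copy of the model is held in memory.
workers = 1

# A worker that stays silent for more than 600 seconds is killed and restarted.
timeout = 600
```

With such a file in place, Gunicorn could be started as `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask object in app.py is named `app`.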
- main.py: Enhanced caption generation using BLIP + OPT-125M with LangChain
- main2.py: DeepSeek R1 Distill model testing
- main3.py: Meta Llama 3.2 1B Instruct model testing
- Images are sent as base64-encoded data in JSON POST requests
- The BLIP model is run with explicit generation parameters (such as beam search and a repetition penalty) to improve caption quality
- Logging verbosity is reduced to errors only using set_verbosity_error()
- The service runs from /home/pabrubio/pub/caption-gen in production
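The base64-in-JSON transport mentioned in the notes above can be sketched with the standard library alone. The JSON key `image` and the helper names `build_payload` and `decode_payload` are assumptions for illustration; the real field name is whatever app.py and index.html agree on.

```python
import base64
import binascii
import json

IMAGE_FIELD = "image"  # assumed JSON key; check app.py for the real one


def build_payload(image_bytes: bytes) -> str:
    """Client side: wrap raw image bytes in a JSON request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({IMAGE_FIELD: encoded})


def decode_payload(body: str) -> bytes:
    """Server side: recover the raw image bytes from the JSON body."""
    encoded = json.loads(body).get(IMAGE_FIELD)
    if not isinstance(encoded, str):
        raise ValueError(f"missing or non-string '{IMAGE_FIELD}' field")
    # Browsers often send data URLs ("data:image/png;base64,..."); drop the prefix.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("image field is not valid base64") from exc
```

On the server, the decoded bytes could then be opened with `PIL.Image.open(io.BytesIO(...))` before being handed to the model.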