Web Scraper Suite

A tool designed to automatically extract data from multiple websites and manage that data. This is the first time I'm designing a web scraper.

It was a frightful experience by the end: as the deadline drew closer, I started to feel I might lose the chance to get the $650 grant from Siege.

Features

Extract text content from websites.
Capture full-page screenshots using Zyte's screenshot API.
Take screenshots of a web page and of other pages linked from it.
CORS is enabled to control which origins can call the API.
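The README does not say how the extra linked pages are chosen. Here is a minimal sketch, assuming the backend takes the first `n` distinct http(s) links found on the page. It uses Python's standard-library `html.parser` rather than the project's Scrapy code, and `LinkCollector` and `linked_pages` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_pages(base_url, html, n):
    """Return up to n distinct absolute http(s) URLs linked from html.

    Assumption: "first n links in document order" is the selection rule.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    base, _ = urldefrag(base_url)
    seen = set()
    result = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so "/a#x" and "/a" count once.
        absolute, _ = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if absolute == base or absolute in seen:
            continue  # skip the page itself and duplicates
        seen.add(absolute)
        result.append(absolute)
    return result[:max(n, 0)]
```

Dropping fragments and duplicates keeps the screenshot count meaningful. Without that step, several anchors on the same page would use up the `n` budget.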

Tech Stack

Backend: Python, Flask, Scrapy, Zyte API
Frontend: React.js

API Endpoints

Screenshot Endpoint

URL: /api/screenshot
Method: POST
Body: { "url": "", "n": "number of other URLs to screenshot (optional)" }
Response: JSON array of objects, each containing a base64-encoded image
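A minimal client sketch for this endpoint, using only the standard library, follows. It makes several assumptions: you pass in the server's base URL (Flask's default is `http://localhost:5000`), `n` is sent as an integer, and each response object stores its base64 string under a key named `image`. The README does not name that field. `build_screenshot_request` and `decode_screenshots` are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_screenshot_request(api_base, url, n=None):
    """Hypothetical helper: build (but do not send) a POST to /api/screenshot."""
    body = {"url": url}
    if n is not None:
        # Optional: how many linked pages to screenshot as well (sent as an int; assumption).
        body["n"] = n
    return urllib.request.Request(
        api_base.rstrip("/") + "/api/screenshot",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_screenshots(payload, key="image"):
    """Decode the base64 image in each response object into raw bytes.

    The field name "image" is an assumption; change `key` to match the real response.
    """
    return [base64.b64decode(item[key]) for item in payload]
```

To send the request, pass it to `urllib.request.urlopen` and feed `json.load(resp)` into `decode_screenshots`. You can then write each returned byte string to a file.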

Text Extraction

URL: /api/text
Method: POST
Body: { "url": "" }
Response: JSON array of objects. Each object has the fields url and content, where content holds the text extracted from url.
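The sketch below shows how one object in the documented `{url, content}` shape could be produced. The project itself uses Scrapy and Zyte, so this standard-library `html.parser` extractor is a swapped-in technique. `TextExtractor` and `extract_text` are hypothetical names. The extractor skips `<script>` and `<style>` contents and joins the remaining visible text with single spaces.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore text inside skipped tags and whitespace-only runs between tags.
        if not self._skip_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(url, html):
    """Return one result object in the {url, content} shape described above."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "content": " ".join(parser.chunks)}
```

The endpoint would then return a JSON array of such objects, for example `json.dumps([extract_text(url, html)])`.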

License

MIT License

Contributing

Feel free to open issues or submit pull requests to improve features or fix bugs.


PuneetGopinath/WebScraper
