Warehouse is a distributed object storage system (an alternative to S3) that is fully self-hostable.
Features:
- Scalable: Deploy as many volume servers as you want to expand the storage pool
- Optimized for small files - based on the Haystack paper and inspired by SeaweedFS
- Web UI - Easily manage volume servers, buckets, objects, and API keys through the Web UI
- Fine-grained API keys - create API keys with access to only what you allow
Docs:
Feature Goals
- Basic Bucket CRUD
- Basic Object CRUD
- Web UI - In progress
- Authentication
- Golang client
- TypeScript client
- Graph based upload processing (replacement for a message broker)
- Cache server (would reconstruct chunks as well)
- FFmpeg integration
- TensorFlow integration
- In memory database instead of SQLite
Unfortunately, I've run out of time to implement all of these features!
I'll update during Stardance by Hack Club.
Optimizations
- Optimized for small files
  - Usually, reading a file requires reading its metadata from disk first (unless it is cached) and then performing a second read for the file contents. This adds overhead.
  - Each file usually has over 128 bytes of metadata overhead (256+ bytes in ext4!)
  - Warehouse solves this problem by:
    - Having a very small metadata overhead (17 bytes)
    - Storing all metadata in memory
  - This means that each read from a volume server is only one disk read, lowering latency
- Direct connections
  - Some file storage systems proxy data to the underlying storage
    - This increases bandwidth usage, because every byte passes through the proxy on its way to or from storage
  - Warehouse uses direct connections (with JWTs), so clients upload to and read from the volume servers where the file (or chunk) is located
- Chunking
  - Warehouse supports large files using chunks
  - Each file is split into 80 MiB chunks by the client, which means less work for the server
  - The client uploads chunks directly to multiple volumes, spreading out the work and increasing upload speed through concurrency
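
To make the chunking idea concrete, here is a minimal Go sketch of client-side chunking with concurrent uploads. It is not taken from the Warehouse client: `chunkRanges`, `uploadAll`, `uploadFunc`, and the worker limit are hypothetical names and choices made for illustration, and the actual network call to a volume server is left as a callback. Only the 80 MiB chunk size comes from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// chunkSize is the 80 MiB chunk size mentioned above.
const chunkSize int64 = 80 << 20

// chunk describes one byte range of the source file.
type chunk struct {
	Index  int
	Offset int64
	Length int64
}

// chunkRanges splits a file of fileSize bytes into consecutive ranges of at
// most size bytes each. The final chunk may be shorter.
func chunkRanges(fileSize, size int64) []chunk {
	var chunks []chunk
	for off := int64(0); off < fileSize; off += size {
		length := size
		if remaining := fileSize - off; remaining < length {
			length = remaining
		}
		chunks = append(chunks, chunk{Index: len(chunks), Offset: off, Length: length})
	}
	return chunks
}

// uploadFunc is a hypothetical stand-in for "send this chunk to a volume server".
type uploadFunc func(c chunk) error

// uploadAll calls upload for every chunk with at most workers uploads in
// flight at once, and returns the first error encountered (if any).
func uploadAll(chunks []chunk, workers int, upload uploadFunc) error {
	if workers < 1 {
		workers = 1
	}
	sem := make(chan struct{}, workers) // bounds concurrency
	var wg sync.WaitGroup
	var mu sync.Mutex
	var firstErr error

	for _, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(c chunk) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := upload(c); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return firstErr
}

func main() {
	chunks := chunkRanges(200<<20, chunkSize) // a 200 MiB file
	err := uploadAll(chunks, 4, func(c chunk) error {
		// A real client would send the chunk to a volume server here.
		fmt.Printf("chunk %d: offset=%d length=%d\n", c.Index, c.Offset, c.Length)
		return nil
	})
	if err != nil {
		fmt.Println("upload failed:", err)
	}
}
```

With these assumptions, a 200 MiB file yields three chunks (80, 80, and 40 MiB). Because the uploads run concurrently, the order in which the chunks are printed can vary between runs.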