Self-hosting anything that is deemed “content” openly on the web in 2025 is a battle of attrition between you and forces who are able to buy tens of thousands of proxies to ruin your service for data they can resell.
This is depressing. Profoundly depressing. i look at the statistics board for my reverse-proxy and i never see less than 96.7% of requests classified as bots at any given moment. The web is filled with crap, bots that pretend to be real people to flood you. All of that because i want to have my little corner of the internet where i put my silly little code for other people to see.
i have to learn to protect myself from industrial actors in order to put anything online, because anything a person makes is valuable, and that value will be sucked dry by every tech giant to be emulsified, liquified, strained, and ultimately inexorably joined in an unholy mesh of learning weights.
I feel like this wouldn’t reduce costs, since the load is the same, just moved to a different daemon, in this case nginx. I, for one, pay for bandwidth on my VPS, so the cost for me would be the same.
One thought I’ve had is to use a Slowloris-style technique combined with a small pool of connections and an AI poisoner, to keep the scraper occupied for as long as possible without using a lot of bandwidth.
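To make that idea a bit more concrete, here is a rough sketch of what such a tarpit could look like, using only Python's standard `asyncio`. Everything named here (`tarpit`, `poison_chunk`, `bytes_per_hour`, `MAX_TARPIT_CONNECTIONS`, `DRIP_INTERVAL_SECONDS`, the word list, the port) is made up for illustration, and the "poisoner" is just random filler words standing in for a real one. It is not tested against actual scrapers.

```python
import asyncio
import random

# All values below are illustrative assumptions, not tuned numbers.
MAX_TARPIT_CONNECTIONS = 32   # the "small pool": cap on simultaneously trapped clients
DRIP_INTERVAL_SECONDS = 5.0   # pause between chunks, keeps bandwidth low
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet"]  # stand-in for a real poisoner


def poison_chunk(rng, n_words=4):
    """Return one short line of filler text as bytes."""
    return (" ".join(rng.choice(WORDS) for _ in range(n_words)) + "\n").encode()


def bytes_per_hour(chunk_size, interval):
    """Rough body bandwidth used by one trapped connection."""
    return chunk_size * 3600 / interval


async def tarpit(reader, writer, slots, rng,
                 interval=DRIP_INTERVAL_SECONDS, max_chunks=None):
    """Drip filler text to one client, slowly, for as long as it stays."""
    if slots.locked():
        # Pool is full: drop the client instead of queueing it.
        writer.close()
        return
    async with slots:
        try:
            # No Content-Length, so the client cannot tell when the body ends.
            writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
            sent = 0
            while max_chunks is None or sent < max_chunks:
                writer.write(poison_chunk(rng))
                await writer.drain()
                await asyncio.sleep(interval)
                sent += 1
        except ConnectionError:
            pass  # the scraper gave up, which is the point
        finally:
            writer.close()


async def main(host="127.0.0.1", port=8081):
    slots = asyncio.Semaphore(MAX_TARPIT_CONNECTIONS)
    rng = random.Random()

    async def handle(reader, writer):
        await tarpit(reader, writer, slots, rng)

    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())` and have the reverse proxy send suspected bots to that port. With roughly 20-byte chunks every 5 seconds, `bytes_per_hour(20, 5.0)` comes out to 14,400 bytes of body per hour per trapped connection, so a full pool of 32 would stay under half a megabyte an hour. The semaphore keeps the tarpit itself from becoming the thing that eats your file descriptors.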
Maybe the AI companies analyze the HTTP response, realize that it is bullshit, and stop sending requests for some time…
Rather like the proportion of spam to legitimate email.
I need to remember this when I come across another brain-dead (literally) AI zealot.
I would be honored if ChatGPT or any other AI thinks my code is worth training on.


