btdigg API anyone?
#1
Guys, what do you think about using a DHT crawler for more accurate seed/peer statistics?
Say, by using the BtDigg API?
It updates relatively quickly and is quite accurate.
For example, compare the TPB page with the BtDigg one.
Neat, huh?
#2
I agree, it is neat, and I suggested it to Winston a long time ago.

But it isn't going to happen and there are good reasons for that.

Firstly, just to clarify, you're wrong about it providing "more accurate seed/peer statistics". DHT doesn't provide any seed statistics and it doesn't provide statistics on peers not using DHT. You could say it provides "statistics more reliably" but that isn't the same thing (and most people wouldn't understand the difference).
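To illustrate the point about seeds: in the base DHT protocol (BEP 5), a get_peers reply lists peers as compact 6-byte entries, a 4-byte IPv4 address followed by a 2-byte port. Nothing in an entry says whether that peer is a seed. The following is a minimal sketch of decoding such a list. It assumes the bencoded reply has already been parsed, and the sample addresses are made up (taken from documentation IP ranges).

```python
import ipaddress
import struct


def parse_compact_peers(values):
    """Decode the "values" list from a BEP 5 get_peers reply.

    Each entry is 6 bytes: a 4-byte IPv4 address and a 2-byte port,
    both big-endian. There is no field marking a peer as a seed.
    """
    peers = []
    for entry in values:
        if len(entry) != 6:
            continue  # skip anything that isn't a compact IPv4 entry
        ip = str(ipaddress.IPv4Address(entry[:4]))
        (port,) = struct.unpack("!H", entry[4:])
        peers.append((ip, port))
    return peers


# Two made-up peers; the result is just (address, port) pairs
sample = [bytes([192, 0, 2, 1]) + (6881).to_bytes(2, "big"),
          bytes([198, 51, 100, 7]) + (51413).to_bytes(2, "big")]
print(parse_compact_peers(sample))
```

Counting these entries tells you how many DHT peers answered. It cannot tell you how many of them are seeds, and it says nothing about peers that don't use DHT at all.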

In any case, TPB couldn't use the BTDigg API. They are the only people scraping DHT (that I'm aware of; in any case there are far fewer DHT scrapers than trackers) so relying on them would create a massive and quite possibly self-fulfilling vulnerability. Were we to adopt them, they would immediately become a high priority target of the MAFIAA.

So we would need to scrape DHT ourselves, but DHT scraping is significantly more resource-intensive than tracker scraping. That is why it takes DHT longer than trackers to find peers when you start a torrent in your client. [Multiply that by the several million torrents we hold, and repeat it regularly throughout the day, every single day of the year. It mounts up.]
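Part of the cost difference comes from batching. With an HTTP tracker, one scrape request can ask about many torrents at once, and per BEP 48 the reply reports "complete" (seeders) and "incomplete" (leechers) for each info hash. A DHT lookup instead has to query nodes one after another for every single info hash. Below is a minimal sketch of building such a batched scrape URL. It assumes an HTTP tracker that follows the BEP 48 naming convention. The tracker address and hashes are made up, and real trackers may cap how many hashes they accept per request.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def scrape_url(announce_url, info_hashes):
    """Build one HTTP scrape request covering several torrents.

    BEP 48 convention: if the last path segment of the announce URL
    starts with "announce", swap that word for "scrape". Each raw
    20-byte info hash becomes its own info_hash parameter.
    Returns None when the URL doesn't follow the convention.
    """
    parts = urlsplit(announce_url)
    head, _, last = parts.path.rpartition("/")
    if not last.startswith("announce"):
        return None  # no scrape URL can be derived this way
    path = head + "/scrape" + last[len("announce"):]
    # Percent-encode every byte that isn't an unreserved URL character
    params = "&".join("info_hash=" + quote(h, safe="") for h in info_hashes)
    query = parts.query + "&" + params if parts.query else params
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Two dummy info hashes, one request
hashes = [bytes(20), bytes([0xAB]) * 20]
print(scrape_url("http://tracker.example.org:6969/announce", hashes))
```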

tl;dr = it is neat but it isn't as accurate as you think and there are more problems than you think.
#3
Quote (#2, Mar 30, 2016, 17:08): Firstly, just to clarify, you're wrong about it providing "more accurate seed/peer statistics". DHT doesn't provide any seed statistics and it doesn't provide statistics on peers not using DHT. You could say it provides "statistics more reliably" but that isn't the same thing (and most people wouldn't understand the difference).

Well, of course it's not ideal. But it might still be better than what's here now.

Quote (#2, Mar 30, 2016, 17:08): Were we to adopt them, they would immediately become a high priority target of the MAFIAA.

You don't have to advertise it on every corner. The MAFIAA is rather dull and short-sighted. It doesn't go after the ad providers, or the other things you happened to mention.

Quote (#2, Mar 30, 2016, 17:08): So we would need to scrape DHT ourselves but DHT scraping is significantly more resource intensive than tracker scraping. That is why it takes DHT longer than trackers to find peers when you start a torrent in your client. [Multiply that by several million times for the number of torrents we hold, and repeat it regularly throughout the day every single day of the year. It mounts up.]

Again, that only matters if you're an idealist. To get just an estimate, you could run it, say, once a month, or at increasing intervals: often for newer torrents, less and less often for older ones.
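The "increasing intervals" idea could look something like the following back-off schedule. This is a hypothetical sketch. The one-hour base, the doubling for every full week of age, and the 30-day cap are arbitrary assumptions, not anything TPB or BtDigg actually uses.

```python
from datetime import timedelta


def next_scrape_interval(age: timedelta,
                         base: timedelta = timedelta(hours=1),
                         cap: timedelta = timedelta(days=30)) -> timedelta:
    """Hypothetical back-off: wait longer between scrapes as a torrent ages.

    The wait starts at `base` and doubles for every full week of age,
    but never exceeds `cap`.
    """
    interval = base
    for _ in range(age.days // 7):
        interval *= 2
        if interval >= cap:
            return cap  # stop early; all sufficiently old torrents share the cap
    return min(interval, cap)


for days in (0, 7, 30, 365):
    print(days, next_scrape_interval(timedelta(days=days)))
# 0 1:00:00
# 7 2:00:00
# 30 16:00:00
# 365 30 days, 0:00:00
```

The interval depends only on age. That means an old torrent that is still heavily seeded gets scraped as rarely as a dead one, which is the weakness raised in the next reply.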

Quote (#2, Mar 30, 2016, 17:08): tl;dr = it is neat

At least we've agreed on that.
#4
Quote (baltic8): you can only launch it, say, once in a month. Or on increasing time intervals. Often on newer torrents, less and less often on older.

Once a month? That can't be called accurate. It isn't even reliable.
Often on newer torrents, less and less often on older ones? That creates a serious imbalance: over time, torrents like The Hobbit that are still being downloaded and seeded would be left behind. So we're back to the same point: it can't be called accurate.

Your ideas would lead to a worse scraping system than the current one.