Aug 29, 2017, 17:34 pm
(This post was last modified: Aug 29, 2017, 17:46 pm by gesserit. Edited 1 time in total.)
Written by me in July 2014; originally posted as a KAT tutorial.
Thoroughly updated and TPBified in August 2017.
This tutorial is mainly aimed at those who have already tried their hand at web-seeding in the past, only to then become frustrated with the litany of limitations imposed by the implementation of this technique in the current generation of BitTorrent clients.
Here, we'll be making use of the Apache Rewrite Engine to insert an abstraction layer between the torrent client and the web server in such a way as to obviate the lion's share of said limitations. I imagine alternative server solutions provide roughly equivalent functionality, but can't tell you anything beyond that.
Note that I'm not aiming for universal accessibility in this case; the write-up is lengthy enough as it is, I reckon. Readers are instead expected to already possess a reasonable degree of familiarity and experience with (conventional) web-seeding, as well as with basic Regular-Expression-style pattern-matching. If you're lacking the former, I'd recommend acquiring it by reading the corresponding section of your torrent client manual, and then trying it out on a few sample torrents. If you're lacking the latter, you can proceed with the tutorial regardless, and read up on the more advanced patterns as you encounter them in its course (suitable links will be provided), though again first-hand experience is definitely preferable. And there are plenty of sites to help with that; simply google something like "regular expression tester" and experiment until the syntax feels comfortable.
0) Preparation
To try out this technique for yourself, you'll need the following:
- Access to a web server running Apache with the Rewrite Engine enabled, or equivalent. If you don't have this, take a look here.
- Optionally, a file-renaming utility. If you don't have this, take a look here.
- A text editor to create config files for the web server. Like Notepad. If you don't have this... you're kidding, right?!
- An FTP client to upload both data and config files to the web server. If you don't have this, take a look here.
- Last but not least, obviously a BitTorrent client which supports web-seeding in the first place, such as uTorrent 3.x (2.x builds can create compatible .torrent files, but are a bit wobbly when it comes to actually accessing the web seeds subsequently), or the corresponding mainline releases (meaning 7.5 and above, I believe), or pretty much any other client which is similarly modern and has a similar range of features. If you don't have this, take a look here.
1) Demonstration
This is exactly the sort of thing best learned hands-on, so I'm going to start out with a working example and leave you to understand why and how it works over the course of the later sections, rather than the other way 'round. So, have a torrent:
LordOfTheRings.torrent (Size: 578 bytes / Downloads: 1,459)
On the face of it, it doesn't look very interesting - a folder containing three "Lord of the Rings"-themed paintings. What's curious about it is that it somehow works just fine, in spite of the facts that nobody is seeding it (unless someone else trying it out left it running afterwards, that is...) and that the address the web seed parameter points to (http://both.000webhostapp.com/redirector/first/, displayable via Torrent Properties->Advanced in uTorrent) apparently contains, well, nothing whatsoever, when you open it in your web browser.
Must be magic, right? No? Well then, the way this works is that, firstly, I gleaned those images directly from various web sites, namely:
- "The Fellowship of the Ring" @ pikabu.ru
- "The Dark Tower" @ flickr.com
- "The Return of the King" @ theonering.net
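Secondly, the location the web seed parameter points to holds a config file containing lines like this one:
--RewriteRule ReturnOfTheKing http://img-fan.theonering.net/~rolozo/im...return.jpg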
We'll be discussing each token in depth later on, but the line's purpose should be readily apparent straight away - essentially, "if someone asks for something named like this, give them whatever is stored at that location". And that's also essentially the core concept of the entire approach; the rest is just background, details, and additional examples.
2) Abstraction
In computing, "abstraction", in the sense used here, refers to the redirection of any type of exchange between two entities through an intermediary.
As a real-life analogy, consider your hand and some sort of food as the two entities, and an eating utensil as the "abstraction layer". Without the utensil, you need to touch the food directly, which works well in some cases (salad leaves) and not quite so well in other cases (hot soup). With the utensil, your hand connects to one end (the handle), and the food connects to the other end (the tip). This provides immediate advantages like thermal insulation, and further benefits which become apparent once one takes a broader view. A fork and a spoon have different types of tips but the same type of handle, so if you know how to use the one to eat one kind of food (salad), you automatically know how to use the other to eat another kind of food (soup), whereas if you had no utensils you'd have to learn two entirely different techniques (okay, this strains the illustration a bit, but work with me here). Vice versa, whereas a dwarf might have trouble eating something huge (whale roasted on a spit) and a giant trouble eating something tiny (popcorn) without utensils, this too can be overcome by fitting the handle to the hand and the tip to the food. In general terms, the advantage of abstraction is a significant improvement where qualities like extensibility, flexibility, and robustness are concerned. Typically, the price is an increased resource overhead (and therefore a decreased performance) in those cases in which a direct linkage would have worked just fine.
In the case at hand, the abstraction layer sits between the torrent client and the remotely hosted data, which turns the web-seeding model from
--torrent client----<-->----data host
into
--torrent client----<- redirector ->----data host(s)
which opens up a whole slew of new possibilities:
- As indicated by the plural suffix, the data may be distributed across multiple hosts: The redirector can divert differing requests to different targets - see example above.
- Moreover, the hosted data does not have to be structured and named as it is in the torrent, which means that torrents may be seeded directly from larger web repositories without the need to gather the files in a single location, and that the remote data may be anonymized, and so on: The redirector can translate from one structural and/or naming scheme to another - see examples below.
- As a bonus, you can do whatever you like with the files later on, like rename them, reorganize them, or reupload them to a different host (just as long as you don't remove them completely), without breaking the web-seeding capability of the torrent you posted: The redirector can be updated, independently and repeatedly.
3) Concretion
Moving from the abstract to the concrete side of things, setting up such a redirector is surprisingly straightforward. All that is required is to upload a suitable config file named .htaccess to the location you intend to use as your web seed. Apache takes care of the rest - which, broadly speaking, means the following, with reference to the initial example.
When your torrent client attempts to access the web seed for one of the constituent files, the full request path is constructed by prepending the supplied web_seed_path to the (URL-encoded) relative_path of the file in question, which is of course part of the .torrent file's regular metadata, like so:
--http://both.000webhostapp.com/redirector/first/LordOfTheRings/ReturnOfTheKing.jpg
Ordinarily, for the web seed to function, the file would have to be located at that exact address. However, the server takes various preliminary steps before attempting to access the resource located at the end of the requested path, among which is the processing of all extant config files along the path, meaning each of
--http://both.000webhostapp.com/.htaccess
--http://both.000webhostapp.com/redirector/.htaccess
--http://both.000webhostapp.com/redirector/first/.htaccess
--http://both.000webhostapp.com/redirector/first/LordOfTheRings/.htaccess
in their turn. If those are absent, or present but blank, the path is followed all the way (which, in this case, would result in a final redirection to an error page). If at least one of them is present and pertinent, though, such a request may be intercepted and subsequently diverted elsewhere. In short, we have an abstraction layer.
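For those who prefer to see the above as code, here's a minimal Python sketch of the two steps: how a client might assemble the request address, and which config file locations then lie along that path. The function names are my own invention, and the percent-encoding shown is merely what Python's standard library produces; actual torrent clients may encode some characters differently.

```python
from urllib.parse import quote, urlsplit

def web_seed_request_url(web_seed_path: str, relative_path: str) -> str:
    """Prepend the web seed address to a file's (URL-encoded) torrent path."""
    # Percent-encode the path, but keep the slashes between its segments.
    return web_seed_path + quote(relative_path, safe="/")

def htaccess_candidates(request_url: str) -> list[str]:
    """List the config file locations along the requested path, root first."""
    parts = urlsplit(request_url)
    prefix = f"{parts.scheme}://{parts.netloc}/"
    candidates = [prefix + ".htaccess"]
    # Every segment but the last is a directory; the last is the file itself.
    for directory in parts.path.strip("/").split("/")[:-1]:
        prefix += directory + "/"
        candidates.append(prefix + ".htaccess")
    return candidates

url = web_seed_request_url(
    "http://both.000webhostapp.com/redirector/first/",
    "LordOfTheRings/ReturnOfTheKing.jpg",
)
print(url)
for candidate in htaccess_candidates(url):
    print(candidate)
```

This prints the request address followed by the four config file locations listed earlier, in the same order.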
4) Specification
The specific mechanism supporting this is called "path rewriting", and is documented in full at mod_rewrite @ apache.org. For our purposes, the relevant instructions are these three (keywords in bold, placeholders in bold italics):
--RewriteEngine On
This one is typically the first line in any active .htaccess file, and is about as self-explanatory as it gets, yes? It's essential to be aware that the default setting is Off, not On, so if you don't include this line, nothing else you do include will have any effect at all. Which is why forgetting about this is liable to result in a lot of frustration when moving on to the testing stage.
--RewriteBase base_path
... is actually more easily explained in the context of the next one, so...
--RewriteRule pattern target_path
"The real rewriting workhorse", as the official documentation puts it. The pattern designates which addresses to replace. The target_path defines what to replace them with, if and only if they do match the pattern. The particulars of the substitution process are as follows.
The string against which the supplied pattern is matched is not the full path, as one might expect, but only that portion which has yet to be walked. So, in our example
--http://both.000webhostapp.com/redirector/first/LordOfTheRings/ReturnOfTheKing.jpg
that'd be:
--redirector/first/LordOfTheRings/ReturnOfTheKing.jpg in http://both.000webhostapp.com/.htaccess
--first/LordOfTheRings/ReturnOfTheKing.jpg in http://both.000webhostapp.com/redirector/.htaccess
--LordOfTheRings/ReturnOfTheKing.jpg in http://both.000webhostapp.com/redirector/first/.htaccess
--ReturnOfTheKing.jpg in http://both.000webhostapp.com/redirector/first/LordOfTheRings/.htaccess
For there to be a match, the full pattern must appear within the string; if there is a match somewhere, it does not matter how much of the string is left over:
--ReturnOfTheKing matches ReturnOfTheKing
--ReturnOfTheKing matches TheReturnOfTheKing
--ReturnOfTheKing matches redirector/first/LordOfTheRings/ReturnOfTheKing.jpg
Conversely, there is no such thing as a partial match here; if any part of the pattern does not match, there is no match at all:
--ReturnOfTheKings does not match ReturnOfTheKing
--ReturnOfTheKings does not match ReturnsOfTheKing
--ReturnOfTheKings does not match ReturnOfTwoKings
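If you'd like to verify these six cases without touching a server, Python's re.search behaves the same way for plain patterns like these: it looks for the pattern anywhere within the string. This is only a stand-in for the Rewrite Engine, of course, and rule_matches is a name I made up for the purpose.

```python
import re

def rule_matches(pattern: str, remaining_path: str) -> bool:
    """True if the pattern occurs anywhere within the remaining path."""
    # re.search is unanchored: leftover text before or after is irrelevant.
    return re.search(pattern, remaining_path) is not None

examples = [
    ("ReturnOfTheKing", "ReturnOfTheKing"),
    ("ReturnOfTheKing", "TheReturnOfTheKing"),
    ("ReturnOfTheKing", "redirector/first/LordOfTheRings/ReturnOfTheKing.jpg"),
    ("ReturnOfTheKings", "ReturnOfTheKing"),
    ("ReturnOfTheKings", "ReturnsOfTheKing"),
    ("ReturnOfTheKings", "ReturnOfTwoKings"),
]

for pattern, string in examples:
    verdict = "matches" if rule_matches(pattern, string) else "does not match"
    print(pattern, verdict, string)
```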
If there is a match, then the old address is discarded wholesale, and replaced with a new one concocted from up to three ingredients, which happens to be rather similar to the way the torrent client constructed the old one in the first place. Namely, we have a root_path, which points to the root directory of the directory tree the .htaccess file is located in, a base_path defined via a RewriteBase instruction, and the RewriteRule's target_path. The exact recipe employed depends on the target's head, thusly:
--RewriteRule pattern http://...------ if pattern matches ->----http://...
--RewriteRule pattern /...------ if pattern matches ->----root_path /...
--RewriteRule pattern ...------ if pattern matches ->----root_path base_path ...
(Note that a valid base_path both starts and ends with a slash, so in each case, the result will be a well-formed address.) For instance, assuming ReturnOfTheKing matches the old address, that means the following:
--RewriteRule ReturnOfTheKing http://img-fan.theonering.net/~rolozo/im...return.jpg
----redirects from anywhere to http://img-fan.theonering.net/~rolozo/im...return.jpg
--RewriteRule ReturnOfTheKing /images/kingreturn.jpg
----redirects from anywhere within http://both.000webhostapp.com/* to http://both.000webhostapp.com/images/kingreturn.jpg
--RewriteBase /images/
--RewriteRule ReturnOfTheKing kingreturn.jpg
----redirects from anywhere within http://both.000webhostapp.com/* to http://both.000webhostapp.com/images/kingreturn.jpg
If you find some of these conventions a bit arbitrary, I don't disagree. But it's important to be fully aware of them, because writing instructions based on ignorance of, or, worse, a mistaken understanding of, any one of them pretty much guarantees ending up with something that doesn't work as expected, or at all.
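The three-way recipe can be restated as a small Python function. Mind that this models the simplified description given in this tutorial, not Apache's actual internals; resolve_target and its parameter names are hypothetical, and root_path is assumed to be written without a trailing slash.

```python
def resolve_target(target_path: str, root_path: str, base_path: str) -> str:
    """Build the new address from the target's head, as described above."""
    if target_path.startswith(("http://", "https://")):
        # Full address: used as-is, wherever the config file lives.
        return target_path
    if target_path.startswith("/"):
        # Leading slash: relative to the root of the same site.
        return root_path + target_path
    # Anything else: relative to the base_path set via RewriteBase.
    return root_path + base_path + target_path

ROOT = "http://both.000webhostapp.com"  # no trailing slash (assumption)

print(resolve_target("/images/kingreturn.jpg", ROOT, "/"))
print(resolve_target("kingreturn.jpg", ROOT, "/images/"))
```

Both calls yield the same address, which mirrors the second and third example above.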
5) Abbreviation
Supplying a base_path is principally useful to abbreviate the rules by doing away with as much repetition as possible. If, say, there are a bunch of separate redirects, all but a few of which point to the same local directory, an economical solution might look something like this:
--RewriteEngine On
--RewriteBase /images/
--RewriteRule pattern /documents/animals.txt
--RewriteRule pattern anteater.png
--RewriteRule pattern baboon.jpg
----...
--RewriteRule pattern yak.jpg
--RewriteRule pattern zebra.jpg
--RewriteRule pattern http://en.wikipedia.org/wiki/Zoo
Another, and far more powerful, means of condensing many related redirects into a single line is the use of Regular Expression syntax in both pattern and target_path. If you feel you need a primer or refresher regarding those, try Regular Expressions @ perl.org (the linked-to "Version 8" section only; the bulk of the page deals with features beyond the scope of our Rewrite Engine) and Character Classes @ perl.org for a char code cheat sheet. Multiple (subpatterns) can be captured during the match and then recalled by $position in the target_path.
We'll be making full use of that mechanism in our second example. Let's assume we have the same set of images on a local computer and on a web server, but organized somewhat differently, and want to create a web-seed-supported torrent from the local copies. Abstraction gives us that capability... and Regular Expressions make things a whole lot more convenient.
--Local file structure
----Animals\Anteater (drawing).png
----Animals\Baboon (photo).jpg
----Animals\Yak (photo).jpg
----Animals\Zebra (photo).jpg
--Server file structure
----images/A/Anteater.png
----images/B/Baboon.jpg
----images/Y/Yak.jpg
----images/Z/Zebra.jpg
Translating from one structure into the other can be accomplished via a single rule such as this:
--RewriteRule Animals/(.)(.*)\s\((.+)\)\.(.+) /images/$1/$1$2.$4
So, the pattern is Animals, followed by a slash, followed by (1) one character (initial letter of animal name), followed by (2) a bunch of characters (rest of animal name), followed by a space, followed by an opening parenthesis, followed by (3) a bunch of characters (image type), followed by a closing parenthesis, followed by a dot, followed by (4) a bunch of characters (file type). The target_path starts with a slash, so the redirection uses the same root_path, and points to the file with filename $1$2 (all of animal name) and filetype $4 in the subdirectory $1 (initial letter of animal name) of the directory images.
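To try out the rule offline, the same pattern can be fed to Python's re module, which treats these particular constructs the same way. The only translation needed is in the target, where Python writes \g<1> for what the Rewrite Engine calls $1. The rewrite function is my own helper, not anything Apache provides.

```python
import re

PATTERN = r"Animals/(.)(.*)\s\((.+)\)\.(.+)"
# Same as /images/$1/$1$2.$4, in Python's backreference notation.
TARGET = r"/images/\g<1>/\g<1>\g<2>.\g<4>"

def rewrite(remaining_path: str):
    """Return the rewritten path, or None if the pattern does not match."""
    match = re.search(PATTERN, remaining_path)
    if match is None:
        return None
    return match.expand(TARGET)

for name in [
    "Animals/Anteater (drawing).png",
    "Animals/Baboon (photo).jpg",
    "Animals/Yak (photo).jpg",
    "Animals/Zebra (photo).jpg",
]:
    print(name, "->", rewrite(name))
```

Note that subpattern 3 (the image type) is captured but never recalled, which is precisely how the "(drawing)" and "(photo)" annotations get dropped in translation.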
Which leaves the question of where to put the config file containing that rule, once we've created it. The answer is that it hardly matters. As long as you don't venture outside the same root_path entirely, you can put it anywhere you like, provided you then use that address for the torrent's web seed parameter.
I've added the above rule to an .htaccess file located at http://both.000webhostapp.com/redirector/second/. As you can see, this directory too is shown as being empty. With the rule in effect, however, these non-existent paths work just fine despite that:
- http://both.000webhostapp.com/redirector/second/Animals/Anteater (drawing).png
- http://both.000webhostapp.com/redirector/second/Animals/Baboon (photo).jpg
- http://both.000webhostapp.com/redirector/second/Animals/Yak (photo).jpg
- http://both.000webhostapp.com/redirector/second/Animals/Zebra (photo).jpg
6) Implementation
As a third, final, and far more practical example, consider this next scenario. Say we have a local copy of, and want to share, "Stargate.SG-1.S01.DVDRip.XviD-LOCK", which contains 21 files and weighs in at about 7.5 GB in total:
--Local file structure
----Stargate.SG-1.S01.DVDRip.XviD-LOCK\Stargate.SG-1.S01E01E02.DVDRip.XviD-LOCK.avi
----Stargate.SG-1.S01.DVDRip.XviD-LOCK\Stargate.SG-1.S01E03.DVDRip.XviD-LOCK.avi
----Stargate.SG-1.S01.DVDRip.XviD-LOCK\Stargate.SG-1.S01E04.DVDRip.XviD-LOCK.avi
------...
----Stargate.SG-1.S01.DVDRip.XviD-LOCK\Stargate.SG-1.S01E21.DVDRip.XviD-LOCK.avi
----Stargate.SG-1.S01.DVDRip.XviD-LOCK\Stargate.SG-1.S01E22.DVDRip.XviD-LOCK.avi
We'd like to employ web-seeding, and, hypothetically, know that there is a free hosting service called "threehost.com" where you can sign up for as many accounts as you like, each with a 3 GB disk space allowance. Here's how we might go about constructing an anonymized distributed web seed using four such accounts, which we'll call "zero", "one", "two", and "three".
We could start by making a temporary copy of the "Stargate.SG-1.S01.DVDRip.XviD-LOCK" folder and its contents, which we anonymize by renaming the files to "third.01.dat", "third.03.dat", "third.04.dat", and so on, through "third.21.dat" and "third.22.dat" (that's where the file-renaming utility I listed at the outset comes in useful), and upload three subsets, each of which contains 7 files and weighs in at around 2.5 GB in total, to accounts "one", "two", and "three".
Secondly, we'd need to create a config file which can translate from the real filenames to the anonymized ones, and which tells the server where any given constituent file is located. Something along these lines (note that the below could be made even more compact by using the alternation operator |, at the price of reduced clarity):
--RewriteEngine On
--RewriteBase /
--RewriteRule S01E(0[1-8]) http://one.threehost.com/data/third.$1.dat
--RewriteRule S01E(09) http://two.threehost.com/data/third.$1.dat
--RewriteRule S01E(1[0-5]) http://two.threehost.com/data/third.$1.dat
--RewriteRule S01E(1[6-9]) http://three.threehost.com/data/third.$1.dat
--RewriteRule S01E(2[0-2]) http://three.threehost.com/data/third.$1.dat
This, we'd upload to "http://zero.threehost.com/redirector/third/", and then use that address as a web seed when creating a torrent from the original local folder.
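Before uploading anything, it can't hurt to dry-run the rule set against the full file list. The following Python sketch does that under two assumptions: that at most one rule matches any given filename (which holds for these patterns), and that Python's re module handles them the same way as the Rewrite Engine does. RULES, FILES and redirect are names of my own choosing, and $1 is again written as \g<1>.

```python
import re
from collections import Counter

# Equivalent to the five rules above, in order.
RULES = [
    (r"S01E(0[1-8])", r"http://one.threehost.com/data/third.\g<1>.dat"),
    (r"S01E(09)",     r"http://two.threehost.com/data/third.\g<1>.dat"),
    (r"S01E(1[0-5])", r"http://two.threehost.com/data/third.\g<1>.dat"),
    (r"S01E(1[6-9])", r"http://three.threehost.com/data/third.\g<1>.dat"),
    (r"S01E(2[0-2])", r"http://three.threehost.com/data/third.\g<1>.dat"),
]

FOLDER = "Stargate.SG-1.S01.DVDRip.XviD-LOCK"
# 21 files: the double episode E01E02, then E03 through E22.
EPISODES = ["01E02"] + [f"{number:02d}" for number in range(3, 23)]
FILES = [f"{FOLDER}/Stargate.SG-1.S01E{ep}.DVDRip.XviD-LOCK.avi" for ep in EPISODES]

def redirect(remaining_path: str):
    """Return the target of the first matching rule, or None if none match."""
    for pattern, target in RULES:
        match = re.search(pattern, remaining_path)
        if match is not None:
            return match.expand(target)
    return None

# Count how many files end up on each host.
per_host = Counter(redirect(path).split("/")[2] for path in FILES)
for host, count in per_host.items():
    print(host, count)
```

If everything is in order, each of the three hosts should be assigned exactly 7 files.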
Finally, torrents like this should always be tested prior to upload. The process is sufficiently intricate for mistakes to creep in every now and then, even if one knows what one is doing - and while most of them will be server-side and thus reparable without requiring any changes to the torrent, by resubmitting the config file and/or reorganizing the data, that's not the sort of thing one should take for granted. When everything is demonstrably working (personally, I take that to mean that the torrent client has successfully downloaded at least one full piece from each host, but YMMV), we post the torrent to TPB (or wherever) just as we normally would. Woot!
Rather than going to the lengths of doing all that for real, I've merely placed corresponding sets of empty .dat files into three separate folders under the account already familiar from the earlier examples, and added a set of closely analogous rules to another .htaccess file. You can once again check for yourself that http://both.000webhostapp.com/redirector/third/ contains no data, so that redirection must definitely be taking place for requests like these (which, to better illustrate the anonymization and distribution aspect, open folder views instead of the files themselves) to work as intended:
- http://both.000webhostapp.com/redirector/third/Stargate.SG-1.S01.DVDRip.XviD-LOCK/Stargate.SG-1.S01E05.DVDRip.XviD-LOCK.avi
- http://both.000webhostapp.com/redirector/third/Stargate.SG-1.S01.DVDRip.XviD-LOCK/Stargate.SG-1.S01E12.DVDRip.XviD-LOCK.avi
- http://both.000webhostapp.com/redirector/third/Stargate.SG-1.S01.DVDRip.XviD-LOCK/Stargate.SG-1.S01E19.DVDRip.XviD-LOCK.avi
Feel free to ask questions about any particular subtopic you still don't understand after having read (and re-read) this tutorial. That'll allow me to improve those passages for future readers. On the other hand, if you get completely lost, the more probable problem is that you're lacking some relevant background knowledge, which I'm not going to be in much of a position to help you with, beyond supplying keywords for searches and/or links for further reading. Constructive criticism is, as always, expressly invited.
Also, it might be helpful for future readers if those who actually try their hand at this approach (and manage to make it work) were to post torrent and web links, and the pertinent config file contents, below. The more examples, the merrier.
Thanks for that in advance!