Content filtering software inspects web and FTP requests before they leave the network. Based on the software’s rule set, it will either allow or deny the connection. Corporate networks and schools use content filtering to ensure that employees and students cannot access ‘inappropriate’ web content.
Filtering applications use three criteria to determine whether the client is requesting banned content. The HTTP header is scanned for (1) domain name, (2) IP address and (3) keywords. The application hosts a database of all blacklisted content. After the HTTP request is sent from the client, the application compares the HTTP header to all listings in the database. When a match is found, the application reacts.
Three reactions are typical of a content filter. First, the content filter drops the packet requesting the data. That means that instead of sending your web request to http://mypronsite.org, it quietly drops the packet. The remote web server never sees the request.
Additionally, most content filters will redirect your browser to an internal web page. This is usually an intimidating page stating that you have broken corporate policy and that the material you are trying to access is inappropriate.
Finally, most content filters will log the action. The application’s administrator has a full listing of each filtered web access attempt. The logs will show who broke the web policy, which workstation sent the request, what time the event occurred and which web page was requested. The more sophisticated applications have a snazzy front end with reports such as ‘top ten offenders’, ‘most popular banned sites’, and so on. If the redirected page states that your action has been recorded, it probably has, and in great detail.
WHY CONTENT FILTERS ARE USED
Corporations use content filters for two reasons. The most important reason is to mitigate liability. Corporations are responsible for the environments of their employees. If an employee is surfing porn from his desk and a coworker is offended, the corporation can be sued (quite successfully) for fostering sexual harassment in the workplace. Likewise, while in classes. All of those downloads clogged the campus networks. In most cases, universities relied on QoS to solve the issue, instead of censoring web access to dormitories.
The other resource is employee time and attention. Game sites and sports sites chew up a lot of hours of procrastination. If an employee cannot update his fantasy football team at work, then he might spend some of that time filling out spreadsheets instead.
To slip past the filters, you must first understand how they work and how your requests get from your browser to the Internet web server and back.
NAMES
Most users rely on names to connect to resources on the Internet. Names, however, cannot be routed across networks. The network devices that interconnect the networks around the world use IP addresses to figure out where to pass the requests. The entire Internet relies on public IP addressing to support connections from anywhere to anywhere.
Users cannot be expected to memorize the IP address of every computer connected to the Internet. Instead, the user types the name of the computer that hosts the content he wants. Before the browser creates a request, it passes the name of the computer to a Domain Name System (DNS) server.
A DNS server provides a simple service to users. It waits for a client to send a name. It then looks up the name in a database and returns the IP address for the name. The server works much like a telephone directory. When someone needs to call Bill Gates on the phone, he doesn’t dial the name. Instead, he uses a directory to find the number listed for his name. He then dials the number to reach the person he is looking for. DNS is the IP directory for the Internet. First a browser does a lookup of the web server’s name, and then it sends packets to the number.
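As a rough illustration of that lookup, the sketch below asks the operating system's resolver for an address using Python's standard socket module. The helper name lookup is made up for this example, and the address returned for a real name will vary with time and location.

```python
import socket


def lookup(name):
    """Return one IPv4 address for a host name as a dotted-decimal string."""
    # gethostbyname() hands the name to the system resolver (usually DNS).
    # A string that is already an IPv4 address is returned unchanged.
    return socket.gethostbyname(name)
```

Calling lookup("www.google.com") on a machine with a working resolver should return an address in the same dotted form as the ping output described in this article.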
If content filters only looked at the names of web servers, it would be very simple to bypass them. A user could simply ping the web server’s name and connect to the server by IP address. For example, if http://www.google.com is a banned website, a user can ping the name. The console would show replies from 173.194.115.78. A connection can be made to the search engine by typing either 173.194.115.78 or http://173.194.115.78 into a browser.
By the way, even if the name is banned by the content filter, the user can still safely ping the server. Content filters only scan HTTP and FTP headers. They do not inspect ICMP (such as ping) packets.
IP ADDRESS
Knowing the IP Address of the web server doesn’t really help if a true content filter is inspecting all packets leaving the network. The second criterion the application uses is the IP Address of the remote server.
This is the crux of the issue. Every device between the user’s computer and the server located half-way around the world uses the IP Address to figure out how to forward the request to the correct server. Without an IP address, the packets go nowhere.
KEYWORDS
Before the solution to the IP address issue is discussed, the last of the criteria will be explained. Content filter vendors cannot reliably list every server in a category of banned content. There are too many sports sites, game sites, porn sites and financial sites for the vendor to make a complete list. New sites are also added to the Internet daily.
To catch any sites that may have been overlooked when the database was created or updated, the content filter also scans the text of the URL to find banned words.
For example, the keyword SEX would stop a multitude of web requests, even if the name of the site or IP address for a porn site was not in the database. Unfortunately, this would also block access to the web page for the town of Essex, Connecticut. Unintentionally banning access to a legitimate website is called a ‘false positive’.
The entire URL is scanned for keywords, not just the domain name. That means that if SEX appears anywhere in the URL (including the name of a picture displayed on the page), the content filter will block access to the content.
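A minimal sketch of that kind of scan is shown below. It assumes a plain case-insensitive substring match, which is only a guess at how a real product works; the keyword list, the function name is_blocked and the example URLs are all made up for illustration.

```python
BANNED_KEYWORDS = ["sex"]  # hypothetical one-word list for this example


def is_blocked(url):
    """Return True if any banned keyword appears anywhere in the URL."""
    lowered = url.lower()
    return any(word in lowered for word in BANNED_KEYWORDS)


print(is_blocked("http://example.com/pictures/sex.jpg"))  # True
print(is_blocked("http://www.essex.example/townhall"))    # True (false positive)
print(is_blocked("http://example.com/index.html"))        # False
```

The second URL is blocked only because ‘essex’ happens to contain the keyword, which is the false positive described above.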
ROUTE WITHOUT IP ADDRESSES
Back to the main issue, how can a client make a connection to an Internet server without using its Name or IP address? The answer is, you can’t—but you can still slip the request past the content filter.
The answer is in how the address is placed in the packet. In the above example, where the IP address of www.google.com could be used to get to the search engine, the request would be blocked. If one knows how to put the IP address into the packet header, but write it in a way that fools the content filter, then the request slips through undetected, unhindered and without being logged.
The complicated answer is: convert the decimal octets to binary, combine them into a single 32-bit stream, then convert that stream back to decimal. If that doesn’t make sense, don’t worry. It only sounds complicated. Done manually, all that is needed is the built-in scientific calculator.
SIMPLE IP ADDRESS STRUCTURE
An IP Address is four decimal numbers separated by dots. An example is an IP Address assigned to
www.google.com: 173.194.115.78. Each of these four numbers can have any value ranging from 0 to 255.
The numbers are referred to as ‘octets’. An octet is an eight bit (read as: eight digit) binary number. Eight bits can represent any value from 0 (00000000) to 255 (11111111). All IP addresses are 32 bits long. Four octets (4 x 8) represent these 32 bits. Users and administrators read and write IP addresses in octets because using a stream of ones and zeroes is impractical—and could give your retina serious screen burn!
Content filters are expecting IP addresses in the standard dotted-decimal notation. We can express the same 32-bit value as one big number rather than four smaller ones.
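One way to see that both notations hold the same value is to treat each octet as the next eight bits of a single number. The sketch below does that with bit shifts; the function name dotted_to_int is an assumption made for this example.

```python
def dotted_to_int(address):
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-decimal IPv4 address: %r" % address)
    value = 0
    for octet in octets:
        # Shift what we have so far left by 8 bits, then append this octet.
        value = (value << 8) | octet
    return value


print(dotted_to_int("173.194.115.78"))  # 2915201870
```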
SIMPLE MATH FOR A SIMPLE TECHNIQUE
Start by pulling up your scientific calculator. In Windows, type ‘calc’ into the Run prompt. On Linux, type ‘gcalctool’ in the terminal console. You could also use a site like Math is Fun and use their online Binary/Decimal/Hexadecimal Converter.
Once the calculator appears, select ‘Scientific’ from the View menu. This will add lots of buttons and options to your plain old calculator. Above the buttons, notice the radio buttons next to each of the number systems: bin (binary), oct (octal), dec (decimal) and hex (hexadecimal). These buttons are used to switch back and forth between the different bases, as well as convert the numbers.
Follow these steps, using the example IP address of 173.194.115.78:
1. Verify that the calculator is in Decimal (‘dec’ should be selected).
2. Type in the first octet of the IP address (173).
3. Convert the number to binary by clicking the ‘bin’ radio button.
4. Write this number down. The calculator displays ‘10101101’. An octet is always EIGHT digits. If the calculator shows fewer than eight digits, pad the beginning of the number with zeroes until it has eight. For example, if the binary result has seven digits, like ‘1000000’, write down ‘01000000’.
5. Switch the calculator back to Decimal.
6. Clear the calculator display.
7. Repeat steps 1 through 6 for the remaining octets. Your results should be: 173 (10101101), 194 (11000010), 115 (01110011) and 78 (01001110).
8. Switch the calculator to binary.
9. Combine the results of your conversions into a single 32-bit number (10101101110000100111001101001110). Notice that if you failed to pad the last octet with a zero, the result would be only 31 bits, and the technique would fail.
10. Type this number into the calculator and convert it to decimal. This should give you a decimal result of 2915201870.
11. In your browser, type http://2915201870 and hit enter.
12. Notice that the Google search engine appears.
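The calculator procedure above can be mirrored in a few lines of Python. This is only a sketch that follows the same pad, combine and convert order; the function name manual_steps is made up for the example.

```python
def manual_steps(address):
    """Follow the calculator procedure: pad, concatenate, convert."""
    # Convert each octet to binary, zero-padded on the left to eight digits.
    padded = [format(int(octet), "08b") for octet in address.split(".")]
    # Join the four 8-digit strings into one 32-digit string.
    bits = "".join(padded)
    # Read the 32-digit string as a single base-2 number.
    return padded, bits, int(bits, 2)


padded, bits, decimal = manual_steps("173.194.115.78")
print(padded)   # ['10101101', '11000010', '01110011', '01001110']
print(bits)     # 10101101110000100111001101001110
print(decimal)  # 2915201870
```

The "08b" format does the zero padding automatically, which is the part that is easiest to get wrong by hand.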
A content filter will see a request for a web server named 2915201870. This does not match (1) the name of a banned server, (2) a banned IP address or (3) a keyword. The browser wrote the 32-bit address into the packet header, but the content filter, which only inspects the HTTP header, doesn’t notice that the server is blacklisted. Because nothing in the request looks significant, the filter will not flag it. Instead, it will fetch the content that was requested.
VENDORS AND COMPANIES KNOW ABOUT THIS
Content filter vendors are aware of this vulnerability. A developer can easily fix the hole in the application, but they won’t. It would be detrimental to the vendor to do so.
With the current structure of the application, three separate queries must be run against the database for every HTTP packet that passes through the device, to determine whether the packet passes through or gets dropped. This delay causes network latency (in other words, it slows down the network).
To add a feature to the application that patches this hole, the developer could build another complete table in the database to hold all of the converted decimal addresses for blocked content. This would increase the size of the database by 25%. More importantly, there would be a corresponding decrease in performance, due to the added query.
An alternative is to build a function into the filter that performs a text-to-strings-to-decimal-{many mathematical calculations}-to-string-to-IP operation, followed by a database query. This is too much processing overhead to perform on every HTTP packet that passes through the device. This again slows down the performance of the content filter.
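For a sense of what such a conversion function would have to do, here is a hedged sketch using Python's standard ipaddress module. It is not taken from any real filter product; the blacklist contents and the names normalize_host and is_blacklisted are assumptions for illustration.

```python
import ipaddress

BLACKLISTED_IPS = {"173.194.115.78"}  # hypothetical blacklist entry


def normalize_host(host):
    """Rewrite an all-digit host as dotted-decimal; leave anything else alone."""
    if host.isascii() and host.isdigit():
        value = int(host)
        if value <= 0xFFFFFFFF:
            # IPv4Address accepts an integer that fits in 32 bits.
            return str(ipaddress.IPv4Address(value))
    return host


def is_blacklisted(host):
    """Check the normalized form of the host against the blacklist."""
    return normalize_host(host) in BLACKLISTED_IPS


print(normalize_host("2915201870"))  # 173.194.115.78
print(is_blacklisted("2915201870"))  # True
```

The conversion itself is small; the cost the article describes comes from running it, plus a lookup, on every request that passes through the device.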
If a vendor chose one of the above options for the application, his product would perform only 75% as efficiently as his competition. It is a hard sell when your product is slow and inefficient but protects against a really obscure method of looking at websites that involves lots of stubby pencil math to exploit. So for competitive reasons, developers have no interest in changing the way they filter traffic.
Furthermore, there is little demand for a product with this feature. Companies are concerned with legal liability. If an employee goes to the lengths described above just to look at a website, then the company has shown due diligence to protect the work environment. The fact that an employee must consciously bypass software and devices to get a single blacklisted page shows that the company did spend time, effort and money to secure the workplace. If a couple of employees abuse this hole and a lawsuit is filed, the company is in the clear and the employee is liable.
WHEN THIS TECHNIQUE DOES NOT WORK
The technique illustrated does not work in all environments. This method of surfing was not a planned and supported function of web browsers and servers. The procedures only work by taking advantage of some features that are a part of web standards. The following circumstances may break the feature:
Internet Explorer 7: IE7 and newer have closed the hole on this technique. Before sending the web request, it translates the decimal IP address back to octets. This is a browser-level function. The latest version of Firefox works perfectly. IE6 can also be used to successfully bypass filters.
Websites That Use Host Headers: Host Headers are a technique for hosting multiple domain names on a single IP address. If eight sites are running on a single IP, and the header asks for a long string of numbers for the site, then the web service will not know which web site the client is requesting.
Sites with URL Security: Some anti-leeching and other settings do not allow requests framed with a domain name other than the site’s true name. To see an example of a site that would not accept this technique, navigate to http://linux.org. Note the error. The site only accepts requests for http://www.linux.org. Many browsers automatically add the www if the original address was not available, so you may not have even noticed the error, but did notice that the web page now includes the www in the address bar.
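To make the Host Headers case above concrete, the sketch below imitates how a server sharing one IP address might choose a site. The site names, paths and the function pick_site are invented for this example and do not describe any particular web server.

```python
# Hypothetical sites sharing one IP address, keyed by the Host header
# each one answers to.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}


def pick_site(host_header):
    """Return the document root for a Host header, or None if nothing matches."""
    return VIRTUAL_HOSTS.get(host_header.lower())


print(pick_site("www.example.com"))  # /var/www/example
print(pick_site("2915201870"))       # None
```

A request that names the server by a decimal number matches none of the configured sites, so the server cannot tell which one was wanted.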
BEWARE OF CONTENT IN THE PAGE
The technique allows a user to grab a webpage, despite the vigilance of the content filter. The page itself contains links to other content, such as pictures. The URLs for those images are written into the HTML page. They have not been converted to a decimal equivalent of the domain name. However, if the web page uses relative links to images and content on the same server, the user will get all of the content without any problems.
If literal links, or off-site links, are inside the page, you may still get flagged by the content filter.
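The difference between relative and literal links can be shown with urljoin from Python's standard library, which resolves a link the way a browser would. The page address and link names below are made up for the example.

```python
from urllib.parse import urljoin

# The page was fetched through the decimal form of the address.
PAGE = "http://2915201870/index.html"


def resolve(link):
    """Resolve a link found in the page against the page's own URL."""
    return urljoin(PAGE, link)


# A relative link inherits the decimal host from the page.
print(resolve("images/logo.png"))  # http://2915201870/images/logo.png
# A literal (absolute) link keeps its own host name, which the filter can see.
print(resolve("http://www.example.com/banner.png"))  # http://www.example.com/banner.png
```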
TOO MUCH MATH
The technique above involves a great deal of effort to pull up a page through the filter. The amount of work involved is seldom worth a single page of content. The same math used to manually translate a link can be coded to automate the process. One can easily find applications on the net, such as Proxy Offender, that do the same math for you. But I cannot guarantee that these applications are legitimate or free of viruses, spam, trojans, adware or other malware.
If you are filtered, there is a reason, and bypassing these filters will leave you liable for those actions. Is it worth the risk?
Hacking is a good thing, cheating at games is not.
Happy surfing...
Edit:
Fixed a couple misspellings.