Posted on March 25, 2020
We've been developing core logic of quite advanced Web Application Firewalls (WAFs) since 2013. For example, from 2013 and till 2018 we have been developing the first 3 versions of core of an enterprise grade next generation WAF Positive Technologies Application Firewall (PT AF) - that version was mentioned by Gartner in the Magic Quadrant 2015 among the visionaries.
During the years of our work in the WAF area, we learned an important lesson: there is an inevitable trade-off - WAFs are either accurate in their security checks or fast, not the both. As an example, the modern WAFs emphasize usage of high accuracy machine learning techniques to maximize detection rate and minimize false positives. Meantime, the accuracy of machine learning algorithms requires high computation and memory resource (e.g. observe complexity of the most basic algorithm - k-means). In the world of server software even O(n) algorithms are considered too slow because web servers must handle hundreds of thousands client requests per a second. As an example, H2O uses less accurate, but faster O(1), algorithm for HTTP/2 streams prioritization instead of more accurate O(log(n)) from nghttp2 library.
We know a similar problem with web servers running a dynamic logic -they're too slow, so there are web accelerators which offload many simple jobs from them. A WAF accelerator works in similar way: it offloads simple security checks from more advanced WAFs, load balances traffic and protects a backend WAF cluster against DDoS attacks.
Since WAFs are just modules for HTTP proxies, Nginx in most cases, then they inherit all the performance limitations of the platform.
Nginx has a reasonable HTTP parser implementation. But if you ever think about launching an HTTP flood attack, then you should try some more or less realistic URI. Nginx uses switch
-driven HTTP parser. This is a code snippet for URI processing (the function is actually much longer):
case sw_check_uri:
if (usual[ch >> 5] & (1U << (ch & 0x1f)))
break;
switch (ch) {
case '/':
r->uri_ext = NULL;
state = sw_after_slash_in_uri;
break;
case '.':
r->uri_ext = p + 1;
break;
case ' ':
r->uri_end = p;
state = sw_check_uri_http_09;
break;
case CR:
r->uri_end = p;
r->http_minor = 9;
state = sw_almost_done;
break;
case LF:
r->uri_end = p;
r->http_minor = 9;
goto done;
case '%':
r->quoted_uri = 1;
// ...
If we try a real query to look for a hotel at Booking come with wrk
traffic generator
./wrk -t 4 -c 128 -d 60s --header 'Connection: keep-alive' --header 'Upgrade-Insecure-Requests: 1' \
--header 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) \
Chrome/52.0.2743.116 Safari/537.36' --header 'Accept: text/html,application/xhtml+xml, \
application/xml;q=0.9,image/webp,*/*;q=0.8' --header 'Accept-Encoding: gzip, deflate, sdch' \
--header 'Accept-Language: en-US,en;q=0.8,ru;q=0.6' \
--header 'Cookie: a=sdfasd; sdf=3242u389erfhhs; djcnjhe=sdfsdafsdjfb324te1267dd' \
'https://192.168.100.4:9090/searchresults.en-us.html?aid=304142&label=gen173nr-1FCAEoggI46Ad \
IM1gEaIkCiAEBmAExuAEZyAEP2AEB6AEB-AECiAIBqAIDuAKAg4DkBcACAQ&sid=686a0975e8124342396dbc1b331fab24 \
&tmpl=searchresults&ac_click_type=b&ac_position=0&checkin_month=3&checkin_monthday=7&checkin_year= \
2019&checkout_month=3&checkout_monthday=10&checkout_year=2019&class_interval=1&dest_id=20015107& \
dest_type=city&dtdisc=0&from_sf=1&group_adults=1&group_children=0&inac=0&index_postcard=0&label_ \
click=undef&no_rooms=1&postcard=0&raw_dest_type=city&room1=A&sb_price_type=total&sb_travel_purpose \
=business&search_selected=1&shw_aparth=1&slp_r_match=0&src=index&srpvid=e0267a2be8ef0020&ss=Pasadena%2C
%20California%2C%20USA&ss_all=0&ss_raw=pasadena&ssb=empty&sshis=0&nflt=hotelfacility%3D107%3Bmealplan
%3D1%3Bpri%3D4%3Bpri%3D3%3Bclass%3D4%3Bclass%3D5%3Bpopular_activities%3D55%3Bhr_24%3D8%3Btdb
%3D3%3Breview_score%3D70%3Broomfacility%3D75%3B&rsf='
, then we see the HTTP parser functions in perf top
profiling session:
8.62% nginx [.] ngx_http_parse_request_line
2.52% nginx [.] ngx_http_parse_header_line
1.42% nginx [.] ngx_palloc
0.90% [kernel] [k] copy_user_enhanced_fast_string
0.85% nginx [.] ngx_strstrn
0.78% libc-2.24.so [.] _int_malloc
0.69% nginx [.] ngx_hash_find
0.66% [kernel] [k] tcp_recvmsg
The functions are the performance bottleneck in this case, and performance bottlenecks are good targets for a DDoS attack. PCRE JIT significantly improves performance of the PCRE regular expression library. Intel HyperScan goes even further by applying various optimization techniques to regular expressions as well as compiling number of rules into a single finite state machine (so called multi-pattern matching).
Consider these results from our study of regular expressions performance. The figure at the below shows performance degradation of a WAF, built on top of Nginx, if it runs regular expression rules using the fastest regular expression libraries, PCRE-JIT or Intel HyperScan. While HyperScan is about 60% faster than PCRE-JIT, for the provided rule set we were able to reach only 29% performance of the original Nginx build.
Regular expressions is the classic target for asymmetric DDoS attacks. A single inaccurate regular expression rule led CloudFlare to an outage.
The next question is, should a WAF be installed at the front of a web cache or behind it? Web architecture best practises suggest to setup a slower application behind a faster one, so WAF should reside behind the cache. However, WAF needs to inspect as much traffic as possible to make the right correlations. Also there are the family of web cache specific attacks, e.g. Web cache poisoning, deception, and cache poisoned denial of service (CPDoS) attacks, so the web cache needs a protection. (By the way, read about CPDoS attacks prevention in our previous post.)
Since WAFs typically work as web accelerator modules, their logic processes traffic before the cache. This way the web cache performance is limited by a WAF.
/
, then the first 1,000 requests are quickly serviced from the cache and the following 2nd or 3rd requests from each bot lead to banning the bot by the WAF. The numbers are good in this scenario. But what if all the 1,000 bots request different heavy content from the cache? You'll likely stuck on disk I/O, especially if your cache size is much larger than the RAM (the usual reason why web caches need to store content on disk). Or what if the botnet has 402,000 different IP addresses? The WAF will be fighting with the malicious requests for hours, or, most likely, your Linux machine will just die on system pages reclamation.
We constantly develop new web security features for Tempesta FW, but it's considered as a WAF accelerator. It's main purpose is to handle all ingress traffic, including DDoS, and distribute pre-filtered traffic among WAF instances (if you need to build a seriosly powerful solution) and resources which might not require WAF protection (e.g. static data servers). Tempesta FW performs only fast (not so simple though!) checks, which can be done with wire speed.
tc
, eBPF
and XDP
, tcpdump
, IPtables
and nftables
, IPVS
, and any others which you can imagine. Thanks to the fast network I/O, Tempesta FW can handle traffic even from quite serious botnet launching an application layer DDoS attack against your web resource. Full compatibility with the Linux network I/O infrastructure allows you to employ XDP filtration against volumetric DDoS on the same box with Tempesta FW.
For example, Google was attacked in 2016 with Relative Path Overwrite (RPO) with following line in the URI:
/tools/toolbar/buttons/gallery?q=%0a{}*{background:red}/..//apis/howto_guide.html
(note the anomaly characters {
, }
, :
in the URI). Tempesta FW can block such attacks by the restriction of allowed characters set in URI. To restrict characters set in URI content to only [a-zA-Z0-9/-%*]
, use following configuration option:
http_uri_brange 0x25 0x2A 0x2D 0x2F 0x30-0x39 0x41-5A 0x61-0x7A;
You can find more configuration and attack examples in our wiki and the slides. Recently Wallarm posted the case "when your WAF needs its own WAF". This is a proven practice to setup double security shields from different vendors to make sure that a threat of one of them will be protected by another. Everybody have bugs. We try all the attacks, which we can find, against Tempesta FW and this case wasn't an exception. The request, which crashed ModSecurity, surely was blocked by Tempesta FW:
$ echo -e 'GET / HTTP/1.1\nCookie: =ffoo\nHost: test\n\n' | nc test 80
HTTP/1.1 400 Bad Request
date: Wed, 25 Mar 2020 19:27:04 GMT
content-length: 0
server: Tempesta FW/0.7.0
connection: close
$
MARK
for network packets carrying an HTTP message. This allows you to mark a packet depending on low level network rules (e.g. for a particular TCP port and/or IP address) and continue to process the rules on HTTP layer. For example, suppose you want to filter out all HTTP requests from IP 192.168.100.1
and containing header Referer
with value badhost.com/*
. Firstly, you write a normal nftable
rule to mark all packets from the IP address:
$ nft add rule inet filter input ip saddr 192.168.100.1 mark set 1
Next, define a default http_chain
which checks the mark
and if it's equal to 1
sends the packet for processing by multi_layer_rules
chain or pass to an upstream server otherwise. The multi_layer_rules
checks whether the Referer
header matches the pattern and blocks the request if so or passes otherwise.
srv_group backend { server 127.0.0.1:8080; }
vhost protected_host { proxy_pass backend; }
http_chain multi_layer_rules {
hdr "Referer" == “badhost.com/*” -> block;
-> protected_host; # all checks are passed
}
http_chain {
mark == 1 -> multi_layer_rules;
-> protected_host; # pass all by default
}
It's cool to be able to write such rules, but it's not so handy to do this by hands under a DDoS attack. Hopefully, WAFs can offload their filtration policies to Tempesta FW automatically. For example, exec action for ModSecurity can be used to add a blocking rule to HTTPtables.
The largest application layer DDoS attack registered by Imperva in 2019 was 292,000 requests per second. This means that you can serve the largest application layer DDoS attack right from the cache on entry-level 4-core Xeon server. (This works for HTTP flood attacks targeting the same resource, e.g. /
, which is frankly the most frequent case. For more advanced attacks Tempesta FW uses various challenges and rate limits.)
Placing a web cache at front of a WAF requires Tempesta FW to take care about all web cache related attacks and we constantly work on enhancement of the web cache security.
As always, we welcome you to try and test Tempesta FW alpha.
CPDoS: Cache Poisoned Denial of Service
Being a light-weight web application firewall, Tempesta FW takes care about prevention of well-known web cache deception and poisoning attacks. However, recently a new attack of the web cache poisoning class, Cache-Poisoned Denial-of-Service (CPDoS), has appeared and made us to extend our HTTP parser to prevent the attack.