Infopost | 2015.03.22

A few months ago Steve and I noted that our respective sites were getting tons of hits from Samara Oblast, an obscure(?) territory in Russia. A Russian search engine, maybe? Cybercriminals? A proxy for the American or Chinese or Syrian electronic armies? Who really cares? Only port 80 should be open, and it's doing nothing fancy.

But since this kilroy thing has gotten pretty lengthy, I was scoping out the possibility of doing some sort of 'top content' thing based on hits. So I pulled my server logs and looked through them to see how hard they'd be to parse.
Attack surface

Missile command screenshot

Well this is fun:

91.200.13.119 "GET /kilroy/archive/2008/04/index.html HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/leader-board-r.html HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /kilroy/2008/01/index.php HTTP/1.0"...
91.200.13.119 "GET /2008/01/index.php HTTP/1.0"...

How am I going to count hits for 2008/01/index.php when there is no anything.php?

Eight sequential hits from the same person, within 10 seconds. That's what I call quick on the mouse. Whois says the address is in Ukraine. I'm going to stop myself right here: this is my first time actually looking at HTTP traffic, and it's old hat to 80% of the world. Okay, let's continue.
Maybe they're just guessing at the site map, but more likely they're looking to have some fun with PHP.

Another interesting one:

POST /cgi-bin/php?
%2D%64+%61%6C%6C%6F%77%5F%75%72%6C%5F%69%6E%63%6C%75%64%65
%3D%6F%6E+%2D%64+%73%61%66%65%5F%6D%6F%64%65%3D%6F%66%66+%2D%64+%73%75%68%6F
%73%69%6E%2E%73%69%6D%75%6C%61%74%69%6F%6E%3D%6F%6E+%2D%64+%64%69%73%61%62
%6C%65%5F%66%75%6E%63%74%69%6F%6E%73%3D%22%22+%2D%64+%6F%70%65%6E%5F%62%61
%73%65%64%69%72%3D%6E%6F%6E%65+%2D%64+%61%75%74%6F%5F%70%72%65%70%65%6E%64
%5F%66%69%6C%65%3D%70%68%70%3A%2F%2F%69%6E%70%75%74+%2D%64+%63%67%69%2E%66
%6F%72%63%65%5F%72%65%64%69%72%65%63%74%3D%30+%2D%64+%63%67%69%2E%72%65%64
%69%72%65%63%74%5F%73%74%61%74%75%73%5F%65%6E%76%3D%30+%2D%6E HTTP/1.1

Looking to do injection or overflow or something? Not really my wheelhouse, but it was kind of a fun digression.
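
For the curious, URL-decoding the query string shows what the request is after. Here's a minimal sketch using urllib.parse.unquote_plus from the Python standard library on just the first chunk of the payload (the variable names are mine, not anything from the server logs):

```python
from urllib.parse import unquote_plus

# First chunk of the query string from the POST request above.
encoded = "%2D%64+%61%6C%6C%6F%77%5F%75%72%6C%5F%69%6E%63%6C%75%64%65%3D%6F%6E"

# unquote_plus turns %XX escapes into characters and '+' into a space.
decoded = unquote_plus(encoded)
print(decoded)  # -d allow_url_include=on
```

Run over the whole thing, the query string decodes to a run of PHP command-line switches: -d allow_url_include=on -d safe_mode=off -d suhosin.simulation=on -d disable_functions="" -d open_basedir=none -d auto_prepend_file=php://input -d cgi.force_redirect=0 -d cgi.redirect_status_env=0 -n. That shape appears to match the widely documented PHP-CGI argument injection (CVE-2012-1823), where the query string gets handed to the php-cgi binary as options and auto_prepend_file=php://input has it run the POST body as code. So injection looks like a better guess than overflow.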
Classifier

So I wrote some code to classify site traffic into categories: broadly, bot indexing, malicious requests, and actual visits.
Some of it was pretty easy: bots tend to declare themselves in the user agent string and hit robots.txt first. Malicious stuff sends PUTs and looks for files that aren't .html/.jpg/etc. And, of course, sequential traffic from the same IP can be classified together. This is important because an attack might hit numerous legit links along the way, but none of it is visit traffic.
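
The heuristics above could be sketched roughly like this in Python. This is not the actual classifier, just an illustration: the marker substrings, the extension whitelist, the category names, and the function names are all assumptions, and it treats any method other than GET or HEAD as suspicious, which only makes sense for a static site. It also groups by IP alone and ignores timing, which is cruder than "sequential" traffic.

```python
import os
from collections import defaultdict

# Assumptions for illustration only: these markers, extensions, and
# category names are not taken from the real classifier.
BOT_MARKERS = ("bot", "spider", "crawl")
STATIC_EXTENSIONS = {".html", ".jpg", ".png", ".gif", ".css", ".js", ".xml", ".txt"}
SAFE_METHODS = {"GET", "HEAD"}

def classify_hit(method, path, user_agent):
    """Classify one request as 'bot', 'malicious', or 'visit'."""
    if path == "/robots.txt" or any(m in user_agent.lower() for m in BOT_MARKERS):
        return "bot"
    # Drop any query string before looking at the file extension.
    ext = os.path.splitext(path.split("?", 1)[0])[1].lower()
    if method not in SAFE_METHODS or (ext and ext not in STATIC_EXTENSIONS):
        return "malicious"
    return "visit"

def classify_by_ip(hits):
    """hits: iterable of (ip, method, path, user_agent) tuples.

    Returns {ip: label}. One malicious request marks every hit from
    that IP, so legit links touched during a probe don't count as visits.
    """
    seen = defaultdict(set)
    for ip, method, path, user_agent in hits:
        seen[ip].add(classify_hit(method, path, user_agent))
    result = {}
    for ip, labels in seen.items():
        if "malicious" in labels:
            result[ip] = "malicious"
        elif "bot" in labels:
            result[ip] = "bot"
        else:
            result[ip] = "visit"
    return result
```

Fed the Ukrainian hits from earlier, the request for a real .html page gets lumped in with the .php guesses, and the whole IP comes out as malicious.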
Data

Logs go back about a year. Here's some Excel, because easy.

Classification of web site hits

I get indexed about twice as much as I get visited. There have been more than 20,000 malicious HTTP requests.

Web site bot hits histogram

Google, Baidu, and Majestic 12 (a distributed indexing project) turned up the most. But there are quite a few bots out there.

So the top visited content, the main reason for this whole endeavor:

Pages
Images
Labels - which are now just links to search
Data skew: some content has been around longer. On the other hand, the logs are only from about a year back.
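
Once hits are labeled, the counting itself is the easy part. A minimal sketch, assuming each hit has already been reduced to a (path, label) pair where label is something like 'visit', 'bot', or 'malicious' (the function name and labels are made up for illustration):

```python
from collections import Counter

def top_content(hits, n=10):
    """hits: iterable of (path, label) pairs.

    Counts only hits labeled 'visit' and returns the n most-requested
    paths as (path, count) tuples, highest count first.
    """
    counts = Counter(path for path, label in hits if label == "visit")
    return counts.most_common(n)
```

This does nothing about the skew mentioned above: older content simply has had more time to collect hits within the logged year.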

When I get some more fun-coding time I'll see about putting this in the sidebar.


