Fetching leaked creds from GitHub (Don't commit your DNSSEC keys, part 2)

I'm a fan of using the command line for everything, I admit it. I love the long lines of commands you end up writing into the terminal, so I had to create an excuse to do so.

I wanted to grab leaked credentials from a VPS with a script called every X hours straight from a cron job.
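
Just to sketch the scheduling side (the script path and the every-6-hours interval below are placeholders I made up, not anything this post hardcodes), the crontab entry would look something like this:

# m h dom mon dow  command -- run the (hypothetical) script every 6 hours
0 */6 * * * /home/user/leaked_creds.sh >> /home/user/leaked_creds.log 2>&1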

I chose some utilities which are well known for their efficiency, and I'll try to explain why I chose them.

First of all, let's echo the URL we're trying to scrape here:

echo "https://github.com/search?utf8=%E2%9C%93&q=remove+password&type=Commits"

Wget. The Linux web-get utility will do the HTTP content fetching for me. I chose it because, from the man page:

Wget is non-interactive, meaning that it can work in the background, while the user is not logged on.

A.K.A. I can run it and, even if I'm not logged into the system, it will fetch the page for me. This is of maximum relevance, since what I want is to receive the data straight from the server, without having to log in or run any commands at all.

Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved.

Always have a plan B, especially with networks, since the conditions the script will have to run under on some random day are completely unknown to you. You want to be careful.

wget --report-speed=bits -i- --rejected-log=/some/dir/file.log -O-

I included some params like the speed report and the rejected-URL log for later comparison.
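
If you want to be even more explicit about the retry behaviour quoted above, wget also has knobs for that; the values here are arbitrary, just a sketch of what I'd probably add:

# Arbitrary values -- see the wget man page for --tries, --timeout and --waitretry.
wget --tries=5 --timeout=30 --waitretry=10 --report-speed=bits -i- --rejected-log=/some/dir/file.log -O-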

Grep. Well, cuz grep is important. Man.

We'll also need to parse through the HTML tags, just to keep track of which was the latest upload and maybe raise an alert so we know a new password was committed.

A bit of regex may be necessary in order to filter the tags.

This line will give us the results for all the commits matching the search parameters we established, which were, roughly speaking, 'remove password'.

grep -E 'sha btn btn-outline BtnGroup-item' | grep -oE '\/commit\/[a-zA-Z0-9]{7}' > ~/output.txt
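
Just to illustrate the format, output.txt ends up holding one short commit path per match; the IDs below are made up:

/commit/1a2b3c4
/commit/9f8e7d6
/commit/0c0ffee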

Now we have to check whether the IDs we just got match the ones stored in the past. If they do, we do nothing. If there are entries that don't match the older ones, we send a notification.

To check whether this was processed in the past, we'll make use of the commit ID.

Note: I didn't use egrep instead of grep -E because the former is already deprecated, and even though it's still maintained for backwards compatibility, I decided to avoid any trouble and just use whatever the cool kids use these days.

After getting the results, I created a file in ~ where I wrote the current batch of results.

Cat. Maybe I should have realized this before, but while thinking of a way to compare the old strings to the new ones we'd get after every parse, I came to the conclusion that the ideal thing would be to store the old list of committed-credential IDs in an old file and then compare both.
So I'm going to include the command that I should probably run first of all.

cat ~/output.txt > ~/old_output.txt

Now we'll have two files with strings we can compare. After that first line, however, we'll chain with '&&' instead of piping, simply because there's nothing to pipe: these are two independent commands that we run on the same line just to keep everything in one place.

Comm. To be honest, I didn't know about this command until recently (sorry about that). Turns out it's pretty darn useful, so if you're interested, here's the man page.

comm -3 file1 file2

Print lines in file1 not in file2, and vice versa.

That's just what we're looking for. By default, though, comm expects its input files to be sorted, and we don't want to sort them. I could do it, but I won't. Thankfully there's a flag to tell comm to skip the order-checking part.

( comm -3 --nocheck-order ~/output.txt ~/old_output.txt ) > ~/messenger.txt
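
One caveat I'd hedge on: as far as I can tell, --nocheck-order only silences the warning, and comm still relies on sorted input to compare correctly. If that ever becomes a problem, a sorted variant (a sketch, not what I'm running here) would be:

# Sort both lists on the fly so comm's assumptions actually hold.
comm -3 <(sort ~/output.txt) <(sort ~/old_output.txt) > ~/messenger.txt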

The messenger.txt file we just created will store all the non-matching commit IDs. Those non-matching IDs are the commits we're interested in, since we already have the ones that were committed in the past (every ID was non-matching at some point).

Now the extra step would be to actually build an irc/telegram/mail/(...) bot that receives the contents of the messenger.txt file every X hours (a job for cron) and sends us an alert with the name, commit ID and link of the committed file, if it receives any contents at all. Remember that messenger.txt will only have contents if the parser finds new entries; if it doesn't, the file will be empty and generating alerts would be totally unnecessary (although I would still send an alert saying 'no matches found' or something similar, just to verify that it's still working).
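
As a rough sketch of that check (send_alert is a made-up placeholder for whatever bot or mail hook you end up wiring in):

# send_alert is hypothetical -- replace it with your IRC/Telegram/mail hook.
if [ -s ~/messenger.txt ]; then
    send_alert "$(cat ~/messenger.txt)"
else
    send_alert "no matches found"
fi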

For that, I would have to parse more than just the ID (the name and link of the commit), which would mean adding a few more lines to that monstrosity I just wrote.
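
One way to do that, assuming GitHub's search-result HTML still carries the full /user/repo/commit/<sha> path in each href (an assumption on my part, not something pinned down here), would be something like:

# Sketch: grab the whole href instead of the short ID and turn it into a link.
grep -oE 'href="/[^"]+/commit/[a-f0-9]{7,40}"' | sed -E 's|^href="|https://github.com|; s|"$||' > ~/output_links.txt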

The bot building part would be an entirely different matter, since it depends on what you want. I would totally choose Telegram because I carry my smartphone with me all the time, so I would receive my alert instantly.
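
If you do go the Telegram route, the stock Bot API is enough to push the file contents; the token and chat id below are placeholders you'd get from @BotFather and your own chat:

# Placeholder token and chat id -- create your own bot to get real ones.
TOKEN="123456:your-bot-token"
CHAT_ID="111111"
curl -s "https://api.telegram.org/bot${TOKEN}/sendMessage" --data-urlencode "chat_id=${CHAT_ID}" --data-urlencode "text=$(cat ~/messenger.txt)"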

Up to you :)

Final Result:

cat ~/output.txt > ~/old_output.txt && echo "https://github.com/search?utf8=%E2%9C%93&q=remove+password&type=Commits" | wget --report-speed=bits -i- --rejected-log=/some/dir/file.log -O- | grep -E 'sha btn btn-outline BtnGroup-item' | grep -oE '\/commit\/[a-zA-Z0-9]{7}' > ~/output.txt && ( comm -3 --nocheck-order ~/output.txt ~/old_output.txt ) > ~/messenger.txt
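
And to tie it back to the cron idea from the beginning, the same thing could live in that hypothetical leaked_creds.sh (same placeholder path as in the crontab sketch above), split over a few lines for readability:

#!/bin/sh
# Hypothetical /home/user/leaked_creds.sh -- the same pipeline, just readable.
set -e
cat ~/output.txt > ~/old_output.txt
echo "https://github.com/search?utf8=%E2%9C%93&q=remove+password&type=Commits" \
    | wget --report-speed=bits -i- --rejected-log=/some/dir/file.log -O- \
    | grep -E 'sha btn btn-outline BtnGroup-item' \
    | grep -oE '/commit/[a-zA-Z0-9]{7}' > ~/output.txt
comm -3 --nocheck-order ~/output.txt ~/old_output.txt > ~/messenger.txt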
