– last update 09/04/2007
http://www.spammimic.com is a free service providing steganography over spam. Spammimic generates a spam mail from a message; the user can then cut and paste the stegomessage from the browser and send it to the recipient.
Here is an example from http://www.spammimic.com/encode.shtml. The user’s message:

hello test

Becomes:

Dear Decision maker , We know you are interested in
receiving amazing intelligence . This is a one time
mailing there is no need to request removal if you
won't want any more . This mail is being sent in compliance
with Senate bill 1625 ; Title 4 ; Section 302 . THIS
IS NOT MULTI-LEVEL MARKETING ! Why work for somebody
else when you can become rich as few as 33 days . Have
you ever noticed people love convenience plus most
everyone has a cellphone ! Well, now is your chance
to capitalize on this ! We will help you decrease perceived
waiting time by 130% and turn your business into an
E-BUSINESS . You are guaranteed to succeed because
we take all the risk . But don't believe us ! Prof
Ames of Massachusetts tried us and says "My only problem
now is where to park all my cars" ! We are licensed
to operate in all states ! We beseech you - act now
. Sign up a friend and your friend will be rich too
! Thank-you for your serious consideration of our offer
!

OK?
Different words in the input message produce different spam as output. The spammimic site provides a section for spam decoding too. Here is an example from http://www.spammimic.com/decode.shtml :

Your spam message "Dear Decision maker , We know you are in…” decodes to
“hello test”.

What is steganalysis?
It is the countermeasure to steganography: steganalysis tries to infer whether “innocent looking” data has been used as a container for other hidden data. In this application the spam is the container, and steganalysis must pick out the steganographic spam among tons of real spam messages.
I looked at spammimic’s output and tried some chosen-message attacks. A chosen-message attack is effective when the attacker can feed arbitrary messages to the encoder and analyze the resulting stegomessages.

Spammimic is a closed source tool, as explained in the FAQ: http://www.spammimic.com/feedback.shtml
The authors have some good reasons for keeping it closed source, but security software is better off open source, as Kerckhoffs’s principle suggests (http://en.wikipedia.org/wiki/Kerckhoffs_principle).

The analysis is made on the stegomessages obtained by encoding a large set of non-random messages. The resulting stegomessages are analyzed to highlight the security properties of the system: the analysis should reveal redundancies, collisions, implementation problems, etc.

The vulnerability in spammimic is the redundancy of some patterns, which can be searched for inside a spam archive. An email with zero or few pattern matches is real spam; an email with several of spammimic’s characteristic patterns is a stegomessage.
In short, the vulnerability of spammimic is its use of a small and predictable dictionary.
Spam could be one of the best steganographic containers, because it always includes some pseudo-random content.

How was the steganalysis done?

I took 400 random real words (from a wordlist). I encoded each word as a single-word message with:

tail -n 400 new_york_times_most_used_words.txt > 400_words
mkdir -p dumps
for i in $(seq 1 400); do
    word=$(tail -n "$i" 400_words | head -n 1)   # i-th word counting from the end of the list
    curl -d "plaintext=$word" http://www.spammimic.com/encode.cgi > "dumps/$i.output" && sleep 2
done

I wrote a script able to convert the encode.cgi dumps to simple spam sections:

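#!/bin/sh -x
# first argument: the number of files in dumps/
for i in $(seq 1 "$1"); do
    # read from stdin so that wc prints only the count, without the file name
    lines=$(wc -l < "dumps/$i.output")
    # skip the first 41 lines of the page, then drop every line that contains HTML markup
    tail -n $((lines - 41)) "dumps/$i.output" | grep -v "<" >> "$i.spam"
done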

Now I had 400 spam blocks ready to be analyzed. I ran the 400 stegospam messages through sort | uniq, generating a list of common segments:

debian:~/steganalysis# more spammimic_dict
!
.
,
because the internet operates on "internet time"
but don't believe us
...but it won't be sent until you click on send)
dear business person
dear colleague
dear cybercitizen
dear decision maker
dear e-commerce professional
[...]
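
The segment extraction can also be sketched in Python. This is only a minimal sketch, not the script shipped in the package: the names split_segments and segment_counts are hypothetical, and it assumes that spammimic separates its phrases with " . ", " ! " and " , " (a punctuation mark with whitespace on both sides).

```python
import re
from collections import Counter

# Assumption: a delimiter is ".", "!" or "," preceded by a space and
# followed by a space or by the end of the message.
DELIMITER = re.compile(r" [.!,](?= |$)")


def split_segments(text):
    """Split one spam block into lowercase segments, dropping the delimiters."""
    flat = " ".join(text.split())  # undo the line wrapping
    parts = (part.strip().lower() for part in DELIMITER.split(flat))
    return [part for part in parts if part]


def segment_counts(messages):
    """Count in how many messages each segment appears."""
    counts = Counter()
    for message in messages:
        counts.update(set(split_segments(message)))
    return counts


if __name__ == "__main__":
    blocks = [
        "Dear Colleague , But don't believe us ! Act now .",
        "Dear Colleague , We take all the risk .",
    ]
    for segment, seen in sorted(segment_counts(blocks).items()):
        print(seen, segment)
```

Sorting the keys of the counter gives the same kind of list as sort | uniq, while the counts show which segments recur across many stegomessages.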

Email messages were checked for these patterns, and each message was given a score equal to the number of pattern matches. For spammimic messages (generated from a one-word input) this score was about 20-30.
Checking for these patterns in real spam messages returned these values:

0 pattern match (16113 messages), 1 (428 msg), 2 (133 msg), 3 (18 msg).
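
The scoring step could look roughly like the sketch below. It is an illustration under assumptions, not the original script: count_pattern_matches and match_histogram are hypothetical names, matching is done case-insensitively on whitespace-normalized text, and each dictionary pattern is counted at most once per message (the original may count every occurrence).

```python
from collections import Counter


def count_pattern_matches(message, patterns):
    """Return how many dictionary patterns occur in the message."""
    flat = " ".join(message.lower().split())  # ignore case and line wrapping
    return sum(1 for pattern in patterns if pattern in flat)


def match_histogram(messages, patterns):
    """Map each score to the number of messages that obtained it."""
    return Counter(count_pattern_matches(m, patterns) for m in messages)


if __name__ == "__main__":
    # A tiny excerpt of the dictionary; the delimiter entries are left out.
    patterns = ["dear decision maker", "but don't believe us"]
    stego = "Dear Decision maker , We know you are interested . But don't\nbelieve us !"
    real = "Cheap watches, buy now"
    for score, total in sorted(match_histogram([stego, real, real], patterns).items()):
        print(score, "pattern matches:", total, "messages")
```

The histogram has the same shape as the figures above: one bucket per score, with the number of messages that fell into it.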

In the 16,736 real spam messages from http://untroubled.org/spam (I used the spam archive of March 2007) I found two strange mails:

one message (2007/03/1174971001.14272_162.txt) has 36 pattern occurrences and another (2007/03/1173129552.10325_753.txt) has 10.

To minimize false positives like these, I checked two other values:

1) The length of the message: rather than the absolute number of patterns, the ratio of patterns to message length is a better value for analysis.
2) The number of pattern delimiters: " . ", " ! " and " , " are the three pattern delimiters used by spammimic, and a lot of real spam messages may contain them too.

I coded a simple script in Python, reversed_spammimc.py. The script simply outputs some values:
[number of patterns found][size of message][number of first Pattern Separator]
[number of second PS][number of third PS][file name].

These pieces of information are enough to discriminate a steganographic message from real spam.
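
As an illustration of that output line, here is a minimal sketch that computes the same fields. It is not the code of reversed_spammimc.py: the function names are hypothetical, the separator order (".", "!", ",") is assumed from the list above, the size is taken as the number of characters, and the density helper is just one way to relate the pattern count to the message length.

```python
import re
from collections import Counter

SEPARATORS = ".!,"  # assumed order: first, second and third Pattern Separator


def features(name, message, patterns):
    """Return [patterns found, message size, PS1, PS2, PS3, file name]."""
    flat = " ".join(message.lower().split())
    found = sum(1 for pattern in patterns if pattern in flat)
    # A separator only counts when it has a space on both sides; the padding
    # lets a separator at the very start or end of the message count too.
    marks = Counter(re.findall(r"(?<= )[.!,](?= )", f" {flat} "))
    return [found, len(message)] + [marks[s] for s in SEPARATORS] + [name]


def format_line(row):
    """Render a feature row in the bracketed style shown above."""
    return "".join(f"[{value}]" for value in row)


def pattern_density(row):
    """Pattern matches per character, so long real spam is not over-scored."""
    return row[0] / row[1] if row[1] else 0.0


if __name__ == "__main__":
    patterns = ["dear colleague", "but don't believe us"]
    text = "Dear Colleague , But don't believe us ! Act now ."
    row = features("sample.txt", text, patterns)
    print(format_line(row), pattern_density(row))
```

A message would be flagged when both the pattern density and the separator counts are high; any concrete threshold would have to be tuned against a real spam archive.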

EXTRA INFO

Other nice examples of “practical steganalysis”: a few tools to discover hidden data, a simple steganalysis example with stegdetect, steganalysis@en.wikipedia, defense against statistical steganalysis; and with some deep web searching you can find anything :)

WHAT IS SPAMMIMIC’S FAULT ?

Steganography software can be divided into two large categories: software that generates a cover and software that embeds data in an existing cover. The second category is the one commonly meant (for example, a message embedded into an image). Steganography applications able to generate a cover have some advantages and are almost immune to known-cover attacks, but the generated cover must be consistent with what is actually observed on the Internet.

I don’t know how spam patterns change over the years, but at the moment the internal dictionary of spammimic is almost obsolete. It is not easy for the spammimic coder to avoid this degradation of security, because changing the dictionary requires defining an internal “steganographic format” that lets the decoder detect which version of the dictionary was used at encoding time, in order to decode the message correctly. Some years ago I ran into the same problem while coding blastersteg, a steganographic communication system that hides inside the random traffic generated by boxes infected with the Blaster worm. Nowadays Blaster is no longer easy to find, so using blastersteg is unsafe because it creates an anomaly in Internet traffic. An anomaly can be detected by the steganalyst, and in the end this makes the software vulnerable to steganalysis.

Here is the package for download: http://www.s0ftpj.org/docs/spammimic_steganalysis.tar.gz