Security Challenge


Try doing the challenge yourself before you read all that lies before ye.

Start

A friend of mine, Euan, sent a post from Ziyad saying to go to a certain website and solve a challenge. So instead of studying for AP Literature, I spent about an hour on this; what follows is a path to the solution.

First off, look at the website! Doing a select-all on the site reveals the comments:

// there is more here than meets the eye
// solve the puzzle for a prize

So it’s already something. Inspecting the page gives you some other comments:

<!-- 
    somewhere on this page, there is the start of the puzzle.
    solve the puzzle to get your application to Anthropic fast-tracked.
    come chat with us at the Anthropic booth if you need help, ask for ziyad.

    this page was mostly generated by claude.
-->
<!-- sometimes, the answers you seek are in plain sight -->

So, plain sight. The answer then lies in the background image of the site.

It’s called stego! For stegosaurus. Sadly, there’s no stegosaurus emoji :(.

Stego!?

Stego here stands for steganography, which is kind of the opposite of cryptography. Cryptography assumes the encoding method is public and only the key is secret, which is why it’s so mathy (anyone can see the scheme and try to break it).

Steganography lets you say “hey! what if nobody even suspects there’s a message hidden here?”, which reflects most small-scale applications (texts between friends, etc.). I’ve played around with this, using de-noising algorithms as a route into steganography, which led to some very fuzzy and fun images of myself. I may upload a post on that later.
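
To make the idea concrete, here is a minimal sketch of LSB (least-significant-bit) steganography, my own illustration rather than the exact scheme this challenge uses: each bit of the message overwrites the lowest bit of one channel, leaving the image visually unchanged.

import numpy as np
from PIL import Image

def hide(image_path, message, out_path):
    # pixels as rows of [R, G, B, A]
    data = np.array(Image.open(image_path).convert('RGBA'))
    flat = data.reshape(-1, 4)
    # message bits, most significant bit first
    bits = [(byte >> (7 - i)) & 1 for byte in message.encode() for i in range(8)]
    for i, bit in enumerate(bits):
        # overwrite the least significant bit of the red channel
        flat[i, 0] = (flat[i, 0] & 0xFE) | bit
    Image.fromarray(data).save(out_path)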

Anyway, if you go to a website called Aperi'Solve and upload stego.png, zsteg recovers the following information:

b1,a,lsb,xy .. text: "According to all known laws of aviation, there is no way a bee should be able to fly.\nIts wings are too small to get its fat little body off the ground.\nThe bee, of course, flies anyway because bees don't care what humans think is impossible.\nYellow, black"
b3,a,msb,xy .. file: Apple DiskCopy 4.2 image , 4045389825 bytes, 0xf8f103 tag size, 0xfc encoding, 0xc0 format
b3,rgba,msb,xy .. file: MPEG ADTS, AAC, v4 LTP, 8 kHz, surround + side
b4,rgb,msb,xy .. file: MPEG ADTS, layer I, v2, 112 kbps, Monaural

The Bee Movie! But why does it truncate? If you actually look at the recovered files, they’re empty and do nothing. These are called “phantom files” and are basically red herrings.

Numpy!?

The real way to approach this is with some dirty image manipulation. If you open the image with PIL and convert it to an array with numpy’s asarray, you can see that the alpha channel is a sequence of 0s and 255s, i.e. a bitstream. Each pixel is of the form [0, 30, 31, 0 or 255]. So, reading the alpha channel lets us convert each group of 8 values into a byte, and each byte into a character:

from numpy import asarray
from PIL import Image

image = Image.open('stego.png')
data = asarray(image)
seq = data[:, :, 3].flatten()  # just the alpha channel

# map 0 -> 0 and 255 -> 1 to recover the bitstream
val = []
for d in seq:
    if d == 0:
        val.append(0)
    elif d == 255:
        val.append(1)

# pack each group of 8 bits (most significant bit first) into a character
byte_list = []
for i in range(0, len(val), 8):
    subseq = val[i:i+8]
    num = 0
    for ind, bit in enumerate(subseq[::-1]):
        num += bit * 2**ind
    byte_list.append(chr(num))

with open('stego.txt', 'w') as f:
    f.write("".join(byte_list))

This then gives you a text file with the entire output. Going to lines 242 to 245 (you can run diff stego.txt actual_bee_movie_script.txt using the text from this gist; a Python version of the diff is sketched after the message), you get the following message:

BREAKING OUT OF THE SCRIPT
the thing you are looking for is at the regular website the challenge is on slash 
8471c9e7c8e8e5722c2c41d68575b5f3 dot zip
END BREAKING OUT OF THE SCRIPT
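
If you don’t want to eyeball the diff, difflib can do the same comparison in pure Python (assuming you’ve saved the gist’s text as actual_bee_movie_script.txt):

import difflib

with open('actual_bee_movie_script.txt') as f:
    original = f.readlines()
with open('stego.txt') as f:
    recovered = f.readlines()

# only the lines added to the script, i.e. the hidden message, show up
for line in difflib.unified_diff(original, recovered):
    print(line, end='')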

Step Two

So this is the real part of the challenge. Going to the link above downloads three files: a model.pkl, so you know you have to load something; a model.py, which tells you the architecture; and a README. The README is copied below:

So you did some steganography cracking, huh? Nice job.

The next and final part of this puzzle relies on some understanding of simple
multilayer perceptron behaviors. The other file in this ZIP archive is a Python
Pickle file that contains a PyTorch model:

1. The model has been trained to just repeat any lowercase ASCII you give it
2. Except it has also been trained to output a special "flag" given the right
   password

The input to the model is one-hot encoded and shaped (B, N, V) where:

- B is the batch size
- N is the length of the sequence (which is stored in `seq_length`)
- V is the vocabulary size (this dimension contains the one-hot encoding)

Your goal is to reverse engineer, crack, or otherwise manipulate the model to
extract the password.

Once you complete the puzzle, either find Ziyad at the BSidesSF Anthropic
booth, or reach out to ziyad@anthropic.com with "BSides Challenge Solved!"
in the subject of your email to claim your prize.

Looking at the code, we have model.py:

import torch
import torch.nn as nn

import string


vocab = " " + string.ascii_lowercase


class ASCIIModel(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int, seq_length: int):
        super(ASCIIModel, self).__init__()
        self.vocab_size = vocab_size
        self.seq_length = seq_length
        self.final = nn.Linear(seq_length * vocab_size, vocab_size * seq_length)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.view(-1, self.seq_length * self.vocab_size)

        logits = self.final.forward(x)

        logits = logits.view(-1, self.seq_length, self.vocab_size)
        return logits

Ok, so it’s a cute linear model, where each sequence of characters is just a sequence of one-hot vectors. Makes enough intuitive sense.

Break it

First, we want to be able to convert the model’s one-hot output back into strings. Note the 32 and 27: running the model once tells you the last two dimensions must multiply to 864 (you’ll get a shape error otherwise), and since len(vocab) = 27, the sequence length must be 864 / 27 = 32.

def decode_output(output):
    output = torch.reshape(output, (-1, 32, 27))
    indices = output.argmax(dim=2)
    decoded = ''.join(vocab[idx] for idx in indices[0])
    return decoded
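
Going the other way, a small encode_input helper (my own naming, not part of the challenge files) one-hot encodes a string into the (B, N, V) shape the README describes, which is handy for sanity-checking that the model really does repeat ordinary input:

def encode_input(s):
    # pad to the full sequence length with spaces (vocab index 0)
    s = s.ljust(32)
    x = torch.zeros(1, 32, len(vocab))  # (B, N, V)
    for i, ch in enumerate(s):
        x[0, i, vocab.index(ch)] = 1.0
    return x

# once the model is loaded (below), this should print the input back:
# print(decode_output(model(encode_input('hello world'))))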

Now to load/actually use the model, we use the code below:

model = torch.load('model.pkl')
input_tensor = torch.ones(1, 864 // len(vocab), len(vocab))  # all-ones (B, N, V)
print(decode_output(model(input_tensor)))

The output is… flag is damn nice training qataj. Wait, is that it?

Math?

Ok, so this is where I got confused. What the heck is going on with this one?

Looking at the original model, we can see that it’s mostly just a linear layer of the form nn.Linear(864, 864). Note that this matrix wants to behave like the identity matrix, as most inputs should map to themselves.

However, elements that are the key, or close to it, map to another value. Passing in the all-ones tensor is similar to passing in all possible strings at once. This means that most outputs cancel out EXCEPT for the ones that have been moved away from the original identity mapping, precisely those close to the final message.

Sadly, this is not the exact answer, as all the vectors close enough to the key will also fail to map to themselves (I believe there’s an argument about the number of eigenvectors as to why this happens).

So, this is why the all ones tensor is quite good, but not enough.
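
You can convince yourself of the all-ones trick with a toy version (my own construction, not the challenge’s weights): for a linear map, the all-ones vector returns each row’s sum, which is exactly 1 for untouched identity rows and something else for retrained ones.

import torch

torch.manual_seed(0)
W = torch.eye(8)        # "repeat the input" behavior
W[2] = torch.randn(8)   # pretend rows 2 and 5 were retrained
W[5] = torch.randn(8)   # to produce the flag instead

print(W @ torch.ones(8))  # exactly 1.0 everywhere except positions 2 and 5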

General Function Optimization

That’s where this stackoverflow post comes into play.

Essentially, we have a function, model.forward, which we want to fully understand / model ourselves. We know that model.forward(X) differs from X the most when X is the key.

Using the code from the overflow post, we have:

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # the only trainable parameter here is the input itself
        self.input_to_optimize = torch.nn.Parameter(torch.ones(864), requires_grad=True)

    def forward(self, x) -> torch.Tensor:
        # x is the (frozen) ASCIIModel; push our candidate input through it
        output = x.forward(self.input_to_optimize)
        output = torch.softmax(output, dim=1)
        return output

It looks like we’re just making a copy of the ASCIIModel, except we’re not: the only trainable thing in this model is the input itself.

The idea here is to run gradient descent on the input to maximize the distance between the input and output in hopes of making the input the key and the output the flag.

Cos Cosines

We then need to bring CosineEmbeddingLoss into play, as it takes a label tensor y, which we set to -1 to maximize the difference between its two inputs. Using this loss is also why we had to run torch.softmax in our Model code. Note that other losses don’t have this label, nor would they necessarily need the softmax if they did.

import torch.optim as optim

input_model = Model()
cos_loss = nn.CosineEmbeddingLoss()
optimizer = optim.SGD(input_model.parameters(), lr=0.05)

for _ in range(1000):
    # maximize the cosine distance between the input and the model's output
    loss = cos_loss(
        input_model.input_to_optimize.view(1, -1),  # (1, 864)
        input_model(model).view(1, -1),             # (1, 864)
        torch.tensor([-1]),
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(f'Key: {decode_output(input_model.input_to_optimize)} -> Flag: {decode_output(model(input_model.input_to_optimize))}')

You should get the following output:

Key: pfphjrte tidsymvvmunmarockuqikrg -> Flag: flag is damn nice training data

Further Notes

Now, some questions I have:

  • How much of this was Claude? I could see it doing the encoding into an alpha channel, but the idea feels human.
    • I’d believe it if it trained the model in the pkl file. But how much else did it do?
    • From the comment, it’s probably just the website, but… who knows?
  • What is the non-training solution?
    • From looking at the weights matrix, no idea to be honest.
  • How would you make MSE/Cross Entropy work with minimal effort?
    • Cosine works here because of that label.

Thank you

Thank you to Ziyad for the challenge, Stephanie for getting stuck on ‘qataj’ with me, Euan for sending the challenge, Christian for nudging me towards cosine, Martynas for helping me with the indexing for the alpha channel, and Eric for letting me Feynman technique this on him. Hope to visit the Anthropic office sometime soon!