Kuwaharian Noise

August 14, 2024

[ self math computer-science graphics ]

I wrote my first problem for a CTF, and it went… not as good as it should’ve been. Here’s the story.

Problem Inspiration/Description

There’s an algorithm called the Wave Function Collapse (top ten coolest algorithm names, right behind swarm intelligence). It’s not the quantum one, but it’s inspired by it. The understanding of the algorithm isn’t too important to the story, but I’ll recount the very rough stuff here.

WFC is an algorithm developed by Gumin utilized to make randomized tilesets using base tiles. The image below is from the github:

an example

Essentially, you take the tile on the left hand side, split it into smaller tiles and use the adajcent tiles as a ruleset for which smaller tiles can be next to each other. So, we have a set of tiles, each associated with a ruleset of which tiles can be next to it. In fact, if the original tile is large enough (or we split it into small enough tiles), we can instead have a ruleset where adjacent tiles are described by the number of times they appear in the source tile.

More mathematically, a source tile $S$ gives us a set of tiles $T$ and a function $p(t, s)$ which maps each tile $t$ and a side $s$ to a probability distribution over $T$.

The algorithm starts with a blank image composed of tiles, with each tile having an equal probability of being any tile from our tileset. We start at some tile and randomly pick one. Then, we go to the adjacent tile. Note that the probability distribution has changed to $p(t, s)$, depending on the original tile and side that we picked. Then you continue outwards until each tile has been set.

Note you may have the case where the product of two probability distributions may be zero over all possible tiles. There’s more nuance to the algorithm so this isn’t dealt with, so if someone wants to look more into it, they can check the source code or this video.

I think I watched this video around a year and a half ago, and thought it was very pretty.

Where’s the Encryption?

So, this isn’t done. I spent a good amount of time thinking, and there’s a way of generating a stegonographic message. The way you would do it is pretty simple, where instead of treading $p(t, s)$ as a probability distribution, you treat it as a sorted list of tiles. Then, to encode $x$ in a tile, you simply choose the $x$th tile in the sorted list. Do this for all tiles. Now, you have a sequence of numbers, which can be used to describe a larger number (cryptography is epic). This is, of course, not cryptographic because the knowledge of the algorithm is what makes it a “one way function.”

I tried this but ran into some bugs, but Lefebvre, Wei, & Barnes had done it about a year prior to me trying it! I’d really reccomend trying their solution out.

decode me!

Alternatives

I watched an Acerola youtube video on the Kuwahara Algorithm probably once or twice. Understanding this algorithm is more important. The Kuwahara algorithm is an imaging processing algorithm that looks at each pixel on an image and generates a list of possible pixel values for it. This is the most reduced way of explaining it, but it’s useful for the problem setting.

The ~~simple version of this~~ algorithm reduces to:

Go to pixel (i, j).
Compute the averages and standard deviation of the pixels in the 4 window_size squares that have (i, j) as one of their corners.
Sort these regions by the standard deviation.
Choose the first region’s average and set (i, j) to that color.

Note that just like the WFC-Steg method, this one associates each pixel with a sorted list of possible candidates. And so the idea was born that this cold be a method of stegonography!

How I wrote the problem

Precursor

To start, encode and decode were written by me a long time ago and they work, so I don’t touch them and I trust my past self. They convert each character into a base $N$ number which would have at most length $k$ in this representation, and so a sequence of characters is now a sequence of base $N$ distinguishable numbers after padding out shorter ones. It’s not block encoding or anything fancy but it got the job done.

Kuwahara Algorithm

I tried writing my own kuwahara algorithm using numpy, but what I found out was it took too long for me to be comfortable using it, so instead I turned to pykuwahara, a python library that did roughly what I wanted. I cut it down to size:

def kuwahara(original_image, window_size):
    """
    Kuwahara Algorithm. This function applies the Kuwahara algorithm to an image.
    
    Args:
        original_image (ndarray): The original image to apply the Kuwahara algorithm to.
        window_size (int): The size of the window used for calculating the local statistics.

    Returns:
        tuple: A tuple containing the following elements:
            - averages (ndarray): An array of shape (4, H, W, C)
                                containing the average values for each subregion.
            - stddevs (ndarray): An array of shape (4, H, W)
                                containing the standard deviations for each subregion.
    """
    image = original_image.astype(np.float32, copy=False)

    averages = np.empty((4, *image.shape), dtype=image.dtype)
    stddevs = np.empty((4, *image.shape[:2]), dtype=image.dtype)

    image_2d = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(image.dtype, copy=False)

    averages_2d = np.empty((4, *image.shape[:2]), dtype=image.dtype)
    squared_image = image_2d ** 2

    kxy = np.ones(window_size + 1, dtype=image.dtype) / (window_size + 1)
    shift = [(0, 0), (0, window_size), (window_size, 0), (window_size, window_size)]
    for k in range(4):
        kx = ky = kxy
        # sepFilter2D's are so awesome, really freaking fast
        cv2.sepFilter2D(image, -1, kx, ky, averages[k], shift[k])
        cv2.sepFilter2D(image_2d, -1, kx, ky, averages_2d[k], shift[k])
        cv2.sepFilter2D(squared_image, -1, kx, ky, stddevs[k], shift[k])
        stddevs[k] = stddevs[k] - averages_2d[k] ** 2
    return averages, stddevs

Now I had a method of converting images into their kuwahara’d siblings. To encrypt it, I would just do what I said before:

def kuwahara_encryption(original_image, message, window_size=5):
    """
    Applies the Kuwahara algorithm to an original image and encrypts a message within it.

    Parameters:
    - original_image (ndarray): The origxinal image to be processed.
    - message (str): The message to be encrypted within the image.
    - window_size (int): The size of the window used in the Kuwahara algorithm. Default is 5.

    Returns:
    - ndarray: The filtered image with the encrypted message.
    """
    averages, stddevs = kuwahara(original_image, window_size)
    indices = np.argmin(stddevs, axis=0)
    sorted_indices = np.argsort(stddevs, axis=0)

    # Start at (window_size, window_size) to avoid the border, where
    # the Kuwahara Algorithm might not get enough pixels.
    start_x = start_y = window_size
    for m in message:
        indices[start_x][start_y] = sorted_indices[m][start_x][start_y]
        start_y += 1
        if start_y == original_image.shape[1] - window_size:
            start_y = window_size
            start_x += 1

    # Second favorite function when writing this, take_along_axis. Epic.
    filtered = np.take_along_axis(averages, indices[None,...,None], 0).reshape(original_image.shape)
    return filtered.astype(original_image.dtype)

Now, for the decrypter, I thought I would be smart. I encoded a base-3 message into the image shifted up by one, so the only shifts I would have to look out for were 1, 2, or 3. This meant I could write it as:

def kuwahara_decryption(original_image, enc_image, window_size=5):
    """
    Decodes an encoded image using the Kuwahara algorithm.

    Parameters:
    original_image (ndarray): The original image used for encoding.
    enc_image (ndarray): The encoded image to be decoded.
    window_size (int, optional): The size of the window used
                                for the Kuwahara algorithm. Default is 5.

    Returns:
    list: The original message from the encoded image.
    """
    averages, stddevs = kuwahara(original_image, window_size)
    indices = np.argmin(stddevs, axis=0)
    sorted_indices = np.argsort(stddevs, axis=0)
    averages = averages.astype(original_image.dtype)
    differences = []
    for i in range(enc_image.shape[0]):
        for j in range(enc_image.shape[1]):
            if not np.array_equal(enc_image[i][j], averages[indices[i][j]][i][j]):
                for k in range(1, 4):
                    ind = sorted_indices[k][i][j]
                    if np.array_equal(enc_image[i][j], averages[ind][i][j]):
                        differences.append(k)
                        break
    return differences

For the challenge, a player would not have access to the kuwahara function or the kuwahara_decryption function. I thought that was a valid tradeoff. I tested this on the flag and it worked for me, so I sent it in.

No Bug Testing

One side note though: color is annoying and we don’t want to do this process 3 times, so we actually have to convert the image to grayscale beforehand, and compute the standard deviations on the grayscale regions and use the average of the best grayscale region. To make this more clear to the participant, cv2.COLOR_BGR2GRAY is imported at the top of the file, as cv2 also has other options.

The encryption step comes with the choosing, as we now have the ability to embed a base-4 number by choosing different regions. Note that we can only decrypt if we send the image with the image that we ran this algorithm on. Though that’s not quite the case:

What happens if there are regions of the same color/saturation? How do we order it?
What about on the edges, where the squares might go off the image? Does this affect when we can embed a number?
Floating point errors???
How do we distinguish between an actual encrypted message and offsets of 0 if the message is too short?

In the problem, I gave them a noisy and encoded image, and provided the following links:

Some resources:

https://github.com/yoch/pykuwahara/blob/main/src/pykuwahara/kuwahara.py

https://en.wikipedia.org/wiki/Kuwahara_filter

https://www.youtube.com/watch?v=LDhN-JK3U9g

And for a lot of people this seemed to catch their interest.

Bugs. Lots of them

So there’s a lot of problems with this.

.JPEG

The first one that was claimed was the file format was problematic, and that sending it over the internet had changed it. I tested it and this claim was false.

Wording

The next problem was in the wording. This is why doing a computer science degree is good for me, specifically an Oxfordian one, because you can talk directly to your tutor about your wording and hone it.

When I say a 5x5 window around a pixel, this is totally nonrigourous! Is that pixel in the center? The corner?

What I meant to say is let our current pixel be $(i, j)$. Then, we consider each 5x5 square of pixels such that $(i, j)$ is a corner of one of these squares.

This makes it more clear that the region I’m considering per pixel is 9x9. I spent a while trying to fix this so it was clearer.

But realisitcally I should’ve done something more formally pointing to the filter2D. I know some people just went and manually did it, and that might have had different problems.

Also the wording behind the grayscale should’ve been infinitely more clear. That’s on me. No one actually complained about this too deeply but I disliked it strongly.

Sneaky

So the real biggest problem had to do with the fact that using the map “[Int] -> [Float] -> Float -> Int” is not injective enough in general. Even though in testing the inverse worked over a lot of random strings, it didn’t actually test if the averages were unique.

I thought this was fine, but what happens is when you check the inverse, it becomes the case that you don’t know which index to pick!

Which is. Uh. Huge.

More formally, let us have the pairs [(.5, 0.0), (0.5, 1.0), (.5, 0.0), (0.5, 5.0)], which represent the (average, stddev) pairs for each window from upper left, upper right, bottom left, bottom right. For our sake, we label the list [1, 2, 3, 4].

We sort this by the standard deviations, preserving order if things precede already. We end up with [1, 3, 2, 4]. But even if we pick any of these to set the pixel color to, we have no idea! What! Pixel! It! Was!

This is why I tried to make the image noisy, but it still creates a conundrum. I have no real idea why it worked during testing. I guess I always got lucky and it would always pick the first index that was chosen for the encryption, and that was fine.

But this was an unforgivable blunder on my part.

People still solved it!

There’s this idea that in math competitions:

It’s too easy to write really hard problems. There’s no real point to have a question on a test if no one actually solves it.

The actual quote is on Evan Chen’s blog somewhere. I forget.

This generally holds somewhat true. Probably. It can be a cool learning opportunity at the end.

My favorite writeup for this is here: https://liminova.github.io/blog/litctf-2024-kuwahara.

I want to sincerely apologize to people who tried to solve it and ran into my bugs and issues.

My rough thoughts

Ultimately, I think I would’ve actually tried to test it out a ton beforehand. I think the idea was really cool. I know that people enjoyed the algorithm and thought it was cool. I personally think its a really cool edge detection algorithm if you wanted to use it for that.

I’m happy I got to teach people about an algorithm that I love, an algorithm that otherwise isn’t used much at all nowadays.