I've been working with my friend Will on some demos for our Arty FPGA board. It's a really nice system to develop on, and although we rely on the Xilinx software to design the hardware, we've managed to stick with agnostic code for now. I wanted to investigate some old-school computer graphics effects and what better place to start than with the metaball algorithm.

Metaballs are a good example of sampling a field - one of the simplest really. The effect looks something like this:

The idea is that when the moving balls get close to each other, they begin to melt into one another, a bit like a lava-lamp. It can be solved in any dimension, but in our case, we are sticking with 2D. There is a really good writeup of the algorithm on this page here. But in a nutshell, you are testing how far away from all of the circles a particular point in space is. We start with the equation for a point on a circle.

```
r = sqrt( (xs - xp) * (xs - xp) + (ys - yp) * (ys - yp) )
```

... where (xs,ys) is the centre of the circle and (xp,yp) is the point we are testing. This equation would only be satisfied for points that are exactly on the circle. Ideally, we'd like some sort of a range. Are we inside the circle or are we just outside. We can re-arrange the equation a bit...

```
r * r / ( (xs - xp) * (xs - xp) + (ys - yp) * (ys - yp) ) = 1
```

Of course, if our point (xp,yp) lies inside the circle, we get a number greater than one. If the point is just outside the circle we get a number that is less than 1. Often, this will be zero but crucially, it might not be when we get close. The trick is to perform this equation for every metaball in our system, adding up the results. If the result is over a threshold, we colour that pixel in, otherwise, we leave it blank. This gives us the look we are looking for. It's a bit like testing for a point inside several force fields. What is the summation of the field's effects at this point?

I've no idea how to implement floating point maths. Fortunately, fixed point math is also an option. Without going into it too much, we take our word (16, 32, 64 bits whatever) and split it into two parts. The first represents the integer number before the decimal point. The second half represents the fractional part after the decimal point. Sometimes, the first bit is used as a sign bit, although often two's compliment can be used to get a bit more space. Will has a pretty good writeup of this so you should check it out.

I used the opencores fixed point math library in my code. It seems to work pretty well although the division seems to accept integers and spits out fixed point, rather than taking in fixed point, which is odd. Never mind though. I suspect there is a reason for that, although I've no idea why. It caught me off guard for a while.

The original Doom used fixed point math to gain some speed as the CPU it was originally aimed at had no floating point unit! Hard to believe these days but that's how it was. So we are in good company.

Most of the fixed point operations are combinatorial but the division requires a clock. It takes quite a few cycles to compute a division, as some systems programmers will already know! In the verilog we need signal when we are done and ready to move on. This means our VGA module will be sampling the pixels much faster than we can update them. I've cut down the resolution so we can keep the framerate up.

By my estimate, we need roughly 518400 clock cycles for a single frame...

```
180 * 90 * (N+Q+1)
```

... where N and Q are the sizes of the integer and fractional part respectively. Our board is clocked at 10ns per clock cycle so that means we need 0.005184 seconds for this operation to work. We are under our 60fps limit so that's good news (although not by a lot).

The beauty of FPGA design though, is we can build one module, in hardware, for each metaball. That means we can keep adding metaballs and not slow down, although we will use more of our precious LUTs! Still, if you *absolutely, positively have to have 60fps metaballs at all times* this is the way to do it.

In our code we use double buffering. The sram module Will designed is pretty simple. I've added a clear function to it, so I can make double sure it's cleared before being written to. We don't strictly need double buffering, but it's a good thing to be looking into at the moment. Once the division for the metaball in question is complete, it sends a signal. We advance our *working pixel* by one and read the result of the adder, as the metaball modules will have already sent their results to the adder (at some point, I should add some flip-flops to make sure the metaballs are all running at the correct time).

We swap buffers every *frame* which is decided by the VGA driver - again, built by Will.

At the moment, there are a few bugs and many improvements. The centre of the metaballs has a tiny dot. This is actually what a divide by zero looks like! I thought I'd leave it in to remind me.

However, the fact that the screen is duplicated is a clear bug. That might be because of the low resolution not working with my monitor, or something else. Nevertheless, I need to fix that.

It's possible to go up to really high resolutions if you break down each metaball object into smaller, sub-objects and work on multiple *working pixels* at once. For instance, each metaball object keeps a single counter for the current pixel it's working on. There's no reason why we couldn't have two counters, or 4, or how ever many we want! Simply break the screen up into tiles and work it that way. Only only limit is the number of LUTs on our FPGA.

All the code for this example is available on my github page. Have fun!