It's very slow (about 1-2 fps).
For i = 0 To kernelSize
    re = ri + i
    If ( re >= 0 And re < 800 ) Then
        re += y*800
        cb += b( re ) * Kernel( i )
        sum += kernel( i )
    EndIf
Next
*Temp = ( cb / sum )
For large filter-radii I wouldn't apply a gaussian at full resolution.
Downsample the image with a simple 2x2 box-filter to half the resolution, apply the gaussian (with half the radius) and upsample with a bilinear filter (which is very simple since only the "middle" colours are missing).
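A rough sketch of that pipeline in Python, for a single 8-bit channel stored as a list of rows. The function names, the even image dimensions, the rounding terms and the `blur` callback (standing in for whatever Gaussian routine you already have) are assumptions made for illustration only.

```python
def downsample_2x2(img):
    """Halve the resolution by averaging each 2x2 block (even width/height assumed)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1] + 2) >> 2
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample_row(row):
    """Double the length: keep every sample, fill in the missing 'middle' ones."""
    out = []
    for i, v in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else v  # repeat the last sample at the edge
        out.append(v)
        out.append((v + nxt + 1) >> 1)               # midpoint of the two neighbours
    return out


def upsample_2x(img):
    """Bilinear 2x upsampling: horizontal pass, then vertical pass via transposition."""
    rows = [upsample_row(r) for r in img]
    cols = [upsample_row(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def blur_at_half_res(img, radius, blur):
    """blur(img, radius) is a placeholder for an existing Gaussian blur routine."""
    small = downsample_2x2(img)
    return upsample_2x(blur(small, radius // 2))
```

The Gaussian then only touches a quarter of the pixels with half as many taps per pass, which is where the saving would come from.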
As for the gaussian itself: you shouldn't test for every input-sample whether it's inside the image boundaries.
Split each scanline into three sections: left, center and right.
On the left side (which is as wide as your filter-radius), samples can only be outside on the left side.
On the right side, samples can only be outside on the right side.
In the center region, all samples are inside the image boundaries, so there's no need to test anything and "sum" stays constant!
You can now prescale your filter-kernel so that "sum" is a power of two (or accordingly bigger/smaller if you have intensity-scaling to do anyways) and the division can be replaced by a shift.
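As a sketch of the prescaling idea (Python, one channel per scanline): the kernel is built as integers that sum to exactly 1 << shift, so the center section needs no bounds test and no division. The names `make_kernel` and `blur_center`, the choice sigma = radius / 2, the rounding term and the trick of dumping the rounding error onto the center tap are my assumptions, not something from your code.

```python
import math


def make_kernel(radius, shift=8):
    """Integer Gaussian weights (radius >= 1) whose sum is exactly 1 << shift."""
    sigma = radius / 2.0  # assumption: any sigma works, this is just a common choice
    raw = [math.exp(-((i - radius) ** 2) / (2 * sigma * sigma))
           for i in range(2 * radius + 1)]
    scale = (1 << shift) / sum(raw)
    kernel = [int(round(v * scale)) for v in raw]
    kernel[radius] += (1 << shift) - sum(kernel)  # absorb rounding error in the center tap
    return kernel


def blur_center(row, kernel, shift=8):
    """Blur only the pixels whose whole kernel window lies inside the row."""
    r = len(kernel) // 2
    out = list(row)            # left/right sections are left untouched in this sketch
    half = 1 << (shift - 1)    # rounding term, optional
    for x in range(r, len(row) - r):
        acc = 0
        for i, w in enumerate(kernel):
            acc += row[x - r + i] * w   # no bounds test needed here
        out[x] = (acc + half) >> shift  # shift replaces the division by "sum"
    return out
```

With shift = 8 and 8-bit samples the accumulator never exceeds 255 * 256, so it also fits comfortably into 16 bits.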
If you don't skip outside-samples (which changes "sum") but map these to the "boundary" sample (clamping), you can use the shifting-approach for the left/right-sections, too.
Let's assume your filter has a radius "r" and "kernel" ranges from 0 .. 2*r (inclusive) and kernel[r] corresponds to the center element.
When processing the left section, the number of outside samples decreases with each iteration:
For the first output-pixel, samples [0..r-1] are outside (access b(0) instead), samples [r..r*2] access b(0..r).
For the second output-pixel, samples [0..r-2] are outside, samples [r-1..r*2] access b(0..r+1) - and so on.
So you just have to keep track of the weight for b(0), which starts with sum(kernel[0..r]) (by symmetry that is (sum + kernel[r]) / 2, so since you know the sum of the complete kernel this is for free, too) and decreases by kernel(r-x) when you step from output-pixel x to x+1.
The right side works just the other way round. Use the same approach for the vertical pass with top/center/bottom.
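Here is how the left/center/right split with clamping could look for one scanline, as a Python sketch. It assumes an integer kernel with 2*r+1 taps summing to 1 << shift and a row at least 2*r pixels wide; the function names and the rounding term are mine. `blur_row_reference` is just the slow clamp-every-sample version to compare against.

```python
def blur_row_clamped(row, kernel, shift):
    """Blur one scanline with edge clamping, without per-sample bounds tests."""
    r = len(kernel) // 2
    n = len(row)
    if n < 2 * r:
        raise ValueError("row must be at least 2*r pixels wide")
    half = 1 << (shift - 1)  # rounding term (an addition of this sketch)
    out = [0] * n

    # Left section: taps 0..r-x land on or left of row[0]; their weights are merged.
    edge_w = sum(kernel[:r + 1])
    for x in range(r):
        acc = row[0] * edge_w
        for i in range(r - x + 1, 2 * r + 1):
            acc += row[x - r + i] * kernel[i]
        out[x] = (acc + half) >> shift
        edge_w -= kernel[r - x]  # one tap fewer hits row[0] for the next pixel

    # Center section: every tap is inside the row.
    for x in range(r, n - r):
        acc = 0
        for i in range(2 * r + 1):
            acc += row[x - r + i] * kernel[i]
        out[x] = (acc + half) >> shift

    # Right section: mirror image, the weight of row[n-1] grows with each pixel.
    edge_w = kernel[2 * r]
    for x in range(n - r, n):
        first_clamped = n - 1 - x + r  # first tap that lands on or right of row[n-1]
        edge_w += kernel[first_clamped]
        acc = row[n - 1] * edge_w
        for i in range(first_clamped):
            acc += row[x - r + i] * kernel[i]
        out[x] = (acc + half) >> shift
    return out


def blur_row_reference(row, kernel, shift):
    """Brute-force version that clamps every sample index, for checking only."""
    r, n = len(kernel) // 2, len(row)
    half = 1 << (shift - 1)
    return [(sum(row[min(max(x - r + i, 0), n - 1)] * w
                 for i, w in enumerate(kernel)) + half) >> shift
            for x in range(n)]
```

Both functions should give identical rows; the first one just never evaluates a condition inside the inner loops.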
Depending on the required precision you can process multiple samples at once by packing them into one integer (integers are 32 bits wide).
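One way this might look, sketched in Python: with 8-bit channels and kernel weights summing to 256, each weighted sum fits into 16 bits, so red and blue can share one 32-bit accumulator. The 0xRRGGBB pixel layout and the function name are assumptions for the example; Python integers are not limited to 32 bits, but the values here never exceed that range.

```python
def blur_pixel_packed(pixels, kernel):
    """pixels: 0xRRGGBB ints, one per kernel tap; kernel weights must sum to 256."""
    rb = 0  # red in bits 16..31, blue in bits 0..15
    g = 0   # green in bits 8..23
    for p, w in zip(pixels, kernel):
        rb += (p & 0xFF00FF) * w  # one multiply handles red and blue together
        g += (p & 0x00FF00) * w
    # Each lane holds at most 255 * 256, so nothing carries into the neighbouring lane.
    rb = (rb >> 8) & 0xFF00FF     # the shift by 8 replaces the division by 256
    g = (g >> 8) & 0x00FF00
    return rb | g
```

That is two multiplies per tap instead of three; the price is that the weights are limited to 8 bits of precision.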
Notice that your kernel function (pink) approximates a gaussian (green) rather well, but overestimates the center significantly (more of the unprocessed signal passes, less lowpass, more ringing).