The way I fixed it was to guarantee that the size to be filled was a multiple of 4 (since the pixels were 32-bit), so I changed it to:
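char *dst = ...;
unsigned int *idst = (unsigned int *)dst; /* assumes dst is 4-byte aligned */
size_t size = ...;                        /* bytes to fill, a multiple of 4 */
unsigned int val = ...;                   /* fill byte, 0..255 */
val *= 0x01010101u;                       /* replicate the byte into all 4 bytes */
size >>= 2;                               /* bytes -> 32-bit words */
while (size--)
    *idst++ = val;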
Like you say, the compiler is very stupid: it doesn't recognise that loop as 'rep stosd'.
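For anyone wanting a self-contained version of the same word-at-a-time fill, here is a minimal sketch. The name fill32 is made up for illustration, and it assumes the size is a multiple of 4 as above. Instead of casting the pointer, it stores each 32-bit word with memcpy, so dst doesn't have to be 4-byte aligned. Since all four bytes of the word are equal, byte order doesn't matter. Whether this ends up as 'rep stosd', a vectorised loop or a plain loop still depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `size` bytes at `dst` with the byte `val`,
   writing 32 bits at a time. Assumes size is a multiple of 4. */
static void fill32(void *dst, unsigned char val, size_t size)
{
    unsigned char *p = dst;
    uint32_t word = (uint32_t)val * 0x01010101u; /* replicate byte 4 times */

    for (size_t n = size >> 2; n--; p += 4)
        memcpy(p, &word, 4); /* memcpy avoids alignment/aliasing problems */
}
```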
Jim