As you've probably spotted, the problem is knowing where one byte finishes and the next one starts. It occurred to me last night that one way of doing this is to use 3 bits to encode how many bits follow. So each byte would encode to something between 4 and 10 bits. For any value of 2 or more we can imply the leading 1, so we don't need to store it.
i.e.
00000000 -> (1 bits), -> (000),0 -> 0000
00000001 -> (1 bits), -> (000),1 -> 0001
0000001x -> (2 bits),x -> (001),x -> 001x
000001xx -> (3 bits),xx -> (010),xx -> 010xx
00001xxx -> (4 bits),xxx -> (011),xxx -> 011xxx
0001xxxx -> (5 bits),xxxx -> (100),xxxx -> 100xxxx
001xxxxx -> (6 bits),xxxxx -> (101),xxxxx -> 101xxxxx
01xxxxxx -> (7 bits),xxxxxx -> (110),xxxxxx -> 110xxxxxx
1xxxxxxx -> (8 bits),xxxxxxx -> (111),xxxxxxx -> 111xxxxxxx
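For anyone who wants to play with it, here is a minimal Python sketch of the mapping above. The names `encode` and `decode` are just my own, and the output is a string of '0'/'1' characters rather than packed bytes, purely to keep it readable. It assumes the stream ends exactly on a code boundary; a real packed version would need some way to deal with padding bits at the end.

```python
def encode(data: bytes) -> str:
    """Encode each byte as a 3-bit length prefix plus the bits that follow."""
    out = []
    for b in data:
        n = max(b.bit_length(), 1)           # 0 and 1 both count as 1-bit values
        out.append(format(n - 1, "03b"))     # 3-bit prefix holds n-1 (000..111)
        if n == 1:
            out.append(str(b))               # the single bit is stored explicitly
        else:
            rest = b - (1 << (n - 1))        # drop the implied leading 1
            out.append(format(rest, "0{}b".format(n - 1)))
    return "".join(out)


def decode(bits: str) -> bytes:
    """Inverse of encode(); expects a string made only of whole codes."""
    out = bytearray()
    i = 0
    while i < len(bits):
        n = int(bits[i:i + 3], 2) + 1        # read the 3-bit prefix
        i += 3
        if n == 1:
            out.append(int(bits[i]))         # explicit single bit
            i += 1
        else:
            rest = int(bits[i:i + n - 1], 2)
            out.append((1 << (n - 1)) | rest)  # put the implied 1 back
            i += n - 1
    return bytes(out)


print(encode(bytes([0, 1, 2, 255])))  # 0000 0001 0010 1111111111, run together
```

Every one of the 256 byte values should round-trip through `decode(encode(...))`.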
Will it compress?
Number of patterns for each bit length vs compressed length:
bits -> occurrences -> packed size
n -> 2^(n-1) -> (n-1)+3 (for n >= 2; the values 0 and 1 store their single bit explicitly, so they take 4)
0 -> 1 -> 4
1 -> 1 -> 4
2 -> 2 -> 4
3 -> 4 -> 5
4 -> 8 -> 6
5 -> 16 -> 7
6 -> 32 -> 8
7 -> 64 -> 9
8 -> 128 -> 10
So you can see that any values with more than 6 bits get bigger and any with fewer than 6 get smaller. Adding up the occurrences: 32 shrink (1+1+2+4+8+16), 32 stay the same, and 192 get bigger (64+128).
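A quick way to check the tally is to run all 256 byte values through the size rule. This is only a sketch; `packed_size` is a name I made up for the rule in the table.

```python
def packed_size(b: int) -> int:
    """Bits needed for one byte value: 3-bit prefix plus the stored bits."""
    n = max(b.bit_length(), 1)               # 0 and 1 are both 1-bit values
    return 3 + (1 if n == 1 else n - 1)      # leading 1 is implied when n >= 2


sizes = [packed_size(b) for b in range(256)]
shrink = sum(1 for s in sizes if s < 8)
same = sum(1 for s in sizes if s == 8)
grow = sum(1 for s in sizes if s > 8)

print(shrink, same, grow)       # 32 32 192
print(sum(sizes) / 256)         # 9.015625 bits per byte if all values are equally likely
```

So on uniformly distributed bytes the output averages a little over 9 bits per input byte, which is what you'd expect given that random data can't be compressed.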
But this doesn't mean it won't compress. No compression algorithm can compress every data stream, and no compression algorithm can ever compress completely random data - which is good news for us.
If the data we were compressing only ever had the lower 6 bits set then we would have a winner, since nothing would ever grow. ASCII data might compress using this algorithm: it never has the top (8th) bit set, so the 10-bit worst cases go out of the window.
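Whether a given piece of ASCII actually shrinks depends on which characters it uses, and that is easy to measure. The sketch below (with `packed_bits` as a made-up helper name) just totals the packed size and compares it with 8 bits per byte. Note that letters sit in the range 0x41 to 0x7A and so need 7 bits, which costs 9 bits each under this scheme.

```python
def packed_bits(data: bytes) -> int:
    """Total size in bits of data after the 3-bit-prefix encoding."""
    total = 0
    for b in data:
        n = max(b.bit_length(), 1)           # 0 and 1 are both 1-bit values
        total += 3 + (1 if n == 1 else n - 1)
    return total


for sample in (b"hello world", b"0123456789", b"\t\n\r"):
    print(sample, "raw:", 8 * len(sample), "packed:", packed_bits(sample))
```

For these samples the letters grow (88 bits raw, 98 packed), the digits break even (80 and 80), and the control characters shrink (24 raw, 18 packed). So plain English text would get slightly longer, while data dominated by small values would do well.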
Also, many compression algorithms go in stages: first rearrange the data, then apply Huffman compression. JPEG, MPEG-1 and MPEG-2 all do this; the final bit stream has several layers of compression. It's possible that rearranging some streams using our algorithm (which will probably make the stream longer) will actually aid compression with another algorithm like Huffman.
For instance, if all our bytes used the full 8 bits (top bit set), our output stream would be 10/8ths of the original length, i.e. 25% longer, but it now has some structure which might appeal to another compressor.
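One way to test the idea is to pack the encoded bits into bytes and hand both versions to an off-the-shelf compressor. The sketch below uses Python's `zlib` (DEFLATE, which is LZ77 followed by Huffman coding) as a stand-in for the second stage; that choice, the zero padding at the end, and the names `encode_bits`, `pack` and `compare` are all my own assumptions. I make no claim about which way the numbers come out; it will depend heavily on the input.

```python
import zlib


def encode_bits(data: bytes) -> str:
    """3-bit length prefix encoding, returned as a string of '0'/'1'."""
    out = []
    for b in data:
        n = max(b.bit_length(), 1)
        out.append(format(n - 1, "03b"))
        if n == 1:
            out.append(str(b))
        else:
            out.append(format(b - (1 << (n - 1)), "0{}b".format(n - 1)))
    return "".join(out)


def pack(bits: str) -> bytes:
    """Pack a bit string into bytes, padding the tail with zeros (an assumption)."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def compare(data: bytes):
    """Return (size of zlib on raw data, size of zlib on re-encoded data)."""
    direct = len(zlib.compress(data, 9))
    staged = len(zlib.compress(pack(encode_bits(data)), 9))
    return direct, staged


sample = bytes(range(128, 256)) * 8   # every byte has its top bit set
print(len(sample), compare(sample))
```

Swap in your own data for `sample` to see whether the extra stage helps or hurts.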
Jim