Quote:I know this code is already very optimized as It takes O(log n) time.
Indeed, but you take the 1 bit at the time approach. Your code runtime depend on position of leftmost bit set to 1.
Bitwise approach can lead to much more optimized code, because you can enable parallel operations which make an O(n) depending on size of longest run of 1's, no matter its position .
Tmp >>= 1;
n &= Tmp;
To get this algorithm, you need to master Boolean algebra and bitwise operations on processors.
If you want to follow operations, I recommend to add prints of binary values of variable.
This technique can be assimilated to SIMD instructions of the poor.
Single instruction, multiple data - Wikipedia