Count Zero Bytes and NonZero Bytes in a Bytestring (varbinary)

Hi all, I’ve been trying to track down a way to solve counting matching bytes in a bytestring, wondering if anyone has done this or has ideas:

I have bytestrings (varbinary datatype) that look like this:


I’m trying to count how many bytes (every 2 characters) are equal to ‘00’ and how many are equal to anything else. Note: This need to enforce that the ‘00’ is a byte, not any two 0s next to each other (i.e. 00aa = 1, a00a = 0)

Appreciate any help!

saw this on Dune too so I decided to take a stab at it.
got it to work by casting to VARCHAR and changing the zeros to capital O’s

Thanks, i’m trying to see if I can just do it in a one-line formula versus needing to cross join / unnest. Imagine running this on millions of rows.

In postgres, I was able to cast as escape, in spark I was able to unhex as US-ASCII. Maybe there’s some user-generated function that could work.

I ~think I found a solution, which is combination of things suggested in Dune as well.

Zero bytes = (length(from_utf8(data)) - length(replace(from_utf8(data), chr(0), '')))

However, (what threw me off before) the (length(from_utf8(data)) seems to be meaningless here (we lose bytes), so:

Nonzero bytes = bytearray_length(data) - [(length(from_utf8(data)) - length(replace(from_utf8(data), chr(0), '')))]