SQL Server Varbinary/Varchar Casting

Hi Everyone,

We have two columns:

  1. Varbinary (SQL Server Uniqueidentifier)
  2. Varchar

Is it possible in Dremio to cast one column data type into the other (both directions). If so, how do you do it without changing the content but just the data type? Since SQL does not support implicit casting for these two data types (fair enough) how do you explicitly do it? We gave it a try using the convert_from/to functions without success.

In the image below you can see some example values. We would like to be able to represent the same sequence of characters using varbinary or varchar.

Here’s an example of a SQL Server uniqueidentifier value: 5740191A-363C-4FFA-942B-A547131AFDFC

Casting

1 Like

Does anyone has a resolution to this issue?
We pull JSON data from Azure Data Lake containing unqueidentifiers. Dremio pulls these values as varchar data types. We want to join this data to data we pull from SQL Server (where uniqueidentifiers are pulled as varbinary).

How can we join these two data sources?

@akikax,

Something like convert_from(column_name,‘utf8’,‘x’) column_name should work. Did you get an error when you tried convert_from?

I don’t get an error, but I also don’t get a value that will match across sources.

I either need to be able to convert a UniqueIdentifier contained in a VARCHAR field to a BINARY value that will match what Dremio is creating for SQL Server UniqueIdentifier columns, or I need a way to convert the value Dremio returns from a SQL UniqueIdentifier column into a VARCHAR that matches the original UniqueIdentifier string (xxxxxxxx-xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).

Unless there is a way to modify how Dremio is interpreting a SQL UniqueIdentifier column. I could use a view on the SQL side to cast the UID to a VARCHAR and then pull the data from the view, but then any predicates pushed to SQL by Dremio would not be able to utilize any indexing on the UniqueIdentifier column.

Is there any solution for this? im using Micosoft SQL server and I cant use the VARBINARY produced from the uniqueidentifier to make basic queries.

From experimenting with different functions, for the uniqueidentifier ‘A831071E-E183-44D5-B597-0A87F2E3F3F8’ from the sql server dremio get the VARBINARY ‘HgcxqIPh1US1lwqH8uPz+A==’ or this is what i can see in the web ui.

If I use the to_hex function the resulting value is ‘1E0731A883E1D544B5970A87F2E3F3F8’
well if you observe closely it seems that its every character from the original uniqueidentifier expept sufled and with the ‘-’ character missing.

We solve this issue by implementing our own StringFunction and recompile the Dremio code with it.
We simply rearrange the characters in the string and add ‘-’ where is needed.
File: StringFunctions.java

@FunctionTemplate(name = "to_uuid", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
  public static class ToUUID implements SimpleFunction {
    @Param  VarBinaryHolder in;
    @Output VarCharHolder   out;
    @Workspace Charset charset;
    @Inject ArrowBuf buffer;

    @Override
    public void setup() {
      charset = java.nio.charset.Charset.forName("UTF-8");
    }

    @Override
    public void eval() {
      byte[] buf = com.dremio.common.util.DremioStringUtils.toBinaryStringNoFormat(in.buffer.asNettyBuffer(), in
        .start, in.end).getBytes(charset);

      java.nio.ByteBuffer byteBuffer = java.nio.ByteBuffer.wrap(buf);
      Long high = byteBuffer.getLong();
      Long low = byteBuffer.getLong();

      String uuidOriginal = new String(buf);
      String uuid = new String();

      uuid = uuidOriginal.substring(6, 8) + uuidOriginal.substring(4, 6) + uuidOriginal.substring(2, 4) + uuidOriginal.substring(0, 2) + "-" +
        uuidOriginal.substring(10, 12) + uuidOriginal.substring(8, 10) + "-" +
        uuidOriginal.substring(14, 16) + uuidOriginal.substring(12, 14) + "-" +
        uuidOriginal.substring(16, 20) + "-" + uuidOriginal.substring(20, 32);

      out.buffer = buffer = buffer.reallocIfNeeded(uuid.getBytes().length);
      buffer.setBytes(0, uuid.getBytes());
      buffer.setIndex(0, uuid.getBytes().length);

      out.start = 0;
      out.end = uuid.getBytes().length;
      out.buffer = buffer;
    }
  }
1 Like

Thank you for your response. I don’t have the option to add the function at this time, but your answer led me to the Dremio SQL version of it. Here it is if anyone else needs it. I’m sure it’s very inefficient, but it does the job. My opinion something like this function should be included, or an option to read them like this from the source. Our users are used to seeing the UUID’s in this format, and will be very confused if they don’t look like this.

CONCAT(substring(TO_HEX(uuid_column),7, 2), substring(TO_HEX(uuid_column),5, 2), substring(TO_HEX(uuid_column),3, 2), substring(TO_HEX(uuid_column),1, 2), '-',
    substring(TO_HEX(uuid_column),11, 2), substring(TO_HEX(uuid_column),9, 2), '-',
    substring(TO_HEX(uuid_column),15, 2), substring(TO_HEX(uuid_column),13, 2), '-',
    substring(TO_HEX(uuid_column),17, 4), '-', substring(TO_HEX(uuid_column),21, 12)) fixed