REGEXP_REPLACE substring replacement syntax

Having trouble with a simple substring replacement.

The documentation states to reference PostgreSQL and/or Oracle for syntax.

So I am following the normal easy stuff
http://www.postgresqltutorial.com/regexp_replace/

SELECT data, regexp_replace(data, '.*"xyz":"(.*)","abc".*', '\1') AS new_data_field
FROM (
  SELECT data FROM "production-data-analysis".dw_data WHERE data_type_id=9
) nested_0

This returns 1 in the field. If you swap it out with \2, it will return 2.

It should return the value found between the ( ) into the new_data_field, as it successfully matches on that pattern against that fields values.

Any thoughts? This should be simple REGEX stuff.

I think we follow the conventions of Java regex. There happens to be an example on our home page (I’m terrible at regex but I think this is correct):

Here’s another:

image

Thank You for your reply.

That example does not match the behavior I am looking for. That is for a static replacement with a fixed value ‘XXX-XX’ when a field matches a REGEX pattern. Dremio performs this in the UI and with a hand coded use of REGEXP_REPLACE with ease.

However, Dremio is not recognizing substitution with the standards found in JAVA, PostgreSQL or Oracle for the REGEXP_REPLACE function where any number of matches included between parenthesis (.*) or (matchme I am a literal) are not populating in the ouput of the function into \1 or # of matched value reference.

Yes, the literal replacement with the match works just fine. Which is what the example above provides.

Maybe @steven can help with this?

In Java regex, you refer to matched groups with $<group> instead of \<group>

In your example, you’d want to do this:
regexp_replace(data, '.*"xyz":"(.*)","abc".*', '$1')

You can mess with java regex here: https://www.regexplanet.com/advanced/java/index.html
and read the documentation here: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

2 Likes

Thank You.
That works great!

What would be helpful to change in our documentation? Any suggestions much appreciated.

The description in docs.dremio.com is correct

However, the function pane of the Dremio UI doesn’t specify that it’s java regex:

I don’t know if dealercrm saw either definition or just went with familiar regex syntax (it’s slightly different in practically every system), but it’d be nice if both places indicated that it’s java regex.

I saw both. Just not the daily JAVA engineer, so I went with what is used in most \1 not $1

To get me even more confused, I also referenced JAVA docs which stated groups and back references to be \1 instead of $1
https://docs.oracle.com/javase/tutorial/essential/regex/groups.html

Then read desidero’s linked docs https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
, which also states the same.

However went with his expert suggestion and $1 … $9 works

Thank You

Ha, I guess you’re right. That’s pretty funny.

1 Like

One of the great things about standards is that there are so many of them!