# Arithmetic on gensub substitution in gawk

I wonder whether the following is possible:

``````echo -e "0@1 1@1 0@0\n0@0 1@1 0@1" | awk '{print gensub(/([01])@([01])/, "\\1" + "\\2", "g")}'
``````

It doesn't work the way it is; is that because the evaluation of "+" happens before the substitutions of "\1" and "\2"?

As output, I would expect 1, the result of arithmetic on \1 and \2, so for \1=0 and \2=1, the output should be 1.

Also, as per answer below, I am not looking for a solution on how to add 1 and 0 in "1@0"; this is just an example, I just wondered whether it is possible to do arithmetic on \1 and \2, since this works: `gensub(/blah blah/, 0 + 1, "g")` gives `1`.

-
better to advise what is the expected output. Is it `0+1` as a string, is it 0+1=1 as a number....? – George Vasiliou
It will be good if you could show us the output which you are looking for, please add the expected output in code tags in your post. – RavinderSingh13

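You can't use `gensub()` for this: the back-references `\\1` and `\\2` are only expanded inside `gensub()`, as literal text, so no arithmetic can be applied to the captured groups in the replacement argument.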

For such a trivial requirement, use `@` as the field separator and do the arithmetic directly:

``````echo "0@1" | awk -F@ '{print ($1 + $2)}'
``````

Or, if you are worried about non-numeric values in the input, force the numeric conversion with `int()`, or just add `+0` to each operand, i.e. use `(int($1) + int($2))` or `(($1+0) + ($2+0))`.
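As a small sketch of the two coercion forms just mentioned (the `abc@1` input is a made-up example, not from the question), a field that does not look numeric simply counts as 0:

```shell
# Adding 0 forces each field into numeric context.
echo "0@1" | awk -F@ '{ print ($1 + 0) + ($2 + 0) }'    # prints 1

# int() does the same and also truncates; "abc" converts to 0.
echo "abc@1" | awk -F@ '{ print int($1) + int($2) }'    # prints 1
```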

As per the updated question and the comments on the answer below: constant numeric arithmetic is not what `gensub()` is intended for. It is meant for regexp-based search and replacement, where the replacement in most cases takes the captured groups from the matched text and modifies them.

-

I think I understand what you want, and you can do it in Perl using the `e` modifier on a substitution which means it evaluates the replacement. Here's an example:

``````echo "7@302" | perl -nle 's/(\d+)@(\d+)/$1+$2/e && print'
309
``````

Or, slightly more fun:

``````echo "The 200@109 cats sat on the 7@302 mats" | perl -nle 's/(\d+)@(\d+)/$1+$2/ge && print'
The 309 cats sat on the 309 mats
``````
-
I don't know any `perl`, but I thought anything `perl` can do, `awk` can do as well? No? – A. Blizzard
Nice! Can't up vote no privilege. – A. Blizzard
I may get shot down in flames for this, but my perceived order of capability going from lowest to highest is: `sed`, `awk`, `Perl`. – Mark Setchell
@A.Blizzard Mark is correct. `awk` is a language for manipulating text that is all. `perl` can also manipulate text but additionally it can do the things you use other tools/languages for, e.g. manipulating files and processes like you'd use a shell for. The result is that awk is a very small, simple tool/language that does one thing and does it well while perl is something quite different (see zoitz.com/archives/13 :-) ). – Ed Morton

When you write `foo(bar())`, `bar()` is evaluated first, and the same holds for any expression used as an argument. So `gensub(..., "\\1" + "\\2", ...)` first adds the two strings, which yields `0` because neither string looks numeric, and only then calls `gensub()` with that result, i.e. `gensub(..., 0, ...)`.

This isn't semantically identical to the code you wrote, but the way to do what you want is to use the 3rd argument to `match()`:

``````$ echo "0@1" | awk 'match($0,/([01])@([01])/,a){print a[1] + a[2]}'
1
``````

The above uses GNU awk for that 3rd arg to `match()` but you were already using that for `gensub()` anyway. If it's not clear how to use that on your real data then post a followup question that includes an example of your real data.

-
Thanks! This looks like it will get pretty cumbersome for more than one set of "0@1" in `$0`. The above `Perl` version with "ge" looks very neat though. – A. Blizzard
You're welcome. Why would you think it'd get cumbersome though? – Ed Morton
I've never really used `match`, I know `awk` only superficially for basic things; say it was "0@1 1@1 0@0", wouldn't I need to write at least a loop over `a`? – A. Blizzard
`$ echo "0@1 1@1 0@0" | awk -v RS='\\s+' 'match($0,/([01])@([01])/,a){print a[1] + a[2]}'` outputs `1 2 0`. Is that your desired output? If you want to know how to do whatever it is you're trying to do in awk and/or perl then take a few mins to figure out some truly representative sample input and expected output representing your real data and post a question using that so we can best help you. Going one brief snippet of isolated text at a time isn't useful. – Ed Morton
Yes, I really should not have been lazy and put up a proper example. I have modified the sample code in the question. The numbers should replace the "\d@\d" strings. – A. Blizzard
You didn't provide the exact expected output though, and you've already accepted an answer, so the number of people who'll look at your question now is limited, which is why I suggested posting a new question. This time around, provide sample input as a file and the associated output. Also consider whether your input is really always just space-separated number-@-number pairs: what if those were embedded in text that included email addresses like `bill0@1way.com`? Also, is it really just 1s and 0s? Just come up with realistic input and the associated output. – Ed Morton
Mark Setchell's answer (the second line with "ge" option) really does the exact job! I won't post another question as I am sure there are other better problems to look at. It was just something that occurred to me yesterday late night as I was converting genotype data of the form "0|1", "1|0", "0|0", "1|1" into counts 1, 1, 0, 2, I always used either condition statements or three `gsub` statements in `awk`. As I was trying to read up more on `awk` I thought the `gensub` could be a neat solution; although, I am sort of aware that it probably would have been slower than the three `gsub`s. – A. Blizzard
The files are about 12 million rows and 250 columns, the above `perl` solution ploughed through it fairly fast, and I am happy with it. I modified the separators etc. – A. Blizzard
That's fine, if you ever do want to see how to do the job in awk, just post a question. – Ed Morton