Replace commas inside double quotation marks using Python regex

Happy New Year!
May 2017 bring all readers lots of happiness and smiles.

The first topic of the year is regular expressions. ;-)
I want to remove commas from a CSV file, but only the ones inside double quotation marks.
Maybe a regular expression can do it. But how?
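Before reaching for a regex, it may be worth noting that Python's standard csv module already understands quoted fields. A minimal sketch, assuming the whole file fits in memory as a string: parse it with csv.reader, delete the commas from every field, and write it back with csv.writer. `strip_commas_csv` is a hypothetical helper name, not something from this post.

```python
import csv
import io


def strip_commas_csv(text):
    """Hypothetical helper: drop commas that sit inside CSV fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # csv.reader has already split on the field separators and removed the
        # surrounding quotes, so any comma left in a field was inside quotes.
        writer.writerow([field.replace(",", "") for field in row])
    return out.getvalue()


sample = '1,2,"hoge,hage","foo,bar,3",4,5\n"a",1,2,"b"\n'
print(strip_commas_csv(sample), end="")
# 1,2,hogehage,foobar3,4,5
# a,1,2,b
```

The double quotes disappear in the output because csv.writer's default quoting (csv.QUOTE_MINIMAL) only quotes fields that still contain special characters.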
I tried writing some code and found a way to do it with a recursive function.
The code is as follows.

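import re

# A double-quoted run of letters, digits and commas containing at least one comma.
# Note: it can also match between two quoted fields, e.g. '",1,2,"' in '"a",1,2,"b"'.
PAT = re.compile(r'"([a-zA-Z0-9,]+),([a-zA-Z0-9,]+)"')

def parser(string):
    # Base case: no quoted field with a comma is left.
    if not PAT.search(string):
        return string
    # Each pass drops one comma from every matching field; recurse for the rest.
    string = PAT.sub(r'"\1\2"', string)
    return parser(string)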
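Each call removes only one comma per quoted field, so the recursion goes roughly as deep as the number of commas in the longest quoted field, and CPython's default recursion limit is 1000. A minimal loop-based sketch of the same idea uses re.subn, which returns the new string together with the number of substitutions made. `parser_iterative` is a hypothetical name, not part of the original code.

```python
import re

# Same pattern as parser(): a quoted run of letters, digits and commas
# that contains at least one comma.
PAT = re.compile(r'"([a-zA-Z0-9,]+),([a-zA-Z0-9,]+)"')


def parser_iterative(string):
    """Hypothetical loop-based variant of parser() with the same behaviour."""
    while True:
        # re.subn returns (new_string, number_of_substitutions).
        string, n = PAT.subn(r'"\1\2"', string)
        if n == 0:  # nothing left to merge
            return string


print(parser_iterative('1,2,"hoge,hage","foo,bar,3",4,5'))
# 1,2,"hogehage","foobar3",4,5
```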

OK, let’s test it!
I made some sample strings.

s1 = 'test,1,2'
s2 = 'test,1,2,"3,4"'
s3 = '"1,2,hoge",3,4,5'
s4 = '1,2,"hoge,hage","foo,bar,3",4,5'
print(parser(s1))
print(parser(s2))
print(parser(s3))
print(parser(s4))

>out
test,1,2
test,1,2,"34"
"12hoge",3,4,5
1,2,"hogehage","foobar3",4,5

It works for these samples! ;-)
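The pattern in parser() has two limits worth knowing. It only matches fields made of letters, digits and commas, so '"New York, NY"' is left untouched because of the spaces. It can also match from the closing quote of one field to the opening quote of the next, turning '"a",1,2,"b"' into '"a",12,"b"'. A sketch of a single-pass alternative, assuming the quotes are balanced on each line: match each whole quoted field with "[^"]*" and let re.sub call a function that deletes the commas inside that match only. `strip_commas_in_quotes` is a hypothetical helper name.

```python
import re

# One quoted field: an opening quote, any non-quote characters, a closing quote.
QUOTED = re.compile(r'"[^"]*"')


def strip_commas_in_quotes(line):
    """Hypothetical helper: remove commas inside double-quoted fields in one pass."""
    # re.sub calls the lambda once per quoted field; only that text is edited.
    return QUOTED.sub(lambda m: m.group(0).replace(",", ""), line)


print(strip_commas_in_quotes('1,2,"hoge,hage","foo,bar,3",4,5'))
# 1,2,"hogehage","foobar3",4,5
print(strip_commas_in_quotes('"a",1,2,"b"'))
# "a",1,2,"b"
```

Because re.sub resumes scanning after the closing quote of each match, the next quote it finds is an opening quote, so the text between two quoted fields is never treated as a field.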


Published by iwatobipen

I'm a medicinal chemist at a mid-size pharmaceutical company. I love chemoinformatics, coding, organic synthesis, and my family.
