Happy new year!
May the year of 2017 bring all readers a lot of happiness and smiles.
The first topic of the year is regular expressions. ;-)
I want to remove commas from a CSV file, but only the ones inside double quotation marks.
Maybe I can do it with a regular expression. But how?
I tried writing some code and found a way to do it.
I used a recursive function.
The code is as follows.
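import re

def parser(string):
    # Match a double-quoted field that still contains at least one comma.
    pat = re.compile(r'"([a-zA-Z0-9,]+),([a-zA-Z0-9,]+)"')
    # Base case: no quoted field contains a comma any more.
    if len(pat.findall(string)) == 0:
        return string
    # Remove one comma from each matching quoted field, then recurse.
    string = pat.sub(r'"\1\2"', string)
    return parser(string)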
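To see why the substitution has to be repeated, here is a small sketch that records the string after each pass. `trace` is a hypothetical helper that is not part of the code above, it uses a loop instead of recursion, and it assumes the character class is meant to cover `a-z`.

```python
import re

# Same idea as the pattern above; the a-z range is an assumption.
pat = re.compile(r'"([a-zA-Z0-9,]+),([a-zA-Z0-9,]+)"')

def trace(string):
    """Return every intermediate string produced while commas are removed."""
    steps = [string]
    # Each pass removes one comma per matching quoted field.
    while pat.search(string):
        string = pat.sub(r'"\1\2"', string)
        steps.append(string)
    return steps

for step in trace('"1,2,hoge",3,4,5'):
    print(step)
# "1,2,hoge",3,4,5
# "1,2hoge",3,4,5
# "12hoge",3,4,5
```

A single `sub` call only drops one comma from a quoted field, because the first group is greedy and swallows the earlier commas. That is why the function has to call itself until nothing matches.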
OK, let’s test it!
I made some sample strings.
s1 = 'test,1,2'
s2 = 'test,1,2,"3,4"'
s3 = '"1,2,hoge",3,4,5'
s4 = '1,2,"hoge,hage","foo,bar,3",4,5'
print(parser(s1))
print(parser(s2))
print(parser(s3))
print(parser(s4))
>out
test,1,2
test,1,2,"34"
"12hoge",3,4,5
1,2,"hogehage","foobar3",4,5
It worked fine! ;-)
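As a side note, the same result might also be reached without recursion by passing a replacement function to `re.sub`. This is only a sketch of an alternative, not the approach used above: `strip_quoted_commas` and `QUOTED_FIELD` are names made up for illustration, and it assumes the double quotes in each line are balanced.

```python
import re

# One double-quoted field: opening quote, any non-quote characters, closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')

def strip_quoted_commas(line):
    """Remove commas inside double-quoted fields; leave the rest untouched."""
    # The lambda receives each quoted field and returns it without commas.
    return QUOTED_FIELD.sub(lambda m: m.group(0).replace(',', ''), line)

print(strip_quoted_commas('1,2,"hoge,hage","foo,bar,3",4,5'))
# 1,2,"hogehage","foobar3",4,5
```

Because the pattern accepts any character except a quote, this version is not limited to letters and digits inside the quotes. It also only looks inside one pair of quotes at a time, so unquoted fields sitting between two quoted ones, as in `'"a",b,c,"d"'`, are left alone.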