Python questions

replace() returns a new string; it does not modify the original string in place.
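
A minimal sketch of that point, using a made-up string: the result of `str.replace()` has to be assigned back to a name, otherwise it is discarded.

```python
s = "hello world"

# The return value is thrown away here, so s is untouched.
s.replace("world", "Python")
unchanged = s

# Rebinding the name keeps the new string.
s = s.replace("world", "Python")

print(unchanged)
print(s)
```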

@afre Consider your problem solved. Thanks to @Ofnuts

Try this. I added stop words from NLTK and it does something funky. Might have something to do with the mix of " and ' quotes.

Output

Mminmelting, minI gooremembering Python.
Pyth3 woulnice. Noreal pythplease.

Input

init_lines_of_text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines_of_text=init_lines_of_text.splitlines()
phrases=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
n=int(0)

for phrase in phrases:
    t=phrases[n]
    if t[-1]!=' ':
        phrases[n]=t+' '
    n+=1

for line in lines_of_text:
    for n in range(len(phrases)):
        line=line.replace(phrases[n],'')
    print(line)

With all of those phrases, that sounds like a hard problem.

Edit:

Found out the solution.

I think sorting the phrases by the length of the string would solve the problem.

Err, I tested it with QPython 3L on Android. Unfortunately, my theory did not hold up.

Here’s the code:

init_lines_of_text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines_of_text=init_lines_of_text.splitlines()
phrases=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
phrases=sorted(phrases,key=len)
phrases=phrases[::-1]
print(phrases)

n=int(0)

for phrase in phrases:
    t=phrases[n]
    if t[-1]!=' ':
        phrases[n]=t+' '
    n+=1

for line in lines_of_text:
    for n in range(len(phrases)):
        line=line.replace(phrases[n],'')
    print(line)
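
A small sketch of why the output gets mangled regardless of sort order: `str.replace()` matches substrings anywhere, including inside longer words, so adding a trailing space to each phrase only anchors the right-hand side. The three phrases below come from the stop list ('y', 'd', 'is' with the space appended); the `steps` list is only there for illustration.

```python
line = "My mind is melting"
steps = []

# Each phrase is a stop word plus a trailing space, as in the loop above.
for phrase in ("y ", "d ", "is "):
    line = line.replace(phrase, "")  # also hits the tail of "My" and "mind"
    steps.append(line)

for step in steps:
    print(step)
```

The last step reproduces the start of the broken output, which suggests the fix needs a check on the left side of the word as well (splitting into words, or a regex word boundary).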

My uninformed solution is as follows:

stop = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

text = '''My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.'''

l = '\\n'.join([line for line in text.splitlines()])
l = ' '.join([word for word in l.split() if word not in stop])
l = l.replace('\\n','\n')

print(l)
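
The version above compares words case-sensitively, so "My", "I" and "Not" survive. A possible variant, as a sketch only: it works line by line, lowercases each word, and strips surrounding punctuation before the lookup. The stop list is abbreviated here, and `drop_stop_words` is a hypothetical helper name.

```python
import string

# Abbreviated stop list for illustration; the full list above works the same way.
stop = {"i", "my", "you", "am", "is", "be", "a", "at", "not"}

text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

def drop_stop_words(text, stop):
    kept_lines = []
    for line in text.splitlines():
        # Compare a lowercased, punctuation-stripped copy; keep the original token.
        kept = [
            word for word in line.split()
            if word.strip(string.punctuation).lower() not in stop
        ]
        kept_lines.append(" ".join(kept))
    return "\n".join(kept_lines)

print(drop_stop_words(text, stop))
```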

regex

#! /bin/env python3 

import re

text = """My mind is melting, mind you I am not good at remembering Python. Python 3 would be nice. Not a real python please."""

# Can likely be simplified... and maybe some of these can be replaced by regexes as well
words=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

# Sort the words longest first (technically the order is by "contains()",
# but sorting on length also ensures this and is faster)
# words.sort(key=len,reverse=True) # start with longest first
total=0
for word in sorted(words,key=len,reverse=True):
    # bracket the word between word boundaries markers
    # use re.sub() instead of str::replace because we can use IGNORECASE
    # while we are at it, use subn() instead of sub for statistics
    text,count=re.subn(r'\b'+word+r'\b','',text,flags=re.IGNORECASE)
    total+=count
text=re.sub('  +',' ',text) # cleanup (2 or more spaces to a single)
print(text)
print("---")
print(f'{total:3d} replacements made')

yields:

mind melting, mind good remembering Python. Python 3 would nice. real python please.

Ha ha, this self-talk is great:

My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.

turned into

Mminmelting, minI gooremembering Python.
Pyth3 woulnice. Noreal pythplease.

has become

mind melting, mind good remembering Python. Python 3 would nice. real python please.

Still not perfect as the newline disappeared and there is a space at the beginning.

Try this, afre:

import re

text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

# Can likely be simplified... and maybe some of these can be replaced by regexes as well
words=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]


total=0

lines=text.splitlines()


for line in lines:
    # Sort the words longest first (technically the order is by "contains()",
    # but sorting on length also ensures this and is faster)
    # words.sort(key=len,reverse=True) # start with longest first
    for word in sorted(words,key=len,reverse=True):
        # bracket the word between word boundaries markers
        # use re.sub() instead of str::replace because we can use IGNORECASE
        # while we are at it, use subn() instead of sub for statistics
        line,count=re.subn(r'\b'+word+r'\b','',line,flags=re.IGNORECASE)
        total+=count
    line=re.sub('  +',' ',line) # cleanup (2 or more spaces to a single)
    line=line.strip()
    print(line)
print("---")
print(f'{total:3d} replacements made')

Output:

mind melting, mind good remembering Python.
Python 3 would nice. real python please.
---
 10 replacements made
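
The per-word loop can probably be collapsed into one compiled pattern. This is a sketch under assumptions: the stop list is abbreviated, `strip_stop_words` is a hypothetical name, and `re.escape()` is added as a precaution for entries containing characters such as the apostrophe.

```python
import re

# Abbreviated stop list for illustration.
stop_words = ["my", "is", "you", "i", "am", "not", "at", "be", "a"]

# Longest alternatives first, all wrapped in word-boundary markers.
alternatives = "|".join(
    re.escape(w) for w in sorted(stop_words, key=len, reverse=True)
)
pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def strip_stop_words(line):
    line, count = pattern.subn("", line)
    line = re.sub(r" {2,}", " ", line).strip()  # collapse leftover spaces
    return line, count

print(strip_stop_words(
    "My mind is melting, mind you I am not good at remembering Python."
))
```

One pass per line instead of one pass per stop word should scale better with a long list, though that has not been measured here.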

If you need to capitalize, use this:

print(line.capitalize())

Output:

Mind melting, mind good remembering python.
Python 3 would nice. real python please.
---
 10 replacements made

Edit: If you need capitalization after ". ", that can be done too.
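
One way that might look, as a sketch: uppercase a lowercase letter at the start of the line or after sentence-ending punctuation followed by whitespace. `capitalize_sentences` is a hypothetical name, and unlike `capitalize()` it leaves the rest of the line alone.

```python
import re

def capitalize_sentences(line):
    # Group 1: start of line, or ., ! or ? followed by whitespace.
    # Group 2: the lowercase letter to uppercase.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        line,
    )

print(capitalize_sentences("mind melting, mind good remembering Python."))
print(capitalize_sentences("Python 3 would nice. real python please."))
```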

Gonna turn this thread into a generic Python help thread. Hope @afre doesn’t mind.

Here’s some code I’m working on:

def change_in_between_brackets(str_inp):
    current_str=[]
    temp_arr=[]
    ind=0
    use_current=bool(True)
    for char in str_inp:
        if use_current:
            if char!='{':
                current_str.append(char)
            else:
                use_current=bool(False)
                temp_arr.append(char)
                ind += 1
        else:
            if char=='{':
                ind += 1
                temp_arr.append(char)
            elif char=='}':
                ind -= 1
                if ind==0:
                    temp_arr.append(char)
                    new_string=""
                    for c in temp_arr:
                        new_string=new_string+c
                    new_string=new_string[1:-1]
                    new_string=new_string.replace('{','(')
                    new_string=new_string.replace('}',')')
                    new_string="{"+new_string+"}"
                    current_str.append(new_string)
                    temp_arr.clear()
                else:
                    temp_arr.append(char)
            else:
                temp_arr.append(char)
    print(''.join(map(str, current_str)))


str_inp="crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}"

change_in_between_brackets(str_inp)

# It is missing some commas, it seems. That's not right.
# crop[$Index] {(($Wtemp)-($Htemp))/2}{0,(($Htemp)+((($Wtemp)-($Htemp))/2)-1}{(($Htemp)-1}

I would like every {} inside the outermost {} to become (), and every comma to be kept.

Never mind, the solution was to add use_current=bool(True) at the end of the if ind==0: block.
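
For reference, a condensed sketch of the function with that fix folded in. It is a rewrite, not the original code: the `use_current` flag is replaced by checking whether the depth counter is zero, which has the same effect as resetting the flag once a group closes, and the result is returned instead of printed.

```python
def change_in_between_brackets(str_inp):
    out = []    # finished pieces of the result
    buf = []    # characters of the outermost {...} group being collected
    depth = 0   # current brace nesting level
    for char in str_inp:
        if depth == 0:
            if char == '{':
                depth = 1
                buf.append(char)
            else:
                out.append(char)  # commas and other text outside braces are kept
        else:
            buf.append(char)
            if char == '{':
                depth += 1
            elif char == '}':
                depth -= 1
                if depth == 0:
                    # Group closed: convert the inner braces, keep the outer pair.
                    inner = ''.join(buf)[1:-1]
                    inner = inner.replace('{', '(').replace('}', ')')
                    out.append('{' + inner + '}')
                    buf.clear()
    return ''.join(out)

str_inp = "crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}"
print(change_in_between_brackets(str_inp))
```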

No, I don’t mind. That is the point of this thread. :wink:

May I submit this somewhat simpler alternative:

#! /bin/env python3

import re

input="crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}"

def replaceInnerBraces(s):
    tokens=re.split('([{}])',s) # Capturing group keeps the {} in the output
    level=0
    output=[]
    for t in tokens:
        if t=='{':
            t='(' if level!=0 else t
            level+=1
        elif t=='}':
            level-=1
            t=')' if level!=0 else t
        output.append(t)
    return ''.join(output)
    
print(input)
print(replaceInnerBraces(input))

Yields:

crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}
crop[$Index] {(($Wtemp)-($Htemp))/2},0,{($Htemp)+((($Wtemp)-($Htemp))/2)-1},{($Htemp)-1}

@Ofnuts I must admit, I’m impressed that you know Regex.

This being said, I haven’t found a non-regex solution to this issue.

Here is some incomplete code; I can’t figure out what to do next, since strings are immutable.

import re

search_string="({"

def remove_unneeded_parenthesis_curly(str_inp):
    n=0
    pos_set=[]
    while n!=-1:
        n=str_inp.find("({", n+1)
        if n!=-1:
            pos_set.append(n)
    pos_set.reverse()


str_inp="X_Kiss_A_Imprimer={$X+{$Valeur_A*{cos({pi/180*$Angle})}}}"
str_inp_2="Nouveau_point_X={$centre_origine_X+{{{$Rayon_1+$Rayon_2}*{cos(pi/180*$theta)}}-{{$Rayon_2+$Position_Stylo}*{cos({{$Rayon_1+$Rayon_2}/$Rayon_2}*{pi/180*$theta})}}}}"

remove_unneeded_parenthesis_curly(str_inp_2)

My goal is to do this:

Case 1) ({something here}) ->  (something here).
Case 2) ({({something}{something})}) -> (({something}{something})).
Case 3) ({something}{something}) -> ({something}{something})

If you noticed, it seems that these two events happen:

  1. ({ would change into (
  2. }) would change into )

And these two events happen only as long as they have corresponding curly brackets.


Someone else came up with a closer solution:

result = re.sub(r'\({([^}]+)}\)', '(\\1)', k)

Here are the test results:

Regex string: ({something here})
        Regex result:   (something here)
        Desired output: (something here)

Regex string: ({({something}{something})})
        Regex result:   ({({something}{something})})
        Desired output: (({something}{something}))

Regex string: ({something}{something})
        Regex result:   ({something}{something})
        Desired output: ({something}{something})

Regex string: ({({something})})
        Regex result:   (({something)})
        Desired output: ((something))
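
The results above can be reproduced with a small harness like the following sketch. The `cases` list and the `simplify` wrapper are names made up for illustration; the pattern and replacement are the ones quoted above.

```python
import re

pattern = re.compile(r'\({([^}]+)}\)')

def simplify(k):
    # Same substitution as the one-liner above.
    return pattern.sub('(\\1)', k)

cases = [
    "({something here})",
    "({({something}{something})})",
    "({something}{something})",
    "({({something})})",
]

for k in cases:
    print("Regex string:", k)
    print("        Regex result:  ", simplify(k))
```

The last case shows the weakness: `[^}]+` stops at the first closing brace it meets, which is not necessarily the one that pairs with the opening brace.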

TBH what I posted was my fourth iteration, I tried three other ways with regexes but they all had problems. The reason is that you have operators like +, -, and / at various levels, and that Python’s regex dialect cannot express balancing groups (unlike the true PCRE, or even .Net regexes).

Not convinced that in your case a ({ is balanced by an opposite }). Looking at your expressions with this:

import re

input="Nouveau_point_X={$centre_origine_X+{{{$Rayon_1+$Rayon_2}*{cos(pi/180*$theta)}}-{{$Rayon_2+$Position_Stylo}*{cos({{$Rayon_1+$Rayon_2}/$Rayon_2}*{pi/180*$theta})}}}}"

#input="{ab{cd}ef}"

def indent(line,level,s):
    print("%02d - %2d %s'%s'" % (line,level,"  "*level,s))

def dumpInnerBraces(s):
    tokens=re.split(r'([{}()])',s) # Capturing group keeps the {}() in the output
    level=0
    line=0
    for t in tokens:
        line+=1
        if t=='':
            line-=1
            continue
        elif t in '{(':
            indent(line,level,t)
            level+=1            
        elif t in '})':
            level-=1
            indent(line,level,t)
        else:
            indent(line,level,t)
            
dumpInnerBraces(input)

Which yields:

01 -  0 'Nouveau_point_X='
02 -  0 '{'
03 -  1   '$centre_origine_X+'
04 -  1   '{'
05 -  2     '{'
06 -  3       '{'
07 -  4         '$Rayon_1+$Rayon_2'
08 -  3       '}'
09 -  3       '*'
10 -  3       '{'
11 -  4         'cos'
12 -  4         '('
13 -  5           'pi/180*$theta'
14 -  4         ')'
15 -  3       '}'
16 -  2     '}'
17 -  2     '-'
18 -  2     '{'
19 -  3       '{'
20 -  4         '$Rayon_2+$Position_Stylo'
21 -  3       '}'
22 -  3       '*'
23 -  3       '{'
24 -  4         'cos'
25 -  4         '('
26 -  5           '{'
27 -  6             '{'
28 -  7               '$Rayon_1+$Rayon_2'
29 -  6             '}'
30 -  6             '/$Rayon_2'
31 -  5           '}'
32 -  5           '*'
33 -  5           '{'
34 -  6             'pi/180*$theta'
35 -  5           '}'
36 -  4         ')'
37 -  3       '}'
38 -  2     '}'
39 -  1   '}'
40 -  0 '}'

You can only simplify the ({ }) when there is only one block of {} inside the () (and nothing else).
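
That criterion can be sketched without regexes by pairing brackets with a stack and rebuilding the string from the characters that are kept, which sidesteps string immutability. This is only a sketch: `strip_redundant_braces` is a hypothetical name, and the input is assumed to have balanced, properly nested brackets.

```python
def strip_redundant_braces(s):
    # Pass 1: record the index of the closing bracket for every opening one.
    match = {}
    stack = []
    for i, ch in enumerate(s):
        if ch in '({':
            stack.append(i)
        elif ch in ')}':
            match[stack.pop()] = i  # assumes balanced input

    # Pass 2: a '(' directly followed by '{' is redundant only when that
    # brace block closes right before the matching ')'.
    drop = set()
    for open_idx, close_idx in match.items():
        inner = open_idx + 1
        if (s[open_idx] == '(' and inner < len(s) and s[inner] == '{'
                and match.get(inner) == close_idx - 1):
            drop.update((inner, close_idx - 1))

    # Strings are immutable, so build a new one from the kept characters.
    return ''.join(ch for i, ch in enumerate(s) if i not in drop)

for case in ("({something here})", "({({something}{something})})",
             "({something}{something})", "({({something})})"):
    print(case, "->", strip_redundant_braces(case))
```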

Some more.

  • Parse the input into nested paren/braces blocks (consume() method)
  • Remove duplicate nested blocks (squash() method)
  • Print the result (__str__() method)

#! /bin/env python3

import re

# Had to add one extra level to check my code. This formula cannot normally be simplified
# if my criterion (single "Block" child) is used

input="Nouveau_point_X=({{$centre_origine_X+({{{$Rayon_1+$Rayon_2}*{cos(pi/180*$theta)}})-{{$Rayon_2+$Position_Stylo}*{cos({{$Rayon_1+$Rayon_2}/$Rayon_2}*{pi/180*$theta})}}}}})"

class Block(object):
    def __init__(self,begin,end):
        self.begin=begin
        self.end=end
        self.children=[]
    
    def consume(self,level,tokens,index):
        ends={'{':'}','(':')'}
        #print("Level: %d, Index: %s, type: '%s'" % (level,index,self.begin))
        while index<len(tokens):
            #print("Consuming token %d" % index)
            t=tokens[index]
            if t==self.end:
                index+=1
                break
            elif t in '{(':
                child=Block(t,ends[t])
                index=child.consume(level+1,tokens,index+1)
                self.children.append(child)
                #print("Continuing from token %d" % index)
            else:
                self.children.append(t)
                index+=1
        #print("Returning from level %d, %d children" % (level,len(self.children)))
        return index

    def dump(self,level):
        indent="  "*level
        for child in self.children:
            if isinstance(child,Block):
                print("%s %s" % (indent,child.begin))
                child.dump(level+1)
                print("%s %s" % (indent,child.end))
            else:
                print("%s %s" % (indent,child))

    def squash(self):
        # Full condition
        #if self.begin=='(' and len(self.children)==1 and isinstance(self.children[0],Block) and self.children[0].begin=='{':
        
        # Simplified conditions (any two blocks)
        if len(self.children)==1 and isinstance(self.children[0],Block):
            self.children=self.children[0].children
        
        for child in self.children:
            if isinstance(child,Block):
                child.squash()
    
    def __str__(self):
        return self.begin+''.join(str(child) for child in self.children)+self.end
            
def tokenize(s):
    # Capturing group keeps the {}() in the output
    return [s for s in re.split(r'([{}()])',s) if len(s) > 0] 

def parse(s):
    tokens=tokenize(s)
    parsed=Block('','')
    parsed.consume(0,tokens,0)    
    return parsed
    
print(input)
parsed=parse(input)
print(str(parsed))
parsed.squash()
print(str(parsed))

Thanks. I haven’t gotten around to testing those, but it’s too late for that. One last thing though: I believe the way to remove unnecessary parentheses is to follow the order of operations and drop the parentheses that are not needed, right? Hell, even individual variables are wrapped in () when that isn’t needed.

Here’s what I did; it’s much closer to being clean. There are some earlier commits of mine as well.

Should be the same. If you find a block in parens that only contains another block in parens then you can squash them. Going further is risky (or you need to rewrite a whole language interpreter).

I agree it carries a lot of risk, and this is someone else’s code in a community project. As for rewriting, that’s not needed. My goal was to eliminate some bad coding practices within the file: years of bad practice in the sense of stylistic errors. I don’t expect to get rid of all of them, but fixing most of them is absolutely feasible.

How about constraining repetition case insensitively, removing extra words and preserving lines? E.g., only let words repeat twice max.

One fish, Two fish, Red fish, Blue fish,
Black fish, Blue fish, Old fish, New fish.
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.
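
One possible reading of that request, as a sketch under several assumptions: words are counted across the whole text, compared ignoring case and surrounding punctuation, and any occurrence past the limit is dropped together with its punctuation. `limit_repeats` and `max_repeats` are made-up names.

```python
import string
from collections import Counter

def limit_repeats(text, max_repeats=2):
    seen = Counter()
    out_lines = []
    for line in text.splitlines():
        kept = []
        for token in line.split():
            key = token.strip(string.punctuation).lower()
            if not key:          # pure punctuation: keep as-is
                kept.append(token)
                continue
            seen[key] += 1
            if seen[key] <= max_repeats:
                kept.append(token)
        out_lines.append(" ".join(kept))  # one output line per input line
    return "\n".join(out_lines)

sample = """One fish, Two fish, Red fish, Blue fish,
Black fish, Blue fish."""
print(limit_repeats(sample))
```

Whether the count should reset per line, and what should happen to the punctuation attached to a dropped word, would need to be decided for the real use case.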