Python questions

Reptorian · March 13, 2022, 3:44am

Gonna turn this thread into a generic Python help thread. Hope @afre doesn’t mind.

Here’s some code I’m working on:

def change_in_between_brackets(str_inp):
    current_str=[]
    temp_arr=[]
    ind=0
    use_current=bool(True)
    for char in str_inp:
        if use_current:
            if char!='{':
                current_str.append(char)
            else:
                use_current=bool(False)
                temp_arr.append(char)
                ind += 1
        else:
            if char=='{':
                ind += 1
                temp_arr.append(char)
            elif char=='}':
                ind -= 1
                if ind==0:
                    temp_arr.append(char)
                    new_string=""
                    for c in temp_arr:
                        new_string=new_string+c
                    new_string=new_string[1:-1]
                    new_string=new_string.replace('{','(')
                    new_string=new_string.replace('}',')')
                    new_string="{"+new_string+"}"
                    current_str.append(new_string)
                    temp_arr.clear()
                else:
                    temp_arr.append(char)
            else:
                temp_arr.append(char)
    print(''.join(map(str, current_str)))


str_inp="crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}"

change_in_between_brackets(str_inp)

# It seems to be missing some commas. That's not right.
# crop[$Index] {(($Wtemp)-($Htemp))/2}{0,(($Htemp)+((($Wtemp)-($Htemp))/2)-1}{(($Htemp)-1}

I would like every {} inside the outermost {} to become (), and every comma to be kept.

Never mind, the solution was to add use_current=bool(True) at the end of the if ind==0: block.
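
For reference, here is a minimal sketch of the function with that fix folded in. It is a restructured variant rather than the original code: the nesting counter doubles as the "inside a group" flag, so there is no separate use_current to forget to reset, and the result is returned instead of printed. The name change_in_between_brackets_fixed is made up for this sketch.

```python
def change_in_between_brackets_fixed(str_inp):
    out = []     # finished pieces of the output
    group = []   # characters of the outermost {...} group being collected
    depth = 0    # current brace nesting; 0 means "outside any group"
    for char in str_inp:
        if depth == 0:
            if char == '{':
                depth = 1
                group.append(char)
            else:
                out.append(char)      # commas and other text pass through
        else:
            group.append(char)
            if char == '{':
                depth += 1
            elif char == '}':
                depth -= 1
                if depth == 0:
                    # Group closed: turn the inner braces into parentheses
                    inner = ''.join(group)[1:-1]
                    inner = inner.replace('{', '(').replace('}', ')')
                    out.append('{' + inner + '}')
                    group.clear()
    return ''.join(out)
```

Calling print(change_in_between_brackets_fixed(str_inp)) with the str_inp above should keep the commas between the groups.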

afre · March 13, 2022, 4:21am

No, I don’t mind. That is the point of this thread.

Ofnuts · March 13, 2022, 10:47am

May I submit this somewhat simpler alternative:

#!/usr/bin/env python3

import re

input="crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}"

def replaceInnerBraces(s):
    tokens=re.split('([{}])',s) # Capturing group keeps the {} in the output
    level=0
    output=[]
    for t in tokens:
        if t=='{':
            t='(' if level!=0 else t
            level+=1
        elif t=='}':
            level-=1
            t=')' if level!=0 else t
        output.append(t)
    return ''.join(output)
    
print(input)
print(replaceInnerBraces(input))

Yields:

crop[$Index] {{{$Wtemp}-{$Htemp}}/2},0,{{$Htemp}+{{{$Wtemp}-{$Htemp}}/2}-1},{{$Htemp}-1}
crop[$Index] {(($Wtemp)-($Htemp))/2},0,{($Htemp)+((($Wtemp)-($Htemp))/2)-1},{($Htemp)-1}

Reptorian · March 14, 2022, 12:54am

@Ofnuts I must admit, I’m impressed that you know Regex.

This being said, I haven’t found a non-regex solution to this issue.

Here is some incomplete code; I can’t figure out what to do next because strings are immutable.

import re

search_string="({"

def remove_unneeded_parenthesis_curly(str_inp):
    n=0
    pos_set=[]
    while n!=-1:
        n=str_inp.find("({", n+1)
        if n!=-1:
            pos_set.append(n)
    pos_set.reverse()


str_inp="X_Kiss_A_Imprimer={$X+{$Valeur_A*{cos({pi/180*$Angle})}}}"
str_inp_2="Nouveau_point_X={$centre_origine_X+{{{$Rayon_1+$Rayon_2}*{cos(pi/180*$theta)}}-{{$Rayon_2+$Position_Stylo}*{cos({{$Rayon_1+$Rayon_2}/$Rayon_2}*{pi/180*$theta})}}}}"

remove_unneeded_parenthesis_curly(str_inp_2)

My goal is to do this:

Case 1) ({something here}) ->  (something here).
Case 2) ({({something}{something})}) -> (({something}{something})).
Case 3) ({something}{something}) -> ({something}{something})

If you look closely, it seems that these two events happen:

({ would change into (
}) would change into )

And these two events happen only as long as they have corresponding curly brackets.

Someone else came up with a closer solution:

result = re.sub(r'\({([^}]+)}\)', '(\\1)', k)

Here are the test results:

Regex string: ({something here})
        Regex result:   (something here)
        Desired output: (something here)

Regex string: ({({something}{something})})
        Regex result:   ({({something}{something})})
        Desired output: (({something}{something}))

Regex string: ({something}{something})
        Regex result:   ({something}{something})
        Desired output: ({something}{something})

Regex string: ({({something})})
        Regex result:   (({something)})
        Desired output: ((something))
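
A non-regex sketch of the cases above, assuming the braces and parentheses in the input are well nested. Immutability is sidestepped by recording which brace positions to drop and building a new string at the end. The function name strip_wrapped_braces is hypothetical.

```python
def strip_wrapped_braces(text):
    stack = []    # indices of '{' that are still open
    drop = set()  # indices of braces to leave out of the result
    for j, char in enumerate(text):
        if char == '{':
            stack.append(j)
        elif char == '}' and stack:
            i = stack.pop()  # index of the '{' matching this '}'
            # Drop the pair only when it is wrapped directly as "({ ... })"
            if i > 0 and text[i - 1] == '(' and text[j + 1:j + 2] == ')':
                drop.update((i, j))
    # Strings are immutable, so assemble a new one without the dropped braces
    return ''.join(c for k, c in enumerate(text) if k not in drop)
```

A pair of braces is removed only if the character right before its { is ( and the character right after its matching } is ), which leaves ({something}{something}) untouched.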

Ofnuts · March 14, 2022, 8:11am

TBH what I posted was my fourth iteration. I tried three other ways with regexes but they all had problems. The reason is that you have operators like +, -, and / at various levels, and that Python’s re module cannot match arbitrarily nested brackets: it has neither recursive patterns (as in PCRE) nor balancing groups (as in .NET regexes).

Ofnuts · March 14, 2022, 10:19am

I’m not convinced that in your case a ({ is always balanced by a matching }). Looking at your expressions with this:

import re

input="Nouveau_point_X={$centre_origine_X+{{{$Rayon_1+$Rayon_2}*{cos(pi/180*$theta)}}-{{$Rayon_2+$Position_Stylo}*{cos({{$Rayon_1+$Rayon_2}/$Rayon_2}*{pi/180*$theta})}}}}"

#input="{ab{cd}ef}"

def indent(line,level,s):
    print("%02d - %2d %s'%s'" % (line,level,"  "*level,s))

def dumpInnerBraces(s):
    tokens=re.split(r'([{}()])',s) # Capturing group keeps the {}() in the output
    level=0
    line=0
    for t in tokens:
        line+=1
        if t=='':
            line-=1
            continue
        elif t in '{(':
            indent(line,level,t)
            level+=1            
        elif t in '})':
            level-=1
            indent(line,level,t)
        else:
            indent(line,level,t)
            
dumpInnerBraces(input)

Which yields:

01 -  0 'Nouveau_point_X='
02 -  0 '{'
03 -  1   '$centre_origine_X+'
04 -  1   '{'
05 -  2     '{'
06 -  3       '{'
07 -  4         '$Rayon_1+$Rayon_2'
08 -  3       '}'
09 -  3       '*'
10 -  3       '{'
11 -  4         'cos'
12 -  4         '('
13 -  5           'pi/180*$theta'
14 -  4         ')'
15 -  3       '}'
16 -  2     '}'
17 -  2     '-'
18 -  2     '{'
19 -  3       '{'
20 -  4         '$Rayon_2+$Position_Stylo'
21 -  3       '}'
22 -  3       '*'
23 -  3       '{'
24 -  4         'cos'
25 -  4         '('
26 -  5           '{'
27 -  6             '{'
28 -  7               '$Rayon_1+$Rayon_2'
29 -  6             '}'
30 -  6             '/$Rayon_2'
31 -  5           '}'
32 -  5           '*'
33 -  5           '{'
34 -  6             'pi/180*$theta'
35 -  5           '}'
36 -  4         ')'
37 -  3       '}'
38 -  2     '}'
39 -  1   '}'
40 -  0 '}'

You can only simplify the ({ }) when there is only one block of {} inside the () (and nothing else).

Ofnuts · March 15, 2022, 8:38am

Some more.

Parse the input into nested paren/braces blocks (consume() method)
Remove duplicate nested blocks (squash() method)
Print the result (__str__() method)

#!/usr/bin/env python3

import re

# Had to add one extra level to check my code. This formula cannot normally be simplified
# if my criterion (single "Block" child) is used

input="Nouveau_point_X=({{$centre_origine_X+({{{$Rayon_1+$Rayon_2}*{cos(pi/180*$theta)}})-{{$Rayon_2+$Position_Stylo}*{cos({{$Rayon_1+$Rayon_2}/$Rayon_2}*{pi/180*$theta})}}}}})"

class Block(object):
    def __init__(self,begin,end):
        self.begin=begin
        self.end=end
        self.children=[]
    
    def consume(self,level,tokens,index):
        ends={'{':'}','(':')'}
        #print("Level: %d, Index: %s, type: '%s'" % (level,index,self.begin))
        while index<len(tokens):
            #print("Consuming token %d" % index)
            t=tokens[index]
            if t==self.end:
                index+=1
                break
            elif t in '{(':
                child=Block(t,ends[t])
                index=child.consume(level+1,tokens,index+1)
                self.children.append(child)
                #print("Continuing from token %d" % index)
            else:
                self.children.append(t)
                index+=1
        #print("Returning from level %d, %d children" % (level,len(self.children)))
        return index

    def dump(self,level):
        indent="  "*level
        for child in self.children:
            if isinstance(child,Block):
                print("%s %s" % (indent,child.begin))
                child.dump(level+1)
                print("%s %s" % (indent,child.end))
            else:
                print("%s %s" % (indent,child))

    def squash(self):
        # Full condition
        #if self.begin=='(' and len(self.children)==1 and isinstance(self.children[0],Block) and self.children[0].begin=='{':
        
        # Simplified conditions (any two blocks)
        if len(self.children)==1 and isinstance(self.children[0],Block):
            self.children=self.children[0].children
        
        for child in self.children:
            if isinstance(child,Block):
                child.squash()
    
    def __str__(self):
        return self.begin+''.join(str(child) for child in self.children)+self.end
            
def tokenize(s):
    # Capturing group keeps the {}() in the output
    return [s for s in re.split(r'([{}()])',s) if len(s) > 0] 

def parse(s):
    tokens=tokenize(s)
    parsed=Block('','')
    parsed.consume(0,tokens,0)    
    return parsed
    
print(input)
parsed=parse(input)
print(str(parsed))
parsed.squash()
print(str(parsed))
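
A compact sketch of the same parse / squash / print idea, using the full condition (a ( block holding exactly one { block) instead of the simplified one. This is an illustration under assumptions, not a drop-in replacement: blocks are plain (opener, children, closer) tuples rather than Block objects, the helper names parse_tokens, squash_tree, render and simplify are made up, and the check is repeated with while so that several redundant layers collapse in one go.

```python
import re

ENDS = {'{': '}', '(': ')'}

def parse_tokens(tokens, pos=0, closer=None):
    """Return (children, next_pos); nested blocks become (opener, children, closer)."""
    children = []
    while pos < len(tokens):
        t = tokens[pos]
        pos += 1
        if t == closer:
            break
        if t in ENDS:
            inner, pos = parse_tokens(tokens, pos, ENDS[t])
            children.append((t, inner, ENDS[t]))
        else:
            children.append(t)
    return children, pos

def squash_tree(children):
    out = []
    for child in children:
        if isinstance(child, tuple):
            opener, inner, closer = child
            inner = squash_tree(inner)
            # Full condition: "(" block whose only child is a "{" block
            while (opener == '(' and len(inner) == 1
                   and isinstance(inner[0], tuple) and inner[0][0] == '{'):
                inner = inner[0][1]
            child = (opener, inner, closer)
        out.append(child)
    return out

def render(children):
    return ''.join(c if isinstance(c, str) else c[0] + render(c[1]) + c[2]
                   for c in children)

def simplify(text):
    # Capturing group keeps the {}() in the token list; drop empty strings
    tokens = [t for t in re.split(r'([{}()])', text) if t]
    tree, _ = parse_tokens(tokens)
    return render(squash_tree(tree))
```

Because only ( blocks absorb a lone { child, braces at the top level and groups such as ({a}{b}) are left alone.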

Reptorian · March 16, 2022, 4:02am

Thanks. I haven’t gotten around to testing those, but it’s too late for that. One last thing though: I believe the way to remove unnecessary parentheses is to follow the order of operations and drop the parentheses that aren’t needed, right? Hell, even individual variables are wrapped in () when that isn’t needed.

Here’s what I did; it’s much closer to being clean. There are some earlier commits of mine as well.

Ofnuts · March 16, 2022, 7:32am

Should be the same. If you find a block in parens that only contains another block in parens then you can squash them. Going further is risky (or you need to rewrite a whole language interpreter).

Reptorian · March 16, 2022, 12:34pm

I agree it carries a lot of risk, and this is someone else’s code in a community project. As for rewriting, that’s not needed. My goal was to eliminate some bad coding practices within the file: years of bad practices, in the sense of stylistic errors. I don’t expect to get rid of all of them, but fixing most of them is absolutely feasible.

afre · March 23, 2022, 12:51am

How about constraining repetition case insensitively, removing extra words and preserving lines? E.g., only let words repeat twice max.

One fish, Two fish, Red fish, Blue fish,
Black fish, Blue fish, Old fish, New fish.
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.

Ofnuts · March 23, 2022, 11:30am

Shooting from the hip:

import re

input=[
"One fish, Two Fish, Red fish, Blue Fish, this is a fishy story",
"Black fish, Blue fish, Old Fish, New fish, fishing fishes out in the sea",
"This one has a littlecar.",
"This one has a little star.",
"Say! What a lot of fish there are.",
"Yes. Some are red, and some are blue.",
"Some are old and some are new.",
"Some are sad, and some are glad,",
"And some are very, very bad.",
]

for l in input:
    count=-1
    replaced=l
    while count!=0:
        replaced,count=re.subn(r'(\b\w+?\b)(.+)(\b\1\b)(.+)(\b\1\b)(.*)',r'\1\2\3\4***\6',replaced,flags=re.IGNORECASE)
    print(replaced)

Yields:

One fish, Two Fish, Red ***, Blue ***, this is a fishy story
Black fish, Blue fish, Old ***, New ***, fishing fishes out in the sea
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.

You can of course omit the ***.
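
To make the pattern easier to read, here is the same expression spelled out with re.VERBOSE and one comment per group. The regex and the replacement are unchanged; only the constant name THIRD_REPEAT and the wrapper function mask_extra are made up for this sketch.

```python
import re

THIRD_REPEAT = re.compile(r'''
    (\b\w+?\b)   # 1: a whole word
    (.+)         # 2: anything up to ...
    (\b\1\b)     # 3: ... the same word again
    (.+)         # 4: anything up to ...
    (\b\1\b)     # 5: ... one more occurrence, the one that gets masked
    (.*)         # 6: the rest of the line
''', re.IGNORECASE | re.VERBOSE)

def mask_extra(line):
    count = -1
    # Each pass masks one surplus occurrence; stop when nothing matches
    while count != 0:
        line, count = THIRD_REPEAT.subn(r'\1\2\3\4***\6', line)
    return line
```

Group 5 is left out of the replacement and *** is written in its place, so each pass removes one occurrence of a word that appears at least three times in the line.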

afre · March 23, 2022, 7:13pm

The *** is a nice addition. How about preserving lines but denying repetition across the input as a whole? I.e., we remove all the fish except for the first two of the first line. I am asking lots about lines because most online help considers single line or individual bag of words interpretation.

Reptorian · March 23, 2022, 8:12pm

You could join them, and apply the re or regex function.

Reptorian · March 24, 2022, 12:16am

@afre, I have a question for you. What is your end-goal with Python? In my case, I use it to automate G’MIC scripting, and it is mostly completed as a goal itself.

afre · March 24, 2022, 12:20am

Just getting a sense of it. That is all. Seems like there are many ways to do the same thing. What got me going was the lack of tools for text in G’MIC.

Reptorian · March 24, 2022, 3:23am

@afre Regex is another language. It’ll take a while to explain. Better yet, try to learn it if you want to do this. I want to learn regex, but I already have too much on my hands.

afre · March 24, 2022, 3:25am

I only know very, very simple regex, usually relying on online sources to solve my problems. The issue, though, is that there are different flavors of regex, which aren’t compatible with each other.

Reptorian · March 24, 2022, 3:30am

Here’s a manual for regex. It doesn’t help that much with deciphering, but anyway, it’s a start. Only practice helps. Lots of it.

afre · March 24, 2022, 4:33am

I did it, using my previous strategy. It didn’t work at first because of a typo.

Input:

import re

text = '''One fish, Two fish, Red fish, Blue fish,
Black fish, Blue fish, Old fish, New fish.
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.
'''

l = '::'.join([line for line in text.splitlines()])

count=-1
replaced=l
while count!=0:
  replaced,count=re.subn(r'(\b\w+?\b)(.+)(\b\1\b)(.+)(\b\1\b)(.*)',r'\1\2\3\4***\6',replaced,flags=re.IGNORECASE)

l = replaced.replace('::','\n')
print(l)

Result:

One fish, Two fish, Red ***, Blue ***,
Black ***, Blue ***, Old ***, New ***.
This one has a littlecar.
This *** has a little star.
Say! What *** lot of *** there are.
Yes. Some are red, and some *** ***.
*** *** old and *** *** new.
*** *** sad, *** *** *** glad,
*** *** *** very, very bad.
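
Since there are many ways to do the same thing, here is a sketch of another one: a single pass that counts words as it goes, so no joining of lines and no backreferences are needed. It assumes a "word" is a run of \w characters and that counting is case-insensitive; the function name limit_repeats and its parameters max_count and mask are made up for this sketch.

```python
import re
from collections import Counter

def limit_repeats(text, max_count=2, mask='***'):
    seen = Counter()  # occurrences so far, keyed by lower-cased word

    def replace(match):
        word = match.group(0)
        seen[word.lower()] += 1
        # Keep the first max_count occurrences, mask the later ones
        return word if seen[word.lower()] <= max_count else mask

    # re.sub calls replace() once per word, left to right; line breaks
    # and punctuation are never matched, so they are preserved
    return re.sub(r'\w+', replace, text)
```

Calling print(limit_repeats(text)) with the text from the post above is one way to try it; passing mask='' drops the surplus words instead of marking them.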