Python questions

afre · March 3, 2022, 8:58pm

Me:

I have lines of text. I want to remove duplicate words per line. Help. Thanks!

My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.

Ofnuts · March 3, 2022, 9:12pm

Example?

Reptorian · March 3, 2022, 9:15pm

I’m gonna try to solve this problem. Is this for G’MIC scripting? I have some python files that can help.

Also, I found your solution via googling since I’m bad at python:

def unique_list(text_str):
    l = text_str.split()
    temp = []
    for x in l:
        if x not in temp:
            temp.append(x)
    return ' '.join(temp)

lines_of_text="""My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines=lines_of_text.splitlines()
for line in lines:
    print(unique_list(line))

This gives:

My mind is melting, you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.

afre · March 3, 2022, 10:03pm

Sure, this could eventually be in G’MIC. How about making it case insensitive? I would use lower() but that would affect the sentence case.

kofa · March 3, 2022, 10:14pm

The simple solution: you could have two arrays, one lower case, the other unchanged, and use pretty much the same algorithm, using the lower-cased version for the search.

Reptorian · March 3, 2022, 10:14pm

Case insensitive solution here.

def unique_list(text_str):
    lower_case=text_str.lower()
    lower_case=lower_case.split()
    l = text_str.split()
    temp = []
    new_lines=[]
    for n in range(len(l)):
        x = lower_case[n]
        t = l[n]
        if x not in temp:
            temp.append(x)
            new_lines.append(t)
    return ' '.join(new_lines)

lines_of_text="""My mind is melting, Mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines=lines_of_text.splitlines()
for line in lines:
    print(unique_list(line))

Output:

My mind is melting, you I am not good at remembering Python.
Python 3 would be nice. Not a real please.

Ofnuts · March 3, 2022, 10:36pm

If you keep two lists, better make temp a set, the performance will be better.
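A minimal sketch of that suggestion, assuming the same case-insensitive behaviour as the function posted above. The names `unique_words`, `seen` and `kept` are illustrative and not from the thread:

```python
def unique_words(line):
    """Drop repeated words from one line, comparing case-insensitively."""
    seen = set()   # lower-cased words already kept; fast membership test
    kept = []      # words in their original casing and order
    for word in line.split():
        key = word.lower()
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return " ".join(kept)


text = """My mind is melting, Mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

for line in text.splitlines():
    print(unique_words(line))
```

The list is still needed to preserve word order; the set only replaces the lookup.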

Just for fun, the nearly one-liner (no support for case):

from functools import reduce

lines_of_text="""My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

print("\n".join(" ".join(reduce(lambda ul,i: ul if i in ul else ul+[i], line.split(),[])) for line in lines_of_text.splitlines()))

ilmioalias · March 3, 2022, 11:36pm

another one liner without support for case:

l = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

print("\n".join([" ".join(dict.fromkeys(row.split())) for row in l.splitlines()]))

afre · March 4, 2022, 1:48am

Thanks everyone and @ilmioalias for becoming a member.

afre · March 4, 2022, 5:15pm

How about removing certain words or phrases? E.g., "mind you", "a", "please".

My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.

Feel free to write an isolated example and then a one-liner to combine this with the previous task.

Reptorian · March 4, 2022, 7:20pm

This doesn’t work, but I think @ofnuts or @ilmioalias can fix it.

init_lines_of_text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines_of_text=init_lines_of_text.splitlines()
phrases=['mind you','a','please']
n=int(0)

for phrase in phrases:
    t=phrases[n]
    if t[-1]!=' ':
        phrases[n]=t+' '
    n+=1
    
for line in lines_of_text:
    for n in range(len(phrases)):
        line=line.replace(phrases[n],'')
    print(line)

Ofnuts · March 5, 2022, 12:47am

replace() returns a new string; it does not modify the original string in place.
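A short illustration of the point, since Python strings are immutable. The variable names are only for demonstration:

```python
line = "Not a real python please."

line.replace("a ", "")         # the returned string is discarded here
unchanged = line               # so `line` still holds the original text

line = line.replace("a ", "")  # rebind the name to the returned string

print(unchanged)
print(line)
```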

Reptorian · March 5, 2022, 1:06am

@afre Consider your problem solved. Thanks to @Ofnuts

afre · March 5, 2022, 5:48am

Try this. I added stop words from NLTK and it does something funky. Might have something to do with the mix of " and ' quotes.

Output

Mminmelting, minI gooremembering Python.
Pyth3 woulnice. Noreal pythplease.

Input

init_lines_of_text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines_of_text=init_lines_of_text.splitlines()
phrases=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
n=int(0)

for phrase in phrases:
    t=phrases[n]
    if t[-1]!=' ':
        phrases[n]=t+' '
    n+=1

for line in lines_of_text:
    for n in range(len(phrases)):
        line=line.replace(phrases[n],'')
    print(line)

Reptorian · March 5, 2022, 5:50am

With all of those phrases, that sounds like a hard problem.

Edit:

I think I found the solution: sorting the phrases by string length should solve the problem.

Reptorian · March 6, 2022, 3:31am

Err, I tested it with QPython 3L on Android. Unfortunately, my theory did not hold up.

Here’s code:

init_lines_of_text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

lines_of_text=init_lines_of_text.splitlines()
phrases=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
phrases=sorted(phrases,key=len)
phrases=phrases[::-1]
print(phrases)

n=int(0)

for phrase in phrases:
    t=phrases[n]
    if t[-1]!=' ':
        phrases[n]=t+' '
    n+=1

for line in lines_of_text:
    for n in range(len(phrases)):
        line=line.replace(phrases[n],'')
    print(line)
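A likely reason sorting does not help, shown as a small sketch: str.replace() matches the phrase anywhere in the string, including at the end of a longer word. The stop word 'y' with the appended space therefore also matches the tail of "My ". The sample string and variable names below are illustrative only:

```python
sample = "My mind is melting"

# 'y ' (stop word 'y' plus the appended space) occurs inside "My ".
step1 = sample.replace("y ", "")
print(step1)

# 'd ' likewise occurs at the end of "mind ".
step2 = step1.replace("d ", "")
print(step2)
```

Matching whole words only, rather than substrings, avoids this regardless of the order of the phrases.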

afre · March 6, 2022, 4:23am

My uninformed solution is as follows:

stop = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

text = '''My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.'''

l = '\\n'.join([line for line in text.splitlines()])
l = ' '.join([word for word in l.split() if word not in stop])
l = l.replace('\\n','\n')

print(l)
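A hedged variant of the same idea that handles each line separately, so no newline placeholder is needed, and compares in lower case so that words such as "My" and "Not" are dropped too. The short `stop` set is a stand-in for the full list, `remove_stop_words` is a made-up name, and stripping punctuation before the lookup is an added assumption:

```python
import string

# Small stand-in for the full stop-word list used above.
stop = {"my", "is", "you", "i", "am", "not", "at", "be", "a"}


def remove_stop_words(line, stop_words):
    kept = []
    for word in line.split():
        # Compare without surrounding punctuation and without regard to case.
        key = word.strip(string.punctuation).lower()
        if key not in stop_words:
            kept.append(word)
    return " ".join(kept)


text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

print("\n".join(remove_stop_words(line, stop) for line in text.splitlines()))
```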

Ofnuts · March 6, 2022, 9:47am

regex

#!/usr/bin/env python3

import re

text = """My mind is melting, mind you I am not good at remembering Python. Python 3 would be nice. Not a real python please."""

# Can likely be simplified... and maybe some of these can be replaced by regexes as well
words=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]

# Sort the words longest first (technically the order should be by "contains()",
# but sorting on length also ensures this and is faster)
# words.sort(key=len,reverse=True) # start with longest first
total=0
for word in sorted(words,key=len,reverse=True):
    # bracket the word between word boundaries markers
    # use re.sub() instead of str.replace() because we can use IGNORECASE
    # while we are at it, use subn() instead of sub for statistics
    text,count=re.subn(r'\b'+word+r'\b','',text,flags=re.IGNORECASE)
    total+=count
text=re.sub('  +',' ',text) # cleanup (2 or more spaces to a single)
print(text)
print("---")
print(f'{total:3d} replacements made')

yields:

mind melting, mind good remembering Python. Python 3 would nice. real python please.
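The per-word loop can probably be collapsed into one compiled pattern. This is a sketch of that alternative rather than the code posted above: it assumes a shortened word list, joins the words with `|`, and passes each through `re.escape` in case a word ever contains a regex metacharacter.

```python
import re

# Shortened stand-in for the full stop-word list above.
words = ["i", "my", "is", "you", "am", "not", "at", "be", "a", "it", "it's"]

# Longest first, so that "it's" is tried before "it".
alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
pattern = re.compile(r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)

text = "My mind is melting, mind you I am not good at remembering Python. Python 3 would be nice. Not a real python please."

cleaned, count = pattern.subn("", text)
cleaned = re.sub(" {2,}", " ", cleaned)  # collapse runs of 2 or more spaces

print(cleaned)
print(f"{count} replacements made")
```

A single leading space is left where the first word was removed; calling `strip()` on the result would take care of it.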

afre · March 6, 2022, 3:31pm

Ha ha, this self-talk is great:

My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please.

turned into

Mminmelting, minI gooremembering Python.
Pyth3 woulnice. Noreal pythplease.

has become

mind melting, mind good remembering Python. Python 3 would nice. real python please.

Still not perfect as the newline disappeared and there is a space at the beginning.

Reptorian · March 6, 2022, 10:24pm

Try this, afre:

import re

text = """My mind is melting, mind you I am not good at remembering Python.
Python 3 would be nice. Not a real python please."""

# Can likely be simplified... and maybe some of these can be replaced by regexes as well
words=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]


total=0

lines=text.splitlines()


for line in lines:
    # Sort the words longest first (technically the order should be by "contains()",
    # but sorting on length also ensures this and is faster)
    # words.sort(key=len,reverse=True) # start with longest first
    for word in sorted(words,key=len,reverse=True):
        # bracket the word between word boundaries markers
        # use re.sub() instead of str.replace() because we can use IGNORECASE
        # while we are at it, use subn() instead of sub for statistics
        line,count=re.subn(r'\b'+word+r'\b','',line,flags=re.IGNORECASE)
        total+=count
    line=re.sub('  +',' ',line) # cleanup (2 or more spaces to a single)
    line=line.strip()
    print(line)
print("---")
print(f'{total:3d} replacements made')

Output:

mind melting, mind good remembering Python.
Python 3 would nice. real python please.
---
 10 replacements made

If you need to capitalize, use this:

print(line.capitalize())

Output:

Mind melting, mind good remembering python.
Python 3 would nice. real python please.
---
 10 replacements made

Edit: If you need capitalization after ". ", that can be done too.
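One possible way to do that, as a sketch only: a regex that upper-cases a lowercase ASCII letter at the start of the line or right after ". ". The function name `capitalize_sentences` is made up for this example. Unlike `capitalize()`, it leaves the rest of the line untouched, so "Python" keeps its capital.

```python
import re


def capitalize_sentences(line):
    """Upper-case the first letter of the line and of each sentence after '. '."""
    # Group 1 is the line start or ". "; group 2 is the lowercase letter after it.
    return re.sub(r"(^|\. )([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(),
                  line)


for line in ["mind melting, mind good remembering Python.",
             "Python 3 would nice. real python please."]:
    print(capitalize_sentences(line))
```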