Python questions

Should be the same. If you find a block in parentheses that contains only another block in parentheses, you can squash them into one. Going further is risky (or you end up rewriting a whole language interpreter).
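A minimal sketch of that squashing rule, assuming balanced parentheses and no parentheses inside string literals or comments. `squash_double_parens` is a hypothetical helper, not part of any library. It pairs parentheses with a stack and drops an outer pair only when the very next character opens a pair that closes right before the outer one closes:

```python
def squash_double_parens(code: str) -> str:
    """Turn '((expr))' into '(expr)'.

    Hypothetical helper. Assumes balanced parentheses and none inside
    string literals or comments.
    """
    # Map each '(' index to the index of its matching ')'
    closing = {}
    stack = []
    for i, ch in enumerate(code):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            closing[stack.pop()] = i

    drop = set()
    for open_i, close_i in closing.items():
        # Leave call parentheses alone: f((1, 2)) is not the same as f(1, 2)
        if open_i > 0 and (code[open_i - 1].isalnum() or code[open_i - 1] == '_'):
            continue
        # Redundant pair: its contents are exactly one other parenthesized block
        if closing.get(open_i + 1) == close_i - 1:
            drop.update((open_i, close_i))

    return ''.join(ch for i, ch in enumerate(code) if i not in drop)


print(squash_double_parens("x = ((a + b)) * (((c)))"))  # x = (a + b) * (c)
```

The check on the preceding character is a heuristic: parentheses right after a name are treated as a call and never touched, which is exactly the kind of case where going further gets risky.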


I agree it carries a lot of risk, and this is someone else’s code in a community project. As for rewriting, that’s not needed. My goal was to eliminate some bad coding practices within the file: years of them, in the sense of stylistic errors. I don’t expect to get rid of all of them, but fixing most of them is absolutely feasible.

How about limiting repetition case-insensitively, removing the extra words, and preserving lines? E.g., only let each word appear at most twice.

One fish, Two fish, Red fish, Blue fish,
Black fish, Blue fish, Old fish, New fish.
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.

Shooting from the hip:

import re

lines = [
"One fish, Two Fish, Red fish, Blue Fish, this is a fishy story",
"Black fish, Blue fish, Old Fish, New fish, fishing fishes out in the sea",
"This one has a littlecar.",
"This one has a little star.",
"Say! What a lot of fish there are.",
"Yes. Some are red, and some are blue.",
"Some are old and some are new.",
"Some are sad, and some are glad,",
"And some are very, very bad.",
]

for line in lines:
    count = -1
    replaced = line
    # Each pass masks one occurrence of a word that already appears at least twice
    # earlier in the line; repeat until no substitution is made
    while count != 0:
        replaced, count = re.subn(r'(\b\w+?\b)(.+)(\b\1\b)(.+)(\b\1\b)(.*)', r'\1\2\3\4***\6', replaced, flags=re.IGNORECASE)
    print(replaced)

Yields:

One fish, Two Fish, Red ***, Blue ***, this is a fishy story
Black fish, Blue fish, Old ***, New ***, fishing fishes out in the sea
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.

You can of course omit the ***.
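If the backtracking regex gets slow on long lines, the same "at most two occurrences" rule can be done in a single pass with a counter. This is a swapped-in technique, not the regex above: `re.sub` accepts a function as the replacement, and `mask_repeats` (a hypothetical name) keeps a case-insensitive count per word:

```python
import re
from collections import Counter

WORD = re.compile(r'\b\w+\b')


def mask_repeats(line, max_count=2, mask='***'):
    """Replace each word occurrence beyond its first max_count (case-insensitive)."""
    seen = Counter()

    def replace(match):
        key = match.group().lower()
        seen[key] += 1
        return match.group() if seen[key] <= max_count else mask

    return WORD.sub(replace, line)


for line in [
    "One fish, Two Fish, Red fish, Blue Fish, this is a fishy story",
    "Black fish, Blue fish, Old Fish, New fish, fishing fishes out in the sea",
]:
    print(mask_repeats(line))
```

Each word is visited once, so the cost grows linearly with the line length, and `fishy`/`fishes` stay untouched because only whole words are counted.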


The *** is a nice addition. How about preserving lines but limiting repetition across the input as a whole? I.e., remove all the fish except the first two on the first line. I am asking a lot about lines because most online help treats the text either as a single line or as an individual bag of words.

You could join the lines and apply the same re or regex function to the result.

@afre, I have a question for you. What is your end goal with Python? In my case, I use it to automate G’MIC scripting, and that goal is mostly achieved.

Just getting a sense of it. That is all. Seems like there are many ways to do the same thing. What got me going was the lack of tools for text in G’MIC.

@afre Regex is another language, and it would take a while to explain. Better yet, try to learn it if you want to do this. I want to learn regex too, but I already have too much on my plate.

I only know very, very simple regex and usually rely on online sources to solve my problems. The issue, though, is that there are different flavors of regex, which aren’t fully compatible with one another.

Here’s a manual for regex. It doesn’t help that much with deciphering, but anyway, it’s a start. Only practice helps. Lots of it.

I did it, using my previous strategy. It didn’t work at first because of a typo.

Input:

import re

text = '''One fish, Two fish, Red fish, Blue fish,
Black fish, Blue fish, Old fish, New fish.
This one has a littlecar.
This one has a little star.
Say! What a lot of fish there are.
Yes. Some are red, and some are blue.
Some are old and some are new.
Some are sad, and some are glad,
And some are very, very bad.
'''

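# '.' does not match newlines, so join the lines with a '::' marker the regex can match across
joined = '::'.join(text.splitlines())

count = -1
replaced = joined
while count != 0:
    replaced, count = re.subn(r'(\b\w+?\b)(.+)(\b\1\b)(.+)(\b\1\b)(.*)', r'\1\2\3\4***\6', replaced, flags=re.IGNORECASE)

# Turn the markers back into line breaks
result = replaced.replace('::', '\n')
print(result)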

Result:

One fish, Two fish, Red ***, Blue ***,
Black ***, Blue ***, Old ***, New ***.
This one has a littlecar.
This *** has a little star.
Say! What *** lot of *** there are.
Yes. Some are red, and some *** ***.
*** *** old and *** *** new.
*** *** sad, *** *** *** glad,
*** *** *** very, very bad.

Take the “Owl book” (O’Reilly’s Mastering Regular Expressions). The first edition did have a chapter on Python, which was removed in later editions. But you don’t need to care, because these are small implementation and usage details, and the book is very good at explaining the basics (and some not-so-basic stuff too).

This is a very useful skill: several times I have replaced dozens of lines of bad code with a single regex (in Python, Java, Bash, C++…).


Any idea how I can convert this to C++ code?

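from functools import reduce
from itertools import combinations
from operator import mul

def xelf_factors(n):
    pf = primefactors(n)
    # Every product of a non-empty proper sub-multiset of the prime factors is a divisor
    af = { reduce(mul,x) for z in range(1,len(pf)) for x in combinations(pf,z) }
    return sorted({1,n}|af)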

def primefactors(n):
    factors,d = [],2
    while n > 1:
        while n%d==0:
            factors.append(d)
            n//=d
        d+=1
    return factors

That is the fastest way I know to find the factors of a number.
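One cheap improvement to the trial division in `primefactors` above, before porting it anywhere: a factor larger than sqrt(n) can only be a single prime left over at the end, so the loop can stop once d*d > n. A sketch (hypothetical name `primefactors_sqrt`); for a large prime input it runs about sqrt(n) iterations instead of n:

```python
def primefactors_sqrt(n):
    """Prime factors of n (with multiplicity) by trial division up to sqrt(n)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    # Whatever is left (if > 1) has no divisor up to its square root, so it is prime
    if n > 1:
        factors.append(n)
    return factors


print(primefactors_sqrt(360))  # [2, 2, 2, 3, 3, 5]
```

It returns the same list as `primefactors`, so `xelf_factors` can call it unchanged.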

My C++ is rusty, and this is about Python.

If, instead of keeping the factors as a plain sequence, you create a (factor, power) sequence, then producing the output of xelf_factors is just a matter of iterating over all combinations of prime powers:

def primespowers(n):
    primesWithPower=[]
    d=2
    while n > 1:
        power=0
        while n%d==0:
            n//=d
            power+=1
        if power!=0:
            primesWithPower.append((d,power))
        d+=1
    return primesWithPower

def products(primesWithPower):
    if len(primesWithPower)==0:
        return [1]
    result=[]
    divisor,maxpower=primesWithPower[0]
    others=products(primesWithPower[1:])
    for exponent in range(maxpower+1):
        result.extend([x*(divisor**exponent) for x in others])
    return result
    
def divisors(n):
    primesWithPower=primespowers(n)
    return products(primesWithPower)

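You can recurse without fear: if your max integer is 2^64-1, you cannot have more than 63 prime factors counted with multiplicity (every factor is at least 2), and at most 15 distinct ones, so the recursion in products is at most 16 calls deep. This also means that you can allocate an array of 63 factors at the beginning.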
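These bounds are easy to check in a few lines of Python: the bit length gives the multiplicity bound, and multiplying the smallest primes together until the limit is exceeded gives the number of distinct primes that can fit. `max_distinct_primes` is a name made up for illustration:

```python
def max_distinct_primes(limit):
    """Largest k such that the product of the first k primes is <= limit."""
    product, count, candidate = 1, 0, 2
    while True:
        # Simple primality test by trial division (fine for these small candidates)
        if all(candidate % p for p in range(2, int(candidate ** 0.5) + 1)):
            if product * candidate > limit:
                return count
            product *= candidate
            count += 1
        candidate += 1


limit = 2**64 - 1
print(limit.bit_length() - 1)      # 63
print(max_distinct_primes(limit))  # 15
```

The product of the first 15 primes (up to 47) is 614889782588491410, while multiplying in 53 overflows 64 bits.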

In this form the conversion to C or C++ is a lot more straightforward.

Well, I found the closest thing to a manual conversion:

"… all possible combinations of those prime factors."

Here's an implementation without much optimization, using uint64_t instead of multiprecision integers, that completes in 305 ms for the input 10,000,000,000,000,000 on my machine.

Note that performance gets significantly worse for inputs with more distinct prime factors (12132 ms for the product of the smallest 14 primes). This is simply because there are more combinations to calculate and print.

#include <chrono>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

using PrimeFactors = std::vector<std::pair<uint64_t, uint64_t>>;

PrimeFactors FindFactors(uint64_t n)
{
    PrimeFactors primeFactors;

    uint64_t square = static_cast<uint64_t>(std::sqrt(n));
    for (uint64_t i = 2; i <= square && i <= n; ++i)
    {
        bool isPrime = true;
        for (auto [prime, exponent] : primeFactors)
        {
            if (prime * prime > i)
            {
                break;
            }
            if (i % prime == 0u)
            {
                isPrime = false;
                break;
            }
        }

        if (isPrime)
        {
            uint64_t count = 0;
            while (n % i == 0)
            {
                ++count;
                n /= i;
            }
            primeFactors.emplace_back(i, count);
            if (count != 0)
            {
                square = static_cast<uint64_t>(std::sqrt(n));
            }
        }
    }
    if (n != 1)
    {
        primeFactors.emplace_back(n, 1);
    }
    return primeFactors;
}

void PrintFactors(uint64_t factor, PrimeFactors::const_iterator pos, PrimeFactors::const_iterator const end)
{
    while (pos != end)
    {
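        // Skip candidate primes that do not divide the input (exponent 0)
        while (pos != end && pos->second == 0)
        {
            ++pos;
        }
        if (pos == end)
        {
            break; // avoid dereferencing end() when only zero exponents remain
        }
        auto newFactor = factor;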
        for (auto count = pos->second; count != 0; --count)
        {
            newFactor *= pos->first;
            std::cout << newFactor << '\n';
            PrintFactors(newFactor, pos + 1, end);
        }
        ++pos;
    }
}

int main()
{
    using Clock = std::chrono::steady_clock;

    uint64_t const input = 10'000'000'000'000'000ull;
    //uint64_t const input = 2ull * 3ull * 5ull * 7ull *11ull * 13ull *17ull * 19ull * 23ull * 29ull *31ull*37ull * 41ull*43ull;

    auto start = Clock::now();
    auto factors = FindFactors(input);

    // print
    std::cout << 1 << '\n';
    PrintFactors(1, factors.begin(), factors.end());
    auto end = Clock::now();
    std::cout << "took " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms\n";
}

My main goal is to convert the fastest solution to G’MIC. Will try yours too.

LOL - I will add a C++ tag to this thread.

To answer @Reptorian’s previous question of why I am interested in Python (again; I used Python 2 for some image processing a long time ago but forgot most of it): I enjoy the Jupyter notebook concept where I can sort of see the code in action.

I would love to see some of those work, but not on this thread.

OK, it looks like I will improve the current Python script I use for incrementing numbers.

Here’s the code:

import re
test_string="variable_a=$7 variable_a=$18 variable_b,variable_8=${19-20} variable_d=$21 blur 10 ${10=}"
y = re.split(r'(?:\${?)(\d+)\-*(?:(\d+)(?:\}))*', test_string)

print(y)

# ['variable_a=', '7', None, ' variable_a=', '18', None, ' variable_b,variable_8=', '19', '20', ' variable_d=', '21', None, ' blur 10 ', '10', None, '=}']
# Note that the '$'s are gone, and so are the '${', '-' and '}' parts. 19 and 20 are supposed to stay in the form ${19-20}, and 7 is supposed to stay in the form $7.

The problem is described in the comment. The goal is to change every number matched by the regex, as long as it is greater than a given number, by adding a given amount to it (or subtracting).

Also, I will keep these two regexes here for later use:

# These will be used to verify that a string is in one of these forms, so I can extract the numbers and add to them.
\$(\d+)
\$\{\d+\-+\d+\}

I don’t know if these two regex cases are needed.
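Assuming the goal is the original string with the numbers updated, one option is `re.sub` with a function as the replacement: the two patterns above are combined into one alternation, and only the matched numbers are rewritten, so everything else (including `blur 10` and `${10=}`) stays as-is. `shift_numbers`, `threshold` and `offset` are hypothetical names, and "greater than" is taken as strictly greater:

```python
import re

# $N -> group 1;  ${N-M} -> groups 2 (N), 3 (the dash run), 4 (M)
PLACEHOLDER = re.compile(r'\$(\d+)|\$\{(\d+)(-+)(\d+)\}')


def shift_numbers(text, threshold, offset):
    """Add offset to every $N / ${N-M} number that is greater than threshold."""
    def shift(digits):
        value = int(digits)
        return str(value + offset) if value > threshold else digits

    def replace(match):
        if match.group(1) is not None:  # $N form
            return '$' + shift(match.group(1))
        low, dashes, high = match.group(2, 3, 4)  # ${N-M} form
        return '${' + shift(low) + dashes + shift(high) + '}'

    return PLACEHOLDER.sub(replace, text)


test_string = "variable_a=$7 variable_a=$18 variable_b,variable_8=${19-20} variable_d=$21 blur 10 ${10=}"
print(shift_numbers(test_string, threshold=9, offset=1))
```

Subtraction is just a negative `offset`; the sketch does not guard against results dropping to or below zero.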

Do you want the result as a list or do you want the initial string with the numbers updated?