Keyword challenge

Cover letter and resume writing has been boring. I have been distracting myself with similarity algorithms. I wonder what the G’MIC community could do to help in this area. :slight_smile: It does not necessarily have to be G’MIC: I am just more comfortable with it. :wink:

Basic method:

- Separate text into words or phrases (n-grams).
- Calculate similarity between texts (input vs reference).
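
For anyone following along in Python, here is a minimal sketch of those two steps. It assumes Jaccard overlap as the similarity measure, which is only one of many options. The function names `ngrams` and `jaccard` and the two sample strings are made up for illustration.

```python
import re

def ngrams(text, n=1):
    """Lowercase the text, split it into words, and return the set of n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(input_text, reference_text, n=1):
    """Jaccard similarity: shared n-grams divided by all distinct n-grams."""
    a, b = ngrams(input_text, n), ngrams(reference_text, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical input (resume line) and reference (job posting line).
resume = "Wrote image processing scripts in G'MIC"
posting = "Looking for someone to write image processing scripts"

print(jaccard(resume, posting, n=1))  # single words
print(jaccard(resume, posting, n=2))  # two-word phrases
```

Note that "wrote" and "write" count as different words here, which is exactly the gap that lemmatisation is meant to close.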

Considerations:

- Ignore words of low semantic value (stop words).
- Consider words with a common root the same or similar (lemma).
- Calculate actual use of reference or related words in input text.
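
A rough Python sketch of these three considerations follows. It makes some loud assumptions: the stop-word list is a tiny made-up sample, and `crude_root` is a naive suffix stripper standing in for real lemmatisation, which would need a dictionary-backed tool. All names and sample strings are hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def crude_root(word):
    """Strip a few common English suffixes. This is NOT real lemmatisation."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def content_roots(text):
    """Return the set of crude roots of the non-stop words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_root(w) for w in words if w not in STOP_WORDS}

def coverage(input_text, reference_text):
    """Fraction of reference roots that also appear in the input text."""
    ref = content_roots(reference_text)
    if not ref:
        return 0.0
    return len(ref & content_roots(input_text)) / len(ref)

# Hypothetical reference (job posting) and input (resume line).
posting = "Experience with testing and scripting tools"
resume = "Tested scripts for the imaging tool"

print(coverage(resume, posting))
```

The score is deliberately one-directional: it asks how much of the reference shows up in the input, not the other way around.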

I would use Python for this. I know that the syntax is not your thing. G’MIC is terrible for working with existing texts.

Yes, you are quite right about Python.

So far, I separated my resume into chunks. With Python code from GitHub, I scored them against the job posting. However, it is not useful at this point due to the considerations listed in the OP. Rather than making compelling applications, I am wasting my time battling this silly code. :roll_eyes:

I wouldn’t say that. There is plenty of G’MIC code, even in the stdlib, that proves G’MIC can manage text files (parsing and generation) quite well. There is even a Markdown interpreter written in G’MIC!
But, yes, the language was not designed for text manipulation, so it may take time to learn how to use it correctly for that purpose.

Hence, my foolhardy association with G’MIC. For example,

gmic dude
# *** Error *** Unknown command or filename 'dude'; did you mean 'done'?

Could I rejigger this to scan text for the word dude but also account for done because it is similar? They do not have a common root (semantic meaning) but it is a start.
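
In Python, this kind of "did you mean" matching can be sketched with the standard library's `difflib`. This is not how G'MIC produces its own suggestion; it is just a spelling-similarity stand-in. The helper name `find_similar`, the cutoff of 0.5, and the sample sentence are assumptions for illustration.

```python
import difflib
import re

def find_similar(text, target, cutoff=0.5):
    """Return words in text whose spelling resembles target, best match first."""
    words = sorted(set(re.findall(r"[a-z]+", text.lower())))
    return difflib.get_close_matches(target, words, n=5, cutoff=cutoff)

# Hypothetical sample sentence containing both words.
text = "The dude said the job was done before noon"

print(find_similar(text, "dude"))
print(difflib.SequenceMatcher(None, "dude", "done").ratio())
```

Raising the cutoff makes the match stricter; at 0.5, "done" only just qualifies as similar to "dude".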

PS - lemmatisation is what I would like to do.

That doesn’t seem easy to do in G’MIC.

I’ll just refer you to a Python example - Python | Lemmatization with NLTK - GeeksforGeeks

However, some of the things in your list are doable. Breaking text into words is something I will attempt.

rep_attempt_2:
('"I want to do this in GMIC. Separate words in gmic"')

nm. words

number_of_words=0

repeat w#-1,n
 if !$n
  ({i(#"$words","$n",0,0)})
 else
  if i(#"$words","$n")!=32
   ({i(#"$words","$n",0,0)}) a[-2,-1] x
   if i(#-1,0)==32
    r. {w#-1-1},1,1,1,0,0,1
   fi
  else
   number_of_words+=1
   (32)
  fi
 fi
done

rm[words]

start_point={$!-($number_of_words+1)}

repeat $number_of_words+1 
 l[$start_point]
  echo {t}
 endl start_point+=1 
done

The script above separates the words and echoes each one.

Result:

C:\Windows\System32>gmic rep_attempt_2
[gmic]-0./ Start G'MIC interpreter.
[gmic]-1./rep_attempt_2/*repeat/*local/ I
[gmic]-1./rep_attempt_2/*repeat/*local/ want
[gmic]-1./rep_attempt_2/*repeat/*local/ to
[gmic]-1./rep_attempt_2/*repeat/*local/ do
[gmic]-1./rep_attempt_2/*repeat/*local/ this
[gmic]-1./rep_attempt_2/*repeat/*local/ in
[gmic]-1./rep_attempt_2/*repeat/*local/ GMIC.
[gmic]-1./rep_attempt_2/*repeat/*local/ Separate
[gmic]-1./rep_attempt_2/*repeat/*local/ words
[gmic]-1./rep_attempt_2/*repeat/*local/ in
[gmic]-1./rep_attempt_2/*repeat/*local/ gmic

For similarity between texts, a simple loop comparing two images of text will do.

Here is the text from my previous post. It is simple, so there is not much for the hacked Python code to reduce.

Could I rejigger this to scan text for the word dude but also account for done because it is similar? They do not have a common root (semantic meaning) but it is a start.

Lemmatisation - could and rejigger could become can and rejig. Other than that, a good result.

could I rejigger this to scan text for the word dude but also account for do because it be similar ? they do not have a common root ( semantic meaning ) but it be a start .

Keywords - Remove stop words, keep nouns, and find n-grams with their semantic meaning intact. That last part is important because we do not want nonsense words or phrases.

start, meaning, word, root, text, dude, semantic meaning

PS meaning is listed twice. The first one is not necessary since it is already part of semantic meaning. If meaning were by itself somewhere else, then it would have been appropriate.
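
That rule can be sketched in Python: drop a keyword when every occurrence of it in the text already sits inside a longer keyword. The helper names `count_phrase` and `drop_subsumed` are hypothetical, and the counting is a rough approximation rather than a full treatment of overlapping phrases.

```python
import re

def count_phrase(text, phrase):
    """Count whole-word occurrences of a word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def drop_subsumed(keywords, text):
    """Drop a keyword when every occurrence of it sits inside a longer keyword."""
    kept = []
    for kw in keywords:
        # Occurrences of longer keywords that contain kw as a whole word.
        inside = sum(
            count_phrase(text, other)
            for other in keywords
            if other != kw and re.search(r"\b" + re.escape(kw) + r"\b", other)
        )
        if count_phrase(text, kw) > inside:
            kept.append(kw)
    return kept

text = ("Could I rejigger this to scan text for the word dude but also "
        "account for done because it is similar? They do not have a common "
        "root (semantic meaning) but it is a start.")
keywords = ["start", "meaning", "word", "root", "text", "dude", "semantic meaning"]

print(drop_subsumed(keywords, text))
```

If "meaning" also appeared on its own somewhere else in the text, its count would exceed the count of "semantic meaning" and it would be kept.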

You can scan the text for ‘dude’.

('"$1"')
pos_dude:=find(crop(#-1),'dude',0,1)
echo $pos_dude
rm.

As far as similarity goes, you can create an image and use crop.