One direction linear blur?

Is it a lowercase L, for length?

Oh, I didn’t see that. Come to think of it, I might be able to find a theoretical solution without modifying gaussian. It probably won’t look as good as a gaussian version with smooth interpolation between stretched and non-stretched pixels. I can use the distance from the center to determine a value between 1 and 0, then use convolve_fft: at the center the value is 1, and when the distance threshold is reached, it is 0.
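
A minimal sketch of that idea (the command name and the 65-pixel kernel size are mine, purely for illustration):

# Hypothetical helper, not from this thread: build a radial falloff kernel
# (1 at the center, fading to 0 at the distance threshold), normalize it,
# then apply it with convolve_fft.
rep_radial_falloff_blur :
 skip ${1=65}
 repeat $! l[$>]
  $1,$1,1,1                                     # blank square kernel canvas
  f. "max(0,1-2*sqrt((x/w-.5)^2+(y/h-.5)^2))"   # distance-based falloff
  /. {-1,is}                                    # normalize so the kernel sums to 1
  convolve_fft[0] [1] k[0]
 endl done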

From the image, it looks like smears in different directions, seeded from bright loci: the brighter they are, the later the set of directional smears is drawn. The interesting part is that in the goal image the smears aren’t just single lines in every direction; they are parallel sets in 5 directions.

I don’t know if this observation helps your planning. Hope it does.

Compare with this image

The values are almost right; however, the two-direction motion blur is what makes it look off. That’s why I compared motion blur against splinter blur with a single splinter.

I think I just solved it, using the blur_linear code as a base. I may be able to make a Splinter plugin for PDN.

Behold, one direction linear blur.

Good: the half kernel does work. I don’t know why I didn’t keep that snippet. Oh yes, that data loss event. :blush: Ahem, anyway, I’d like to see how you modified the stdlib code, because I didn’t go that route myself. I bet it’s much more efficient, since it’s based on David’s code.

Well, here’s the code:

#@cli rep_splinter_blur: _length,_thickness,_duplicates,_angle,_sharpen_multiplier,-1<=_balance<=1,_boundary={ 0=None | 1=Neumann | 2=Periodic | 3=Mirror },_bisided={ 0=one-line | 1=two-line }
rep_splinter_blur:
skip ${1=10%},${2=5%},${3=5},${4=0},${5=0},${6=0},${7=1},${8=0}

start_ang={$4}
angs_per_dups={360/$3}

m "average_output: ti=$! add / $ti"
if $6==-1 m "output_splinter : min"
elif $6<0&&$6>-1 m "output_splinter : +average_output +min[^-1] f. lerp(i#-2,i,abs($6)) k."
elif $6==0 m "output_splinter : average_output"
elif $6>0&&$6<1 m "output_splinter : +average_output +max[^-1] f. lerp(i#-2,i,$6) k."
elif $6==1 m "output_splinter : max"
else error (-1<=\$\6<=1)=F
fi



repeat $! l[$>]
 hypo={(sqrt(w^2+h^2)/2)}
 if ${is_percent\ $1} length={round($1*$hypo)} 
 else length={round($1)} 
 fi
 
 if ${is_percent\ $2} thickness={round($2*$hypo)} 
 else thickness={round($2)} 
 fi
 
 ds={d},{s}
 
 if $7!=2 expand_xy $length,$7 fi
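 # One convolution per duplicate: step the kernel angle by 360/duplicates each iteration.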
 repeat $3
  ang={$start_ang+$angs_per_dups*$>}
  +rep_splinter_blur_convolve_map. $length,$thickness,$ang,$5,$8
  +convolve_fft[0] [-1]
  rm..
 done
 rm[0]
 output_splinter
 if $7!=2 shrink_xy $length fi
endl done

um average_output,output_splinter,splinter_convolve
#@cli rep_splinter_blur_convolve_map: _length,_thickness,_angle,_sharpen_multiplier,_bisided={ 0=one-line | 1=two-line }
#@cli : Create a convolve map for directional blur. This enables, among other things, a one-direction motion blur.
#@cli : Default values: '_length=10%','_thickness=5%','_angle=0','_sharpen_multiplier=0','_bisided=1'
rep_splinter_blur_convolve_map : skip ${1=10%},${2=5%},${3=0},${4=0},${5=1}
repeat $! l[$>]
 hypo={(sqrt(w^2+h^2)/2)}
 if ${is_percent\ $1} length={round($1*$hypo)} else length={round($1)} fi
 if ${is_percent\ $2} thickness={round($2*$hypo)} else thickness={round($2)} fi
 ds={d},{s} 
 rm
 {$length},{$length},1,1
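 # Fill the kernel: a radial falloff masked by a line of the given thickness, rotated to the requested angle.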
 f "begin(
  const sides=$5;
  const thickness="$thickness";
  const hw=(w-1)/2;
  const hh=(h-1)/2;
  const ang=($3/180)*pi*-1;
  const cos_ang=cos(ang);
  const sin_ang=sin(ang);
  rot_x(a,b)=a*cos_ang-b*sin_ang;
  rot_y(a,b)=a*sin_ang+b*cos_ang;
  cutval(v)=v<0?0:v;
  maxcutval(v)=v>1?1:v;
 );
 xx=x/w-.5;
 yy=y/h-.5;
 lx=x-hw;
 ly=y-hh;
 radial_grad=1-sqrt(xx^2+yy^2)*2;
 radial_grad=cutval(radial_grad);
 line=1-maxcutval(abs(rot_x(lx,ly))/thickness);
 sides?(line?radial_grad*line):(rot_y(lx,ly)<=0?(line?radial_grad*line));
 "
 / {is}
 if $4
  avgstat={ia}
  +f (i*2-$avgstat)
  f.. lerp(i,i#1,min(1,$4))
  k[0]
 fi
 r 100%,100%,$ds,0,1
endl done
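
A hypothetical invocation, with argument values picked purely for illustration (length 10%, thickness 5%, 8 duplicates; the remaining arguments keep their defaults):

$ gmic sample lena rep_splinter_blur 10%,5%,8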

It looks like this.

The second one is more convincing, but it targets the dark regions.

Yep, now all I have to do is rearrange the variables until they’re comfortable for CLI users. A thickness of 1 looks the best.

The efficiency comes from the fact that it uses convolve_fft to convolve the image. It’s always faster to use this when the convolution kernel is large (a convolution is just a complex multiplication in Fourier space).
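
For reference, that is the convolution theorem: f * k = F⁻¹( F(f) · F(k) ). A direct spatial convolution of an N-pixel image with an M-pixel kernel costs O(N·M) operations, while the FFT route costs O(N log N) regardless of kernel size, which is why it wins once the kernel is large enough.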

When to use convolve_fft vs regular convolve?

I’d say that for masks larger than 9x9 or 11x11, convolve_fft is probably faster.
Anyway, the FFT inherently assumes periodic boundary conditions, which may not be desired. It’s still possible to simulate other boundary conditions by padding the original image, but if the mask is really large this can be expensive (you have to add borders as large as the half-size of the convolution kernel).
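
A minimal sketch of that padding trick, with the same expand_xy/shrink_xy pair used in the splinter command above ($pad, the kernel half-size, is a placeholder):

expand_xy[0] $pad,1       # pad by the kernel half-size, Neumann continuation
convolve_fft[0] [1]       # periodic FFT convolution on the padded image
shrink_xy[0] $pad         # crop the padding back off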

Is there a way to make the script faster? A duplicates value of 50 takes forever with a 100 px convolution. The PDN Splinter Blur plugin is a lot faster, but that may have to do with multi-threading; I can’t find out, because it’s not open source.

Since I changed the variables, here’s the current code.

Splinter Blur

The part I want to make faster is the loop around the ang={...} line.

#@cli rep_splinter_blur: _length,_duplicates,_angle,_thickness,_sharpen_multiplier,-1<=_balance<=1,_boundary={ 0=None | 1=Neumann | 2=Periodic | 3=Mirror },_bisided={ 0=one-line | 1=two-line }
#@cli : Apply a splinter blur to the image. Based on observations from using the Splinter Blur plugin in Paint.NET, made by Ed Harvey. Note that the convolution result is different.
rep_splinter_blur:
skip ${1=10%},${2=5},${3=0},${4=0},${5=0},${6=0},${7=1},${8=0}

start_ang={$3}
angs_per_dups={360/$2}

m "average_output: ti=$! add / $ti"
if $6==-1 m "output_splinter : min"
elif $6<0&&$6>-1 m "output_splinter : +average_output +min[^-1] f. lerp(i#-2,i,abs($6)) k."
elif $6==0 m "output_splinter : average_output"
elif $6>0&&$6<1 m "output_splinter : +average_output +max[^-1] f. lerp(i#-2,i,$6) k."
elif $6==1 m "output_splinter : max"
else error (-1<=\$\6<=1)=F
fi



repeat $! l[$>]
 cutval={im},{iM}
 hypo={(sqrt(w^2+h^2)/2)}
 if ${is_percent\ $1} length={round($1*$hypo)} 
 else length={round($1)} 
 fi
 
 if ${is_percent\ $4} thickness={round($4*$hypo)} 
 else thickness={round($4)} 
 fi
 
 ds={d},{s}
 
 if $7!=2 expand_xy {round($length/2)},$7 fi
 repeat $2
  ang={$start_ang+$angs_per_dups*$>}
  +rep_splinter_blur_convolve_map. $length,$thickness,$ang,$5,$8
  +convolve_fft[0] [-1]
  rm..
 done
 rm[0]
 output_splinter
 cut $cutval
 if $7!=2 shrink_xy {round($length/2)} fi
endl done

um average_output,output_splinter,splinter_convolve
#@cli rep_splinter_blur_convolve_map: _length,_thickness,_angle,_sharpen_multiplier,_bisided={ 0=one-line | 1=two-line }
#@cli : Create a convolve map for directional blur. This enables, among other things, a one-direction motion blur.
#@cli : Default values: '_length=10%','_thickness=5%','_angle=0','_sharpen_multiplier=0','_bisided=1'
rep_splinter_blur_convolve_map : skip ${1=10%},${2=5%},${3=0},${4=0},${5=1}
repeat $! l[$>]
 hypo={(sqrt(w^2+h^2)/2)}
 if ${is_percent\ $1} length={round($1*$hypo)} else length={round($1)} fi
 if ${is_percent\ $2} thickness={max(round($2*$hypo),1)} else thickness={max(round($2),1)} fi
 ds={d},{s} 
 rm
 {$length},{$length},1,1
 f "begin(
  const sides=$5;
  const thickness="$thickness";
  const hw=(w-1)/2;
  const hh=(h-1)/2;
  const ang=($3/180)*pi*-1;
  const cos_ang=cos(ang);
  const sin_ang=sin(ang);
  rot_x(a,b)=a*cos_ang-b*sin_ang;
  rot_y(a,b)=a*sin_ang+b*cos_ang;
  cutval(v)=v<0?0:v;
  maxcutval(v)=v>1?1:v;
 );
 xx=x/w-.5;
 yy=y/h-.5;
 lx=x-hw;
 ly=y-hh;
 radial_grad=1-sqrt(xx^2+yy^2)*2;
 radial_grad=cutval(radial_grad);
 line=1-maxcutval(abs(rot_x(lx,ly))/thickness);
 sides?(line?radial_grad*line):(rot_y(lx,ly)<=0?(line?radial_grad*line));
 "
 / {is}
 if $4
  avgstat={ia}
  +f (i*2-$avgstat)
  f.. lerp(i,i#1,min(1,$4))
  k[0]
 fi
 r 100%,100%,$ds,0,1
endl done
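
A hypothetical invocation of this rearranged version (length 15%, 8 duplicates, angle 0, thickness 1; the values are just examples, though a thickness of 1 was the best-looking above):

$ gmic sample lena rep_splinter_blur 15%,8,0,1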

Note to self: try to create radial duplicates in the fill block. That might be the solution.

EDIT: Just tested the note to self idea. Nope.

Also, if you can factorize your kernel and separate the 2D convolution into a 1D vertical and a 1D horizontal pass, it’s better to avoid the FFT.
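
The splinter kernel itself is not separable, but for one that is, the factorization looks like this sketch (a normalized 5x5 box blur split into two 1D passes):

(0.2,0.2,0.2,0.2,0.2)     # 5x1 horizontal kernel
(0.2;0.2;0.2;0.2;0.2)     # 1x5 vertical kernel
convolve[0] [1]           # horizontal pass
convolve[0] [2]           # vertical pass
rm[1,2]                   # discard the kernels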

But then, trying to limit the number of operations is not always a win; it depends on how the memory is accessed, cache misses, I/O speed, and computation speed. I have seen GPU benchmarks where the FFT only starts being worth it for 64×64 kernels and up. It’s difficult to predict the performance without benchmarking.

Some Python libs actually run various convolutions for various kernel and image sizes and cache the runtimes, so the code can later switch to the fastest path depending on size.

Thanks for your input. I am doing that already.

Speaking of which…
I’ve noticed that the convolve_fft command did not produce the same result as the convolve command with the same input image and same kernel.
This is now fixed.

Also, I’ve added an additional argument boundary_conditions to command convolve_fft, to allow choosing between different boundary conditions, not only the periodic one.
Should be good now (after a $ gmic update).
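
For example, assuming the new argument follows the usual G'MIC boundary enum (0=dirichlet, 1=neumann, 2=periodic, 3=mirror), a non-periodic FFT convolution would look like:

+convolve_fft[0] [1],1    # FFT convolution with Neumann boundaries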

Another thing: I’ve tested the convolution of a 1024x1024 image with square kernels of increasing sizes, with convolve and convolve_fft and measured the timings.
On my 24-core machine, it appears that convolve_fft becomes faster at different sizes, depending on the boundary conditions chosen.
Here is what I get here:

Input image is 1024x1024.

Convolution 3x3:
  > 'convolve' (dirichlet): 0.016 s.
  > 'convolve' (neumann): 0.002 s.
  > 'convolve' (periodic): 0.054 s.
  > 'convolve_fft' (dirichlet): 0.403 s.
  > 'convolve_fft' (neumann): 0.431 s.
  > 'convolve_fft' (periodic): 0.393 s.

Convolution 4x4:
  > 'convolve' (dirichlet): 0.05 s.
  > 'convolve' (neumann): 0.044 s.
  > 'convolve' (periodic): 0.085 s.
  > 'convolve_fft' (dirichlet): 0.403 s.
  > 'convolve_fft' (neumann): 0.413 s.
  > 'convolve_fft' (periodic): 0.359 s.

Convolution 5x5:
  > 'convolve' (dirichlet): 0.022 s.
  > 'convolve' (neumann): 0.02 s.
  > 'convolve' (periodic): 0.117 s.
  > 'convolve_fft' (dirichlet): 0.718 s.
  > 'convolve_fft' (neumann): 0.707 s.
  > 'convolve_fft' (periodic): 0.384 s.

Convolution 6x6:
  > 'convolve' (dirichlet): 0.085 s.
  > 'convolve' (neumann): 0.082 s.
  > 'convolve' (periodic): 0.13 s.
  > 'convolve_fft' (dirichlet): 0.735 s.
  > 'convolve_fft' (neumann): 0.73 s.
  > 'convolve_fft' (periodic): 0.415 s.

Convolution 7x7:
  > 'convolve' (dirichlet): 0.124 s.
  > 'convolve' (neumann): 0.094 s.
  > 'convolve' (periodic): 0.181 s.
  > 'convolve_fft' (dirichlet): 0.419 s.
  > 'convolve_fft' (neumann): 0.408 s.
  > 'convolve_fft' (periodic): 0.385 s.

Convolution 8x8:
  > 'convolve' (dirichlet): 0.119 s.
  > 'convolve' (neumann): 0.104 s.
  > 'convolve' (periodic): 0.202 s.
  > 'convolve_fft' (dirichlet): 0.444 s.
  > 'convolve_fft' (neumann): 0.433 s.
  > 'convolve_fft' (periodic): 0.377 s.

Convolution 9x9:
  > 'convolve' (dirichlet): 0.155 s.
  > 'convolve' (neumann): 0.119 s.
  > 'convolve' (periodic): 0.254 s.
  > 'convolve_fft' (dirichlet): 2.529 s.
  > 'convolve_fft' (neumann): 2.751 s.
  > 'convolve_fft' (periodic): 0.433 s.

Convolution 10x10:
  > 'convolve' (dirichlet): 0.175 s.
  > 'convolve' (neumann): 0.135 s.
  > 'convolve' (periodic): 0.284 s.
  > 'convolve_fft' (dirichlet): 2.483 s.
  > 'convolve_fft' (neumann): 2.592 s.
  > 'convolve_fft' (periodic): 0.433 s.

Convolution 11x11:
  > 'convolve' (dirichlet): 0.196 s.
  > 'convolve' (neumann): 0.148 s.
  > 'convolve' (periodic): 0.365 s.
  > 'convolve_fft' (dirichlet): 0.62 s.
  > 'convolve_fft' (neumann): 0.588 s.
  > 'convolve_fft' (periodic): 0.415 s.

Convolution 12x12:
  > 'convolve' (dirichlet): 0.225 s.
  > 'convolve' (neumann): 0.177 s.
  > 'convolve' (periodic): 0.425 s.
  > 'convolve_fft' (dirichlet): 0.591 s.
  > 'convolve_fft' (neumann): 0.601 s.
  > 'convolve_fft' (periodic): 0.451 s.

Convolution 13x13:
  > 'convolve' (dirichlet): 0.281 s.
  > 'convolve' (neumann): 0.228 s.
  > 'convolve' (periodic): 0.532 s.
  > 'convolve_fft' (dirichlet): 0.431 s.
  > 'convolve_fft' (neumann): 0.751 s.
  > 'convolve_fft' (periodic): 0.436 s.

Convolution 14x14:
  > 'convolve' (dirichlet): 0.299 s.
  > 'convolve' (neumann): 0.276 s.
  > 'convolve' (periodic): 0.627 s.
  > 'convolve_fft' (dirichlet): 0.432 s.
  > 'convolve_fft' (neumann): 0.405 s.
  > 'convolve_fft' (periodic): 0.426 s.

Convolution 15x15:
  > 'convolve' (dirichlet): 0.359 s.
  > 'convolve' (neumann): 0.302 s.
  > 'convolve' (periodic): 0.685 s.
  > 'convolve_fft' (dirichlet): 0.343 s.
  > 'convolve_fft' (neumann): 0.33 s.
  > 'convolve_fft' (periodic): 0.435 s.

Convolution 16x16:
  > 'convolve' (dirichlet): 0.409 s.
  > 'convolve' (neumann): 0.332 s.
  > 'convolve' (periodic): 0.767 s.
  > 'convolve_fft' (dirichlet): 0.326 s.
  > 'convolve_fft' (neumann): 0.29 s.
  > 'convolve_fft' (periodic): 0.407 s.

Convolution 17x17:
  > 'convolve' (dirichlet): 0.399 s.
  > 'convolve' (neumann): 0.331 s.
  > 'convolve' (periodic): 0.779 s.
  > 'convolve_fft' (dirichlet): 0.432 s.
  > 'convolve_fft' (neumann): 0.396 s.
  > 'convolve_fft' (periodic): 0.411 s.

Convolution 18x18:
  > 'convolve' (dirichlet): 0.457 s.
  > 'convolve' (neumann): 0.37 s.
  > 'convolve' (periodic): 0.934 s.
  > 'convolve_fft' (dirichlet): 0.491 s.
  > 'convolve_fft' (neumann): 0.461 s.
  > 'convolve_fft' (periodic): 0.455 s.

Convolution 19x19:
  > 'convolve' (dirichlet): 0.519 s.
  > 'convolve' (neumann): 0.396 s.
  > 'convolve' (periodic): 0.905 s.
  > 'convolve_fft' (dirichlet): 0.948 s.
  > 'convolve_fft' (neumann): 0.952 s.
  > 'convolve_fft' (periodic): 0.402 s.

Convolution 20x20:
  > 'convolve' (dirichlet): 0.583 s.
  > 'convolve' (neumann): 0.429 s.
  > 'convolve' (periodic): 1.017 s.
  > 'convolve_fft' (dirichlet): 0.929 s.
  > 'convolve_fft' (neumann): 0.939 s.
  > 'convolve_fft' (periodic): 0.431 s.
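
Any one of these points can be reproduced along these lines (a sketch; the noise fill and the all-ones 15x15 kernel are arbitrary choices):

1024,1024,1,1 noise. 50               # 1024x1024 test image
15,15,1,1,1                           # 15x15 kernel of ones
tic +convolve[0] [1],1 toc rm.        # spatial convolution, Neumann
tic +convolve_fft[0] [1] toc rm.      # FFT convolution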

@David_Tschumperle Thanks for adding the boundary argument. As for speed, it seems one would benefit from a dynamic switch between convolve and convolve_fft. I haven’t done a test as rigorous as that, but I did note that at 10x10 and above, FFT becomes faster than regular convolve on my 6-core machine.

For the earlier question, I now realize that I can make the kernel size dynamic depending on the angle. This will reduce computation time, and in theory it should be much faster than the PDN version.

EDIT: Mistook threads for cores.

2 cores. :roll_eyes:

Actually, I mistook threads for cores. I have 6 cores here: 3.6 GHz, 6 cores.