Indeed, but both are related (motion causes blur).
The point remains. Sensor-shifting makes the sensor shift (no joke!) by a known amount over the sensor plane, so you can align all the images on your computer just by shifting pixel coordinates. No biggie.
Now, with hand-held “shift”, you have no idea how much shifting you got. Well, we have correlation methods for that: take the 2D Fourier transform of both images, correlate one against the other and read the shift off the phase difference, so that could be easy. Unfortunately, your hand moves in 3D, that is, the shifting is not done over a plane but over a moving sphere, so you would have to correct 3 translations and 3 rotations, and possibly even a slight defocusing, before you are able to correlate the pictures. And, unless you have an accelerometer and a gyroscope to record the motion in-camera, you have to guess the motion direction from the pictures. So, you stack approximations on top of guesses and try to compute a correction with all of that, before you can actually stack and average anything. Hugin knows how to do that, but with some errors, and the result is far from perfect and not good enough for “super resolution” purposes.
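To show what the easy case looks like, here is a minimal sketch of Fourier-based shift estimation (phase correlation) with NumPy. It assumes the only difference between the two frames is a whole-pixel, circular translation in the image plane, which is exactly what hand-held shooting does not give you. The function name `phase_correlation_shift` and the random test image are made up for the illustration.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate the integer (dy, dx) circular shift that maps `ref` onto `moved`."""
    f_ref = np.fft.fft2(ref)
    f_mov = np.fft.fft2(moved)
    # Cross-power spectrum: keep only the phase difference between the frames.
    cross = f_mov * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12  # small epsilon avoids division by zero
    # Back in the spatial domain, a pure translation shows up as a single peak.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the half-size wrap around and mean negative shifts.
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shift)

# Hypothetical test data: a random "image" shifted by a known amount.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moved = np.roll(ref, shift=(3, -5), axis=(0, 1))

estimated = phase_correlation_shift(ref, moved)
print(estimated)
```

This only recovers 2 of the 6 degrees of freedom, and only to the nearest pixel. Rotations, perspective changes and defocus break the single-peak assumption, which is where the guesses and approximations start piling up.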
Also, the pixel-averaging/pixel-median thing is as old as digital cameras: Noise Reduction By Image Averaging (see the Windows 98 screenshots?). Noise is high-frequency, details are high-frequency, so filtering one will filter the other too. But noise is random and details are constant, so average shots and you will dilute the randomness into the constantness… That’s statistics 101.
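A quick simulation of that statistics-101 argument, using only the Python standard library. The numbers are assumptions picked for the illustration (a flat patch of true value 100, Gaussian noise of standard deviation 10, 16 perfectly aligned shots), not measurements from any camera. For independent noise, the standard deviation of the average should fall roughly as 1/sqrt(N).

```python
import random
import statistics

random.seed(42)

TRUE_VALUE = 100.0   # assumed constant "detail" value of every pixel
NOISE_SIGMA = 10.0   # assumed per-shot Gaussian noise
N_SHOTS = 16
N_PIXELS = 5000

def noisy_shot():
    """One simulated exposure: the constant signal plus fresh random noise."""
    return [random.gauss(TRUE_VALUE, NOISE_SIGMA) for _ in range(N_PIXELS)]

shots = [noisy_shot() for _ in range(N_SHOTS)]

# Average the shots pixel by pixel (they are perfectly aligned by construction).
averaged = [statistics.fmean(pixel_values) for pixel_values in zip(*shots)]

single_noise = statistics.pstdev(shots[0])
stacked_noise = statistics.pstdev(averaged)

print(f"noise in one shot:        {single_noise:.2f}")
print(f"noise after {N_SHOTS} shots avg: {stacked_noise:.2f}")
```

With 16 shots the noise should land near a quarter of the single-shot value, while the constant signal is untouched. The catch is the alignment: the simulation gets it for free, a hand-held burst does not.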
Photographers are like dogs who rediscover every day that they have a tail, so they keep chasing it. Pixel averaging works great, costs nothing and you can do it in any software that works with layers. If your noise is roughly gaussian and you have enough shots, median == average (again, stats), so you could just stack and blend layers at low opacity. But for God’s sake, do yourself a favour and use a tripod.
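For the layer trick, here is a small sketch of one way to pick the opacities so that normal-mode blending gives an exact average: layer k, counting from the bottom, gets opacity 1/k (100 %, 50 %, 33 %, 25 %...). That opacity rule is my assumption about what "low opacity" has to mean for the arithmetic to work out, and `blend_layers` is a made-up helper operating on a single pixel across the stack. The last lines compare mean and median on simulated Gaussian noise.

```python
import random
import statistics

def blend_layers(values):
    """Blend layers bottom to top in normal mode; layer k (1-based) has opacity 1/k."""
    result = 0.0
    for k, value in enumerate(values, start=1):
        opacity = 1.0 / k
        # Normal blend: new = (1 - opacity) * below + opacity * layer
        result = (1.0 - opacity) * result + opacity * value
    return result

random.seed(7)

# One pixel seen across 1001 simulated shots: assumed true value 100, sigma 10.
pixel_stack = [random.gauss(100.0, 10.0) for _ in range(1001)]

blended = blend_layers(pixel_stack)
mean = statistics.fmean(pixel_stack)
median = statistics.median(pixel_stack)

print(f"blended: {blended:.3f}  mean: {mean:.3f}  median: {median:.3f}")
```

The blended value matches the plain mean up to floating-point rounding, and the median sits close to both. In an 8-bit editor the intermediate rounding at each layer would add its own small error, which this sketch ignores.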
Fstoppers and PetaPixel are full of photogeeks who pose as engineers and speak of technical matters as if they understood them: optics, image processing, electronics, computers… you name it. Well, there are 2 kinds of people on the Internet: those who can solve a convolution integral, and those who had better keep taking pictures and stop trying to educate others, because they have no knowledge to share.