Gut the ICC profile and load the untransformed pixels, feeding them in as Linear. The ICC handling will be the largest performance hurdle on load.
The output will be slower if you are set to the sRGB OETF, because of the lookup nature of its from_reference direction. Change that to any well-defined transform with both a to_reference and a from_reference and it will claw the cycles down to milliseconds.
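To illustrate what "well defined in both directions" can look like, here is a minimal Python sketch of the sRGB transfer function written in closed form each way, so neither direction needs a table to be inverted. The function names are my own and only borrow the to_reference / from_reference wording; this is not how any particular colour library implements it, and the constants are the commonly quoted piecewise sRGB ones.

```python
# Sketch only: closed-form sRGB in both directions, no lookup table.
# Function names are illustrative, not an existing library API.

def srgb_from_reference(linear):
    """Linear reference value -> sRGB-encoded value."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


def srgb_to_reference(encoded):
    """sRGB-encoded value -> linear reference value."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


if __name__ == "__main__":
    # Round-trip a few values to show the two directions agree.
    for value in (0.0, 0.001, 0.18, 0.5, 1.0):
        encoded = srgb_from_reference(value)
        print(value, encoded, srgb_to_reference(encoded))
```

Because both directions are plain arithmetic, neither one depends on searching or inverting a sampled curve.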
A simple matrix transform can be made for the Nikon via the Adobe matrices in dcraw. The ACES transforms are large due to the 3D LUTs.
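As a rough sketch of the matrix route, assuming the dcraw table entry is nine integers describing an XYZ-to-camera matrix scaled by 10000: divide by 10000 and invert to get camera-to-XYZ. The coefficients below are placeholders, not a real camera's values; substitute the row for your Nikon body from dcraw. The helper names are hypothetical, and white balance and dcraw's own row normalisation are deliberately left out.

```python
# Sketch only: build a camera -> XYZ matrix from a dcraw-style coefficient row.
# PLACEHOLDER coefficients below; they are NOT taken from any real camera.

def invert_3x3(m):
    """Invert a 3x3 matrix (list of rows) using the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def xyz_to_camera_matrix(coeffs):
    """Nine integers (assumed scaled by 10000) -> 3x3 XYZ -> camera matrix."""
    if len(coeffs) != 9:
        raise ValueError("expected nine coefficients")
    scaled = [value / 10000.0 for value in coeffs]
    return [scaled[0:3], scaled[3:6], scaled[6:9]]


def camera_to_xyz_matrix(coeffs):
    """Invert the XYZ -> camera matrix to go camera -> XYZ."""
    return invert_3x3(xyz_to_camera_matrix(coeffs))


def apply_matrix(m, vec):
    """Multiply a 3x3 matrix by a 3-component vector."""
    return [sum(m[row][col] * vec[col] for col in range(3)) for row in range(3)]


# Placeholder row; replace with the nine integers for the actual camera.
PLACEHOLDER_COEFFS = (8000, -2000, -500, -8000, 16000, 2000, -2000, 2000, 8000)

if __name__ == "__main__":
    cam_to_xyz = camera_to_xyz_matrix(PLACEHOLDER_COEFFS)
    for row in cam_to_xyz:
        print(row)
```

The resulting nine floats are what a plain matrix transform needs, which is far lighter than shipping a 3D LUT.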