Towards optimized tensor code generation for deep learning on Sunway many-core processor