Merge uploaded image and audio inputs into one video:
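
As a minimal sketch of one way to do this, assuming the `ffmpeg` command-line tool is installed and on `PATH`, the still image can be looped for the length of the audio track. The helper names `build_ffmpeg_command` and `merge_image_and_audio`, the file names, and the codec choices (H.264 video, AAC audio) are illustrative assumptions, not part of the original instructions.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(image_path, audio_path, output_path):
    """Build an ffmpeg argument list that loops a still image over an audio track.

    Hypothetical helper: the codec and pixel-format choices are assumptions.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-loop", "1",            # repeat the single image as a video stream
        "-i", str(image_path),
        "-i", str(audio_path),
        "-c:v", "libx264",       # assumed video codec
        "-tune", "stillimage",   # x264 tuning for static content
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-c:a", "aac",           # assumed audio codec
        "-shortest",             # stop when the audio (the finite input) ends
        str(output_path),
    ]


def merge_image_and_audio(image_path, audio_path, output_path="output.mp4"):
    """Run ffmpeg to produce one video from an image and an audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_ffmpeg_command(image_path, audio_path, output_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(output_path)
```

Calling `merge_image_and_audio("cover.png", "voice.mp3")` would then write `output.mp4` in the current directory. Building the argument list separately keeps the command easy to inspect before anything is executed.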
