- https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html
-
If the model is very large, it may require model sharding because the Qualcomm DSP is a 32-bit system with a 4 GB size limit.
- For example, Llama 3 8B must be sharded into 4 parts, but ExecuTorch still packages them into a single PTE file.
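The sharding idea above can be sketched in a few lines: because the 32-bit DSP caps each compiled graph at 4 GB, a model's layers must be grouped into sub-graphs that each fit under the limit. This is an illustrative sketch only, not the ExecuTorch implementation; the function name `shard_layers` and the per-layer size estimate are assumptions for the example.

```python
# Minimal sketch of model sharding for a 32-bit DSP with a 4 GB limit.
# shard_layers is a hypothetical helper, not part of the ExecuTorch API.

def shard_layers(layer_sizes, limit_bytes):
    """Greedily group consecutive layers so each shard stays under limit_bytes."""
    shards, current, current_size = [], [], 0
    for i, size in enumerate(layer_sizes):
        if current and current_size + size > limit_bytes:
            shards.append(current)       # current shard is full; start a new one
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        shards.append(current)
    return shards

# Rough illustration: 32 transformer blocks at ~0.5 GB each (fp16) → 4 shards
GB = 1 << 30
sizes = [GB // 2] * 32
shards = shard_layers(sizes, 4 * GB)
print(len(shards))  # → 4
```

Each shard is compiled as its own QNN graph, but ExecuTorch packages all of them into the single PTE file mentioned above.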
Passes or transformations
- See Preprocessing for the definition.
Quantization
- The QNN backend currently supports exporting to these data types: fp32; int4/int8 with PTQ; int4 with SpinQuant (Llama 3 only).
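To make the PTQ option above concrete, here is a hedged sketch of the core idea behind int8 post-training quantization: derive a per-tensor scale from the observed value range, then round floats to integers. This illustrates the math only; the QNN backend performs quantization through PyTorch's quantization flow, not code like this, and the function names here are invented for the example.

```python
# Illustrative symmetric per-tensor int8 PTQ (not the QNN backend's code).

def quantize_int8(values):
    """Return (scale, int8 values) using a symmetric per-tensor scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return scale, q

def dequantize(scale, q):
    """Map int8 values back to approximate floats."""
    return [scale * v for v in q]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, q = quantize_int8(weights)
approx = dequantize(scale, q)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(w - a) <= scale / 2 for w, a in zip(weights, approx))
```

int4 halves the bit width again (16 levels), which is why accuracy-preserving schemes such as SpinQuant are needed at that precision.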