Model Poisoning

Model poisoning can occur during fine-tuning, post-training modification, or distribution through public registries. Affected models can pass standard evaluations and only misbehave under attacker-defined triggers.

Model poisoning vectors include:

  • Compromised weights: Modified .safetensors or checkpoint files on public registries.
  • Backdoored fine-tuning: Targeted training runs that embed hidden behavior.
  • Distillation attacks: Poisoning that propagates from a teacher to a student model.
  • Federated poisoning: Malicious participants in federated learning runs.
  • Post-training edits: Manipulating models in transit or at deployment time.

Model poisoning is often discovered only after deployment, sometimes long after, so prevention has to be proactive. Provenance verification, isolated evaluation, and adversarial probing during pre-deployment review are the most reliable controls.
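As a rough illustration of the provenance-verification step, the sketch below checks a downloaded weights file against a SHA-256 digest pinned ahead of time. It assumes the publisher provides a trusted digest through a separate channel; the function names and the file name in the usage note are hypothetical, and a matching digest only shows the file is the one that was pinned, not that the pinned model is free of backdoors.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file so large checkpoints are not loaded into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, pinned_digest):
    """Return True only if the file's SHA-256 matches the pinned digest.

    pinned_digest is assumed to be a hex string obtained from a trusted
    source (hypothetical here), not from the same place as the file itself.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, pinned_digest.strip().lower())
```

A pipeline might call verify_artifact("model.safetensors", pinned_digest) before loading anything and refuse to continue when it returns False.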

Programs that mature fastest build a trust baseline for each foundation model in use, so that deviations under specific triggers are easier to spot during evaluation.
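One way to picture a trust baseline is as a record of how often a trusted reference model changes its answer when a candidate trigger is appended to a prompt. The sketch below is a simplified assumption rather than a description of any particular product: the models are plain callables that map text to a label, and flip_rate, flag_triggers, and the tolerance value are hypothetical names and settings chosen for illustration.

```python
def flip_rate(predict, prompts, trigger):
    """Fraction of prompts whose prediction changes when the trigger is appended."""
    prompts = list(prompts)
    if not prompts:
        return 0.0
    flips = sum(1 for p in prompts if predict(p) != predict(f"{p} {trigger}"))
    return flips / len(prompts)


def flag_triggers(baseline_rates, candidate, prompts, triggers, tolerance=0.2):
    """Return triggers whose flip rate on the candidate exceeds the baseline.

    baseline_rates maps each trigger to the flip rate recorded earlier on a
    trusted reference model; triggers missing from it are treated as 0.0.
    """
    prompts = list(prompts)
    flagged = []
    for trigger in triggers:
        rate = flip_rate(candidate, prompts, trigger)
        if rate - baseline_rates.get(trigger, 0.0) > tolerance:
            flagged.append(trigger)
    return flagged
```

In practice the probe prompts and the list of candidate triggers do most of the work: a check like this only surfaces triggers that someone thought to test, so it complements adversarial probing rather than replacing it.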

Programs that mature fastest also share scanning signals with peer organizations and registries, because supply chain defense scales when discovery is collective rather than siloed.

How PointGuard AI Helps

PointGuard's AI Supply Chain Security scans models for backdoor patterns and validates provenance, and AI Red Teaming probes models with trigger-based adversarial inputs before they reach production. The combination shifts model risk discovery left, surfacing issues before models reach customers or regulators.

