Synthetic human-like fakes: Difference between revisions

→‎Digital sound-alikes: various improvements
(moved the Jan 2021 $35 mln fraud case againt with a digital sound-alike to == Digital sound-alikes ==, added subheadings for the 2019 and 2021 fraud cases and transcluded it with {{#lst:Synthetic human-like fakes|2021 digital sound-alike enabled fraud}} into the timeline)
(→‎Digital sound-alikes: various improvements)
Line 109: Line 109:


== Digital sound-alikes ==
[[File:Helsingin-Sanomat-2012-David-Martin-Howard-of-University-of-York-on-apporaching-digital-sound-alikes.jpg|right|thumb|338px|A picture of a cut-away titled "''Voice-terrorist could mimic a leader''" from a 2012 [[w:Helsingin Sanomat]] warning that the sound-like-anyone machines are approaching. Thank you to homie [https://pure.york.ac.uk/portal/en/researchers/david-martin-howard(ecfa9e9e-1290-464f-981a-0c70a534609e).html Prof. David Martin Howard] of the [[w:University of York]], UK and the anonymous editor for the heads-up.]]
The first English-speaking digital sound-alikes were introduced in 2016 by Adobe and DeepMind, but neither was made publicly available.
<section begin=GoogleTransferLearning2018 />
Then in '''2018''' at the '''[[w:Conference on Neural Information Processing Systems]]''' (NeurIPS) the work [http://papers.nips.cc/paper/7700-transfer-learning-from-speaker-verification-to-multispeaker-text-to-speech-synthesis 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis'] ([https://arxiv.org/abs/1806.04558 at arXiv.org]) was presented. The pre-trained model is able to steal voices from a sample of only '''5 seconds''' with almost convincing results.
Observe how good the "VCTK p240" system is at deceiving the listener into thinking that a person is doing the talking.
{{#Widget:Iframe - Audio samples from Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis by Google Research}}
The Iframe above is transcluded from [https://google.github.io/tacotron/publications/speaker_adaptation/ 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"' at google.github.io], the audio samples of a sound-like-anyone machine presented at the 2018 [[w:NeurIPS]] conference by Google researchers.
<section end=GoogleTransferLearning2018 />
The [https://www.youtube.com/watch?v=0sR1rU3gLzQ video 'This AI Clones Your Voice After Listening for 5 Seconds' by '2 minute papers' at YouTube] to the right describes the voice-thieving machine presented by Google Research at [[w:NeurIPS|w:NeurIPS]] 2018.
{{#ev:youtube|0sR1rU3gLzQ|640px|right|The [https://www.youtube.com/watch?v=0sR1rU3gLzQ video 'This AI Clones Your Voice After Listening for 5 Seconds' by '2 minute papers' at YouTube] describes the voice-thieving machine presented by Google Research at [[w:NeurIPS|w:NeurIPS]] 2018.}}
=== What should we do about digital sound-alikes? ===


Living people can defend<ref group="footnote" name="judiciary maybe not aware">Whether a suspect can defend against faked synthetic speech that sounds like him/her depends on how up-to-date the judiciary is. If no information and instructions about digital sound-alikes have been given to the judiciary, they likely will not believe the defense of denying that the recording is of the suspect's voice.</ref> themselves against a digital sound-alike by denying the things it says, if those are presented to them, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.


For these reasons the bannable '''raw materials''', i.e. covert voice models, '''[[Law proposals to ban covert modeling|should be prohibited by law]]''' in order to protect humans from abuse by criminal parties.


=== Documented digital sound-alike attacks ===
In 2019, reports of crimes being committed with digital sound-alikes started surfacing. As of Jan 2022, no reports of attacks other than fraud have been found.
==== 2019 digital sound-alike enabled fraud  ====
* Sound-like-anyone technology found its way into the hands of criminals: in '''2019''' [[w:NortonLifeLock|Symantec]] researchers knew of 3 cases where the technology had been used for '''[[w:crime]]'''
Line 148: Line 169:
** [https://www.unite.ai/deepfaked-voice-enabled-35-million-bank-heist-in-2020/ '''''Deepfaked Voice Enabled $35 Million Bank Heist in 2020''''' at unite.ai]<ref group="1st seen in">https://www.reddit.com/r/VocalSynthesis/</ref> reporting updated on 2021-10-15
<section end=2021 digital sound-alike enabled fraud />
<section end=2021 digital sound-alike enabled fraud />
----
=== 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis' 2018 by Google Research (external transclusion) ===
<section begin=GoogleTransferLearning2018 />
* In '''2018''' at the '''[[w:Conference on Neural Information Processing Systems]]''' (NeurIPS) the work [http://papers.nips.cc/paper/7700-transfer-learning-from-speaker-verification-to-multispeaker-text-to-speech-synthesis 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis'] ([https://arxiv.org/abs/1806.04558 at arXiv.org]) was presented. The pre-trained model is able to steal voices from a sample of only '''5 seconds''' with almost convincing results.
Observe how good the "VCTK p240" system is at deceiving the listener into thinking that a person is doing the talking.
{{#Widget:Iframe - Audio samples from Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis by Google Research}}
The Iframe above is transcluded from [https://google.github.io/tacotron/publications/speaker_adaptation/ 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"' at google.github.io], the audio samples of a sound-like-anyone machine presented at the 2018 [[w:NeurIPS]] conference by Google researchers.
<section end=GoogleTransferLearning2018 />
The [https://www.youtube.com/watch?v=0sR1rU3gLzQ video 'This AI Clones Your Voice After Listening for 5 Seconds' by '2 minute papers' at YouTube] to the right describes the voice-thieving machine presented by Google Research at [[w:NeurIPS|w:NeurIPS]] 2018.
{{#ev:youtube|0sR1rU3gLzQ|640px|right|The [https://www.youtube.com/watch?v=0sR1rU3gLzQ video 'This AI Clones Your Voice After Listening for 5 Seconds' by '2 minute papers' at YouTube] describes the voice-thieving machine presented by Google Research at [[w:NeurIPS|w:NeurIPS]] 2018.}}
----


=== Example of a hypothetical 4-victim digital sound-alike attack ===