Synthetic human-like fakes: Difference between revisions
Juho Kunsola (talk | contribs) m (Juho Kunsola moved page Digital look-alikes to Digital look-alikes and sound-alikes: restructuring to join article with digital sound-alike) |
Juho Kunsola (talk | contribs) (copy-pasted content from digital sound-alike and adjusted headings) |
||
Line 28: | Line 28: | ||
When the camera does not exist, but the subject being imaged with a simulation of a (movie) camera deceives the watcher to believe it is some living or dead person it is a '''digital look-alike'''. | When the camera does not exist, but the subject being imaged with a simulation of a (movie) camera deceives the watcher to believe it is some living or dead person it is a '''digital look-alike'''. | ||
= Introduction to digital look-alikes = | = Digital look-alikes = | ||
== Introduction to digital look-alikes == | |||
In the cinemas we have seen digital look-alikes for over 15 years. These digital look-alikes have "clothing" (a simulation of clothing is not clothing) or "superhero costumes" and "superbaddie costumes", and they don't need to care about the laws of physics, let alone laws of physiology. It is generally accepted that digital look-alikes made their public debut in the sequels of The Matrix i.e. [[w:The Matrix Reloaded]] and [[w:The Matrix Revolutions]] released in 2003. It can be considered almost certain, that it was not possible to make these before the year 1999, as the final piece of the puzzle to make a (still) digital look-alike that passes human testing, the [[Glossary#Reflectance capture|reflectance capture]] over the human face, was made for the first time in 1999 at the [[w:University of Southern California]] and was presented to the crème de la crème | In the cinemas we have seen digital look-alikes for over 15 years. These digital look-alikes have "clothing" (a simulation of clothing is not clothing) or "superhero costumes" and "superbaddie costumes", and they don't need to care about the laws of physics, let alone laws of physiology. It is generally accepted that digital look-alikes made their public debut in the sequels of The Matrix i.e. [[w:The Matrix Reloaded]] and [[w:The Matrix Revolutions]] released in 2003. It can be considered almost certain, that it was not possible to make these before the year 1999, as the final piece of the puzzle to make a (still) digital look-alike that passes human testing, the [[Glossary#Reflectance capture|reflectance capture]] over the human face, was made for the first time in 1999 at the [[w:University of Southern California]] and was presented to the crème de la crème | ||
Line 51: | Line 52: | ||
{{Q|Do you think that was [[w:Hugo Weaving|Hugo Weaving]]'s left cheekbone that [[w:Keanu Reeves|Keanu Reeves]] punched in with his right fist?|Trad|The Matrix Revolutions}} | {{Q|Do you think that was [[w:Hugo Weaving|Hugo Weaving]]'s left cheekbone that [[w:Keanu Reeves|Keanu Reeves]] punched in with his right fist?|Trad|The Matrix Revolutions}} | ||
= The problems with digital look-alikes = | == The problems with digital look-alikes == | ||
Extremely unfortunately for the humankind, organized criminal leagues, that posses the '''weapons capability''' of making believable looking '''synthetic pornography''', are producing on industrial production pipelines '''synthetic terror porn'''<ref group="footnote" name="About the term synthetic terror porn">It is terminologically more precise, more inclusive and more useful to talk about 'synthetic terror porn', if we want to talk about things with their real names, than 'synthetic rape porn', because also synthesizing recordings of consentual looking sex scenes can be terroristic in intent.</ref> by animating digital look-alikes and distributing it in the murky Internet in exchange for money stacks that are getting thinner and thinner as time goes by. | Extremely unfortunately for the humankind, organized criminal leagues, that posses the '''weapons capability''' of making believable looking '''synthetic pornography''', are producing on industrial production pipelines '''synthetic terror porn'''<ref group="footnote" name="About the term synthetic terror porn">It is terminologically more precise, more inclusive and more useful to talk about 'synthetic terror porn', if we want to talk about things with their real names, than 'synthetic rape porn', because also synthesizing recordings of consentual looking sex scenes can be terroristic in intent.</ref> by animating digital look-alikes and distributing it in the murky Internet in exchange for money stacks that are getting thinner and thinner as time goes by. | ||
Line 60: | Line 61: | ||
---- | ---- | ||
= Digital sound-alikes = | |||
When it cannot be determined by human testing whether some fake voice is a synthetic fake of some person's voice, or is it an actual recording made of that person's actual real voice, it is a '''digital sound-alike'''. | |||
Living people can defend¹ themselves against digital sound-alike by denying the things the digital sound-alike says if they are presented to the target, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability. | |||
[[File:Spectrogram-19thC.png|thumb|right|640px|A [[w:spectrogram|spectrogram]] of a male voice saying 'nineteenth century']] | |||
---- | |||
== Timeline of digital sound-alikes == | |||
* In '''2016''' [[w:Adobe Inc.]]'s [[w:Adobe Voco|Voco]], an unreleased prototype, was publicly demonstrated in 2016. ([https://www.youtube.com/watch?v=I3l4XLZ59iw&t=5s View and listen to Adobe MAX 2016 presentation of Voco]) | |||
* In '''2016''' [[w:DeepMind]]'s [[w:WaveNet]] owned by [[w:Google]] also demonstrated ability to steal people's voices | |||
* In '''2018''' [[w:Conference on Neural Information Processing Systems|Conference on Neural Information Processing Systems]] the work [http://papers.nips.cc/paper/7700-transfer-learning-from-speaker-verification-to-multispeaker-text-to-speech-synthesis 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis'] ([https://arxiv.org/abs/1806.04558 at arXiv.org]) was presented. The pre-trained model is able to steal voices from a sample of only '''5 seconds''' with almost convincing results | |||
** Listen [https://google.github.io/tacotron/publications/speaker_adaptation/ 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"'] | |||
** View [https://www.youtube.com/watch?v=0sR1rU3gLzQ Video summary of the work at YouTube: 'This AI Clones Your Voice After Listening for 5 Seconds'] | |||
* As of '''2019''' Symantec research knows of 3 cases where digital sound-alike technology '''has been used for crimes'''.<ref name="WaPo2019"> | |||
https://www.washingtonpost.com/technology/2019/09/04/an-artificial-intelligence-first-voice-mimicking-software-reportedly-used-major-theft/</ref> | |||
---- | |||
== Examples of speech synthesis software not quite able to fool a human yet == | |||
Some other contenders to create digital sound-alikes are though, as of 2019, their speech synthesis in most use scenarios does not yet fool a human because the results contain tell tale signs that give it away as a speech synthesizer. | |||
* '''[https://lyrebird.ai/ Lyrebird.ai]''' [https://www.youtube.com/watch?v=xxDBlZu__Xk (listen)] | |||
* '''[https://candyvoice.com/ CandyVoice.com]''' [https://candyvoice.com/demos/voice-conversion (test with your choice of text)] | |||
* '''[https://cstr-edinburgh.github.io/merlin/ Merlin]''', a [[w:neural network]] based speech synthesis system by the Centre for Speech Technology Research at the [[w:University of Edinburgh]] | |||
== Documented digital sound-alike attacks == | |||
* [https://www.washingtonpost.com/technology/2019/09/04/an-artificial-intelligence-first-voice-mimicking-software-reportedly-used-major-theft/?noredirect=on 'An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft'], a 2019 Washington Post article | |||
---- | |||
== Example of a hypothetical digital sound-alike attack == | |||
A very simple example of a digital sound-alike attack is as follows: | |||
Someone puts a digital sound-alike to call somebody's voicemail from an unknown number and to speak for example illegal threats. In this example there are at least two victims: | |||
# Victim #1 - The person whose voice has been stolen into a covert model and a digital sound-alike made from it to frame them for crimes | |||
# Victim #2 - The person to whom the illegal threat is presented in a recorded form by a digital sound-alike that deceptively sounds like victim #1 | |||
# Victim #3 - It could also be viewed that victim #3 is our law enforcement systems as they are put to chase after and interrogate the innocent victim #1 | |||
# Victim #4 - Our judiciary which prosecutes and possibly convicts the innocent victim #1. | |||
Thus it is high time to act and to '''[[Law proposals to ban covert modeling|criminalize the covert modeling of human appearance and voice!]]''' | |||
---- | |||
Footnote 1. Whether a suspect can defend against faked synthetic speech that sounds like him/her depends on how up-to-date the judiciary is. If no information and instructions about digital sound-alikes have been given to the judiciary, they likely will not believe the defense of denying that the recording is of the suspect's voice. | |||
---- | |||
= See also in Stop Synthetic Filth! wiki = | = See also in Stop Synthetic Filth! wiki = | ||
* [[How to protect yourself and others from covert modeling]] | * [[How to protect yourself and others from covert modeling]] |
Revision as of 12:16, 2 April 2020
When the camera does not exist, but the subject being imaged with a simulation of a (movie) camera deceives the watcher to believe it is some living or dead person it is a digital look-alike.
Digital look-alikes
Introduction to digital look-alikes
In the cinemas we have seen digital look-alikes for over 15 years. These digital look-alikes have "clothing" (a simulation of clothing is not clothing) or "superhero costumes" and "superbaddie costumes", and they don't need to care about the laws of physics, let alone laws of physiology. It is generally accepted that digital look-alikes made their public debut in the sequels of The Matrix i.e. w:The Matrix Reloaded and w:The Matrix Revolutions released in 2003. It can be considered almost certain, that it was not possible to make these before the year 1999, as the final piece of the puzzle to make a (still) digital look-alike that passes human testing, the reflectance capture over the human face, was made for the first time in 1999 at the w:University of Southern California and was presented to the crème de la crème of the computer graphics field in their annual gathering SIGGRAPH 2000.[1]
“Do you think that was Hugo Weaving's left cheekbone that Keanu Reeves punched in with his right fist?”
The problems with digital look-alikes
Extremely unfortunately for the humankind, organized criminal leagues, that posses the weapons capability of making believable looking synthetic pornography, are producing on industrial production pipelines synthetic terror porn[footnote 1] by animating digital look-alikes and distributing it in the murky Internet in exchange for money stacks that are getting thinner and thinner as time goes by.
These industrially produced pornographic delusions are causing great humane suffering, especially in their direct victims, but they are also tearing our communities and societies apart, sowing blind rage, perceptions of deepening chaos, feelings of powerlessness and provoke violence. This hate illustration increases and strengthens hate thinking, hate speech, hate crimes and tears our fragile social constructions apart and with time perverts humankind's view of humankind into an almost unrecognizable shape, unless we interfere with resolve.
For these reasons the bannable raw materials i.e. covert models, needed to produce this disinformation terror on the information-industrial production pipelines, should be prohibited by law in order to protect humans from arbitrary abuse by criminal parties.
Digital sound-alikes
When it cannot be determined by human testing whether some fake voice is a synthetic fake of some person's voice, or is it an actual recording made of that person's actual real voice, it is a digital sound-alike.
Living people can defend¹ themselves against digital sound-alike by denying the things the digital sound-alike says if they are presented to the target, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.
Timeline of digital sound-alikes
- In 2016 w:Adobe Inc.'s Voco, an unreleased prototype, was publicly demonstrated in 2016. (View and listen to Adobe MAX 2016 presentation of Voco)
- In 2016 w:DeepMind's w:WaveNet owned by w:Google also demonstrated ability to steal people's voices
- In 2018 Conference on Neural Information Processing Systems the work 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis' (at arXiv.org) was presented. The pre-trained model is able to steal voices from a sample of only 5 seconds with almost convincing results
- As of 2019 Symantec research knows of 3 cases where digital sound-alike technology has been used for crimes.[2]
Examples of speech synthesis software not quite able to fool a human yet
Some other contenders to create digital sound-alikes are though, as of 2019, their speech synthesis in most use scenarios does not yet fool a human because the results contain tell tale signs that give it away as a speech synthesizer.
- Lyrebird.ai (listen)
- CandyVoice.com (test with your choice of text)
- Merlin, a w:neural network based speech synthesis system by the Centre for Speech Technology Research at the w:University of Edinburgh
Documented digital sound-alike attacks
- 'An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft', a 2019 Washington Post article
Example of a hypothetical digital sound-alike attack
A very simple example of a digital sound-alike attack is as follows:
Someone puts a digital sound-alike to call somebody's voicemail from an unknown number and to speak for example illegal threats. In this example there are at least two victims:
- Victim #1 - The person whose voice has been stolen into a covert model and a digital sound-alike made from it to frame them for crimes
- Victim #2 - The person to whom the illegal threat is presented in a recorded form by a digital sound-alike that deceptively sounds like victim #1
- Victim #3 - It could also be viewed that victim #3 is our law enforcement systems as they are put to chase after and interrogate the innocent victim #1
- Victim #4 - Our judiciary which prosecutes and possibly convicts the innocent victim #1.
Thus it is high time to act and to criminalize the covert modeling of human appearance and voice!
Footnote 1. Whether a suspect can defend against faked synthetic speech that sounds like him/her depends on how up-to-date the judiciary is. If no information and instructions about digital sound-alikes have been given to the judiciary, they likely will not believe the defense of denying that the recording is of the suspect's voice.
See also in Stop Synthetic Filth! wiki
Footnotes
- ↑ It is terminologically more precise, more inclusive and more useful to talk about 'synthetic terror porn', if we want to talk about things with their real names, than 'synthetic rape porn', because also synthesizing recordings of consentual looking sex scenes can be terroristic in intent.
References
- ↑ Debevec, Paul (2000). "Acquiring the reflectance field of a human face". Proceedings of the 27th annual conference on Computer graphics and interactive techniques - SIGGRAPH '00. ACM. pp. 145–156. doi:10.1145/344779.344855. ISBN 978-1581132083. Retrieved 2017-05-24.
- ↑ https://www.washingtonpost.com/technology/2019/09/04/an-artificial-intelligence-first-voice-mimicking-software-reportedly-used-major-theft/