Synthetic human-like fakes


[[File:Spectrogram-19thC.png|thumb|right|640px|A [[w:spectrogram|spectrogram]] of a male voice saying 'nineteenth century']]
== Timeline of synthetic human-like fakes ==
=== 1770's ===
[[File:Kempelen Speakingmachine.JPG|right|thumb|300px|A replica of [[w:Wolfgang von Kempelen|Kempelen]]'s [[w:Wolfgang von Kempelen's Speaking Machine|speaking machine]], built 2007–09 at the Department of [[w:Phonetics|Phonetics]], [[w:Saarland University|Saarland University]], [[w:Saarbrücken|Saarbrücken]], Germany. This machine added models of the tongue and lips, enabling it to produce [[w:consonant|consonant]]s as well as [[w:vowel|vowel]]s]]
* '''1779''' | science / discovery | [[w:Christian Gottlieb Kratzenstein]] won the first prize in a competition announced by the [[w:Russian Academy of Sciences]] for '''models''' he built of the '''human [[w:vocal tract]]''' that could produce the five long '''[[w:vowel]]''' sounds.<ref name="Helsinki">
[http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html History and Development of Speech Synthesis], Helsinki University of Technology, Retrieved on November 4, 2006
</ref> (Based on [[w:Speech synthesis#History]])
* '''1791''' | science | '''[[w:Wolfgang von Kempelen's Speaking Machine]]''' of [[w:Wolfgang von Kempelen]] of [[w:Pressburg]], [[w:Hungary]], described in a 1791 paper, was [[w:bellows]]-operated.<ref>''Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine'' ("Mechanism of the human speech with description of its speaking machine", J. B. Degen, Wien).</ref> This machine added models of the tongue and lips, enabling it to produce [[w:consonant]]s as well as [[w:vowel]]s. (Based on [[w:Speech synthesis#History]])
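Kratzenstein's resonators and Kempelen's machine produced vowels by shaping acoustic resonances, the same principle behind modern [[w:formant]] synthesis. Below is a minimal, illustrative sketch, not taken from the cited sources: a pulse train standing in for the glottal source is run through two-pole resonators placed at textbook formant frequencies for the vowel /a/.

<syntaxhighlight lang="python">
# Illustrative formant synthesis of the vowel /a/. The formant frequencies
# and bandwidths are textbook approximations, not values from the sources.
import numpy as np
from scipy.signal import lfilter

fs = 16000                                # sampling rate in Hz
n = fs // 2                               # half a second of samples
source = np.zeros(n)
source[:: fs // 110] = 1.0                # glottal pulse train at ~110 Hz

# Run the source through parallel two-pole resonators, one per formant.
signal = np.zeros(n)
for formant, bandwidth in [(730, 90), (1090, 110), (2440, 170)]:
    r = np.exp(-np.pi * bandwidth / fs)   # pole radius from the bandwidth
    theta = 2 * np.pi * formant / fs      # pole angle from the frequency
    a = [1.0, -2 * r * np.cos(theta), r * r]
    signal += lfilter([1.0], a, source)

signal /= np.max(np.abs(signal))          # normalize to [-1, 1]
</syntaxhighlight>

Played back at 16 kHz, the result is a recognizably /a/-like buzz; Kempelen's machine added a mechanical tongue and lips to modulate such resonances into consonants as well.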
=== 1970's ===
* '''1971''' | science | '''[https://interstices.info/images-de-synthese-palme-de-la-longevite-pour-lombrage-de-gouraud/ 'Images de synthèse : palme de la longévité pour l’ombrage de Gouraud' (still photos)]'''. [[w:Henri Gouraud (computer scientist)]] made the first [[w:Computer graphics]] [[w:geometry]] [[w:digitization]] and representation of a human face. The model was his wife, Sylvie Gouraud. The 3D model was a simple [[w:wire-frame model]], and he applied [[w:Gouraud shading]] to produce the '''first known representation''' of '''human-likeness''' on a computer (see the interpolation sketch below this list).<ref>{{cite web|title=Images de synthèse : palme de la longévité pour l'ombrage de Gouraud|url=http://interstices.info/jcms/c_25256/images-de-synthese-palme-de-la-longevite-pour-lombrage-de-gouraud}}</ref>
* '''1972''' | entertainment | '''[https://vimeo.com/59434349 'A Computer Animated Hand' on Vimeo]'''. [[w:A Computer Animated Hand]] by [[w:Edwin Catmull]] and [[w:Fred Parke]]. Relevancy: This was the '''first time''' that [[w:computer-generated imagery|computer-generated imagery]] was used in film to '''animate''' a moving '''human-like appearance'''.
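Gouraud's technique computes lighting only at the vertices and linearly interpolates the resulting intensities across each polygon, which hides the facets of a coarse wire-frame mesh. A minimal sketch of that interpolation over one triangle follows; the coordinates and vertex intensities are made up for illustration.

<syntaxhighlight lang="python">
# Illustrative sketch of Gouraud shading: a lighting value is computed per
# vertex, then linearly interpolated across the triangle's interior using
# barycentric coordinates.
import numpy as np

def gouraud_shade(p, v0, v1, v2, i0, i1, i2):
    """Interpolate vertex intensities i0, i1, i2 at point p inside
    the 2D triangle (v0, v1, v2)."""
    d = (v1[1] - v2[1]) * (v0[0] - v2[0]) + (v2[0] - v1[0]) * (v0[1] - v2[1])
    w0 = ((v1[1] - v2[1]) * (p[0] - v2[0]) + (v2[0] - v1[0]) * (p[1] - v2[1])) / d
    w1 = ((v2[1] - v0[1]) * (p[0] - v2[0]) + (v0[0] - v2[0]) * (p[1] - v2[1])) / d
    w2 = 1.0 - w0 - w1
    return w0 * i0 + w1 * i1 + w2 * i2

# A hypothetical screen-space triangle whose vertex intensities came from
# evaluating some lighting model at the three vertices only.
v0, v1, v2 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(gouraud_shade(np.array([0.25, 0.25]), v0, v1, v2, 0.2, 0.9, 0.5))  # 0.45
</syntaxhighlight>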
=== 1990's ===
[[File:BSSDF01_400.svg|thumb|left|300px|Traditional [[w:Bidirectional reflectance distribution function|BRDF]] vs. [[w:subsurface scattering|subsurface scattering]] inclusive BSSRDF i.e. [[w:Bidirectional scattering distribution function#Overview of the BxDF functions|Bidirectional scattering-surface reflectance distribution function]]. An analytical BRDF must take into account the subsurface scattering, or the end result '''will not pass human testing'''.]]
* <font color="red">'''1999'''</font> | <font color="red">'''science'''</font> | '''[http://dl.acm.org/citation.cfm?id=344855 'Acquiring the reflectance field of a human face' paper at dl.acm.org]''' [[w:Paul Debevec]] et al. of the [[w:University of Southern California]] did the '''first known reflectance capture''' of '''the human face''' with their extremely simple [[w:light stage]]. They presented their method and results at [[w:SIGGRAPH]] 2000. The scientific breakthrough required isolating the [[w:subsurface scattering|subsurface light component]] (which makes the simulation models appear to glow slightly from within), exploiting the fact that light reflected from the oil-to-air layer retains its [[w:Polarization (waves)|polarization]] while the subsurface light loses it. So, equipped only with a movable light source, a movable video camera, 2 polarizers and a computer program doing extremely simple math, the last piece required to reach photorealism was acquired.<ref name="Deb2000"/>
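The "extremely simple math" reduces to per-pixel arithmetic on two polarized photographs: with the camera's polarizer parallel to the light source's, a pixel records the polarization-preserving specular reflection plus half of the depolarized subsurface light; with the polarizer crossed, only the other half of the subsurface light remains. A minimal sketch of the separation under that assumption follows; the file names are hypothetical, and the actual paper captures a whole light stage of such measurements.

<syntaxhighlight lang="python">
# Illustrative per-pixel separation of specular and subsurface reflection
# from a pair of polarized photographs. Assumption, per the prose above:
# specular reflection off the skin's oil-to-air layer preserves the light's
# polarization, while subsurface-scattered light comes back depolarized.
import numpy as np

# Hypothetical inputs: two linear-intensity images of the same face, one shot
# with the camera's polarizer parallel to the light source's, one crossed.
parallel = np.load("face_parallel.npy")  # specular + half of the diffuse light
crossed = np.load("face_crossed.npy")    # only half of the diffuse light

subsurface = 2.0 * crossed                          # full depolarized component
specular = np.clip(parallel - crossed, 0.0, None)   # what only 'parallel' saw
</syntaxhighlight>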
=== 2010's ===
* '''2013''' | demonstration | '''[https://ict.usc.edu/pubs/Scanning%20and%20Printing%20a%203D%20Portrait%20of%20President%20Barack%20Obama.pdf 'Scanning and Printing a 3D Portrait of President Barack Obama' at ict.usc.edu]'''. A 7D model and a 3D bust were made of President Obama with his consent. <font color="green">'''Relevancy: certain'''</font>
* '''2016''' | science | '''[http://www.niessnerlab.org/projects/thies2016face.html 'Face2Face: Real-time Face Capture and Reenactment of RGB Videos' at Niessnerlab.org]'''. A paper (with videos) on semi-real-time 2D video manipulation with gesture forcing and [[w:lip sync]] forcing synthesis by Thies et al., Stanford. <font color="green">'''Relevancy: certain'''</font>
[[File:Adobe Corporate Logo.png|thumb|right|300px|[[w:Adobe Inc.]]'s logo. We can thank Adobe for publicly demonstrating their sound-like-anyone machine in '''2016''' before an implementation was sold to criminal organizations.]]
* '''<font color="red">2016</font>''' | <font color="red">science</font> and demonstration | '''[[w:Adobe Inc.]]''' publicly demonstrated '''[[w:Adobe Voco]]''', a '''sound-like-anyone machine''', in [https://www.youtube.com/watch?v=I3l4XLZ59iw '#VoCo. Adobe Audio Manipulator Sneak Peak with Jordan Peele | Adobe Creative Cloud' on Youtube]. The original Adobe Voco required '''20 minutes''' of sample audio '''to steal a voice'''. <font color="green">'''Relevancy: certain'''</font>
* '''2017''' | science | '''[http://grail.cs.washington.edu/projects/AudioToObama/ 'Synthesizing Obama: Learning Lip Sync from Audio' at grail.cs.washington.edu]'''. At SIGGRAPH 2017, Supasorn Suwajanakorn et al. of the [[w:University of Washington]] presented an audio-driven digital look-alike of the upper torso of Barack Obama. After a training phase that acquired [[w:lip sync]] and wider facial information from [[w:training material]] consisting of 2D videos with audio, the animation was driven only by a voice track as source data.<ref name="Suw2017">{{Citation
| last = Suwajanakorn | first = Supasorn
| last2 = Seitz | first2 = Steven
| last3 = Kemelmacher-Shlizerman | first3 = Ira
| title = Synthesizing Obama: Learning Lip Sync from Audio
| publisher = [[University of Washington]]
| year = 2017
| url = http://grail.cs.washington.edu/projects/AudioToObama/
| access-date = 2020-06-26 }}
</ref> <font color="green">'''Relevancy: certain'''</font>
[[File:GoogleLogoSept12015.png|thumb|right|300px|[[w:Google|Google]]'s logo. Google Research demonstrated their '''[https://google.github.io/tacotron/publications/speaker_adaptation/ sound-like-anyone machine]''' at the '''2018''' [[w:Conference on Neural Information Processing Systems|Conference on Neural Information Processing Systems]] (NeurIPS). It requires only a 5-second sample to steal a voice.]]
* '''<font color="red">2018</font>''' | <font color="red">science</font> and <font color="red">demonstration</font> | The work [http://papers.nips.cc/paper/7700-transfer-learning-from-speaker-verification-to-multispeaker-text-to-speech-synthesis 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis'] ([https://arxiv.org/abs/1806.04558 at arXiv.org]) was presented at the 2018 [[w:Conference on Neural Information Processing Systems]] (NeurIPS). The pre-trained model can steal a voice from a sample of only '''5 seconds''' with almost convincing results (see the structural sketch after this list).
* '''2019''' | demonstration | '''[https://www.thispersondoesnotexist.com/ 'Thispersondoesnotexist.com']''' (since February 2019) by Philip Wang. It showcases a [[w:StyleGAN]] at the task of making an endless stream of pictures that look like no-one in particular, but are eerily human-like. <font color="green">'''Relevancy: certain'''</font>
* '''2019''' | demonstration | '''[http://whichfaceisreal.com/ 'Which Face is real?' at whichfaceisreal.com]''' is an unnerving game by [http://ctbergstrom.com/ Carl Bergstrom] and [https://jevinwest.org/ Jevin West] where you must '''try to distinguish''' which of a pair of photos '''is real and which is not'''. It is part of the "tools" of the [https://callingbullshit.org/ Calling Bullshit] course taught at the [[w:University of Washington]]. <font color="green">'''Relevancy: certain'''</font>
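For the 2018 NeurIPS work above, the 5-second figure follows from the architecture: a speaker-verification network distills a short reference clip into a fixed-size embedding, and a separately trained synthesizer and vocoder consume that embedding to speak arbitrary text in the target voice. Below is a structural sketch of that data flow, with all three networks stubbed out as placeholders; none of this is the authors' code.

<syntaxhighlight lang="python">
# Structural sketch of the SV2TTS pipeline from the 2018 NeurIPS paper.
# The three networks are placeholders; only the data flow between them
# reflects the paper.
import numpy as np

def speaker_encoder(reference_audio: np.ndarray) -> np.ndarray:
    """Speaker-verification network: maps ~5 s of reference speech
    to a fixed-size speaker embedding (a d-vector)."""
    return np.zeros(256)                 # placeholder embedding

def synthesizer(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    """Tacotron-style network: maps text plus the speaker embedding
    to a mel spectrogram in the target speaker's voice."""
    return np.zeros((80, 200))           # placeholder mel spectrogram

def vocoder(mel: np.ndarray) -> np.ndarray:
    """WaveNet-style network: turns the mel spectrogram into a waveform."""
    return np.zeros(16000)               # placeholder audio samples

# The point of the architecture: the target's voice enters only as a short
# reference clip, while the text to be spoken can be anything at all.
embedding = speaker_encoder(np.zeros(5 * 16000))   # ~5 seconds at 16 kHz
waveform = vocoder(synthesizer("arbitrary words", embedding))
</syntaxhighlight>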
----


== Media perhaps about synthetic human-like fakes ==


*'''1st century''' | scripture | '''[[w:Book of Revelation]]'''. The task of writing down and smuggling out this early warning of what is to come is given by God to his servant John, who was imprisoned on the island of [[w:Patmos]].  See [[Biblical explanation - The books of Daniel and Revelations#Revelation 13|Biblical explanation - The books of Daniel and Revelations § Revelation 13]]. '''Caution''' to reader: contains '''explicit''' written information about the beasts.


=== 1980's ===
* '''1998''' |  music video | '''[https://www.youtube.com/watch?v=FC-Kos_b1sE 'The Dope Show' by Marilyn Manson (lyric video) on Youtube]''' [https://www.youtube.com/watch?v=5R682M3ZEyk (official music video)] by [[w:Marilyn Manson (band)]] from the album [[w:Mechanical Animals]]. Relevancy: '''lyrics'''




=== 2000's ===


* '''2013''' | music video | '''[https://www.youtube.com/watch?v=ZWrUEsVrdSU 'Before Your Very Eyes' by Atoms For Peace (official music video) on Youtube]''' by [[w:Atoms for Peace (band)]] from their album [[w:Amok (Atoms for Peace album)]]. Relevancy: Watch the video


* '''2016''' | music video |'''[https://www.youtube.com/watch?v=ElvLZMsYXlo 'Voodoo In My Blood' (official music video) by Massive Attack on Youtube]''' by [[w:Massive Attack]] and featuring [[w:Tricky]] from the album [[w:Ritual Spirit]]. Relevancy: '''How many machines''' can you see in the same frame at times? If you answered one, look harder and make a more educated guess.


* '''2016''' | movie | '''[[w:Rogue One]]''' is a Star Wars film for which digital look-alikes of actors [[w:Peter Cushing]] and [[w:Carrie Fisher]] were made. In the film their appearance would appear to be of same age as the actors were during the filming of the original 1977 ''[[w:Star Wars (film)]]'' film.


* '''2018''' | music video | '''[https://www.youtube.com/watch?v=X8f5RgwY8CI&list=PLxKHVMqMZqUTgHYRSXfZN_JjItBzVTCau 'Simulation Theory' album by Muse on Youtube]''' by [[w:Muse (band)]] from the [[w:Simulation Theory (album)]]. '''Obs.''' "The Pause," "Watch What I Do" and "The Interlude" are not part of the album. Relevancy: Whole album


=== 2020's ===