Synthetic human-like fakes

From Stop Synthetic Filth! wiki

When the camera does not exist, but the subject imaged with a simulation of a (movie) camera deceives the viewer into believing it is some living or dead person, it is a digital look-alike.

When neither human testing nor media forensics can determine whether a voice is a synthetic fake of some person's voice or an actual recording of that person's real voice, it is a pre-recorded digital sound-alike.


Image 2 (low resolution rip)
(1) Sculpting a morphable model to one single picture
(2) Produces 3D approximation
(4) Texture capture
(3) The 3D model is rendered back to the image with weight gain
(5) With weight loss
(6) Looking annoyed
(7) Forced to smile
Image 2 by Blanz and Vetter – Copyright ACM 1999 – http://dl.acm.org/citation.cfm?doid=311535.311556 – Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
See Biblical explanation - The books of Daniel and Revelation for the advance warning for our time that we were given in the 6th century BC and then again in the 1st century.

'Saint John on Patmos' depicts w:John of Patmos on w:Patmos writing down the visions that became the w:Book of Revelation. Picture from folio 17 of the w:Très Riches Heures du Duc de Berry (1412-1416) by the w:Limbourg brothers. Currently located at the w:Musée Condé, 40 km north of Paris, France.

Digital look-alikes

It is recommended that you watch In Event of Moon Disaster - FULL FILM (2020) at the moondisaster.org project website (where it has interactive portions) by the Center for Advanced Virtuality of the w:MIT.


Introduction to digital look-alikes

Image 1: Separating specular and diffuse reflected light

(a) Normal image in dot lighting

(b) Image of the diffuse reflection, which is caught by placing a vertical polarizer in front of the light source and a horizontal one in front of the camera

(c) Image of the highlight specular reflection, which is caught by placing both polarizers vertically

(d) Subtraction of b from c, which yields the specular component

Images are scaled to appear to have the same luminosity.

Original image by Debevec et al. – Copyright ACM 2000 – https://dl.acm.org/citation.cfm?doid=311779.344855 – Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
Subtracting the image of the diffuse reflection from the image of the highlight specular reflection yields the specular component of the model's reflectance.

Original picture by w:Paul Debevec et al. - Copyright ACM 2000 https://dl.acm.org/citation.cfm?doid=311779.344855
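
The subtraction described in the caption of Image 1 can be sketched in a few lines of Python. This is only an illustration, not the method of Debevec et al.: it assumes that the image taken with both polarizers vertical contains diffuse plus specular light, that the cross-polarized image contains only diffuse light, and that both images are already aligned and equally scaled. The function name specular_component and the 2x2 pixel values are made up for the example.

```python
def specular_component(parallel_image, cross_image):
    """Subtract the cross-polarized image from the parallel-polarized one, pixel by pixel.

    Both images are nested lists of grayscale intensities with the same shape.
    """
    if len(parallel_image) != len(cross_image):
        raise ValueError("images must have the same number of rows")
    result = []
    for parallel_row, cross_row in zip(parallel_image, cross_image):
        if len(parallel_row) != len(cross_row):
            raise ValueError("rows must have the same length")
        # Clamp at zero: sensor noise can make the difference slightly negative.
        result.append([max(p - c, 0.0) for p, c in zip(parallel_row, cross_row)])
    return result


# Made-up 2x2 grayscale intensities in [0, 1], for illustration only.
cross = [[0.25, 0.25], [0.5, 0.125]]       # like image (b): diffuse only (assumed)
parallel = [[0.25, 0.75], [0.625, 0.125]]  # like image (c): diffuse plus highlight (assumed)

print(specular_component(parallel, cross))  # [[0.0, 0.5], [0.125, 0.0]]
```

Pixels where the two images agree come out as zero, so only the highlights survive the subtraction.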

In the cinemas we have seen digital look-alikes for over 15 years. These digital look-alikes have "clothing" (a simulation of clothing is not clothing) or "superhero costumes" and "superbaddie costumes", and they don't need to care about the laws of physics, let alone the laws of physiology. It is generally accepted that digital look-alikes made their public debut in the sequels of The Matrix, i.e. w:The Matrix Reloaded and w:The Matrix Revolutions, released in 2003. It can be considered almost certain that it was not possible to make these before the year 1999, as the final piece of the puzzle for making a (still) digital look-alike that passes human testing, the reflectance capture over the human face, was made for the first time in 1999 at the w:University of Southern California and was presented to the crème de la crème of the computer graphics field at their annual gathering, SIGGRAPH 2000.[1]


“Do you think that was w:Hugo Weaving's left cheekbone that w:Keanu Reeves punched in with his right fist?”

~ Trad on The Matrix Revolutions



The problems with digital look-alikes

Extremely unfortunately for humankind, organized criminal leagues that possess the weapons capability of making believable-looking synthetic pornography are producing synthetic terror porn[footnote 1] on industrial production pipelines by animating digital look-alikes, and are distributing it on the murky Internet in exchange for money stacks that are getting thinner and thinner as time goes by.

These industrially produced pornographic delusions are causing great human suffering, especially in their direct victims, but they are also tearing our communities and societies apart, sowing blind rage, perceptions of deepening chaos and feelings of powerlessness, and provoking violence. This hate illustration increases and strengthens hate thinking, hate speech and hate crimes, tears our fragile social constructions apart and, with time, perverts humankind's view of humankind into an almost unrecognizable shape, unless we interfere with resolve.

List of possible naked digital look-alike attacks

  • The classic "portrayal of as if in involuntary sex"-attack. (Digital look-alike "cries")
  • "Sexual preference alteration"-attack. (Digital look-alike "smiles")
  • "Cutting / beating"-attack (Constructs a deceptive history for genuine scars)
  • "Mutilation"-attack (Digital look-alike "dies")
  • "Unconscious and injected"-attack (Digital look-alike gets "disease")

Age analysis and rejuvenating and aging syntheses

Temporal limit of digital look-alikes

A picture of the 1895 w:Cinematograph

w:History of film technology has information about where the border is.

Digital look-alikes cannot be used to attack people who existed before the technological invention of film. For moving pictures the breakthrough is attributed to w:Auguste and Louis Lumière's w:Cinematograph premiered in Paris on 28 December 1895, though this was only the commercial and popular breakthrough, as even earlier moving pictures exist. (adapted from w:History of film)

The w:Kinetoscope is an even earlier motion picture exhibition device. A prototype for the Kinetoscope was shown to a convention of the National Federation of Women's Clubs on May 20, 1891.[2] The first public demonstration of the Kinetoscope was held at the Brooklyn Institute of Arts and Sciences on May 9, 1893. (Wikipedia)[2]



Digital sound-alikes

A picture of a cut-away titled "Voice-terrorist could mimic a leader" from a 2012 w:Helsingin Sanomat warning that the sound-like-anyone machines are approaching. Thank you to homie Prof. David Martin Howard of the w:University of York, UK and the anonymous editor for the heads-up.

Living people can defend[footnote 2] themselves against a digital sound-alike by denying the things it says, if these are presented to the target, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.

For these reasons the bannable raw materials i.e. covert voice models should be prohibited by law in order to protect humans from abuse by criminal parties.

Documented digital sound-alike attacks


'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis' 2018 by Google Research (external transclusion)

Observe how good the "VCTK p240" system is at deceiving the listener into thinking that a person is doing the talking.

The Iframe above is transcluded from 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"' at google.github.io, the audio samples of a sound-like-anyone machine presented at the 2018 w:NeurIPS conference by Google researchers.

Digital sing-alikes

The video 'This AI Clones Your Voice After Listening for 5 Seconds' by '2 minute papers' at YouTube describes the voice thieving machine presented by Google Research at w:NeurIPS 2018.

As of 2020 the digital sing-alikes may not yet be here, but when we hear a faked singing voice and cannot hear that it is fake, then we will know. An ability to sing does not seem to add many hostile capabilities compared to the ability to thieve the spoken word.



Example of a hypothetical 4-victim digital sound-alike attack

A very simple example of a digital sound-alike attack is as follows:

Someone has a digital sound-alike call somebody's voicemail from an unknown number and speak, for example, illegal threats. In this example there are at least two victims, and arguably four:

  1. Victim #1 - The person whose voice has been stolen into a covert model and a digital sound-alike made from it to frame them for crimes
  2. Victim #2 - The person to whom the illegal threat is presented in a recorded form by a digital sound-alike that deceptively sounds like victim #1
  3. Victim #3 - It could also be viewed that victim #3 is our law enforcement systems as they are put to chase after and interrogate the innocent victim #1
  4. Victim #4 - Our judiciary which prosecutes and possibly convicts the innocent victim #1.

Thus it is high time to act and to criminalize the covert modeling of human voice!

Examples of speech synthesis software not quite able to fool a human yet

There are some other contenders to create digital sound-alikes, though as of 2019 their speech synthesis in most use scenarios does not yet fool a human, because the results contain telltale signs that give it away as a speech synthesizer.

Reporting on the sound-like-anyone-machines

Temporal limit of digital sound-alikes

w:Thomas Edison and his early w:phonograph. Cropped from w:Library of Congress copy, ca. 1877, (probably 18 April 1878)

The temporal limit of whom, dead or living, the digital sound-alikes can attack is defined by the w:history of sound recording.

The article starts by mentioning that the invention of the w:phonograph by w:Thomas Edison in 1877 is considered the start of sound recording.

The phonautograph is the earliest known device for recording w:sound. Previously, tracings had been obtained of the sound-producing vibratory motions of w:tuning forks and other objects by physical contact with them, but not of actual sound waves as they propagated through air or other media. Invented by Frenchman w:Édouard-Léon Scott de Martinville, it was patented on March 25, 1857.[5]

Apparently, it did not occur to anyone before the 1870s that the recordings, called phonautograms, contained enough information about the sound that they could, in theory, be used to recreate it. Because the phonautogram tracing was an insubstantial two-dimensional line, direct physical playback was impossible in any case. Several phonautograms recorded before 1861 were successfully played as sound in 2008 by optically scanning them and using a computer to process the scans into digital audio files. (Wikipedia)

A w:spectrogram of a male voice saying 'nineteenth century'

Text syntheses

w:Chatbots have existed for a long time, but only now, armed with AI, are they becoming more deceptive.

In w:natural language processing, development in w:natural-language understanding leads to more cunning w:natural-language generation AI.

w:OpenAI's w:Generative Pre-trained Transformer (GPT) is a left-to-right w:transformer (machine learning model)-based text generation model, succeeded by w:GPT-2 and w:GPT-3.

Reporting / announcements

External links

Countermeasures against synthetic human-like fakes

Organizations against synthetic human-like fakes

The Defense Advanced Research Projects Agency, better known as w:DARPA, has been active in the field of countering synthetic fake video for longer than the public has been aware that the problems exist.
w:California w:Senator w:Connie Leyva introduced California Senate Bill SB 564 in Feb 2019. It has been endorsed by SAG-AFTRA, but has not yet passed.

Events against synthetic human-like fakes

  • 2018 | w:NIST 'Media Forensics Challenge 2018' at nist.gov was the second annual evaluation to support research and help advance the state of the art for image and video forensics technologies – technologies that determine the region and type of manipulations in imagery (image/video data) and the phylogenic process that modified the imagery.

Studies against synthetic human-like fakes

Search for more

Companies against synthetic human-like fakes


SSF! wiki proposed countermeasure to synthetic porn: Adequate Porn Watcher AI (transcluded)

Transcluded main contents from Adequate Porn Watcher AI (concept)

Adequate Porn Watcher AI (APW_AI) is an w:AI and w:computer vision concept to search for any and all porn that should not exist, by watching and modeling all porn ever found on the w:Internet, thus effectively protecting humans by exposing covert naked digital look-alike attacks and other contraband.

The method and the effect

The method by which APW_AI would provide safety and security to its users is that they can briefly upload a model they have obtained of themselves, and the APW_AI will then either report that nothing matching was found or be of the opinion that something matching was found.

If people are able to check whether there is synthetic porn that looks like themselves, the synthetic hate-illustration industrialists' product loses destructive potential, and the attacks that do happen are less destructive because they are exposed by the APW_AI, which decimates the monetary value of these disinformation weapons to the criminals.

If you feel comfortable leaving your model with the good people at the benefactor for safekeeping, you get alerted and helped if you are ever attacked with synthetic porn.

Rules

Looking up whether matches are found for anyone else's model is forbidden, and this should probably be enforced with a w:biometric w:facial recognition system app that checks that the model you want checked is yours and that you are awake.

Definition of adequacy

An adequate implementation should be nearly free of false positives, very good at finding true positives and able to process more porn than is ever uploaded.
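
One way to make this definition measurable is with precision and recall, the standard ratios for detection systems. The sketch below is only an illustration: the function names, the threshold values and the per-day throughput figures are assumptions made for the example and are not part of the APW_AI concept.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw detection counts."""
    reported = true_pos + false_pos   # everything the system flagged
    actual = true_pos + false_neg     # everything it should have flagged
    precision = true_pos / reported if reported else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def is_adequate(precision, recall, processed_per_day, uploaded_per_day,
                min_precision=0.999, min_recall=0.95):
    """Check the three adequacy conditions; both thresholds are assumed values."""
    nearly_no_false_positives = precision >= min_precision
    finds_true_positives = recall >= min_recall
    keeps_up_with_uploads = processed_per_day > uploaded_per_day
    return nearly_no_false_positives and finds_true_positives and keeps_up_with_uploads


# Hypothetical evaluation run: 95 attacks found, 0 false alarms, 5 attacks missed.
precision, recall = precision_recall(true_pos=95, false_pos=0, false_neg=5)
print(is_adequate(precision, recall, processed_per_day=2000, uploaded_per_day=1000))
```

High precision corresponds to "nearly free of false positives", high recall to "very good at finding true positives", and the throughput comparison to being able to process more than is ever uploaded.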

What about the people in the porn-industry?

People who openly do porn can help by opting-in to help in the development by providing training material and material to test the AI on. People and companies who help in training the AI naturally get credited for their help.

There are of course lots of people-questions to this and those questions need to be identified by professionals of psychology and social sciences.

History

The idea of APW_AI occurred to User:Juho Kunsola on Friday 2019-07-12. Subsequently (the next day) this discovery caused the scrapping of the plea to ban covert modeling of human appearance, as that would have rendered APW_AI legally impossible.


Possible legal response: Outlawing digital sound-alikes (transcluded)

Transcluded from Juho's proposal on banning digital sound-alikes


§1 Covert modeling of a human voice

Acquiring a model of a human's voice that deceptively resembles some dead or living person's voice, as well as the possession, purchase, sale, yielding, import and export of such a model, without the express consent of the target, is punishable.

§2 Application of covert voice models

Producing and making available media from a covert voice model is punishable.

§3 Aggravated application of covert voice models

If the purpose of the produced media is to

  • frame a human target or targets for crimes,
  • attempt extortion or
  • defame the target,

the crime should be judged as aggravated.


Timeline of synthetic human-like fakes

2020's synthetic human-like fakes

  • 2020 | Chinese legislation | On January 1, 2020 a Chinese law requiring that synthetically faked footage bear a clear notice about its fakeness came into effect. Failure to comply could be considered a w:crime, the w:Cyberspace Administration of China stated on its website. China announced this new law in November 2019.[12] The Chinese government seems to be reserving the right to prosecute both users and w:online video platforms failing to abide by the rules.[13]


2010's synthetic human-like fakes

  • 2019 | US state law | On September 1, 2019 the w:Texas senate bill SB 751 w:amendments to the election code came into effect, giving w:candidates in w:elections a 30-day protection period before the elections, during which making and distributing digital look-alikes or synthetic fakes of the candidates is an offense. The law text defines the subject of the law as "a video, created with the intent to deceive, that appears to depict a real person performing an action that did not occur in reality".[14]



  • 2019 | demonstration | 'Thispersondoesnotexist.com' (since February 2019) by Philip Wang. It showcases a w:StyleGAN at the task of making an endless stream of pictures that look like no-one in particular, but are eerily human-like. Relevancy: certain
w:Google's logo. Google Research demonstrated their sound-like-anyone-machine at the 2018 w:Conference on Neural Information Processing Systems (NeurIPS). It requires only 5 seconds of sample to steal a voice.
  • 2018 | controversy / demonstration | The w:deepfakes controversy surfaced, in which porn videos were doctored utilizing w:deep machine learning so that the face of the actress was replaced by the software's opinion of what another person's face would look like in the same pose and lighting.
w:Adobe Inc.'s logo. We can thank Adobe for publicly demonstrating their sound-like-anyone-machine in 2016 before an implementation was sold to criminal organizations.
#w:Adobe Voco. Adobe Audio Manipulator Sneak Peak with w:Jordan Peele (at Youtube.com). November 2016 demonstration of Adobe's unreleased sound-like-anyone-machine, the w:Adobe Voco, at the w:Adobe MAX 2016 event in w:San Diego, w:California. The original Adobe Voco required 20 minutes of sample to thieve a voice.
  • 2013 | demonstration | At the 2013 SIGGRAPH, w:Activision and USC presented a w:real time computing "Digital Ira", a digital face look-alike of Ari Shapiro, an ICT USC research scientist,[23] utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture.[24] The end result, both precomputed and real-time rendered with the most modern game w:GPU, is shown here and looks fairly realistic.

2000's synthetic human-like fakes

  • 2009 | movie | A digital look-alike of a younger w:Arnold Schwarzenegger was made for the movie w:Terminator Salvation though the end result was critiqued as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger.
  • 2009 | demonstration | Paul Debevec: 'Animating a photo-realistic face' at ted.com. Debevec et al. presented new digital likenesses, made by w:Image Metrics, this time of actress w:Emily O'Brien, whose reflectance was captured with the USC light stage 5. At 00:04:59 you can see two clips, one with the real Emily shot with a real camera and one with a digital look-alike of Emily shot with a simulation of a camera. Which is which is difficult to tell. Bruce Lawmen was scanned using USC light stage 6 in a still position and also recorded running there on a w:treadmill. Many, many digital look-alikes of Bruce are seen running fluently and naturally in the ending sequence of the TED talk video.[25] The motion looks fairly convincing compared to the clunky run in the w:Animatrix: Final Flight of the Osiris, which was w:state-of-the-art in 2003 if photorealism was the intention of the w:animators.
Traditional w:BRDF vs. subsurface scattering inclusive BSSRDF i.e. w:Bidirectional scattering-surface reflectance distribution function.

An analytical BRDF must take into account the subsurface scattering, or the end result will not pass human testing.
Music video for Bullet by w:Covenant from 2002. Here you can observe the classic "skin looks like cardboard"-bug that stopped the pre-reflectance capture era versions from passing human testing.
  • 2002 | music video | 'Bullet' by Covenant on Youtube by w:Covenant (band) from their album w:Northern Light (Covenant album). Relevancy: Contains the best upper-torso digital look-alike of Eskil Simonsson (vocalist) that their organization could procure at the time. Here you can observe the classic "skin looks like cardboard"-bug (assuming this was not intended) that thwarted efforts to make digital look-alikes that pass human testing before the reflectance capture and dissection in 1999 by w:Paul Debevec et al. at the w:University of Southern California and the subsequent development of the "Analytical w:BRDF" (quote-unquote) by ESC Entertainment, a company set up for the sole purpose of making the cinematography for the 2003 films Matrix Reloaded and Matrix Revolutions possible, led by George Borshukov.

1990's synthetic human-like fakes

1970's synthetic human-like fakes

w:A Computer Animated Hand is a 1972 short film by w:Edwin Catmull and w:Fred Parke. This was the first time that w:computer-generated imagery was used in film to animate likenesses of moving human appearance.
  • 1976 | movie | w:Futureworld reused parts of A Computer Animated Hand on the big screen.

1770's synthetic human-like fakes

A replica of w:Wolfgang von Kempelen's w:Wolfgang von Kempelen's Speaking Machine, built 2007–09 at the Department of w:Phonetics, w:Saarland University, w:Saarbrücken, Germany. This machine added models of the tongue and lips, enabling it to produce w:consonants as well as w:vowels.

Footnotes

  1. It is terminologically more precise, more inclusive and more useful to talk about 'synthetic terror porn', if we want to talk about things with their real names, than 'synthetic rape porn', because synthesizing recordings of consensual-looking sex scenes can also be terroristic in intent.
  2. Whether a suspect can defend against faked synthetic speech that sounds like him/her depends on how up-to-date the judiciary is. If no information and instructions about digital sound-alikes have been given to the judiciary, they likely will not believe the defense of denying that the recording is of the suspect's voice.

1st seen in

References

  1. Debevec, Paul (2000). "Acquiring the reflectance field of a human face". Proceedings of the 27th annual conference on Computer graphics and interactive techniques - SIGGRAPH '00. ACM. pp. 145–156. doi:10.1145/344779.344855. ISBN 978-1581132083. Retrieved 2020-06-27.
  2. "Inventing Entertainment: The Early Motion Pictures and Sound Recordings of the Edison Companies". Memory.loc.gov. w:Library of Congress. Retrieved 2020-12-09.
  3. "Fake voices 'help cyber-crooks steal cash'". w:bbc.com. w:BBC. 2019-07-08. Retrieved 2020-07-22.
  4. Harwell, Drew (2020-04-16). "An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft". w:washingtonpost.com. w:Washington Post. Retrieved 2019-07-22.
  5. Flatow, Ira (April 4, 2008). "1860 'Phonautograph' Is Earliest Known Recording". NPR. Retrieved 2012-12-09.
  6. https://web.archive.org/web/20160630154819/https://www.darpa.mil/program/media-forensics
  7. https://web.archive.org/web/20191108090036/https://www.darpa.mil/program/semantic-forensics November
  8. https://venturebeat.com/2020/06/12/facebook-detection-challenge-winners-spot-deepfakes-with-82-accuracy/
  9. https://www.partnershiponai.org/aiincidentdatabase/
  10. Johnson, R.J. (2019-12-30). "Here Are the New California Laws Going Into Effect in 2020". KFI. iHeartMedia. Retrieved 2021-01-23.
  11. Mihalcik, Carrie (2019-10-04). "California laws seek to crack down on deepfakes in politics and porn". w:cnet.com. w:CNET. Retrieved 2021-01-23.
  12. "China seeks to root out fake news and deepfakes with new online content rules". w:Reuters.com. w:Reuters. 2019-11-29. Retrieved 2021-01-23.
  13. Statt, Nick (2019-11-29). "China makes it a criminal offense to publish deepfakes or fake news without disclosure". w:The Verge. Retrieved 2021-01-23.
  14. "Relating to the creation of a criminal offense for fabricating a deceptive video with intent to influence the outcome of an election". w:Texas. 2019-06-14. Retrieved 2021-01-23. In this section, "deep fake video" means a video, created with the intent to deceive, that appears to depict a real person performing an action that did not occur in reality
  15. "New state laws go into effect July 1".
  16. "§ 18.2-386.2. Unlawful dissemination or sale of images of another; penalty". w:Virginia. Retrieved 2021-01-23.
  17. "NVIDIA Open-Sources Hyper-Realistic Face Generator StyleGAN". Medium.com. 2019-02-09. Retrieved 2020-07-13.
  18. Harwell, Drew (2018-12-30). "Fake-porn videos are being weaponized to harass and humiliate women: 'Everybody is a potential target'". w:The Washington Post. Retrieved 2020-07-13. In September [of 2018], Google added “involuntary synthetic pornographic imagery” to its ban list
  19. Kuo, Lily (2018-11-09). "World's first AI news anchor unveiled in China". Retrieved 2020-07-13.
  20. Hamilton, Isobel Asher (2018-11-09). "China created what it claims is the first AI news anchor — watch it in action here". Retrieved 2020-07-13.
  21. Suwajanakorn, Supasorn; Seitz, Steven; Kemelmacher-Shlizerman, Ira (2017), Synthesizing Obama: Learning Lip Sync from Audio, University of Washington, retrieved 2020-07-13
  22. Giardina, Carolyn (2015-03-25). "'Furious 7' and How Peter Jackson's Weta Created Digital Paul Walker". The Hollywood Reporter. Retrieved 2020-07-13.
  23. ReForm - Hollywood's Creating Digital Clones (youtube). The Creators Project. 2020-07-13.
  24. Debevec, Paul. "Digital Ira SIGGRAPH 2013 Real-Time Live". Retrieved 2017-07-13.
  25. In this TED talk video at 00:04:59 you can see two clips, one with the real Emily shot with a real camera and one with a digital look-alike of Emily, shot with a simulation of a camera - Which is which is difficult to tell. Bruce Lawmen was scanned using USC light stage 6 in still position and also recorded running there on a w:treadmill. Many, many digital look-alikes of Bruce are seen running fluently and natural looking at the ending sequence of the TED talk video.
  26. Pighin, Frédéric. "Siggraph 2005 Digital Face Cloning Course Notes" (PDF). Retrieved 2020-06-26.
  27. https://ict.usc.edu/about/
  28. "Images de synthèse : palme de la longévité pour l'ombrage de Gouraud".
  29. Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine ("Mechanism of the human speech with description of its speaking machine", J. B. Degen, Wien).
  30. History and Development of Speech Synthesis, Helsinki University of Technology, Retrieved on November 4, 2006