When the camera does not exist, but the subject being imaged with a simulation of a (movie) camera deceives the watcher into believing it is some living or dead person, it is a digital look-alike.
In 2017-2018 this started to be referred to as w:deepfake, even though altering video footage of humans with a computer to deceptive effect is actually some 20 years older than the name "deep fakes" or "deepfakes".[1][2]
When it cannot be determined by human testing or media forensics whether some fake voice is a synthetic fake of some person's voice or an actual recording made of that person's real voice, it is a pre-recorded digital sound-alike. This is now commonly referred to as an w:audio deepfake.
A real-time digital look-and-sound-alike in a video call was used to defraud a substantial amount of money in 2023.[3]
An April 2018 public service announcement in the form of a moving digital look-alike made to appear Obama-like, titled "Obama's appearance thieved". The video is accompanied by an imitator sound-alike and was made by w:Monkeypaw Productions(.com) in conjunction with w:BuzzFeed(.com). The same video can also be viewed at YouTube.com.[4]

Image 2 (low resolution rip) shows a 1999 technique for sculpting a morphable model until it matches the target's appearance: (1) fitting the morphable model to one single picture (2) produces a 3D approximation; (3) texture capture; the 3D model is then rendered back into the image (4) with weight gain, (5) with weight loss, (6) looking annoyed and (7) forced to smile.

Image 2 by Blanz and Vetter – Copyright ACM 1999 – http://dl.acm.org/citation.cfm?doid=311535.311556 – Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
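The core arithmetic of a morphable model, as in the 1999 Blanz and Vetter technique described in the caption above, is a mean face shape plus a weighted sum of principal components; fitting adjusts the weights until the rendered model matches the photograph. A minimal sketch with toy numbers (the arrays and weights below are illustrative, not data from the paper):

```python
import numpy as np

# Hypothetical tiny "morphable model": a face shape is a flat vector of
# vertex coordinates, expressed as the mean shape plus a weighted sum
# of principal components learned from example faces.
mean_shape = np.array([0.0, 1.0, 2.0, 3.0])
components = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])  # 2 components x 4 coords

def synthesize(alpha):
    """Reconstruct a face shape from the component weights alpha."""
    return mean_shape + alpha @ components

# Weights found by fitting would go here; these are arbitrary.
shape = synthesize(np.array([0.5, -1.0]))
```

Changing the weights along a learned direction (e.g. a "weight gain" component) is what produces variations like those in panels (4)-(7).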
Image 1: Separating specular and diffuse reflected light
(a) Normal image in dot lighting
(b) Image of the diffuse reflection, which is caught by placing a vertical polarizer in front of the light source and a horizontal polarizer in front of the camera
(c) Image of the highlight specular reflection which is caught by placing both polarizers vertically
(d) Subtraction of (b) from (c), which yields the specular component
Images are scaled to appear to be of the same luminosity.
Original image by Debevec et al. – Copyright ACM 2000 – https://dl.acm.org/citation.cfm?doid=311779.344855 – Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Subtracting the image of the diffuse reflection from the image containing both reflections yields the specular component of the model's reflectance.
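The subtraction step in the caption above is plain per-pixel arithmetic: the cross-polarized image contains only the diffuse reflection, the parallel-polarized image contains diffuse plus specular, so their difference is the specular part. A minimal sketch (the 2x2 "images" and their values are illustrative, not measured data):

```python
import numpy as np

def specular_component(parallel_img, cross_img):
    """Estimate the specular reflection by subtracting the
    cross-polarized (diffuse-only) image from the parallel-polarized
    (diffuse + specular) image. Small negative values caused by
    sensor noise are clipped to zero."""
    diff = parallel_img.astype(float) - cross_img.astype(float)
    return np.clip(diff, 0.0, None)

# Toy example: one pixel carries a specular highlight.
parallel = np.array([[0.4, 0.9],
                     [0.4, 0.4]])  # diffuse + specular
cross    = np.array([[0.4, 0.4],
                     [0.4, 0.4]])  # diffuse only
spec = specular_component(parallel, cross)
```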
In the cinemas we have seen digital look-alikes for over 20 years. These digital look-alikes have "clothing" (a simulation of clothing is not clothing) or "superhero costumes" and "superbaddie costumes", and they don't need to care about the laws of physics, let alone the laws of physiology. It is generally accepted that digital look-alikes made their public debut in the sequels of w:The Matrix, i.e. w:The Matrix Reloaded and w:The Matrix Revolutions, released in 2003. It is almost certain that these could not have been made before 1999, as the final piece of the puzzle needed to make a (still) digital look-alike that passes human testing, the reflectance capture over the human face, was achieved for the first time in 1999 at the w:University of Southern California and presented to the crème de la crème of the computer graphics field at their annual gathering, SIGGRAPH 2000.[5]
Extremely unfortunately for humankind, organized criminal leagues that possess the weapons capability of making believable-looking synthetic pornography are producing, on industrial pipelines, terroristic synthetic pornography[footnote 1] by animating digital look-alikes and distributing it on the murky Internet in exchange for money stacks that are getting thinner and thinner as time goes by.
These industrially produced pornographic delusions are causing great human suffering, especially to their direct victims, but they are also tearing our communities and societies apart, sowing blind rage and perceptions of deepening chaos, spreading feelings of powerlessness and provoking violence.
This kind of hate illustration increases and strengthens hate feeling, hate thinking, hate speech and hate crimes, tears our fragile social constructions apart and with time perverts humankind's view of humankind into an almost unrecognizable shape, unless we interfere with resolve.
Fixing the problems from digital look-alikes[edit | edit source]
We need to act on three fields: legal, technological and cultural.
Technological: A computer vision system like FacePinPoint.com for seeking out unauthorized pornography / nudes existed from 2017 to 2021 and could be revived if funding is found. It was a service practically identical with the SSFWIKI original concept Adequate Porn Watcher AI (concept).
Legal: Legislators around the planet have been waking up to the reality that not everything that seems to be a video of people is a video of people, and various laws have been passed to protect humans and humanity from the menaces of synthetic human-like fakes, mostly digital look-alikes so far. Hopefully humans will also be protected by law from other aspects of synthetic human-like fakes. See Laws against synthesis and other related crimes.
Age analysis and rejuvenating and aging syntheses[edit | edit source]
Digital look-alikes cannot be used to attack people who existed before the technological invention of film. For moving pictures the breakthrough is attributed to w:Auguste and Louis Lumière's w:Cinematograph premiered in Paris on 28 December 1895, though this was only the commercial and popular breakthrough, as even earlier moving pictures exist. (adapted from w:History of film)
The w:Kinetoscope is an even earlier motion picture exhibition device. A prototype for the Kinetoscope was shown to a convention of the National Federation of Women's Clubs on May 20, 1891.[6] The first public demonstration of the Kinetoscope was held at the Brooklyn Institute of Arts and Sciences on May 9, 1893. (Wikipedia)[6]
The university's foundation has applied for a patent; let us hope that they will w:copyleft the patent, as this protective method needs to be rolled out to protect humanity.
This work was done by PhD student Logan Blue, Kevin Warren, Hadi Abdullah, Cassidy Gibson, Luis Vargas, Jessica O’Dell, Kevin Butler and Professor Patrick Traynor.
On known history of digital sound-alikes[edit | edit source]
A picture of a cut-away titled "Voice-terrorist could mimic a leader" from a 2012 w:Helsingin Sanomat article warning that the sound-like-anyone machines are approaching. Thank you to homie Prof. David Martin Howard of the w:University of York, UK and the anonymous editor for the heads-up.
The first English-speaking digital sound-alikes were introduced in 2016 by Adobe and w:DeepMind, but neither was made publicly available.
The researchers state that Fugatto is a versatile audio synthesis and transformation model capable of following free-form text instructions with optional audio inputs.[9]
Documented crimes with digital sound-alikes[edit | edit source]
In 2019, reports of crimes being committed with digital sound-alikes started surfacing. As of January 2022, no reports of attack types other than fraud have been found.
By 2019 digital sound-alike-anyone technology had found its way into the hands of criminals. In 2019, Symantec researchers knew of 3 cases where digital sound-alike technology had been used for w:crime.[10]
Of these crimes the most publicized was a fraud case in March 2019, in which €220,000 was defrauded with the use of a real-time digital sound-alike.[11] The company that was the victim of this fraud had bought some kind of cyberscam insurance from French insurer w:Euler Hermes, and the case came to light when Mr. Rüdiger Kirsch of Euler Hermes informed w:The Wall Street Journal about it.[12]
Reporting on the 2019 digital sound-alike enabled fraud
In June 2020 fraud was attempted with a poor-quality pre-recorded digital sound-alike delivered via voicemail. (Listen to a redacted clip at soundcloud.com) The recipient in a tech company did not believe the voicemail to be real and alerted the company, which realized that someone had tried to scam them. The company called in Nisos to investigate the issue. Nisos analyzed the evidence and were certain it was a fake, noting it had aspects of a cut-and-paste job to it. Nisos prepared a report titled "The Rise of Synthetic Audio Deepfakes" at nisos.com on the issue and shared it with Motherboard, part of w:Vice (magazine), prior to its release.[14]
The 2nd publicly known fraud done with a digital sound-alike[1st seen in 1] took place on Friday 2021-01-15. A bank in Hong Kong was manipulated into wiring money to numerous bank accounts by using a voice stolen from one of their client company's directors. The criminals managed to defraud $35 million of the U.A.E.-based company's money.[15] This case came to light when Forbes saw a document in which the U.A.E. financial authorities were seeking administrative assistance from the US authorities in recovering a small portion of the defrauded money that had been sent to bank accounts in the USA.[15]
Reporting on the 2021 digital sound-alike enabled fraud
Example of a hypothetical 4-victim digital sound-alike attack[edit | edit source]
A very simple example of a digital sound-alike attack is as follows:
Someone uses a digital sound-alike to call somebody's voicemail from an unknown number and to speak, for example, illegal threats. In this example there are at least four victims:
Victim #1 - The person whose voice has been stolen into a covert model and a digital sound-alike made from it to frame them for crimes
Victim #2 - The person to whom the illegal threat is presented in a recorded form by a digital sound-alike that deceptively sounds like victim #1
Victim #3 - It could also be argued that victim #3 is our law enforcement system, as it is put to chase after and interrogate the innocent victim #1
Victim #4 - Our judiciary which prosecutes and possibly convicts the innocent victim #1.
Examples of speech synthesis software not quite able to fool a human yet[edit | edit source]
Some other contenders to create digital sound-alikes exist, though as of 2019 their speech synthesis in most use scenarios does not yet fool a human, because the results contain tell-tale signs that give them away as coming from a speech synthesizer.
The temporal limit of whom, dead or living, the digital sound-alikes can attack is defined by the w:history of sound recording.
The w:history of sound recording article starts by mentioning that the invention of the w:phonograph by w:Thomas Edison in 1877 is considered the start of sound recording.
The phonautograph is the earliest known device for recording w:sound. Previously, tracings had been obtained of the sound-producing vibratory motions of w:tuning forks and other objects by physical contact with them, but not of actual sound waves as they propagated through air or other media. Invented by Frenchman w:Édouard-Léon Scott de Martinville, it was patented on March 25, 1857.[16]
Apparently, it did not occur to anyone before the 1870s that the recordings, called phonautograms, contained enough information about the sound that they could, in theory, be used to recreate it. Because the phonautogram tracing was an insubstantial two-dimensional line, direct physical playback was impossible in any case. Several phonautograms recorded before 1861 were successfully played as sound in 2008 by optically scanning them and using a computer to process the scans into digital audio files. (Wikipedia)
A w:spectrogram of a male voice saying 'nineteenth century'
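A spectrogram like the one captioned above is computed with a short-time Fourier transform: slice the waveform into overlapping windowed frames and take the magnitude of each frame's FFT. A minimal NumPy sketch (the frame length, hop size and test tone are illustrative choices, not parameters from any particular tool):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: Hann-window overlapping frames of the
    signal and take the magnitude of each frame's real FFT."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# One second of a 440 Hz tone sampled at 8 kHz; its energy should
# concentrate near frequency bin 440 / (8000 / 256) ≈ 14.
fs = 8000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 440 * t))
```

Plotting `S` with time on one axis and frequency bin on the other gives the familiar picture; formant patterns in such plots are what both speaker recognition and voice synthesis systems model.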
What should we do about digital sound-alikes?[edit | edit source]
Living people can defend[footnote 2] themselves against a digital sound-alike by denying the things the digital sound-alike says, if they are presented to the target, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.
For these reasons the bannable raw materials, i.e. covert voice models, should be prohibited by law in order to protect humans from abuse by criminal parties.
AI Text Classifier at platform.openai.com - a fine-tuned GPT model that predicts how likely it is that a piece of text was generated by AI from a variety of sources, such as ChatGPT. (free account required)
ZeroGPT at zerogpt.com[1st seen in 2] - a GPT-4 and ChatGPT detector by ZeroGPT for detecting OpenAI-generated text. (try for free)
Defensively, to hide one's handwriting style from public view
Offensively, to thieve somebody else's handwriting style
If the handwriting-like synthesis passes human and media forensics testing, it is a digital handwrite-alike.
Here we find a possible risk similar to the one that became a reality when w:speaker recognition systems turned out to be instrumental in the development of digital sound-alikes. After the knowledge needed to recognize a speaker was w:transferred into a generative task in 2018 by Google researchers, we can no longer effectively determine for English speakers which recording is of human origin and which is of machine origin.
Calligrapher.ai - Realistic computer-generated handwriting - The user may control parameters: speed, legibility, stroke width and style. The domain is registered to some organization in Iceland and the website offers no about page[1st seen in 4]. According to this reddit post, Calligrapher.ai is based on Graves' 2013 work but "adds an w:inference model to allow for sampling latent style vectors (similar to the VAE model used by SketchRNN)".[18]
Handwriting recognition
w:Handwriting recognition (HWR), also known as Handwritten Text Recognition (HTR), is the ability of a computer to receive and interpret intelligible w:handwritten input (Wikipedia)
As of 2020 digital sing-alikes may not yet be here, but when we hear a faked singing voice and cannot hear that it is fake, we will know they have arrived. An ability to sing does not seem to add much hostile capability compared to the ability to thieve the spoken word.
2023 | Real-time digital look-and-sound-alike crime | In April a man in northern China was defrauded of 4.3 million yuan by a criminal employing a digital look-and-sound-alike pretending to be his friend on a video call made with a stolen messaging service account.[3]
"Ahead of the election in Turkey, President Recep Tayyip Erdogan showed a video linking his main challenger Kemal Kilicdaroglu to the militant Kurdish organization PKK." [...] "Research by DW's fact-checking team in cooperation with DW's Turkish service shows that the video at the campaign rally was manipulated by combining two separate videos with totally different backgrounds and content." reports dw.com
2023 | January 1st | Law | Law on sexual offences in Finland 2023 is found in Chapter 20 of the Finnish Criminal Code titled "Seksuaalirikoksista" ("Sexual offences") and came into effect on Sunday 2023-01-01.[21]
The new law in Finland protects adults against sexual image based abuse be it real or synthetic in origin.
7 § Non-consensual dissemination of a sexual image criminalizes distribution of unauthorized real and synthetic sexual images without permission. (7 § Seksuaalisen kuvan luvaton levittäminen[21])
19 § Distribution of an image depicting a child in a sexual manner[21] criminalizes the distribution of real and synthetic child sexual abuse material (CSAM). Attempting this crime is also punishable. (19 § Lasta seksuaalisesti esittävän kuvan levittäminen[21])
This 2023 upgrade and gathering-together of the Finnish Criminal Code on sexual offences was made upon the initiative of the 2019-2023 w:Marin Cabinet, was voted into law by the w:Members of the Parliament of Finland, 2019–2023, and came into effect on Sunday 2023-01-01.
2022 | science and demonstration | w:OpenAI(.com) published w:ChatGPT, a conversational AI accessible with a free account at chat.openai.com. The initial version was published on 2022-11-30.
2022 | disinformation attack | In June 2022 a fake digital look-and-sound-alike in the appearance and voice of w:Vitali Klitschko, mayor of w:Kyiv, held fake video phone calls with several European mayors. The Germans determined that the video phone call was fake by contacting the Ukrainian officials. This attempt at covert disinformation attack was originally reported by w:Der Spiegel.[22][23]
2022 | science | w:DALL-E 2, a successor designed to generate more realistic images at higher resolutions that "can combine concepts, attributes, and styles" was published in April 2022.[24] (Wikipedia)
2022 | counter-measure | 'Protecting President Zelenskyy against Deep Fakes' at arxiv.org[25] by Matyáš Boháček of Johannes Kepler Gymnasium and w:Hany Farid, the dean and head of w:Berkeley School of Information at the University of California, Berkeley. This brief paper describes their automated digital look-alike detection system and evaluates its efficacy and reliability in comparison to humans with untrained eyes. Their work provides automated evaluation tools to catch so-called "deep fakes", and their motivation seems to have been to find automation armor against disinformation warfare against humans and humanity. Automated digital media forensics is a very good idea explored by many. The Boháček and Farid 2022 detection system works by evaluating both facial mannerisms and gestural mannerisms to tell the non-human ones from those that are human in origin. The preprint was published in February 2022 and submitted to w:arXiv in June 2022.
2021 | crime / fraud | The 2nd publicly known fraud done with a digital sound-alike[1st seen in 1] took place on Friday 2021-01-15. A bank in Hong Kong was manipulated into wiring money to numerous bank accounts by using a voice stolen from one of their client company's directors. The criminals managed to defraud $35 million of the U.A.E.-based company's money.[15] This case came to light when Forbes saw a document in which the U.A.E. financial authorities were seeking administrative assistance from the US authorities in recovering a small portion of the defrauded money that had been sent to bank accounts in the USA.[15]
Reporting on the 2021 digital sound-alike enabled fraud
2021 | science and demonstration | DALL-E, a w:deep learning model developed by w:OpenAI to generate digital images from w:natural language descriptions, called "prompts" was published in January 2021. DALL-E uses a version of w:GPT-3 modified to generate images. (Adapted from Wikipedia)
In Dec 2020 Channel 4 aired a Queen-like fake i.e. they had thieved the appearance of Queen Elizabeth II using deepfake methods.
2020 | Controversy / Public service announcement | Channel 4 thieved the appearance of Queen Elizabeth II using deepfake methods. The product of synthetic human-like fakery originally aired on Channel 4 on 25 December at 15:25 GMT.[29] View in YouTube
2020 | Chinese legislation | On Wednesday, January 1, 2020, a Chinese law requiring that synthetically faked footage bear a clear notice about its fakeness came into effect. Failure to comply could be considered a w:crime, the w:Cyberspace Administration of China (cac.gov.cn) stated on its website. China announced this new law in November 2019.[33] The Chinese government seems to be reserving the right to prosecute both users and w:online video platforms failing to abide by the rules.[34]
For purposes of this subsection, "another person" includes a person whose image was used in creating, adapting, or modifying a videographic or still image with the intent to depict an actual person and who is recognizable as an actual person by the person's w:face, w:likeness, or other distinguishing characteristic.
B. If a person uses w:services of an w:Internet service provider, an electronic mail service provider, or any other information service, system, or access software provider that provides or enables computer access by multiple users to a computer server in committing acts prohibited under this section, such provider shall not be held responsible for violating this section for content provided by another person.
C. Venue for a prosecution under this section may lie in the w:jurisdiction where the unlawful act occurs or where any videographic or still image created by any means whatsoever is produced, reproduced, found, stored, received, or possessed in violation of this section.
D. The provisions of this section shall not preclude prosecution under any other w:statute.[38]
2019 | demonstration | 'Thispersondoesnotexist.com' (since February 2019) by Philip Wang. It showcases a w:StyleGAN at the task of making an endless stream of pictures that look like no-one in particular, but are eerily human-like. Relevancy: certain
2018 | demonstration | At the 2018 w:World Internet Conference in w:Wuzhen the w:Xinhua News Agency presented two digital look-alikes made to the resemblance of its real news anchors Qiu Hao (w:Chinese language)[41] and Zhang Zhao (w:English language). The digital look-alikes were made in conjunction with w:Sogou.[42] Neither the w:speech synthesis used nor the gesturing of the digital look-alike anchors were good enough to deceive the watcher to mistake them for real humans imaged with a TV camera.
2018 | controversy / demonstration | The w:deepfakes controversy surfaced, in which porn videos were doctored utilizing w:deep machine learning so that the face of the actress was replaced by the software's rendition of what another person's face would look like in the same pose and lighting.
2016 | movie | w:Rogue One is a Star Wars film for which digital look-alikes of actors w:Peter Cushing and w:Carrie Fisher were made. In the film their appearance would seem to be of the same age as the actors were during the filming of the original 1977 w:Star Wars (film).
2016 | science / demonstration | w:DeepMind's w:WaveNet, owned by w:Google, also demonstrated the ability to steal people's voices.
w:Adobe Inc.'s logo. We can thank Adobe for publicly demonstrating their sound-like-anyone-machine in 2016 before an implementation was sold to criminal organizations.
{{#ev:youtube|I3l4XLZ59iw|420px|left|#w:Adobe Voco. Adobe Audio Manipulator Sneak Peek with w:Jordan Peele (at Youtube.com). November 2016 demonstration of Adobe's unreleased sound-like-anyone-machine, the w:Adobe Voco, at the w:Adobe MAX 2016 event in w:San Diego, w:California. The original Adobe Voco required 20 minutes of sample audio to thieve a voice.}}
2016 | music video | 'Plug' by Kube at youtube.com - A 2016 music video by w:Kube (rapper) (w:fi:Kube), that shows deepfake-like technology this early. Video was uploaded on 2016-09-15 and is directed by Faruk Nazeri.
2015 | movie | In the w:Furious 7 a digital look-alike made of the actor w:Paul Walker who died in an accident during the filming was done by w:Weta Digital to enable the completion of the film.[44]
2013 | demonstration | At the 2013 SIGGRAPH, w:Activision and USC presented "Digital Ira", a w:real time computing digital face look-alike of Ari Shapiro, an ICT USC research scientist,[45] utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture.[46] The end result, both precomputed and rendered in real time with the most modern game w:GPU shown here, looks fairly realistic.
2011 | Law in Finland | Distribution and attempt of distribution and also possession of synthetic CSAM was criminalized on Wednesday 2011-06-01, upon the initiative of the w:Vanhanen II Cabinet. These protections against CSAM were moved into 19 §, 20 § and 21 § of Chapter 20 when the Law on sexual offences in Finland 2023 was improved and gathered into Chapter 20 upon the initiative of the w:Marin Cabinet.
2009 | movie | A digital look-alike of a younger w:Arnold Schwarzenegger was made for the movie w:Terminator Salvation though the end result was critiqued as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger.
2009 | demonstration | Paul Debevec: 'Animating a photo-realistic face' at ted.com. Debevec et al. presented new digital likenesses made by w:Image Metrics, this time of actress w:Emily O'Brien, whose reflectance was captured with the USC light stage 5. At 00:04:59 you can see two clips, one with the real Emily shot with a real camera and one with a digital look-alike of Emily shot with a simulation of a camera; which is which is difficult to tell. Bruce Lawmen was scanned using USC light stage 6 in a still position and also recorded running there on a w:treadmill. Many digital look-alikes of Bruce are seen running fluently and looking natural in the ending sequence of the TED talk video.[47] The motion looks fairly convincing compared to the clunky run in w:Animatrix: Final Flight of the Osiris, which was w:state-of-the-art in 2003, if photorealism was the intention of the w:animators.
{{#ev:youtube|3qIXIHAmcKU|640px|right|Music video for Bullet by w:Covenant from 2002. Here you can observe the classic "skin looks like cardboard"-bug that stopped the pre-reflectance capture era versions from passing human testing.}}
2002 | music video | 'Bullet' by Covenant on Youtube by w:Covenant (band), from their album w:Northern Light (Covenant album). Relevancy: contains the best upper-torso digital look-alike of Eskil Simonsson (vocalist) that their organization could procure at the time. Here you can observe the classic "skin looks like cardboard" bug (assuming this was not intended) that thwarted efforts to make digital look-alikes that pass human testing before the reflectance capture and dissection in 1999 by w:Paul Debevec et al. at the w:University of Southern California, and the subsequent development of the "Analytical w:BRDF" by ESC Entertainment, a company set up for the sole purpose of making the cinematography of the 2003 films Matrix Reloaded and Matrix Revolutions possible, led by George Borshukov.
1999 | science | 'Acquiring the reflectance field of a human face' paper at dl.acm.org. w:Paul Debevec et al. of w:USC did the first known reflectance capture over the human face with their extremely simple w:light stage. They presented their method and results at w:SIGGRAPH 2000. The scientific breakthrough required finding the w:subsurface light component (the simulation models glow slightly from within), which can be isolated using the knowledge that light reflected from the oil-to-air layer retains its w:Polarization (waves) while the subsurface light loses its polarization. So, equipped only with a movable light source, a movable video camera, 2 polarizers and a computer program doing extremely simple math, the last piece required to reach photorealism was acquired.[5]
1994 | movie | w:The Crow (1994 film) was the first film production to make use of w:digital compositing of a computer-simulated representation of a face onto scenes filmed using a w:body double. Necessity was the muse, as the actor w:Brandon Lee, portraying the protagonist, was tragically killed in an accident on set.
1961 | demonstration | The first singing by a computer was performed by an w:IBM 704 and the song was w:Daisy Bell, written in 1892 by British songwriter w:Harry Dacre. Go to Mediatheque#1961 to view.
1939 | demonstration | w:Voder (Voice Operating Demonstrator) from the w:Bell Telephone Laboratory was the first time that w:speech synthesis was done electronically by breaking it down into its acoustic components. It was invented by w:Homer Dudley in 1937–1938 and developed on his earlier work on the w:vocoder. (Wikipedia)
↑It is terminologically more precise, more inclusive and more useful to talk about 'terroristic synthetic pornography', if we want to call things by their real names, than about 'synthetic rape porn', because synthesizing recordings of consensual-looking sex scenes can also be terroristic in intent.
↑Whether a suspect can defend against faked synthetic speech that sounds like him/her depends on how up-to-date the judiciary is. If no information and instructions about digital sound-alikes have been given to the judiciary, they likely will not believe the defense of denying that the recording is of the suspect's voice.
↑ 12.0 12.1
Damiani, Jesse (2019-09-03). "A Voice Deepfake Was Used To Scam A CEO Out Of $243,000". w:Forbes.com. w:Forbes. Retrieved 2022-01-01. According to a new report in The Wall Street Journal, the CEO of an unnamed UK-based energy firm believed he was on the phone with his boss, the chief executive of the firm's German parent company, when he followed the orders to immediately transfer €220,000 (approx. $243,000) to the bank account of a Hungarian supplier. In fact, the voice belonged to a fraudster using AI voice technology to spoof the German chief executive. Rüdiger Kirsch of Euler Hermes Group SA, the firm's insurance company, shared the information with WSJ.
↑
Edwards, Benj (2023-01-10). "Microsoft's new AI can simulate anyone's voice with 3 seconds of audio". w:Arstechnica.com. Arstechnica. Retrieved 2023-05-05. For the paper's conclusion, they write: "Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models."
↑In this TED talk video at 00:04:59 you can see two clips, one with the real Emily shot with a real camera and one with a digital look-alike of Emily, shot with a simulation of a camera - Which is which is difficult to tell. Bruce Lawmen was scanned using USC light stage 6 in still position and also recorded running there on a w:treadmill. Many, many digital look-alikes of Bruce are seen running fluently and natural looking at the ending sequence of the TED talk video.
↑Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine ("Mechanism of the human speech with description of its speaking machine", J. B. Degen, Wien).