Juho Kunsola (talk | contribs) (+ new <section> tags for "definitions-of-synthetic-human-like-fakes")
Juho Kunsola (talk | contribs) m (mv content unchanged)
Line 126:
* In '''2018''', the work [http://papers.nips.cc/paper/7700-transfer-learning-from-speaker-verification-to-multispeaker-text-to-speech-synthesis 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis'] ([https://arxiv.org/abs/1806.04558 at arXiv.org]) was presented at the '''[[w:Conference on Neural Information Processing Systems]]''' (NeurIPS). The pre-trained model is able to steal voices from a sample of only '''5 seconds''' with almost convincing results.
Observe how good the "VCTK p240" system is at deceiving the listener into thinking that a person is doing the talking.
{{#Widget:Iframe - Audio samples from Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis by Google Research}}
The Iframe above is transcluded from [https://google.github.io/tacotron/publications/speaker_adaptation/ 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"' at google.github.io], the audio samples of a sound-like-anyone machine presented at the 2018 [[w:NeurIPS]] conference by Google researchers.
[[File:Helsingin-Sanomat-2012-David-Martin-Howard-of-University-of-York-on-apporaching-digital-sound-alikes.jpg|left|thumb|338px|A picture of a cut-away titled "''Voice-terrorist could mimic a leader''" from a 2012 [[w:Helsingin Sanomat]] warning that the sound-like-anyone machines are approaching. Thank you to homie [https://pure.york.ac.uk/portal/en/researchers/david-martin-howard(ecfa9e9e-1290-464f-981a-0c70a534609e).html Prof. David Martin Howard] of the [[w:University of York]], UK and the anonymous editor for the heads-up.]]