Earlier this year, an experimental tool aimed at educators generated a lot of discussion online: GptZero, developed by Princeton student Edward Tian. The system measures the probability that a piece of content was generated by ChatGpt based on its "perplexity" (i.e. randomness) and its "burstiness" (i.e. variance). OpenAi, the startup behind ChatGpt, has unveiled another tool that sifts through and evaluates texts of more than a thousand characters. The company acknowledges the tool's limitations, including false positives and limited effectiveness for languages other than English (since English data is often prioritized when training AI text generators, most detection tools currently favor English speakers).
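GptZero's internals are not public, but the two signals it describes are easy to approximate. Below is a minimal sketch of how a detector in that spirit might score text, assuming the Hugging Face transformers library and the public GPT-2 model as the scoring model; the model choice, the sentence list, and the metrics here are illustrative assumptions, not GptZero's actual implementation.

```python
# Sketch of a GptZero-style scorer: perplexity (randomness) and
# burstiness (variance across sentences). Assumes `pip install
# torch transformers`; GPT-2 stands in for whatever model a real
# detector would use.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).
    Low perplexity means the model finds the text predictable,
    which is typical of machine-generated prose."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL over tokens
    return torch.exp(loss).item()

def burstiness(sentences: list[str]) -> float:
    """Variance of per-sentence perplexity. Human writing tends to
    swing between plain and surprising sentences more than AI text."""
    scores = [perplexity(s) for s in sentences]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

sentences = [
    "The cat sat on the mat.",
    "Quantum foam percolates beneath spacetime's seemingly smooth skin.",
]
print("perplexity:", perplexity(" ".join(sentences)))
print("burstiness:", burstiness(sentences))
```

A real detector would calibrate thresholds on labeled human and machine text rather than reading the raw numbers directly.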

Would you be able to tell if an article was written, at least in part, by AI? "The texts produced by generative AI can never do the job of a journalist like you, Reece," Tian reassures me. Cnet, a site that covers technology, has published several articles written by algorithms and then corrected by a human being. For the moment ChatGpt lacks boldness, and from time to time it invents facts out of whole cloth, an aspect that could be a problem for anyone trying to produce reliable reporting.

Watermarks and “radioactive” data

While these detection tools are useful for now, Tom Goldstein, a computer science professor at the University of Maryland, predicts that they will become less effective in the future, given that natural language processing is set to become ever more sophisticated. "These kinds of detectors rely on the fact that there are systematic differences between human text and machine-generated text," Goldstein explains. "But the goal of these companies is to produce machine-generated text that is as close as possible to human text." Does this mean that the hope of detecting artificial content is nil? Absolutely not.

Goldstein recently collaborated on research into possible methods for integrating watermarks (digital marks that make it possible to trace the origin of a piece of content) into the large language models that power artificial intelligence text generators. While it's not a foolproof method, the idea is fascinating. ChatGpt tries to predict the most probable sequence of words in a sentence by comparing different options, and a watermark could label certain word sequences as off-limits. If, while scanning a text, it turns out that the watermark rules have been broken many times, it is possible to determine that the content was probably created by a human being.
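To make the "off-limits sequences" idea concrete, here is a minimal sketch of the detection side of such a watermark, in the spirit of the green/red-list scheme Goldstein's group described: the vocabulary is pseudo-randomly split after every token, the generator only samples "allowed" words, and a detector simply counts rule violations. The hashing, the 50/50 split, and the decision threshold below are illustrative assumptions, not the paper's exact parameters.

```python
# Sketch of watermark detection via rule violations. A watermarked
# generator would avoid "red" (off-limits) tokens, so frequent red
# tokens suggest the text was written by a human who never saw the rule.
import hashlib

GREEN_FRACTION = 0.5  # assumed share of the vocabulary allowed after each token

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly decide whether `token` is allowed after
    `prev_token`, using a hash of the pair as the random source."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255 < GREEN_FRACTION

def violation_rate(tokens: list[str]) -> float:
    """Share of token transitions that break the watermark rule.
    Human text should violate it roughly (1 - GREEN_FRACTION) of the
    time; watermarked machine text should violate it almost never."""
    red = sum(
        1 for prev, tok in zip(tokens, tokens[1:]) if not is_green(prev, tok)
    )
    return red / max(len(tokens) - 1, 1)

tokens = "the quick brown fox jumps over the lazy dog".split()
rate = violation_rate(tokens)
print(f"violation rate: {rate:.2f} ->",
      "probably human" if rate > 0.25 else "possibly watermarked")
```

In the published scheme the decision uses a statistical test over the count of rule-breaking tokens rather than a fixed cutoff, which is what makes the detection claim quantifiable.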
