How to detect fake news: do it yourself or let Artificial Intelligence help you out
The term post-truth became the word of the year in 2016, according to the Oxford University Press. It describes circumstances in which public opinion is shaped more by emotion than by facts, which makes it easier to manipulate. This is where fake news comes in. In an era where clicks and views are the most coveted trophy, mastering the art of making fake news has become a sought-after skill, and so has the ability to detect it. It's not that hard once you know how these news shams are manufactured. And if you can't do it yourself, a friendly AI algorithm is always there to help.
Where does fake news come from?
Not all misleading information can be labelled as intentionally fake. Most of it is manipulation (clickbait and catchy, provocative headlines, for instance), mere lack of professionalism, or playing fast and loose with the facts to toy with readers' emotions. None of this, however, excuses the poor quality of the news.
The resource snopes.com has been dealing with false information online since the mid-1990s. Founder David Mikkelson warned against generalising the "fake news" category. "The fictions and fabrications that comprise fake news are but a subset of the larger bad news phenomenon, which also encompasses many forms of shoddy, unresearched, error-filled, and deliberately misleading reporting that do a disservice to everyone," he wrote.
In 2008, the factcheck.org resource compiled a list of tips on how to spot a fake email, titled "Key Characteristics of Bogusness." Among the most blatant markers are an "anonymous author; excessive exclamation points, capital letters and misspellings; entreaties that 'This is NOT a hoax!'; and links to sourcing that does not support or completely contradicts the claims being made".
Sometimes, a journalist may quote only part of what a politician says, giving a false impression of the initial meaning. Again, this can be deliberately done to convince readers of a particular viewpoint, or it can be an honest mistake.
The media also exploits people's confirmation bias. Readers are more likely to accept information that is in sync with their convictions, while dismissing information that is not.
Finally, a lot of viral claims aren't "news" per se; they are meant to be fiction, or satire, or a prank.
How to weed it out?
-First of all, make sure the publication is credible; credibility has nothing to do with popularity.
-Check the about us section to make sure you didn't stumble onto a prank or satirical news resource, like the Onion.
-Study the author's credentials, previous work and contact details: a professional (not private) email address and a substantial portfolio are good signs.
-In terms of form, poor spelling and punctuation, or excessive use of question or exclamation marks, caps or emoji, are obvious red flags.
-See if the story has been recently or regularly updated.
-Examine the quotes and where they came from: a serious investigation of a complicated issue has to have grounding. Relevant primary information should also appear on other serious sites.
-Reverse-search the visuals to rule out tampering with the photos, and try to spot something in the picture that points to its authenticity and relevance (a shop, a street sign, a licence plate, a billboard).
-Consult fact-checking websites such as FactCheck.org or Snopes.com, or perform the due diligence yourself.
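Some of the formal red flags above are simple enough to check automatically. Below is a minimal sketch of such a checker; the thresholds and flag names are my own illustrative assumptions, not part of any published checklist.

```python
import re

def red_flags(text: str) -> list[str]:
    """Flag a few surface-level 'characteristics of bogusness'."""
    flags = []
    # Excessive exclamation points (threshold is an arbitrary assumption).
    if text.count("!") >= 3:
        flags.append("excessive exclamation points")
    # Large share of ALL-CAPS words among words of 3+ letters.
    words = re.findall(r"[A-Za-z]{3,}", text)
    caps = [w for w in words if w.isupper()]
    if words and len(caps) / len(words) > 0.2:
        flags.append("excessive capital letters")
    # The classic chain-letter entreaty.
    if re.search(r"this is not a hoax", text, re.IGNORECASE):
        flags.append("'not a hoax' entreaty")
    return flags

print(red_flags("FORWARD this NOW!!! This is NOT a hoax!"))
print(red_flags("A calm, well-sourced report."))
```

A real system would, of course, combine many more signals; this only shows how mechanical some of the markers are.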
Artificial Intelligence comes to the rescue
The computer has also learnt how to detect lies. Weeding out false information from texts with computational linguistics and machine learning is a relatively new field. So how does it work?
The most important thing is to assemble a corpus of texts that are known to be entirely true or entirely false. At times, researchers work with corpora of only 150-180 texts, but the statistical spread is very high at that size, so the results have to be interpreted with care.
Once the necessary corpus is formed, the first thing on the list is the vocabulary. The texts have to be processed: specialised linguistic software, like LIWC or MyStem, identifies parts of speech, spots emotionally charged words and names (especially of celebrities), and counts them all.
The researchers' goal is to analyse the collected data and define the patterns. In a paper by Dina Pisarevskaya, who studied news texts from different sources, the telling features included word length, the frequency of adjectives, conjunctions, numerals and citations, and the overall emotional tone.
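To make the idea concrete, here is a stdlib-only sketch of the kind of surface features such studies extract. Real pipelines rely on POS taggers and lexicons (MyStem, LIWC categories) to count adjectives or emotional words; the regex-based counts below are crude stand-ins, not the published method.

```python
import re
import statistics

def stylometric_features(text: str) -> dict[str, float]:
    """Count a few surface features a fake-news classifier might use."""
    words = re.findall(r"\w+", text)
    return {
        "avg_word_length": statistics.mean(len(w) for w in words),
        # Share of purely numeric tokens (a rough proxy for numerals).
        "numeral_share": sum(w.isdigit() for w in words) / len(words),
        # Paired double quotes as a rough proxy for citations.
        "citation_count": text.count('"') // 2,
        "exclamation_count": text.count("!"),
    }

sample = 'The plant "exploded" at 14:05, witnesses said. 3 people hurt!'
print(stylometric_features(sample))
```

Feature vectors like this, computed over a labelled corpus, are what the statistical comparison between truthful and deceptive texts is run on.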
Another method, proposed by Pisarevskaya in collaborative research with Boris Galitsky, is based on analysing larger units of text: not just words and phrases, but the structure itself. The analysis rests on Rhetorical Structure Theory, a "theory of text structure that is being extended to serve as a theoretical basis for computational text planning.... The schemes which compose the structural hierarchy of a text describe the functions of the parts rather than their form characteristics."
The creators of this theory, William Mann and Sandra Thompson, defined a number of rhetorical relations, among them elaboration, antithesis and purpose. A text written by a liar is expected to have a specific structure.
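A rhetorical structure is essentially a tree of text spans linked by named relations, and the frequencies of those relations can serve as classifier features. The representation below is my own simplification for illustration; the relation names follow Mann and Thompson's inventory, but the actual research used far richer parses.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class RSTNode:
    """One node of a (simplified) rhetorical-structure tree."""
    relation: str                       # e.g. "Elaboration", "Antithesis"
    children: list["RSTNode"] = field(default_factory=list)

def relation_counts(node, counts=None):
    """Walk the tree and tally how often each relation occurs."""
    counts = counts if counts is not None else Counter()
    counts[node.relation] += 1
    for child in node.children:
        relation_counts(child, counts)
    return counts

# A toy tree for a short news text.
tree = RSTNode("Elaboration", [
    RSTNode("Antithesis"),
    RSTNode("Elaboration", [RSTNode("Purpose")]),
])
print(relation_counts(tree))
```

Comparing such relation profiles between known-true and known-false texts is what would reveal the "liar's structure" the method looks for.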
Other research delving into the nature of fake news emerged following the Great East Japan Earthquake of 11 March 2011, when rumours about an explosion at a petrochemical complex owned by Cosmo Oil started to pop up on Twitter.
The study was done by a group of scientists from the US and Japan, who developed an algorithm for fake news detection based on topic diversity.
To analyse the situation, they combed through 200 million tweets to reveal topic diversity using a micro-clustering approach. They started by "extracting micro-clusters, small sets of keywords that represent topics, using a data polishing algorithm … from tweets about the event"; then they analysed these clusters over time to see how topics would change.
In other words, news about a real event is more diverse and heterogeneous than fakes. This is because people reporting a real event change the tone of their posts and their attitude towards the event as they learn more details. As a result, the topic diversity index grows even though the overall number of tweets stays the same. The authors of fake tweets, on the other hand, stick to their original point and repeat their initial, false claims: they don't reflect on the subject, they just go on tweeting. In that case, the number of clusters, or subtopics, does not change even if the quantity of tweets rises.
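The diversity signal itself can be sketched in a few lines. In this toy version a "cluster" is just a distinct set of keywords and diversity is the number of distinct clusters per time window; the real study applied a data-polishing micro-clustering algorithm to 200 million tweets, which this does not attempt to reproduce.

```python
from collections import defaultdict

def topic_diversity(tweets):
    """Count distinct keyword clusters per time window.

    tweets: list of (hour, set_of_keywords) pairs.
    Returns {hour: number_of_distinct_clusters}.
    """
    windows = defaultdict(set)
    for hour, keywords in tweets:
        windows[hour].add(frozenset(keywords))  # identical topics collapse
    return {hour: len(clusters) for hour, clusters in sorted(windows.items())}

# Real event: subtopics multiply as details emerge.
real = [(0, {"fire", "refinery"}), (1, {"smoke"}), (1, {"evacuation"}),
        (2, {"injuries"}), (2, {"cause"}), (2, {"official"})]
# Fake rumour: the same claim repeated verbatim.
fake = [(0, {"toxic", "rain"}), (1, {"toxic", "rain"}), (2, {"toxic", "rain"})]

print(topic_diversity(real))  # diversity grows over time
print(topic_diversity(fake))  # stays flat at one cluster per window
```

Even at this toy scale, the real event's diversity index climbs while the rumour's stays flat, which is exactly the contrast the detection algorithm exploits.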