FGV’s Department of Public Policy Analysis (DAPP) has released a study indicating illegitimate interference in online discussions via bots. Accounts programmed to post en masse have become a potential tool for manipulating discussions on social networks, especially at politically significant moments.
The study points out that during the general strike in April this year, more than 20% of Twitter interactions between users in favor of the strike were caused by these automated accounts. Bots also influenced the 2014 elections, generating more than 10% of the discussions.
As in public discussions outside the virtual world, the web has also become a space to easily spread false information. With this, the virtual world has allowed the adaptation of old political strategies of slander and manipulation of public discussions, but now on a much larger scale.
“Therefore, the growth of concerted bot actions represents a real threat to public debate, potentially jeopardizing democracy by manipulating the process of consensus-building in the public sphere, as well as the selection of representatives and government agendas that may define the future of the country”, said DAPP Director Marco Aurelio Ruediger.
DAPP’s research effort warns us that no one is immune and that we must try to understand, filter, and report the use and dissemination of false or manipulative information through this kind of strategy and technology. It is important to constantly protect all of our democratic venues, including social networks. This protection requires us to identify the bots to differentiate real and manipulated situations in the virtual environment. On the eve of the ‘election year’ that will define the next Brazilian President, with potentially cutthroat campaigns, it is essential to map the patterns of use of these mechanisms in order to prevent illegitimate interference in discussions, as seen in other countries.
For this reason, DAPP has developed a refined system to identify suspicious accounts that behave like bots, generating content through algorithms, and whose results show the major role played by bots at key moments in recent Brazilian politics.
•It is evident that social networks have the merit of sparking debates and amplifying voices in a space with wide reach.
•Several studies show how Twitter, Facebook and other platforms, by enabling exchanges and stimulating discussion, have become important instruments of democracy.
•However, similarly to the public debate outside of the virtual world, the networks have also been used as a fertile space for the dissemination of false information.
•Automated accounts that allow for mass posting have become a potential tool for manipulating debates on social networks, especially in moments of political relevance.
•In the general strike of 2017, for example, more than 20% of the interactions that occurred on Twitter between users in favor of the strike were provoked by this type of account. During the 2014 presidential elections, the bots also generated more than 10% of the debate.
•With this, the virtual world has allowed old political strategies of slander and manipulation of political debate to be adapted, now on a larger scale.
•Identifying the presence of these bots and the debates they create is fundamental for distinguishing which situations are real and which ones are manipulated in the virtual environment. Only then will it be possible to effectively understand the social processes originated in the networks.
•This research effort by FGV/DAPP issues an alert that we are not immune, and that we must seek to understand, filter and report the use and dissemination of false or manipulative information through this type of strategy and technology. It is important to be attentive and protect the democratic spaces, including on the social networks.
•On the eve of the “election year” that will define the next Brazilian president, with campaigns taking place amid extreme competition, it is essential to map the usage patterns of these mechanisms in order to prevent illegitimate interference in the debate, as has already been seen in other countries.
BOTS, SOCIAL NETWORKS AND POLITICS IN BRAZIL
A study on illegitimate interference in public debate on the web, risks to democracy and the 2018 elections
What are they and what do they do?
An important means of communication, information and connection, social networks are an increasingly significant part of our daily lives. Studies carried out by the Pew Research Center show, for example, that the majority of adults in the United States (62%) use social networks to stay informed. However, 64% state that the fake news circulating on the networks causes “confusion” about everyday facts and events. It is in this environment of “trust”, combined with a high circulation of dubious information, that bots proliferate.
At first glance, automated accounts may even contribute positively to certain aspects of life on social networks. Chatbots (chat services operated by bots), for example, speed up customer service for companies and, in some cases, even help refugees process their visa requests. However, a growing number of bots act with malicious intent.
Social bots are accounts controlled by software that artificially generate content and establish interactions with non-bots. They seek to imitate human behavior and pass as humans in order to interfere with spontaneous debates and create forged discussions. With this type of manipulation, the bots create a false sensation of wide political support to a certain proposal, idea or public figure, change the course of public policies, interfere with the stock market, disseminate rumors, fake news and conspiracy theories, generate disinformation and content pollution, in addition to luring users to malicious links that steal personal data, among other risks.
At the same time, social networks have become an integral part not only of citizens’ personal lives, but also of their political activity and that of their representatives. Parties and other social movements also use this space to engage voters, attack opponents and promote debates around their interests. In this context, it is common to observe the orchestrated use of bot networks (botnets) to generate a movement at a given moment, manipulating trending topics and the debate in general.
These actions have been identified in important events in international politics, such as the 2010 US elections, the election of Donald Trump in 2016 and the United Kingdom’s European Union membership referendum (Brexit). In Brazil, the situation is no different: orchestrated bot actions occurred at key moments of national politics, such as the approval of the Labor Reform, the 2017 general strike, the 2014 elections, the impeachment debate and the 2016 São Paulo municipal elections.
How can they affect our lives?
When they interfere with debates on social networks, bots directly affect political and democratic processes by influencing public opinion. Their actions can, for example, manufacture an artificial opinion, or an unreal impression of support for a certain opinion or public figure, by sharing versions of a topic that spread across the network as if the part of society represented there held a very strong opinion on the subject (Davis et al., 2016). This happens through the coordinated sharing of an opinion, which gives it an unreal volume and, consequently, sways users who are undecided about the topic and strengthens the more radical users in the organic debate, given that bots are frequently located at the poles of the political debate.
The automated profiles also promote misinformation by propagating fake news and running network-polluting campaigns. Bots frequently use social networks to reproduce fake news aimed at influencing opinion about a person or topic, or to pollute the debate with information that is real but irrelevant to the discussion in question. This action, which relies mainly on sharing links to propagate, seeks to prevent a debate about a certain subject or to reduce its weight. To this end, bots generate an enormous amount of information, which reaches users at the same time as the real and relevant information and reduces its impact. Therefore, bots not only disseminate fake news, which can harm society, but also actively seek to keep users from becoming adequately informed.
Another common strategy of automated profiles is sharing malicious links, aiming to steal data or personal information. This information can be used for the creation of new bot profiles with characteristics that help these bots to start connections with real users on the networks, such as profile photos. A common action that usually raises suspicions about the use of bots is an unknown user tagging someone in a shortened link with no clear identification of its content. These links, besides stealing personal information for use in the social network itself, can also direct the user to fake news or sites that will use the number of clicks to expand their influence on the network (Wang, 2010).
There have also been detections of bots aiming to manipulate the stock market. This happens when bot networks are put to work to generate conversations that involve a certain company or topic in a positive way, manipulating the network monitoring systems of the brokerage firms. This way, the shares in question could increase in value based on an optimism that was forged by the actions of bots.
A recent case of this type of action involved a bot-generated debate on the networks about the technology firm Cynk. Automated trading algorithms picked up this debate and began trading the company’s shares, whose market value increased 200-fold, reaching 5 billion dollars. By the time stockbrokers realized it was an orchestrated action, heavy losses had already been suffered. This type of action shows another disruptive potential of automated profiles, this time for the economy, with impacts that spill over into political debates (Ferrara et al., 2016).
This type of action suggests that social networks, used by so many people for information, could in fact be contributing to a less well-informed society, manipulating public debate and consistently shaping the direction the country takes.
How do they work?
Bots are used on social networks to propagate fake, malicious news, or to generate an artificial debate. For that purpose, they must have the largest possible number of followers. But how can an automated profile create a network around itself?
Bots spread more easily on Twitter than on Facebook for several reasons. Twitter’s 140-character limit constrains communication in a way that makes human behavior easier to imitate. Additionally, the use of @ to tag users, even those not connected to the account, allows bots to tag real people at random and create something that resembles human interaction.
Bots also make use of the fact that, generally, people are not very judicious about following a profile on Twitter, and tend to act reciprocally when they get a new follower. Experiments show that on Facebook, a platform where people tend to be a little more careful about accepting new friends, 20% of real users accept friend requests indiscriminately, and 60% always accept if they have at least one mutual friend. This way, bots add a large number of people at the same time and follow real pages of famous people, besides following and being followed by a large number of bots, in such a way that they end up creating mixed communities – which include real and fake profiles (Ferrara et al., 2016).
Some bots aim only to divert attention from a certain topic and are therefore less concerned with resembling a human user than with the intensity of their activity and their ability to change the course of a debate on the networks. Others, however, use a series of strategies to imitate human behavior and thereby be recognized as human by users and detection systems.
Knowing that human behavior on social networks follows certain temporal patterns in the production and consumption of content, the profiles are programmed to post according to these same rules. Paradoxically, what bots find hardest to imitate is the absence of both temporal and content patterns over the long term, which usually allows them to be identified (Brito, Salvador and Nogueira, 2013). More modern algorithms go further: they can identify popular profiles and follow them, identify a subject being discussed on the network, generate a short text about it using natural language algorithms, and produce some degree of interaction.
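To make the idea of temporal regularity concrete, the sketch below computes one simple statistic: the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive posts. It is an illustrative heuristic chosen for this example, not the measure used by Brito, Salvador and Nogueira (2013) or by DAPP, and the function name is hypothetical. A value near zero indicates clockwork posting; a bot that randomizes its schedule would not be caught by this measure alone.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive posts.

    timestamps: posting times in seconds, in any order.
    Returns None when there are too few gaps to measure; values near 0
    mean the account posts at almost perfectly regular intervals.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


# Hypothetical data: a scheduler posting every 10 minutes vs. an irregular history.
scheduled = [i * 600 for i in range(50)]
irregular = [0, 40, 3600, 3700, 9000, 30000, 30100, 86000, 90000]
print(interval_regularity(scheduled))  # 0.0
print(interval_regularity(irregular))  # well above 1: bursts followed by long pauses
```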
How can they be identified?
There is no single characteristic that definitively indicates whether a profile belongs to a real user or a fake, automated one. Identification results from combining multiple interrelated characteristics and indicators. Research in this field follows three main lines of methods: a) using information available on the social networks themselves; b) systems based on crowdsourcing and human intelligence to identify bot profiles; and c) machine learning, based on identifying characteristics that allow the distinction between bots and people to be automated (Ferrara et al., 2016).
There are also different hypotheses that can support the search for bots on the networks. Among the methods that use connections between profiles and available data on social network behavior to identify bots, some systems assume that automated profiles will be linked primarily to similar profiles, especially early in their digital life, because they need to build a base of followers to seem believable to real users.
This method, however, must be weighed against the fact that human users are not very judicious when it comes to interactions and friendships with unknown accounts, especially on Twitter. Because of this, after existing for some time, bot accounts have mixed networks, composed neither primarily of bots nor primarily of humans. The proportion of bots among a profile’s friends can, however, still be an indicator of its nature.
The crowdsourcing method starts from the assumption that detecting bots would be simple for human beings, whose capacity to understand and identify human behavior has not yet been matched by machines. A test carried out by Wang et al. (2013) concluded that, after a short training session (showing only examples of real and fake profiles) and by following the majority decision within a small group of volunteers, the number of false positives was very close to zero.
This system has some difficulties. One of them is low cost-effectiveness for networks with many users, such as Twitter and Facebook. Moreover, since amateur evaluators do not perform well individually (only in a majority vote system), some trained people need to take part to guarantee the balance of the voting system.
Detection through machine learning works by encoding behavior patterns from collected metadata. In this way, the system can automatically distinguish humans from bots based on a profile’s behavioral pattern. These systems are normally built from a database in which humans and bots have already been labeled.
User metadata is considered one of the most predictive aspects for distinguishing humans from bots and can contribute to a better understanding of how the more sophisticated bots work. Identifying these bots, or hacked accounts, is nevertheless difficult for such systems. Additionally, the constant evolution of bots means that a system built from a static database becomes less precise over time. On the other hand, this approach can process a large number of correlations and complex patterns and analyze a large number of accounts.
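Once some accounts have already been labeled, the indicator mentioned above reduces to a simple ratio. In this minimal sketch, the account identifiers and the `known_bots` set are hypothetical; how the initial labels are obtained is outside its scope, and because networks become mixed over time, the ratio is a signal to combine with others rather than a verdict.

```python
def bot_friend_share(friends, known_bots):
    """Fraction of an account's distinct connections already labeled as bots.

    friends: iterable of account identifiers connected to the profile.
    known_bots: set of identifiers previously classified as bots.
    """
    unique = set(friends)
    if not unique:
        return 0.0
    return len(unique & known_bots) / len(unique)


known_bots = {"bot_a", "bot_b", "bot_c"}
print(bot_friend_share(["bot_a", "bot_b", "human_x", "human_y"], known_bots))  # 0.5
```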
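The majority-vote step that makes the crowdsourcing approach work can be sketched as follows. This is an illustration under simple assumptions (each volunteer labels each profile as "bot" or "human", a strict majority decides, ties stay undecided), not a reproduction of the protocol used by Wang et al. (2013); the profiles and votes are invented for the example.

```python
from collections import Counter


def majority_label(votes):
    """Label chosen by a strict majority of evaluators, or None if there is no majority."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def false_positive_rate(predicted, truth):
    """Share of truly human profiles that the group wrongly labeled as bots."""
    humans = [acc for acc, label in truth.items() if label == "human"]
    if not humans:
        return 0.0
    wrong = sum(1 for acc in humans if predicted.get(acc) == "bot")
    return wrong / len(humans)


# Hypothetical votes from five volunteers per profile.
votes = {
    "acc1": ["bot", "bot", "human", "bot", "human"],
    "acc2": ["human", "human", "bot", "human", "human"],
    "acc3": ["human", "bot", "human", "human", "bot"],
}
truth = {"acc1": "bot", "acc2": "human", "acc3": "human"}
predicted = {acc: majority_label(v) for acc, v in votes.items()}
print(predicted)  # {'acc1': 'bot', 'acc2': 'human', 'acc3': 'human'}
print(false_positive_rate(predicted, truth))  # 0.0
```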
The most efficient identification mechanisms combine different aspects of these approaches, exploring multiple dimensions of a profile’s behavior, such as activity and timing patterns (Boshmaf et al., 2012). These systems take into account, for example, that real users spend more time on the network exchanging messages and viewing other users’ content, such as photos and videos, while bot accounts spend their time searching for profiles and sending friend requests.
In this sense, research shows that bot accounts tend to perform a less varied set of actions, which adds another factor to the combination that allows these systems to determine with greater confidence that a profile is a bot. Because this type of system combines different data, it also obtains good results from a smaller amount of information (such as the last 100 tweets), which speeds up analysis and increases processing capacity.
Studies on bot detection on social networks are inspired by efforts to detect and block spam in electronic messaging systems. Along these lines, shared links are also analyzed to identify link farms (companies that manage bots and sell likes, retweets, etc.) and interaction dynamics (Ghosh et al., 2012).
When we analyze the statistical processes that describe interactions between users, several factors can be studied and combined to develop a bot detection model for social networks. Some examples are:
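As a rough illustration of the supervised approach, the sketch below fits a logistic regression by plain gradient descent on a tiny, already-labeled data set. The three features (share of posts from an automation tool, variety of action types, share of sub-second gaps between posts), their scaling to the 0 to 1 range and the training values are all hypothetical; logistic regression stands in for whatever classifier a real system uses, and the systems cited above rely on far more features and data. Because the model is fitted once on a static data set, it also illustrates the aging problem described above: as bots evolve, it would need to be retrained.

```python
import math


def predict_proba(weights, bias, features):
    """Estimated probability that an account is a bot (logistic function)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss.

    rows: list of feature vectors; labels: 1 for bot, 0 for human.
    """
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical labeled accounts: [automation-tool share, action variety, sub-second gap share]
rows = [
    [0.9, 0.1, 0.8], [1.0, 0.2, 0.6], [0.8, 0.1, 0.9],  # labeled bots
    [0.0, 0.8, 0.0], [0.1, 0.9, 0.1], [0.0, 0.7, 0.0],  # labeled humans
]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(rows, labels)
print(round(predict_proba(weights, bias, [0.85, 0.15, 0.7]), 3))  # close to 1
print(round(predict_proba(weights, bias, [0.05, 0.85, 0.0]), 3))  # close to 0
```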
•Variety of actions while connected to the network;
•Characteristics of the user, such as the number of friends (real people have, on average, between 100 and 1000 followers) and the proportion of and correlation between the profiles the user follows and the profiles that follow the user;
•Characteristics of the friendships, analyzing how users on a certain network are interacting among themselves, including patterns related to language, popularity and time in the places of interaction;
•Characteristics of the retweet network, mentions and repetition of hashtags;
•Temporal characteristics, such as the average interval between tweets and the volume of tweets produced;
•Content and language characteristics;
•Characteristics of the sentiment expressed through the post.
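Several of the listed factors can be computed directly from an account's profile and recent tweets. The sketch below covers only a subset (variety of actions, the follower/following ratio, the average interval between tweets and how concentrated the account's hashtags are); the field names `followers`, `following`, `time`, `action` and `hashtags` are assumptions about how the data is stored, not a real API schema. Sentiment, language and friendship-network features would require additional tools.

```python
from collections import Counter
from statistics import mean


def account_features(profile, tweets):
    """Compute an illustrative subset of bot-detection features for one account.

    profile: dict with 'followers' and 'following' counts (hypothetical schema).
    tweets: list of dicts with 'time' (seconds), 'action' ('tweet', 'retweet',
            'reply', ...) and 'hashtags' (list of strings).
    """
    times = sorted(t["time"] for t in tweets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    hashtags = [h.lower() for t in tweets for h in t["hashtags"]]
    # Share of all hashtag uses taken by the single most repeated hashtag.
    top_share = Counter(hashtags).most_common(1)[0][1] / len(hashtags) if hashtags else 0.0
    return {
        "action_variety": len({t["action"] for t in tweets}),
        "follow_ratio": profile["followers"] / max(profile["following"], 1),
        "mean_gap_s": mean(gaps) if gaps else None,
        "top_hashtag_share": top_share,
    }


# Hypothetical account that only retweets the same hashtag every second.
profile = {"followers": 150, "following": 3000}
tweets = [
    {"time": 0, "action": "retweet", "hashtags": ["#GreveGeral"]},
    {"time": 1, "action": "retweet", "hashtags": ["#GreveGeral"]},
    {"time": 2, "action": "retweet", "hashtags": ["#grevegeral", "#BR"]},
]
print(account_features(profile, tweets))
```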
How is DAPP working to identify them?
The data sets collected by FGV/DAPP include metadata (information about the data itself), and through this metadata we explore ways of identifying accounts that acted automatically during the periods analyzed. In doing so, we found that the metadata field called generator, which refers to the platform that produced the tweet, is very useful for detecting bots.
We then checked every generator used in our databases and how many tweets each one produced. Examining detailed extracts on the nature of each generator, we found platforms for automated content production among them.
Six cases were chosen for this first analysis:
•The debate on Rede Globo on October 4, 2014 between the presidential candidates in the first round of the elections;
•The debate on Rede Globo on October 24, 2014 between candidates Dilma Rousseff and Aécio Neves, who were running for the presidency in the second round of the elections;
•The pro-impeachment demonstrations on March 13, 2016;
•The debate on Rede Globo between the São Paulo mayoral candidates on September 29, 2016;
•The general strike on April 28, 2017;
•The voting for the Labor Reform in the Senate, on July 11, 2017.
After collecting the databases for the six cases, we verified that 1,925 different generators produced the 7.8 million tweets posted about them. Of these, 181 produced at least 100 tweets each, and those were the ones we analyzed manually. This evaluation allowed us to identify 83 generators that produce tweets automatically, either programmatically or by automating use of the Twitter platform.
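The counting and thresholding step described above can be sketched as follows. The sketch assumes tweets stored as dictionaries in the format of Twitter's v1.1 JSON, where the `source` field holds an HTML anchor whose text names the client application that posted the tweet; this appears to correspond to what the text calls the generator. The generator names in the example are made up. The 100-tweet cut-off mirrors the one in the text, and the manual review of the surviving generators is not automated here.

```python
import re
from collections import Counter

# Text between the first ">" and the next "<", i.e. the client name in <a ...>Name</a>.
ANCHOR_TEXT = re.compile(r">([^<]*)<")


def generator_name(source):
    """Return the client name inside a 'source' anchor, or the raw string if there is no anchor."""
    match = ANCHOR_TEXT.search(source)
    return match.group(1) if match else source


def generators_to_review(tweets, min_tweets=100):
    """Count tweets per generator and keep those at or above the manual-review threshold."""
    counts = Counter(generator_name(t["source"]) for t in tweets)
    return {name: n for name, n in counts.items() if n >= min_tweets}


# Hypothetical sample: two high-volume generators and one rarely used one.
tweets = (
    [{"source": '<a href="https://example.com/auto" rel="nofollow">AutoPoster</a>'}] * 150
    + [{"source": '<a href="https://example.com/app" rel="nofollow">Phone App</a>'}] * 120
    + [{"source": '<a href="https://example.com/tiny" rel="nofollow">Tiny Tool</a>'}] * 5
)
print(generators_to_review(tweets))  # {'AutoPoster': 150, 'Phone App': 120}
```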
CASE STUDY – 2014 ELECTIONS
The race for the Presidency of the Republic in 2014 was marked by increasingly fierce political competition, a consequence of the unrest on the streets still fresh from the 2013 protests. On social networks, polarization manifested itself aggressively, and part of this hostility was provoked by bots, which motivated around 11% of the discussions.
The first round of the elections was marked by the death of candidate Eduardo Campos (PSB), who was succeeded by Marina Silva (PSB). The contest culminated in a deep antagonism between Dilma and Aécio (PSDB) in the second round, which ended in victory for then-president Dilma by a narrow margin (around three percentage points).
To analyze the potential use of bots in the discussions during the 2014 elections, we selected the tweets related to the second-round debate between Dilma and Aécio and built an interaction map from the retweets. Three major groups were identified: profiles supporting Dilma (red), profiles supporting Aécio (blue) and profiles having a general discussion about the topic, including press profiles (gray). We then selected the accounts that used suspicious generators and highlighted them by enlarging them and coloring them pink.
Observing the graph above, we notice that accounts that produced tweets through suspicious generators are concentrated on the extreme ends of the poles supporting the candidates. Practically none of the suspicious accounts are in the group that in general does not support any candidate (gray zone).
From all Twitter interactions in the hours analyzed, 11.34% were motivated by tweets or retweets made by bots. Among Aécio Neves supporters (blue cluster), however, this portion of interactions with automated accounts (bots being retweeted by other bots or regular accounts) reached 19.41%. In the discussions between profiles supporting Dilma, the amount was 9.76%.
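These percentages are shares of retweet interactions whose original tweet came from a suspicious account. A minimal sketch of that calculation follows, assuming each interaction is a (retweeter, original author) pair and that each retweet is counted in the cluster of the retweeting account; the text does not say exactly how interactions were assigned to clusters, and the accounts and cluster labels here are invented.

```python
from collections import defaultdict


def bot_motivated_share(retweets, suspicious, cluster_of):
    """Percentage of retweets whose original author is flagged as a bot, overall and per cluster.

    retweets: iterable of (retweeter, original_author) pairs.
    suspicious: set of accounts flagged as automated.
    cluster_of: dict mapping an account to its cluster label (e.g. its color on the map).
    """
    total = defaultdict(int)
    by_bots = defaultdict(int)
    for retweeter, author in retweets:
        cluster = cluster_of.get(retweeter, "unassigned")
        for key in ("all", cluster):
            total[key] += 1
            if author in suspicious:
                by_bots[key] += 1
    return {key: 100 * by_bots[key] / total[key] for key in total}


# Hypothetical interactions: three retweets in each cluster.
retweets = [("u1", "bot1"), ("u2", "bot1"), ("u3", "p1"),
            ("u4", "p1"), ("u5", "p2"), ("u6", "bot2")]
cluster_of = {"u1": "blue", "u2": "blue", "u3": "blue",
              "u4": "red", "u5": "red", "u6": "red"}
shares = bot_motivated_share(retweets, {"bot1", "bot2"}, cluster_of)
print(shares)  # 'all' is 50.0; 'blue' is about 66.7; 'red' is about 33.3
```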
Exploring the activities of the suspicious accounts, we find profiles that are clearly automated in order to inflate support for a certain candidate. Among these accounts, we identified those that posted more than once per second, an activity that raises suspicions of automation, and highlighted in green, on the same map, those that produced two consecutive tweets less than one second apart at least twice. Again, we notice that they are located at the extreme poles supporting each candidate.
The same pattern can be seen in all the analyzed cases. In the following visualizations, we also colored pink the accounts that used suspicious generators, green those that produced two consecutive tweets less than one second apart at least twice, and white the accounts that meet both criteria.
In the first-round debate, interactions with bots represented only 6.29% of the discussion on Twitter. Once again, these interactions were more significant among profiles supporting Aécio Neves, representing 19.18% of the debate in the blue cluster. Among Dilma supporters, the figure was 17.94%.
CASE STUDY – 2015 IMPEACHMENT
Dilma Rousseff’s victory did not halt the growing hostility between the political camps. The president’s difficulty in maintaining political support in Congress, together with the economic crisis in a country in recession, led to an impeachment process with popular support expressed in a series of protests throughout the country. The graph on the right shows how the discussion on Twitter unfolded on the day of the largest pro-impeachment demonstration recorded.
At least 10% of the interactions on the subject that day were stimulated by bots, that is, they were retweets of content originating from an automated account. In the cluster of Dilma Rousseff supporters, this proportion reached 21.43%, which shows how much influence this type of account has on the political debate.
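The coloring rule described above can be expressed compactly. This sketch assumes posting times are available with sub-second precision (the text does not say how DAPP obtained them) and that the suspicious-generator flag comes from the manual generator review; the function names are illustrative.

```python
def subsecond_bursts(times):
    """Count consecutive tweet pairs posted less than one second apart.

    times: posting times in seconds (floats with sub-second precision assumed).
    """
    ts = sorted(times)
    return sum(1 for earlier, later in zip(ts, ts[1:]) if later - earlier < 1.0)


def map_color(times, used_suspicious_generator):
    """Color an account as in the retweet maps: 'white', 'pink', 'green', or None if unflagged."""
    fast = subsecond_bursts(times) >= 2  # at least two sub-second consecutive pairs
    if used_suspicious_generator and fast:
        return "white"  # meets both criteria
    if used_suspicious_generator:
        return "pink"
    if fast:
        return "green"
    return None


bursty = [0.0, 0.4, 0.9, 10.0]  # two gaps under one second
calm = [0.0, 5.0, 10.0]
print(map_color(bursty, False), map_color(bursty, True))  # green white
print(map_color(calm, True), map_color(calm, False))      # pink None
```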
CASE STUDY – SÃO PAULO MUNICIPAL ELECTIONS OF 2016
The race for mayor of São Paulo began with voting intentions dispersed among then-mayor Fernando Haddad (PT), who was running for reelection, João Doria (PSDB), Celso Russomanno (PRB), Marta Suplicy (PMDB) and Luiza Erundina (PSOL). Because of this fragmentation in the first round, the electoral debates were less marked by the PT versus PSDB antagonism than the presidential elections had been. The election was decided in the first round, with João Doria’s victory.
The graph on the right shows how other political forces influenced the debates on the social networks. Bot-motivated interactions were also more equally distributed. Among Doria supporters, they represented 11.25% of the debate; among Haddad supporters, 11.54%; among Russomanno supporters, 8.40%.
CASE STUDY – GENERAL STRIKE OF APRIL 28, 2017
After the impeachment of Dilma Rousseff, the debate in the National Congress on legislative and labor reforms gained strength. The main argument in favor of the reforms was that the need for austerity to overcome the crisis should be seen as an opportunity to modernize this legislation, while the main argument against them saw in this movement a loss of rights and a worsening of working conditions and of the Brazilian State’s social protection network.
This scenario led labor unions and parties opposed to the reforms to call a general strike on April 28, 2017, counting on a large turnout to convince the political class of the people’s dissatisfaction with these agendas. As can be observed in the graph on the right, bots once again had a large presence. Among supporters of the strike, 22.39% of the interactions were motivated by automated tweets.
CASE STUDY – VOTING FOR THE LABOR REFORM IN THE SENATE ON JULY 11, 2017
One of the main focuses of the economic recovery agenda of the current government was the approval of a reform of the labor legislation. The debates about this proposal on the social networks followed a trend of polarization already observed in other moments of national politics. After months of discussions, negotiations and modifications, the Labor Reform was brought to a plenary session of the Federal Senate for voting on July 11.
Bots are once again present at the two extreme ends of the debate, this time in larger numbers on the pole opposing the reform. In total, we identified 2% of the interactions related to this event as automated: 3% of the interactions opposing the reform and 1% of those favorable to it.
VERIFICATION OF THE ANALYSIS
To validate the analysis, we manually verified a sample of 2,153 suspicious accounts across the six cases. This sample size gives a 95% confidence level with a margin of error of about 2%. The verification sought to check whether each account produces content in a completely automated way.
We observed that more than 50% of these accounts appear to be fully automated. Approximately 9% of the accounts are institutional (from press organizations and blogs, for example), around 6% no longer exist, almost 2% were suspended by the platform, and a little over 25% are only partially automated.
Examples of the latter include accounts that took part in programs supporting a certain cause and, at certain moments, authorized automatic tweets to be published from their profiles. Another example is accounts that produce original content but also set up triggers that automatically post, for instance, news involving a certain public figure or institution.
We therefore conclude that account-level verification can be ambiguous and, for now, we consider that each tweet must be evaluated according to its origin (automated or not).
We now return to exploring the topology of the retweet network. Concerning the general strike, we notice that automatically produced tweets mainly served the extreme ends of the discussion, as seen in the graph on the right, where we filtered only the interactions involving automatically produced tweets.
The pro-strike and anti-strike discussion hubs both took part in interactions with automatically produced tweets. The gray group consists of automated content that uses topics from the current debate to gain new followers, with no apparent attempt to influence the debate.
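The 95% and 2% figures read most naturally as a confidence level and a margin of error. Assuming a simple random sample and the worst case p = 0.5 (the text does not describe the sampling design), the usual normal-approximation formula gives a margin of about 2.1 percentage points for 2,153 accounts, consistent with the reported figure:

```python
import math


def margin_of_error(n, z=1.96, p=0.5):
    """Half-width of a normal-approximation confidence interval for a proportion.

    n: sample size; z: critical value (1.96 for 95% confidence);
    p: assumed proportion (0.5 gives the widest, most conservative interval).
    """
    return z * math.sqrt(p * (1 - p) / n)


print(round(margin_of_error(2153), 4))  # 0.0211, i.e. about 2.1 percentage points
```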
The appearance of automated accounts allowed for strategies of manipulation, dissemination of rumors and slander, commonly used in political disputes, to gain an ever larger dimension on the social networks. The significant participation of bots in the virtual environment created an urgent necessity to identify their activities and, consequently, distinguish which debates are legitimate and which ones are forged. This distinction is essential for the social processes originated on the networks to be effectively understood.
To identify bots, DAPP has been developing a methodology that combines evaluations of different metadata to cover all the possible strategies for creating and operating automated accounts. With the ever faster pace of technological upgrades, bots enhance their activities daily, coming ever closer to human behavior. In addition, it is indispensable to distinguish malicious bots from bots with other purposes, such as digital marketing, brand promotion, blogs and companies, institutional profiles and profiles with many administrators.
Another challenge we face is identifying cyborg accounts, which are partially automated but also operated by humans who post real content to create the randomness and unpredictability common in human interactions. Such accounts make bot detection more difficult when we try to classify a Twitter account with a binary bot-or-human variable, for example.
The analysis of interactions involving automatically produced tweets indicates and confirms the use of bots in the Brazilian political debate. From DAPP’s analysis of the metadata that reveals their operation, we can conclude that automatically generated content has been influencing discussions on Twitter with the aim of giving advantages to political actors.
The study found that automated content has not been operated or produced exclusively by one political pole or camp. The analysis of the six cases suggests that groups with different interests, especially those at the extreme ends of the political spectrum, inflate their presence and attack each other with this practice.
Expanding the capacity to identify and suppress the malicious automation of profiles on social networks must be a priority. Recent analyses show that this type of action has succeeded in steering public debate, which increasingly takes place on the networks, directly influencing turning points that are decisive for the future.
Therefore, for social networks to remain a democratic space for opinion and information, it is necessary to determine how organic the debates are. For the networks to become more transparent, it is also critical to start identifying those responsible for this type of coordinated action, seeking to understand the interests behind hiring these automation services and spreading misinformation.
Marco Aurélio Ruediger
Lucas Roberto da Silva
Rebeca Liberatori Braga
Lucas Maciel Peixoto