Foreign Visitors in Brazil - 2005 to 2015 - Part II


This post continues our previous analysis of foreign visitors to Brazil from 2005 to 2015.

We start by looking at how the variable of interest, the number of arrivals, behaves over the years.



There has been a slight increase in the number of foreign visitors since 2013. The graph below shows how the accumulated visits are distributed across the continents.



From this graph we can see that the increase comes mainly from South America. For the other two continents with the most visitors, Europe and North America, there has been a slight decrease in arrivals.

Below is the graph of arrivals per continent over this 10-year window. South America and Europe have by far the largest numbers of visitors.



Next we observe how the sum of arrivals is distributed across the months of the year. As expected, there are more arrivals during the Brazilian summer. However, there is an unexplained increase in July. One possible explanation is the summer holidays in the Northern Hemisphere.



To understand the seasonal behaviour of international visitors, a new dataframe was created with the sum of visitors for each month of each year, and a boxplot was drawn from it. As seen below, January is by far the month with the most visitors. The single dot in June with over a million visitors is from 2014, precisely during the soccer World Cup.
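The grouping step described above could be sketched roughly as follows. This is a minimal sketch, not the code used for the post: it assumes pandas, the column names (`year`, `month`, `route`, `arrivals`) are hypothetical, and the numbers are made up for illustration. It also applies the usual boxplot rule (a point above the third quartile plus 1.5 times the interquartile range is drawn as a separate dot) to show why a single unusual year stands out.

```python
import pandas as pd

# Made-up illustrative totals per year, NOT the real dataset.
january_totals = {2010: 500, 2011: 520, 2012: 540, 2013: 560, 2014: 580}
june_totals = {2010: 300, 2011: 310, 2012: 320, 2013: 330, 2014: 1000}

rows = []
for month, totals in ((1, january_totals), (6, june_totals)):
    for year, total in totals.items():
        # Split each total across two hypothetical entry routes so that
        # the grouping step below has something to add up.
        air = total * 7 // 10
        rows.append({"year": year, "month": month, "route": "air", "arrivals": air})
        rows.append({"year": year, "month": month, "route": "land", "arrivals": total - air})

arrivals = pd.DataFrame(rows)

# One row per (year, month): the total number of arrivals in that month.
monthly = arrivals.groupby(["year", "month"], as_index=False)["arrivals"].sum()


def upper_fence(values):
    """Upper whisker limit of a standard boxplot: Q3 + 1.5 * IQR."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return q3 + 1.5 * (q3 - q1)


# Compute the fence separately for each month and keep the points above it.
fences = monthly.groupby("month")["arrivals"].transform(upper_fence)
outliers = monthly[monthly["arrivals"] > fences]
print(outliers)
```

With matplotlib installed, `monthly.boxplot(column="arrivals", by="month")` would draw one box per month from the same dataframe.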



The next plot shows how those international visitors entered the country. It is reasonable to suppose that the large majority of land entries are from South American countries. Also, visitors who entered by air add up to more than twice all other categories combined.



The next plot shows the State in which visitors arrive. It is important to note that this is not necessarily where they are headed: Sao Paulo often has the cheapest fares and greater capacity for international flights, so many visitors arrive there and then take a domestic flight to another State.



To analyse each country, I split the graphs by continent to keep them from getting messy. The next graphs make it evident that using an “other countries” category is a mistake that wastes a lot of potentially useful information. In Africa, Asia and Central America, this category has very large values. Together they add up to around 1.2 million visitors, which is only about 2% of the nearly 50 million visitors over the 10 years analysed, yet they play an important role when each continent is studied individually.
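The arithmetic behind that point can be checked with a few lines of Python. The overall figures are the approximate ones quoted above; the continent-level numbers are hypothetical, chosen only to show how a category that is negligible overall can dominate a single continent.

```python
def share_percent(part, total):
    """Return part as a percentage of total."""
    return 100 * part / total


# Approximate figures quoted in the text.
other_countries_total = 1_200_000
all_visitors_total = 50_000_000
overall_share = share_percent(other_countries_total, all_visitors_total)

# HYPOTHETICAL continent-level numbers, for illustration only.
continent_other = 400_000
continent_total = 900_000
continent_share = share_percent(continent_other, continent_total)

print(f"overall: {overall_share:.1f}%")
print(f"within the hypothetical continent: {continent_share:.1f}%")
```

The same category is a rounding error in the overall total but would be nearly half of the arrivals in a continent with numbers like these.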













So far, we have observed that visitors from South America have been increasing over the past few years and are the largest group, and that the majority of visitors come by air, arrive in Sao Paulo and come in January... Right? Well, maybe not! So far, we have only looked at the dimensions of this dataset individually, not at how they interact with each other. We still need to analyse those variables together, and that will be the task for the next and final part of this work.
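As a rough idea of what looking at two dimensions together might involve, here is a minimal sketch using a pandas pivot table. It is not the analysis from the next part: the column names are hypothetical and the numbers are invented for illustration.

```python
import pandas as pd

# Invented toy data, NOT the real dataset.
toy = pd.DataFrame({
    "continent": ["South America", "South America", "Europe", "Europe"],
    "route": ["land", "air", "air", "sea"],
    "arrivals": [60, 40, 90, 10],
})

# Rows are continents, columns are entry routes, cells are summed arrivals.
table = toy.pivot_table(
    index="continent",
    columns="route",
    values="arrivals",
    aggfunc="sum",
    fill_value=0,
)

# Share of each route within a continent (each row sums to 1).
row_share = table.div(table.sum(axis=1), axis=0)
print(row_share)
```

In this toy table air is the most common route overall, yet within South America land entries are the majority, which is the kind of interaction that single-variable plots cannot show.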

See you!
