Visualizing Airbnb Gentrification
Overview
Since the late 2000s, Airbnb has become an essential part of everyday travel, and demand in the online travel agency (OTA) business is enormous. But have you ever stopped to wonder what factors contribute to a listing's price? Which kinds of nearby go-to spots contribute significantly to it? Tourists visiting Manhattan can hardly miss its numerous galleries, so this project asks: what is the effect of the distance to an Airbnb listing's nearest gallery on its nightly listing price in Manhattan, New York, after accounting for the effects of the distance to the nearest subway station and room type? In short, proximity to a gallery is more strongly associated with listing price than proximity to a subway station. Let's delve deeper into the research process.
Tool
RStudio, ArcGIS
My Role
Research, Data Cleaning, Data Engineering, Data Analysis
Instructor
Carole Voulgaris
Duration
2 weeks
Setup - Choosing Data Variables
I will address this question using Airbnb listing data from Inside Airbnb, a mission-driven project, and public amenities data from NYC Open Data. My dataset includes 21,598 Airbnb listings in Manhattan, New York. From the full set of available variables, I focus on a subset.
Candidate variables:
- Airbnb Listing Price
- Review Score
- Year-Round Availability
- Minimum Nights of Stay
- The Distance to Its Nearest Gallery, Subway Station, and Park
- The Density of Its Nearby Galleries, Subway Stations, and Parks
- Room Type: Entire Home / Apartment, Private Room, Shared Room

Variables selected for this analysis:
- Airbnb Listing Price
- The Distance to Its Nearest Gallery and to Its Nearest Subway Station
- Room Type: Entire Home / Apartment, Private Room, Shared Room
*For the categorical variable Room Type, I use Entire Home/Apartment as the reference category.
To obtain the vicinity data (the distance from an Airbnb listing to its nearest gallery, subway station, park, etc.), I will use ArcGIS to calculate the geographic distance between each Airbnb listing and its nearest point of interest.
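ArcGIS handles this step in the project. Purely as an illustration of the same nearest-feature idea, the sketch below computes a straight-line (great-circle) distance with the haversine formula in base R. The `lon`/`lat` column names, the helper names `haversine_m()` and `nearest_distance_m()`, and the toy coordinates are all hypothetical; this is not the ArcGIS workflow itself.

```r
# Hypothetical sketch, not the project's ArcGIS workflow:
# great-circle distance in metres via the haversine formula.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# For each listing, the distance to the closest target (brute force).
nearest_distance_m <- function(listings, targets) {
  vapply(seq_len(nrow(listings)), function(i) {
    d <- haversine_m(listings$lon[i], listings$lat[i], targets$lon, targets$lat)
    min(d)
  }, numeric(1))
}

# Made-up coordinates for two listings and two galleries.
listings  <- data.frame(lon = c(-73.9857, -73.9680), lat = c(40.7484, 40.7851))
galleries <- data.frame(lon = c(-73.9857, -73.9632), lat = c(40.7494, 40.7794))
dist_to_gallery <- nearest_distance_m(listings, galleries)
print(round(dist_to_gallery))
```

The brute-force loop compares every listing with every target, which is simple and fine for a few thousand points; a spatial index would be the usual choice at larger scale.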
Data Showcase
Statistic | Airbnb Listing Price (USD) | Distance to Nearest Gallery (m) | Distance to Nearest Subway Station (m)
Full Range | 10-2,000 | 2-3,658 | 1-1,658
Interquartile Range | 95-220 | 130-395 | 233-512
Standard Deviation | 164 | 290 | 228
Mean | 186 | 312 | 398
Median | 150 | 223 | 360
Table 1 presents basic descriptive statistics for each continuous variable in the dataset.
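The statistics in Table 1 can be computed in base R along these lines. This is a minimal sketch on a small made-up price vector; the helper names `describe()` and `skew_direction()` are hypothetical, and using `quantile()`'s default (type 7) quartiles is an assumption, since other quartile definitions give slightly different interquartile ranges. The skew check encodes the rule of thumb applied in the paragraphs below.

```r
# Summary statistics of the kind reported in Table 1, for one numeric variable.
describe <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)  # R's default type 7 quartiles
  c(min = min(x), max = max(x),
    q1 = q[1], q3 = q[2], iqr = q[2] - q[1],
    sd = sd(x), mean = mean(x), median = median(x))
}

# Rule of thumb: mean > median suggests right (positive) skew,
# mean < median suggests left (negative) skew.
skew_direction <- function(x) {
  if (mean(x) > median(x)) "right" else if (mean(x) < median(x)) "left" else "none"
}

# Made-up nightly prices (USD); one expensive listing pulls the mean up.
price <- c(60, 95, 120, 150, 180, 220, 900)
stats <- describe(price)
print(round(stats, 1))
print(skew_direction(price))
```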
➊ Airbnb Listing Price
Listing prices in the sample range from a minimum of $10 to a maximum of $2,000. Half of the Airbnb listings in the sample have a listing price between $95 and $220, an interquartile range of $125, which is less than the standard deviation of $164. The median value of $150 is less than the mean of $186, which suggests some right (positive) skew in the distribution, as illustrated in Figure 1.
➋ Vicinity to The Gallery
The distance from an Airbnb listing to its nearest gallery ranges from a minimum of 2 metres to a maximum of 3,658 metres. Half of the listings are between 130 and 395 metres from their nearest gallery, an interquartile range of 265 metres, slightly less than the standard deviation of 290 metres. The median value of 223 metres is less than the mean of 312 metres, which suggests some right skew in the distribution, as illustrated in Figure 2.
➌ Vicinity to The Subway Station
The distance from an Airbnb listing to its nearest subway station ranges from a minimum of 1 metre to a maximum of 1,658 metres. Half of the listings are between 233 and 512 metres from their nearest subway station, an interquartile range of 279 metres, larger than the standard deviation of 228 metres. The median value of 360 metres is less than the mean of 398 metres, which suggests some right skew in the distribution, as illustrated in Figure 3.
➍ Airbnb Listing Room Type
Figure 4 illustrates the sample proportions for the different Airbnb listing room types. More than half of the listings (61 percent) are entire homes or apartments. Most of the remainder are private rooms (37 percent); only 2 percent are shared rooms.
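A sketch of how the room-type shares can be tabulated in R, and how Entire Home/Apartment can be set as the reference category. The labels `Entire home/apt`, `Private room`, and `Shared room` are assumed to match Inside Airbnb's room type values, and the counts are made up so that they reproduce the reported 61/37/2 percent split.

```r
# Made-up sample of 100 listings reproducing the reported 61/37/2 split.
room_type <- factor(
  rep(c("Entire home/apt", "Private room", "Shared room"), times = c(61, 37, 2))
)

# Make "Entire home/apt" the reference level, so that lm() reports the other
# two room types as differences from it.
room_type <- relevel(room_type, ref = "Entire home/apt")

# Percentage of listings in each room type.
shares <- round(100 * prop.table(table(room_type)))
print(shares)
```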
Hypothesis Testing
I will first test hypotheses by calculating three sets of 95-percent confidence intervals: one for the correlation between listing price and distance to the nearest gallery (both log-transformed), one for the correlation between listing price and distance to the nearest subway station (both log-transformed), and one for the average listing price of each room type.
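In R, a 95-percent confidence interval for a Pearson correlation is returned by `cor.test()`, which builds the interval with Fisher's z transformation. The sketch below runs on simulated prices and distances, since the project's data and code are not reproduced here; the variable names and simulation parameters are assumptions. Both variables must be strictly positive before taking logs, which holds for this sample (the minimum price is $10 and the minimum distances are 1 and 2 metres).

```r
# Simulated data only; names and parameters are hypothetical.
set.seed(42)
n <- 2000
dist_gallery <- exp(rnorm(n, mean = log(220), sd = 0.8))   # metres, always > 0
log_price <- 5.6 - 0.15 * log(dist_gallery) + rnorm(n, sd = 0.5)
price <- exp(log_price)                                    # USD per night

# Correlation of log price with log distance, with a 95-percent CI.
ct <- cor.test(log(price), log(dist_gallery), conf.level = 0.95)
print(ct$estimate)   # sample Pearson correlation r
print(ct$conf.int)   # lower and upper bounds of the 95-percent CI
```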
➊ Relationship between an Airbnb listing price and distance to the nearest gallery
The relationship between an Airbnb listing's distance to its nearest gallery and its listing price is illustrated in Figure 5 (both axes are log-transformed). The 95-percent confidence interval for the correlation between the log of an Airbnb listing price and the log of the distance to its nearest gallery is between -0.30 and -0.28. Because the interval lies entirely below zero, we can be 95-percent confident that there is a negative relationship between an Airbnb listing price and the distance to its nearest gallery: listings closer to a gallery tend to be more expensive.
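Why is this interval so narrow? A worked check, assuming the usual Fisher z construction, that all 21,598 listings enter the correlation, and taking the interval's midpoint r = -0.29 as an illustrative point estimate (the exact estimate is not reported):

```latex
\begin{aligned}
z &= \operatorname{artanh}(r) = \operatorname{artanh}(-0.29) \approx -0.2986\\
SE_z &= \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{21{,}595}} \approx 0.0068\\
\mathrm{CI}_{95\%} &= \tanh\!\left(z \pm 1.96\,SE_z\right) \approx \tanh(-0.2986 \pm 0.0133) \approx (-0.302,\ -0.278)
\end{aligned}
```

Rounded to two decimals this matches the reported -0.30 to -0.28. With a sample this large the half-width is only about 0.012 to 0.013 for correlations of this size, which is also why the much weaker subway-station correlation can still exclude zero.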
➋ Relationship between an Airbnb listing price and distance to the nearest subway station
The relationship between an Airbnb listing's distance to its nearest subway station and its listing price is illustrated in Figure 6 (both axes are log-transformed). The 95-percent confidence interval for the correlation between the log of an Airbnb listing price and the log of the distance to its nearest subway station is between -0.05 and -0.02. Because this interval also lies entirely below zero, we can be 95-percent confident that there is a negative, though weak, relationship between an Airbnb listing price and the distance to its nearest subway station.
➌ Relationship between an Airbnb listing room type and listing price
Figure 7 shows the average Airbnb listing price for each room type, with error bars representing 95-percent confidence intervals. The 95-percent confidence interval for the average listing price is between $230 and $236 for entire homes/apartments, between $110 and $115 for private rooms, and between $81 and $97 for shared rooms.
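Per-room-type intervals like these can be sketched with `t.test()`, which returns a t-based confidence interval for a mean. The helper `ci_by_group()` and the simulated prices are hypothetical, and the project may have computed its intervals differently.

```r
# 95-percent t-based confidence interval for the mean of each group.
ci_by_group <- function(values, groups, level = 0.95) {
  rows <- lapply(split(values, groups), function(v) {
    ci <- t.test(v, conf.level = level)$conf.int
    c(mean = mean(v), lower = ci[1], upper = ci[2])
  })
  do.call(rbind, rows)  # one row per group, row names taken from the groups
}

# Simulated nightly prices (USD) for three room types.
set.seed(7)
room_type <- factor(rep(c("Entire home/apt", "Private room", "Shared room"),
                        times = c(300, 180, 20)))
price <- c(rnorm(300, 233, 60), rnorm(180, 112, 30), rnorm(20, 89, 20))
print(round(ci_by_group(price, room_type), 1))
```

Small groups such as shared rooms get wider intervals, which matches the pattern in Figure 7, where the shared-room interval ($81 to $97) is much wider than the others.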
Regression Models
Next, I will fit linear regression models that predict listing price from the distance to the nearest gallery, the distance to the nearest subway station, and room type.
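A sketch of such a model in R's formula syntax. The data are simulated, the column names `price`, `dist_gallery`, `dist_subway`, and `room_type` are hypothetical, and the coefficients used to generate the toy prices are arbitrary, not the project's estimates. Because `Entire home/apt` is the first factor level, `lm()` treats it as the reference category and reports one coefficient for each of the other two room types.

```r
set.seed(2024)
n <- 500
listings <- data.frame(
  dist_gallery = runif(n, 2, 1500),   # metres
  dist_subway  = runif(n, 1, 1000),   # metres
  room_type    = factor(rep(c("Entire home/apt", "Private room", "Shared room"),
                            times = c(300, 185, 15)))
)

# Arbitrary "true" effects, used only to generate toy prices (USD).
room_effect <- c(0, -120, -145)[as.integer(listings$room_type)]
listings$price <- 230 - 0.1 * listings$dist_gallery - 0.01 * listings$dist_subway +
  room_effect + rnorm(n, sd = 30)

# Multiple regression with two continuous predictors and one categorical one.
fit <- lm(price ~ dist_gallery + dist_subway + room_type, data = listings)
print(summary(fit)$coefficients)  # estimates, standard errors, t values, p-values
print(summary(fit)$r.squared)
```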
Coefficient | Estimated Value (USD) | P-Value
Intercept | 233.413 | <2e-16
Private Room | -121.004 | <2e-16
Shared Room | -144.436 | <2e-16
➊ Relationship between an Airbnb listing room type and listing price in regression
Table 2 shows the results of the model that predicts an Airbnb listing price based on room type. The R-squared value for this model is 0.1333, suggesting that about 13 percent of the variation in listing price can be explained by differences in room type. The coefficients for private room and shared room are both statistically significant, indicating that the listing prices of these room types differ significantly from the listing price of an entire home or apartment. Both coefficients are also negative, indicating that booking a private room or a shared room costs less than booking an entire home or apartment.
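Because room type is the only predictor, the intercept is the predicted price of the reference category (entire home or apartment), and each coefficient is that room type's difference from the reference. Worked from the Table 2 estimates, with D_private and D_shared as 0/1 indicator variables (notation introduced here for illustration), the predicted nightly prices are:

```latex
\begin{aligned}
\widehat{\text{price}} &= 233.413 - 121.004\,D_{\text{private}} - 144.436\,D_{\text{shared}}\\
\widehat{\text{price}}_{\text{entire}} &= 233.413\\
\widehat{\text{price}}_{\text{private}} &= 233.413 - 121.004 = 112.409\\
\widehat{\text{price}}_{\text{shared}} &= 233.413 - 144.436 = 88.977
\end{aligned}
```

Each predicted value falls inside the corresponding 95-percent confidence interval for the average price by room type ($230 to $236, $110 to $115, and $81 to $97).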
Predictor | Estimated Coefficient (Single-Predictor Regression, USD per m) | Estimated Intercept (USD) | P-Value | Model R²
Distance To Its Nearest Gallery | -0.11 | 219.74 | <2e-16 | 0.0376
Distance To Its Nearest Subway Station | -0.014 | 191.26 | 0.0032 | 0.0004
➋ Relationship between an Airbnb listing price and continuous variables (distance to the nearest gallery and distance to the nearest subway station) in regression
Table 3 shows the results of two different regression models: one predicting Airbnb listing price based on the distance to its nearest gallery, and the other predicting listing price based on the distance to its nearest subway station.
The first model, with the distance to the nearest gallery, explains about 3.76 percent of the total variation in listing price (R-squared = 0.0376); the second model, with the distance to the nearest subway station, explains only about 0.04 percent (R-squared = 0.0004).
The coefficient in the first model is negative and significant, indicating that, without controlling for other factors, a shorter distance to an Airbnb listing's nearest gallery is associated with a higher listing price. The coefficient in the second model is also negative and, with a p-value of 0.0032, significant at the 5-percent level, but it is much smaller and its model explains almost none of the variation. Without controlling for other factors, then, a shorter distance to the nearest subway station is much more weakly associated with a higher listing price than a shorter distance to the nearest gallery.
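The Table 3 estimates can be read as fitted lines. A worked example of what each slope (in USD per metre) means for a 100-metre difference in distance, together with the absolute correlation implied by each single-predictor R²:

```latex
\begin{aligned}
\widehat{\text{price}} &= 219.74 - 0.11\,d_{\text{gallery}}, & 0.11 \times 100 &= 11.00 \text{ USD per 100 m}, & |r| &= \sqrt{0.0376} \approx 0.194\\
\widehat{\text{price}} &= 191.26 - 0.014\,d_{\text{subway}}, & 0.014 \times 100 &= 1.40 \text{ USD per 100 m}, & |r| &= \sqrt{0.0004} = 0.02
\end{aligned}
```

Ignoring other factors, a listing 100 metres closer to a gallery is predicted to cost about $11 more per night, compared with about $1.40 for being 100 metres closer to a subway station.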
Code Sample (R)
The complete code is available on GitHub.
Limitation
The distance from an Airbnb listing to its nearest gallery and subway station is a straight-line distance, so it can only serve as a rough proxy for how accessible those galleries and subway stations are from the listing. To get a more accurate, realistic measure of accessibility, I need to replace these straight-line distances with network distances, measured along the street network, in ArcGIS.
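To illustrate the gap between the two measures, here is a toy comparison in base R: a hypothetical square city block with streets only along its four sides, where the straight-line distance across the block is shorter than the shortest path along the streets, found with Dijkstra's algorithm. The node layout, street lengths, and the `dijkstra()` helper are assumptions for illustration; in practice a tool such as ArcGIS Network Analyst would do this on a real street network.

```r
# Intersections of one square block (coordinates in metres, hypothetical).
xy <- rbind(A = c(0, 0), B = c(100, 0), C = c(100, 100), D = c(0, 100))

# Street lengths between intersections; Inf means no direct street.
adj <- matrix(Inf, 4, 4, dimnames = list(rownames(xy), rownames(xy)))
streets <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "A"))
for (k in seq_len(nrow(streets))) {
  u <- streets[k, 1]; v <- streets[k, 2]
  len <- sqrt(sum((xy[u, ] - xy[v, ])^2))
  adj[u, v] <- len
  adj[v, u] <- len
}

# Dijkstra's algorithm: shortest network distance from `source` to every node.
dijkstra <- function(adj, source) {
  dist <- setNames(rep(Inf, nrow(adj)), rownames(adj))
  dist[source] <- 0
  visited <- setNames(rep(FALSE, nrow(adj)), rownames(adj))
  for (step in seq_len(nrow(adj))) {
    open <- which(!visited)
    u <- open[which.min(dist[open])]   # closest unvisited node
    if (is.infinite(dist[u])) break    # remaining nodes are unreachable
    visited[u] <- TRUE
    for (v in which(is.finite(adj[u, ]) & !visited)) {
      dist[v] <- min(dist[v], dist[u] + adj[u, v])
    }
  }
  dist
}

network <- dijkstra(adj, "A")
straight <- sqrt(sum((xy["C", ] - xy["A", ])^2))
print(c(straight_line = straight, network = network[["C"]]))
```

Walking from corner A to the opposite corner C takes 200 metres along the streets, about 41 percent more than the 141-metre straight line, which is the kind of gap the straight-line measure hides.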
Wrap It Up
Distance to the nearest gallery, distance to the nearest subway station, and room type are among the features that contribute to an Airbnb listing price in Manhattan, New York. An Airbnb's listing price is more strongly associated with the distance to its nearest gallery than with the distance to its nearest subway station, and after accounting for the effects of the distance to the nearest subway station and room type, listings closer to a gallery tend to have higher prices. To get a more realistic measure of proximity, my next step is to replace the straight-line distance with the network distance, which better represents the actual travel length.
Next Steps
- Replace the straight-line distance with the network distance to better represent the actual travel length.
- Add more geographic variables, such as the distance to a listing's nearest park or public institution, as well as review scores.
- Publish the data visualization in an interactive format using leaflet.js, d3.js, and Mapbox API.