Visualizing Airbnb Gentrification

Overview

Since the late 2000s, Airbnb has become an essential part of everyday travel, and demand in the online travel agency (OTA) business is enormous. But have you ever stopped to wonder what factors contribute to a listing's price? Which nearby go-to spots contribute significantly to it? Tourists who travel to Manhattan cannot miss its numerous galleries. So what is the effect of the distance from an Airbnb to its nearest gallery on its nightly listing price, after accounting for the effects of the distance to the nearest subway station and room type in Manhattan, New York? The short answer: proximity to a gallery is more strongly associated with an Airbnb's listing price than proximity to a subway station. Let's delve deeper into the research process.

Tool

R Studio, ArcGIS

My Role

Research, Data Cleaning, Data Engineering, Data Analysis

Instructor

Carole Voulgaris

Duration

2 weeks

Setup - Choosing Data Variables

I address this question using Airbnb listing data from the mission-driven project 'Inside Airbnb' and public amenities data from NYC OpenData. My dataset includes 21,598 Airbnb listings in Manhattan, New York. From the full set of variables, I focus on only a subset in this research.

Full set of variables:

- Airbnb Listing Price
- Review Score
- Year Round Availability
- Minimum Night of Stay
- The Distance
  - To Its Nearest Gallery
  - To Its Nearest Subway Station
  - To Its Nearest Park
- The Density
  - Of Its Nearby Galleries
  - Of Its Nearby Subway Stations
  - Of Its Nearby Parks
- Room Type
  - Entire Home / Apartment
  - Private Room
  - Shared Room

Variables chosen as the research focus:

- Airbnb Listing Price
- The Distance
  - To Its Nearest Gallery
  - To Its Nearest Subway Station
- Room Type
  - Entire Home / Apartment
  - Private Room
  - Shared Room

- Airbnb Listing Price (outcome | continuous variable): The cost of a one-night stay in an Airbnb listing, in USD.
- Vicinity of The Gallery (predictor | continuous variable): The linear (straight-line) distance from an Airbnb listing to its nearest gallery.
- Vicinity of The Subway Station (predictor | continuous variable): The linear (straight-line) distance from an Airbnb listing to its nearest subway station.
- Room Type (predictor | categorical variable): A categorical variable representing the type of Airbnb listing.

*Within the categorical variable Room Type, I use Entire Home/Apartment as the reference category.

To get the vicinity data (the distance from an Airbnb listing to its nearest gallery, subway station, park, etc.), I use ArcGIS to calculate the geographical distance between each Airbnb listing and its nearest target of interest.
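The distances in this project come from ArcGIS. As a rough illustration of what a "distance to the nearest target" calculation involves, here is a minimal base R sketch that uses the haversine (great-circle) formula. The function names, the `lat`/`lon` column names, and the sample coordinates are all hypothetical, and the result is only an approximation of what a GIS tool would report.

```r
# Great-circle distance in metres between points given in decimal degrees.
# Assumes a spherical Earth with a mean radius of 6,371 km.
haversine_m <- function(lat1, lon1, lat2, lon2) {
  earth_radius_m <- 6371000
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_m * asin(pmin(1, sqrt(a)))
}

# For every listing, the distance to the closest target (e.g. a gallery).
# Both data frames are assumed to have numeric `lat` and `lon` columns.
nearest_distance_m <- function(listings, targets) {
  vapply(seq_len(nrow(listings)), function(i) {
    min(haversine_m(listings$lat[i], listings$lon[i],
                    targets$lat, targets$lon))
  }, numeric(1))
}

# Made-up coordinates for illustration only.
listings <- data.frame(lat = c(40.750, 40.780), lon = c(-73.990, -73.960))
galleries <- data.frame(lat = c(40.748, 40.790), lon = c(-74.000, -73.950))
listings$dist_gallery <- nearest_distance_m(listings, galleries)
listings
```

This brute-force version compares every listing with every target, which is fine for a sketch but slow for large datasets.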

Data Showcase

Table 1: Descriptive Statistics For Continuous Variables

| Statistic | Airbnb Listing Price (USD) | Vicinity of The Gallery (Metres) | Vicinity of The Subway Station (Metres) |
| --- | --- | --- | --- |
| Full Range | 10-2000 | 2-3658 | 1-1658 |
| Interquartile Range | 95-220 | 130-395 | 233-512 |
| Standard Deviation | 164 | 290 | 228 |
| Mean | 186 | 312 | 398 |
| Median | 150 | 223 | 360 |

Table 1 presents basic descriptive statistics for each continuous variable in the dataset.
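Statistics like those in Table 1 can be produced with a few base R functions. The sketch below is illustrative only: `describe` is a hypothetical helper, and the `price` vector is synthetic data, not the Inside Airbnb sample.

```r
# Hypothetical helper: the summary statistics reported in Table 1
# for a single numeric vector.
describe <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  c(min = min(x), max = max(x),
    q1 = q[1], q3 = q[2], iqr = q[2] - q[1],
    sd = sd(x), mean = mean(x), median = median(x))
}

# Synthetic prices for illustration only.
set.seed(1)
price <- round(exp(rnorm(500, mean = 5, sd = 0.6)))
round(describe(price), 1)
```

Applying the same helper to each distance column would give the remaining columns of the table.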

Figure 1: Distribution of Airbnb Listing Price

➊ Airbnb Listing Price

Listing prices in the sample have a minimum of $10 and a maximum of $2,000. Half of the Airbnb listings in the sample have a listing price between $95 and $220, representing an interquartile range of $125, which is less than the standard deviation of $164. The median value of $150 is less than the average value of $186, which suggests some right skew in the distribution, as illustrated in Figure 1.

Figure 2: Distribution of Vicinity to The Gallery

➋ Vicinity to The Gallery

The distance from an Airbnb listing to its nearest gallery in the sample has a minimum of 2 metres and a maximum of 3,658 metres. Half of the Airbnb listings in the sample are between 130 and 395 metres from their nearest gallery, representing an interquartile range of 265 metres, slightly less than the standard deviation of 290 metres. The median value of 223 metres is less than the average value of 312 metres, which suggests some right skew in the distribution, as illustrated in Figure 2.

Figure 3: Distribution of Vicinity to The Subway Station

➌ Vicinity to The Subway Station

The distance from an Airbnb listing to its nearest subway station in the sample has a minimum of 1 metre and a maximum of 1,658 metres. Half of the Airbnb listings in the sample are between 233 and 512 metres from their nearest subway station, representing an interquartile range of 279 metres, larger than the standard deviation of 228 metres. The median value of 360 metres is less than the average value of 398 metres, which suggests some right skew in the distribution, as illustrated in Figure 3.

Figure 4: Proportions of Sample Listing By Room Type

➍ Airbnb Listing Room Type

Figure 4 illustrates the sample proportions for the different Airbnb listing room types. More than half of the listings in the sample (61 percent) are entire homes or apartments. Most of the remaining listings are private rooms (37 percent); only 2 percent are shared rooms.

Hypothesis Testing

I first test hypotheses by calculating three sets of 95-percent confidence intervals: one for the correlation between listing price and distance to the nearest gallery (both log-transformed), one for the correlation between listing price and distance to the nearest subway station, and one for the average listing price of each room type.

Figure 5: Relationship Between Airbnb Listing Price And Distance To Its Nearest Gallery

➊ Relationship between an Airbnb listing price and distance to the nearest gallery

The relationship between an Airbnb listing's vicinity to its nearest gallery and its listing price is illustrated in Figure 5 (both axes are log-transformed). The 95-percent confidence interval for the correlation between the log of an Airbnb listing price and the log of the distance to its nearest gallery is between -0.30 and -0.28. This suggests that we can be 95-percent confident that there is a negative relationship between an Airbnb listing price and the distance to its nearest gallery.
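A confidence interval for a correlation like this one can be obtained from `cor.test` in base R. The sketch below uses synthetic data and hypothetical variable names (`dist_gallery`, `price`), so its output will not match the figures reported above.

```r
# Synthetic data for illustration only: price falls weakly with distance.
set.seed(42)
n <- 200
dist_gallery <- exp(rnorm(n, mean = 5.5, sd = 0.8))
price <- exp(5.5 - 0.2 * log(dist_gallery) + rnorm(n, sd = 0.5))

# Pearson correlation on the log-transformed values,
# with a 95-percent confidence interval.
result <- cor.test(log(price), log(dist_gallery), conf.level = 0.95)
result$estimate   # point estimate of the correlation
result$conf.int   # lower and upper bounds of the interval
```

If both bounds of the interval are below zero, the data support a negative relationship at the chosen confidence level.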

Figure 6: Relationship Between Airbnb Listing Price And Distance To Its Nearest Subway Station

➋ Relationship between an Airbnb listing price and distance to the nearest subway station

The relationship between an Airbnb listing's vicinity to its nearest subway station and its listing price is illustrated in Figure 6 (both axes are log-transformed). The 95-percent confidence interval for the correlation between the log of an Airbnb listing price and the log of the distance to its nearest subway station is between -0.05 and -0.02. This suggests that we can be 95-percent confident that there is a negative, though very weak, relationship between an Airbnb listing price and the distance to its nearest subway station.

Figure 7: Distribution Of Airbnb Listing Price

➌ Relationship between an Airbnb listing room type and listing price

Figure 7 shows the average Airbnb listing price for each room type. Error bars represent 95-percent confidence intervals. The 95-percent confidence interval for the average listing price is $230 to $236 for entire homes or apartments, $110 to $115 for private rooms, and $81 to $97 for shared rooms.
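Confidence intervals for a group mean can be computed with a one-sample `t.test` on each group. The following is a minimal sketch on synthetic data; `mean_ci` and the column names are hypothetical, and the numbers it prints are not the study's results.

```r
# Synthetic listings for illustration only.
set.seed(7)
listings <- data.frame(
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"),
                  times = c(120, 70, 10)),
  price = c(rnorm(120, 230, 80), rnorm(70, 110, 40), rnorm(10, 90, 30))
)

# Hypothetical helper: mean with a t-based confidence interval.
mean_ci <- function(x, level = 0.95) {
  ci <- t.test(x, conf.level = level)$conf.int
  c(mean = mean(x), lower = ci[1], upper = ci[2])
}

# One row per room type, with columns mean, lower, upper.
ci_by_type <- t(sapply(split(listings$price, listings$room_type), mean_ci))
round(ci_by_type, 1)
```

Smaller groups, such as shared rooms, tend to produce wider intervals because the standard error of the mean grows as the group size shrinks.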

Regression Models

Next, I create regression models to find linear equations that predict listing price from the distance to the nearest gallery, the distance to the nearest subway station, and room type.

Table 2: Results Of Regression Model Predicting Listing Price Based On Room Type

| Coefficient | Estimated Value | P-Value |
| --- | --- | --- |
| Intercept | 233.413 | <2e-16 |
| Private Room | -121.004 | <2e-16 |
| Shared Room | -144.436 | <2e-16 |

➊ Relationship between an Airbnb listing room type and listing price in regression

Table 2 shows the results of the model that predicts an Airbnb listing price based on room type. The R-squared value for this model was 0.1333, suggesting that about 13 percent of the variation in listing price can be explained by differences in room type. The coefficients for private room and shared room are both statistically significant, indicating that the listing prices for these room types differ significantly from the listing price for an entire home or apartment. Both coefficients are negative, indicating that booking either of these room types costs less than booking an entire home or apartment.
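A model of this shape can be fitted in base R with `lm`, using `relevel` to set the reference category. The sketch below runs on synthetic data with hypothetical column names; it shows how the intercept and coefficients relate to the group means, not the actual fitted model.

```r
# Synthetic listings for illustration only.
set.seed(7)
listings <- data.frame(
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"),
                  times = c(120, 70, 10)),
  price = c(rnorm(120, 230, 80), rnorm(70, 110, 40), rnorm(10, 90, 30))
)

# Make entire home/apartment the reference category.
listings$room_type <- relevel(factor(listings$room_type),
                              ref = "Entire home/apt")

model <- lm(price ~ room_type, data = listings)
coef(summary(model))        # estimates, standard errors, t and p values
summary(model)$r.squared    # share of price variation explained

# With a single categorical predictor, the intercept is the mean of the
# reference group and each coefficient is a difference from that mean.
group_means <- tapply(listings$price, listings$room_type, mean)
group_means
```

Read this way, the Table 2 estimates imply an average price of 233.413 - 121.004 = 112.409 USD for a private room and 233.413 - 144.436 = 88.977 USD for a shared room.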

Table 3: Results Of Regression Model Predicting Listing Price Based On Distance To Its Nearest Gallery And Distance To Its Nearest Subway Station

| Predictor | Estimated Coefficient (In Regression With A Single Predictor) | Estimated Intercept Value | P-Value | Model R2 |
| --- | --- | --- | --- | --- |
| Distance To Its Nearest Gallery | -0.11 | 219.74 | <2e-16 | 0.0376 |
| Distance To Its Nearest Subway Station | -0.014 | 191.26 | 0.0032 | 0.0004 |

➋ Relationship between an Airbnb listing price and continuous variables (distance to the nearest gallery and distance to the nearest subway station) in regression

Table 3 shows the results of two different regression models: one predicting Airbnb listing price based on the distance to its nearest gallery, and the other predicting listing price based on the distance to its nearest subway station.
The first model, based on the distance to the nearest gallery, explains about 3.76% of the total variation in listing price (R-squared of 0.0376). The second model, based on the distance to the nearest subway station, explains about 0.04% of the total variation in listing price (R-squared of 0.0004).

The coefficient in the first model is significant and negative, indicating that, without controlling for other factors, a shorter distance to an Airbnb's nearest gallery is associated with a higher listing price. The coefficient in the second model is also negative and, with a p-value of 0.0032, significant at the 95-percent level, but it is much smaller in magnitude and the model explains almost none of the variation in price. Without controlling for other factors, a shorter distance to an Airbnb's nearest subway station is therefore much more weakly associated with a higher listing price than a shorter distance to its nearest gallery.
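Two single-predictor models can be fitted and compared side by side, as in Table 3. The sketch below is a hedged illustration on synthetic data: `fit_single` and the column names `dist_gallery` and `dist_subway` are hypothetical, and the synthetic price depends only on the gallery distance by construction.

```r
# Synthetic data for illustration only.
set.seed(11)
n <- 300
listings <- data.frame(
  dist_gallery = runif(n, 2, 1500),
  dist_subway  = runif(n, 1, 1000)
)
listings$price <- 220 - 0.1 * listings$dist_gallery + rnorm(n, sd = 60)

# Hypothetical helper: fit price ~ one predictor and collect the
# quantities shown in Table 3.
fit_single <- function(predictor, data) {
  model <- lm(reformulate(predictor, response = "price"), data = data)
  s <- summary(model)
  c(slope     = unname(coef(model)[2]),
    intercept = unname(coef(model)[1]),
    p_value   = s$coefficients[2, "Pr(>|t|)"],
    r_squared = s$r.squared)
}

results <- t(sapply(c("dist_gallery", "dist_subway"),
                    fit_single, data = listings))
results
```

In a regression with one predictor, R-squared equals the squared Pearson correlation between the predictor and the outcome, so comparing the two R-squared values is a direct way to compare the strength of the two linear associations.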

Code Sample (R)

Click To View The Complete Code on GitHub ↗

Limitation

The distances from an Airbnb listing to its nearest gallery and subway station are linear (straight-line) distances, so they serve only as a heuristic for how accessible the surrounding galleries and subway stations are from a listing. To represent accessibility more realistically, I would need to replace these linear distances with network distances in ArcGIS.

Wrap It Up

Vicinity to a gallery, vicinity to a subway station, and room type are some of the features that contribute to an Airbnb listing price in Manhattan, New York. The listing price of an Airbnb is more strongly associated with the distance to its nearest gallery than with the distance to its nearest subway station, and after accounting for the effects of the distance to the nearest subway station and room type, a listing closer to a gallery tends to have a higher price. To get a more realistic representation of vicinity, my next step would be to replace the linear distance with the network distance, which better represents the actual travel distance.

Next Step

- Switch the linear distance to the network distance to accurately represent the length for transportation.
- Add more geographical variables such as the distance to a listing's nearest park or public institution, and also review scores, etc.
- Publish the data visualization in an interactive format using leaflet.js, d3.js, and Mapbox API.