06 - Exploring Integrations

0. Required library installations

One of the integration functions presented here requires the geopandas library. The osmnx library is also needed to download the data used to demonstrate the functions.

conda install geopandas osmnx

1. Imports

import pymove as pm
from pymove.utils import integration as it
from pymove.visualization import folium
import numpy as np
import pandas as pd
import geopandas
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

2. Load Data

move_df = pm.read_csv('geolife_sample.csv', nrows=5000)
move_df.head()
lat lon datetime id
0 39.984094 116.319236 2008-10-23 05:53:05 1
1 39.984198 116.319322 2008-10-23 05:53:06 1
2 39.984224 116.319402 2008-10-23 05:53:11 1
3 39.984211 116.319389 2008-10-23 05:53:16 1
4 39.984217 116.319422 2008-10-23 05:53:21 1
#Size (number of rows)
move_df.shape[0]
5000

Visualization

folium.plot_trajectories(move_df)

3. Loading points of interest

bbox = move_df.get_bbox()
folium.plot_bbox(bbox, color='blue')
import osmnx as ox

tags = {'amenity':True}
POIs = ox.geometries_from_bbox(north=bbox[0], south=bbox[2], east=bbox[3], west=bbox[1], tags=tags)
POIs.head()
unique_id osmid element_type amenity fee geometry cuisine name name:en atm ... building:material roof:colour roof:material alt_name_1 not:name area ways type name:ja name:ko
0 node/269492188 269492188 node toilets no POINT (116.26750 39.98087) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 node/274942287 274942287 node toilets NaN POINT (116.27358 39.99664) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 node/276320137 276320137 node fast_food NaN POINT (116.33756 39.97541) chinese 永和大王 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 node/276320142 276320142 node massage NaN POINT (116.33751 39.97546) NaN Footmassage 富橋 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 node/286242547 286242547 node toilets NaN POINT (116.19982 40.00670) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 118 columns

Removing points of interest whose amenity is null

POIs = POIs.dropna(subset=["amenity"], inplace=False)

Adapting to the format needed for integration: only point geometries are kept, and the 'lat' and 'lon' labels hold latitude and longitude, respectively

POIs = POIs[POIs['geometry'].type == 'Point']
POIs['lon'] = POIs['geometry'].x
POIs['lat'] = POIs['geometry'].y

Visualization

m = folium.plot_trajectories(move_df)
folium.plot_poi(POIs, slice_tags=['amenity'], base_map=m, poi_point='blue')

4. Integrating Points of Interest into the DataSet

df_4 = move_df.copy()
df_4 = it.join_with_pois(df_4, POIs, label_id='osmid', label_poi_name='name')

Result

df_4.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 5572452688 116.862844 太平洋影城(中关村店)
1 39.984198 116.319322 2008-10-23 05:53:06 1 5572452688 119.142692 太平洋影城(中关村店)
2 39.984224 116.319402 2008-10-23 05:53:11 1 5572452688 116.595117 太平洋影城(中关村店)
3 39.984211 116.319389 2008-10-23 05:53:16 1 5572452688 116.257378 太平洋影城(中关村店)
4 39.984217 116.319422 2008-10-23 05:53:21 1 5572452688 114.886759 太平洋影城(中关村店)

Point of interest closest to each point of the trajectory

df_4['name_poi'].unique()
array(['太平洋影城(中关村店)', '东亚银行', '南京银行', '星巴克', '小吊梨汤', nan, '鑫蜀源', '必胜客',
       '潜渊', '上岛咖啡', '科苑餐厅', '2nd Place', '元绿回转寿司', '中信银行', 'HSBC',
       '咖啡王(暂时停业)', '招商银行', '中国建设银行', 'Paradiso Coffee', '798 bar',
       'Jazz Cafe', 'Hundred Years Cafe', '安家小厨', '清青快餐', '听涛园', '仰望咖啡',
       'China Construction Bank', '同仁堂', '北园餐厅', '北京银行', '交通银行', '宁波银行',
       '美嘉欢乐影城', '北京101中学', '西苑医院', 'Yu Xiao Mian Noodles', '茶大爷',
       "McDonald's", 'Pizza Hut', 'Starbucks', '云海肴', '兰州老妈拉面'],
      dtype=object)
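
As a rough illustration of what the nearest-POI join computes for each trajectory point, here is a minimal sketch in plain Python. It assumes a haversine (great-circle) distance with a mean Earth radius of 6,371 km, which is consistent with the distances shown in this tutorial; the helper names `haversine_m` and `nearest_poi` are hypothetical and are not part of PyMove.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_poi(point, pois):
    """Return (poi_id, distance_m, poi_name) of the POI closest to `point`."""
    def dist(poi):
        return haversine_m(point["lat"], point["lon"], poi["lat"], poi["lon"])

    best = min(pois, key=dist)
    return best["osmid"], dist(best), best["name"]


# First trajectory point and two POIs, with coordinates taken from this tutorial's tables.
point = {"lat": 39.984094, "lon": 116.319236}
pois = [
    {"osmid": 297407444, "name": "招商银行", "lat": 39.975462, "lon": 116.338260},
    {"osmid": 312152376, "name": "永和大王", "lat": 39.991132, "lon": 116.327660},
]
poi_id, dist_m, poi_name = nearest_poi(point, pois)
print(poi_id, round(dist_m), poi_name)
```

The real function works on whole dataframes and is vectorized; the sketch only shows the per-point logic behind the `id_poi`, `dist_poi` and `name_poi` columns.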

5. Integrating Points of Interest into the DataSet (Using join_with_pois_optimizer)

Selecting data

POIs_5 = POIs[0:10].copy()
POIs_5['type_poi'] = POIs_5['amenity']
df_5 = move_df.copy()
POIs_5['type_poi'].unique()
array(['toilets', 'fast_food', 'massage', 'waste_basket', 'cafe',
       'restaurant', 'bank'], dtype=object)

Executing the function

df_5 = it.join_with_pois(df_5, POIs_5, label_id='osmid', label_poi_name='name')
df_5.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 312152376 1061.807427 永和大王
1 39.984198 116.319322 2008-10-23 05:53:06 1 312152376 1048.334810 永和大王
2 39.984224 116.319402 2008-10-23 05:53:11 1 312152376 1041.594793 永和大王
3 39.984211 116.319389 2008-10-23 05:53:16 1 312152376 1043.408891 永和大王
4 39.984217 116.319422 2008-10-23 05:53:21 1 312152376 1041.019464 永和大王

6. Integrating Points of Interest into the Category-Based DataSet

POIs_5
unique_id osmid element_type amenity fee geometry cuisine name name:en atm ... alt_name_1 not:name area ways type name:ja name:ko lon lat type_poi
0 node/269492188 269492188 node toilets no POINT (116.26750 39.98087) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.267504 39.980869 toilets
1 node/274942287 274942287 node toilets NaN POINT (116.27358 39.99664) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.273579 39.996640 toilets
2 node/276320137 276320137 node fast_food NaN POINT (116.33756 39.97541) chinese 永和大王 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.337557 39.975411 fast_food
3 node/276320142 276320142 node massage NaN POINT (116.33751 39.97546) NaN Footmassage 富橋 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.337510 39.975463 massage
4 node/286242547 286242547 node toilets NaN POINT (116.19982 40.00670) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.199822 40.006700 toilets
5 node/286246121 286246121 node waste_basket NaN POINT (116.20290 39.99787) NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.202902 39.997869 waste_basket
6 node/290600874 290600874 node cafe NaN POINT (116.32900 39.99117) NaN 迷你站奶茶专门店 Mini Station Milktea NaN ... NaN NaN NaN NaN NaN NaN NaN 116.328997 39.991167 cafe
7 node/297407376 297407376 node restaurant NaN POINT (116.33981 39.97537) NaN 沸腾渔乡 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN 116.339810 39.975369 restaurant
8 node/297407444 297407444 node bank NaN POINT (116.33826 39.97546) NaN 招商银行 China Merchants Bank yes ... NaN NaN NaN NaN NaN NaN NaN 116.338260 39.975462 bank
9 node/312152376 312152376 node restaurant NaN POINT (116.32766 39.99113) NaN 永和大王 Yonghe King NaN ... NaN NaN NaN NaN NaN NaN NaN 116.327660 39.991132 restaurant

10 rows × 121 columns

df_6 = move_df.copy()
df_6 = it.join_with_pois_by_category(df_6, POIs_5, label_category='amenity', label_id='name')
df_6.head(10)
lat lon datetime id id_toilets dist_toilets id_fast_food dist_fast_food id_massage dist_massage id_waste_basket dist_waste_basket id_cafe dist_cafe id_restaurant dist_restaurant id_bank dist_bank
0 39.984094 116.319236 2008-10-23 05:53:05 1 NaN 4132.229067 永和大王 1835.502157 Footmassage 富橋 1829.070918 NaN 10028.323311 迷你站奶茶专门店 1144.603484 永和大王 1061.807427 招商银行 1883.831094
1 39.984198 116.319322 2008-10-23 05:53:06 1 NaN 4135.240296 永和大王 1835.403414 Footmassage 富橋 1828.951254 NaN 10033.797904 迷你站奶茶专门店 1131.338544 永和大王 1048.334810 招商银行 1883.466601
2 39.984224 116.319402 2008-10-23 05:53:11 1 NaN 4140.698090 永和大王 1831.182086 Footmassage 富橋 1824.720741 NaN 10040.095434 迷你站奶茶专门店 1124.395459 永和大王 1041.594793 招商银行 1879.127020
3 39.984211 116.319389 2008-10-23 05:53:16 1 NaN 4140.136625 永和大王 1831.345213 Footmassage 富橋 1824.886604 NaN 10039.220172 迷你站奶茶专门店 1126.193301 永和大王 1043.408891 招商银行 1879.325712
4 39.984217 116.319422 2008-10-23 05:53:21 1 NaN 4142.564150 永和大王 1829.326076 Footmassage 富橋 1822.864349 NaN 10041.897836 迷你站奶茶专门店 1123.692580 永和大王 1041.019464 招商银行 1877.266370
5 39.984710 116.319865 2008-10-23 05:53:23 1 NaN 4160.348133 永和大王 1827.992513 Footmassage 富橋 1821.434719 NaN 10071.059512 迷你站奶茶专门店 1058.680139 永和大王 975.127648 招商银行 1874.593280
6 39.984674 116.319810 2008-10-23 05:53:28 1 NaN 4157.187813 永和大王 1829.602658 Footmassage 富橋 1823.053098 NaN 10067.008973 迷你站奶茶专门店 1064.838599 永和大王 981.250366 招商银行 1876.325343
7 39.984623 116.319773 2008-10-23 05:53:33 1 NaN 4156.022778 永和大王 1829.027475 Footmassage 富橋 1822.486938 NaN 10064.722571 迷你站奶茶专门店 1071.002908 永和大王 987.550029 招商银行 1875.882508
8 39.984606 116.319732 2008-10-23 05:53:38 1 NaN 4153.324576 永和大王 1830.866492 Footmassage 富橋 1824.330899 NaN 10061.545473 迷你站奶茶专门店 1074.850689 永和大王 991.312815 招商银行 1877.792905
9 39.984555 116.319728 2008-10-23 05:53:43 1 NaN 4154.833968 永和大王 1827.989263 Footmassage 富橋 1821.460600 NaN 10062.044871 迷你站奶茶专门店 1078.957681 永和大王 995.702764 招商银行 1875.015873
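
The category-based join can be read as "one nearest-POI search per category". The sketch below illustrates that idea for a single trajectory point in plain Python. It assumes a haversine distance with a 6,371 km Earth radius; `haversine_m` and `nearest_by_category` are hypothetical helper names, not PyMove functions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_by_category(point, pois, label_category="amenity", label_id="name"):
    """Map each category to (id, distance_m) of the nearest POI in that category."""
    nearest = {}
    for poi in pois:
        d = haversine_m(point["lat"], point["lon"], poi["lat"], poi["lon"])
        category = poi[label_category]
        # Keep the POI only if it is the closest one seen so far in its category.
        if category not in nearest or d < nearest[category][1]:
            nearest[category] = (poi[label_id], d)
    return nearest


# First trajectory point and a few rows of POIs_5 from the table above.
point = {"lat": 39.984094, "lon": 116.319236}
pois = [
    {"amenity": "fast_food", "name": "永和大王", "lat": 39.975411, "lon": 116.337557},
    {"amenity": "restaurant", "name": "沸腾渔乡", "lat": 39.975369, "lon": 116.339810},
    {"amenity": "restaurant", "name": "永和大王", "lat": 39.991132, "lon": 116.327660},
    {"amenity": "bank", "name": "招商银行", "lat": 39.975462, "lon": 116.338260},
]
nearest = nearest_by_category(point, pois)
for category, (poi_name, dist_m) in nearest.items():
    print(category, poi_name, round(dist_m))
```

Each category contributes one `id_<category>` and one `dist_<category>` column in the dataframe returned by the real function.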

7. Integrating events (points of interest with timestamp) to the DataSet

This function integrates a trajectory dataframe with event POIs, that is, POIs that carry a datetime label recording when the event occurred in addition to the latitude and longitude labels. In this example, we simulate events by assigning to a sample of POIs timestamps taken from the trajectory at regular intervals.

indexOfPois = np.arange(0, POIs.shape[0], POIs.shape[0]/20, dtype=np.int64)
POIs_events = POIs.iloc[indexOfPois].copy()
randomIndexOfMoveDf = np.arange(0, move_df.shape[0], move_df.shape[0]/20, dtype=np.int64)
randomMoveDfSlice = move_df.iloc[randomIndexOfMoveDf].copy()
POIs_events['datetime'] = randomMoveDfSlice['datetime'].copy()
df_7 = move_df.copy()
df_7 = it.join_with_events(
    df_7, POIs_events,
    label_date='datetime', time_window=900,
    label_event_id='osmid', label_event_type='amenity'
)
df_7.head()
lat lon datetime id osmid dist_event amenity
0 39.984094 116.319236 2008-10-23 05:53:05 1 269492188 4422.237186 toilets
1 39.984198 116.319322 2008-10-23 05:53:06 1 269492188 4430.488277 toilets
2 39.984224 116.319402 2008-10-23 05:53:11 1 269492188 4437.521909 toilets
3 39.984211 116.319389 2008-10-23 05:53:16 1 269492188 4436.297310 toilets
4 39.984217 116.319422 2008-10-23 05:53:21 1 269492188 4439.154806 toilets
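
The event join adds a time constraint to the nearest-POI search. The sketch below illustrates the idea for one trajectory point, assuming that an event qualifies when its timestamp lies within `time_window` seconds before or after the point's timestamp; PyMove's exact window semantics may differ. The helper names and the second event's timestamp are hypothetical.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_event_in_window(point, events, time_window=900):
    """Return (event_id, distance_m, event_type) of the closest event whose
    timestamp is within `time_window` seconds of the point, or None."""
    window = timedelta(seconds=time_window)
    candidates = [e for e in events if abs(e["datetime"] - point["datetime"]) <= window]
    if not candidates:
        return None

    def dist(event):
        return haversine_m(point["lat"], point["lon"], event["lat"], event["lon"])

    best = min(candidates, key=dist)
    return best["osmid"], dist(best), best["amenity"]


point = {"lat": 39.984094, "lon": 116.319236, "datetime": datetime(2008, 10, 23, 5, 53, 5)}
events = [
    # Far away, but inside the 900 s window.
    {"osmid": 269492188, "amenity": "toilets", "lat": 39.980869, "lon": 116.267504,
     "datetime": datetime(2008, 10, 23, 5, 53, 5)},
    # Closer, but hours later (hypothetical timestamp), so it is ignored.
    {"osmid": 290600874, "amenity": "cafe", "lat": 39.991167, "lon": 116.328997,
     "datetime": datetime(2008, 10, 23, 10, 0, 0)},
]
match = nearest_event_in_window(point, events)
print(match)
```

A point with no event inside its window gets no match, which is why the time window should be chosen with the sampling rate of the events in mind.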

8. Integration with Point of Interest HOME

A home POI contains, in addition to the latitude, longitude and id labels, the address ('formatted_address') and city ('city') labels.

Creating a home point

df_8 = move_df.copy()
home_df = df_8.iloc[300:302].copy()
home_df['formatted_address'] = ['Rua1, n02', 'Rua2, n03']
home_df['city'] = ['ChinaTown', 'ChinaTown']

Using the function

df_8 = it.join_with_home_by_id(df_8, home_df, label_id='id')
df_8.head()
id lat lon datetime dist_home home city
0 1 39.984094 116.319236 2008-10-23 05:53:05 1031.348370 Rua1, n02 ChinaTown
1 1 39.984198 116.319322 2008-10-23 05:53:06 1017.690147 Rua1, n02 ChinaTown
2 1 39.984224 116.319402 2008-10-23 05:53:11 1011.332141 Rua1, n02 ChinaTown
3 1 39.984211 116.319389 2008-10-23 05:53:16 1013.152700 Rua1, n02 ChinaTown
4 1 39.984217 116.319422 2008-10-23 05:53:21 1010.959220 Rua1, n02 ChinaTown
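
The home join is a lookup by id followed by a distance computation. The sketch below shows that logic in plain Python. It assumes a haversine distance and, as the output above suggests, that the first home row found for an id is the one used. The helper names and the home coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def join_home_by_id(points, homes, label_id="id"):
    """Attach dist_home, home and city to every point whose id has a home."""
    homes_by_id = {}
    for home in homes:
        # Assumption: keep only the first home listed for each id.
        homes_by_id.setdefault(home[label_id], home)

    joined = []
    for point in points:
        row = dict(point)
        home = homes_by_id.get(point[label_id])
        if home is None:
            row.update(dist_home=None, home=None, city=None)
        else:
            row.update(
                dist_home=haversine_m(point["lat"], point["lon"], home["lat"], home["lon"]),
                home=home["formatted_address"],
                city=home["city"],
            )
        joined.append(row)
    return joined


points = [
    {"id": 1, "lat": 39.984094, "lon": 116.319236},
    {"id": 2, "lat": 39.984198, "lon": 116.319322},  # hypothetical second user
]
homes = [  # hypothetical coordinates; addresses as in the example above
    {"id": 1, "lat": 39.9920, "lon": 116.3260, "formatted_address": "Rua1, n02", "city": "ChinaTown"},
    {"id": 1, "lat": 39.9921, "lon": 116.3261, "formatted_address": "Rua2, n03", "city": "ChinaTown"},
]
rows = join_home_by_id(points, homes)
for row in rows:
    print(row["id"], row["home"], row["city"], row["dist_home"])
```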

9. Merge of HOME with DataSet already integrated with POIs

Integration

df_9 = it.join_with_pois(df_8, POIs, label_id='osmid', label_poi_name='name')
df_9 = it.merge_home_with_poi(df_9)
df_9.head()
id lat lon datetime city id_poi dist_poi name_poi
0 1 39.984094 116.319236 2008-10-23 05:53:05 ChinaTown 5572452688 116.862844 太平洋影城(中关村店)
1 1 39.984198 116.319322 2008-10-23 05:53:06 ChinaTown 5572452688 119.142692 太平洋影城(中关村店)
2 1 39.984224 116.319402 2008-10-23 05:53:11 ChinaTown 5572452688 116.595117 太平洋影城(中关村店)
3 1 39.984211 116.319389 2008-10-23 05:53:16 ChinaTown 5572452688 116.257378 太平洋影城(中关村店)
4 1 39.984217 116.319422 2008-10-23 05:53:21 ChinaTown 5572452688 114.886759 太平洋影城(中关村店)

11. Union functions

The union functions merge several POI types that mean the same or similar things into a single POI type.
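
Conceptually, each union function relabels a fixed set of source types with one target type and leaves every other label untouched. A minimal sketch of that idea, using the bank labels from the example that follows (the `union_labels` helper is hypothetical, not a PyMove function):

```python
# Source labels that should collapse into the single label "banks".
BANK_TYPES = {"bancos_filiais", "bancos_agencias", "bancos_postos", "bancos_PAE", "bank"}


def union_labels(labels, source_types, target):
    """Replace every label found in `source_types` with `target`."""
    return [target if label in source_types else label for label in labels]


labels = ["bancos_filiais", "bank", "cafe", "bancos_PAE"]
merged = union_labels(labels, BANK_TYPES, "banks")
print(merged)  # ['banks', 'banks', 'cafe', 'banks']
```

Labels outside the source set, such as "cafe" here, pass through unchanged.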

Union of Banks

Converts POIs of the types “bancos_filiais”, “bancos_agencias”, “bancos_postos”, “bancos_PAE” and “bank” to a single type: “banks”

df_banks = move_df.copy()

#We create POIs with different type_poi that describe different types of banks to test
indexes_bp = np.linspace(0, df_banks.shape[0], 6)
banks_pois = df_banks[df_banks.index.isin(indexes_bp)].copy()
banks_pois['id'] = [0,1,2,3,4]
banks_pois['type_poi'] = ['bancos_filiais', 'bancos_agencias', 'bancos_postos', 'bancos_PAE', 'bank']

banks_pois.head()
lat lon datetime id type_poi
0 39.984094 116.319236 2008-10-23 05:53:05 0 bancos_filiais
1000 40.014125 116.306159 2008-10-23 23:43:56 1 bancos_agencias
2000 39.979558 116.312653 2008-10-24 03:26:10 2 bancos_postos
3000 39.979370 116.320649 2008-10-24 06:31:04 3 bancos_PAE
4000 40.003274 116.267484 2008-10-25 00:54:34 4 bank
#Join with POIs
df_banks = it.join_with_pois(df_banks, banks_pois, label_id='id', label_poi_name='type_poi')
#Result
df_banks.head(10)
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 0 0.000000 bancos_filiais
1 39.984198 116.319322 2008-10-23 05:53:06 1 0 13.690153 bancos_filiais
2 39.984224 116.319402 2008-10-23 05:53:11 1 0 20.223428 bancos_filiais
3 39.984211 116.319389 2008-10-23 05:53:16 1 0 18.416895 bancos_filiais
4 39.984217 116.319422 2008-10-23 05:53:21 1 0 20.933073 bancos_filiais
5 39.984710 116.319865 2008-10-23 05:53:23 1 0 86.969343 bancos_filiais
6 39.984674 116.319810 2008-10-23 05:53:28 1 0 80.938365 bancos_filiais
7 39.984623 116.319773 2008-10-23 05:53:33 1 0 74.520547 bancos_filiais
8 39.984606 116.319732 2008-10-23 05:53:38 1 0 70.901768 bancos_filiais
9 39.984555 116.319728 2008-10-23 05:53:43 1 0 66.217975 bancos_filiais
#Checking the number of points assigned to each type of POI
bancos_filiais = df_banks.loc[df_banks['name_poi'] == 'bancos_filiais']
bancos_agencias = df_banks.loc[df_banks['name_poi'] == 'bancos_agencias']
bancos_postos = df_banks.loc[df_banks['name_poi'] == 'bancos_postos']
bancos_PAE = df_banks.loc[df_banks['name_poi'] == 'bancos_PAE']
bank = df_banks.loc[df_banks['name_poi'] == 'bank']

print("Number of points close to each bank definition")
print("bancos_filiais: ", bancos_filiais.shape[0])
print("bancos_agencias: ", bancos_agencias.shape[0])
print("bancos_postos: ", bancos_postos.shape[0])
print("bancos_PAE: ", bancos_PAE.shape[0])
print("bank: ", bank.shape[0])
Number of points close to each bank definition
bancos_filiais:  579
bancos_agencias:  1407
bancos_postos:  916
bancos_PAE:  860
bank:  1238
#Finally, the Union
df_banks = it.union_poi_bank(df_banks, label_poi="name_poi")

#Result
df_banks.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 0 0.000000 banks
1 39.984198 116.319322 2008-10-23 05:53:06 1 0 13.690153 banks
2 39.984224 116.319402 2008-10-23 05:53:11 1 0 20.223428 banks
3 39.984211 116.319389 2008-10-23 05:53:16 1 0 18.416895 banks
4 39.984217 116.319422 2008-10-23 05:53:21 1 0 20.933073 banks
#Checking
df_banks.loc[df_banks['name_poi'] == 'banks'].shape[0]
5000

Union of Bus Stations

Converts “transit_station” and “pontos_de_onibus” POIs to a single type: “bus_station”

df_bus = move_df.copy()


#We create POIs with different name_poi that describe different types of bus stops to test
indexes_bp = np.linspace(0, df_bus.shape[0], 6)
bus_pois = df_bus[df_bus.index.isin(indexes_bp)].copy()
bus_pois['id'] = [0,1,2,3,4]
bus_pois['name_poi'] = ['transit_station', 'transit_station', 'pontos_de_onibus', 'transit_station', 'pontos_de_onibus']

#Result
bus_pois.head()
lat lon datetime id name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 0 transit_station
1000 40.014125 116.306159 2008-10-23 23:43:56 1 transit_station
2000 39.979558 116.312653 2008-10-24 03:26:10 2 pontos_de_onibus
3000 39.979370 116.320649 2008-10-24 06:31:04 3 transit_station
4000 40.003274 116.267484 2008-10-25 00:54:34 4 pontos_de_onibus
#Integration
df_bus = it.join_with_pois(df_bus, bus_pois, label_id='id', label_poi_name='name_poi')
#Result
df_bus.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 0 0.000000 transit_station
1 39.984198 116.319322 2008-10-23 05:53:06 1 0 13.690153 transit_station
2 39.984224 116.319402 2008-10-23 05:53:11 1 0 20.223428 transit_station
3 39.984211 116.319389 2008-10-23 05:53:16 1 0 18.416895 transit_station
4 39.984217 116.319422 2008-10-23 05:53:21 1 0 20.933073 transit_station
transit_station = df_bus.loc[df_bus['name_poi'] == 'transit_station']
pontos_de_onibus = df_bus.loc[df_bus['name_poi'] == 'pontos_de_onibus']

print("Number of points near transit_station's: ", transit_station.shape[0])
print("Number of points close to pontos_de_onibus's: ", pontos_de_onibus.shape[0])
Number of points near transit_station's:  2846
Number of points close to pontos_de_onibus's:  2154
#The union function
df_bus = it.union_poi_bus_station(df_bus, label_poi="name_poi")

df_bus.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 0 0.000000 bus_station
1 39.984198 116.319322 2008-10-23 05:53:06 1 0 13.690153 bus_station
2 39.984224 116.319402 2008-10-23 05:53:11 1 0 20.223428 bus_station
3 39.984211 116.319389 2008-10-23 05:53:16 1 0 18.416895 bus_station
4 39.984217 116.319422 2008-10-23 05:53:21 1 0 20.933073 bus_station
#Checking

df_bus.loc[df_bus['name_poi'] == 'bus_station'].shape[0]
5000

Union of Bars and Restaurants

Converts “bar” and “restaurant” POIs to a single type: “bar-restaurant”

df_bar = move_df.copy()

#We create POIs with both types
indexes_br = np.linspace(0, df_bar.shape[0], 5)
br_POIs = df_bar[df_bar.index.isin(indexes_br)].copy()
br_POIs['name_poi'] = ['bar','restaurant','restaurant', 'bar']

#Result
br_POIs.head()
lat lon datetime id name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 bar
1250 39.999756 116.322556 2008-10-23 23:58:02 1 restaurant
2500 39.979533 116.323162 2008-10-24 05:31:19 1 restaurant
3750 39.996251 116.293837 2008-10-25 00:40:56 1 bar
#Integration
df_bar = it.join_with_pois(df_bar, br_POIs, label_id='id', label_poi_name='name_poi')
#Result
df_bar.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 1 0.000000 bar
1 39.984198 116.319322 2008-10-23 05:53:06 1 1 13.690153 bar
2 39.984224 116.319402 2008-10-23 05:53:11 1 1 20.223428 bar
3 39.984211 116.319389 2008-10-23 05:53:16 1 1 18.416895 bar
4 39.984217 116.319422 2008-10-23 05:53:21 1 1 20.933073 bar
#Number of points close to each type
bar = df_bar.loc[df_bar['name_poi'] == 'bar']
restaurant = df_bar.loc[df_bar['name_poi'] == 'restaurant']

print("Closest type points 'bar': ", bar.shape[0])
print("Closest type points 'restaurant': ", restaurant.shape[0])
Closest type points 'bar':  2539
Closest type points 'restaurant':  2461
#Union of the two types of POIs into a single
df_bar = it.union_poi_bar_restaurant(df_bar, label_poi="name_poi")

#Result
df_bar.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 1 0.000000 bar-restaurant
1 39.984198 116.319322 2008-10-23 05:53:06 1 1 13.690153 bar-restaurant
2 39.984224 116.319402 2008-10-23 05:53:11 1 1 20.223428 bar-restaurant
3 39.984211 116.319389 2008-10-23 05:53:16 1 1 18.416895 bar-restaurant
4 39.984217 116.319422 2008-10-23 05:53:21 1 1 20.933073 bar-restaurant
#Checking
df_bar.loc[df_bar['name_poi'] == 'bar-restaurant'].shape[0]
5000

Union of Parks

Converts “pracas_e_parques” and “park” POIs to a single type: “parks”

df_parks = move_df.copy()

#We create POIs with both types
indexes_p = np.linspace(0, df_parks.shape[0], 5)
p_POIs = df_parks[df_parks.index.isin(indexes_p)].copy()
p_POIs['name_poi'] = ['pracas_e_parques','pracas_e_parques','park', 'park']

#Result
p_POIs.head()
lat lon datetime id name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 pracas_e_parques
1250 39.999756 116.322556 2008-10-23 23:58:02 1 pracas_e_parques
2500 39.979533 116.323162 2008-10-24 05:31:19 1 park
3750 39.996251 116.293837 2008-10-25 00:40:56 1 park
#Integration
df_parks = it.join_with_pois(df_parks, p_POIs, label_id='id', label_poi_name='name_poi')

#Result
df_parks.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 1 0.000000 pracas_e_parques
1 39.984198 116.319322 2008-10-23 05:53:06 1 1 13.690153 pracas_e_parques
2 39.984224 116.319402 2008-10-23 05:53:11 1 1 20.223428 pracas_e_parques
3 39.984211 116.319389 2008-10-23 05:53:16 1 1 18.416895 pracas_e_parques
4 39.984217 116.319422 2008-10-23 05:53:21 1 1 20.933073 pracas_e_parques
#Number of points close to each type of POI
pracas_e_parques = df_parks.loc[df_parks['name_poi'] == 'pracas_e_parques']
park = df_parks.loc[df_parks['name_poi'] == 'park']

print("Number of points closest to pracas_e_parques: ", pracas_e_parques.shape[0])
print("Number of points closest to park: ", park.shape[0])
Number of points closest to pracas_e_parques:  2716
Number of points closest to park:  2284
#Union function
df_parks = it.union_poi_parks(df_parks, label_poi="name_poi")

df_parks.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 1 0.000000 parks
1 39.984198 116.319322 2008-10-23 05:53:06 1 1 13.690153 parks
2 39.984224 116.319402 2008-10-23 05:53:11 1 1 20.223428 parks
3 39.984211 116.319389 2008-10-23 05:53:16 1 1 18.416895 parks
4 39.984217 116.319422 2008-10-23 05:53:21 1 1 20.933073 parks
#Checking the new quantity
df_parks.loc[df_parks['name_poi'] == 'parks'].shape[0]
5000

Union of police points

Converts “distritos_policiais” and “police” POIs to a single type: “police”

df_police = move_df.copy()

#We create POIs with both types
indexes_pol = np.linspace(0, df_police.shape[0], 5)
pol_POIs = df_police[df_police.index.isin(indexes_pol)].copy()
pol_POIs['name_poi'] = ['distritos_policiais','police','distritos_policiais', 'distritos_policiais']

#Result
pol_POIs.head()
lat lon datetime id name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 distritos_policiais
1250 39.999756 116.322556 2008-10-23 23:58:02 1 police
2500 39.979533 116.323162 2008-10-24 05:31:19 1 distritos_policiais
3750 39.996251 116.293837 2008-10-25 00:40:56 1 distritos_policiais
#Integration
df_police = it.join_with_pois(df_police, pol_POIs, label_id='id', label_poi_name='name_poi')

df_police.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 1 0.000000 distritos_policiais
1 39.984198 116.319322 2008-10-23 05:53:06 1 1 13.690153 distritos_policiais
2 39.984224 116.319402 2008-10-23 05:53:11 1 1 20.223428 distritos_policiais
3 39.984211 116.319389 2008-10-23 05:53:16 1 1 18.416895 distritos_policiais
4 39.984217 116.319422 2008-10-23 05:53:21 1 1 20.933073 distritos_policiais
#Quantity of points closest to each type of point
distritos_policiais = df_police.loc[df_police['name_poi'] == 'distritos_policiais']

print("Number of points closest to distritos_policiais: ", distritos_policiais.shape[0])
Number of points closest to distritos_policiais:  3420
#Union function
df_police = it.union_poi_police(df_police, label_poi="name_poi")
#Result
df_police.head()
lat lon datetime id id_poi dist_poi name_poi
0 39.984094 116.319236 2008-10-23 05:53:05 1 1 0.000000 police
1 39.984198 116.319322 2008-10-23 05:53:06 1 1 13.690153 police
2 39.984224 116.319402 2008-10-23 05:53:11 1 1 20.223428 police
3 39.984211 116.319389 2008-10-23 05:53:16 1 1 18.416895 police
4 39.984217 116.319422 2008-10-23 05:53:21 1 1 20.933073 police
#Checking
df_police.loc[df_police['name_poi'] == 'police'].shape[0]
5000

12. Integration between trajectories and collective areas

df_pd = pd.read_csv('geolife_sample.csv')
df_12 = df_pd[0:2000]
gdf = geopandas.GeoDataFrame(df_12, geometry=geopandas.points_from_xy(df_12.lon, df_12.lat))
gdf.head()
lat lon datetime id geometry
0 39.984094 116.319236 2008-10-23 05:53:05 1 POINT (116.31924 39.98409)
1 39.984198 116.319322 2008-10-23 05:53:06 1 POINT (116.31932 39.98420)
2 39.984224 116.319402 2008-10-23 05:53:11 1 POINT (116.31940 39.98422)
3 39.984211 116.319389 2008-10-23 05:53:16 1 POINT (116.31939 39.98421)
4 39.984217 116.319422 2008-10-23 05:53:21 1 POINT (116.31942 39.98422)
#Creating collective areas
indexes_ac = np.linspace(0, gdf.shape[0], 5)
area_c = df_12[df_12.index.isin(indexes_ac)].copy()
area_c
lat lon datetime id geometry
0 39.984094 116.319236 2008-10-23 05:53:05 1 POINT (116.31924 39.98409)
500 40.006436 116.317701 2008-10-23 10:53:31 1 POINT (116.31770 40.00644)
1000 40.014125 116.306159 2008-10-23 23:43:56 1 POINT (116.30616 40.01412)
1500 39.979009 116.326873 2008-10-24 00:11:29 1 POINT (116.32687 39.97901)
#Integration
gdf = it.join_collective_areas(gdf, area_c)
gdf.head()
lat lon datetime id geometry violating
0 39.984094 116.319236 2008-10-23 05:53:05 1 POINT (116.31924 39.98409) True
1 39.984198 116.319322 2008-10-23 05:53:06 1 POINT (116.31932 39.98420) False
2 39.984224 116.319402 2008-10-23 05:53:11 1 POINT (116.31940 39.98422) False
3 39.984211 116.319389 2008-10-23 05:53:16 1 POINT (116.31939 39.98421) False
4 39.984217 116.319422 2008-10-23 05:53:21 1 POINT (116.31942 39.98422) False
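
In the example above the collective areas are themselves points, so only the trajectory point that coincides with an area (row 0) is flagged. The sketch below reproduces that behaviour in plain Python, assuming that a point counts as violating when its coordinates match an area point. Polygon areas would need a real geometric intersection test, such as the one geopandas provides. The `flag_violating` helper is hypothetical.

```python
def flag_violating(points, areas, ndigits=6):
    """Return one boolean per point: True if it coincides with an area point.

    Coordinates are rounded to `ndigits` decimals before comparison to avoid
    spurious mismatches from floating-point noise.
    """
    area_keys = {(round(a["lat"], ndigits), round(a["lon"], ndigits)) for a in areas}
    return [
        (round(p["lat"], ndigits), round(p["lon"], ndigits)) in area_keys
        for p in points
    ]


# First two trajectory points and first two collective areas from the tables above.
points = [
    {"lat": 39.984094, "lon": 116.319236},
    {"lat": 39.984198, "lon": 116.319322},
]
areas = [
    {"lat": 39.984094, "lon": 116.319236},
    {"lat": 40.006436, "lon": 116.317701},
]
flags = flag_violating(points, areas)
print(flags)  # [True, False]
```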