06 - Exploring Integrations
0. Required library installations
One of the integration functions presented here requires the geopandas library. The osmnx library is also needed to download the sample data used to demonstrate the functions.
conda install geopandas osmnx
1. Imports
import pymove as pm
from pymove.utils import integration as it
from pymove.visualization import folium
import numpy as np
import pandas as pd
import geopandas
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
2. Load Data
move_df = pm.read_csv('geolife_sample.csv', nrows=5000)
move_df.head()
index | lat | lon | datetime | id |
---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 |
#Size
move_df.shape[0]
5000
Visualization
folium.plot_trajectories(move_df)
3. Loading points of interest
bbox = move_df.get_bbox()
folium.plot_bbox(bbox, color='blue')
import osmnx as ox
tags = {'amenity':True}
# get_bbox returns (lat_min, lon_min, lat_max, lon_max), so north is bbox[2] and south is bbox[0]
POIs = ox.geometries_from_bbox(north=bbox[2], south=bbox[0], east=bbox[3], west=bbox[1], tags=tags)
POIs.head()
index | unique_id | osmid | element_type | amenity | fee | geometry | cuisine | name | name:en | atm | ... | building:material | roof:colour | roof:material | alt_name_1 | not:name | area | ways | type | name:ja | name:ko |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | node/269492188 | 269492188 | node | toilets | no | POINT (116.26750 39.98087) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | node/274942287 | 274942287 | node | toilets | NaN | POINT (116.27358 39.99664) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | node/276320137 | 276320137 | node | fast_food | NaN | POINT (116.33756 39.97541) | chinese | 永和大王 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | node/276320142 | 276320142 | node | massage | NaN | POINT (116.33751 39.97546) | NaN | Footmassage 富橋 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | node/286242547 | 286242547 | node | toilets | NaN | POINT (116.19982 40.00670) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 118 columns
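The bounding-box query above depends on the order of the values inside the box. The following is a minimal sketch of that mapping, assuming the box is a `(lat_min, lon_min, lat_max, lon_max)` tuple. The helper names `bounding_box` and `to_compass` are hypothetical and are not part of PyMove or osmnx.

```python
def bounding_box(points):
    """Return (lat_min, lon_min, lat_max, lon_max) for (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))


def to_compass(bbox):
    """Name each edge of the box (assumes the tuple order documented above)."""
    lat_min, lon_min, lat_max, lon_max = bbox
    return {'north': lat_max, 'south': lat_min, 'east': lon_max, 'west': lon_min}


# Three trajectory points taken from the sample data
sample = [
    (39.984094, 116.319236),
    (40.014125, 116.306159),
    (39.979370, 116.320649),
]
edges = to_compass(bounding_box(sample))
print(edges)
```

Naming the edges explicitly makes it harder to pass a minimum latitude where a northern limit is expected.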
Removing points of interest whose amenity value is null
POIs = POIs.dropna(subset=["amenity"], inplace=False)
Adapting to the format needed for integration (with labels ‘lat’ and ‘lon’ referring to latitude and longitude, respectively)
POIs = POIs[POIs['geometry'].type == 'Point'].copy()
POIs['lon'] = POIs['geometry'].x
POIs['lat'] = POIs['geometry'].y
Visualization
m = folium.plot_trajectories(move_df)
folium.plot_poi(POIs, slice_tags=['amenity'], base_map=m, poi_point='blue')
4. Integrating Points of Interest into the DataSet
df_4 = move_df.copy()
df_4 = it.join_with_pois(df_4, POIs, label_id='osmid', label_poi_name='name')
Result
df_4.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 5572452688 | 116.862844 | 太平洋影城(中关村店) |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 5572452688 | 119.142692 | 太平洋影城(中关村店) |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 5572452688 | 116.595117 | 太平洋影城(中关村店) |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 5572452688 | 116.257378 | 太平洋影城(中关村店) |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 5572452688 | 114.886759 | 太平洋影城(中关村店) |
Point of interest closest to each point of the trajectory
df_4['name_poi'].unique()
array(['太平洋影城(中关村店)', '东亚银行', '南京银行', '星巴克', '小吊梨汤', nan, '鑫蜀源', '必胜客',
'潜渊', '上岛咖啡', '科苑餐厅', '2nd Place', '元绿回转寿司', '中信银行', 'HSBC',
'咖啡王(暂时停业)', '招商银行', '中国建设银行', 'Paradiso Coffee', '798 bar',
'Jazz Cafe', 'Hundred Years Cafe', '安家小厨', '清青快餐', '听涛园', '仰望咖啡',
'China Construction Bank', '同仁堂', '北园餐厅', '北京银行', '交通银行', '宁波银行',
'美嘉欢乐影城', '北京101中学', '西苑医院', 'Yu Xiao Mian Noodles', '茶大爷',
"McDonald's", 'Pizza Hut', 'Starbucks', '云海肴', '兰州老妈拉面'],
dtype=object)
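Conceptually, the join above attaches to each trajectory point the id, distance and name of its closest POI. The following is a minimal pure-Python sketch of that idea, assuming the haversine distance in meters and a mean Earth radius of 6,371 km. The helper `nearest_poi` is hypothetical; this is not PyMove's implementation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, in meters


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_poi(point, pois):
    """Return (id_poi, dist_poi, name_poi) of the POI closest to `point`."""
    def dist(poi):
        return haversine(point['lat'], point['lon'], poi['lat'], poi['lon'])

    best = min(pois, key=dist)
    return best['osmid'], dist(best), best['name']


# Two POIs from this dataset and the first trajectory point
pois = [
    {'osmid': 276320137, 'name': '永和大王', 'lat': 39.975411, 'lon': 116.337557},
    {'osmid': 312152376, 'name': '永和大王', 'lat': 39.991132, 'lon': 116.327660},
]
point = {'lat': 39.984094, 'lon': 116.319236}

id_poi, dist_poi, name_poi = nearest_poi(point, pois)
print(id_poi, round(dist_poi, 1), name_poi)
```

A brute-force scan like this costs one distance computation per point and POI pair, which is fine for a sketch but slow for large datasets.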
5. Integrating Points of Interest into the DataSet (using join_with_pois on a subset of POIs)
Selecting data
POIs_5 = POIs[0:10].copy()
POIs_5['type_poi'] = POIs_5['amenity']
df_5 = move_df.copy()
POIs_5['type_poi'].unique()
array(['toilets', 'fast_food', 'massage', 'waste_basket', 'cafe',
'restaurant', 'bank'], dtype=object)
Executing the function
df_5 = it.join_with_pois(df_5, POIs_5, label_id='osmid', label_poi_name='name')
df_5.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 312152376 | 1061.807427 | 永和大王 |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 312152376 | 1048.334810 | 永和大王 |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 312152376 | 1041.594793 | 永和大王 |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 312152376 | 1043.408891 | 永和大王 |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 312152376 | 1041.019464 | 永和大王 |
6. Integrating Points of Interest into the Category-Based DataSet
POIs_5
index | unique_id | osmid | element_type | amenity | fee | geometry | cuisine | name | name:en | atm | ... | alt_name_1 | not:name | area | ways | type | name:ja | name:ko | lon | lat | type_poi |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | node/269492188 | 269492188 | node | toilets | no | POINT (116.26750 39.98087) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.267504 | 39.980869 | toilets |
1 | node/274942287 | 274942287 | node | toilets | NaN | POINT (116.27358 39.99664) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.273579 | 39.996640 | toilets |
2 | node/276320137 | 276320137 | node | fast_food | NaN | POINT (116.33756 39.97541) | chinese | 永和大王 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.337557 | 39.975411 | fast_food |
3 | node/276320142 | 276320142 | node | massage | NaN | POINT (116.33751 39.97546) | NaN | Footmassage 富橋 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.337510 | 39.975463 | massage |
4 | node/286242547 | 286242547 | node | toilets | NaN | POINT (116.19982 40.00670) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.199822 | 40.006700 | toilets |
5 | node/286246121 | 286246121 | node | waste_basket | NaN | POINT (116.20290 39.99787) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.202902 | 39.997869 | waste_basket |
6 | node/290600874 | 290600874 | node | cafe | NaN | POINT (116.32900 39.99117) | NaN | 迷你站奶茶专门店 | Mini Station Milktea | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.328997 | 39.991167 | cafe |
7 | node/297407376 | 297407376 | node | restaurant | NaN | POINT (116.33981 39.97537) | NaN | 沸腾渔乡 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.339810 | 39.975369 | restaurant |
8 | node/297407444 | 297407444 | node | bank | NaN | POINT (116.33826 39.97546) | NaN | 招商银行 | China Merchants Bank | yes | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.338260 | 39.975462 | bank |
9 | node/312152376 | 312152376 | node | restaurant | NaN | POINT (116.32766 39.99113) | NaN | 永和大王 | Yonghe King | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 116.327660 | 39.991132 | restaurant |
10 rows × 121 columns
df_6 = move_df.copy()
df_6 = it.join_with_pois_by_category(df_6, POIs_5, label_category='amenity', label_id='name')
df_6.head(10)
index | lat | lon | datetime | id | id_toilets | dist_toilets | id_fast_food | dist_fast_food | id_massage | dist_massage | id_waste_basket | dist_waste_basket | id_cafe | dist_cafe | id_restaurant | dist_restaurant | id_bank | dist_bank |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | NaN | 4132.229067 | 永和大王 | 1835.502157 | Footmassage 富橋 | 1829.070918 | NaN | 10028.323311 | 迷你站奶茶专门店 | 1144.603484 | 永和大王 | 1061.807427 | 招商银行 | 1883.831094 |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | NaN | 4135.240296 | 永和大王 | 1835.403414 | Footmassage 富橋 | 1828.951254 | NaN | 10033.797904 | 迷你站奶茶专门店 | 1131.338544 | 永和大王 | 1048.334810 | 招商银行 | 1883.466601 |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | NaN | 4140.698090 | 永和大王 | 1831.182086 | Footmassage 富橋 | 1824.720741 | NaN | 10040.095434 | 迷你站奶茶专门店 | 1124.395459 | 永和大王 | 1041.594793 | 招商银行 | 1879.127020 |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | NaN | 4140.136625 | 永和大王 | 1831.345213 | Footmassage 富橋 | 1824.886604 | NaN | 10039.220172 | 迷你站奶茶专门店 | 1126.193301 | 永和大王 | 1043.408891 | 招商银行 | 1879.325712 |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | NaN | 4142.564150 | 永和大王 | 1829.326076 | Footmassage 富橋 | 1822.864349 | NaN | 10041.897836 | 迷你站奶茶专门店 | 1123.692580 | 永和大王 | 1041.019464 | 招商银行 | 1877.266370 |
5 | 39.984710 | 116.319865 | 2008-10-23 05:53:23 | 1 | NaN | 4160.348133 | 永和大王 | 1827.992513 | Footmassage 富橋 | 1821.434719 | NaN | 10071.059512 | 迷你站奶茶专门店 | 1058.680139 | 永和大王 | 975.127648 | 招商银行 | 1874.593280 |
6 | 39.984674 | 116.319810 | 2008-10-23 05:53:28 | 1 | NaN | 4157.187813 | 永和大王 | 1829.602658 | Footmassage 富橋 | 1823.053098 | NaN | 10067.008973 | 迷你站奶茶专门店 | 1064.838599 | 永和大王 | 981.250366 | 招商银行 | 1876.325343 |
7 | 39.984623 | 116.319773 | 2008-10-23 05:53:33 | 1 | NaN | 4156.022778 | 永和大王 | 1829.027475 | Footmassage 富橋 | 1822.486938 | NaN | 10064.722571 | 迷你站奶茶专门店 | 1071.002908 | 永和大王 | 987.550029 | 招商银行 | 1875.882508 |
8 | 39.984606 | 116.319732 | 2008-10-23 05:53:38 | 1 | NaN | 4153.324576 | 永和大王 | 1830.866492 | Footmassage 富橋 | 1824.330899 | NaN | 10061.545473 | 迷你站奶茶专门店 | 1074.850689 | 永和大王 | 991.312815 | 招商银行 | 1877.792905 |
9 | 39.984555 | 116.319728 | 2008-10-23 05:53:43 | 1 | NaN | 4154.833968 | 永和大王 | 1827.989263 | Footmassage 富橋 | 1821.460600 | NaN | 10062.044871 | 迷你站奶茶专门店 | 1078.957681 | 永和大王 | 995.702764 | 招商银行 | 1875.015873 |
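The category-based result above holds one id and one distance column per amenity type. The following is a minimal sketch of how such a per-category nearest search could work, assuming the haversine distance in meters. The helper `nearest_by_category` is hypothetical and is not PyMove's implementation.

```python
from math import asin, cos, radians, sin, sqrt


def haversine(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in meters (assumed mean Earth radius)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * radius * asin(sqrt(a))


def nearest_by_category(point, pois):
    """Map each category to the (id, distance) of its closest POI."""
    result = {}
    for poi in pois:
        dist = haversine(point['lat'], point['lon'], poi['lat'], poi['lon'])
        best = result.get(poi['amenity'])
        if best is None or dist < best[1]:
            result[poi['amenity']] = (poi['name'], dist)
    return result


# Three POIs from the table above; 'name' plays the role of label_id
pois = [
    {'name': '沸腾渔乡', 'amenity': 'restaurant', 'lat': 39.975369, 'lon': 116.339810},
    {'name': '永和大王', 'amenity': 'restaurant', 'lat': 39.991132, 'lon': 116.327660},
    {'name': '招商银行', 'amenity': 'bank', 'lat': 39.975462, 'lon': 116.338260},
]
point = {'lat': 39.984094, 'lon': 116.319236}

nearest = nearest_by_category(point, pois)
for category, (poi_id, dist) in nearest.items():
    print(f'id_{category}={poi_id} dist_{category}={dist:.1f}')
```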
7. Integrating events (points of interest with timestamp) to the DataSet
join_with_events integrates a trajectory dataframe with a dataframe of events: points of interest that have, in addition to the latitude and longitude labels, a label for the datetime at which the event occurred. In this example, we assign datetime values taken from evenly spaced trajectory points to some POIs to simulate events.
indexOfPois = np.arange(0, POIs.shape[0], POIs.shape[0]/20, dtype=np.int64)
POIs_events = POIs.iloc[indexOfPois].copy()
randomIndexOfMoveDf = np.arange(0, move_df.shape[0], move_df.shape[0]/20, dtype=np.int64)
randomMoveDfSlice = move_df.iloc[randomIndexOfMoveDf].copy()
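# Assign the raw values: the two dataframes have different indexes, so assigning the Series directly would align on index and leave most datetimes empty
POIs_events['datetime'] = randomMoveDfSlice['datetime'].to_numpy()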
df_7 = move_df.copy()
df_7 = it.join_with_events(
df_7, POIs_events,
label_date='datetime', time_window=900,
label_event_id='osmid', label_event_type='amenity'
)
df_7.head()
index | lat | lon | datetime | id | osmid | dist_event | amenity |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 269492188 | 4422.237186 | toilets |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 269492188 | 4430.488277 | toilets |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 269492188 | 4437.521909 | toilets |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 269492188 | 4436.297310 | toilets |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 269492188 | 4439.154806 | toilets |
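The role of `time_window` can be illustrated with a small pure-Python sketch. It assumes that an event is a candidate when its datetime lies within `time_window` seconds before or after the trajectory point, and that the closest candidate wins. The helper `nearest_event` and the 08:00:00 timestamp are hypothetical; this is not PyMove's implementation.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in meters (assumed mean Earth radius)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * radius * asin(sqrt(a))


def nearest_event(point, events, time_window=900):
    """Closest event whose datetime is within `time_window` seconds of the point."""
    window = timedelta(seconds=time_window)
    in_window = [e for e in events if abs(e['datetime'] - point['datetime']) <= window]
    if not in_window:
        return None  # no event is close enough in time

    def dist(event):
        return haversine(point['lat'], point['lon'], event['lat'], event['lon'])

    best = min(in_window, key=dist)
    return best['osmid'], dist(best), best['amenity']


events = [
    {'osmid': 269492188, 'amenity': 'toilets', 'lat': 39.980869, 'lon': 116.267504,
     'datetime': datetime(2008, 10, 23, 5, 53, 5)},
    # Hypothetical timestamp, roughly two hours after the trajectory point
    {'osmid': 290600874, 'amenity': 'cafe', 'lat': 39.991167, 'lon': 116.328997,
     'datetime': datetime(2008, 10, 23, 8, 0, 0)},
]
point = {'lat': 39.984094, 'lon': 116.319236, 'datetime': datetime(2008, 10, 23, 5, 53, 5)}

near = nearest_event(point, events)                         # 15-minute window
wide = nearest_event(point, events, time_window=3 * 3600)   # 3-hour window
print(near)
print(wide)
```

With the 15-minute window only the toilets event qualifies even though the cafe is closer in space; widening the window lets the cafe win.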
8. Integration with Point of Interest HOME
A home dataframe contains, in addition to latitude, longitude and id, the address and city labels ('formatted_address' and 'city').
Creating a home point
df_8 = move_df.copy()
home_df = df_8.iloc[300:302].copy()
home_df['formatted_address'] = ['Rua1, n02', 'Rua2, n03']
home_df['city'] = ['ChinaTown', 'ChinaTown']
Using the function
df_8 = it.join_with_home_by_id(df_8, home_df, label_id='id')
df_8.head()
index | id | lat | lon | datetime | dist_home | home | city |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1031.348370 | Rua1, n02 | ChinaTown |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1017.690147 | Rua1, n02 | ChinaTown |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1011.332141 | Rua1, n02 | ChinaTown |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1013.152700 | Rua1, n02 | ChinaTown |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1010.959220 | Rua1, n02 | ChinaTown |
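The home join differs from the POI join in that each trajectory point is matched to the home that shares its id, not to the nearest one. The following is a minimal sketch of that lookup, assuming the haversine distance in meters. The helper `add_home_distance` is hypothetical, and the home location reuses a trajectory point purely for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in meters (assumed mean Earth radius)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * radius * asin(sqrt(a))


def add_home_distance(points, homes):
    """Attach dist_home, home and city to every point whose id has a home."""
    home_by_id = {home['id']: home for home in homes}
    joined = []
    for point in points:
        row = dict(point)
        home = home_by_id.get(point['id'])
        if home is not None:
            row['dist_home'] = haversine(point['lat'], point['lon'], home['lat'], home['lon'])
            row['home'] = home['formatted_address']
            row['city'] = home['city']
        joined.append(row)
    return joined


# Illustrative home: a trajectory point reused as the home location
homes = [{'id': 1, 'lat': 40.006436, 'lon': 116.317701,
          'formatted_address': 'Rua1, n02', 'city': 'ChinaTown'}]
points = [
    {'id': 1, 'lat': 39.984094, 'lon': 116.319236},
    {'id': 2, 'lat': 39.984198, 'lon': 116.319322},  # hypothetical id without a home
]

rows = add_home_distance(points, homes)
for row in rows:
    print(row)
```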
9. Merge of HOME with DataSet already integrated with POIs
Integration
df_9 = it.join_with_pois(df_8, POIs, label_id='osmid', label_poi_name='name')
df_9 = it.merge_home_with_poi(df_9)
df_9.head()
index | id | lat | lon | datetime | city | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | ChinaTown | 5572452688 | 116.862844 | 太平洋影城(中关村店) |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | ChinaTown | 5572452688 | 119.142692 | 太平洋影城(中关村店) |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | ChinaTown | 5572452688 | 116.595117 | 太平洋影城(中关村店) |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | ChinaTown | 5572452688 | 116.257378 | 太平洋影城(中关村店) |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | ChinaTown | 5572452688 | 114.886759 | 太平洋影城(中关村店) |
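In the output above the `dist_home` and `home` columns are gone and the POI columns are unchanged, because the nearest POI is closer than the home for these rows. The following sketch shows one plausible merge rule, assuming that home is treated as one more POI and replaces the nearest POI whenever it is at least as close. The helper `merge_home_row` is hypothetical, the handling of `id_poi` is left out, and the exact rule used by PyMove is not confirmed here.

```python
def merge_home_row(row):
    """Keep whichever of home and nearest POI is closer (assumed rule)."""
    merged = dict(row)
    if merged['dist_home'] <= merged['dist_poi']:
        merged['name_poi'] = 'home'
        merged['dist_poi'] = merged['dist_home']
    # Drop the home-specific columns, as in the output above
    del merged['dist_home'], merged['home']
    return merged


# First row of the example: the POI is much closer than the home
far_from_home = {
    'dist_home': 1031.348370, 'home': 'Rua1, n02', 'city': 'ChinaTown',
    'id_poi': 5572452688, 'dist_poi': 116.862844, 'name_poi': '太平洋影城(中关村店)',
}
# Hypothetical row in which the home is closer than the POI
close_to_home = dict(far_from_home, dist_home=50.0)

kept = merge_home_row(far_from_home)
replaced = merge_home_row(close_to_home)
print(kept['name_poi'], kept['dist_poi'])
print(replaced['name_poi'], replaced['dist_poi'])
```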
11. Union functions
The union functions merge several POI types that mean the same thing, or similar things, into a single POI type.
Union of Banks
Converts POIs of the types “bancos_filiais”, “bancos_agencias”, “bancos_postos”, “bancos_PAE” and “bank” to a single type: “banks”
df_banks = move_df.copy()
#We create POIs with different type_poi that describe different types of banks to test
indexes_bp = np.linspace(0, df_banks.shape[0], 6)
banks_pois = df_banks[df_banks.index.isin(indexes_bp)].copy()
banks_pois['id'] = [0,1,2,3,4]
banks_pois['type_poi'] = ['bancos_filiais', 'bancos_agencias', 'bancos_postos', 'bancos_PAE', 'bank']
banks_pois.head()
index | lat | lon | datetime | id | type_poi |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 0 | bancos_filiais |
1000 | 40.014125 | 116.306159 | 2008-10-23 23:43:56 | 1 | bancos_agencias |
2000 | 39.979558 | 116.312653 | 2008-10-24 03:26:10 | 2 | bancos_postos |
3000 | 39.979370 | 116.320649 | 2008-10-24 06:31:04 | 3 | bancos_PAE |
4000 | 40.003274 | 116.267484 | 2008-10-25 00:54:34 | 4 | bank |
#Join with POIs
df_banks = it.join_with_pois(df_banks, banks_pois, label_id='id', label_poi_name='type_poi')
#Result
df_banks.head(10)
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 0 | 0.000000 | bancos_filiais |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 0 | 13.690153 | bancos_filiais |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 0 | 20.223428 | bancos_filiais |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 0 | 18.416895 | bancos_filiais |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 0 | 20.933073 | bancos_filiais |
5 | 39.984710 | 116.319865 | 2008-10-23 05:53:23 | 1 | 0 | 86.969343 | bancos_filiais |
6 | 39.984674 | 116.319810 | 2008-10-23 05:53:28 | 1 | 0 | 80.938365 | bancos_filiais |
7 | 39.984623 | 116.319773 | 2008-10-23 05:53:33 | 1 | 0 | 74.520547 | bancos_filiais |
8 | 39.984606 | 116.319732 | 2008-10-23 05:53:38 | 1 | 0 | 70.901768 | bancos_filiais |
9 | 39.984555 | 116.319728 | 2008-10-23 05:53:43 | 1 | 0 | 66.217975 | bancos_filiais |
#Checking the number of points assigned to each type of POI
bancos_filiais = df_banks.loc[df_banks['name_poi'] == 'bancos_filiais']
bancos_agencias = df_banks.loc[df_banks['name_poi'] == 'bancos_agencias']
bancos_postos = df_banks.loc[df_banks['name_poi'] == 'bancos_postos']
bancos_PAE = df_banks.loc[df_banks['name_poi'] == 'bancos_PAE']
bank = df_banks.loc[df_banks['name_poi'] == 'bank']
print("Number of points close to each bank definition")
print("bancos_filiais: ", bancos_filiais.shape[0])
print("bancos_agencias: ", bancos_agencias.shape[0])
print("bancos_postos: ", bancos_postos.shape[0])
print("bancos_PAE: ", bancos_PAE.shape[0])
print("bank: ", bank.shape[0])
Number of points close to each bank definition
bancos_filiais: 579
bancos_agencias: 1407
bancos_postos: 916
bancos_PAE: 860
bank: 1238
#Finally, the Union
df_banks = it.union_poi_bank(df_banks, label_poi="name_poi")
#Result
df_banks.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 0 | 0.000000 | banks |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 0 | 13.690153 | banks |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 0 | 20.223428 | banks |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 0 | 18.416895 | banks |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 0 | 20.933073 | banks |
#Checking
df_banks.loc[df_banks['name_poi'] == 'banks'].shape[0]
5000
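The union step is essentially a relabeling: every label that belongs to a group of equivalent types is replaced by one common label, and all other labels are left alone. The following is a minimal sketch of that idea on a plain list. The helper `union_poi` is hypothetical and is not PyMove's implementation.

```python
def union_poi(labels, types_to_merge, merged_label):
    """Replace every label found in `types_to_merge` with `merged_label`."""
    return [merged_label if label in types_to_merge else label for label in labels]


# Bank-like types used in this example
BANK_TYPES = {'bancos_filiais', 'bancos_agencias', 'bancos_postos', 'bancos_PAE', 'bank'}

labels = ['bancos_filiais', 'bank', 'restaurant', 'bancos_PAE']
merged = union_poi(labels, BANK_TYPES, 'banks')
print(merged)  # ['banks', 'banks', 'restaurant', 'banks']
```

The same pattern applies to the other union functions in this section, with a different set of types and a different merged label.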
Union of Bus Stations
Converts “transit_station” and “pontos_de_onibus” POIs to a single type: “bus_station”
df_bus = move_df.copy()
#We create POIs with different name_poi that describe different types of bus stops to test
indexes_bp = np.linspace(0, df_bus.shape[0], 6)
bus_pois = df_bus[df_bus.index.isin(indexes_bp)].copy()
bus_pois['id'] = [0,1,2,3,4]
bus_pois['name_poi'] = ['transit_station', 'transit_station', 'pontos_de_onibus', 'transit_station', 'pontos_de_onibus']
#Result
bus_pois.head()
index | lat | lon | datetime | id | name_poi |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 0 | transit_station |
1000 | 40.014125 | 116.306159 | 2008-10-23 23:43:56 | 1 | transit_station |
2000 | 39.979558 | 116.312653 | 2008-10-24 03:26:10 | 2 | pontos_de_onibus |
3000 | 39.979370 | 116.320649 | 2008-10-24 06:31:04 | 3 | transit_station |
4000 | 40.003274 | 116.267484 | 2008-10-25 00:54:34 | 4 | pontos_de_onibus |
#Integration
df_bus = it.join_with_pois(df_bus, bus_pois, label_id='id', label_poi_name='name_poi')
#Result
df_bus.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 0 | 0.000000 | transit_station |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 0 | 13.690153 | transit_station |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 0 | 20.223428 | transit_station |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 0 | 18.416895 | transit_station |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 0 | 20.933073 | transit_station |
transit_station = df_bus.loc[df_bus['name_poi'] == 'transit_station']
pontos_de_onibus = df_bus.loc[df_bus['name_poi'] == 'pontos_de_onibus']
print("Number of points near transit_station's: ", transit_station.shape[0])
print("Number of points close to pontos_de_onibus's: ", pontos_de_onibus.shape[0])
Number of points near transit_station's: 2846
Number of points close to pontos_de_onibus's: 2154
#The union function
df_bus = it.union_poi_bus_station(df_bus, label_poi="name_poi")
df_bus.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 0 | 0.000000 | bus_station |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 0 | 13.690153 | bus_station |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 0 | 20.223428 | bus_station |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 0 | 18.416895 | bus_station |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 0 | 20.933073 | bus_station |
#Checking
df_bus.loc[df_bus['name_poi'] == 'bus_station'].shape[0]
5000
Union of Bars and Restaurants
Converts “bar” and “restaurant” POIs to a single type: “bar-restaurant”
df_bar = move_df.copy()
#We create POIs with both types
indexes_br = np.linspace(0, df_bar.shape[0], 5)
br_POIs = df_bar[df_bar.index.isin(indexes_br)].copy()
br_POIs['name_poi'] = ['bar','restaurant','restaurant', 'bar']
#Result
br_POIs.head()
index | lat | lon | datetime | id | name_poi |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | bar |
1250 | 39.999756 | 116.322556 | 2008-10-23 23:58:02 | 1 | restaurant |
2500 | 39.979533 | 116.323162 | 2008-10-24 05:31:19 | 1 | restaurant |
3750 | 39.996251 | 116.293837 | 2008-10-25 00:40:56 | 1 | bar |
#Integration
df_bar = it.join_with_pois(df_bar, br_POIs, label_id='id', label_poi_name='name_poi')
#Result
df_bar.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 1 | 0.000000 | bar |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 1 | 13.690153 | bar |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 1 | 20.223428 | bar |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 1 | 18.416895 | bar |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 1 | 20.933073 | bar |
#Number of points close to each type
bar = df_bar.loc[df_bar['name_poi'] == 'bar']
restaurant = df_bar.loc[df_bar['name_poi'] == 'restaurant']
print("Closest type points 'bar': ", bar.shape[0])
print("Closest type points 'restaurant': ", restaurant.shape[0])
Closest type points 'bar': 2539
Closest type points 'restaurant': 2461
#Union of the two types of POIs into a single
df_bar = it.union_poi_bar_restaurant(df_bar, label_poi="name_poi")
#Result
df_bar.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 1 | 0.000000 | bar-restaurant |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 1 | 13.690153 | bar-restaurant |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 1 | 20.223428 | bar-restaurant |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 1 | 18.416895 | bar-restaurant |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 1 | 20.933073 | bar-restaurant |
#Checking
df_bar.loc[df_bar['name_poi'] == 'bar-restaurant'].shape[0]
5000
Union of Parks
Converts “pracas_e_parques” and “park” POIs to a single type: “parks”
df_parks = move_df.copy()
#We create POIs with both types
indexes_p = np.linspace(0, df_parks.shape[0], 5)
p_POIs = df_parks[df_parks.index.isin(indexes_p)].copy()
p_POIs['name_poi'] = ['pracas_e_parques','pracas_e_parques','park', 'park']
#Result
p_POIs.head()
index | lat | lon | datetime | id | name_poi |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | pracas_e_parques |
1250 | 39.999756 | 116.322556 | 2008-10-23 23:58:02 | 1 | pracas_e_parques |
2500 | 39.979533 | 116.323162 | 2008-10-24 05:31:19 | 1 | park |
3750 | 39.996251 | 116.293837 | 2008-10-25 00:40:56 | 1 | park |
#Integration
df_parks = it.join_with_pois(df_parks, p_POIs, label_id='id', label_poi_name='name_poi')
#Result
df_parks.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 1 | 0.000000 | pracas_e_parques |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 1 | 13.690153 | pracas_e_parques |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 1 | 20.223428 | pracas_e_parques |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 1 | 18.416895 | pracas_e_parques |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 1 | 20.933073 | pracas_e_parques |
#Number of points close to each type of POI
pracas_e_parques = df_parks.loc[df_parks['name_poi'] == 'pracas_e_parques']
park = df_parks.loc[df_parks['name_poi'] == 'park']
print("Number of points closest to pracas_e_parques: ", pracas_e_parques.shape[0])
print("Number of points closest to park: ", park.shape[0])
Number of points closest to pracas_e_parques: 2716
Number of points closest to park: 2284
#Union function
df_parks = it.union_poi_parks(df_parks, label_poi="name_poi")
df_parks.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 1 | 0.000000 | parks |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 1 | 13.690153 | parks |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 1 | 20.223428 | parks |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 1 | 18.416895 | parks |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 1 | 20.933073 | parks |
#Checking the new quantity
df_parks.loc[df_parks['name_poi'] == 'parks'].shape[0]
5000
Union of police points
Converts “distritos_policiais” and “police” POIs to a single type: “police”
df_police = move_df.copy()
#We create POIs with both types
indexes_pol = np.linspace(0, df_police.shape[0], 5)
pol_POIs = df_police[df_police.index.isin(indexes_pol)].copy()
pol_POIs['name_poi'] = ['distritos_policiais','police','distritos_policiais', 'distritos_policiais']
#Result
pol_POIs.head()
index | lat | lon | datetime | id | name_poi |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | distritos_policiais |
1250 | 39.999756 | 116.322556 | 2008-10-23 23:58:02 | 1 | police |
2500 | 39.979533 | 116.323162 | 2008-10-24 05:31:19 | 1 | distritos_policiais |
3750 | 39.996251 | 116.293837 | 2008-10-25 00:40:56 | 1 | distritos_policiais |
#Integration
df_police = it.join_with_pois(df_police, pol_POIs, label_id='id', label_poi_name='name_poi')
df_police.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 1 | 0.000000 | distritos_policiais |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 1 | 13.690153 | distritos_policiais |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 1 | 20.223428 | distritos_policiais |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 1 | 18.416895 | distritos_policiais |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 1 | 20.933073 | distritos_policiais |
#Quantity of points closest to each type of point
distritos_policiais = df_police.loc[df_police['name_poi'] == 'distritos_policiais']
print("Number of points closest to distritos_policiais: ", distritos_policiais.shape[0])
Number of points closest to distritos_policiais: 3420
#Union function
df_police = it.union_poi_police(df_police, label_poi="name_poi")
#Result
df_police.head()
index | lat | lon | datetime | id | id_poi | dist_poi | name_poi |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | 1 | 0.000000 | police |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | 1 | 13.690153 | police |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 1 | 20.223428 | police |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | 1 | 18.416895 | police |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | 1 | 20.933073 | police |
#Checking
df_police.loc[df_police['name_poi'] == 'police'].shape[0]
5000
12. Integration between trajectories and collective areas
df_pd = pd.read_csv('geolife_sample.csv')
df_12 = df_pd[0:2000]
gdf = geopandas.GeoDataFrame(df_12, geometry=geopandas.points_from_xy(df_12.lon, df_12.lat))
gdf.head()
index | lat | lon | datetime | id | geometry |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | POINT (116.31924 39.98409) |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | POINT (116.31932 39.98420) |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | POINT (116.31940 39.98422) |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | POINT (116.31939 39.98421) |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | POINT (116.31942 39.98422) |
#Creating collective areas
indexes_ac = np.linspace(0, gdf.shape[0], 5)
# Select the areas from gdf so that they carry the geometry column
area_c = gdf[gdf.index.isin(indexes_ac)].copy()
area_c
index | lat | lon | datetime | id | geometry |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | POINT (116.31924 39.98409) |
500 | 40.006436 | 116.317701 | 2008-10-23 10:53:31 | 1 | POINT (116.31770 40.00644) |
1000 | 40.014125 | 116.306159 | 2008-10-23 23:43:56 | 1 | POINT (116.30616 40.01412) |
1500 | 39.979009 | 116.326873 | 2008-10-24 00:11:29 | 1 | POINT (116.32687 39.97901) |
#Integration
gdf = it.join_collective_areas(gdf, area_c)
gdf.head()
index | lat | lon | datetime | id | geometry | violating |
---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | POINT (116.31924 39.98409) | True |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | POINT (116.31932 39.98420) | False |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | POINT (116.31940 39.98422) | False |
3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 | POINT (116.31939 39.98421) | False |
4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 | POINT (116.31942 39.98422) | False |
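In this example the collective areas are themselves points, so a trajectory point is flagged only when it coincides with one of them, as row 0 does in the output above. The following is a minimal sketch of that membership test, assuming that exact coordinate equality stands in for the geometric check. The helper `flag_violating` is hypothetical and is not PyMove's implementation; with polygon areas a real geometric test would be needed.

```python
def flag_violating(points, areas):
    """Return one boolean per point: True when it coincides with an area point."""
    area_coords = {(area['lon'], area['lat']) for area in areas}
    return [(point['lon'], point['lat']) in area_coords for point in points]


# First two trajectory points and two of the collective-area points shown above
points = [
    {'lat': 39.984094, 'lon': 116.319236},
    {'lat': 39.984198, 'lon': 116.319322},
]
areas = [
    {'lat': 39.984094, 'lon': 116.319236},
    {'lat': 40.006436, 'lon': 116.317701},
]

violating = flag_violating(points, areas)
print(violating)  # [True, False]
```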