06 - Exploring Integrations¶

0. Required library installations¶

One of the integration functions presented here requires the geopandas library. The osmnx library is also needed to download the sample points of interest used to demonstrate the functions.

conda install geopandas osmnx

1. Imports¶

import pymove as pm
from pymove.utils import integration as it
from pymove.visualization import folium
import numpy as np
import pandas as pd
import geopandas

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

2. Load Data¶

move_df = pm.read_csv('geolife_sample.csv', nrows=5000)
move_df.head()

	lat	lon	datetime	id
0	39.984094	116.319236	2008-10-23 05:53:05	1
1	39.984198	116.319322	2008-10-23 05:53:06	1
2	39.984224	116.319402	2008-10-23 05:53:11	1
3	39.984211	116.319389	2008-10-23 05:53:16	1
4	39.984217	116.319422	2008-10-23 05:53:21	1

#Size (number of rows)
move_df.shape[0]

Visualization¶

folium.plot_trajectories(move_df)

3. Loading points of interest¶

bbox = move_df.get_bbox()
folium.plot_bbox(bbox, color='blue')
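As a rough illustration of what a bounding box is, the sketch below computes one by hand from a few of the sample rows. The helper name `bounding_box` and the `(lat_min, lon_min, lat_max, lon_max)` ordering are assumptions made for this example; check the PyMove documentation for the exact ordering returned by `get_bbox`.

```python
def bounding_box(points):
    """Return (lat_min, lon_min, lat_max, lon_max) for (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))


# First rows of the sample trajectory shown above
sample = [
    (39.984094, 116.319236),
    (39.984198, 116.319322),
    (39.984224, 116.319402),
]

box = bounding_box(sample)
# Under the assumed ordering, the four sides unpack as follows
south, west, north, east = box
```

With this ordering, the southern and western limits come first and the northern and eastern limits last.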

import osmnx as ox

tags = {'amenity':True}
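#Assuming get_bbox() returns (lat_min, lon_min, lat_max, lon_max), north is bbox[2] and south is bbox[0]
POIs = ox.geometries_from_bbox(north=bbox[2], south=bbox[0], east=bbox[3], west=bbox[1], tags=tags)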

POIs.head()

	unique_id	osmid	element_type	amenity	fee	geometry	cuisine	name	name:en	atm	...	building:material	roof:colour	roof:material	alt_name_1	not:name	area	ways	type	name:ja	name:ko
0	node/269492188	269492188	node	toilets	no	POINT (116.26750 39.98087)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	node/274942287	274942287	node	toilets	NaN	POINT (116.27358 39.99664)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	node/276320137	276320137	node	fast_food	NaN	POINT (116.33756 39.97541)	chinese	永和大王	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	node/276320142	276320142	node	massage	NaN	POINT (116.33751 39.97546)	NaN	Footmassage 富橋	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	node/286242547	286242547	node	toilets	NaN	POINT (116.19982 40.00670)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 118 columns

Removing points of interest that have no amenity value (null)

POIs = POIs.dropna(subset=["amenity"], inplace=False)

Adapting to the format needed for integration: only point geometries are kept, and the labels ‘lat’ and ‘lon’ are added for latitude and longitude, respectively

#Keep only point geometries; copy so the new columns are added to an independent frame
POIs = POIs[POIs['geometry'].geom_type == 'Point'].copy()
POIs['lon'] = POIs['geometry'].x
POIs['lat'] = POIs['geometry'].y

Visualization¶

m = folium.plot_trajectories(move_df)
folium.plot_poi(POIs, slice_tags=['amenity'], base_map=m, poi_point='blue')

4. Integrating Points of Interest into the DataSet¶

df_4 = move_df.copy()
df_4 = it.join_with_pois(df_4, POIs, label_id='osmid', label_poi_name='name')

Result

df_4.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	5572452688	116.862844	太平洋影城(中关村店)
1	39.984198	116.319322	2008-10-23 05:53:06	1	5572452688	119.142692	太平洋影城(中关村店)
2	39.984224	116.319402	2008-10-23 05:53:11	1	5572452688	116.595117	太平洋影城(中关村店)
3	39.984211	116.319389	2008-10-23 05:53:16	1	5572452688	116.257378	太平洋影城(中关村店)
4	39.984217	116.319422	2008-10-23 05:53:21	1	5572452688	114.886759	太平洋影城(中关村店)

Point of interest closest to each point of the trajectory

df_4['name_poi'].unique()

array(['太平洋影城(中关村店)', '东亚银行', '南京银行', '星巴克', '小吊梨汤', nan, '鑫蜀源', '必胜客',
       '潜渊', '上岛咖啡', '科苑餐厅', '2nd Place', '元绿回转寿司', '中信银行', 'HSBC',
       '咖啡王（暂时停业）', '招商银行', '中国建设银行', 'Paradiso Coffee', '798 bar',
       'Jazz Cafe', 'Hundred Years Cafe', '安家小厨', '清青快餐', '听涛园', '仰望咖啡',
       'China Construction Bank', '同仁堂', '北园餐厅', '北京银行', '交通银行', '宁波银行',
       '美嘉欢乐影城', '北京101中学', '西苑医院', 'Yu Xiao Mian Noodles', '茶大爷',
       "McDonald's", 'Pizza Hut', 'Starbucks', '云海肴', '兰州老妈拉面'],
      dtype=object)
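Conceptually, the join attaches to every trajectory point the id, distance and name of its closest POI. The following is a minimal standard-library sketch of that idea, not PyMove's implementation: the helper names `haversine` and `nearest_poi` and the Earth radius constant are assumptions of this example, and the two POIs are taken from tables in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_poi(point, pois):
    """Return (id, distance, name) of the POI closest to point."""
    lat, lon = point
    best = min(pois, key=lambda p: haversine(lat, lon, p["lat"], p["lon"]))
    dist = haversine(lat, lon, best["lat"], best["lon"])
    return best["osmid"], dist, best["name"]


pois = [
    {"osmid": 312152376, "name": "永和大王", "lat": 39.991132, "lon": 116.327660},
    {"osmid": 297407444, "name": "招商银行", "lat": 39.975462, "lon": 116.338260},
]

# First point of the sample trajectory
id_poi, dist_poi, name_poi = nearest_poi((39.984094, 116.319236), pois)
```

A linear scan like this costs one distance computation per point and POI pair, which is fine for a sketch but slow for large datasets.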

5. Integrating Points of Interest into the DataSet (Using join_with_pois_optimizer)¶

Selecting data

POIs_5 = POIs[0:10].copy()
POIs_5['type_poi'] = POIs_5['amenity']
df_5 = move_df.copy()

POIs_5['type_poi'].unique()

array(['toilets', 'fast_food', 'massage', 'waste_basket', 'cafe',
       'restaurant', 'bank'], dtype=object)

Executing the function (note that this cell calls join_with_pois on the ten selected POIs)

df_5 = it.join_with_pois(df_5, POIs_5, label_id='osmid', label_poi_name='name')

df_5.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	312152376	1061.807427	永和大王
1	39.984198	116.319322	2008-10-23 05:53:06	1	312152376	1048.334810	永和大王
2	39.984224	116.319402	2008-10-23 05:53:11	1	312152376	1041.594793	永和大王
3	39.984211	116.319389	2008-10-23 05:53:16	1	312152376	1043.408891	永和大王
4	39.984217	116.319422	2008-10-23 05:53:21	1	312152376	1041.019464	永和大王

6. Integrating Points of Interest into the DataSet by Category¶

POIs_5

	unique_id	osmid	element_type	amenity	fee	geometry	cuisine	name	name:en	atm	...	alt_name_1	not:name	area	ways	type	name:ja	name:ko	lon	lat	type_poi
0	node/269492188	269492188	node	toilets	no	POINT (116.26750 39.98087)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.267504	39.980869	toilets
1	node/274942287	274942287	node	toilets	NaN	POINT (116.27358 39.99664)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.273579	39.996640	toilets
2	node/276320137	276320137	node	fast_food	NaN	POINT (116.33756 39.97541)	chinese	永和大王	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.337557	39.975411	fast_food
3	node/276320142	276320142	node	massage	NaN	POINT (116.33751 39.97546)	NaN	Footmassage 富橋	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.337510	39.975463	massage
4	node/286242547	286242547	node	toilets	NaN	POINT (116.19982 40.00670)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.199822	40.006700	toilets
5	node/286246121	286246121	node	waste_basket	NaN	POINT (116.20290 39.99787)	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.202902	39.997869	waste_basket
6	node/290600874	290600874	node	cafe	NaN	POINT (116.32900 39.99117)	NaN	迷你站奶茶专门店	Mini Station Milktea	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.328997	39.991167	cafe
7	node/297407376	297407376	node	restaurant	NaN	POINT (116.33981 39.97537)	NaN	沸腾渔乡	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.339810	39.975369	restaurant
8	node/297407444	297407444	node	bank	NaN	POINT (116.33826 39.97546)	NaN	招商银行	China Merchants Bank	yes	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.338260	39.975462	bank
9	node/312152376	312152376	node	restaurant	NaN	POINT (116.32766 39.99113)	NaN	永和大王	Yonghe King	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	116.327660	39.991132	restaurant

10 rows × 121 columns

df_6 = move_df.copy()
df_6 = it.join_with_pois_by_category(df_6, POIs_5, label_category='amenity', label_id='name')

df_6.head(10)

	lat	lon	datetime	id	id_toilets	dist_toilets	id_fast_food	dist_fast_food	id_massage	dist_massage	id_waste_basket	dist_waste_basket	id_cafe	dist_cafe	id_restaurant	dist_restaurant	id_bank	dist_bank
0	39.984094	116.319236	2008-10-23 05:53:05	1	NaN	4132.229067	永和大王	1835.502157	Footmassage 富橋	1829.070918	NaN	10028.323311	迷你站奶茶专门店	1144.603484	永和大王	1061.807427	招商银行	1883.831094
1	39.984198	116.319322	2008-10-23 05:53:06	1	NaN	4135.240296	永和大王	1835.403414	Footmassage 富橋	1828.951254	NaN	10033.797904	迷你站奶茶专门店	1131.338544	永和大王	1048.334810	招商银行	1883.466601
2	39.984224	116.319402	2008-10-23 05:53:11	1	NaN	4140.698090	永和大王	1831.182086	Footmassage 富橋	1824.720741	NaN	10040.095434	迷你站奶茶专门店	1124.395459	永和大王	1041.594793	招商银行	1879.127020
3	39.984211	116.319389	2008-10-23 05:53:16	1	NaN	4140.136625	永和大王	1831.345213	Footmassage 富橋	1824.886604	NaN	10039.220172	迷你站奶茶专门店	1126.193301	永和大王	1043.408891	招商银行	1879.325712
4	39.984217	116.319422	2008-10-23 05:53:21	1	NaN	4142.564150	永和大王	1829.326076	Footmassage 富橋	1822.864349	NaN	10041.897836	迷你站奶茶专门店	1123.692580	永和大王	1041.019464	招商银行	1877.266370
5	39.984710	116.319865	2008-10-23 05:53:23	1	NaN	4160.348133	永和大王	1827.992513	Footmassage 富橋	1821.434719	NaN	10071.059512	迷你站奶茶专门店	1058.680139	永和大王	975.127648	招商银行	1874.593280
6	39.984674	116.319810	2008-10-23 05:53:28	1	NaN	4157.187813	永和大王	1829.602658	Footmassage 富橋	1823.053098	NaN	10067.008973	迷你站奶茶专门店	1064.838599	永和大王	981.250366	招商银行	1876.325343
7	39.984623	116.319773	2008-10-23 05:53:33	1	NaN	4156.022778	永和大王	1829.027475	Footmassage 富橋	1822.486938	NaN	10064.722571	迷你站奶茶专门店	1071.002908	永和大王	987.550029	招商银行	1875.882508
8	39.984606	116.319732	2008-10-23 05:53:38	1	NaN	4153.324576	永和大王	1830.866492	Footmassage 富橋	1824.330899	NaN	10061.545473	迷你站奶茶专门店	1074.850689	永和大王	991.312815	招商银行	1877.792905
9	39.984555	116.319728	2008-10-23 05:53:43	1	NaN	4154.833968	永和大王	1827.989263	Footmassage 富橋	1821.460600	NaN	10062.044871	迷你站奶茶专门店	1078.957681	永和大王	995.702764	招商银行	1875.015873
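The result above holds one id column and one distance column per category. A hedged sketch of the same idea in plain Python follows: for a single trajectory point it keeps the closest POI of each category. The helper names are made up for this example and this is not PyMove's code; the three POIs come from the POIs_5 table.

```python
from math import asin, cos, radians, sin, sqrt


def haversine(lat1, lon1, lat2, lon2, radius=6_371_000):
    """Great-circle distance in meters (radius is an assumed mean Earth radius)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * asin(sqrt(a))


def nearest_by_category(point, pois):
    """Map each category to the (id, distance) of its closest POI."""
    lat, lon = point
    result = {}
    for poi in pois:
        dist = haversine(lat, lon, poi["lat"], poi["lon"])
        best = result.get(poi["amenity"])
        if best is None or dist < best[1]:
            result[poi["amenity"]] = (poi["name"], dist)
    return result


pois = [
    {"name": "沸腾渔乡", "amenity": "restaurant", "lat": 39.975369, "lon": 116.339810},
    {"name": "永和大王", "amenity": "restaurant", "lat": 39.991132, "lon": 116.327660},
    {"name": "招商银行", "amenity": "bank", "lat": 39.975462, "lon": 116.338260},
]

# One dictionary entry per category, like the id_*/dist_* column pairs above
row = nearest_by_category((39.984094, 116.319236), pois)
```

Here the name is used as the identifier, mirroring `label_id='name'` in the call above.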

7. Integrating events (points of interest with timestamp) to the DataSet¶

This function integrates a trajectory DataFrame with event points of interest, that is, POIs that have a datetime label for when the event occurred in addition to the latitude and longitude labels. In this example, we assign datetime values taken from the trajectory to some POIs in order to simulate events.

indexOfPois = np.arange(0, POIs.shape[0], POIs.shape[0]/20, dtype=np.int64)
POIs_events = POIs.iloc[indexOfPois].copy()

randomIndexOfMoveDf = np.arange(0, move_df.shape[0], move_df.shape[0]/20, dtype=np.int64)
randomMoveDfSlice = move_df.iloc[randomIndexOfMoveDf].copy()

POIs_events['datetime'] = randomMoveDfSlice['datetime'].copy()

df_7 = move_df.copy()

df_7 = it.join_with_events(
    df_7, POIs_events,
    label_date='datetime', time_window=900,
    label_event_id='osmid', label_event_type='amenity'
)

df_7.head()

	lat	lon	datetime	id	osmid	dist_event	amenity
0	39.984094	116.319236	2008-10-23 05:53:05	1	269492188	4422.237186	toilets
1	39.984198	116.319322	2008-10-23 05:53:06	1	269492188	4430.488277	toilets
2	39.984224	116.319402	2008-10-23 05:53:11	1	269492188	4437.521909	toilets
3	39.984211	116.319389	2008-10-23 05:53:16	1	269492188	4436.297310	toilets
4	39.984217	116.319422	2008-10-23 05:53:21	1	269492188	4439.154806	toilets
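One plausible reading of `time_window` is that an event is a candidate for a point only when it happened within that many seconds of the point's timestamp, and that the closest candidate is kept. The sketch below implements that reading in plain Python; it is an assumption about the semantics, not PyMove's implementation, and the second event's timestamp is hypothetical.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine(lat1, lon1, lat2, lon2, radius=6_371_000):
    """Great-circle distance in meters (radius is an assumed mean Earth radius)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * asin(sqrt(a))


def nearest_event_in_window(point, events, time_window):
    """Closest event whose time lies within +/- time_window seconds of the point."""
    lat, lon, when = point
    window = timedelta(seconds=time_window)
    candidates = [e for e in events if abs(e["datetime"] - when) <= window]
    if not candidates:
        return None
    best = min(candidates, key=lambda e: haversine(lat, lon, e["lat"], e["lon"]))
    dist = haversine(lat, lon, best["lat"], best["lon"])
    return best["osmid"], dist, best["amenity"]


events = [
    {"osmid": 269492188, "amenity": "toilets", "lat": 39.980869,
     "lon": 116.267504, "datetime": datetime(2008, 10, 23, 5, 53, 5)},
    # Hypothetical timestamp, chosen to fall outside the window
    {"osmid": 276320137, "amenity": "fast_food", "lat": 39.975411,
     "lon": 116.337557, "datetime": datetime(2008, 10, 24, 12, 0, 0)},
]

point = (39.984094, 116.319236, datetime(2008, 10, 23, 5, 53, 5))
match = nearest_event_in_window(point, events, time_window=900)
```

The second event is geographically closer but is ignored because it falls outside the 900-second window.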

8. Integration with Point of Interest HOME¶

A point of interest of type Home has, in addition to the latitude, longitude and id labels, labels for the address and the city.

Creating a home point

df_8 = move_df.copy()
home_df = df_8.iloc[300:302].copy()
home_df['formatted_address'] = ['Rua1, n02', 'Rua2, n03']
home_df['city'] = ['ChinaTown', 'ChinaTown']

Using the function

df_8 = it.join_with_home_by_id(df_8, home_df, label_id='id')

df_8.head()

	id	lat	lon	datetime	dist_home	home	city
0	1	39.984094	116.319236	2008-10-23 05:53:05	1031.348370	Rua1, n02	ChinaTown
1	1	39.984198	116.319322	2008-10-23 05:53:06	1017.690147	Rua1, n02	ChinaTown
2	1	39.984224	116.319402	2008-10-23 05:53:11	1011.332141	Rua1, n02	ChinaTown
3	1	39.984211	116.319389	2008-10-23 05:53:16	1013.152700	Rua1, n02	ChinaTown
4	1	39.984217	116.319422	2008-10-23 05:53:21	1010.959220	Rua1, n02	ChinaTown
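The home join can be pictured as a lookup by id followed by a distance computation. The sketch below is a simplified illustration under that assumption, not PyMove's code: the helper names are invented, and the home coordinates are made up because the notebook does not display them.

```python
from math import asin, cos, radians, sin, sqrt


def haversine(lat1, lon1, lat2, lon2, radius=6_371_000):
    """Great-circle distance in meters (radius is an assumed mean Earth radius)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * asin(sqrt(a))


def join_home(points, homes):
    """Attach dist_home, home and city to every point whose id has a home."""
    joined = []
    for point in points:
        home = homes.get(point["id"])
        if home is None:
            continue  # this sketch simply skips ids without a home
        joined.append({
            **point,
            "dist_home": haversine(point["lat"], point["lon"],
                                   home["lat"], home["lon"]),
            "home": home["formatted_address"],
            "city": home["city"],
        })
    return joined


# Home coordinates are made up for illustration
homes = {1: {"lat": 39.9900, "lon": 116.3300,
             "formatted_address": "Rua1, n02", "city": "ChinaTown"}}
points = [
    {"id": 1, "lat": 39.984094, "lon": 116.319236},
    {"id": 1, "lat": 39.984198, "lon": 116.319322},
]

result = join_home(points, homes)
```

Each output row carries the same three extra fields that appear in the table above: dist_home, home and city.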

9. Merge of HOME with DataSet already integrated with POIs¶

Integration

df_9 = it.join_with_pois(df_8, POIs, label_id='osmid', label_poi_name='name')

df_9 = it.merge_home_with_poi(df_9)

df_9.head()

	id	lat	lon	datetime	city	id_poi	dist_poi	name_poi
0	1	39.984094	116.319236	2008-10-23 05:53:05	ChinaTown	5572452688	116.862844	太平洋影城(中关村店)
1	1	39.984198	116.319322	2008-10-23 05:53:06	ChinaTown	5572452688	119.142692	太平洋影城(中关村店)
2	1	39.984224	116.319402	2008-10-23 05:53:11	ChinaTown	5572452688	116.595117	太平洋影城(中关村店)
3	1	39.984211	116.319389	2008-10-23 05:53:16	ChinaTown	5572452688	116.257378	太平洋影城(中关村店)
4	1	39.984217	116.319422	2008-10-23 05:53:21	ChinaTown	5572452688	114.886759	太平洋影城(中关村店)

11. Union functions¶

The union functions merge several POI types that mean the same or similar things into a single POI type.
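In essence, a union is a relabeling: every label in a given set is replaced by one merged label and all other labels are left alone. A minimal sketch in plain Python, using a hypothetical helper `union_poi` that is not part of PyMove:

```python
def union_poi(labels, types_to_merge, merged_name):
    """Replace every label found in types_to_merge with merged_name."""
    merge = set(types_to_merge)
    return [merged_name if label in merge else label for label in labels]


bank_types = ["bancos_filiais", "bancos_agencias", "bancos_postos",
              "bancos_PAE", "bank"]
labels = ["bancos_filiais", "bank", "restaurant", "bancos_PAE"]

merged = union_poi(labels, bank_types, "banks")
# merged == ["banks", "banks", "restaurant", "banks"]
```

Labels outside the merged set, such as "restaurant" here, pass through unchanged.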

Union of Banks¶

Converts POIs of the types “bancos_filiais”, “bancos_agencias”, “bancos_postos”, “bancos_PAE” and “bank” to a single type: “banks”

df_banks = move_df.copy()

#We create POIs with different type_poi that describe different types of banks to test
indexes_bp = np.linspace(0, df_banks.shape[0], 6)
banks_pois = df_banks[df_banks.index.isin(indexes_bp)].copy()
banks_pois['id'] = [0,1,2,3,4]
banks_pois['type_poi'] = ['bancos_filiais', 'bancos_agencias', 'bancos_postos', 'bancos_PAE', 'bank']

banks_pois.head()

	lat	lon	datetime	id	type_poi
0	39.984094	116.319236	2008-10-23 05:53:05	0	bancos_filiais
1000	40.014125	116.306159	2008-10-23 23:43:56	1	bancos_agencias
2000	39.979558	116.312653	2008-10-24 03:26:10	2	bancos_postos
3000	39.979370	116.320649	2008-10-24 06:31:04	3	bancos_PAE
4000	40.003274	116.267484	2008-10-25 00:54:34	4	bank

#Join with POIs
df_banks = it.join_with_pois(df_banks, banks_pois, label_id='id', label_poi_name='type_poi')

#Result
df_banks.head(10)

	lat	lon	datetime	id	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	0.000000	bancos_filiais
1	39.984198	116.319322	2008-10-23 05:53:06	1	13.690153	bancos_filiais
2	39.984224	116.319402	2008-10-23 05:53:11	1	20.223428	bancos_filiais
3	39.984211	116.319389	2008-10-23 05:53:16	1	18.416895	bancos_filiais
4	39.984217	116.319422	2008-10-23 05:53:21	1	20.933073	bancos_filiais
5	39.984710	116.319865	2008-10-23 05:53:23	1	86.969343	bancos_filiais
6	39.984674	116.319810	2008-10-23 05:53:28	1	80.938365	bancos_filiais
7	39.984623	116.319773	2008-10-23 05:53:33	1	74.520547	bancos_filiais
8	39.984606	116.319732	2008-10-23 05:53:38	1	70.901768	bancos_filiais
9	39.984555	116.319728	2008-10-23 05:53:43	1	66.217975	bancos_filiais

#Checking the number of points assigned to each type of POI
bancos_filiais = df_banks.loc[df_banks['name_poi'] == 'bancos_filiais']
bancos_agencias = df_banks.loc[df_banks['name_poi'] == 'bancos_agencias']
bancos_postos = df_banks.loc[df_banks['name_poi'] == 'bancos_postos']
bancos_PAE = df_banks.loc[df_banks['name_poi'] == 'bancos_PAE']
bank = df_banks.loc[df_banks['name_poi'] == 'bank']

print("Number of points close to each bank definition")
print("bancos_filiais: ", bancos_filiais.shape[0])
print("bancos_agencias: ", bancos_agencias.shape[0])
print("bancos_postos: ", bancos_postos.shape[0])
print("bancos_PAE: ", bancos_PAE.shape[0])
print("bank: ", bank.shape[0])

Number of points close to each bank definition
bancos_filiais:  579
bancos_agencias:  1407
bancos_postos:  916
bancos_PAE:  860
bank:  1238

#Finally, the Union
df_banks = it.union_poi_bank(df_banks, label_poi="name_poi")

#Result
df_banks.head()

	lat	lon	datetime	id	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	0.000000	banks
1	39.984198	116.319322	2008-10-23 05:53:06	1	13.690153	banks
2	39.984224	116.319402	2008-10-23 05:53:11	1	20.223428	banks
3	39.984211	116.319389	2008-10-23 05:53:16	1	18.416895	banks
4	39.984217	116.319422	2008-10-23 05:53:21	1	20.933073	banks

#Checking
df_banks.loc[df_banks['name_poi'] == 'banks'].shape[0]

Union of Bus Stations¶

Converts “transit_station” and “pontos_de_onibus” POIs to a single type: “bus_station”

df_bus = move_df.copy()


#We create POIs with different name_poi that describe different types of bus stops to test
indexes_bp = np.linspace(0, df_bus.shape[0], 6)
bus_pois = df_bus[df_bus.index.isin(indexes_bp)].copy()
bus_pois['id'] = [0,1,2,3,4]
bus_pois['name_poi'] = ['transit_station', 'transit_station', 'pontos_de_onibus', 'transit_station', 'pontos_de_onibus']

#Result
bus_pois.head()

	lat	lon	datetime	id	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	0	transit_station
1000	40.014125	116.306159	2008-10-23 23:43:56	1	transit_station
2000	39.979558	116.312653	2008-10-24 03:26:10	2	pontos_de_onibus
3000	39.979370	116.320649	2008-10-24 06:31:04	3	transit_station
4000	40.003274	116.267484	2008-10-25 00:54:34	4	pontos_de_onibus

#Integration
df_bus = it.join_with_pois(df_bus, bus_pois, label_id='id', label_poi_name='name_poi')

#Result
df_bus.head()

	lat	lon	datetime	id	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	0.000000	transit_station
1	39.984198	116.319322	2008-10-23 05:53:06	1	13.690153	transit_station
2	39.984224	116.319402	2008-10-23 05:53:11	1	20.223428	transit_station
3	39.984211	116.319389	2008-10-23 05:53:16	1	18.416895	transit_station
4	39.984217	116.319422	2008-10-23 05:53:21	1	20.933073	transit_station

transit_station = df_bus.loc[df_bus['name_poi'] == 'transit_station']
pontos_de_onibus = df_bus.loc[df_bus['name_poi'] == 'pontos_de_onibus']

print("Number of points near transit_station's: ", transit_station.shape[0])
print("Number of points close to pontos_de_onibus's: ", pontos_de_onibus.shape[0])

Number of points near transit_station's:  2846
Number of points close to pontos_de_onibus's:  2154

#The union function
df_bus = it.union_poi_bus_station(df_bus, label_poi="name_poi")

df_bus.head()

	lat	lon	datetime	id	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	0.000000	bus_station
1	39.984198	116.319322	2008-10-23 05:53:06	1	13.690153	bus_station
2	39.984224	116.319402	2008-10-23 05:53:11	1	20.223428	bus_station
3	39.984211	116.319389	2008-10-23 05:53:16	1	18.416895	bus_station
4	39.984217	116.319422	2008-10-23 05:53:21	1	20.933073	bus_station

#Checking

df_bus.loc[df_bus['name_poi'] == 'bus_station'].shape[0]

Union of Bars and Restaurants¶

Converts “bar” and “restaurant” POIs to a single type: “bar-restaurant”

df_bar = move_df.copy()

#We create POIs with both types
indexes_br = np.linspace(0, df_bar.shape[0], 5)
br_POIs = df_bar[df_bar.index.isin(indexes_br)].copy()
br_POIs['name_poi'] = ['bar','restaurant','restaurant', 'bar']

#Result
br_POIs.head()

	lat	lon	datetime	id	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	bar
1250	39.999756	116.322556	2008-10-23 23:58:02	1	restaurant
2500	39.979533	116.323162	2008-10-24 05:31:19	1	restaurant
3750	39.996251	116.293837	2008-10-25 00:40:56	1	bar

#Integration
df_bar = it.join_with_pois(df_bar, br_POIs, label_id='id', label_poi_name='name_poi')

#Result
df_bar.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	1	0.000000	bar
1	39.984198	116.319322	2008-10-23 05:53:06	1	1	13.690153	bar
2	39.984224	116.319402	2008-10-23 05:53:11	1	1	20.223428	bar
3	39.984211	116.319389	2008-10-23 05:53:16	1	1	18.416895	bar
4	39.984217	116.319422	2008-10-23 05:53:21	1	1	20.933073	bar

#Number of points close to each type
bar = df_bar.loc[df_bar['name_poi'] == 'bar']
restaurant = df_bar.loc[df_bar['name_poi'] == 'restaurant']

print("Number of points closest to 'bar': ", bar.shape[0])
print("Number of points closest to 'restaurant': ", restaurant.shape[0])

Number of points closest to 'bar':  2539
Number of points closest to 'restaurant':  2461

#Union of the two types of POIs into a single
df_bar = it.union_poi_bar_restaurant(df_bar, label_poi="name_poi")

#Result
df_bar.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	1	0.000000	bar-restaurant
1	39.984198	116.319322	2008-10-23 05:53:06	1	1	13.690153	bar-restaurant
2	39.984224	116.319402	2008-10-23 05:53:11	1	1	20.223428	bar-restaurant
3	39.984211	116.319389	2008-10-23 05:53:16	1	1	18.416895	bar-restaurant
4	39.984217	116.319422	2008-10-23 05:53:21	1	1	20.933073	bar-restaurant

#Checking
df_bar.loc[df_bar['name_poi'] == 'bar-restaurant'].shape[0]

Union of Parks¶

Converts “pracas_e_parques” and “park” POIs to a single type: “parks”

df_parks = move_df.copy()

#We create POIs with both types
indexes_p = np.linspace(0, df_parks.shape[0], 5)
p_POIs = df_parks[df_parks.index.isin(indexes_p)].copy()
p_POIs['name_poi'] = ['pracas_e_parques','pracas_e_parques','park', 'park']

#Result
p_POIs.head()

	lat	lon	datetime	id	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	pracas_e_parques
1250	39.999756	116.322556	2008-10-23 23:58:02	1	pracas_e_parques
2500	39.979533	116.323162	2008-10-24 05:31:19	1	park
3750	39.996251	116.293837	2008-10-25 00:40:56	1	park

#Integration
df_parks = it.join_with_pois(df_parks, p_POIs, label_id='id', label_poi_name='name_poi')

#Result
df_parks.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	1	0.000000	pracas_e_parques
1	39.984198	116.319322	2008-10-23 05:53:06	1	1	13.690153	pracas_e_parques
2	39.984224	116.319402	2008-10-23 05:53:11	1	1	20.223428	pracas_e_parques
3	39.984211	116.319389	2008-10-23 05:53:16	1	1	18.416895	pracas_e_parques
4	39.984217	116.319422	2008-10-23 05:53:21	1	1	20.933073	pracas_e_parques

#Number of points close to each type of POI
pracas_e_parques = df_parks.loc[df_parks['name_poi'] == 'pracas_e_parques']
park = df_parks.loc[df_parks['name_poi'] == 'park']

print("Number of points closest to pracas_e_parques: ", pracas_e_parques.shape[0])
print("Number of points closest to park: ", park.shape[0])

Number of points closest to pracas_e_parques:  2716
Number of points closest to park:  2284

#Union function
df_parks = it.union_poi_parks(df_parks, label_poi="name_poi")

df_parks.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	1	0.000000	parks
1	39.984198	116.319322	2008-10-23 05:53:06	1	1	13.690153	parks
2	39.984224	116.319402	2008-10-23 05:53:11	1	1	20.223428	parks
3	39.984211	116.319389	2008-10-23 05:53:16	1	1	18.416895	parks
4	39.984217	116.319422	2008-10-23 05:53:21	1	1	20.933073	parks

#Checking the new quantity
df_parks.loc[df_parks['name_poi'] == 'parks'].shape[0]

Union of police points¶

Converts “distritos_policiais” and “police” POIs to a single type: “police”

df_police = move_df.copy()

#We create POIs with both types
indexes_pol = np.linspace(0, df_police.shape[0], 5)
pol_POIs = df_police[df_police.index.isin(indexes_pol)].copy()
pol_POIs['name_poi'] = ['distritos_policiais','police','distritos_policiais', 'distritos_policiais']

#Result
pol_POIs.head()

	lat	lon	datetime	id	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	distritos_policiais
1250	39.999756	116.322556	2008-10-23 23:58:02	1	police
2500	39.979533	116.323162	2008-10-24 05:31:19	1	distritos_policiais
3750	39.996251	116.293837	2008-10-25 00:40:56	1	distritos_policiais

#Integration
df_police = it.join_with_pois(df_police, pol_POIs, label_id='id', label_poi_name='name_poi')

df_police.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	1	0.000000	distritos_policiais
1	39.984198	116.319322	2008-10-23 05:53:06	1	1	13.690153	distritos_policiais
2	39.984224	116.319402	2008-10-23 05:53:11	1	1	20.223428	distritos_policiais
3	39.984211	116.319389	2008-10-23 05:53:16	1	1	18.416895	distritos_policiais
4	39.984217	116.319422	2008-10-23 05:53:21	1	1	20.933073	distritos_policiais

#Quantity of points closest to each type of point
distritos_policiais = df_police.loc[df_police['name_poi'] == 'distritos_policiais']

print("Number of points closest to distritos_policiais: ", distritos_policiais.shape[0])

Number of points closest to distritos_policiais:  3420

#Union function
df_police = it.union_poi_police(df_police, label_poi="name_poi")

#Result
df_police.head()

	lat	lon	datetime	id	id_poi	dist_poi	name_poi
0	39.984094	116.319236	2008-10-23 05:53:05	1	1	0.000000	police
1	39.984198	116.319322	2008-10-23 05:53:06	1	1	13.690153	police
2	39.984224	116.319402	2008-10-23 05:53:11	1	1	20.223428	police
3	39.984211	116.319389	2008-10-23 05:53:16	1	1	18.416895	police
4	39.984217	116.319422	2008-10-23 05:53:21	1	1	20.933073	police

#Checking
df_police.loc[df_police['name_poi'] == 'police'].shape[0]

12. Integration between trajectories and collective areas¶

df_pd = pd.read_csv('geolife_sample.csv')
df_12 = df_pd[0:2000]
gdf = geopandas.GeoDataFrame(df_12, geometry=geopandas.points_from_xy(df_12.lon, df_12.lat))
gdf.head()

	lat	lon	datetime	id	geometry
0	39.984094	116.319236	2008-10-23 05:53:05	1	POINT (116.31924 39.98409)
1	39.984198	116.319322	2008-10-23 05:53:06	1	POINT (116.31932 39.98420)
2	39.984224	116.319402	2008-10-23 05:53:11	1	POINT (116.31940 39.98422)
3	39.984211	116.319389	2008-10-23 05:53:16	1	POINT (116.31939 39.98421)
4	39.984217	116.319422	2008-10-23 05:53:21	1	POINT (116.31942 39.98422)

#Creating collective areas
indexes_ac = np.linspace(0, gdf.shape[0], 5)
area_c = df_12[df_12.index.isin(indexes_ac)].copy()
area_c

	lat	lon	datetime	id	geometry
0	39.984094	116.319236	2008-10-23 05:53:05	1	POINT (116.31924 39.98409)
500	40.006436	116.317701	2008-10-23 10:53:31	1	POINT (116.31770 40.00644)
1000	40.014125	116.306159	2008-10-23 23:43:56	1	POINT (116.30616 40.01412)
1500	39.979009	116.326873	2008-10-24 00:11:29	1	POINT (116.32687 39.97901)

#Integration
gdf = it.join_collective_areas(gdf, area_c)

gdf.head()

	lat	lon	datetime	id	geometry	violating
0	39.984094	116.319236	2008-10-23 05:53:05	1	POINT (116.31924 39.98409)	True
1	39.984198	116.319322	2008-10-23 05:53:06	1	POINT (116.31932 39.98420)	False
2	39.984224	116.319402	2008-10-23 05:53:11	1	POINT (116.31940 39.98422)	False
3	39.984211	116.319389	2008-10-23 05:53:16	1	POINT (116.31939 39.98421)	False
4	39.984217	116.319422	2008-10-23 05:53:21	1	POINT (116.31942 39.98422)	False
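In the result above, only the first row, which coincides with one of the collective-area points, is flagged. Assuming the `violating` flag marks trajectory points whose geometry intersects a collective area, and that two point geometries intersect only when their coordinates are equal, the idea can be sketched in plain Python as follows. The helper name `flag_violating` is hypothetical and this is not PyMove's implementation.

```python
def flag_violating(points, areas):
    """Flag each (lon, lat) point that coincides with a collective-area point."""
    area_set = set(areas)
    return [point in area_set for point in points]


# (lon, lat) pairs taken from the tables above
points = [
    (116.319236, 39.984094),
    (116.319322, 39.984198),
    (116.319402, 39.984224),
]
areas = [
    (116.319236, 39.984094),
    (116.317701, 40.006436),
]

flags = flag_violating(points, areas)
# flags == [True, False, False]
```

With polygon areas instead of points, the membership test would become a point-in-polygon check.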