05 - Exploring Utils

Working with trajectories often requires conversions involving date and time, distances and so on, as well as a few other utilities.

The utils package contains the following modules:
- constants
- conversions
- datetime
- distances
- math
- trajectories
- log
- mem

Imports

import pymove.utils as utils
import pymove as pm
import datetime

Conversions

To convert a degree of latitude to meters, you can use the function lat_meters. For example, at Fortaleza’s latitude, -3.8162973555:

utils.conversions.lat_meters(-3.8162973555)
110826.6722516857
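
As a rough illustration of what such a conversion involves, the sketch below averages two widely quoted series approximations for the length of one degree of latitude and of one degree of longitude at a given latitude. The function name and the averaging step are assumptions made for this example. They reproduce the value printed above, but they are not taken from pymove's documentation.

```python
import math


def meters_per_degree_sketch(lat_deg):
    """Hypothetical helper: approximate meters per degree at a latitude."""
    rlat = math.radians(lat_deg)
    # Series approximation for the length of one degree of latitude, in meters
    per_deg_lat = (
        111132.92 - 559.82 * math.cos(2 * rlat) + 1.175 * math.cos(4 * rlat)
    )
    # Series approximation for the length of one degree of longitude, in meters
    per_deg_lon = 111412.84 * math.cos(rlat) - 93.5 * math.cos(3 * rlat)
    # Assumption: average the two lengths to get a single figure
    return (per_deg_lat + per_deg_lon) / 2


fortaleza = meters_per_degree_sketch(-3.8162973555)
print(round(fortaleza, 2))
```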

To concatenate list elements, joining them with the separator given by the parameter “delimiter”, you can use list_to_str

utils.conversions.list_to_str(["a", "b", "c", "d"], "-")
'a-b-c-d'

To concatenate the elements of a list, joining them with “,”, you can use list_to_csv_str

utils.conversions.list_to_csv_str(["a", "b", "c", "d"])
'a,b,c,d'

To concatenate list elements into an SVM-style line, where the first element is followed by index:value pairs, you can use list_to_svm_line

utils.conversions.list_to_svm_line(["a", "b", "c", "d"])
'a 1:b 2:c 3:d'
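
The three string helpers above can be mimicked in plain Python, which makes the output formats easier to read. This is only a sketch: the function names ending in `_sketch` are hypothetical, and the SVM layout is inferred from the output shown above.

```python
def list_to_str_sketch(items, delimiter=","):
    """Join the items of a list with the given delimiter."""
    return delimiter.join(str(item) for item in items)


def list_to_svm_line_sketch(items):
    """First item as-is, then the remaining items as 'index:value' pairs."""
    head, *rest = items
    pairs = [f"{index}:{value}" for index, value in enumerate(rest, start=1)]
    return " ".join([str(head)] + pairs)


letters = ["a", "b", "c", "d"]
dashed = list_to_str_sketch(letters, "-")
csv_line = list_to_str_sketch(letters)
svm_line = list_to_svm_line_sketch(letters)
print(dashed, csv_line, svm_line, sep="\n")
```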

To convert a longitude to an X coordinate in EPSG:3857 (WGS 84 / Pseudo-Mercator), you can use lon_to_x_spherical

utils.conversions.lon_to_x_spherical(-38.501597)
-4285978.172767829

To convert a latitude to a Y coordinate in EPSG:3857 (WGS 84 / Pseudo-Mercator), you can use lat_to_y_spherical

utils.conversions.lat_to_y_spherical(-3.797864)
-423086.2213610324

To convert an X coordinate in EPSG:3857 (WGS 84 / Pseudo-Mercator) back to a longitude, you can use x_to_lon_spherical

utils.conversions.x_to_lon_spherical(-4285978.172767829)
-38.501597000000004

To convert a Y coordinate in EPSG:3857 (WGS 84 / Pseudo-Mercator) back to a latitude, you can use y_to_lat_spherical

utils.conversions.y_to_lat_spherical(-423086.2213610324)
-3.7978639999999944
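
For reference, the four projections above correspond to the standard spherical Mercator formulas. The sketch below writes them out with the standard library only. The function names are made up for this example, and the radius is the WGS 84 semi-major axis that EPSG:3857 uses.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, in meters


def lon_to_x(lon_deg):
    """Longitude in degrees to Pseudo-Mercator X in meters."""
    return EARTH_RADIUS_M * math.radians(lon_deg)


def lat_to_y(lat_deg):
    """Latitude in degrees to Pseudo-Mercator Y in meters."""
    return EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )


def x_to_lon(x):
    """Pseudo-Mercator X in meters back to longitude in degrees."""
    return math.degrees(x / EARTH_RADIUS_M)


def y_to_lat(y):
    """Pseudo-Mercator Y in meters back to latitude in degrees."""
    return math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)


x = lon_to_x(-38.501597)
y = lat_to_y(-3.797864)
print(round(x, 2), round(y, 2))
print(round(x_to_lon(x), 6), round(y_to_lat(y), 6))
```

Because the projection is not exactly invertible in floating point, the round trip returns values that differ from the inputs in the last few digits, as the outputs above also show.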
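
The column conversions that follow operate on a MoveDataFrame, so first load the sample data and generate the distance, time and speed features.

move_data = pm.read_csv("geolife_sample.csv")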
move_data.generate_dist_time_speed_features()
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 1.0 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 5.0 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 5.0 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 5.0 0.577934

To convert the values of the column given by label_speed (speed_to_prev in this example) from m/s to km/h, you can use ms_to_kmh

utils.conversions.ms_to_kmh(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 1.0 49.284551
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 5.0 5.330727
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 5.0 1.311180
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 5.0 2.080563

To convert the values of the column given by label_speed from km/h back to m/s, you can use kmh_to_ms

utils.conversions.kmh_to_ms(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 1.0 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 5.0 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 5.0 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 5.0 0.577934

To convert the values of the column given by label_distance (dist_to_prev in this example) from meters to kilometers, you can use meters_to_kilometers

utils.conversions.meters_to_kilometers(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 0.013690 1.0 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 0.007404 5.0 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 0.001821 5.0 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 0.002890 5.0 0.577934

To convert the values of the column given by label_distance from kilometers back to meters, you can use kilometers_to_meters

utils.conversions.kilometers_to_meters(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 1.0 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 5.0 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 5.0 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 5.0 0.577934
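
The speed and distance conversions above are plain scalings, which the following sketch spells out on a few values copied from the table. The constant and function names are illustrative only and are not part of pymove.

```python
KMH_PER_MS = 3.6         # 1 m/s = 3.6 km/h
METERS_PER_KM = 1000.0   # 1 km = 1000 m


def ms_to_kmh_value(speed_ms):
    """Scale a speed from meters per second to kilometers per hour."""
    return speed_ms * KMH_PER_MS


def meters_to_km_value(distance_m):
    """Scale a distance from meters to kilometers."""
    return distance_m / METERS_PER_KM


speeds_ms = [13.690153, 1.480758, 0.364217]
speeds_kmh = [ms_to_kmh_value(v) for v in speeds_ms]
distances_km = [meters_to_km_value(d) for d in [13.690153, 7.403788]]
print([round(v, 6) for v in speeds_kmh])
print([round(d, 6) for d in distances_km])
```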

To convert the values of the time column (time_to_prev in this example) from seconds to minutes, you can use seconds_to_minutes

utils.conversions.seconds_to_minutes(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 0.016667 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 0.083333 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 0.083333 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 0.083333 0.577934

To convert the values of the time column from minutes to seconds, you can use minute_to_seconds

utils.conversions.minute_to_seconds(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 1.0 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 5.0 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 5.0 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 5.0 0.577934

To convert the values of the time column from minutes to hours, you can use minute_to_hours. The column currently holds seconds, so it is converted to minutes first.

utils.conversions.seconds_to_minutes(move_data, inplace=True)
utils.conversions.minute_to_hours(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 0.000278 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 0.001389 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 0.001389 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 0.001389 0.577934

To convert the values of the time column from hours to minutes, you can use hours_to_minute

utils.conversions.hours_to_minute(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 0.016667 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 0.083333 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 0.083333 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 0.083333 0.577934

To convert the values of the time column from seconds to hours, you can use seconds_to_hours. The column currently holds minutes, so it is converted to seconds first.

utils.conversions.minute_to_seconds(move_data, inplace=True)
utils.conversions.seconds_to_hours(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 0.000278 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 0.001389 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 0.001389 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 0.001389 0.577934

To convert the values of the time column from hours to seconds, you can use hours_to_seconds

utils.conversions.hours_to_seconds(move_data, inplace=True)
move_data.head()
id lat lon datetime dist_to_prev time_to_prev speed_to_prev
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN NaN NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 1.0 13.690153
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 5.0 1.480758
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 5.0 0.364217
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 5.0 0.577934
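
All the time conversions above reduce to multiplying or dividing by 60 or 3600. A compact way to see this is a single lookup table of seconds per unit, as in the sketch below. The table and the function `convert_time_sketch` are hypothetical and only meant to mirror the numbers in the outputs.

```python
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}


def convert_time_sketch(value, source, target):
    """Convert a duration between seconds, minutes and hours."""
    return value * SECONDS_PER_UNIT[source] / SECONDS_PER_UNIT[target]


five_seconds_in_minutes = convert_time_sketch(5.0, "seconds", "minutes")
one_second_in_hours = convert_time_sketch(1.0, "seconds", "hours")
back_to_seconds = convert_time_sketch(one_second_in_hours, "hours", "seconds")
print(round(five_seconds_in_minutes, 6), round(one_second_in_hours, 6))
```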

Datetime

To convert a string in the format “%Y-%m-%d” or “%Y-%m-%d %H:%M:%S” to a datetime, you can use str_to_datetime.

utils.datetime.str_to_datetime('2018-06-29 08:15:27')
datetime.datetime(2018, 6, 29, 8, 15, 27)

To get the date, as a string, from a datetime, you can use date_to_str.

utils.datetime.date_to_str(utils.datetime.str_to_datetime('2018-06-29 08:15:27'))
'2018-06-29'
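
The same parsing and formatting can be done with the standard library alone, which is useful for checking results. The sketch below tries the two formats mentioned above in turn. The helper name is an assumption made for this example.

```python
from datetime import datetime

ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_datetime_sketch(text):
    """Try each accepted format and return the first successful parse."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported datetime format: {text!r}")


parsed = parse_datetime_sketch("2018-06-29 08:15:27")
date_only = parse_datetime_sketch("2018-06-29")
date_text = parsed.strftime("%Y-%m-%d")
print(parsed, date_only, date_text, sep="\n")
```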

To convert a datetime to an integer number of minutes, you can use datetime_to_min.

utils.datetime.datetime_to_min(datetime.datetime(2018, 6, 29, 8, 15, 27))
25504335

To do the reverse, use min_to_datetime. The seconds are lost in the round trip.

utils.datetime.min_to_datetime(25504335)
datetime.datetime(2018, 6, 29, 8, 15)
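
The integer above is the number of whole minutes since 1970-01-01 00:00:00, with the naive datetime read as if it were UTC. A minimal standard-library sketch of both directions follows. The function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def minutes_since_epoch(moment):
    """Whole minutes between the epoch and a naive datetime."""
    return (moment - EPOCH) // timedelta(minutes=1)


def minutes_to_datetime(minutes):
    """Inverse of minutes_since_epoch, at minute resolution."""
    return EPOCH + timedelta(minutes=minutes)


minutes = minutes_since_epoch(datetime(2018, 6, 29, 8, 15, 27))
restored = minutes_to_datetime(minutes)
print(minutes, restored)
```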

To get the day of the week of a date as an integer, you can use to_day_of_week_int, where 0 represents Monday and 6 represents Sunday.

utils.datetime.to_day_of_week_int(datetime.datetime(2018, 6, 29, 8, 15, 27))
4

To check whether a given day is a working day, you can use working_day.

utils.datetime.working_day(datetime.datetime(2018, 6, 29, 8, 15, 27), country='BR')
True
utils.datetime.working_day(datetime.datetime(2018, 4, 21, 8, 15, 27), country='BR')
False

To get the current date and time as a string, you can use now_str.

utils.datetime.now_str()
'2021-07-13 19:56:01'

To format a duration given in seconds as a readable string, you can use deltatime_str.

utils.datetime.deltatime_str(1082.7180936336517)
'18m:02.72s'
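
The formatting shown above can be approximated by splitting the seconds with divmod. In the sketch below, only the minutes-and-seconds layout is taken from the output above. The layouts used for durations with hours or under a minute are assumptions, as is the function name.

```python
def format_duration_sketch(total_seconds):
    """Render a duration in seconds as e.g. '18m:02.72s'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        # Assumed layout for durations of one hour or more
        return f"{int(hours):02d}h:{int(minutes):02d}m:{seconds:05.2f}s"
    if minutes:
        # Layout matching the example output above
        return f"{int(minutes):02d}m:{seconds:05.2f}s"
    # Assumed layout for durations under one minute
    return f"{seconds:05.2f}s"


label = format_duration_sketch(1082.7180936336517)
print(label)
```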

To convert a local datetime to a POSIX timestamp in milliseconds, you can use timestamp_to_millis.

utils.datetime.timestamp_to_millis("2015-12-12 08:00:00.123000")
1449907200123

To convert milliseconds back to a timestamp, you can use millis_to_timestamp.

utils.datetime.millis_to_timestamp(1449907200123)
Timestamp('2015-12-12 08:00:00.123000')
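
The round trip above can be reproduced with integer arithmetic on timedelta objects, which avoids floating-point rounding of the milliseconds. This sketch reads the naive value as UTC, which is what the printed number corresponds to. The helper names are assumptions made for this example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_MILLISECOND = timedelta(milliseconds=1)


def to_millis_sketch(text):
    """Parse 'YYYY-MM-DD HH:MM:SS.ffffff' and count milliseconds since the epoch."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return (moment - EPOCH) // ONE_MILLISECOND


def from_millis_sketch(millis):
    """Turn milliseconds since the epoch back into a naive datetime."""
    return EPOCH + millis * ONE_MILLISECOND


millis = to_millis_sketch("2015-12-12 08:00:00.123000")
moment = from_millis_sketch(millis)
print(millis, moment)
```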

To get the time, as a string, from a datetime, you can use time_to_str.

utils.datetime.time_to_str(datetime.datetime(2018, 6, 29, 8, 15, 27))
'08:15:27'

To convert a time string in the format “%H:%M:%S” to a datetime, you can use str_to_time.

utils.datetime.str_to_time("08:00:00")
datetime.datetime(1900, 1, 1, 8, 0)

To compute the time elapsed between a given start time and the moment the function is called, you can use elapsed_time_dt.

utils.datetime.elapsed_time_dt(utils.datetime.str_to_time("08:00:00"))
3835166163375

To compute the time elapsed between a start time and an end time specified by the user, you can use diff_time. The result is in milliseconds, as the 4-hour example shows.

utils.datetime.diff_time(utils.datetime.str_to_time("08:00:00"), utils.datetime.str_to_time("12:00:00"))
14400000

Distances

To calculate the great circle distance between two points on the earth, you can use haversine.

utils.distances.haversine(-3.797864,-38.501597,-3.797890, -38.501681)
9.757976024363016

To calculate the euclidean distance between two points on the earth, you can use euclidean_distance_in_meters.

utils.distances.euclidean_distance_in_meters(-3.797864,-38.501597,-3.797890, -38.501681)
9.790407710249447
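
To make the difference between the two distances concrete, the sketch below computes both with the standard library. The haversine part uses a mean Earth radius of 6371 km. The Euclidean part projects both points to EPSG:3857 and measures the straight line between them. That this projection is what produces the second value is an inference from the printed result, not something stated by pymove, and all names are illustrative.

```python
import math

MEAN_EARTH_RADIUS_M = 6371000.0   # mean radius for the haversine formula
MERCATOR_RADIUS_M = 6378137.0     # WGS 84 semi-major axis for EPSG:3857


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))


def to_mercator(lat, lon):
    """Project a (lat, lon) point to Pseudo-Mercator (x, y) in meters."""
    x = MERCATOR_RADIUS_M * math.radians(lon)
    y = MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def euclidean_mercator_m(lat1, lon1, lat2, lon2):
    """Straight-line distance between two points after projection."""
    x1, y1 = to_mercator(lat1, lon1)
    x2, y2 = to_mercator(lat2, lon2)
    return math.hypot(x2 - x1, y2 - y1)


great_circle = haversine_m(-3.797864, -38.501597, -3.797890, -38.501681)
projected = euclidean_mercator_m(-3.797864, -38.501597, -3.797890, -38.501681)
print(round(great_circle, 3), round(projected, 3))
```

The projected distance is slightly larger because the Mercator projection stretches lengths away from the equator.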

Math

To compute standard deviation, you can use std.

utils.math.std([600, 20, 5])
277.0178494048513

To compute the average and the standard deviation, you can use avg_std.

utils.math.avg_std([600, 20, 5])
(208.33333333333334, 277.0178494048513)

To compute the sample standard deviation, you can use std_sample.

utils.math.std_sample([600, 20, 5])
339.27619034251916

To compute the average and the sample standard deviation, you can use avg_std_sample.

utils.math.avg_std_sample([600, 20, 5])
(208.33333333333334, 339.27619034251916)

To compute the sum of the elements of an array, you can use array_sum.

To compute the sum of all the elements of an array, the sum of the squares of the elements and the number of elements, you can use array_stats.

utils.math.array_stats([600, 20, 5])
(625.0, 360425.0, 3)
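
The three numbers returned above are enough to recover the mean and both standard deviations shown earlier in this section. The sketch below does this by hand and compares the result with the standard library's statistics module. The function name `sums_sketch` is hypothetical.

```python
import math
import statistics


def sums_sketch(values):
    """Return (sum, sum of squares, count) for a sequence of numbers."""
    total = float(sum(values))
    total_sq = float(sum(v * v for v in values))
    return total, total_sq, len(values)


values = [600, 20, 5]
total, total_sq, count = sums_sketch(values)

mean = total / count
# Population variance: E[x^2] - (E[x])^2
population_std = math.sqrt(total_sq / count - mean ** 2)
# Sample variance rescales the population variance by n / (n - 1)
sample_std = population_std * math.sqrt(count / (count - 1))

print(mean, population_std, sample_std)
print(statistics.pstdev(values), statistics.stdev(values))
```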

To perform interpolation and extrapolation, you can use interpolation.

utils.math.interpolation(15, 20, 65, 86, 5)
6.799999999999999
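
The result above is consistent with a straight line through the points (15, 20) and (65, 86) evaluated at x = 5, which lies outside the interval and is therefore an extrapolation. The sketch below assumes the argument order (x0, y0, x1, y1, x). That order is inferred from the output, not taken from the library's documentation.

```python
def linear_interpolation_sketch(x0, y0, x1, y1, x):
    """Evaluate at x the line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


extrapolated = linear_interpolation_sketch(15, 20, 65, 86, 5)
midpoint = linear_interpolation_sketch(15, 20, 65, 86, 40)
print(round(extrapolated, 6), round(midpoint, 6))
```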

Trajectories

To read a csv file into a MoveDataFrame

move_data = utils.trajectories.read_csv('geolife_sample.csv')
type(move_data)
pymove.core.pandas.PandasMoveDataFrame

To swap the keys and values of a dictionary

utils.trajectories.invert_dict({1: 'a', 2: 'b'})
{'a': 1, 'b': 2}

To flatten a nested dictionary

utils.trajectories.flatten_dict({'1': 'a', '2': {'3': 'b', '4': 'c'}})
{'1': 'a', '2_3': 'b', '2_4': 'c'}
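
Both dictionary helpers are short enough to sketch in plain Python. The versions below are illustrations with hypothetical names. The "_" separator is taken from the output above, and the inversion assumes that the values are hashable and unique.

```python
def invert_dict_sketch(mapping):
    """Swap keys and values; assumes values are hashable and unique."""
    return {value: key for key, value in mapping.items()}


def flatten_dict_sketch(mapping, parent_key="", sep="_"):
    """Flatten nested dictionaries, joining nested keys with sep."""
    flat = {}
    for key, value in mapping.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict_sketch(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


inverted = invert_dict_sketch({1: "a", 2: "b"})
flattened = flatten_dict_sketch({"1": "a", "2": {"3": "b", "4": "c"}})
print(inverted, flattened, sep="\n")
```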

To flatten a dataframe with dict as row values

df = move_data.head(3)
df['dict_column'] = [{'a': 1}, {'b': 2}, {'c': 3}]
df
lat lon datetime id dict_column
0 39.984094 116.319236 2008-10-23 05:53:05 1 {'a': 1}
1 39.984198 116.319322 2008-10-23 05:53:06 1 {'b': 2}
2 39.984224 116.319402 2008-10-23 05:53:11 1 {'c': 3}
utils.trajectories.flatten_columns(df, columns='dict_column')
lat lon datetime id dict_column_c dict_column_b dict_column_a
0 39.984094 116.319236 2008-10-23 05:53:05 1 NaN NaN 1.0
1 39.984198 116.319322 2008-10-23 05:53:06 1 NaN 2.0 NaN
2 39.984224 116.319402 2008-10-23 05:53:11 1 3.0 NaN NaN

To shift a sequence

utils.trajectories.shift([1., 2., 3., 4.], 1)
array([nan,  1.,  2.,  3.])

To fill a sequence with values from another

l1 = ['a', 'b', 'c', 'd', 'e']
utils.trajectories.fill_list_with_new_values(l1, [1, 2, 3])
l1
[1, 2, 3, 'd', 'e']
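
The two sequence helpers above behave like the following list-based sketch. It uses plain lists instead of NumPy arrays, the names are hypothetical, and the fill helper assumes the new values are no longer than the target list.

```python
import math


def shift_sketch(sequence, num, fill_value=math.nan):
    """Shift items right (num > 0) or left (num < 0), padding with fill_value."""
    items = list(sequence)
    if num == 0:
        return items
    pad = [fill_value] * min(abs(num), len(items))
    if num > 0:
        return pad + items[:-num]
    return items[-num:] + pad


def fill_list_sketch(target, new_values):
    """Overwrite the first len(new_values) items of target in place."""
    target[: len(new_values)] = new_values


shifted = shift_sketch([1.0, 2.0, 3.0, 4.0], 1)
letters = ["a", "b", "c", "d", "e"]
fill_list_sketch(letters, [1, 2, 3])
print(shifted, letters, sep="\n")
```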

To transform a string representation back into a list

utils.trajectories.object_for_array('[1,2,3,4,5]')
array([1., 2., 3., 4., 5.], dtype=float32)

To convert a column with string representation back into a list

df['list_column'] = ['[1,2]', '[3,4]', '[5,6]']
df
lat lon datetime id dict_column list_column
0 39.984094 116.319236 2008-10-23 05:53:05 1 {'a': 1} [1,2]
1 39.984198 116.319322 2008-10-23 05:53:06 1 {'b': 2} [3,4]
2 39.984224 116.319402 2008-10-23 05:53:11 1 {'c': 3} [5,6]
utils.trajectories.column_to_array(df, column='list_column')
lat lon datetime id dict_column list_column
0 39.984094 116.319236 2008-10-23 05:53:05 1 {'a': 1} [1.0, 2.0]
1 39.984198 116.319322 2008-10-23 05:53:06 1 {'b': 2} [3.0, 4.0]
2 39.984224 116.319402 2008-10-23 05:53:11 1 {'c': 3} [5.0, 6.0]

Log

mdf = pm.read_csv('geolife_sample.csv')

To control the verbosity of pymove functions, use the logger

To change the verbosity, use the utils.log.set_verbosity method, or create an environment variable named PYMOVE_VERBOSITY

By default, the verbosity level is set to INFO

utils.log.logger
<Logger pymove (INFO)>

INFO shows only useful information, like progress bars

mdf.generate_dist_features(inplace=False).head()
VBox(children=(HTML(value=''), IntProgress(value=0, max=2)))
id lat lon datetime dist_to_prev dist_to_next dist_prev_to_next
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN 13.690153 NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 7.403788 20.223428
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 1.821083 5.888579
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 2.889671 1.873356
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 66.555997 68.727260

DEBUG shows information from various steps in the functions

utils.log.set_verbosity('DEBUG')
mdf.generate_dist_features(inplace=False).head()
...Sorting by id and datetime to increase performance

...Set id as index to a higher performance


Creating or updating distance features in meters...
VBox(children=(HTML(value=''), IntProgress(value=0, max=2)))
id lat lon datetime dist_to_prev dist_to_next dist_prev_to_next
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN 13.690153 NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 7.403788 20.223428
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 1.821083 5.888579
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 2.889671 1.873356
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 66.555997 68.727260

WARN hides all output except warnings and errors

utils.log.set_verbosity('WARN')
mdf.generate_dist_features(inplace=False).head()
id lat lon datetime dist_to_prev dist_to_next dist_prev_to_next
0 1 39.984094 116.319236 2008-10-23 05:53:05 NaN 13.690153 NaN
1 1 39.984198 116.319322 2008-10-23 05:53:06 13.690153 7.403788 20.223428
2 1 39.984224 116.319402 2008-10-23 05:53:11 7.403788 1.821083 5.888579
3 1 39.984211 116.319389 2008-10-23 05:53:16 1.821083 2.889671 1.873356
4 1 39.984217 116.319422 2008-10-23 05:53:21 2.889671 66.555997 68.727260
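
The three levels above follow the usual behaviour of Python's logging module: a logger drops every message below its current level. The sketch below demonstrates this with a throwaway logger and a handler that collects messages in a list. The logger name, the handler class and the messages are invented for the example and are unrelated to pymove's own logger.

```python
import logging


class ListHandler(logging.Handler):
    """Collect the level names of emitted records for inspection."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def emit(self, record):
        self.levels.append(record.levelname)


demo_logger = logging.getLogger("verbosity_demo")
demo_logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
demo_logger.addHandler(handler)


def run_step():
    """Emit one message per level, as a library function might."""
    demo_logger.debug("...sorting by id and datetime")
    demo_logger.info("creating distance features")
    demo_logger.warning("something looks off")


seen = {}
for level in ("DEBUG", "INFO", "WARNING"):
    demo_logger.setLevel(level)
    handler.levels.clear()
    run_step()
    seen[level] = list(handler.levels)

print(seen)
```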

Mem

utils.log.set_verbosity('INFO')

Calculate size of variable

utils.mem.total_size(mdf, verbose=True)
Size in bytes: 6965040, Type: <class 'pymove.core.pandas.PandasMoveDataFrame'>
6965040
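
A plain sys.getsizeof call only measures the outer object, not what it contains, which is why a recursive helper is useful. The sketch below shows one common way to sum the sizes of nested built-in containers. It is an illustration under that assumption, with a hypothetical name, and not a description of how pymove measures a MoveDataFrame.

```python
import sys


def total_size_sketch(obj, seen=None):
    """Recursively sum sys.getsizeof over nested built-in containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0  # count each object once, even if referenced twice
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size_sketch(key, seen) + total_size_sketch(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size_sketch(item, seen)
    return size


nested = {"points": [1.5, 2.5, 3.5], "label": "trajectory"}
shallow = sys.getsizeof(nested)
deep = total_size_sketch(nested)
print(shallow, deep)
```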

Reduce size of dataframe

utils.mem.reduce_mem_usage_automatic(mdf)
Memory usage of dataframe is 6.64 MB
Memory usage after optimization is: 2.70 MB
Decreased by 59.4 %
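
Reductions like the one above usually come from storing each column in the narrowest numeric type that can hold its values. The sketch below illustrates that idea for signed integers only. It is a general illustration with a hypothetical helper name, and whether pymove follows exactly this rule is an assumption.

```python
def smallest_signed_int_bits(values):
    """Return the narrowest signed integer width (in bits) that fits all values."""
    lowest, highest = min(values), max(values)
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lowest and highest <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("values do not fit in a 64-bit signed integer")


ids = [1, 1, 5, 5]
counts = [0, 200, 40000]
print(smallest_signed_int_bits(ids), smallest_signed_int_bits(counts))
```

Storing `ids` in 8 bits instead of the default 64 would cut that column's memory to one eighth.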

Create a dataframe with the variables with the largest memory footprint

lst = [*range(10000)]
utils.mem.top_mem_vars(globals())
var mem
0 move_data 6.6 MiB
1 mdf 2.7 MiB
2 lst 88.0 KiB
3 Out 2.2 KiB
4 df 1.1 KiB
5 In 648.0 B
6 l1 96.0 B
7 matplotlib 72.0 B
8 sys 72.0 B
9 os 72.0 B