05 - Exploring Utils¶
Working with trajectories may require conversions involving date and time, distance, and so on, along with other utilities.
The utils package contains the following modules:
- constants
- conversions
- datetime
- distances
- math
- trajectories
- log
- mem
Imports¶
import pymove.utils as utils
import pymove as pm
import datetime
Conversions¶
To transform a latitude degree to meters, you can use the lat_meters function. For example, you can convert Fortaleza’s latitude, -3.8162973555:
utils.conversions.lat_meters(-3.8162973555)
110826.6722516857
To concatenate list elements, joining them with the separator specified by the “delimiter” parameter, you can use list_to_str.
utils.conversions.list_to_str(["a", "b", "c", "d"], "-")
'a-b-c-d'
To concatenate the elements of a list, joining them with “,”, you can use list_to_csv_str.
utils.conversions.list_to_csv_str(["a", "b", "c", "d"])
'a,b,c,d'
To concatenate list elements as consecutive index:element pairs (the first element is kept as is), you can use list_to_svm_line.
utils.conversions.list_to_svm_line(["a", "b", "c", "d"])
'a 1:b 2:c 3:d'
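The three list helpers above differ only in how they join the elements. A minimal pure-Python sketch of the behavior implied by the outputs shown; the function names `join_with` and `join_svm_line` are hypothetical stand-ins, not pymove's implementation:

```python
def join_with(items, delimiter):
    """Join items with an arbitrary delimiter (the list_to_str / list_to_csv_str pattern)."""
    return delimiter.join(str(item) for item in items)


def join_svm_line(items):
    """Keep the first item, then emit index:value pairs for the rest."""
    first, rest = items[0], items[1:]
    pairs = [f"{index}:{value}" for index, value in enumerate(rest, start=1)]
    return " ".join([str(first)] + pairs)


letters = ["a", "b", "c", "d"]
print(join_with(letters, "-"))  # a-b-c-d
print(join_with(letters, ","))  # a,b,c,d
print(join_svm_line(letters))   # a 1:b 2:c 3:d
```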
To convert longitude to X EPSG:3857 WGS 84/Pseudo-Mercator, you can use lon_to_x_spherical
utils.conversions.lon_to_x_spherical(-38.501597)
-4285978.172767829
To convert latitude to Y EPSG:3857 WGS 84/Pseudo-Mercator, you can use lat_to_y_spherical
utils.conversions.lat_to_y_spherical(-3.797864)
-423086.2213610324
To convert X EPSG:3857 WGS 84/Pseudo-Mercator to longitude, you can use x_to_lon_spherical
utils.conversions.x_to_lon_spherical(-4285978.172767829)
-38.501597000000004
To convert Y EPSG:3857 WGS 84/Pseudo-Mercator to latitude, you can use y_to_lat_spherical
utils.conversions.y_to_lat_spherical(-423086.2213610324)
-3.7978639999999944
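For reference, the four conversions above correspond to the standard spherical (Pseudo-Mercator) formulas. The sketch below uses only the standard library; the function names and the radius constant are assumptions made for illustration, and pymove's own code may be written differently:

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def lon_to_x(lon_deg):
    """Longitude in degrees -> X in meters."""
    return EARTH_RADIUS_M * math.radians(lon_deg)


def lat_to_y(lat_deg):
    """Latitude in degrees -> Y in meters."""
    return EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))


def x_to_lon(x):
    """X in meters -> longitude in degrees."""
    return math.degrees(x / EARTH_RADIUS_M)


def y_to_lat(y):
    """Y in meters -> latitude in degrees."""
    return math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)


x = lon_to_x(-38.501597)
y = lat_to_y(-3.797864)
print(x, y)                      # close to the X and Y values shown above
print(x_to_lon(x), y_to_lat(y))  # round trip back to the input coordinates
```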
move_data = pm.read_csv("geolife_sample.csv")
move_data.generate_dist_time_speed_features()
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 1.0 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 5.0 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 5.0 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 5.0 | 0.577934 |
To convert the values in the label_speed column from m/s to km/h, you can use ms_to_kmh.
utils.conversions.ms_to_kmh(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 1.0 | 49.284551 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 5.0 | 5.330727 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 5.0 | 1.311180 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 5.0 | 2.080563 |
To convert the values in the label_speed column from km/h to m/s, you can use kmh_to_ms.
utils.conversions.kmh_to_ms(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 1.0 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 5.0 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 5.0 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 5.0 | 0.577934 |
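The two speed conversions are inverse scalings by a constant factor. A small worked example on the speed_to_prev values from the tables above, written in plain Python rather than with the pymove functions:

```python
MS_TO_KMH = 3.6  # 1 m/s = 3600 m per hour = 3.6 km/h

speeds_ms = [13.690153, 1.480758, 0.364217, 0.577934]

# m/s -> km/h multiplies by 3.6; km/h -> m/s divides by it.
speeds_kmh = [speed * MS_TO_KMH for speed in speeds_ms]
back_to_ms = [speed / MS_TO_KMH for speed in speeds_kmh]

for ms, kmh in zip(speeds_ms, speeds_kmh):
    print(f"{ms:.6f} m/s = {kmh:.4f} km/h")
```

Small differences in the last digits compared with the table are expected, because the table is computed from unrounded values.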
To convert the values in the label_distance column from meters to kilometers, you can use meters_to_kilometers.
utils.conversions.meters_to_kilometers(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 0.013690 | 1.0 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 0.007404 | 5.0 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 0.001821 | 5.0 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 0.002890 | 5.0 | 0.577934 |
To convert the values in the label_distance column from kilometers to meters, you can use kilometers_to_meters.
utils.conversions.kilometers_to_meters(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 1.0 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 5.0 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 5.0 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 5.0 | 0.577934 |
To convert the values in the time column (time_to_prev here) from seconds to minutes, you can use seconds_to_minutes.
utils.conversions.seconds_to_minutes(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 0.016667 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 0.083333 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 0.083333 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 0.083333 | 0.577934 |
To convert the values in the time column (time_to_prev here) from minutes to seconds, you can use minute_to_seconds.
utils.conversions.minute_to_seconds(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 1.0 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 5.0 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 5.0 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 5.0 | 0.577934 |
To convert the values in the time column (time_to_prev here) from minutes to hours, you can use minute_to_hours. Because the column currently holds seconds, it is converted to minutes first.
utils.conversions.seconds_to_minutes(move_data, inplace=True)
utils.conversions.minute_to_hours(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 0.000278 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 0.001389 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 0.001389 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 0.001389 | 0.577934 |
To convert the values in the time column (time_to_prev here) from hours to minutes, you can use hours_to_minute.
utils.conversions.hours_to_minute(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 0.016667 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 0.083333 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 0.083333 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 0.083333 | 0.577934 |
To convert the values in the time column (time_to_prev here) from seconds to hours, you can use seconds_to_hours. Because the column currently holds minutes, it is converted to seconds first.
utils.conversions.minute_to_seconds(move_data, inplace=True)
utils.conversions.seconds_to_hours(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 0.000278 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 0.001389 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 0.001389 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 0.001389 | 0.577934 |
To convert the values in the time column (time_to_prev here) from hours to seconds, you can use hours_to_seconds.
utils.conversions.hours_to_seconds(move_data, inplace=True)
move_data.head()
| | id | lat | lon | datetime | dist_to_prev | time_to_prev | speed_to_prev |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | NaN | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 1.0 | 13.690153 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 5.0 | 1.480758 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 5.0 | 0.364217 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 5.0 | 0.577934 |
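The time conversions chain in the same way as the speed ones: each step scales the column by 60. A plain-Python check of the time_to_prev values seen in the tables above (1 s and 5 s), independent of pymove:

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60

time_seconds = [1.0, 5.0]

# seconds -> minutes -> hours, one factor of 60 per step
time_minutes = [t / SECONDS_PER_MINUTE for t in time_seconds]
time_hours = [t / MINUTES_PER_HOUR for t in time_minutes]

# hours -> seconds in a single step recovers the original values
restored = [t * MINUTES_PER_HOUR * SECONDS_PER_MINUTE for t in time_hours]

print([round(t, 6) for t in time_minutes])  # [0.016667, 0.083333]
print([round(t, 6) for t in time_hours])    # [0.000278, 0.001389]
```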
Datetime¶
To convert a string in the format “%Y-%m-%d” or “%Y-%m-%d %H:%M:%S” to a datetime, you can use str_to_datetime.
utils.datetime.str_to_datetime('2018-06-29 08:15:27')
datetime.datetime(2018, 6, 29, 8, 15, 27)
To get the date, as a string, from a timestamp, you can use date_to_str.
utils.datetime.date_to_str(utils.datetime.str_to_datetime('2018-06-29 08:15:27'))
'2018-06-29'
To convert a datetime to an integer representation in minutes, you can use datetime_to_min.
utils.datetime.datetime_to_min(datetime.datetime(2018, 6, 29, 8, 15, 27))
25504335
To do the reverse, use min_to_datetime. The seconds are lost in the round trip.
utils.datetime.min_to_datetime(25504335)
datetime.datetime(2018, 6, 29, 8, 15)
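The value 25504335 is the number of whole minutes since 1970-01-01 00:00:00. A sketch of that arithmetic with the standard library; the helper names are hypothetical, and the naive datetime is assumed to carry no timezone offset:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_MINUTE = timedelta(minutes=1)


def to_minutes(dt):
    """Whole minutes elapsed since the epoch (seconds are truncated)."""
    return (dt - EPOCH) // ONE_MINUTE


def from_minutes(minutes):
    """Inverse mapping; the truncated seconds cannot be recovered."""
    return EPOCH + minutes * ONE_MINUTE


minutes = to_minutes(datetime(2018, 6, 29, 8, 15, 27))
print(minutes)                # 25504335
print(from_minutes(minutes))  # 2018-06-29 08:15:00
```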
To get the day of the week of a date, you can use to_day_of_week_int, where 0 represents Monday and 6 represents Sunday.
utils.datetime.to_day_of_week_int(datetime.datetime(2018, 6, 29, 8, 15, 27))
4
To check whether a day specified by the user is a working day, you can use working_day.
utils.datetime.working_day(datetime.datetime(2018, 6, 29, 8, 15, 27), country='BR')
True
utils.datetime.working_day(datetime.datetime(2018, 4, 21, 8, 15, 27), country='BR')
False
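Conceptually, a working-day check combines the weekday number with a holiday calendar. The sketch below illustrates that idea only: the one-entry holiday set is a hypothetical stand-in for a real national calendar, and it is not how pymove looks up holidays.

```python
from datetime import date, datetime

# Hypothetical, deliberately tiny holiday set for illustration
# (21 April is Tiradentes Day in Brazil).
HOLIDAYS_BR_2018 = {date(2018, 4, 21)}


def is_working_day(dt, holidays):
    """True when dt falls on Monday-Friday and is not a listed holiday."""
    return dt.weekday() < 5 and dt.date() not in holidays


friday = datetime(2018, 6, 29, 8, 15, 27)
holiday = datetime(2018, 4, 21, 8, 15, 27)

print(friday.weekday())                           # 4 (Friday)
print(is_working_day(friday, HOLIDAYS_BR_2018))   # True
print(is_working_day(holiday, HOLIDAYS_BR_2018))  # False
```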
To get the current datetime as a string, you can use now_str.
utils.datetime.now_str()
'2021-07-13 19:56:01'
To convert a duration in seconds into a readable string, you can use deltatime_str.
utils.datetime.deltatime_str(1082.7180936336517)
'18m:02.72s'
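The output format can be reproduced for short durations by splitting the seconds into minutes and a remainder. This is a minimal sketch with a hypothetical helper name; it assumes the duration is under one hour and ignores rounding edge cases such as a remainder of 59.999 seconds:

```python
def format_duration(seconds):
    """Format a duration in seconds as 'MMm:SS.ss s' without hour handling."""
    minutes, remainder = divmod(seconds, 60)
    return f"{int(minutes):02d}m:{remainder:05.2f}s"


print(format_duration(1082.7180936336517))  # 18m:02.72s
```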
To convert a local datetime to a POSIX timestamp in milliseconds, you can use timestamp_to_millis.
utils.datetime.timestamp_to_millis("2015-12-12 08:00:00.123000")
1449907200123
To convert milliseconds to a timestamp, you can use millis_to_timestamp.
utils.datetime.millis_to_timestamp(1449907200123)
Timestamp('2015-12-12 08:00:00.123000')
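The millisecond value shown above equals the time elapsed since 1970-01-01 00:00:00 when the string is read without any timezone offset. A standard-library sketch under that assumption, with hypothetical helper names and integer arithmetic to avoid floating-point rounding:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_MILLISECOND = timedelta(milliseconds=1)


def to_millis(text):
    """Parse an ISO-like string and count whole milliseconds since the epoch."""
    return (datetime.fromisoformat(text) - EPOCH) // ONE_MILLISECOND


def from_millis(millis):
    """Inverse of to_millis, returning a naive datetime."""
    return EPOCH + millis * ONE_MILLISECOND


millis = to_millis("2015-12-12 08:00:00.123000")
print(millis)               # 1449907200123
print(from_millis(millis))  # 2015-12-12 08:00:00.123000
```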
To get the time, as a string, from a timestamp, you can use time_to_str.
utils.datetime.time_to_str(datetime.datetime(2018, 6, 29, 8, 15, 27))
'08:15:27'
To convert a time string in the format “%H:%M:%S” to a datetime, you can use str_to_time.
utils.datetime.str_to_time("08:00:00")
datetime.datetime(1900, 1, 1, 8, 0)
To compute the elapsed time, in milliseconds, from a specific start time to the moment the function is called, you can use elapsed_time_dt.
utils.datetime.elapsed_time_dt(utils.datetime.str_to_time("08:00:00"))
3835166163375
To compute the elapsed time, in milliseconds, from the start time to the end time specified by the user, you can use diff_time.
utils.datetime.diff_time(utils.datetime.str_to_time("08:00:00"), utils.datetime.str_to_time("12:00:00"))
14400000
Distances¶
To calculate the great-circle distance between two points on the Earth, you can use haversine.
utils.distances.haversine(-3.797864, -38.501597, -3.797890, -38.501681)
9.757976024363016
To calculate the Euclidean distance, in meters, between two points on the Earth, you can use euclidean_distance_in_meters.
utils.distances.euclidean_distance_in_meters(-3.797864, -38.501597, -3.797890, -38.501681)
9.790407710249447
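For readers unfamiliar with the haversine formula, here is a self-contained sketch. The mean Earth radius of 6,371,000 m is an assumption chosen because it reproduces the distance shown above to within a centimeter; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


distance = haversine_m(-3.797864, -38.501597, -3.797890, -38.501681)
print(round(distance, 2))  # about 9.76
```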
Math¶
To compute the (population) standard deviation, you can use std.
utils.math.std([600, 20, 5])
277.0178494048513
To compute the average and the standard deviation, you can use avg_std.
utils.math.avg_std([600, 20, 5])
(208.33333333333334, 277.0178494048513)
To compute the sample standard deviation, you can use std_sample.
utils.math.std_sample([600, 20, 5])
339.27619034251916
To compute the average and the sample standard deviation, you can use avg_std_sample.
utils.math.avg_std_sample([600, 20, 5])
(208.33333333333334, 339.27619034251916)
To compute the sum of the elements of an array, you can use array_sum.
To compute the sum of all the elements in an array, the sum of the squares of the elements, and the number of elements, you can use array_stats.
utils.math.array_stats([600, 20, 5])
(625.0, 360425.0, 3)
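The three numbers returned by array_stats (sum, sum of squares, count) are enough to recover the averages and standard deviations shown earlier. A standard-library sketch of that relationship, cross-checked against Python's statistics module; this is an illustration, not pymove's code:

```python
import math
import statistics

values = [600, 20, 5]

total = float(sum(values))                    # 625.0
total_squares = float(sum(v * v for v in values))  # 360425.0
count = len(values)                           # 3

mean = total / count
# Population variance: E[x^2] - (E[x])^2
population_std = math.sqrt(total_squares / count - mean ** 2)
# Sample variance uses count - 1 in the denominator (Bessel's correction)
sample_std = math.sqrt((total_squares - count * mean ** 2) / (count - 1))

print(mean, population_std, sample_std)
print(statistics.pstdev(values), statistics.stdev(values))
```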
To perform interpolation and extrapolation, you can use interpolation.
utils.math.interpolation(15, 20, 65, 86, 5)
6.799999999999999
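The result above is consistent with linear interpolation through two points, assuming the arguments are ordered as (x0, y0, x1, y1, x). That ordering and the helper name are assumptions made for this sketch. Since x = 5 lies outside the interval [15, 65], the call is an extrapolation:

```python
def linear_interpolation(x0, y0, x1, y1, x):
    """Value at x of the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(linear_interpolation(15, 20, 65, 86, 5))   # close to 6.8 (extrapolation)
print(linear_interpolation(15, 20, 65, 86, 40))  # 53.0 (midpoint of the interval)
```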
Trajectories¶
To read a CSV file into a MoveDataFrame:
move_data = utils.trajectories.read_csv('geolife_sample.csv')
type(move_data)
pymove.core.pandas.PandasMoveDataFrame
To invert the keys and values of a dictionary:
utils.trajectories.invert_dict({1: 'a', 2: 'b'})
{'a': 1, 'b': 2}
To flatten a nested dictionary
utils.trajectories.flatten_dict({'1': 'a', '2': {'3': 'b', '4': 'c'}})
{'1': 'a', '2_3': 'b', '2_4': 'c'}
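Both dictionary helpers are short to express in plain Python. The sketch below reproduces the two outputs above; the function names and the "_" separator default are assumptions inferred from those outputs:

```python
def invert(mapping):
    """Swap keys and values; values must be hashable and unique."""
    return {value: key for key, value in mapping.items()}


def flatten(mapping, parent_key="", sep="_"):
    """Recursively flatten nested dicts, joining key paths with sep."""
    flat = {}
    for key, value in mapping.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


print(invert({1: "a", 2: "b"}))                            # {'a': 1, 'b': 2}
print(flatten({"1": "a", "2": {"3": "b", "4": "c"}}))      # {'1': 'a', '2_3': 'b', '2_4': 'c'}
```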
To flatten a dataframe column whose row values are dicts:
df = move_data.head(3)
df['dict_column'] = [{'a': 1}, {'b': 2}, {'c': 3}]
df
| | lat | lon | datetime | id | dict_column |
---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | {'a': 1} |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | {'b': 2} |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | {'c': 3} |
utils.trajectories.flatten_columns(df, columns='dict_column')
| | lat | lon | datetime | id | dict_column_c | dict_column_b | dict_column_a |
---|---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | NaN | NaN | 1.0 |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | NaN | 2.0 | NaN |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | 3.0 | NaN | NaN |
To shift a sequence
utils.trajectories.shift([1., 2., 3., 4.], 1)
array([nan, 1., 2., 3.])
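Shifting moves every element by a fixed number of positions and pads the gap, which is how "previous point" columns such as dist_to_prev can be lined up with the current row. A list-based sketch of the positive shift shown above; the helper name is hypothetical, and the handling of negative shifts is an assumption:

```python
import math


def shift_list(values, num, fill=math.nan):
    """Shift values right by num positions (left if negative), padding with fill."""
    size = len(values)
    if num == 0:
        return list(values)
    if num > 0:
        return [fill] * min(num, size) + list(values[: max(size - num, 0)])
    return list(values[-num:]) + [fill] * min(-num, size)


print(shift_list([1.0, 2.0, 3.0, 4.0], 1))   # [nan, 1.0, 2.0, 3.0]
print(shift_list([1.0, 2.0, 3.0, 4.0], -1))  # [2.0, 3.0, 4.0, nan]
```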
To fill a sequence with values from another
l1 = ['a', 'b', 'c', 'd', 'e']
utils.trajectories.fill_list_with_new_values(l1, [1, 2, 3])
l1
[1, 2, 3, 'd', 'e']
To transform a string representation of a list back into an array
utils.trajectories.object_for_array('[1,2,3,4,5]')
array([1., 2., 3., 4., 5.], dtype=float32)
To convert a column of string representations back into lists
df['list_column'] = ['[1,2]', '[3,4]', '[5,6]']
df
| | lat | lon | datetime | id | dict_column | list_column |
---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | {'a': 1} | [1,2] |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | {'b': 2} | [3,4] |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | {'c': 3} | [5,6] |
utils.trajectories.column_to_array(df, column='list_column')
| | lat | lon | datetime | id | dict_column | list_column |
---|---|---|---|---|---|---|
0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 | {'a': 1} | [1.0, 2.0] |
1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 | {'b': 2} | [3.0, 4.0] |
2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 | {'c': 3} | [5.0, 6.0] |
Log¶
mdf = pm.read_csv('geolife_sample.csv')
To control the verbosity of pymove functions, use the logger.
To change the verbosity, use the utils.log.set_verbosity method, or create an environment variable named PYMOVE_VERBOSITY.
By default, the verbosity level is set to INFO.
utils.log.logger
<Logger pymove (INFO)>
INFO
shows only useful information, like progress bars
mdf.generate_dist_features(inplace=False).head()
VBox(children=(HTML(value=''), IntProgress(value=0, max=2)))
| | id | lat | lon | datetime | dist_to_prev | dist_to_next | dist_prev_to_next |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | 13.690153 | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 7.403788 | 20.223428 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 1.821083 | 5.888579 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 2.889671 | 1.873356 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 66.555997 | 68.727260 |
DEBUG
shows information from various steps in the functions
utils.log.set_verbosity('DEBUG')
mdf.generate_dist_features(inplace=False).head()
...Sorting by id and datetime to increase performance
...Set id as index to a higher performance
Creating or updating distance features in meters...
VBox(children=(HTML(value=''), IntProgress(value=0, max=2)))
| | id | lat | lon | datetime | dist_to_prev | dist_to_next | dist_prev_to_next |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | 13.690153 | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 7.403788 | 20.223428 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 1.821083 | 5.888579 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 2.889671 | 1.873356 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 66.555997 | 68.727260 |
WARN
hides all output except warnings and errors
utils.log.set_verbosity('WARN')
mdf.generate_dist_features(inplace=False).head()
| | id | lat | lon | datetime | dist_to_prev | dist_to_next | dist_prev_to_next |
---|---|---|---|---|---|---|---|
0 | 1 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | NaN | 13.690153 | NaN |
1 | 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 13.690153 | 7.403788 | 20.223428 |
2 | 1 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 7.403788 | 1.821083 | 5.888579 |
3 | 1 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1.821083 | 2.889671 | 1.873356 |
4 | 1 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 2.889671 | 66.555997 | 68.727260 |
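The logger representation shown earlier, `<Logger pymove (INFO)>`, is that of a standard-library logging.Logger, so the three levels behave like ordinary logging thresholds. The sketch below demonstrates the thresholds on a separate, hypothetical logger named "verbosity_demo"; how set_verbosity is implemented inside pymove is not shown here and is not assumed:

```python
import logging

logger = logging.getLogger("verbosity_demo")  # hypothetical logger, not pymove's
logger.addHandler(logging.NullHandler())


def visible_levels(level_name):
    """Set the threshold and list which message levels would be emitted."""
    logger.setLevel(level_name)
    candidates = ("DEBUG", "INFO", "WARNING", "ERROR")
    return [name for name in candidates if logger.isEnabledFor(getattr(logging, name))]


for level in ("DEBUG", "INFO", "WARN"):
    print(level, "->", visible_levels(level))
# DEBUG -> ['DEBUG', 'INFO', 'WARNING', 'ERROR']
# INFO -> ['INFO', 'WARNING', 'ERROR']
# WARN -> ['WARNING', 'ERROR']
```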
Mem¶
utils.log.set_verbosity('INFO')
Calculate size of variable
utils.mem.total_size(mdf, verbose=True)
Size in bytes: 6965040, Type: <class 'pymove.core.pandas.PandasMoveDataFrame'>
6965040
Reduce size of dataframe
utils.mem.reduce_mem_usage_automatic(mdf)
Memory usage of dataframe is 6.64 MB
Memory usage after optimization is: 2.70 MB
Decreased by 59.4 %
Create a dataframe with the variables with the largest memory footprint
lst = [*range(10000)]
utils.mem.top_mem_vars(globals())
| | var | mem |
---|---|---|
0 | move_data | 6.6 MiB |
1 | mdf | 2.7 MiB |
2 | lst | 88.0 KiB |
3 | Out | 2.2 KiB |
4 | df | 1.1 KiB |
5 | In | 648.0 B |
6 | l1 | 96.0 B |
7 | matplotlib | 72.0 B |
8 | sys | 72.0 B |
9 | os | 72.0 B |
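A rough idea of how such a ranking can be built with the standard library is sketched below. It uses sys.getsizeof, which measures only the container itself and not the objects it references, so the numbers are shallow sizes and will differ from the table above; the helper name is hypothetical.

```python
import sys


def largest_vars(namespace, n=10):
    """Return up to n (name, shallow size in bytes) pairs, largest first."""
    sizes = [
        (name, sys.getsizeof(value))
        for name, value in namespace.items()
        if not name.startswith("_")  # skip private and dunder names
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:n]


sample_namespace = {"lst": [*range(10000)], "l1": ["a", "b", "c"], "flag": True}
for name, size in largest_vars(sample_namespace):
    print(f"{name}: {size} B")
```

In a notebook, passing globals() instead of the sample dictionary ranks the variables of the current session.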