pymove.core package

Submodules

pymove.core.dask module

DaskMoveDataFrame class.

class pymove.core.dask.DaskMoveDataFrame(data: DataFrame | list | dict, latitude: str = 'lat', longitude: str = 'lon', datetime: str = 'datetime', traj_id: str = 'id', n_partitions: int = 1)[source]

Bases: dask.dataframe.core.DataFrame, pymove.core.interface.MoveDataFrameAbstractModel

PyMove dataframe extending Dask DataFrame.
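
A minimal construction sketch (assumes dask is installed; the toy coordinates and column names below are illustrative, and datetime strings are expected to be parsed by the constructor):

import pandas as pd
from pymove.core.dask import DaskMoveDataFrame

# Toy trajectory data; column names match the constructor defaults above.
raw = pd.DataFrame({
    'lat': [39.984094, 39.984198],
    'lon': [116.319236, 116.319322],
    'datetime': ['2008-10-23 05:53:05', '2008-10-23 05:53:06'],
    'id': [1, 1],
})
move_df = DaskMoveDataFrame(
    raw, latitude='lat', longitude='lon', datetime='datetime',
    traj_id='id', n_partitions=1,
)
print(move_df.get_type())            # reports the backend type (expected: 'dask')
pdf = move_df.convert_to('pandas')   # switch to the pandas-backed implementation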

all(*args, **kwargs)[source]

Indicates if all elements are True, potentially over an axis.

any(*args, **kwargs)[source]

Indicates if any element is True, potentially over an axis.

append(*args, **kwargs)[source]

Append rows of other to the end of caller, returning a new object.

astype(*args, **kwargs)[source]

Casts a dask object to a specified dtype.

at

Access a single value for a row/column label pair.

columns

The column labels of the DataFrame.

convert_to(new_type: str) → MoveDataFrame | 'PandasMoveDataFrame' | 'DaskMoveDataFrame'[source]

Convert an object from one type to another specified by the user.

Parameters:new_type ('pandas' or 'dask') – The type for which the object will be converted.
Returns:The converted object.
Return type:A subclass of MoveDataFrameAbstractModel
copy(*args, **kwargs)[source]

Make a copy of this object's indices and data.

count(*args, **kwargs)[source]

Counts the non-NA cells for each column or row.

datetime

Checks for the DATETIME column and returns its value.

Returns:DATETIME column
Return type:Series
Raises:AttributeError – If the DATETIME column is not present in the DataFrame
describe(*args, **kwargs)[source]

Generate descriptive statistics.

drop(*args, **kwargs)[source]

Drops specified rows or columns of the dask DataFrame.

drop_duplicates(*args, **kwargs)[source]

Removes duplicated rows from the data.

dropna(*args, **kwargs)[source]

Removes missing data from dask DataFrame.

dtypes

Return the dtypes in the DataFrame.

duplicated(*args, **kwargs)[source]

Returns boolean Series denoting duplicate rows.

fillna(*args, **kwargs)[source]

Fills missing data in the dask DataFrame.

generate_date_features(*args, **kwargs)[source]

Create or update date feature.

generate_datetime_in_format_cyclical(*args, **kwargs)[source]

Create or update column with cyclical datetime feature.

generate_day_of_the_week_features(*args, **kwargs)[source]

Create or update the day of the week feature based on datetime.

generate_dist_features(*args, **kwargs)[source]

Create the three distance features, in meters, relative to a GPS point P.

generate_dist_time_speed_features(*args, **kwargs)[source]

Creates features of distance, time and speed between points.

generate_hour_features(*args, **kwargs)[source]

Create or update hour feature.

generate_move_and_stop_by_radius(*args, **kwargs)[source]

Create or update column with move and stop points by radius.

generate_speed_features(*args, **kwargs)[source]

Create the three speed features, in meters per second, relative to a GPS point P.

generate_tid_based_on_id_datetime(*args, **kwargs)[source]

Create or update trajectory id based on id and datetime.

generate_time_features(*args, **kwargs)[source]

Create the three time features, in seconds, relative to a GPS point P.

generate_time_of_day_features(*args, **kwargs)[source]

Create a feature time of day or period from datetime.

generate_weekend_features(*args, **kwargs)[source]

Create or update the weekend feature in the dataframe.

get_bbox(*args, **kwargs)[source]

Creates the bounding box of the trajectories.

get_type() → str[source]

Returns the type of the object.

Returns:A string representing the type of the object.
Return type:str
get_users_number(*args, **kwargs)[source]

Check and return number of users in trajectory data.

groupby(*args, **kwargs)[source]

Groups dask DataFrame using a mapper or by a Series of columns.

head(n: int = 5, npartitions: int = 1, compute: bool = True) → dask.dataframe.core.DataFrame[source]

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

Parameters:
  • n (int, optional, default 5) – Number of rows to select.
  • npartitions (int, optional, default 1) – Represents the number of partitions.
  • compute (bool, optional, default True) – Whether to perform the operation
Returns:

The first n rows of the caller object.

Return type:

same type as caller

iloc

Purely integer-location based indexing for selection by position.

index

The row labels of the DataFrame.

info(*args, **kwargs)[source]

Print a concise summary of a DataFrame.

isin(*args, **kwargs)[source]

Determines whether each element is contained in values.

isna(*args, **kwargs)[source]

Detect missing values.

join(*args, **kwargs)[source]

Join columns of another DataFrame.

lat

Checks for the LATITUDE column and returns its value.

Returns:LATITUDE column
Return type:Series
Raises:AttributeError – If the LATITUDE column is not present in the DataFrame
len(*args, **kwargs)[source]

Returns the number of rows in the trajectory data.

lng

Checks for the LONGITUDE column and returns its value.

Returns:LONGITUDE column
Return type:Series
Raises:AttributeError – If the LONGITUDE column is not present in the DataFrame
loc

Access a group of rows and columns by label(s) or a boolean array.

max(*args, **kwargs)[source]

Return the maximum of the values for the requested axis.

memory_usage(*args, **kwargs)[source]

Return the memory usage of each column in bytes.

merge(*args, **kwargs)[source]

Merge columns of another DataFrame.

min(*args, **kwargs)[source]

Return the minimum of the values for the requested axis.

nunique(*args, **kwargs)[source]

Count distinct observations over requested axis.

plot(*args, **kwargs)[source]

Plot the data of the dask DataFrame.

plot_all_features(*args, **kwargs)[source]

Generate a visualization for each column whose type matches the given dtype.

plot_traj_id(*args, **kwargs)[source]

Generate a visualization for a trajectory with the specified tid.

plot_trajs(*args, **kwargs)[source]

Generate a visualization that shows trajectories.

rename(*args, **kwargs)[source]

Alter axes labels.

reset_index(*args, **kwargs)[source]

Resets the dask DataFrame's index, using the default one.

sample(*args, **kwargs)[source]

Samples data from the dask DataFrame.

select_dtypes(*args, **kwargs)[source]

Returns a subset of the columns based on the column dtypes.

set_index(*args, **kwargs)[source]

Set the DataFrame index (row labels) using one or more existing columns or arrays.

shape

Return a tuple representing the dimensionality of the DataFrame.

shift(*args, **kwargs)[source]

Shifts by desired number of periods with an optional time freq.

show_trajectories_info(*args, **kwargs)[source]

Show dataset information from dataframe.

sort_values(*args, **kwargs)[source]

Sorts the values of the dask DataFrame.

tail(n: int = 5, npartitions: int = 1, compute: bool = True) → dask.dataframe.core.DataFrame[source]

Return the last n rows.

This function returns the last n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

Parameters:
  • n (int, optional, default 5) – Number of rows to select.
  • npartitions (int, optional, default 1) – Represents the number of partitions.
  • compute (bool, optional, default True) – Whether to perform the operation

Returns:

The last n rows of the caller object.

Return type:

same type as caller

time_interval(*args, **kwargs)[source]

Get time difference between max and min datetime in trajectory.

to_csv(*args, **kwargs)[source]

Write object to a comma-separated values (csv) file.

to_data_frame() → dask.dataframe.core.DataFrame[source]

Converts trajectory data to DataFrame format.

Returns:Represents the trajectory in DataFrame format.
Return type:dask.dataframe.DataFrame
to_dict(*args, **kwargs)[source]

Converts trajectory data to dict format.

to_grid(*args, **kwargs)[source]

Converts trajectory data to grid format.

to_numpy(*args, **kwargs)[source]

Converts trajectory data to numpy array format.

unique(*args, **kwargs)[source]

Return unique values of Series object.

values

Return a Numpy representation of the DataFrame.

write_file(*args, **kwargs)[source]

Write trajectory data to a new file.

pymove.core.dataframe module

MoveDataFrame class.

class pymove.core.dataframe.MoveDataFrame[source]

Bases: object

Auxiliary class to check and transform data into Pymove Dataframes.

static format_labels(current_id: str, current_lat: str, current_lon: str, current_datetime: str) → dict[source]

Format the labels to the PyMove pattern for the id, lat, lon and datetime columns.

Parameters:
  • current_id (str) – Represents the column name of feature id
  • current_lat (str) – Represents the column name of feature latitude
  • current_lon (str) – Represents the column name of feature longitude
  • current_datetime (str) – Represents the column name of feature datetime
Returns:

Represents a dict mapping the current data columns to the PyMove pattern columns.

Return type:

Dict
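
A short usage sketch (the source column names 'taxi_id', 'latitude', 'longitude' and 'timestamp' are hypothetical):

from pymove.core.dataframe import MoveDataFrame

# Build a rename mapping from the caller's column names to the PyMove defaults.
mapping = MoveDataFrame.format_labels(
    current_id='taxi_id',
    current_lat='latitude',
    current_lon='longitude',
    current_datetime='timestamp',
)
# The resulting dict can then be used with pandas: raw_df.rename(columns=mapping)
print(mapping)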

static has_columns(data: pandas.core.frame.DataFrame) → bool[source]

Checks whether the received dataset has ‘lat’, ‘lon’, ‘datetime’ columns.

Parameters:data (DataFrame) – Input trajectory data
Returns:Represents whether or not you have the required columns
Return type:bool
static validate_move_data_frame(data: pandas.core.frame.DataFrame)[source]

Converts the column type to the default type used by PyMove lib.

Parameters:

data (DataFrame) – Input trajectory data

Raises:
  • KeyError – If missing one of lat, lon, datetime columns
  • ValueError, ParserError – If the data types can’t be converted
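
A hedged sketch combining the two static helpers above on a raw pandas DataFrame (toy values; validate_move_data_frame is assumed to convert the columns in place, since no return value is documented):

import pandas as pd
from pymove.core.dataframe import MoveDataFrame

raw = pd.DataFrame({
    'lat': ['39.984094'],                       # string dtypes on purpose
    'lon': ['116.319236'],
    'datetime': ['2008-10-23 05:53:05'],
    'id': [1],
})
if MoveDataFrame.has_columns(raw):
    # Converts lat/lon/datetime to the default PyMove dtypes;
    # raises KeyError/ValueError on missing columns or unconvertible values.
    MoveDataFrame.validate_move_data_frame(raw)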

pymove.core.grid module

Grid class.

class pymove.core.grid.Grid(data: DataFrame | dict, cell_size: float | None = None, meters_by_degree: float | None = None)[source]

Bases: object

PyMove class representing a grid.
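
A construction sketch, assuming cell_size is given in meters (the same convention used by to_grid in pymove.core.pandas below); the toy data is illustrative:

import pandas as pd
from pymove.core.grid import Grid
from pymove.core.pandas import PandasMoveDataFrame

move_df = PandasMoveDataFrame(pd.DataFrame({
    'lat': [39.984094, 39.984211],
    'lon': [116.319236, 116.319389],
    'datetime': ['2008-10-23 05:53:05', '2008-10-23 05:53:11'],
    'id': [1, 1],
}))
grid = Grid(move_df, cell_size=15)   # virtual grid with 15-meter cells
print(grid.get_grid())               # bounds, grid sizes and cell size as a dict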

convert_one_index_grid_to_two(data: pandas.core.frame.DataFrame, label_grid_index: str = 'index_grid')[source]

Converts a unique grid id into lat-lon grid ids.

Parameters:
  • data (DataFrame) – Dataframe with grid lat-lon ids
  • label_grid_index (str, optional) – grid unique id column, by default INDEX_GRID
convert_two_index_grid_to_one(data: pandas.core.frame.DataFrame, label_grid_lat: str = 'index_grid_lat', label_grid_lon: str = 'index_grid_lon')[source]

Converts grid lat-lon ids to unique values.

Parameters:
  • data (DataFrame) – Dataframe with grid lat-lon ids
  • label_grid_lat (str, optional) – grid lat id column, by default INDEX_GRID_LAT
  • label_grid_lon (str, optional) – grid lon id column, by default INDEX_GRID_LON
create_all_polygons_on_grid()[source]

Create all polygons that are represented in a grid.

Stores the polygons in the grid_polygon key

create_all_polygons_to_all_point_on_grid(data: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Create all polygons to all points represented in a grid.

Parameters:data (DataFrame) – Represents the dataset that contains lat, long and datetime
Returns:Represents the same dataset with new key ‘polygon’ where polygons were saved.
Return type:DataFrame
create_one_polygon_to_point_on_grid(index_grid_lat: int, index_grid_lon: int) → shapely.geometry.polygon.Polygon[source]

Create one polygon to point on grid.

Parameters:
  • index_grid_lat (int) – Represents the grid index that references latitude.
  • index_grid_lon (int) – Represents the grid index that references longitude.
Returns:

Represents a polygon of this cell in a grid.

Return type:

Polygon

create_update_index_grid_feature(data: pandas.core.frame.DataFrame, unique_index: bool = True, label_dtype: Callable = <class 'numpy.int64'>, sort: bool = True)[source]

Create or update index grid feature.

It is not necessary to pass dic_grid; one is created if not provided.

Parameters:
  • data (DataFrame) – Represents the dataset that contains lat, long and datetime.
  • unique_index (bool, optional) – How to index the grid, by default True
  • label_dtype (Callable, optional) – Represents the type of a value of new column in dataframe, by default np.int64
  • sort (bool, optional) – Represents if needs to sort the dataframe, by default True
get_grid() → dict[source]

Returns the grid object in a dict format.

Returns:
Dict with grid information
  • ‘lon_min_x’: minimum x of grid
  • ‘lat_min_y’: minimum y of grid
  • ‘grid_size_lat_y’: lat y size of grid
  • ‘grid_size_lon_x’: lon x size of grid
  • ‘cell_size_by_degree’: cell size in radians
Return type:Dict
point_to_index_grid(event_lat: float, event_lon: float) → tuple[int, int][source]

Locate the x and y grid coordinates of a point (lat, lon).

Parameters:
  • event_lat (float) – Represents the latitude of a point
  • event_lon (float) – Represents the longitude of a point
Returns:

Represents the index y and the index x in the grid of the point (lat, lon).

Return type:

Tuple[int, int]

read_grid_pkl(filename: str) → Grid[source]

Read a grid dict from a .pkl file.

Parameters:filename (str) – Represents the name of a file.
Returns:Grid object containing information about the virtual grid
Return type:Grid
save_grid_pkl(filename: str)[source]

Save the grid to a new .pkl file.

Parameters:filename (str) – Represents the name of a file.

pymove.core.interface module

class pymove.core.interface.MoveDataFrameAbstractModel[source]

Bases: abc.ABC

all()[source]
any()[source]
append()[source]
astype()[source]
at()[source]
columns()[source]
convert_to(new_type: str)[source]
copy()[source]
count()[source]
datetime()[source]
describe()[source]
drop()[source]
drop_duplicates()[source]
dropna()[source]
dtypes()[source]
duplicated()[source]
fillna()[source]
generate_date_features()[source]
generate_datetime_in_format_cyclical()[source]
generate_day_of_the_week_features()[source]
generate_dist_features()[source]
generate_dist_time_speed_features()[source]
generate_hour_features()[source]
generate_move_and_stop_by_radius()[source]
generate_speed_features()[source]
generate_tid_based_on_id_datetime()[source]
generate_time_features()[source]
generate_time_of_day_features()[source]
generate_weekend_features()[source]
get_bbox()[source]
get_type()[source]
get_users_number()[source]
groupby()[source]
head()[source]
iloc()[source]
index()[source]
info()[source]
isin()[source]
isna()[source]
join()[source]
lat()[source]
len()[source]
lng()[source]
loc()[source]
max()[source]
memory_usage()[source]
merge()[source]
min()[source]
nunique()[source]
plot()[source]
plot_all_features()[source]
plot_traj_id()[source]
plot_trajs()[source]
rename()[source]
reset_index()[source]
sample()[source]
select_dtypes()[source]
set_index()[source]
shape()[source]
shift()[source]
show_trajectories_info()[source]
sort_values()[source]
tail()[source]
time_interval()[source]
to_csv()[source]
to_data_frame()[source]
to_dict()[source]
to_grid()[source]
to_numpy()[source]
values()[source]
write_file()[source]

pymove.core.pandas module

PandasMoveDataFrame class.

class pymove.core.pandas.PandasMoveDataFrame(data: DataFrame | list | dict, latitude: str = 'lat', longitude: str = 'lon', datetime: str = 'datetime', traj_id: str = 'id')[source]

Bases: pandas.core.frame.DataFrame

PyMove dataframe extending Pandas DataFrame.
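
A minimal sketch of constructing the dataframe from a dict (toy data, illustrative only; datetime strings are expected to be parsed by the constructor):

from pymove.core.pandas import PandasMoveDataFrame

move_df = PandasMoveDataFrame({
    'lat': [39.984094, 39.984198, 39.984224],
    'lon': [116.319236, 116.319322, 116.319402],
    'datetime': ['2008-10-23 05:53:05', '2008-10-23 05:53:06', '2008-10-23 05:53:11'],
    'id': [1, 1, 1],
})
print(move_df.get_type())         # backend type string
print(move_df.get_bbox())         # (lat_min, lon_min, lat_max, lon_max)
move_df.show_trajectories_info()  # row count, datetime interval and bounding box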

append(other: 'PandasMoveDataFrame' | DataFrame, ignore_index: bool = False, verify_integrity: bool = False, sort: bool = False) → 'PandasMoveDataFrame'[source]

Append rows of other to the end of caller, returning a new object.

Columns in other that are not in the caller are added as new columns.

Parameters:
  • other (DataFrame or Series/dict-like object, or list of these) – The data to append.
  • ignore_index (bool, optional) – If True, do not use the index labels, by default False
  • verify_integrity (bool, optional) – If True, raise ValueError on creating index with duplicates, by default False
  • sort (bool, optional) – Sort columns if the columns of self and other are not aligned. The default sorting is deprecated and will change to not-sorting in a future version of pandas, by default False
Returns:

A dataframe containing rows from both the caller and other.

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

astype(dtype: Callable | dict, copy: bool = True, errors: str = 'raise') → DataFrame[source]

Cast a pandas object to a specified dtype.

Parameters:
  • dtype (callable, dict) – Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame columns to column-specific types.
  • copy (bool, optional) – Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects), by default True
  • errors (str, optional) –
    Control raising of exceptions on invalid data for provided dtype,
    by default ‘raise’
    • raise : allow exceptions to be raised
    • ignore : suppress exceptions. On error return original object
Returns:

Casted object to specified type.

Return type:

DataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html

Raises:AttributeError – If trying to change required types inplace
convert_to(new_type: str) → MoveDataFrame | 'PandasMoveDataFrame' | 'DaskMoveDataFrame'[source]

Convert an object from one type to another specified by the user.

Parameters:new_type ('pandas' or 'dask') – The type for which the object will be converted.
Returns:The converted object.
Return type:A subclass of MoveDataFrameAbstractModel
copy(deep: bool = True) → PandasMoveDataFrame[source]

Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below). When deep=False, a new object will be created without copying the calling object data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Parameters:deep (bool, optional) – Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied, by default True
Returns:Object type matches caller.
Return type:PandasMoveDataFrame

Notes

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below). While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.

datetime

Checks for the DATETIME column and returns its value.

Returns:DATETIME column
Return type:Series
Raises:AttributeError – If the DATETIME column is not present in the DataFrame
drop(labels: str | list[str] | None = None, axis: int | str = 0, index: str | list[str] | None = None, columns: str | list[str] | None = None, level: int | str | None = None, inplace: bool = False, errors: str = 'raise') → 'PandasMoveDataFrame' | DataFrame | None[source]

Removes rows or columns.

By specifying label names and corresponding axis, or by specifying directly index or column names. When using a multiindex, labels on different levels can be removed by specifying the level.

Parameters:
  • labels (str or list, optional) – Index or column labels to drop, by default None
  • axis (int or str, optional) – Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’), by default 0
  • index (str or list, optional) – Alternative to specifying axis (labels, axis=0 is equivalent to index=labels), by default None
  • columns (str or list, optional) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels), by default None
  • level (str or int, optional) – For MultiIndex, level from which the labels will be removed, by default None
  • inplace (bool, optional) – If True, do operation inplace and return None. Otherwise, make a copy, do operations and return, by default False
  • errors (str, optional) – ‘ignore’, ‘raise’, by default ‘raise’ If ‘ignore’, suppress error and only existing labels are dropped.
Returns:

Object without the removed index or column labels or None

Return type:

PandasMoveDataFrame, DataFrame

Raises:
  • AttributeError – If trying to drop a required column inplace
  • KeyError – If any of the labels is not found in the selected axis.

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

drop_duplicates(subset: int | str | None = None, keep: str | bool = 'first', inplace: bool = False) → 'PandasMoveDataFrame' | None[source]

Uses pandas’ drop_duplicates function to remove duplicated rows from the data.

Parameters:
  • subset (int or str, optional) – Only consider certain columns for identifying duplicates, by default use all of the columns, by default None
  • keep (str, optional) –
    • first : Drop duplicates except for the first occurrence.
    • last : Drop duplicates except for the last occurrence.
    • False : Drop all duplicates.

    by default ‘first’

  • inplace (bool, optional) – Whether to drop duplicates in place or to return a copy, by default False
Returns:

Object with duplicated rows removed or None

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html

dropna(axis: int | str = 0, how: str = 'any', thresh: float | None = None, subset: list | None = None, inplace: bool = False)[source]

Removes missing data.

Parameters:
  • axis (0 or 'index', 1 or 'columns', None, optional) – Determine if rows or columns are removed, by default 0
    • 0, or ‘index’ : Drop rows which contain missing values.
    • 1, or ‘columns’ : Drop columns which contain missing values.
  • how (str, optional) –

    Determine if row or column is removed from DataFrame when we have at least one NA or all NA, by default ‘any’.

    • ’any’ : If any NA values are present, drop that row or column.
    • ’all’ : If all values are NA, drop that row or column.
  • thresh (float, optional) – Require that many non-NA values, by default None
  • subset (array-like, optional) – Labels along other axis to consider, by default None e.g. if you are dropping rows these would be a list of columns to include.
  • inplace (bool, optional) – If True, do operation inplace and return None, by default False
Returns:

Object with NA entries dropped or None

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

Raises:AttributeError – If trying to drop required columns inplace
fillna(value: Any | None = None, method: str | None = None, axis: int | str | None = None, inplace: bool = False, limit: int | None = None, downcast: dict | None = None)[source]

Fill NA/NaN values using the specified method.

Parameters:
  • value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
  • method ({'backfill', 'bfill', 'pad', 'ffill', None}, default None) – Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use next valid observation to fill gap.
  • axis ({0 or 'index', 1 or 'columns'}) – Axis along which to fill missing values.
  • inplace (bool, default False) – If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
  • limit (int, default None) – If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
  • downcast (dict, default is None) – A dict of item->dtype of what to downcast if possible, or the str ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
Returns:

Object with missing values filled or None

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

generate_date_features(inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create or update date feature based on datetime.

Parameters:inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:Object with new features or None
Return type:PandasMoveDataFrame
generate_datetime_in_format_cyclical(label_datetime: str = 'datetime', inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create or update column with cyclical datetime feature.

Parameters:
  • label_datetime (str, optional) – Represents the datetime column label, by default DATETIME
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

References

https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/ https://www.avanwyk.com/encoding-cyclical-features-for-deep-learning/
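
The cyclical encoding referenced above maps an hour h to the pair (sin(2πh/24), cos(2πh/24)), so 23h and 0h end up adjacent. A standalone illustration of the idea (not the library's implementation):

import numpy as np

hours = np.array([0, 6, 12, 23])
hour_sin = np.sin(2 * np.pi * hours / 24)
hour_cos = np.cos(2 * np.pi * hours / 24)
# 23h maps to a point next to 0h on the unit circle, unlike the raw
# values 23 and 0, which look maximally far apart.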

generate_day_of_the_week_features(inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create or update day of the week features based on datetime.

Parameters:inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:Object with new features or None
Return type:PandasMoveDataFrame
generate_dist_features(label_id: str = 'id', label_dtype: Callable = <class 'numpy.float64'>, sort: bool = True, inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create the three distance features, in meters, relative to a GPS point P.

Parameters:
  • label_id (str, optional) – Represents name of column of trajectories id, by default TRAJ_ID
  • label_dtype (callable, optional) – Represents column id type, by default np.float64
  • sort (bool, optional) – If sort == True the dataframe will be sorted, by default True
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

Examples

  • P to P.next = 2 meters
  • P to P.previous = 1 meter
  • P.previous to P.next = 1 meter
generate_dist_time_speed_features(label_id: str = 'id', label_dtype: Callable = <class 'numpy.float64'>, sort: bool = True, inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Adds distance, time and speed information to the dataframe.

First, create the three distance features to a GPS point P (lat, lon). Then, create two time features for point P: time to previous and time to next. Lastly, create two speed features using the time and distance features.

Parameters:
  • label_id (str, optional) – Represents name of column of trajectories id, by default TRAJ_ID
  • label_dtype (callable, optional) – Represents column id type, by default np.float64
  • sort (bool, optional) – If sort == True the dataframe will be sorted, by default True
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

Examples

  • dist_to_prev = 248.33 meters, dist_to_prev 536.57 meters
  • time_to_prev = 60 seconds, time_prev = 60.0 seconds
  • speed_to_prev = 4.13 m/s, speed_prev = 8.94 m/s.
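
A hedged usage sketch with toy data; the output column names dist_to_prev, time_to_prev and speed_to_prev follow the labels used in the examples above:

from pymove.core.pandas import PandasMoveDataFrame

mdf = PandasMoveDataFrame({
    'lat': [39.984094, 39.984198, 39.984224],
    'lon': [116.319236, 116.319322, 116.319402],
    'datetime': ['2008-10-23 05:53:05', '2008-10-23 05:53:06', '2008-10-23 05:53:11'],
    'id': [1, 1, 1],
})
mdf.generate_dist_time_speed_features()   # inplace=True by default
print(mdf[['id', 'dist_to_prev', 'time_to_prev', 'speed_to_prev']])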
generate_hour_features(inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create or update hour features based on datetime.

Parameters:inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:Object with new features or None
Return type:PandasMoveDataFrame
generate_move_and_stop_by_radius(radius: float = 0, target_label: str = 'dist_to_prev', inplace: bool = True)[source]

Create or update column with move and stop points by radius.

Parameters:
  • radius (float, optional) – Represents radius, by default 0
  • target_label (str, optional) – Represents column to compute, by default DIST_TO_PREV
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

generate_speed_features(label_id: str = 'id', label_dtype: Callable = <class 'numpy.float64'>, sort: bool = True, inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create the three speed features, in meters per second, relative to a GPS point P.

Parameters:
  • label_id (str, optional) – Represents name of column of trajectories id, by default TRAJ_ID
  • label_dtype (callable, optional) – Represents column id type, by default np.float64
  • sort (bool, optional) – If sort == True the dataframe will be sorted, by default True
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

Raises:

ValueError – If feature generation fails

Examples

  • P to P.next = 1 meter/second
  • P to P.previous = 3 meters/second
  • P.previous to P.next = 2 meters/second
generate_tid_based_on_id_datetime(str_format: str = '%Y%m%d%H', sort: bool = True, inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create or update trajectory id based on id and datetime.

Parameters:
  • str_format (str, optional) – Format to consider the datetime, by default ‘%Y%m%d%H’
  • sort (bool, optional) – Whether to sort the dataframe, by default True
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

generate_time_features(label_id: str = 'id', label_dtype: Callable = <class 'numpy.float64'>, sort: bool = True, inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create the three time features, in seconds, relative to a GPS point P.

Parameters:
  • label_id (str, optional) – Represents name of column of trajectories id, by default TRAJ_ID
  • label_dtype (callable, optional) – Represents column id type, by default np.float64
  • sort (bool, optional) – If sort == True the dataframe will be sorted, by default True
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

Examples

  • P to P.next = 5 seconds
  • P to P.previous = 15 seconds
  • P.previous to P.next = 20 seconds
generate_time_of_day_features(inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Create or update time of day features based on datetime.

Parameters:inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:
Object with new features or None
Early morning: 0H to 6H; Morning: 6H to 12H; Afternoon: 12H to 18H; Evening: 18H to 24H
Return type:PandasMoveDataFrame

Examples

  • datetime1 = 2019-04-28 02:00:56 -> period = Early Morning
  • datetime2 = 2019-04-28 08:00:56 -> period = Morning
  • datetime3 = 2019-04-28 14:00:56 -> period = Afternoon
  • datetime4 = 2019-04-28 20:00:56 -> period = Evening
generate_weekend_features(create_day_of_week: bool = False, inplace: bool = True) → 'PandasMoveDataFrame' | None[source]

Adds information to rows determining if it is a weekend day.

Create or update the weekend feature in the dataframe: it indicates whether the given day falls on the weekend or on a weekday.

Parameters:
  • create_day_of_week (bool, optional) – Indicates if the day column should be kept in the dataframe. If set to False the column will be dropped, by default False
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasMoveDataFrame

get_bbox() → tuple[float, float, float, float][source]

Returns the bounding box of the dataframe.

A bounding box (usually shortened to bbox) is an area defined by two longitudes and two latitudes, where:

  • Latitude is a decimal number between -90.0 and 90.0.
  • Longitude is a decimal number between -180.0 and 180.0.

They usually follow the standard format of:

  • bbox = left, bottom, right, top
  • bbox = min Longitude, min Latitude, max Longitude, max Latitude

Returns:Represents a bounding box, that is, a tuple of 4 values with the min and max limits of latitude and longitude: lat_min, lon_min, lat_max, lon_max
Return type:Tuple[float, float, float, float]

Examples

(22.147577, 113.54884299999999, 41.132062, 121.156224)

get_type() → str[source]

Returns the type of the object.

Returns:A string representing the type of the object.
Return type:str
get_users_number() → int[source]

Check and return number of users in trajectory data.

Returns:Represents the number of users in trajectory data.
Return type:int
head(n: int = 5) → PandasMoveDataFrame[source]

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

Parameters:n (int, optional) – Number of rows to select, by default 5
Returns:The first n rows of the caller object.
Return type:PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html

isin(values: list | Series | DataFrame | dict) → DataFrame[source]

Determines whether each element in the DataFrame is contained in values.

Parameters:values (iterable, Series, DataFrame or dict) – The result will only be true at a location if all the labels match. If values is a Series, that is the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
Returns:DataFrame of booleans showing whether each element in the DataFrame is contained in values
Return type:DataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html

join(other: 'PandasMoveDataFrame' | DataFrame, on: str | list | None = None, how: str = 'left', lsuffix: str = '', rsuffix: str = '', sort: bool = False) → 'PandasMoveDataFrame'[source]

Join columns of other, returning a new object.

Join columns with other PandasMoveDataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters:
  • other (DataFrame, Series, or list of DataFrame) – Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
  • on (str or list of str or array-like, optional) – Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
  • how ({'left', 'right', 'outer', 'inner'}, optional) –

    How to handle the operation of the two objects, by default ‘left’

    • left: use calling frame index (or column if on is specified)
    • right: use other index.
    • outer: form union of calling frame index (or column if on is specified) with other index, and sort it lexicographically.
    • inner: form intersection of calling frame index (or column if on is specified) with other index, preserving the order of the calling one.

  • lsuffix (str, optional) – Suffix to use from left frame overlapping columns, by default ‘’
  • rsuffix (str, optional) – Suffix to use from right frame overlapping columns, by default ‘’
  • sort (bool, optional) – Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword)
Returns:

A dataframe containing columns from both the caller and other.

Return type:

PandasMoveDataFrame

Notes

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html

lat

Checks for the LATITUDE column and returns its value.

Returns:LATITUDE column
Return type:Series
Raises:AttributeError – If the LATITUDE column is not present in the DataFrame
len() → int[source]

Returns the number of rows in the trajectory data.

Returns:Represents the trajectory data length.
Return type:int
lng

Checks for the LONGITUDE column and returns its value.

Returns:LONGITUDE column
Return type:Series
Raises:AttributeError – If the LONGITUDE column is not present in the DataFrame
merge(right: 'PandasMoveDataFrame' | DataFrame | Series, how: str = 'inner', on: str | list | None = None, left_on: str | list | None = None, right_on: str | list | None = None, left_index: bool = False, right_index: bool = False, sort: bool = False, suffixes: tuple[str, str] = ('_x', '_y'), copy: bool = True, indicator: bool | str = False, validate: str | None = None) → 'PandasMoveDataFrame'[source]

Merge DataFrame or named Series objects with a database-style join.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

Parameters:
  • right (DataFrame or named Series) – Object to merge with.
  • how ({‘left’, ‘right’, ‘outer’, ‘inner’}, optional) –

    Type of merge to be performed, by default ‘inner’

    • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
    • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
    • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
    • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
  • on (label or list, optional) – Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames, by default None
  • left_on (str or list or array-like, optional) – Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns, by default None
  • right_on (str or list or array-like, optional) – Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns, by default None
  • left_index (bool, optional) – Use the index from the left DataFrame as the join key(s), by default False If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.
  • right_index (bool, optional) – Use the index from the right DataFrame as the join key, by default False Same caveats as left_index.
  • sort (bool, optional) – Sort the join keys lexicographically in the result DataFrame, by default False If False, the order of the join keys depends on the join type (how keyword).
  • suffixes (tuple of (str, str), optional) – Suffix to apply to overlapping column names in the left and right side respectively. To raise an exception on overlapping columns use (False, False) by default (‘_x’, ‘_y’)
  • copy (bool, optional) – If False, avoid copy if possible, by default True
  • indicator (bool or str, optional) – If True, adds a column to output DataFrame called ‘_merge’ with information on the source of each row. If string, column with information on source of each row will be added to output DataFrame, and column will be named value of string. Information column is Categorical-type and takes on a value of ‘left_only’ for observations whose merge key only appears in ‘left’ DataFrame, ‘right_only’ for observations whose merge key only appears in ‘right’ DataFrame, and ‘both’ if the observation’s merge key is found in both. by default False
  • validate (str, optional) –

    If specified, checks if merge is of specified type, by default None

    • ‘one_to_one’ or ‘1:1’: check if merge keys are unique in both left and right datasets.
    • ‘one_to_many’ or ‘1:m’: check if merge keys are unique in left dataset.
    • ‘many_to_one’ or ‘m:1’: check if merge keys are unique in right dataset.
    • ‘many_to_many’ or ‘m:m’: allowed, but does not result in checks.

Returns:

A DataFrame of the two merged objects.

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge

rename(mapper: dict | Callable | None = None, index: dict | Callable | None = None, columns: dict | Callable | None = None, axis: int | str | None = None, copy: bool = True, inplace: bool = False) → 'PandasMoveDataFrame' | DataFrame | None[source]

Alter axes labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

Parameters:
  • mapper (dict or function, optional) – Dict-like or functions transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns, by default None
  • index (dict or function, optional) – Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper), by default None
  • columns (dict or function, optional) – Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper), by default None
  • axis (int or str, optional) – Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1), by default None
  • copy (bool, optional) – Also copy underlying data, by default True
  • inplace (bool, optional) – Whether to return a new DataFrame. If True then value of copy is ignored, by default False
Returns:

DataFrame with the renamed axis labels or None

Return type:

PandasMoveDataFrame, DataFrame

Raises:

AttributeError – If trying to rename a required column inplace

reset_index(level: int | str | tuple | list | None = None, drop: bool = False, inplace: bool = False, col_level: int | str = 0, col_fill: str = '') → 'PandasMoveDataFrame' | None[source]

Resets the DataFrame’s index, using the default one.

One or more levels can be removed, if the DataFrame has a MultiIndex.

Parameters:
  • level (int or str or tuple or list, optional) – Only the levels specified will be removed from the index. If set to None, all levels are removed, by default None
  • drop (bool, optional) – Do not try to insert index into dataframe columns This resets the index to the default integer index, by default False
  • inplace (bool, optional) – Modify the DataFrame in place (do not create a new object), by default False
  • col_level (int or str, optional) – If the columns have multiple levels, determines which level the labels are inserted into, by default 0
  • col_fill (str, optional) – If the columns have multiple levels, determines how the other levels are named If None then the index name is repeated, by default ‘’
Returns:

Object with the new index or None

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html

sample(n: int | None = None, frac: float | None = None, replace: bool = False, weights: str | list | None = None, random_state: int | None = None, axis: int | str | None = None) → 'PandasMoveDataFrame'[source]

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters:
  • n (int, optional) – Number of items from axis to return. Cannot be used with frac, by default None
  • frac (float, optional) – Fraction of axis items to return. Cannot be used with n, by default None
  • replace (bool, optional) – Allow or disallow sampling of the same row more than once, by default False
  • weights (str or ndarray-like, optional) – If ‘None’ results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed. by default None
  • random_state (int or numpy.random.RandomState, optional) – Seed for the random number generator (if int), or numpy RandomState object,by default None
  • axis ({0 or 'index', 1 or 'columns', None}, optional) – Axis to sample. Accepts axis number or name. Default is stat axis for given data type (0 for Series and DataFrames), by default None
Returns:

A new object of same type as caller containing n items randomly sampled from the caller object.

Return type:

PandasMoveDataFrame

See also

numpy.random.choice()
Generates a random sample from a given 1-D numpy array.

Notes

If frac > 1, replace must be set to True.

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html

set_index(keys: str | list[str], drop: bool = True, append: bool = False, inplace: bool = False, verify_integrity: bool = False) → 'PandasMoveDataFrame' | DataFrame | None[source]

Set the DataFrame index (row labels) using one or more existing columns or arrays.

Parameters:
  • keys (str, list) – Label or array-like or list of labels/arrays. This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays
  • drop (bool, optional) – Delete columns to be used as the new index, by default True
  • append (bool, optional) – Whether to append columns to existing index, by default False
  • inplace (bool, optional) – Modify the DataFrame in place (do not create a new object), by default False
  • verify_integrity (bool, optional) – Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method, by default False
Returns:

Object with a new index or None

Return type:

PandasMoveDataFrame, DataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

Raises:AttributeError – If trying to change required columns types
shift(periods: int = 1, freq: DateOffset | Timedelta | str | None = None, axis: int | str = 0, fill_value: Any | None = None) → 'PandasMoveDataFrame'[source]

Shift index by desired number of periods with an optional time freq.

Parameters:
  • periods (int, optional, default 1) – Number of periods to shift. Can be positive or negative.
  • freq (DateOffset or Timedelta or str, optional, default None) – Offset to use from the series module or time rule (e.g. ‘EOM’). If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. When freq is not passed, shift the index without realigning the data. If freq is passed (in this case, the index must be date or datetime, or it will raise a NotImplementedError), the index will be increased using the periods and the freq.
  • axis (0 or 'index', 1 or 'columns', None, optional, default 0) – Shift direction.
  • fill_value (object, optional, default None) – The scalar value to use for newly introduced missing values. The default depends on the dtype of self. For numeric data, np.nan is used. For datetime, timedelta, or period data, etc. NaT is used. For extension dtypes, self.dtype.na_value is used.
Returns:

A copy of the original object, shifted.

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html

show_trajectories_info()[source]

Show dataset information from dataframe.

Displays the number of rows, datetime interval, and bounding box.

Examples

====================== INFORMATION ABOUT DATASET ======================
Number of Points: 217654
Number of IDs objects: 2
Start Date: 2008-10-23 05:53:05
End Date: 2009-03-19 05:46:37
Bounding Box: (22.147577, 113.54884299999999, 41.132062, 121.156224)
=======================================================================

sort_values(by: str | list[str], axis: int = 0, ascending: bool = True, inplace: bool = False, kind: str = 'quicksort', na_position: str = 'last') → 'PandasMoveDataFrame' | None[source]

Sorts the values of the dataframe along an axis.

Parameters:
  • by (str, list) – Name or list of names to sort the dataframe by
  • axis (int, optional) – Axis to be sorted: 0 or ‘index’ sorts rows, 1 or ‘columns’ sorts columns, by default 0
  • ascending (bool, optional) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bool, it must match the length of by, by default True
  • inplace (bool, optional) – If set to True the original dataframe will be sorted in place, otherwise the operation will be made in a copy, that will be returned, by default False
  • kind (str, optional) – Choice of sorting algorithm: ‘quicksort’, ‘mergesort’, ‘heapsort’. For DataFrames, this option is only applied when sorting on a single column or label, by default ‘quicksort’
  • na_position (str, optional) – ‘first’, ‘last’, by default ‘last’. If ‘first’ puts NaNs at the beginning; if ‘last’ puts NaNs at the end.
Returns:

The sorted dataframe or None

Return type:

PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

tail(n: int = 5) → PandasMoveDataFrame[source]

Return the last n rows.

This function returns the last n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

Parameters:n (int, optional) – Number of rows to select, by default 5
Returns:The last n rows of the caller object.
Return type:PandasMoveDataFrame

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html

time_interval() → pandas._libs.tslibs.timedeltas.Timedelta[source]

Get time difference between max and min datetime in trajectory data.

Returns:Represents the time difference.
Return type:Timedelta
to_data_frame() → pandas.core.frame.DataFrame[source]

Converts trajectory data to DataFrame format.

Returns:Represents the trajectory in DataFrame format.
Return type:DataFrame
to_dicrete_move_df(local_label: str = 'local_label') → PandasMoveDataFrame[source]

Generate a discrete move dataframe.

Parameters:local_label (str, optional) – Represents the column name of feature local label, default LOCAL_LABEL
Returns:Represents the discretized PandasMoveDataFrame.
Return type:PandasDiscreteMoveDataFrame
to_grid(cell_size: float, meters_by_degree: float | None = None) → Grid[source]

Converts trajectory data to grid format.

Parameters:
  • cell_size (float) – Represents grid cell size.
  • meters_by_degree (float, optional) – Represents the number of meters per degree of latitude, by default lat_meters(-3.71839)
Returns:

Represents the trajectory in grid format

Return type:

Grid
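
A sketch chaining to_grid with the Grid helpers documented earlier (toy data; the file name is illustrative, and create_update_index_grid_feature is expected to add a grid-index feature column to the dataframe):

from pymove.core.pandas import PandasMoveDataFrame

mdf = PandasMoveDataFrame({
    'lat': [39.984094, 39.984198, 39.984224],
    'lon': [116.319236, 116.319322, 116.319402],
    'datetime': ['2008-10-23 05:53:05', '2008-10-23 05:53:06', '2008-10-23 05:53:11'],
    'id': [1, 1, 1],
})
grid = mdf.to_grid(cell_size=15)            # virtual grid over the trajectory bbox
grid.create_update_index_grid_feature(mdf)  # adds a grid-index feature to mdf
grid.save_grid_pkl('grid.pkl')              # persist; load again with read_grid_pkl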

write_file(file_name: str, separator: str = ', ')[source]

Write trajectory data to a new file.

Parameters:
  • file_name (str) – Represents the filename.
  • separator (str, optional) – Represents the information separator in a new file, by default ‘,’

pymove.core.pandas_discrete module

PandasDiscreteMoveDataFrame class.

class pymove.core.pandas_discrete.PandasDiscreteMoveDataFrame(data: DataFrame | list | dict, latitude: str = 'lat', longitude: str = 'lon', datetime: str = 'datetime', traj_id: str = 'id', local_label: str = 'local_label')[source]

Bases: pymove.core.pandas.PandasMoveDataFrame

PyMove discrete dataframe extending PandasMoveDataFrame.
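
A minimal sketch; the discrete variant additionally expects a symbolic location column (local_label), here filled with made-up values:

from pymove.core.pandas_discrete import PandasDiscreteMoveDataFrame

dmdf = PandasDiscreteMoveDataFrame({
    'lat': [39.984094, 39.984198, 39.984224],
    'lon': [116.319236, 116.319322, 116.319402],
    'datetime': ['2008-10-23 05:53:05', '2008-10-23 05:53:06', '2008-10-23 05:53:11'],
    'id': [1, 1, 1],
    'local_label': [1, 1, 2],
}, local_label='local_label')
dmdf.generate_prev_local_features()   # adds the previous-local feature per trajectory id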

discretize_based_grid(region_size: int = 1000)[source]

Discretizes the space into cells of the same size, assigning a unique id to each cell.

Parameters:region_size (int, optional) – Size of grid cell, by default 1000
generate_prev_local_features(label_id: str = 'id', local_label: str = 'local_label', sort: bool = True, inplace: bool = True) → 'PandasDiscreteMoveDataFrame' | None[source]

Create a prev_local feature with the label of the local previous to the current point.

Parameters:
  • label_id (str, optional) – Represents name of column of trajectory id, by default TRAJ_ID
  • local_label (str, optional) –
    Indicates name of column of place labels on symbolic trajectory,
    by default LOCAL_LABEL
  • sort (bool, optional) – Whether the dataframe will be sorted, by default True
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasDiscreteMoveDataFrame

generate_tid_based_statistics(label_id: str = 'id', local_label: str = 'local_label', mean_coef: float = 1.0, std_coef: float = 1.0, statistics: DataFrame | None = None, label_tid_stat: str = 'tid_stat', drop_single_points: bool = False, inplace: bool = True) → 'PandasDiscreteMoveDataFrame' | None[source]

Splits the trajectories into segments based on time statistics of the segments.

Parameters:
  • label_id (str, optional) – Represents name of column of trajectory id, by default TRAJ_ID
  • local_label (str, optional) –
    Indicates name of column of place labels on symbolic trajectory,
    by default LOCAL_LABEL
  • mean_coef (float, optional) – Multiplication coefficient of the mean time for the segment, by default 1.0
  • std_coef (float, optional) – Multiplication coefficient of the std of time for the segment, by default 1.0
  • statistics (DataFrame, optional) – Time Statistics of the pairwise local labels, by default None
  • label_tid_stat (str, optional) – The label of the column containing the ids of the formed segments. It is the new split id, by default TID_STAT
  • drop_single_points (bool, optional) – Whether to drop the trajectories with only one point, by default False
  • inplace (bool, optional) – Represents whether the operation will be performed on the data provided or in a copy, by default True
Returns:

Object with new features or None

Return type:

PandasDiscreteMoveDataFrame

Raises:
  • KeyError – If missing local_label column
  • ValueError – If the data contains only null values

Module contents

Contains the core of PyMove.

MoveDataFrame, PandasMoveDataFrame, DaskMoveDataFrame, PandasDiscreteMoveDataFrame, Grid

class pymove.core.MoveDataFrameAbstractModel[source]

Bases: abc.ABC

all()[source]
any()[source]
append()[source]
astype()[source]
at()[source]
columns()[source]
convert_to(new_type: str)[source]
copy()[source]
count()[source]
datetime()[source]
describe()[source]
drop()[source]
drop_duplicates()[source]
dropna()[source]
dtypes()[source]
duplicated()[source]
fillna()[source]
generate_date_features()[source]
generate_datetime_in_format_cyclical()[source]
generate_day_of_the_week_features()[source]
generate_dist_features()[source]
generate_dist_time_speed_features()[source]
generate_hour_features()[source]
generate_move_and_stop_by_radius()[source]
generate_speed_features()[source]
generate_tid_based_on_id_datetime()[source]
generate_time_features()[source]
generate_time_of_day_features()[source]
generate_weekend_features()[source]
get_bbox()[source]
get_type()[source]
get_users_number()[source]
groupby()[source]
head()[source]
iloc()[source]
index()[source]
info()[source]
isin()[source]
isna()[source]
join()[source]
lat()[source]
len()[source]
lng()[source]
loc()[source]
max()[source]
memory_usage()[source]
merge()[source]
min()[source]
nunique()[source]
plot()[source]
plot_all_features()[source]
plot_traj_id()[source]
plot_trajs()[source]
rename()[source]
reset_index()[source]
sample()[source]
select_dtypes()[source]
set_index()[source]
shape()[source]
shift()[source]
show_trajectories_info()[source]
sort_values()[source]
tail()[source]
time_interval()[source]
to_csv()[source]
to_data_frame()[source]
to_dict()[source]
to_grid()[source]
to_numpy()[source]
values()[source]
write_file()[source]