Release 0.5.0 with summary statistics and aggregation of multiple activities!
The new release of runpandas comes with summary statistics for a single workout and the ability to combine multiple activities into a single session container for advanced statistical analysis over a period of time.
The current state of the project is early beta, which means that features can be added, removed or changed in backwards-incompatible ways.
We published this major release with summary statistics for a single activity and the ability to combine multiple workouts into one multi-dimensional dataframe (season), enabling other types of analysis, including historical performance over a period of time. This release of RunPandas 0.5.0 includes:
- The Activity can now be summarised through common summary statistics using the runpandas.types.summary method.
- We now enable analysis over multiple activities by combining them into a single Activity. This opens up new possibilities for aggregated analysis over a group of workouts.
- There is a new accessor, runpandas.acessors.season, that computes the running metrics across the combined activities.
- Finally, there is a runpandas.types.session_summary method that provides summary statistics over the season (group) of activities.
Runpandas is a Python package based on the pandas data analysis library that makes it easier to analyze your running sessions stored in tracking files from cellphones and GPS smartwatches or in social sports applications such as Strava, MapMyRun, NikeRunClub, etc. It is designed to enable reading, transforming and running metrics analytics from several tracking files and apps.
Runpandas provides data runners with the runpandas.types.summary method for the Activity dataframe. This method computes estimates of the total distance covered, the total duration, the time spent moving, and several average metrics such as speed, pace, cadence, and heart rate, calculated based on either the total duration or the time spent moving.
#Disable Warnings for a better visualization
import warnings
warnings.filterwarnings('ignore')
#!pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./data/sample.tcx')
activity
#compute the common metrics for the running activity such as distance per position, speed, pace, etc.
activity['distpos'] = activity.compute.distance()
activity['speed'] = activity.compute.speed(from_distances=True)
activity['vam'] = activity.compute.vertical_speed()
activity_only_moving = activity.only_moving()
activity_only_moving.summary()
The result above is a pandas.Series object containing the main running statistics from the workout.
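To make the difference between averages over the total duration and averages over the moving time more concrete, here is a minimal pandas-only sketch. It is not the runpandas implementation: the toy samples, the column name distpos used as "meters since the previous sample", and the rule that "moving" means the distance changed are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy activity: one sample every 10 seconds, holding the distance
# covered (in meters) since the previous sample. Values are invented.
samples = pd.DataFrame(
    {"distpos": [0.0, 30.0, 30.0, 0.0, 0.0, 40.0]},
    index=pd.to_timedelta([0, 10, 20, 30, 40, 50], unit="s"),
)

# Seconds elapsed between consecutive samples (the first sample has no predecessor).
step = samples.index.to_series().diff().dt.total_seconds().fillna(0.0)

# Assumption: the runner is "moving" whenever the distance changed in that step.
moving = samples["distpos"] > 0

total_distance = samples["distpos"].sum()   # meters
total_duration = step.sum()                 # seconds, pauses included
moving_time = step[moving].sum()            # seconds, pauses excluded

# The same distance gives two different averages depending on the denominator.
mean_speed = total_distance / total_duration
mean_moving_speed = total_distance / moving_time

print(total_distance, total_duration, moving_time)
print(round(mean_speed, 2), round(mean_moving_speed, 2))
```

Because pauses inflate the total duration but not the moving time, the moving average is always at least as fast as the overall average.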
Runpandas, powered by the pandas library, builds on pandas.MultiIndex, which allows the dataframe to have multiple columns as a row identifier, with each index column related to another through a parent/child relationship. In our scenario we have the start time of each activity as the first index level and the timestamps within the activity as the second index level. This enables advanced statistical analysis across a single training session or over a period of time.
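As a rough illustration of this two-level layout, the sketch below builds a tiny frame with plain pandas. The level names start and timestamp, the hr column and all the values are assumptions chosen for the example, not data produced by runpandas.

```python
import pandas as pd

# Two hypothetical activities, identified by their start time (first level).
start_a = pd.Timestamp("2020-08-30 09:00:00")
start_b = pd.Timestamp("2020-09-01 18:30:00")

# Elapsed time inside each activity (second level).
offsets = pd.to_timedelta([0, 5, 10], unit="s")

index = pd.MultiIndex.from_product(
    [[start_a, start_b], offsets], names=["start", "timestamp"]
)
frame = pd.DataFrame({"hr": [120, 125, 130, 110, 118, 121]}, index=index)

# The parent level tells us which activities are stored in the frame.
starts = frame.index.get_level_values("start").unique()

# Selecting on the parent level returns the rows of a single activity,
# indexed only by the child level.
first_activity = frame.xs(start_a, level="start")

print(len(starts), "activities")
print(first_activity)
```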
The code chunk below loads the data using the method runpandas.read_dir_aggregate, which allows the user to read all the tracking files of a supported format in a directory and combine them into a data frame split by sessions based on the timestamps of each activity. This means that each workout file is stored in its own group of rows in the dataframe.
import runpandas
session = runpandas.read_dir_aggregate(dirname='./data/session/')
session
session.index #MultiIndex (start, timestamp)
The package now comes with an accessor, runpandas.types.acessors.session._SessionAcessor, that holds special methods for computing the running metrics across all the activities. The calls delegate to the single-activity metrics accessors.
#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()
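The delegation idea behind these calls can be sketched with the accessor registration mechanism that pandas itself offers. This is only a toy under stated assumptions: the accessor name toy_session, the class ToySessionAccessor and the distpos/dist columns are hypothetical, and the real runpandas accessor may be implemented differently.

```python
import pandas as pd


# Hypothetical accessor, registered under the made-up name "toy_session".
@pd.api.extensions.register_dataframe_accessor("toy_session")
class ToySessionAccessor:
    def __init__(self, frame):
        self._frame = frame

    def count(self):
        # One activity per distinct value of the first index level.
        return self._frame.index.get_level_values(0).nunique()

    def distance(self):
        # Apply a per-activity computation to every activity in the frame:
        # here, the cumulative distance restarts at each new activity.
        out = self._frame.copy()
        out["dist"] = out.groupby(level=0)["distpos"].cumsum()
        return out


# Invented data: two activities with three samples each.
index = pd.MultiIndex.from_product(
    [
        [pd.Timestamp("2020-08-30 09:00:00"), pd.Timestamp("2020-09-01 18:30:00")],
        pd.to_timedelta([0, 5, 10], unit="s"),
    ],
    names=["start", "timestamp"],
)
frame = pd.DataFrame({"distpos": [0.0, 10.0, 12.0, 0.0, 8.0, 9.0]}, index=index)

result = frame.toy_session.distance()
print(frame.toy_session.count(), "activities")
print(result)
```

Grouping on the first index level keeps each workout isolated, so a metric computed for one activity never leaks into the next one.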
How many activities are there in the season frame? The custom method count returns the total number of activities in it.
print (session.session.count(), 'activities')
After loading the data and computing the metrics for all the activities, we can now look at the basic summaries of the training sessions: time spent, total distance, mean speed and other insightful statistics for each running activity. We can accomplish this by calling the method runpandas.types.session._SessionAcessor.summarize. It returns a basic DataFrame including all the aggregated statistics per activity from the season frame.
summary = session.session.summarize()
summary
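For readers curious about what a per-activity summary looks like structurally, the following pandas-only sketch collapses a two-level frame into one row per activity. It is an approximation for illustration: the columns distpos and speed, the output names total_distance and mean_speed, and the values are assumptions, and the real summarize method reports more statistics.

```python
import pandas as pd

# Invented season frame: two activities, three samples each.
index = pd.MultiIndex.from_product(
    [
        [pd.Timestamp("2020-08-30 09:00:00"), pd.Timestamp("2020-09-01 18:30:00")],
        pd.to_timedelta([0, 5, 10], unit="s"),
    ],
    names=["start", "timestamp"],
)
frame = pd.DataFrame(
    {
        "distpos": [0.0, 10.0, 12.0, 0.0, 8.0, 9.0],  # meters per sample
        "speed": [0.0, 2.0, 2.4, 0.0, 1.6, 1.8],      # meters per second
    },
    index=index,
)

# One row per activity, indexed by its start time.
toy_summary = frame.groupby(level="start").agg(
    total_distance=("distpos", "sum"),
    mean_speed=("speed", "mean"),
)

# Because the index holds the start times, the covered period is easy to derive.
interval_days = (toy_summary.index.max() - toy_summary.index.min()).days

print(toy_summary)
print("Interval:", interval_days, "days")
```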
print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runnings')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))
As we can see above, we analyzed a period of 366 days (one year) of running workouts. In this period, she ran 68 times, covering a total distance of 491 km! The average moving pace is 06'02" per km and the average distance is 7.23 km! Great numbers for a beginner runner!
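Pace values such as 06'02" per km are simply the time needed to cover one kilometer. The short standard-library sketch below shows the conversion from a speed in meters per second and the formatting; the helper names pace_from_speed and format_pace are hypothetical and are not part of runpandas.

```python
from datetime import timedelta


def pace_from_speed(speed_m_per_s):
    """Return the pace per km: the time needed to cover 1000 m at this speed."""
    return timedelta(seconds=1000.0 / speed_m_per_s)


def format_pace(pace):
    """Format a pace (timedelta per km) as minutes'seconds"."""
    total_seconds = int(round(pace.total_seconds()))
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}'{seconds:02d}\""


# A pace of 6 minutes and 2 seconds per km, as in the summary above.
print(format_pace(timedelta(minutes=6, seconds=2)))

# Running at 2.5 m/s means one kilometer takes 400 seconds.
print(format_pace(pace_from_speed(2.5)))
```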
The next releases will focus on reading Nike Run app workouts and on a plugin to support marathon results. It will be awesome, so stay tuned!
We are constantly developing Runpandas, improving its existing features and adding new ones. We will be glad to hear from you about what you like or don't like and what features you would like to see in upcoming releases. Please feel free to contact us.