The current state of the project is early beta, which means that features can be added, removed, or changed in backwards-incompatible ways.

This major release adds summary statistics for a single activity and the ability to combine multiple workouts into one multi-dimensional dataframe (a season), enabling other types of analysis, including historical performance over a period of time. RunPandas 0.5.0 includes:

  • The Activity can now be summarised through common summary statistics using the runpandas.types.summary method.
  • Multiple activities can now be analysed together by combining them into a single Activity, which opens up aggregated analysis over a group of workouts.
  • There is a new accessor, runpandas.acessors.season, that computes the running metrics across the combined activities.
  • Finally, there is a runpandas.types.session_summary method that provides summary statistics over the season (group) of activities.

What is Runpandas?

Runpandas is a Python package based on the pandas data analysis library that makes it easier to analyse your running sessions stored in tracking files from cellphones and GPS smartwatches or in social sports applications such as Strava, MapMyRun, NikeRunClub, etc. It is designed for reading, transforming, and computing running metrics from several tracking file formats and apps.

Main Features

Summary statistics for a single workout

Runpandas provides the runpandas.types.summary method for the Activity dataframe. This method computes estimates of the total distance covered, the total duration, the time spent moving, and several average metrics such as speed, pace, cadence, and heart rate, calculated over either the total duration or the time spent moving.

#Disable Warnings for a better visualization
import warnings
warnings.filterwarnings('ignore')
#!pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./data/sample.tcx')
activity
alt dist hr lon lat
time
00:00:00 178.942627 0.000000 62 -79.093187 35.951880
00:00:01 178.942627 0.000000 62 -79.093184 35.951880
00:00:06 178.942627 1.106947 62 -79.093172 35.951868
00:00:12 177.500610 13.003035 62 -79.093228 35.951774
00:00:16 177.500610 22.405027 60 -79.093141 35.951732
... ... ... ... ... ...
00:32:51 170.290649 4613.641602 178 -79.093241 35.951341
00:32:56 169.810059 4630.377930 178 -79.093192 35.951486
00:33:02 168.848755 4652.966309 179 -79.093086 35.951671
00:33:07 167.887329 4671.572754 179 -79.093000 35.951824
00:33:11 166.445312 4686.311035 180 -79.093014 35.951954

383 rows × 5 columns

#compute the common metrics for the running activity such as distance per position, speed, pace, etc.
activity['distpos']  = activity.compute.distance()
activity['speed']  = activity.compute.speed(from_distances=True)
activity['vam'] = activity.compute.vertical_speed()
activity_only_moving = activity.only_moving()
activity_only_moving.summary()
Session                           Running: 26-12-2012 21:29:53
Total distance (meters)                                4686.31
Total ellapsed time                            0 days 00:33:11
Total moving time                              0 days 00:33:05
Average speed (km/h)                                   8.47656
Average moving speed (km/h)                            8.49853
Average pace (per 1 km)                        0 days 00:07:04
Average pace moving (per 1 km)                 0 days 00:07:03
Average cadence                                            NaN
Average moving cadence                                     NaN
Average heart rate                                     156.653
Average moving heart rate                                157.4
Average temperature                                        NaN
dtype: object

The result above is a pandas.Series containing the main running statistics for the workout.
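As a rough cross-check of the figures above, the averages can be approximated from the totals alone. The sketch below is not the runpandas implementation: it assumes that average speed is total distance divided by time and that pace is the time needed to cover 1 km, and the helper names are hypothetical.

```python
from datetime import timedelta


def average_speed_kmh(distance_m: float, duration: timedelta) -> float:
    """Average speed in km/h from a distance in meters and a duration."""
    return (distance_m / duration.total_seconds()) * 3.6


def average_pace_per_km(distance_m: float, duration: timedelta) -> timedelta:
    """Time needed to cover 1 km, truncated to whole seconds."""
    seconds_per_km = duration.total_seconds() / (distance_m / 1000)
    return timedelta(seconds=int(seconds_per_km))


# Totals copied from the summary output above.
distance_m = 4686.31
elapsed = timedelta(minutes=33, seconds=11)
moving = timedelta(minutes=33, seconds=5)

speed = average_speed_kmh(distance_m, elapsed)
moving_speed = average_speed_kmh(distance_m, moving)
pace = average_pace_per_km(distance_m, elapsed)
moving_pace = average_pace_per_km(distance_m, moving)

print(f"{speed:.2f} km/h elapsed, {moving_speed:.2f} km/h moving")
print(pace, moving_pace)
```

For this activity the sketch gives roughly 8.47 and 8.50 km/h, with paces of 0:07:04 and 0:07:03. The speeds differ slightly from the reported 8.47656 and 8.49853, which suggests the library derives them from per-sample data rather than from the totals.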

Combining multiple activities into a Grouped Activity Dataframe

Runpandas, being built on pandas, can take advantage of pandas.MultiIndex, which lets a dataframe use multiple columns as the row identifier, with the index columns related to each other through a parent/child relationship. In our scenario, the start time of each activity is the first index level and the timestamps within the activity are the second index level. This enables advanced statistical analysis across a set of training sessions or over a period of time.

The code chunk below loads the data using the method runpandas.read_dir_aggregate, which reads all the tracking files of a supported format in a directory and combines them into a single dataframe split by session, based on the timestamps of each activity. This means that each workout file is stored in its own group of rows in the dataframe.

import runpandas
session = runpandas.read_dir_aggregate(dirname='./data/session/')
session
alt hr lon lat
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055
00:00:01.091000 NaN NaN -34.893624 -8.045054
00:00:02.091000 NaN NaN -34.893641 -8.045061
00:00:03.098000 NaN NaN -34.893655 -8.045063
00:00:04.098000 NaN NaN -34.893655 -8.045065
... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602
00:52:43.582000 NaN NaN -34.894465 -8.046533
00:52:44.582000 NaN NaN -34.894443 -8.046515
00:52:45.582000 NaN NaN -34.894429 -8.046494
00:52:49.582000 NaN 190.0 -34.894395 -8.046398

48794 rows × 4 columns

session.index  #MultiIndex (start, timestamp)
MultiIndex([('2020-08-30 09:08:51.012000',        '00:00:00'),
            ('2020-08-30 09:08:51.012000', '00:00:01.091000'),
            ('2020-08-30 09:08:51.012000', '00:00:02.091000'),
            ('2020-08-30 09:08:51.012000', '00:00:03.098000'),
            ('2020-08-30 09:08:51.012000', '00:00:04.098000'),
            ('2020-08-30 09:08:51.012000', '00:00:05.096000'),
            ('2020-08-30 09:08:51.012000', '00:00:06.096000'),
            ('2020-08-30 09:08:51.012000', '00:00:07.097000'),
            ('2020-08-30 09:08:51.012000', '00:00:08.097000'),
            ('2020-08-30 09:08:51.012000', '00:00:09.102000'),
            ...
            ('2021-07-04 11:23:19.418000', '00:52:18.584000'),
            ('2021-07-04 11:23:19.418000', '00:52:22.584000'),
            ('2021-07-04 11:23:19.418000', '00:52:26.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:30.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:35.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:39.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:43.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:44.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:45.582000'),
            ('2021-07-04 11:23:19.418000', '00:52:49.582000')],
           names=['start', 'time'], length=48794)
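The same two-level (start, time) shape can be reproduced in plain pandas, which helps when exploring how selection and aggregation behave. The sketch below uses made-up heart rate values and a hypothetical helper, toy_activity. It shows one way to build such a frame with pd.concat and keys, and is not a description of how read_dir_aggregate works internally.

```python
import pandas as pd


def toy_activity(hr_values):
    """Build a tiny single-activity frame indexed by elapsed time (made-up data)."""
    time = pd.to_timedelta(list(range(len(hr_values))), unit="s")
    return pd.DataFrame({"hr": hr_values}, index=pd.Index(time, name="time"))


# One start timestamp per activity; these become the first index level.
starts = [pd.Timestamp("2020-08-30 09:08:51"), pd.Timestamp("2021-07-04 11:23:19")]
frames = [toy_activity([150, 152, 154]), toy_activity([160, 162, 164])]

# Stack the activities; `keys` adds the outer "start" level above "time".
toy_session = pd.concat(frames, keys=starts, names=["start", "time"])

# Select every row of one activity by its start timestamp.
first_run = toy_session.xs(starts[0], level="start")

# Aggregate per activity by grouping on the first index level.
mean_hr = toy_session.groupby(level="start")["hr"].mean()

print(first_run)
print(mean_hr)
```

Grouping on the outer level is what makes per-workout statistics straightforward: each group contains exactly the rows of one activity.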

Session compute metrics methods

The package now comes with an accessor, runpandas.types.acessors.session._SessionAcessor, that holds special methods for computing the running metrics across all the activities. The calls delegate to the single-activity metrics accessors.

#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
alt hr lon lat distpos dist
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055 NaN NaN
00:00:01.091000 NaN NaN -34.893624 -8.045054 1.690587 1.690587
00:00:02.091000 NaN NaN -34.893641 -8.045061 2.095596 3.786183
00:00:03.098000 NaN NaN -34.893655 -8.045063 1.594298 5.380481
00:00:04.098000 NaN NaN -34.893655 -8.045065 0.163334 5.543815
... ... ... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602 12.015437 8220.018885
00:52:43.582000 NaN NaN -34.894465 -8.046533 10.749779 8230.768664
00:52:44.582000 NaN NaN -34.894443 -8.046515 3.163638 8233.932302
00:52:45.582000 NaN NaN -34.894429 -8.046494 2.851535 8236.783837
00:52:49.582000 NaN 190.0 -34.894395 -8.046398 11.300740 8248.084577

48794 rows × 6 columns
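To make the two new columns concrete: distpos is the distance between consecutive positions and dist is its running total within an activity. One common way to get the distance between two GPS points is the haversine formula. Whether runpandas uses exactly this formula is an assumption here, and the function names below are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, a common approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distances(points):
    """Return (distpos, dist): per-step distances and their running total."""
    distpos = [haversine_m(*p, *q) for p, q in zip(points, points[1:])]
    dist, total = [], 0.0
    for step in distpos:
        total += step
        dist.append(total)
    return distpos, dist


# (lat, lon) pairs from the first rows of the session above, as displayed (rounded).
points = [(-8.045055, -34.893609), (-8.045054, -34.893624), (-8.045061, -34.893641)]
distpos, dist = distances(points)
print(distpos)
print(dist)
```

The sketch returns one value fewer than the number of points, because the first position has no predecessor. That corresponds to the NaN in the first row of each activity above. The results will not match the table exactly, since the displayed coordinates are rounded.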

#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()

How many activities are there in the session? There is a custom method, count, that returns the total number of activities in the season frame.

print(session.session.count(), 'activities')
68 activities
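The accessor pattern used in this section can be sketched with the public pandas extension API, pd.api.extensions.register_dataframe_accessor. The accessor name toy_session, its methods, and the data below are hypothetical and are not part of runpandas. The point is only to show how a method can count activities and run a computation separately inside each one.

```python
import pandas as pd


# "toy_session" is a hypothetical accessor name used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("toy_session")
class ToySessionAccessor:
    def __init__(self, frame: pd.DataFrame):
        self._frame = frame

    def count(self) -> int:
        """Number of distinct activities (values of the first index level)."""
        return self._frame.index.get_level_values("start").nunique()

    def hr_change(self) -> pd.Series:
        """Row-to-row heart rate change, computed separately inside each activity."""
        return self._frame.groupby(level="start")["hr"].diff()


# Made-up data: two activities with three samples each.
starts = pd.to_datetime(["2020-08-30 09:08:51", "2021-07-04 11:23:19"])
offsets = pd.to_timedelta([0, 1, 2], unit="s")
index = pd.MultiIndex.from_product([starts, offsets], names=["start", "time"])
frame = pd.DataFrame({"hr": [150, 152, 155, 160, 162, 165]}, index=index)

print(frame.toy_session.count(), "activities")
change = frame.toy_session.hr_change()
print(change)
```

Because the difference is taken per group, the first row of every activity is NaN instead of being compared with the last row of the previous workout.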

Session summary statistics

After loading the activities and computing their metrics, we can produce basic summaries of the training sessions: time spent, total distance, mean speed, and other insightful statistics for each running activity. To do so, call the method runpandas.types.session._SessionAcessor.summarize. It returns a DataFrame with the aggregated statistics per activity from the season frame.

summary = session.session.summarize()
summary
moving_time mean_speed max_speed mean_pace max_pace mean_moving_speed mean_moving_pace mean_cadence max_cadence mean_moving_cadence mean_heart_rate max_heart_rate mean_moving_heart_rate mean_temperature min_temperature max_temperature total_distance ellapsed_time
start
2020-07-03 09:50:53.162 00:25:29.838000 2.642051 4.879655 00:06:18 00:03:24 2.665008 00:06:15 NaN NaN NaN 178.819923 188.0 178.872587 NaN NaN NaN 4089.467333 00:25:47.838000
2020-07-05 09:33:20.999 00:05:04.999000 2.227637 6.998021 00:07:28 00:02:22 3.072098 00:05:25 NaN NaN NaN 168.345455 176.0 168.900000 NaN NaN NaN 980.162640 00:07:20.001000
2020-07-05 09:41:59.999 00:18:19 1.918949 6.563570 00:08:41 00:02:32 2.729788 00:06:06 NaN NaN NaN 173.894180 185.0 174.577143 NaN NaN NaN 3139.401118 00:27:16
2020-07-13 09:13:58.718 00:40:21.281000 2.509703 8.520387 00:06:38 00:01:57 2.573151 00:06:28 NaN NaN NaN 170.808176 185.0 170.795527 NaN NaN NaN 6282.491059 00:41:43.281000
2020-07-17 09:33:02.308 00:32:07.691000 2.643278 8.365431 00:06:18 00:01:59 2.643278 00:06:18 NaN NaN NaN 176.436242 186.0 176.436242 NaN NaN NaN 5095.423045 00:32:07.691000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-06-13 09:22:30.985 01:32:33.018000 2.612872 23.583956 00:06:22 00:00:42 2.810855 00:05:55 NaN NaN NaN 169.340812 183.0 169.655879 NaN NaN NaN 15706.017295 01:40:11.016000
2021-06-20 09:16:55.163 00:59:44.512000 2.492640 6.065895 00:06:41 00:02:44 2.749453 00:06:03 NaN NaN NaN 170.539809 190.0 171.231392 NaN NaN NaN 9965.168311 01:06:37.837000
2021-06-23 09:37:44.000 00:26:49.001000 2.501796 5.641343 00:06:39 00:02:57 2.568947 00:06:29 NaN NaN NaN 156.864865 171.0 156.957031 NaN NaN NaN 4165.492241 00:27:45.001000
2021-06-27 09:50:08.664 00:31:42.336000 2.646493 32.734124 00:06:17 00:00:30 2.661853 00:06:15 NaN NaN NaN 166.642857 176.0 166.721116 NaN NaN NaN 5074.217061 00:31:57.336000
2021-07-04 11:23:19.418 00:47:47.583000 2.602263 4.212320 00:06:24 00:03:57 2.856801 00:05:50 NaN NaN NaN 177.821862 192.0 177.956967 NaN NaN NaN 8248.084577 00:52:49.582000

68 rows × 18 columns

print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runs')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))
Session Interval: 366 days
Total Workouts: 68 runs
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23

As we can see above, we analyzed a period of 366 days (one year) of running workouts. In this period, the runner ran 68 times, covering a total distance of almost 492 km. The average moving pace is 06'02" per km and the average distance is 7.23 km. Great numbers for a beginner runner!
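Because the summary is indexed by each activity's start time, it can also be regrouped by calendar period to follow progress over time. The sketch below uses a small made-up frame with the same total_distance column name as the summary above. It relies only on plain pandas and does not use any runpandas API.

```python
import pandas as pd

# Toy stand-in for the summary frame: one row per activity, indexed by start time.
toy_summary = pd.DataFrame(
    {"total_distance": [4000.0, 1000.0, 3000.0, 6000.0]},  # meters, made-up values
    index=pd.DatetimeIndex(
        ["2020-07-03 09:50", "2020-07-05 09:33", "2020-07-13 09:13", "2020-08-02 09:00"],
        name="start",
    ),
)

# Calendar month of each activity, used as the grouping key.
month = toy_summary.index.to_period("M")

# Total distance in kilometers and number of runs per month.
monthly_km = toy_summary["total_distance"].groupby(month).sum() / 1000
runs_per_month = toy_summary.groupby(month).size()

print(monthly_km)
print(runs_per_month)
```

Swapping "M" for "W" groups the same data by week instead.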

What is coming next?

The next releases will focus on reading Nike Run app workouts and on a plugin supporting marathon results. It will be awesome, so stay tuned!

Thanks

We are constantly developing Runpandas, improving its existing features and adding new ones. We will be glad to hear from you about what you like or don’t like and what features you would like to see in upcoming releases. Please feel free to contact us.