Equine Geometry: how measuring yearling angular and geometric data helps to identify horses with high performance potential

Tom Wilson
Dec 29, 2023


Geometry and Trigonometry were things that I thought we’d left behind at school. However, a recent project exploring Equine Geometry shows some early promise in helping identify correlations between angular measurements and future performance potential of horses. So it’s time to dust off those old textbooks.

As part of this project we generated angular data on 5,500 yearlings offered across Australia and New Zealand between 2019–2023. To generate the source data, we created an angular profile and associated measurements for each horse from the conformation images uploaded to the sales websites. Each yearling image is run through an image recognition model trained to place markers on key parts or joints of the horse's body, and measurements are then generated from the angles formed between these points.

For each horse, the image model generates a series of angular measurements, taking the angle at each of the red dots on the diagrams. An example is shown below.
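The angle measurement step can be sketched as follows. This is a minimal illustration, not the model's actual code: it assumes each marker is an (x, y) pixel coordinate and measures the angle at the middle point of a three-point chain such as Shoulder–Elbow–Fore Knee. Because some angles quoted later in this article exceed 180 degrees, the sketch also includes a directed 0–360 degree version; whether the model measures angles that way is an assumption. The keypoint names and coordinates in the usage lines are hypothetical.

```python
import math


def _vectors(a, b, c):
    # Rays from the vertex b towards a and towards c.
    return (a[0] - b[0], a[1] - b[1]), (c[0] - b[0], c[1] - b[1])


def interior_angle(a, b, c):
    """Unsigned angle at vertex b in degrees, in the range [0, 180]."""
    v1, v2 = _vectors(a, b, c)
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def directed_angle(a, b, c):
    """Angle at b in degrees, in [0, 360), swept from ray b->a to ray b->c.

    The sweep is counterclockwise in x-right/y-up axes. In image coordinates
    (y pointing down) the same formula sweeps clockwise on screen, so angles
    above 180 degrees depend on which side of the joint is measured.
    """
    v1, v2 = _vectors(a, b, c)
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(cross, dot)) % 360


# Hypothetical keypoints (pixel coordinates) for one yearling image.
keypoints = {"shoulder": (410, 220), "elbow": (395, 330), "fore_knee": (400, 450)}
shoulder_elbow_knee = directed_angle(
    keypoints["shoulder"], keypoints["elbow"], keypoints["fore_knee"]
)
```

Using `atan2` on the cross and dot products, rather than `acos` of a normalised dot product, stays numerically stable when the three points are nearly collinear.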

In our database we currently have angular measurements of ~5,500 yearlings offered at commercial yearling sales across Australia and New Zealand between 2019–2023. These measurements, taken at the time of the yearling sale, are then compared with the horses' subsequent performance records, mainly to establish which of the following categories each horse fell into:

Non Runner — didn’t make the track
Non Winner — ran but didn’t win a race
Winner — won a race
Stakes Winner — won a stakes race

The key angles and joint measurements that we observed to influence the identification of future stakes performers were concentrated in two main areas of the horse:

Shoulder–Elbow: as a proxy for stride length

Stifle–Hip–Buttock: the features on the hindquarter of the horse associated with generating locomotive power.

The feature importance table below shows the angular measurements that were most predictive of stakes performers:

Yearlings with a Wither–Stifle–Elbow angle above 78 degrees struggled to become elite racehorses. Across the dataset, the median stifle angle was 76.9 degrees for stakes-level horses vs. 78.17 degrees for non-winners.

The angular data generated from the shoulder was also highly correlated with future performance. For the shoulder angle measured from the top of the head to the elbow (Head–Shoulder–Elbow), stakes horses had a median of 162.27 degrees vs. 163.66 degrees for non-winners.

Shoulder–Elbow–Fore Knee: stakes winners had a median angle of 230.16 degrees vs. 228.9 degrees for non-winners.
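The median comparisons above amount to grouping horses by outcome and taking the median of one angle per group. A minimal sketch, assuming each horse is a dict with an `outcome` field and one field per angle; the field names and the toy values in the usage lines are hypothetical, not the project's data.

```python
from collections import defaultdict
from statistics import median


def median_by_outcome(records, angle):
    """Median of one angle column for each outcome category."""
    groups = defaultdict(list)
    for record in records:
        groups[record["outcome"]].append(record[angle])
    return {outcome: median(values) for outcome, values in groups.items()}


# Toy records for illustration only.
toy_records = [
    {"outcome": "Stakes Winner", "stifle": 76.0},
    {"outcome": "Stakes Winner", "stifle": 77.0},
    {"outcome": "Stakes Winner", "stifle": 78.5},
    {"outcome": "Non Winner", "stifle": 78.0},
    {"outcome": "Non Winner", "stifle": 79.0},
]
stifle_medians = median_by_outcome(toy_records, "stifle")
```

Medians are a sensible summary here because a few badly placed markers can produce outlier angles that would drag a mean around.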

The individual angles are important, but the relationships between key angles and joints are also informative.

As mentioned above, the angular relationship between the stifle and hip is important for future performance. Stakes horses generally had a stifle angle below 77 degrees and an outer hip angle below 294 degrees.

Blue dots represent Stakes Winners, Red dots are Non Runners / Non Winners.

The relationship between the stifle angle and the shoulder angle also correlated with future performance outcomes. A high proportion of stakes horses combined a stifle angle below 77 degrees with a Head–Shoulder–Elbow angle below 166 degrees.
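The two "quadrant" rules above can be expressed as simple threshold filters, and comparing the stakes-winner rate inside a quadrant with the overall rate shows whether the quadrant is enriched for elite horses. A minimal sketch: the thresholds come from this article, but the field names and toy horses are hypothetical.

```python
def in_stifle_shoulder_quadrant(horse):
    # Stifle below 77 degrees combined with Head–Shoulder–Elbow below 166 degrees.
    return horse["stifle"] < 77 and horse["head_shoulder_elbow"] < 166


def in_stifle_hip_quadrant(horse):
    # Stifle below 77 degrees combined with an outer hip angle below 294 degrees.
    return horse["stifle"] < 77 and horse["outer_hip"] < 294


def stakes_rate(horses, rule=None):
    """Share of stakes winners among horses passing `rule` (all horses if None)."""
    subset = [h for h in horses if rule is None or rule(h)]
    if not subset:
        return 0.0
    return sum(h["stakes_winner"] for h in subset) / len(subset)


# Toy horses for illustration only.
toy_horses = [
    {"stifle": 75.0, "head_shoulder_elbow": 162.0, "outer_hip": 290.0, "stakes_winner": True},
    {"stifle": 76.5, "head_shoulder_elbow": 165.0, "outer_hip": 293.0, "stakes_winner": False},
    {"stifle": 79.0, "head_shoulder_elbow": 168.0, "outer_hip": 296.0, "stakes_winner": False},
    {"stifle": 78.0, "head_shoulder_elbow": 163.0, "outer_hip": 291.0, "stakes_winner": False},
]
overall = stakes_rate(toy_horses)
quadrant = stakes_rate(toy_horses, in_stifle_shoulder_quadrant)
```

On real data, a quadrant rate well above the overall rate (around 5% of horses become stakes winners) is what makes a region of the scatter plot worth attention.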

Take IMPERATRIZ as an example: her angular profile puts her firmly in the middle of the quadrant that elite horses come from.

Her Wither–Stifle–Elbow angle is 74.81 degrees and her hip angle is 75.41 degrees. She is the yellow dot in the image, in the middle of a cluster of stakes-level horses.

Now that we understand which data points influence future performance, we can model the performance potential of new horses for which we generate angular data.

Using our historic dataset of yearlings and their subsequent performance records, we developed a classification model that outputs a score for each new horse it sees, reflecting whether, based on the data inputs, it expects the horse to be a Stakes Winner or a Non Winner.

We tried a number of different machine learning algorithms, and a Naive Bayes classifier proved to be the best performer. To explain the metrics below: AUC stands for “Area Under the ROC Curve” and measures how well a model separates the two classes, i.e. the probability that it ranks a randomly chosen stakes winner above a randomly chosen non-winner. A result of 0.5 indicates performance similar to a coin toss, and a result of 1.0 means perfect separation every time (or an overfit model).
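The article does not say which Naive Bayes variant was used. For continuous angle features, a common choice is Gaussian Naive Bayes, sketched below from scratch under that assumption: each angle is treated as normally distributed within each class and independent of the other angles given the class. In practice a library implementation such as scikit-learn's `GaussianNB` does the same job; the toy training data here is hypothetical.

```python
import math
from statistics import mean, pvariance


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes over lists of numeric features."""

    def fit(self, X, y, var_floor=1e-6):
        self.classes = sorted(set(y))
        self.priors, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            # Per-feature mean and variance within this class; the floor
            # avoids division by zero when a feature is constant in a class.
            self.stats[c] = [
                (mean(col), max(pvariance(col), var_floor)) for col in zip(*rows)
            ]
        return self

    def _log_joint(self, x, c):
        # log P(class) + sum of log Gaussian densities of each feature.
        total = math.log(self.priors[c])
        for value, (mu, var) in zip(x, self.stats[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total

    def predict_proba(self, x):
        logs = {c: self._log_joint(x, c) for c in self.classes}
        top = max(logs.values())  # subtract the max for numerical stability
        exps = {c: math.exp(v - top) for c, v in logs.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(probs, key=probs.get)


# Toy features: [stifle, head_shoulder_elbow] (hypothetical values).
X = [[76.0, 162.0], [76.5, 162.5], [77.0, 163.0],
     [78.0, 164.0], [78.5, 164.5], [79.0, 165.0]]
y = ["Stakes Winner"] * 3 + ["Non Winner"] * 3
model = GaussianNaiveBayes().fit(X, y)
```

The `predict_proba` output for the stakes class is the kind of per-horse score that can be turned into a rating.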

The AUC of 0.73 is a reasonably good score for a classification problem as complex as identifying yearling ability, and the model can be useful in generating a set of ratings by which to evaluate individual horses.
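AUC has a direct probabilistic reading that the sketch below computes literally: compare every (stakes winner, non-winner) pair and count how often the stakes winner gets the higher score, with ties counting as half. This is the Mann–Whitney formulation of ROC AUC; the scores and labels in the usage lines are made up for illustration.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann–Whitney U / (n_pos * n_neg)).

    labels: truthy for the positive class (stakes winner), falsy otherwise.
    O(n_pos * n_neg), which is fine for a few hundred horses per class.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative scores: one stakes winner is ranked below one non-winner.
auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

Here three of the four positive/negative pairs are ordered correctly, giving 0.75; an AUC of 0.73 means roughly the same thing at the scale of the held-out set.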

We evaluated the model by giving it a set of yearlings held back from the original training dataset (330 non-winners and 300 stakes winners) and asking it to predict the correct classification for each.

Note that in real-world use the ratio of non-winners to stakes winners is not 50/50: only around 5% of horses become stakes winners. The classification step is still useful for generating a rating system and a score for each individual.

On this held-out sample, which contained roughly equal numbers of stakes winners and non-winners, the angular model correctly identified around two-thirds of stakes horses vs. non-winners. From a ratings-system perspective, it does a decent job of pushing higher-ability horses to the top and lower-ability horses to the bottom.
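The "ratings system" claim can be checked by ranking held-out horses by model score, splitting the ranking into equal bands, and measuring the share of stakes winners in each band. A minimal sketch with made-up scores and labels; the number of bands is an arbitrary choice.

```python
def stakes_share_by_band(scores, labels, n_bands=3):
    """Share of stakes winners in each score band, highest band first.

    Horses are sorted by score (descending) and split into n_bands bands
    of as-equal-as-possible size; labels are 1/True for stakes winners.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    bands = [[] for _ in range(n_bands)]
    for rank, (_, label) in enumerate(ranked):
        bands[rank * n_bands // len(ranked)].append(label)
    return [sum(band) / len(band) if band else float("nan") for band in bands]


# Illustrative held-out scores and outcomes.
shares = stakes_share_by_band(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0], n_bands=3
)
```

A useful rating shows the stakes share falling steadily from the top band to the bottom band, even if individual predictions are often wrong.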

Show me Similar Horses

Something I’ve wanted to do for a while is take the data we generate on yearlings and compare it with previous elite horses, effectively showing a comparison with “similar horses” we’ve seen in the dataset. It’s another way of looking at this type of data, and I think this type of clustering can be useful in helping to identify elite individuals.

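Using a Louvain Clustering approach, we generated a network that groups similar horses together into communities. Louvain is a community-detection algorithm; applied to feature data, it typically works on a nearest-neighbour graph, linking each horse to the horses most similar to it and then grouping the nodes into communities that are more densely connected internally than to the rest of the network.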

Each horse in the network is represented as a node, and the features generated from that horse's angular conformation data help define the community in which the node (horse) sits.
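A minimal sketch of this kind of pipeline, assuming NetworkX 2.8 or later (which provides `louvain_communities`): standardise each angle, connect every horse to its k nearest neighbours by Euclidean distance, then run Louvain on the resulting graph. The value of k, the distance-to-weight mapping and the toy horses are all illustrative assumptions, not the settings used for the network in this article.

```python
import math
from statistics import mean, pstdev

import networkx as nx
from networkx.algorithms.community import louvain_communities


def standardise(features):
    """Z-score each angle column so no single angle dominates the distances."""
    names = list(features)
    cols = list(zip(*(features[n] for n in names)))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return {
        n: [(v - mu) / sd for v, (mu, sd) in zip(features[n], stats)]
        for n in names
    }


def knn_graph(features, k=5):
    """Undirected graph linking each horse to its k nearest neighbours."""
    graph = nx.Graph()
    graph.add_nodes_from(features)
    names = list(features)
    for a in names:
        nearest = sorted(
            (math.dist(features[a], features[b]), b) for b in names if b != a
        )
        for d, b in nearest[:k]:
            # Closer horses get heavier edges (hypothetical weighting).
            graph.add_edge(a, b, weight=1.0 / (1.0 + d))
    return graph


def cluster_horses(features, k=5, seed=42):
    graph = knn_graph(standardise(features), k)
    return louvain_communities(graph, weight="weight", seed=seed)


# Toy angle profiles [stifle, head_shoulder_elbow] for six hypothetical horses.
toy_features = {
    "A1": [75.0, 162.0], "A2": [75.5, 162.5], "A3": [76.0, 162.0],
    "B1": [80.0, 170.0], "B2": [80.5, 170.5], "B3": [81.0, 170.0],
}
communities = cluster_horses(toy_features, k=2)
```

Fixing `seed` makes the community assignment reproducible between runs, since Louvain visits nodes in a random order.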

This is a current visual representation of our network: red dots are non-winners, blue dots are stakes winners, and green dots are yearlings offered at the 2024 MM Gold Coast Yearling Sale.

From the visualisation we can identify communities of horses that might be worth further investigation or, conversely, that might be worth ruling out because their likelihood of producing high performers is low.

For example, I wouldn’t be too interested in buying a yearling from this cluster, as no stakes winners have come from it.

It’s probably also not worth spending too much time on horses in this cluster, as it contains a very low proportion of future stakes winners.

It could be worth exploring the following communities of similar horses:

The cluster in the bottom left contains a group of elite sprinter/miler types: Mamargan, Coolangatta, Diamonds, Every Rose, Poland and Vangelic.

A horse that is scoring well on our angular model and comes from a cluster with a high proportion of stakes horses could certainly be worth further investigation.
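Combining the two signals can be sketched as: compute the stakes-winner rate of each community from horses with known outcomes, then shortlist unraced yearlings that both score well on the model and sit in a high-rate community. A minimal sketch; the thresholds, names and toy data are hypothetical, and the communities are assumed to be a list of sets of horse names.

```python
def cluster_stakes_rates(communities, outcomes):
    """Stakes-winner rate per community, ignoring horses with unknown outcomes.

    outcomes: name -> True (stakes winner), False (non-winner) or None
    (no result yet, e.g. yearlings at the current sale).
    """
    rates = []
    for community in communities:
        known = [outcomes[h] for h in community if outcomes.get(h) is not None]
        rates.append(sum(known) / len(known) if known else None)
    return rates


def shortlist(communities, outcomes, scores, min_score=0.6, min_cluster_rate=0.3):
    """Unraced horses with a high model score in a high-rate community."""
    picks = []
    rates = cluster_stakes_rates(communities, outcomes)
    for community, rate in zip(communities, rates):
        if rate is None or rate < min_cluster_rate:
            continue
        for horse in community:
            if outcomes.get(horse) is None and scores.get(horse, 0.0) >= min_score:
                picks.append(horse)
    return sorted(picks)


# Toy example: Y1 and Y2 stand in for yearlings at a current sale.
toy_communities = [{"A1", "A2", "Y1"}, {"B1", "B2", "Y2"}]
toy_outcomes = {"A1": True, "A2": False, "B1": False, "B2": False, "Y1": None, "Y2": None}
toy_scores = {"Y1": 0.7, "Y2": 0.8}
picks = shortlist(toy_communities, toy_outcomes, toy_scores)
```

Y2 has the higher model score but is left out because its community has produced no stakes winners, mirroring the "worth counting out" clusters described above.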

For further discussion you can always contact me at racingsquared@gmail.com.
