As far as I know, no one is capturing the ITF data - the data disappears from the live scores app after a few days for all matches.
The Tennis Abstract data is mostly WTA, as that is official and comes for 'free'.
They also have the tennis match charting project, in which people watch old matches, keep stats for them and report them. Some of the fruits of that are included in the data, though upwards of 80% of it is on men's matches.
So, yes, there would be heavy caveats as only players existing mostly on the WTA tour or in Slams have a reasonable baseline from which to paint the picture of the elements of their games and the stories they might tell, from the key stats.
And that is borne out by the nice figures you have produced, with our WTA level players represented roughly in proportion to the number of WTA matches they have each played.
The ATP side is much more complete: it includes all Challengers, which on the WTA side would take you down to the ~$50/60Ks, and it also has more historical ITF data. As I mentioned in reply to JonH recently, ~90% of all ongoing ad hoc stats work by people at large is for men's tennis.
AliBlahBlah, thanks for sharing your knowledge yet again.
I would imagine that all the historic ITF match data are held in their own database(s).
Do you think we might be able to gain support for at least some basic elements of that data to be released into the public domain by the ITF?
I could see the data being used in many beneficial ways, both for the promotion of tennis and for the 'advancement of data handling skills' amongst the wider population :
- the basic tennis match data is an interesting data set, in that it's very simple, yet it holds many stories within and can be subset in many ways ( date, nation, age, gender, tournament level, game element, etc ).
- the use in academia ( schools and colleges ) in computing projects involving data storage, consolidation, extraction, transformation, grouping, interpretation, and finally presentation of results. The advantage for tennis is that it gets youngsters thinking about the game, its scoring system, how matches can swing one way or another very quickly, and how players might be losing but should never give up till after the last point is played.
- the use by the public (on forums such as this) to discuss tennis in a positive way, and be in a position to counter negativity with bona fide data.
Perhaps the players would support this initiative?
I had this discussion with the now sadly departed correspondent before the current reliable source (erm, I mean ISF). They had contacted all the tennis authorities to try and license their API for data use, to no avail. It's a revenue thing. The official stats are licensed at considerable profit to the ITF & WTA, mostly to betting companies (it helps them price their odds) and broadcasters. The chances of that data being officially made freely available are next to nil, or were 5 or 6 years ago, and I can't see that anything has changed since then.
I'm sure - or even, rather, I know - the ITF & WTA store massive amounts of match data, tables and variables to which their official statisticians have access. This also includes the much longed for deep historical data, over decades. You can get that data, but the prices quoted were apparently at Enterprise level, and there are severe limitations on its usage.
Unless anyone knows differently, and things have now changed? Hence the boon that Jeff Sackmann's Tennis Abstract data provides.
Maybe it could be doable, but it would probably need a groundswell of influential folk (players?).
If done right it shouldn't affect the ability of the ITF to generate the revenue stream they currently enjoy.
True, it might have to be a limited dataset, possibly excluding the last 6 months' data, and with just the basic overall match data (not individual set data, for example).
Maybe just enough to be useful :
- match metadata (tournament info, surface, date, round, etc)
- match player data (ranking going in to tournament, qualification mode, name, id, winner and loser)
- match result (overall result, duration)
- match data ( total service points, total 1st / 2nd service in and win, aces, df's ( for both winner and loser ) )
That's it. Percentages and receiving stats can be calculated from the service count data.
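To illustrate that last point, here is a rough Python sketch of how the percentages and receiving stats could fall out of the raw service counts. The field names, function names and numbers are all made up for illustration; they are not taken from any ITF or Tennis Abstract format.

```python
from dataclasses import dataclass


@dataclass
class ServeCounts:
    """Raw service counts for one player in one match (hypothetical field names)."""
    service_points: int   # total points served
    first_in: int         # first serves that landed in
    first_won: int        # points won behind the first serve
    second_won: int       # points won behind the second serve
    aces: int
    double_faults: int


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage, or 0.0 when there is nothing to divide by."""
    return 100.0 * numerator / denominator if denominator else 0.0


def derived_stats(player: ServeCounts, opponent: ServeCounts) -> dict:
    """Derive serving and receiving stats for `player` from both sets of counts."""
    # Any service point that was not a first serve in went to a second serve.
    second_serve_points = player.service_points - player.first_in
    # Every point the opponent served but did not win was won by the returner.
    return_won = opponent.service_points - opponent.first_won - opponent.second_won
    return {
        "first_serve_in_pct": pct(player.first_in, player.service_points),
        "first_serve_won_pct": pct(player.first_won, player.first_in),
        "second_serve_won_pct": pct(player.second_won, second_serve_points),
        "ace_pct": pct(player.aces, player.service_points),
        "double_fault_pct": pct(player.double_faults, player.service_points),
        "return_points_won": return_won,
        "return_points_won_pct": pct(return_won, opponent.service_points),
    }


# Invented counts for a single match, winner first.
winner = ServeCounts(service_points=60, first_in=40, first_won=30,
                     second_won=10, aces=5, double_faults=2)
loser = ServeCounts(service_points=70, first_in=42, first_won=25,
                    second_won=12, aces=2, double_faults=4)

for name, value in derived_stats(winner, loser).items():
    print(f"{name}: {value:.1f}")
```

Swapping the two arguments gives the same set of stats for the loser, so nothing beyond the two rows of service counts would be needed.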
And the request would ***NOT*** be for public access to an ITF web portal, with the consequent access to a subset of their API, which would put the onus on the ITF to maintain it and keep it secure.
Rather, get them to just put the selected raw data into an official ITF repository on GitHub, with the appropriate license / terms of use instructions.
It just seems a shame that such a useful dataset, that could fire the imagination of future generations of both tennis players and programmers, is at the moment kept instead for the proprietary world of betting companies and media houses.
Hi Aliblahblah - just seen this. I was really saddened to hear of insomniacfolders -ISF- passing. Although I only joined the board shortly before they started posting less, I enjoyed their great stats and of course ISF helped get the Strongest Nation thread going for me/us. My sympathies and best wishes to their friends and family.