This data set consists of statistics for a number of hitters (non-pitchers) who were in the game when the data were collected several years ago. (Working out the date is left as an exercise. Hint: look at how long Pete Rose has been in the game in the data set.) The object of the analysis is to work out which factors most strongly affect how much a baseball player is paid, but we have also looked at other interesting features of the data set.
In EDV, the user reads data into the program and then selects variables
using the mouse. Menu choices allow them to derive new variables, rank
variables and otherwise modify them; various options can be set and, of
course, data views can be created.
In this example, all of the variables form one data table so all views created of the data are automatically linked. In the following analysis, we concentrate on the career data as opposed to the current year's data (e.g. CHits/Years as opposed to Hits/AtBat) and present a distilled version of our original exploration. 
The distribution of salary is clearly skewed. Since salaries are usually increased by a percentage rather than by a fixed amount, we expect this kind of exponential distribution, and analysts commonly try a log transformation to symmetrize such data. It seems to work pretty well, although there appears to be a longer tail at the low salary end: there seems to be an additional effect that gives some players lower salaries than we might expect. One factor that seems very likely to have an effect is the length of time a player has been playing; we would expect players' salaries to increase over time. We create a scatterplot of salary against years and add a smoothing line through the data. We have also colored the players by their salaries, with low-paid players shown in blue and high-paid ones in red. This plot has a very clear interpretation: for the first 5-6 years a player's salary increases exponentially (remember that the salary data is on a log scale), after which it remains more or less constant, perhaps dipping slightly towards the end of a longer career as a player's increases fail to match the average increases.
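The effect of the log transformation can be illustrated with a minimal sketch. It assumes the transform is log(1 + salary), matching the Log(1+Salary) variable name used later in this analysis; the salary figures are hypothetical values that grow by a fixed percentage, not values from the data set, and the helper names are invented for illustration.

```python
import math

def log_salary(salary):
    """Return log(1 + salary), the transform assumed for Log(1+Salary)."""
    return math.log1p(salary)

def skewness(values):
    """Third standardized moment; close to 0 for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical salaries (thousands of dollars), each 50% above the previous one.
salaries = [70 * 1.5 ** k for k in range(10)]
logged = [log_salary(s) for s in salaries]

print("skewness of raw salaries:   ", round(skewness(salaries), 3))
print("skewness of logged salaries:", round(skewness(logged), 3))
```

On data built from percentage increases, the raw values are strongly right-skewed while the logged values are nearly symmetric, which is the behavior described above.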
In a set of box plots of hitting statistics, we selected the highest-paid players. Each of the box plots and the bar chart shows the results. From the bar chart we see the age effect already noted, and in the box plots we can see that the higher-paid players are indeed better than average. In particular, note that AtBats and Hits are almost always above the median; only a few outliers are below it. This is not true of the HR (home runs) statistic: there are a significant number of well-paid players who are not big hitters.

For these views, we selected those players with very long careers.
We colored them by their average number of AtBats per year
and created a spreadsheet-like list view so we can identify them.
Although most of the hitters are fairly similar, there are two obvious unusual cases:

A variety of tables and graphs were created to examine differences between the leagues and divisions. Apart from factors attributable to the 'Designated Hitter' rule, there seemed to be no evidence of any overall effect. These plots show that neither league nor division prefers more experienced players or is more active in recruiting younger players.
This animation shows a triplot of fielding information and
a bar chart coding players' fielding positions. Note that
we have recolored by position to accentuate the movie's effect.
The triplot is a plot that allocates variation among players to three variables; in this case Errors, PutOuts, and Assists. Players who have an exactly average ratio of the three variables to each other will be drawn in the center of the triangle. If they have more of one variable, then they will be closer to that variable's corner of the triangle. The separation into two groups shows that there are two different types of fielders; there is a strong distinction between fielders with many PutOuts and those with many Assists. There are a few points in the middle area, but these turn out to be either Utility fielders or Designated Hitters. In EDV we can animate over the Position bar chart as this movie shows. Not only does this show us how fielding stats are determined by position, but it also shows how these relate to fielding errors. 
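The placement rule described above can be sketched as follows. This is an illustration under stated assumptions, not EDV's actual computation: each variable is divided by its mean over all players (so a player with exactly average ratios gets equal weights), the weights are normalized to sum to one, and the result is used as barycentric coordinates in an equilateral triangle. The corner layout, the function name, and the sample players are all hypothetical.

```python
import math

# One triangle corner per variable; this particular layout is an assumption.
CORNERS = {
    "Errors": (0.0, 0.0),
    "PutOuts": (1.0, 0.0),
    "Assists": (0.5, math.sqrt(3) / 2),
}

def triplot_points(players):
    """Map each player's three fielding counts to an (x, y) point in the triangle."""
    names = list(CORNERS)
    means = {v: sum(p[v] for p in players) / len(players) for v in names}
    points = []
    for p in players:
        # Scale by the mean so an exactly average player gets equal weights.
        scaled = {v: (p[v] / means[v] if means[v] else 0.0) for v in names}
        total = sum(scaled.values())
        if total == 0:
            points.append(None)  # no fielding activity: position is undefined
            continue
        x = sum(scaled[v] / total * CORNERS[v][0] for v in names)
        y = sum(scaled[v] / total * CORNERS[v][1] for v in names)
        points.append((x, y))
    return points

# Hypothetical players: PutOut-heavy, Assist-heavy, and exactly average.
players = [
    {"Errors": 10, "PutOuts": 300, "Assists": 20},
    {"Errors": 20, "PutOuts": 100, "Assists": 400},
    {"Errors": 15, "PutOuts": 200, "Assists": 210},
]
points = triplot_points(players)
```

In this sketch the third player lands at the centroid of the triangle, while the first two are pulled towards the PutOuts and Assists corners respectively.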
How did we know to look for the effects we have displayed in the above sections? The figure here shows how we initially looked at the data. We created a graph that robustly correlates variables of all types and indicates whether or not there is any association. This graph is then displayed via the NicheWorks component of EDV. The resulting figure shows the strength of associations among variables. The strongest links to Log(1+Salary) are to the Years variable and then to sets of batting statistics, most strongly to the highly intercorrelated set of Runs/Year, Hits/Year and AtBats/Year. The links to HR (home runs) and to Walks are much weaker. In fact, looking at the graph, we might suspect that RBIs are more important than home runs, except for the fielding statistics. The fielding statistics form their own group away from the hitting statistics. One weakness of the method we use for correlations is that it cannot detect the interesting pattern in the PutOuts-Assists association, although it flags both of them as strongly dependent on Position.
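The text does not say which robust measure of association was used. As a stand-in for numeric variables only, the sketch below uses Spearman's rank correlation, which is insensitive to outliers and to monotone transformations such as the log. It is an assumption made for illustration, not the method behind the graph, and the sample values are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical career lengths and logged salaries.
years = [1, 2, 3, 5, 8, 12]
log_salaries = [4.2, 4.9, 5.6, 6.3, 6.5, 6.4]
print(round(spearman(years, log_salaries), 3))
```

A rank correlation only captures monotone association, which is one way a correlation-style measure can miss a pattern made of two separate clusters.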
This analysis has given a good initial picture of the relationships
among the variables. We would go on to suggest building quantitative
models and confirming the information we have discovered in the data.
These results include:

gwills@research.belllabs.com