Introduction

First steps

Exercise 3.2.1 Run ggplot(data = mpg). What do you see?

This code creates an empty plot. The ggplot() function creates the background of the plot. No layers were specified with a geom function, so nothing is drawn.

Exercise 3.2.2 How many rows are in mpg? How many columns?

There are 234 rows and 11 columns in the mpg data frame. The nrow() and ncol() functions return these counts directly.
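The row and column counts for Exercise 3.2.2 can be checked directly. This minimal sketch assumes ggplot2 is installed; the mpg data frame ships with it.

```r
library(ggplot2)

# mpg is a data frame bundled with ggplot2
n_rows <- nrow(mpg)  # number of observations (rows)
n_cols <- ncol(mpg)  # number of variables (columns)
print(c(rows = n_rows, columns = n_cols))
```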
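Exercise 3.2.3 What does the drv variable describe? Read the help for ?mpg to find out.

The drv variable is a categorical variable that classifies cars by drive train: front-wheel drive (f), rear-wheel drive (r), or four-wheel drive (4).

Exercise 3.2.4 Make a scatter plot of hwy vs. cyl.

Exercise 3.2.5 What happens if you make a scatter plot of class vs. drv? Why is the plot not useful?

The resulting scatter plot has only a few points. A scatter plot is not a useful display of these variables because both drv and class are categorical and take only a small number of distinct values. A simple scatter plot does not show how many observations there are for each (x, y) value, so many observations are drawn on top of each other.

Warning: The following code uses functions introduced in a later section. Come back to this after reading section 7.5.2, which introduces methods for plotting two categorical variables. The first method is geom_count(), which sizes each point by the number of observations at that location. The second is geom_tile(), which draws one tile for each observed combination of the two variables and fills it according to its count.

In the tile plot, there are many missing tiles. These missing tiles represent unobserved combinations of class and drv values.

Aesthetic mappings

Exercise 3.3.1 What's gone wrong with this code? Why are the points not blue?

The argument colour = "blue" is inside the mapping argument, so it is treated as an aesthetic, that is, a mapping between a variable and a value. In the expression colour = "blue", "blue" is interpreted as a categorical variable that takes only a single value, "blue". To get the expected result, move colour = "blue" outside of aes() and pass it directly to geom_point().

Exercise 3.3.2 Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset.) How can you see this information when you run mpg?

The following list contains the categorical variables in mpg: manufacturer, model, trans, drv, fl, and class.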
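The following list contains the continuous variables in mpg: displ, year, cyl, cty, and hwy.

In the printed data frame, the angle brackets at the top of each column give the type of each variable. Columns marked <chr> are categorical. Columns marked <dbl> or <int> are continuous.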
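The categorical/continuous split used in Exercise 3.3.2 can also be computed from the column types. This sketch assumes the rule described in the text: numeric columns are continuous and everything else is categorical. The names categorical_vars and continuous_vars are only illustrative.

```r
library(ggplot2)

# TRUE for numeric (integer or double) columns, FALSE otherwise
is_num <- vapply(mpg, is.numeric, logical(1))

# Heuristic from the text: non-numeric = categorical, numeric = continuous
categorical_vars <- names(mpg)[!is_num]
continuous_vars <- names(mpg)[is_num]

# Underlying storage type of each column (character, integer, double)
print(vapply(mpg, typeof, character(1)))
print(categorical_vars)
print(continuous_vars)
```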
For those lists, I treated every non-numeric variable as categorical and every numeric variable as continuous. This largely matches the heuristics ggplot() uses to decide whether a variable is discrete or continuous.

However, this definition of continuous vs. categorical misses several important cases. Of the numeric variables, year and cyl (number of cylinders) clearly take on discrete values. In this data set, the R data types largely encode the semantics of the variables. For example, integer variables are stored as integers and unordered categorical variables are stored as character vectors. That is not always the case, though. The data could instead have stored the categorical class variable as an integer code, with the documentation explaining what each number means.

Fundamentally, classifying a variable as "discrete", "continuous", "ordinal", "nominal", "categorical", and so on specifies which operations can be performed on it:

- Discrete variables support counting and calculating the mode.
- Variables with an ordering support sorting and calculating quantiles.
- Variables with an interval scale support addition and subtraction, and operations built on them, such as taking the mean.

In this sense, variable types form a kind of class system. That topic is beyond the scope of R4DS but is discussed in Advanced R.

Exercise 3.3.3 Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?

The variable cty, city miles per gallon, is continuous. Mapped to color, it does not get discrete colors. Instead it uses a gradient scale that runs from dark blue for low values to light blue for high values. Mapped to size, the point sizes vary continuously with the variable's value. Mapped to shape, a continuous variable gives an error.

We could split a continuous variable into discrete categories and map those to shape, but conceptually this would not make sense. A numeric variable has an order, and shapes do not. It is clear that smaller points correspond to smaller values, and, once the color scale is given, which colors correspond to larger or smaller values. But it is not clear whether a square is greater or less than a circle.

Exercise 3.3.4 What happens if you map the same variable to multiple aesthetics?

ggplot2 accepts this and produces a plot in which the variable is encoded redundantly, for example with hwy mapped to both its position on the y-axis and color. Because the information is redundant, in most cases you should avoid mapping a single variable to multiple aesthetics.

Exercise 3.3.5 What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)

Stroke changes the width of the border for shapes 21 to 25. These are filled shapes, so the color and size of the border can differ from the filled interior. For example, with shape = 21 (a filled circle), colour sets the border color, fill sets the interior color, and stroke sets the border width.
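The following minimal sketch draws filled points with a thick border. It assumes ggplot2 and the built-in mtcars data. layer_data() is used only to confirm the fixed aesthetic values that ggplot2 applies to the layer.

```r
library(ggplot2)

# Shape 21 is a filled circle: colour sets the border, fill the interior,
# and stroke the width of the border.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 21, colour = "black", fill = "white", size = 5, stroke = 5)

print(p)

# The built layer data contains the fixed aesthetic values
d <- layer_data(p)
```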
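Exercise 3.3.6 What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you'll also need to specify x and y.

Aesthetics can also be mapped to expressions like displ < 5. The ggplot() function behaves as if a temporary variable holding the result of the expression had been added to the data. Here, displ < 5 produces a logical variable that takes the values TRUE or FALSE.

This also explains why, in Exercise 3.3.1, the expression colour = "blue" created a categorical variable with only one category: "blue".

Common problems

Facets

Exercise 3.5.1 What happens if you facet on a continuous variable?

Let's see. The continuous variable is converted to a categorical variable, and the plot contains a facet for each distinct value.

Exercise 3.5.2 What do the empty cells in the plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?

The empty cells (facets) in this plot are combinations of drv and cyl that have no observations. They correspond to the locations in the scatter plot of drv and cyl that have no points.

Exercise 3.5.3 What plots does the following code make? What does . do?

The symbol . ignores that dimension when faceting. For example, drv ~ . facets by values of drv on the y-axis, while . ~ cyl facets by values of cyl on the x-axis.

Exercise 3.5.4 Take the first faceted plot in this section. What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?

In the following plot, the class variable is mapped to color.

One advantage of encoding class with facets instead of color is that facets can encode more distinct categories. Given human visual perception, the maximum number of colors to use for unordered categorical (qualitative) data is nine, and in practice often far fewer. Facets also make it easier to compare the shape of the relationship between the x and y variables across categories.

The main disadvantage of encoding class with facets is that values are harder to compare across categories, because each category's observations are on a different plot.

The benefits of faceting over color grow with both the number of points and the number of categories. With a large number of points, there is often overlap, and overlapping points with different colors are difficult to handle. Jittering still works with color, but only if there are few points and the classes do not overlap much. Otherwise the colors of different areas blur together, and the patterns of individual categories become hard to pick out. Transparency (alpha) does not work well with colors either, because overlapping semi-transparent points blend into colors that no longer represent any single category. As the number of categories increases, the differences between colors shrink until the categories are no longer visually distinct.

Exercise 3.5.5 Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn't facet_grid() have nrow and ncol arguments?

The arguments nrow and ncol set the number of rows and columns used to lay out the facets. They are needed because facet_wrap() facets on only one variable, so the layout is not otherwise determined. facet_grid() does not need them, because the number of unique values of its row and column variables determines the number of rows and columns.

Exercise 3.5.6 When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?

There will be more space for columns if the plot is laid out horizontally (landscape).

Geometric objects

Exercise 3.6.1 What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

- Line chart: geom_line()
- Boxplot: geom_boxplot()
- Histogram: geom_histogram()
- Area chart: geom_area()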
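The following sketch draws one plot per geom from Exercise 3.6.1. It assumes ggplot2 and its bundled mpg and economics datasets. The variable choices are illustrative and are not taken from the exercise.

```r
library(ggplot2)

# Line chart: unemployment over time (economics ships with ggplot2)
line_chart <- ggplot(economics, aes(x = date, y = unemploy)) + geom_line()

# Boxplot: highway mileage by vehicle class
box_plot <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()

# Histogram: distribution of highway mileage in bins of width 2
histogram <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Area chart: the same series as the line chart, filled down to zero
area_chart <- ggplot(economics, aes(x = date, y = unemploy)) + geom_area()
```

layer_data() can be called on any of these objects to inspect what each geom actually draws, for example the quartiles computed for the boxplot.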
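Exercise 3.6.2 Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

This code produces a scatter plot with displ on the x-axis, hwy on the y-axis, and the points colored by drv. A smooth line without standard error bands is fit through each drv group.

Exercise 3.6.3 What does show.legend = FALSE do? What happens if you remove it? Why do you think I used it earlier in the chapter?

The layer argument show.legend = FALSE hides the legend box.

Consider this example from earlier in the chapter. That plot has no legend. Removing the show.legend argument, or setting show.legend = TRUE, adds a legend showing the mapping between colors and drv.

In the chapter, the legend is suppressed because three plots are shown side by side, and adding a legend to only the last one would make the plots different sizes. Different-sized plots would make it harder to see how the arguments change the appearance of the plots. The purpose of those plots is to contrast no groups, a group aesthetic, and a color aesthetic, which creates implicit groups. Looking up the value associated with each color isn't needed to make that point, so the legend isn't either.

Exercise 3.6.4 What does the se argument to geom_smooth() do?

It controls whether a confidence band, computed from the standard error of the fit, is drawn around each smoothed line. By default se = TRUE, and setting se = FALSE removes the bands.

Exercise 3.6.5 Will these two graphs look different? Why/why not?

No. In both versions, geom_point() and geom_smooth() use the same data and mappings. In the first version they inherit those options from the ggplot() object, so the mappings don't need to be specified twice.

Exercise 3.6.6 Recreate the R code necessary to generate the following graphs.

The following code will generate those plots.

Statistical transformations

Exercise 3.7.1 What is the default geom associated with stat_summary()? How could you rewrite the previous plot to use that geom function instead of the stat function?

The "previous plot" referred to in the question is the following.

In ggplot2 v3.3.0, the arguments fun.ymin, fun.ymax, and fun.y were deprecated and replaced with fun.min, fun.max, and fun.

The default geom for stat_summary() is geom_pointrange(). The default stat for geom_pointrange() is stat_identity(), but adding the argument stat = "summary" makes it use stat_summary() instead.

The resulting message says that stat_summary() defaults to mean_se(). That function uses the mean as the middle point and the mean plus or minus its standard error as the endpoints of the line. The original plot, however, used the min and max values for the endpoints. To recreate it, we need to specify fun.min, fun.max, and fun.

Exercise 3.7.2 What does geom_col() do? How is it different to geom_bar()?

The geom_col() and geom_bar() functions have different default stats.

- The default stat of geom_col() is stat_identity(), which leaves the data as is. geom_col() therefore expects the data to contain x values and y values, where the y values are the bar heights.
- The default stat of geom_bar() is stat_count(), so geom_bar() only expects an x variable. stat_count() preprocesses the data by counting the number of observations for each value of x, and the y aesthetic uses these counts.

Exercise 3.7.3 Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?

The following table lists the pairs of geoms and stats that are almost always used in concert.

Complementary geoms and stats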
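To see a geom and its paired stat used interchangeably, the sketch below builds the same bar chart from geom_bar() and from stat_count() and compares the computed counts. It assumes ggplot2 and the mpg data.

```r
library(ggplot2)

# geom_bar() uses stat_count() as its default stat ...
p_geom <- ggplot(mpg, aes(x = class)) + geom_bar()

# ... and stat_count() uses the "bar" geom as its default geom,
# so starting from the stat produces the same layer.
p_stat <- ggplot(mpg, aes(x = class)) + stat_count()

counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count
print(counts_geom)
```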
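These pairs of geoms and stats tend to share a name, such as stat_smooth() and geom_smooth(), and to be documented on the same help page. They also often use each other as the default stat (for a geom) or geom (for a stat).

The following tables list the geoms and stats in ggplot2 and their defaults as of version 3.3.0. Many geoms use stat_identity() as their default stat.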
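The defaults can also be inspected programmatically. This sketch relies on the fact that a layer created by a geom_*() or stat_*() function stores its geom and stat as ggproto objects in its geom and stat fields. default_stat() and default_geom() are hypothetical helper names, and the results may differ between ggplot2 versions.

```r
library(ggplot2)

# A layer stores its geom and stat as ggproto objects; the first class
# name identifies them (e.g. "StatCount", "GeomBar").
default_stat <- function(layer) class(layer$stat)[1]
default_geom <- function(layer) class(layer$geom)[1]

stats_used_by_geoms <- c(
  geom_point     = default_stat(geom_point()),
  geom_bar       = default_stat(geom_bar()),
  geom_histogram = default_stat(geom_histogram()),
  geom_boxplot   = default_stat(geom_boxplot()),
  geom_smooth    = default_stat(geom_smooth()),
  geom_count     = default_stat(geom_count())
)

geoms_used_by_stats <- c(
  stat_count   = default_geom(stat_count()),
  stat_smooth  = default_geom(stat_smooth()),
  stat_summary = default_geom(stat_summary())
)

print(stats_used_by_geoms)
print(geoms_used_by_stats)
```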
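Exercise 3.7.4 What variables does stat_smooth() compute? What parameters control its behaviour?

The function stat_smooth() computes the following variables:

- y: the predicted value.
- ymin: the lower bound of the pointwise confidence interval around the mean.
- ymax: the upper bound of that interval.
- se: the standard error.

The "Computed Variables" section of the stat_smooth() documentation lists these variables.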
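The following minimal sketch inspects these computed variables. It assumes ggplot2 and the mpg data. method = "lm" with an explicit formula is chosen only for a quick fit that prints no message. layer_data() returns the data after the stat has run.

```r
library(ggplot2)

# Linear smooth with an explicit formula; by default the smoother is
# evaluated at n = 80 points along the x range.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Data after stat_smooth() has run, including its computed variables
smoothed <- layer_data(p)
print(head(smoothed[, c("x", "y", "ymin", "ymax", "se")]))
```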
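The parameters that control the behavior of stat_smooth() include:

- method: the smoothing method, such as "lm", "glm", "gam", "loess", or a function.
- formula: the formula used by the smoothing function.
- method.args: additional arguments passed to method.
- se: whether to display the confidence band.
- na.rm: whether missing values are removed silently rather than with a warning.
- level: the confidence level of the interval.
- span: the amount of smoothing for the default loess smoother.
- n: the number of points at which to evaluate the smoother.
- fullrange: whether the fit should span the full range of the plot or only the data.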
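To illustrate one of these parameters, the sketch below compares the average width of the confidence band at two values of level. band_width() is a hypothetical helper defined only for this example. It assumes ggplot2 and the mpg data.

```r
library(ggplot2)

# Hypothetical helper: fit a linear smooth at the given confidence level
# and return the mean width of its confidence band.
band_width <- function(level) {
  p <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_smooth(method = "lm", formula = y ~ x, level = level)
  d <- layer_data(p)
  mean(d$ymax - d$ymin)
}

narrow <- band_width(0.50)
wide <- band_width(0.99)
print(c(level_50 = narrow, level_99 = wide))
```

A higher confidence level requires a wider interval, so the band at level = 0.99 is wider than the one at level = 0.50.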
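TODO: Plots with examples illustrating the uses of these arguments.

Exercise 3.7.5 In our proportion bar chart, we need to set group = 1. Why? In other words, what is the problem with these two graphs?

If group = 1 is not included, all the bars in the plot have the same height, 1. This happens because geom_bar() treats each x value as its own group, and the stat computes the proportions within each group.

The problem with these two plots is that the proportions are calculated within the groups. The following code produces the intended bar charts for the case with no fill aesthetic. With the fill aesthetic, the heights of the bars also need to be normalized.

Position adjustments

Exercise 3.8.1 What is the problem with this plot? How could you improve it?

There is overplotting because there are multiple observations for each combination of cty and hwy values.

I would improve the plot with a jitter position adjustment to reduce overplotting. The relationship between cty and hwy is clear even without jittering, but jittering shows where there are more observations.

Exercise 3.8.2 What parameters to geom_jitter() control the amount of jittering?

According to the geom_jitter() documentation, two arguments control the jitter. width sets the amount of horizontal displacement, and height sets the amount of vertical displacement.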
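The following sketch controls the jitter in each direction. It assumes ggplot2 and the mpg data. It uses position_jitter() rather than geom_jitter() so that the seed argument can make the random noise reproducible; geom_jitter(width = 0.2, height = 0) applies the same amounts.

```r
library(ggplot2)

# Jitter only horizontally: width is in data units, and height = 0
# disables vertical noise. The seed makes the jitter reproducible.
p <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 123))

jittered <- layer_data(p)
```

width is in data units and cty takes integer values, so each jittered point stays within 0.2 of its original x value. With height = 0, the y values are unchanged.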
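With their default values, width and height add noise in both directions. Changing each parameter separately shows how it affects the amount of jittering:

- width = 0: no horizontal jitter.
- A large width: too much horizontal jitter.
- height = 0: no vertical jitter.
- A large height: too much vertical jitter.
- width = 0 and height = 0: no jitter in either direction, and the plot is identical to the one produced with geom_point().

The height and width arguments are in the units of the data. Thus height = 1 (or width = 1) corresponds to different relative amounts of jittering depending on the scale of the y (or x) variable. By default, each is 40% of the resolution() of the data, that is, of the smallest non-zero distance between adjacent values of a variable. Because the jitter moves points in both the positive and negative directions, the jittered values occupy 80% of the implied bins. When x and y are discrete variables, both resolutions equal 1, so the defaults are width = 0.4 and height = 0.4.

Exercise 3.8.3 Compare and contrast geom_jitter() with geom_count().

The geom_jitter() geom adds a small random amount of variation to the location of each point, that is, it "jitters" them. This reduces overplotting, because two points at the same location are unlikely to receive the same random variation. The cost is that the x and y values of the points are slightly changed.

The geom_count() geom sizes points by the number of observations, so combinations of (x, y) values with more observations are drawn larger. It does not change the x and y coordinates of the points. However, if the points are close together and the counts are large, the point sizes can themselves cause overplotting.

Combining geom_count() with jitter, which is specified through the position argument of geom_count() rather than as a separate geom, reduces overplotting a little. Unfortunately, there is no universal solution to overplotting. The costs and benefits of each approach depend on the structure of the data and the goal of the data scientist.

Exercise 3.8.4 What's the default position adjustment for geom_boxplot()? Create a visualization of the mpg dataset that demonstrates it.

The default position for geom_boxplot() is "dodge2", a shortcut for position_dodge2(). This adjustment does not change the vertical position of a geom. Instead, it moves geoms horizontally so that they do not overlap each other. When a color aesthetic such as class is added to the boxplot, the boxplots for the different levels of that variable are placed side by side, i.e., dodged. If position_identity() is used instead, the boxplots overlap.

Coordinate systems

Exercise 3.9.1 Turn a stacked bar chart into a pie chart using coord_polar().

A pie chart is a stacked bar chart drawn in polar coordinates. Take a stacked bar chart with a single category and add coord_polar(theta = "y") to create a pie chart. The argument theta = "y" maps y to the angle of each section. If coord_polar() is used without theta = "y", the result is called a bull's-eye chart.

Exercise 3.9.2 What does labs() do? Read the documentation.

The labs() function adds axis titles, plot titles, and a caption to the plot. All its arguments are optional, so you can add as many or as few labels as needed. labs() is not the only function that adds titles to plots. The xlab() and ylab() functions and the x- and y-scale functions can add axis titles, and ggtitle() adds plot titles.

Exercise 3.9.3 What's the difference between coord_quickmap() and coord_map()?

The coord_map() function uses a map projection (Mercator by default) to project the three-dimensional Earth onto a two-dimensional plane. The coord_quickmap() function uses a faster approximation that ignores the curvature of the Earth and only sets an aspect ratio suited to the latitude. See the coord_map() documentation for more information on these functions and some examples.

Exercise 3.9.4 What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?

The geom_abline() function draws a reference line with a given slope and intercept. With no arguments, it draws the 45-degree line y = x. The coord_fixed() function makes sure this line is actually drawn at a 45-degree angle. That makes it easy to compare each car's highway and city mileage with the case where the two are equal. Without coord_fixed(), the line would no longer be at 45 degrees.

On average, humans perceive differences in angles best relative to 45 degrees. For discussion of how a plot's aspect ratio affects perception of the values it encodes, evidence that 45 degrees is generally the optimal aspect ratio, and methods for calculating the optimal aspect ratio of a plot, see Cleveland (1993b), Cleveland (1994), Cleveland (1993a), Cleveland, McGill, and McGill (1988), and Heer and Agrawala (2006). The function ggthemes::bank_slopes() calculates the aspect ratio that banks slopes to 45 degrees.

The layered grammar of graphics

References

Cleveland, William S. 1993a. "A Model for Studying Display Methods of Statistical Graphics." Journal of Computational and Graphical Statistics 2 (4): 323–43. https://doi.org/10.1080/10618600.1993.10474616.

Cleveland, William S. 1993b. Visualizing Data. Hobart Press.

Cleveland, William S. 1994. The Elements of Graphing Data. Hobart Press.

Cleveland, William S., Marylyn E. McGill, and Robert McGill. 1988. "The Shape Parameter of a Two-Variable Graph." Journal of the American Statistical Association 83 (402): 289–300. https://www.jstor.org/stable/2288843.

Heer, Jeffrey, and Maneesh Agrawala. 2006. "Multi-Scale Banking to 45º." IEEE Transactions on Visualization and Computer Graphics 12 (5). https://doi.org/10.1109/TVCG.2006.163.

What are the benefits of using ggplot2?

Advantages of ggplot2 over lattice graphics or base R graphics include:

- a consistent underlying grammar of graphics (Wilkinson, 2005)
- plot specification at a high level of abstraction
- great flexibility
- a theme system for polishing plot appearance
- a mature and complete graphics system
- many users and an active mailing list

Which of the following are benefits of using ggplot2? Select all that apply.

Solution: The benefits of using ggplot2 include easily adding layers to your plot, customizing the look and feel of your plot, and combining data manipulation and visualization.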
Which of the following tasks can you complete with ggplot2 features? Select all that apply.

Solution: In ggplot2, you can create scatter plots and bar charts, change the colors and dimensions of your plot, and add a title and subtitle to your plot.
What is correct about ggplot2?

ggplot2 is an open-source data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson's Grammar of Graphics, a general scheme for data visualization that breaks graphs up into semantic components such as scales and layers.