Includes a function to find the optimal breaks for coloring a continuous variable

miaow27/R_Code_Ref-COLOR_ANALYSIS

Color Analysis

In any EDA or visualization work, you will sooner or later run into scenarios like these: a heatmap, a heatmap with a divergent color scale, or a line graph with differently colored lines.

All of these visuals do, in essence, the same thing: they represent a given measure by the intensity of a color.

Most available tools take care of mapping values to colors for you. However, have you ever wondered what the best way is to create breaks for a continuous variable? In other words, if the default setting is not ideal and nothing interesting shows up in the visual, could that be due to a poor choice of breaks, and how can you fix it?
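To make the problem concrete, here is a minimal base R sketch of what a default-like setting can do to skewed data. The toy data, the choice of 5 bins, and the white-to-blue palette are all assumptions made for illustration; they do not come from this repo's script.

```r
# Toy data (assumption): mostly small values plus a few large outliers
set.seed(1)
x <- c(rexp(95, rate = 1), 50, 60, 70, 80, 90)

k <- 5
palette <- colorRampPalette(c("white", "steelblue"))(k)  # light = low, dark = high

# Equal-width bins: cut() with a single number splits the range into k intervals
equal_bins <- cut(x, breaks = k)

# Each value gets the color of the bin it falls into
equal_cols <- palette[as.integer(equal_bins)]

# Almost every point lands in the first bin, so the plot would be nearly one color
print(table(equal_bins))
```

With breaks like these, the outliers use up most of the color range and the bulk of the data becomes indistinguishable.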

In this Git repo, I discuss several common ways to create color breaks for a given continuous measure.

  • mean - sd: you can create breaks at the mean and at 1/2 or more standard deviations (sd) away from the mean. While this works well for a normally distributed measure, it is not ideal for a measure with extreme or clustered values.

  • jenks: Jenks is another popular way to create color breaks, based on a clustering algorithm. However, you have to choose k (the number of groups), and that choice can change the view.

  • kmeans: similar to Jenks, k-means can also be used to create breaks (groups) for coloring a continuous variable.

  • quantile: quantiles are a great choice if you want each group to contain approximately the same number of points.
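The four approaches above can be sketched as follows. This is only an illustration under stated assumptions, not the code behind color_analysis: the toy data, k = 4, and the choice of 1 and 2 sd for the mean - sd breaks are mine, and the Jenks step assumes the classInt package is installed (it is skipped otherwise).

```r
# Toy data (assumption): two clusters of values
set.seed(42)
x <- c(rnorm(200, mean = 10, sd = 2), rnorm(50, mean = 30, sd = 3))
k <- 4

# mean - sd: breaks at the mean and at +/- 1 and 2 sd, clipped to the data range
m <- mean(x)
s <- sd(x)
inner <- m + c(-2, -1, 0, 1, 2) * s
inner <- inner[inner > min(x) & inner < max(x)]
sd_breaks <- c(min(x), inner, max(x))

# quantile: each bin holds roughly the same number of points
q_breaks <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE))

# kmeans: in one dimension, the boundary between two neighboring clusters
# is the midpoint between their centers
km <- kmeans(x, centers = k, nstart = 25)
ctr <- sort(km$centers[, 1])
km_breaks <- c(min(x), (head(ctr, -1) + tail(ctr, -1)) / 2, max(x))

# jenks: natural breaks via classInt, only if the package is available
if (requireNamespace("classInt", quietly = TRUE)) {
  jenks_breaks <- classInt::classIntervals(x, n = k, style = "jenks")$brks
}

# Compare how the points are distributed across bins
print(table(cut(x, breaks = sd_breaks, include.lowest = TRUE)))
print(table(cut(x, breaks = q_breaks, include.lowest = TRUE)))
print(table(cut(x, breaks = km_breaks, include.lowest = TRUE)))
```

Comparing the bin counts is a quick way to see which method separates the two clusters and which one simply balances the group sizes.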

In summary, no method is perfect in all cases. You should try different methods with different numbers of groups and breaks to find the best choice for the problem at hand. With that motivation, I include an R script that generates the different approaches in one shot using the function color_analysis. The repo includes an example of its output.
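As a rough idea of what "in one shot" could look like, here is a small sketch that leans on the classInt package. The helper name compare_breaks is hypothetical, and this is not the repo's color_analysis function; it only assumes that classInt is installed and uses its "sd", "jenks", "kmeans" and "quantile" styles.

```r
# Hypothetical helper (not the repo's color_analysis): one break vector per method
compare_breaks <- function(x, k = 5,
                           styles = c("sd", "jenks", "kmeans", "quantile")) {
  if (!requireNamespace("classInt", quietly = TRUE)) {
    stop("This sketch needs the 'classInt' package: install.packages('classInt')")
  }
  out <- lapply(styles, function(style) {
    classInt::classIntervals(x, n = k, style = style)$brks
  })
  names(out) <- styles
  out
}

# Toy data (assumption): a skewed measure
set.seed(7)
x <- rlnorm(300, meanlog = 2, sdlog = 0.8)

# Try several numbers of groups and print the breaks side by side
if (requireNamespace("classInt", quietly = TRUE)) {
  for (k in 3:5) {
    cat("k =", k, "\n")
    print(lapply(compare_breaks(x, k = k), round, digits = 2))
  }
}
```

Note that the "sd" style chooses its own number of breaks, so it may not return exactly k groups.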
