St. Petersburg Tags

CODE: GitHub
DATATYPE: tag score histograms
SOURCE: Instagram, Google Cloud Vision
PLACE: St. Petersburg
YEARS: 2014-2015
SIZE: 697 tags
VIZTYPE: one-on-two
FEATURES: tag scores
FEAT SRC: Google Cloud Vision
GRIDDING: nearest open
ANNOTATIONS: tag labels, avg rank

This is a one-on-two visualization, where a single data variable is mapped to two spatial axes. Here, the variable is the "average rank" of machine tags from Google's Cloud Vision API. When you make a label request for an image, Google returns a list of tags, each with a confidence score; the higher the score, the surer Google's classifiers are that the tag applies to the image. A tag's "rank" is its position in this list when the tags are ordered by confidence. The highest-confidence tag has rank 0, so lower ranks are "better". A tag's average rank is simply the mean of its ranks across all the photos in which it appears.

Intuitively, the better (lower) a tag's average rank, the more semantically general it should be: it ought to be easier to identify "car" than "Lamborghini". This is not always the case, though, and Google's engine does not produce "car" every time it produces a specific car, so it is "ignorant" of analytic relationships such as "every Lamborghini is a car".

The plot elements here are something I call "histogram lines": a line connecting the tops of a histogram's columns, which simplifies its visual representation. These are fairly simple 5-bin histograms, each bin covering a range of possible tag scores (scores range from 0.5 to 1). Histogram lines that turn upward on the right side have a disproportionate share of high scores, and we should expect these tags to have a good (low) average rank. Note, however, that this relationship is not guaranteed. A tag can have the highest score on a given image without that score being high overall; conversely, a tag can have a high score and still be outstripped by slightly higher scores on the same image, and so end up with a poor (numerically high) rank. Each histogram line is annotated with its tag label (bottom) and its average rank (top).
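The two quantities behind each plot element can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the plot. It assumes each photo's labels have already been reduced to a list of (tag, score) pairs. That is a simplification of the actual Cloud Vision response, and the sample data is made up. It also assumes the 5 bins are equal-width over [0.5, 1]. The names `photos`, `tag_scores`, `average_ranks`, `score_histogram`, and `histogram_line` are hypothetical. Rank is computed by sorting each photo's labels by descending score, so rank 0 is the most confident tag.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: one list of (tag, score) pairs per photo.
photos = [
    [("sky", 0.97), ("cloud", 0.91), ("building", 0.74)],
    [("building", 0.95), ("sky", 0.88), ("architecture", 0.63)],
    [("car", 0.93), ("vehicle", 0.87), ("sky", 0.56)],
]

def tag_scores(photos):
    """Collect every score each tag received, across all photos."""
    scores = defaultdict(list)
    for labels in photos:
        for tag, score in labels:
            scores[tag].append(score)
    return scores

def average_ranks(photos):
    """Mean position of each tag in its photo's confidence ordering (0 = most confident)."""
    ranks = defaultdict(list)
    for labels in photos:
        # Sort by descending score; Python's sort is stable, so ties keep input order.
        ordered = sorted(labels, key=lambda pair: pair[1], reverse=True)
        for rank, (tag, _score) in enumerate(ordered):
            ranks[tag].append(rank)
    return {tag: mean(r) for tag, r in ranks.items()}

def score_histogram(scores, lo=0.5, hi=1.0, bins=5):
    """Count scores in equal-width bins over [lo, hi] (assumed binning).

    Scores outside [lo, hi] are ignored; a score of exactly `hi` lands in the last bin.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in scores:
        if lo <= s <= hi:
            counts[min(int((s - lo) / width), bins - 1)] += 1
    return counts

def histogram_line(counts, lo=0.5, hi=1.0):
    """Vertices of a 'histogram line': one point per column, at the bin centre and column height."""
    width = (hi - lo) / len(counts)
    xs = [lo + (i + 0.5) * width for i in range(len(counts))]
    return xs, list(counts)

if __name__ == "__main__":
    ranks = average_ranks(photos)
    for tag, scores in sorted(tag_scores(photos).items()):
        counts = score_histogram(scores)
        print(f"{tag:>12}  avg rank {ranks[tag]:.2f}  histogram {counts}")
```

Drawing a polyline through the points returned by `histogram_line` gives the shape described above. A tag whose counts pile up in the last bins produces a line that turns upward on the right.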