Data-Mining the Zodiac

A timely visualization by Information is Beautiful’s David McCandless and friends, Horoscoped applies word-cloud analysis to daily astrology forecasts. Programmer Thomas Winningham wrote a Python script to scrape 22,000 horoscopes archived by Yahoo’s astrology channel, Shine. To the resulting data dump, McCandless and crew applied the off-the-shelf word-cloud generator Tag Crowd, breaking down both the most-used words in the corpus and the words most frequently used in each sign. Then, with the help of designer Matt Hancock, McCandless made an elegant chart of the whole thing.

Given the peculiar mandate of the horoscope, perhaps we shouldn’t be surprised that the word clouds of the zodiac don’t vary much from sign to sign, with a general reliance on words like “ready,” “feel,” and “better.” The words unique to each sign among its 50 most commonly used words, however, do seem to hint at the traditional characteristics of each sign (Capricorn’s favorites are “willing” and “instead”; Virgo’s are “totally” and “perfect”). Best of all, McCandless built out of the highest-frequency words a brilliant horoscope engineered to apply perfectly well to any sign for any day of the year. It begins as follows: “Ready? Sure? Whatever the situation or secret moment, enjoy everything a lot.” I won’t give away the whole thing, though; do click through.
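For the curious, the "unique words per sign" step is easy to approximate. What follows is only a rough sketch in Python, not McCandless's or Tag Crowd's actual processing: the function names, the `corpus` layout (a dict mapping each sign to a list of horoscope texts), and the crude tokenizer are all assumptions made for illustration. It counts words per sign, keeps each sign's top 50, and then reports the words that appear in no other sign's top 50.

```python
import re
from collections import Counter


def top_words(texts, n=50, stopwords=frozenset()):
    """Return the n most frequent words across a list of texts.

    Tokenization is deliberately crude (lowercase runs of a-z);
    a real word-cloud tool may stem or filter differently.
    """
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]


def unique_top_words(corpus, n=50, stopwords=frozenset()):
    """For each sign, list the top-n words found in no other sign's top n.

    `corpus` is a hypothetical structure: {sign_name: [horoscope_text, ...]}.
    """
    tops = {
        sign: set(top_words(texts, n, stopwords))
        for sign, texts in corpus.items()
    }
    # How many signs have each word in their top-n list?
    membership = Counter(word for words in tops.values() for word in words)
    return {
        sign: sorted(word for word in words if membership[word] == 1)
        for sign, words in tops.items()
    }
```

Fed a toy corpus in which both signs share "ready" and "feel," the function would drop those and keep only the words that set each sign apart, which is the same filtering that makes the per-sign lists look so characterful.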

What all this says about astrology is hard to discern. The clusters of each sign’s uniquely frequent words in McCandless’ analysis do have a certain uncanny charisma. It’s the charisma of apophenia, of course (the attractive illusion of pattern in essentially random clusterings of all kinds), and astrology as a whole is a confabulation built on such powerfully suggestive phenomena. Given the recent brouhaha over the supposedly mistaken structure of the zodiac system in Western astrology (an urban legend that crops up every few years), McCandless’ “Horoscoped” serves as a reminder of the brew of pseudoscience and willful suspension of disbelief that the institution of the daily horoscope, an invention of the modern daily newspaper, depends on.

Of course, we knew that already; as Theodor Adorno observed in the early fifties, a “climate of semi-erudition is the fertile breeding ground for astrology.” But we should be cautious with our disregard; semi-erudition is rife even on the highbrow end of Internet culture. Info visualization is one area in which we risk falling prey to semi-erudition; correlation can look very beautiful. It’s worth pointing out that, unlike the mostly-anonymous authors of daily horoscopes, McCandless does a very thorough job of exposing his data and his inventive, inspired methods to scrutiny.

About Mohit


  1. Wow. Aquarius: special deal. Keep making love . . .

  2. Not that I buy into astrology, but…it strikes me that, if you’re going to read anything into the repetition of certain words among signs, you’d have to track it over time to reach any real conclusion. That is, the word “better” might occur with the same frequency in the horoscopes of Geminis and Cancers, but perhaps (as astrology would be apt to say) Geminis are supposed to have a “better” time in Spring and Cancers in the Fall, under a different moon or some such.

    As to the unique word lists…I’m not so sure those are apophenia, either. For as much as astrologers might throw darts at a dartboard most days, astrology is a codified system with thousands of years of analysis to look back on. Those thousands of years have developed stereotypes for each sign, arbitrary to be sure, but consistent. For instance: Libras are supposed, traditionally, to be introverted and indecisive. The unique word list “learn, stars, almost” for Libras seems less like a pattern we’re reading into the data and more like a tradition that astrologers simply carry out when making their “predictions.”

  3. It’s still pretty ambiguous to me.
