Overcoming Imposter Syndrome at Conference

It’s that time again, conference time, in another bright and bustling city with incredible analysts and vizzers milling about, buzzing around. There are Zen Masters, Ambassadors, user group leaders, and so many more, all bursting with skill and talent. Your heart is racing with them, this year to the beat of a big brass band. Until suddenly, your heart takes a strangely familiar dip. “These people are AMAZING. So why am I here?”

This is the drooling sting of imposter syndrome, here to douse your flames of excitement and energy. I am personally no stranger to its presumptuous seat at the dinner table, and just like clockwork, I find myself sitting across from it again at TC18. Many people come face to face with imposter syndrome at some point, especially folks who have traditionally been excluded from their own seat at the table, such as femmes, people of colour, and people with disabilities. Even Adam Savage, last year’s myth-busting guest speaker, shared that no matter how much recognition and celebrity status he achieves, that feeling of doubt never goes away.

What’s different this year, though, is that my brain seems to be armed with its own arsenal of tools to confront this unwelcome guest. I thought I would share them in the hope that even though imposter syndrome will most likely linger for many of us, we just might be able to keep it in the back seat of the auditorium.

1. Remember the turtle

When I was growing up, I remember the fable of the tortoise and the hare: the story of how a slow-moving tortoise outwits its speedy opponent in a race by keeping its cool and moving at its own pace – “slow and steady wins the race.” When it comes to learning any new skill, I have found that keeping this story in mind helps soothe thoughts that I am not progressing fast enough, especially if I start comparing myself to people I have seen over the years at Tableau Conference.

Everyone moves at their own pace, and there is absolutely nothing wrong with this. In fact, it’s quite possible that a slower pace lets you pick up nuances that other people might nail by running through a syllabus at the speed of light, only to forget them a few weeks later because they haven’t taken the time to properly digest. For example, I personally still struggle with Table Calculations, and I’ve been a Tableau user for three years! Yet even people I have met at conference who seem super confident with table calculations can struggle with the simplest of Workout Wednesday challenges. Which brings me to my next tip:

2. No matter what your level, try beginner sessions

There are some amazing Jedi-level sessions at Tableau Conference. And as much as you might want to learn the Jedi-est, most complicated Tableau tricks, sometimes it can be refreshing to take a beginner session and gain a new perspective on something you thought you were pretty good at. Those tricky table calcs? Maybe someone has come up with a funky metaphorical explanation that just clicks for you, and you FINALLY get the hang of things. It can be a huge ego boost to realize that you can grasp the trickiest of concepts, and there is absolutely no shame in going to a beginner session to gain that perspective. On top of that, there are often beginner sessions that have nothing to do with technical skill, but instead focus on creativity and inspiration that can really add some fuel to your fire.

3. Don’t even attempt to do everything

Those amazing creativity-fueling sessions are EVERYWHERE during conference. Don’t even try to attend all of them, because you will BURN YOURSELF OUT. You might feel like even more of a failure because “what? I can’t even keep up with a schedule that’s right in front of me!” That schedule is a trap: many of the sessions fill up 15 minutes before they even start, and you’ve jogged halfway across the convention centre only to be told there’s no more room available. Definitely have a few choices lined up, but don’t make all your selections with the assumption that you will be able to attend every one. Instead, use them as Plan B options in different locations, so that if you find yourself stuck in front of an “at capacity” sign, you can go to a different session you’re interested in that might even be physically closer to you. Worst case scenario, you can take a break and browse the awesome booths at the Data Village. Or maybe even try out the new Braindates feature in the app! A break can give you the space to process what you picked up in the last session, and possibly even share it with others who didn’t have a chance to go to that at-capacity session.

4. Celebrate the accomplishments of others

As much as it can feel like a hit to the ego when you see someone who was a complete beginner last year now outshining everyone around them, remember what most likely went into that growth. That person has probably worked really hard to get to where they are now. Most of the time they have done it with the same sense of wonder and dazzle that you yourself might have had the first time you used Tableau and found golden nuggets of insight you had never conceived of before. Don’t let the green-eyed monster take hold of you and push you away from these people. Talk to them and ask about their journey; you’d be surprised how infectious their excitement can be and how willing they are to share a bit of their light with you. Celebrate their accomplishments with them, and I am sure that in this community, they will be more than happy to celebrate your successes when you achieve them.


Netflix and Chill: How I Made My 2017 Iron Viz Feeder Entry

I’ll be honest, finding inspiration for this challenge was not easy. Of course, like most of us, I have my favourite movies and TV shows that I could binge watch for hours. But the challenge lies in telling a data story, not the story of the film or TV show. Those stories have already been told brilliantly by their award-winning directors, actors, and actresses. I needed to find something that wouldn’t tempt me to repeat those ideas, something with a data story that would stand on its own.

So I procrastinated. Appropriately. By binge watching on Netflix. I haven’t always been a Netflix fan; I used to be quite the dedicated cable-watcher, mostly (shamefully) because I have a strange obsession with awful reality TV shows and Dr. Phil, which you just can’t get on Netflix! I started to think about my own personal history with Netflix, and then got curious about the history of Netflix itself. I started where any good researcher starts: Google.

The first thing I found out was that Netflix was born out of a dispute over a $40 late fee that Reed Hastings (the founder of Netflix) had been hit with when he rented Apollo 13 from Blockbuster. That got me thinking about the history of Blockbuster as well. I still clearly remember the skeleton of a Blockbuster store that sat eerily empty on my bus route to university when I was living in Vancouver. What happened? Who gutted you, my friend?

My research eventually led me to this:
[Chart: Netflix vs. Blockbuster revenues]

I think this one image is the perfect example of a data story that hits hard. The sweeping nosedive into bankruptcy, and the David that came out on top of Goliath. Ouch. Painful, but there was my inspiration for this Iron Viz feeder.

Data Sources

Finding information and data sources for Netflix was no problem. I found datasets for the movies and TV shows featured, information about subscribers, and even the original dataset used in the famous competition where Netflix challenged data scientists to beat the accuracy of its recommendation system. Blockbuster, on the other hand, was much more challenging.

The issue with Blockbuster is that it went bankrupt in 2010, when data analytics was just starting to enter mainstream consciousness. Before that, at the company’s prime, no one had bothered to collect and collate data about it, or if they had, they certainly did not put it out on Kaggle or GitHub, both of which were born just as Blockbuster died out. So my only source was Blockbuster’s financial statements.

Now, even the keenest accountant will probably tell you that for an average Jane like me, financial statements are not the most riveting literature. But I started all the way back in 1999 and powered through, and a narrative began to come to light.

What I found was that Blockbuster’s late fees were a huge source of strife for the company, even in its early days. It was involved in several lawsuits related to those fees, and despite customer dissatisfaction, it wasn’t willing to let go. This was understandable given that at one point it raked in almost $800 million from late fees alone! It wasn’t until Netflix and other competitors entered the scene that Blockbuster started to rethink things and introduced its “no late fees” pitch. Possibly (probably) too little, too late.

Design

In terms of design and analytics, I tried to keep everything as minimal and simple as possible. This is financial statement data, but I don’t believe it needs to be couched in accountant-speak to be effective. The most “complex” chart I used is probably the jump plot, but I felt it gave another perspective on Blockbuster’s decline and Netflix’s rise beyond the trend line. (Note: Thank you to Chris deMartini for outlining how to build this chart, and Robin Kennedy for helping me figure it all out!)

The only colours I used in the viz were a pale yellow-grey for the background (I dislike white, it’s quite jarring on computer screens), charcoal grey (I dislike black for the same reason), and red and blue (company branding colours) to represent Netflix and Blockbuster respectively. I tried to eliminate colour legends as much as possible and wherever I mentioned the names of the companies, I used the red and blue colours to indicate that these are the companies that my charts referenced.

I was mindful of users’ interactions with my trend lines, so I included dots overlaid on the lines to make it easier for users to know where to point their cursor for information from the tooltip. I also used a calculation to switch between displaying information in millions or billions beyond a certain threshold so that users would always be able to see Netflix’s KPIs, even before they made their first $ billion. I included an annotation in my bubble charts because I knew it would be challenging to find the little pixel that proportionally represented $40 compared to Blockbuster’s hourly revenue. I tried to make everything easy, simple, and smooth.
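For the curious, the unit-switching logic is simple enough to sketch. Here it is in R rather than Tableau’s calculation syntax, as a minimal illustration only: the $1 billion threshold is the one described above, but the function name, rounding, and sample values are my own assumptions.

    # Sketch of the KPI label logic: show billions past a threshold, millions below.
    # Threshold and formatting are illustrative, not the exact calculated field.
    format_kpi <- function(x) {
      ifelse(x >= 1e9,
             paste0("$", round(x / 1e9, 1), "B"),  # e.g. 1.36e9 -> "$1.4B"
             paste0("$", round(x / 1e6, 1), "M"))  # e.g. 610e6  -> "$610M"
    }

    format_kpi(c(610e6, 1.36e9))  # "$610M" "$1.4B"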

And yes, I did use a pie chart, although it’s technically more of a DVD chart; I felt like throwing a bit of artistic liberty into the mix. Because I was only showing two proportions of a whole (a DVD!), I felt it was an appropriate use of a pie/DVD, and it really emphasised how much Blockbuster’s late fees contributed to their revenue.

Putting the R in AlteRyx: A Personal Challenge

There are a lot of nuanced differences between data analytics and data science that can be difficult to pinpoint. In general, analytics explores patterns in current data to find actionable insight, while data science explores those same patterns to make predictions that drive actionable insight. I love analytics, but I’ve been curious to see how machine learning and predictive analytics can enhance my data explorations. To that end, I recently completed Springboard’s Data Science curriculum, which provides an introduction to data science, mostly using the R programming language.

My first introduction to data analytics tools was Alteryx, and in my experience, it can be challenging to switch from it to a programming language like R for analysis. Alteryx is intuitive: there is no programming involved, and many of the most common manipulations, like transposing, selecting fields, and joining data, can be done with just a couple of clicks. The benefit of R, however, is its wealth of pre-built packages for some pretty advanced predictive analytics. Oh, if only there were a way to combine the two!

Enter Alteryx and R integration, circa 2013.

Since version 8.5, Alteryx has provided several tools built on the R language. These let users tap into R’s advanced predictive analytics packages while keeping the intuitive, visual workflows that make analytics easier and more efficient in Alteryx.

So I have decided to take on a personal challenge: replicating different R predictive exercises in Alteryx, not only to gain a stronger understanding of the logic behind these analyses, but also to demonstrate how they can be performed much more efficiently in Alteryx. I’ve spent a lot of time using Alteryx for data preparation and clean-up, but I feel its strength lies in its forward-looking, predictive capabilities. Over the next few weeks, I will take a workflow originally created in R and demonstrate how I translated it using the tools provided by Alteryx. So stay tuned and watch this space!

Challenge 1: CART Models and Predicting Supreme Court Decisions

Putting the R in AlteRyx: CART Models

Challenge 1: CART Models and Predicting Supreme Court Decisions

This post is part of a series I am writing to translate R scripts I have seen or written into Alteryx workflows. Original post can be found here.

This script comes from MIT’s Data Analytics course (which you can sign up for here). In the section introducing Classification and Regression Trees (CART), the course uses data on decisions made by US Supreme Court justices and tries to predict whether a justice will reverse or uphold the decision of the lower court. Between 1991 and 2001, the same nine justices served on the Supreme Court, the longest such stretch in US history, which gives us a richer and more consistent data set than any other period would. More specifically, this CART model looks at the decisions of Justice John Paul Stevens and whether several factors can predict his decision to affirm or reverse the lower court. These factors include the subject of the case, whether the lower court was more liberal or conservative, and the type of petitioner involved in the case.

Although this analysis could be performed with logistic regression, the outcome would not be as easily interpretable. When we create a decision tree in R, if a variable has an effect on the outcome (in this case, Justice Stevens reversing the decision of the lower court), we can easily see where it sits in terms of its effect on the outcome and its relationship to other significant variables:

[Decision tree plot produced by R]
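For reference, a tree like the one above takes only a few lines of R. This is a minimal sketch in the spirit of the course exercise rather than its exact script; the stevens data frame, its column names, and the minbucket value are assumptions for illustration.

    # Fit and plot a CART model predicting whether Justice Stevens reverses the
    # lower court. Assumes a data frame `stevens` with a binary Reverse column
    # and the predictor columns used in the formula (hypothetical names).
    library(rpart)
    library(rpart.plot)

    set.seed(123)
    in_train <- sample(seq_len(nrow(stevens)), size = 0.7 * nrow(stevens))
    train <- stevens[in_train, ]
    test  <- stevens[-in_train, ]

    tree <- rpart(Reverse ~ Issue + LowerCourt + Petitioner,
                  data = train, method = "class", minbucket = 25)

    rpart.plot(tree)  # draws a decision tree plot like the one above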

Creating a Decision Tree model in Alteryx uses just 2 tools:

  1. The Create Samples tool, to split our data into a train set for building the model and a test set for validating it
  2. The Decision Tree tool, to actually build the model

[Screenshot: the two-tool workflow]

And with that, Alteryx spits out a summary report to show the details of how the model was run, and a visual report that includes the Decision Tree, the significance of each variable in determining the outcome, and a confusion matrix to summarise the accuracy:

[Screenshots: model summary report, decision tree, and confusion matrix]

Although the Model Comparison tool is not included in the default Alteryx package, it can be found in the Alteryx gallery.

[Screenshot: Model Comparison tool]

We can use this tool to evaluate the accuracy of our model when compared to a simple baseline that predicts the most frequent outcome in our test set.

Before looking at the report generated by this tool, we can check our simple baseline accuracy: use a Summarize tool to get a count of 0 and 1 responses, then another Summarize tool to get both the most frequent outcome and the total number of rows in the data set. Lastly, we calculate the accuracy with a Formula tool and the calculation “[Max_Count]/[Sum_Count]”. This gives us a baseline accuracy of 54.7%.
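For comparison, the same baseline check is a couple of lines in R, reusing the test set from the earlier sketch (same assumed data and column names):

    # Baseline accuracy: always predict the most frequent outcome in the test set
    counts <- table(test$Reverse)
    max(counts) / sum(counts)  # ~0.547 on the course data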

[Screenshot: baseline accuracy workflow]

If we look at the report generated by the Model Comparison tool, we can see that our model’s accuracy is about 67%, an improvement on the baseline. The report also shows an AUC of 0.73, which tells us the model does a reasonably good job of differentiating between a reversal and an affirmation from Justice Stevens.

[Screenshot: summary of accuracy and AUC from the report generated by the Model Comparison tool]

The benefit of using tools in Alteryx is that the code has already been written for us, yet we can still change the Decision Tree tool’s default parameters, such as the complexity parameter and the independent variables. The Model Comparison tool can also quickly compare the accuracy of several models generated in Alteryx, such as logistic regression and random forest. With just a handful of tools, we are able to create an interpretable model that predicts the decisions of a Supreme Court justice with an accuracy well above baseline. In addition, the report generated by the Model Comparison tool provides assessment plots, like an ROC curve, that can help in deciding what thresholds to use when building our models.
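If you want to reproduce that assessment outside of Alteryx, the ROCR package computes the AUC and draws the ROC curve from the tree fitted in the earlier sketch, again under the same assumed data and column names:

    # AUC and ROC curve for the fitted tree, using predicted class probabilities
    library(ROCR)
    probs    <- predict(tree, newdata = test, type = "prob")[, 2]  # P(Reverse = 1)
    pred_obj <- prediction(probs, test$Reverse)

    performance(pred_obj, "auc")@y.values[[1]]  # area under the ROC curve
    plot(performance(pred_obj, "tpr", "fpr"))   # ROC curve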

Things to Watch When Replacing Data Sources

When creating workbooks that will have future iterations (i.e. not one-time, static infographics), there may come a time when you have to either refresh the data in your dashboard or replace it with another data source.

In the ideal scenario, especially if your workbook is on Tableau Server, the workbook would be connected to a live data source: you would simply update that source (without changing the data source name or any field names) and the workbook would update automatically. No problems.

However, sometimes you will have to replace the original data source with a new one. If for whatever reason you cannot update or refresh a live data source connected to your workbook, there are some things you need to bear in mind.

The usual process to replace a data source is as follows: open your workbook, click the add data source icon, add the new data source, and then replace your original data source:

    1. Add the new data source
    2. Right-click on the original data source and select “Replace Data Source”
    3. Replace with the new data source

If the new data source has EXACTLY the same field names, you should generally be fine. However, if anything has changed, even if it’s just removing a hyphen or changing a field so that it’s capitalised, you will break a few things.

For example, let’s say you build a dashboard with an initial data source (in my case, Sample Superstore). Then you decide that you need to replace the data source, and for whatever reason (maybe a different person pulled the data this time, maybe the fields were renamed as part of a new policy, maybe you wore the wrong kind of socks that morning, whatever the case may be), some of the fields were renamed. For this example, I’ve renamed Category as category and Subcategory as subcat.

The first thing you will notice when you replace the data source is that the renamed fields now have a red exclamation mark (!) next to them. This is because Tableau thinks the fields are no longer in the data. To fix this, right-click on the field, select “Replace References”, and point it to the new, renamed field:

[Screenshots: replacing field references]

This is where the break happens:

[Screenshot: my dashboard before replacing the data source]

[Screenshot: my dashboard after replacing the data source]

What has changed? Well, quite a few things:

  1. Colour: The most obvious change is the colour I had initially used for my categories. When you replace your data source with new field names, Tableau reverts to its default colour scheme
  2. Legend arrangement: In addition to the colour change, Tableau has rearranged my legend so it is no longer a single row
  3. Default sort: My sales by subcategory initially had a default sort that put Technology at the top. Tableau has reverted to an alphabetical sort
  4. Aliases: If you look at the Segment Profitability bars, you’ll notice that the bar initially aliased “Self-Employed” has reverted to its original name, “Home Office”
  5. Grand totals: Although it didn’t change in this instance, I have seen “Grand Total” fields disappear. In my experience it typically happens with Grand Total columns that sum up your rows, but be mindful of this as well if you have Grand Total rows that sum up your columns

When you replace your data sources, pay attention to the potential changes outlined above. Other areas to watch are sets, groups, and the format of quick filters on your dashboard. In light of all these loose ends, it’s best to avoid replacing data sources entirely and instead connect your workbook to a live data source that is updated via Tableau Server. It will save a lot of time maintaining and updating your dashboards.

Later days

Amazing Apps for #Data16

#Data16 is coming up fast, and as many of us in the UK get ready for the 11-hour journey across the Atlantic, I’ve been checking out tons of apps to keep me prepared and entertained. I love apps; if I didn’t put them in neat little folders on my phone, I would have home screens in the double digits. If you haven’t already, make sure to download the official #Data16 app to see all the available sessions, register for hands-on workshops, and get live updates on what’s going on in Austin. In addition to the official app, here are four apps in my tool kit for the great data saga of 2016.

1) PackPoint


Worried about packing too little? Too much? Too warm? Too cold? This is a great little app that generates a packing list for you based on the weather where you’re going and how long you’re staying. You can also choose from a list of activities so your list is fully customised. Now you can use all that extra suitcase space for more #data16 souvenirs!

2) Jetlag Rooster


I need my sleep. I am an absolute crank if I don’t get enough hours, and I love my nap time. Jet lag is a terrible affliction for me. Enter Jetlag Rooster, an app that creates an optimised sleep schedule to minimise the effects of jet lag. You can choose whether to start adjusting a few days before you leave or once you arrive in your destination city. Use the website linked above, or download the app on iOS or the Google Play store.

3) Google Maps

It’s a staple app on many phones, but what makes it useful for travelling is that you can download maps for offline use. Mark all the places you need to keep track of in Austin: your hotel (or wherever you’re staying), the Austin Convention Center, the nearest pub with the best local beer (especially if you’re Coach Kriebel :)), etc. Data roaming is not cheap, and if you need to figure out where you are without a data connection, the offline maps will be a lifesaver.

4) #Data16 Dashboards

Okay, I lied, this post isn’t all about apps. Some amazing and very useful dashboards have been developed by folks in the Tableau community that are just as handy and accessible as an app:

[Screenshot: #Data16 attendees dashboard]

See when people are arriving in town, where they’re staying, and where the fellow newbs are to huddle in a corner with (no, don’t do this, huddle with everyone everywhere please!). Fill out the Google Sheet here to add your data to the viz!

[Screenshot: conference session map dashboard]

If you’re able to get a data or wifi connection, I highly recommend checking this viz out rather than Google Maps. It’s designed to provide you with a map and a walking-time estimate to get from session to session during the conference. It might also help you narrow down your choices among hundreds of amazing presentations; kickassness ratio held equal, why not attend the presentation that’s just a hop, skip, and a jump away?

[Screenshot: Chapman’s #Data16 flights dashboard]

You know those awful flights where the person you’re sitting next to just wants to gab away, spills their drink in your lap, and brings smoked salmon as a mid-flight snack? Yeah, that’s me. BUT I like to think that if I’m sharing a seat with a fellow TC attendee, at least the gabbing about data geekery won’t be so bad? Chapman’s dashboard shows all the flights UK folks are taking on their trek across the Atlantic – see if you’ve got some other data geeks on board!

 

Dealing with Cognitive Quirks

I have a confession to make. I suffer from a debilitating condition that affects hundreds, if not thousands, of data analysts on a daily basis. It is a serious condition and it leads to many an unpublished viz and countless hours of unnecessary calculations. We call this condition… Analysis paralysis.

Over the last few months, I’ve been feeling stagnant in my creativity. The problem is, I get a little too excited when I see data. My brain automatically conjures up a thousand ways I could investigate, analyse, and extrapolate. But there aren’t enough hours in the day and more importantly, not every one of these ideas deserves investigation. We know from cognitive psych research that too many choices can lead to a lack of action (see Barry Schwartz – The Paradox of Choice). And this seems to have become my theme song lately – too many choices, not enough vizzing.

In addition, I crave my gold stars: If I create something, I want it to be perfect and I want my gold star of praise and bright shining acknowledgement. But when you’re already dealing with analysis paralysis, throwing a gold star fixation on top is a recipe for complete brain freeze, and not the fun kind that comes with chocolate chip ice cream.

I’ve dealt with this in a few ways, and if you’re dealing with one or both of these awful cognitive grips, maybe this will help you break loose:

1. Know Thyself

Everything I’ve written above has come from a lot of introspection and self-monitoring, as well as non-judgment. I’m not proud of my flaws, but it is foolish to pretend they aren’t there. The first step is to watch yourself: know what makes you tick and what stops you in your tracks. My problem is too much inspiration; maybe yours is the opposite? Or maybe you don’t care for other people’s opinions, but no one likes what you’re producing and you need to learn some foundational skills? I’m not saying bend to others’ will, but being aware of what is stopping you from growing as an analyst is necessary to deal with the problem.

2. Challenge yourself (based on what you know)

The second part of this is critical, and it’s why I emphasize self-awareness so much: others’ challenges might not be yours. What I mean is that challenges are only helpful if they help you grow, not if they completely burn you out. You have to push a muscle to stretch it, but you don’t want to throw yourself out of commission. For example, I normally work on vizzes for HOURS. My challenge is not to spend more time working on vizzes, but less. So I limit myself to one hour of work on a viz. It gives me enough time to get something satisfactory done while still pushing me to break a sweat. For some people, the challenge might be two hours, or even ten minutes. You know you: set appropriate challenges, make sure they’re challenging, but don’t overdo it by using someone else’s metric.

3. Expect the expected

At the end of an hour of work, I am rarely 100% satisfied. My inner perfectionist is a whiny, nagging worm, and as they say, you are your own biggest critic. Expect criticism, not only from yourself but from people reading your work as well. You cannot satisfy everyone, sometimes not even yourself. The way I deal with it? Just put it out there. No excuses. Don’t fear criticism; it can shape your growth in ways you couldn’t have come to on your own.

An example:

This week’s Makeover Monday (and any Makeover Monday, really) provided a great opportunity to put these goals into action. Part of the challenge this week was to create a visualization based on two numbers (and only two numbers; see the link here and try the challenge yourself!). When I sat down to flex my data brain, I set my clock for an hour and just let things unfold. I ended up creating a visualization to give some context to an inconceivable number, the US debt: $19.5 TRILLION. I added some measurements for things we typically think of as astronomical in cost, but that are only a fraction of the current US debt. I had data, I dug deeper, I found a story that I thought deserved telling, and it was told because I challenged myself to tell it. Unfortunately, I missed the mark on the original challenge and received some well-deserved criticism. So what now? Well, the benefit of my system is that because I committed to my own challenges based on my own needs (and succeeded!), I still get my gold stars. But gold stars on their own are meaningless unless they help me push my muscle further. So even though I failed the challenge this week, I have a new challenge set for myself next week! As cheesy as it is, remember:

[Image: FAIL, First Attempt In Learning]

And stay tuned for my next successful gold star 🙂