Top 20 Stack Overflow Tags

by tuxdna

Finding the top 20 tags on the main Stack Overflow site

Download the Stack Overflow tags data from the Internet Archive and extract it:

wget -c "https://archive.org/download/stackexchange/stackoverflow.com-Tags.7z"
7za e stackoverflow.com-Tags.7z

Now load Tags.xml into a pandas DataFrame in Python and plot the 20 most used tags:

import pandas as pd
import numpy as np
import xmltodict
import matplotlib.pyplot as plt

# Parse the whole XML dump into nested dictionaries
with open("Tags.xml", encoding="utf-8") as f:
    o = xmltodict.parse(f.read())

# Each <row> element becomes one DataFrame row; XML attributes get an '@' prefix
df = pd.DataFrame.from_dict(o['tags']['row'])
df['counts'] = df['@Count'].astype(int)

# Keep the 20 tags with the highest counts
df2 = df.sort_values(by=['counts'], ascending=False).head(20)[['counts', '@TagName']]


"""
 counts       @TagName
 1067078     javascript
 1025688           java
  918586             c#
  885422            php
  800779        android
  712360         jquery
  542985         python
  511091           html
  431790            c++
  414394            ios
  380535          mysql
  372444            css
  319001            sql
  282582        asp.net
  253474    objective-c
  235532  ruby-on-rails
  227675           .net
  210953         iphone
  210835              c
  170559         arrays
"""

counts = df2['counts'].to_numpy()  # as_matrix() was removed in pandas 1.0
x = np.arange(len(counts))
labels = df2['@TagName'].values

fig = plt.figure()
fig.set_size_inches(15, 10.5)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.set_xlabel("Tags")
ax.set_ylabel("Counts")
ax.bar(x, counts, align='center')
ax.set_xticks(x)
ax.set_xticklabels(labels)
fig.show()
fig.savefig('plot.png', format='png')
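
The ranking step above can also be sketched with only the Python standard library, which may be handy when pandas and xmltodict are not installed. This is a minimal sketch, not the code used for the plot. It assumes Tags.xml has the `<tags><row TagName="..." Count="..."/></tags>` layout implied by the DataFrame code. The helper name `top_tags` and the `Id` values in the sample are made up for illustration; the counts are copied from the table above.

```python
import heapq
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for Tags.xml; counts come from the table above,
# the Id values are invented for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<tags>
  <row Id="1" TagName="python" Count="542985" />
  <row Id="2" TagName="javascript" Count="1067078" />
  <row Id="3" TagName="java" Count="1025688" />
  <row Id="4" TagName="arrays" Count="170559" />
</tags>"""


def top_tags(source, n=20):
    """Return the n (count, tag_name) pairs with the highest counts.

    `source` is a filename or a binary file-like object.
    (Hypothetical helper, not part of the original post.)
    """
    def rows():
        # iterparse yields each element once its end tag has been read,
        # so the file is never loaded as one big string.
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == "row":
                yield int(elem.get("Count")), elem.get("TagName")
                elem.clear()  # drop attributes that are no longer needed

    # nlargest keeps only n candidates at a time instead of sorting everything
    return heapq.nlargest(n, rows())


top = top_tags(io.BytesIO(SAMPLE), n=3)
for count, name in top:
    print(f"{count:>8}  {name}")
```

To run it against the real dump, pass the filename instead of the in-memory sample, for example `top_tags("Tags.xml")`.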

The tags plot image is available here:

Code available here:

That’s all folks!