Finding the Top 20 Tags on the StackOverflow Main Site
Download the StackOverflow Tags data from the following site:
wget -c "https://archive.org/download/stackexchange/stackoverflow.com-Tags.7z"
7za e stackoverflow.com-Tags.7z
Now convert the file Tags.xml into a dataframe in Python:
import pandas as pd
import numpy as np
import xmltodict
import matplotlib.pyplot as plt
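# Read as bytes so the XML parser handles the file's declared encoding.
with open("Tags.xml", "rb") as f:
    o = xmltodict.parse(f.read())
# The root element is <tags>; each tag is a <row> element.
df = pd.DataFrame.from_dict(o['tags']['row'])
# xmltodict prefixes XML attributes with '@' and keeps their values as strings.
df['counts'] = df['@Count'].astype(int)
# Keep the 20 most-used tags.
df2 = df.sort_values(by='counts', ascending=False).head(20)[['counts', '@TagName']]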
"""
counts @TagName
1067078 javascript
1025688 java
918586 c#
885422 php
800779 android
712360 jquery
542985 python
511091 html
431790 c++
414394 ios
380535 mysql
372444 css
319001 sql
282582 asp.net
253474 objective-c
235532 ruby-on-rails
227675 .net
210953 iphone
210835 c
170559 arrays
"""
counts = df2['counts'].to_numpy()  # as_matrix() was removed in pandas 1.0
x = np.arange(len(counts))
labels = df2['@TagName'].values
fig = plt.figure()
fig.set_size_inches(15, 10.5)
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
ax.set_xlabel("Tags")
ax.set_ylabel("Counts")
ax.bar(x, counts, align='center')
ax.set_xticks(x)
ax.set_xticklabels(labels)
fig.show()
fig.savefig('plot.png', format='png')
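The parsing step above reads the whole file into memory before building the dataframe. For larger dumps, a streaming variant may be worth considering. The following is a minimal sketch using only the standard library, not the approach used in this post: the function name top_tags and the sample rows are made up for illustration, and it assumes the same row layout (TagName and Count attributes) as Tags.xml.

```python
import heapq
import io
import xml.etree.ElementTree as ET


def top_tags(source, n=20):
    """Return the n most-used tags as (count, tag_name) pairs, largest first.

    `source` is a filename or a binary file object holding <row> elements
    with TagName and Count attributes, as in Tags.xml.
    """
    def rows():
        # iterparse yields each element once its closing tag has been read.
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == "row":
                yield int(elem.get("Count")), elem.get("TagName")
                elem.clear()  # release the element once it has been read

    # nlargest keeps only n candidates in memory while consuming the rows.
    return heapq.nlargest(n, rows())


# Tiny made-up sample in the same shape as Tags.xml (not real data).
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<tags>
  <row Id="1" TagName="alpha" Count="5" />
  <row Id="2" TagName="beta" Count="12" />
  <row Id="3" TagName="gamma" Count="7" />
</tags>"""

print(top_tags(io.BytesIO(sample), n=2))  # prints [(12, 'beta'), (7, 'gamma')]
```

With the real file, calling top_tags("Tags.xml") should give the same 20 tags as the dataframe version, as a list of tuples rather than a dataframe.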
The tags plot image is available here:
The code is available here:
That’s all folks!