Visualizing Wikipedia Locations using Spark
- This Git repo contains notebooks with all the source code to...
- ...download a Wikipedia dump... load it into Spark Delta tables... use Spark to extract location information... and generate visualizations like these...
- These links to interactive visualizations should be more interesting than the static image below, though they are rather memory- and GPU-intensive.
- A scatter plot on a globe - zoom with the scroll wheel, spin the globe with the mouse, or hover over any point to see the title of the Wikipedia page that mentioned that location. It takes a lot of memory.
- A flat map - It groups Wikipedia pages into Uber H3 level 5 grid cells, so it is somewhat less memory intensive than the previous link.
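To give a rough idea of what "extract location information" can involve, here is a minimal sketch in plain Python. It assumes locations appear in the page wikitext as `{{coord|...}}` templates, in either decimal or degrees/minutes/seconds form. The function names (`parse_coord`, `extract_locations`) are hypothetical and are not taken from the notebooks, which may use a different approach.

```python
import re

# Matches the body of a {{coord|...}} template (case-insensitive).
# Assumption: the template contains no nested braces.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_coord(body):
    """Return (lat, lon) in decimal degrees, or None if the body is not understood."""
    nums = []    # numeric fields collected for the current axis
    coords = []  # finished axes, latitude first
    for field in body.split("|"):
        field = field.strip()
        hemi = field.upper()
        if hemi in ("N", "S", "E", "W"):
            # A hemisphere letter closes a degrees[/minutes[/seconds]] group.
            if not 1 <= len(nums) <= 3:
                return None
            nums += [0.0] * (3 - len(nums))
            value = nums[0] + nums[1] / 60 + nums[2] / 3600
            coords.append(-value if hemi in ("S", "W") else value)
            nums = []
            if len(coords) == 2:
                break
        else:
            try:
                nums.append(float(field))
            except ValueError:
                break  # first non-numeric field (e.g. display=title) ends the coordinates
    if not coords and len(nums) == 2:
        coords = nums  # plain decimal form: {{coord|lat|lon|...}}
    if len(coords) != 2:
        return None
    lat, lon = coords
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return (lat, lon)
    return None


def extract_locations(wikitext):
    """Return every parseable (lat, lon) pair found in a page's wikitext."""
    found = []
    for match in COORD_RE.finditer(wikitext):
        parsed = parse_coord(match.group(1))
        if parsed is not None:
            found.append(parsed)
    return found
```

In a Spark job, a function like this could be wrapped as a UDF and applied to the column holding the page text, though that wiring is left out here.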
If the above links are too memory intensive for your browser, the result should look something like this: