Gazetiki

Description

Gazetiki is a geographical database built at CEA LIST (http://www-list.cea.fr/gb/index_gb.htm) from Geonames (geonames.org) data and from various other Web sources. This work was supported by the French ANR Georama project (ANR-08-CORD-009).

Gazetiki_v1 contains 8323702 geographical names: 1043807 of them were extracted from Wikipedia, DBPedia and Flickr, and the rest come from Geonames. For Geonames elements, the main modification is the addition of a popularity score, calculated from how often a place name is used in a geotagged dataset. For the other elements, the place name(s), geographical category, coordinates and popularity were extracted from Web data. Information about encompassing entities and the timezone was copied from the nearest Geonames neighbor. The automatically discovered place names use the same format as Geonames elements.

Extraction accuracy varies across the data sources. Geonames elements are more accurate than geotagged DBPedia elements, which in turn are more accurate than other Wikipedia elements and Flickr elements. For this reason, each element of the dataset records the source it was extracted from. When merging the sources, we tried to remove as many duplicate elements as possible, giving priority to Geonames elements, then to DBPedia elements, and then to the other sources.
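The enrichment and merging steps described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used to build Gazetiki. The `Place` record, the `SOURCE_PRIORITY` ranks, the duplicate criterion (same case-folded name within `max_km` kilometres) and the tag-counting `popularity` function are all hypothetical. The text does not say how duplicates were detected or exactly how the popularity score was computed.

```python
import math
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical source ranks used when merging: lower value = higher priority.
SOURCE_PRIORITY = {"geonames": 0, "dbpedia": 1, "wikipedia": 2, "flickr": 2}


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    source: str
    timezone: Optional[str] = None


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance in kilometres between two places."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def popularity(name: str, tagged_items: list[set[str]]) -> int:
    """Assumed score: number of geotagged items whose tags contain the name."""
    key = name.casefold()
    return sum(1 for tags in tagged_items if key in {t.casefold() for t in tags})


def fill_from_nearest(place: Place, geonames: list[Place]) -> Place:
    """Copy the timezone of the nearest Geonames element (brute-force search)."""
    nearest = min(geonames, key=lambda g: haversine_km(place, g))
    return replace(place, timezone=nearest.timezone)


def merge(places: list[Place], max_km: float = 1.0) -> list[Place]:
    """Keep one element per duplicate group, preferring higher-priority sources.

    Assumption: two elements are duplicates when their case-folded names
    match and they lie within max_km of each other.
    """
    kept: list[Place] = []
    # sorted() is stable, so equal-priority elements keep their input order.
    for p in sorted(places, key=lambda q: SOURCE_PRIORITY[q.source]):
        if not any(p.name.casefold() == k.name.casefold()
                   and haversine_km(p, k) <= max_km for k in kept):
            kept.append(p)
    return kept


# Usage with made-up sample records.
geonames = [Place("Paris", 48.8534, 2.3488, "geonames", "Europe/Paris")]
extracted = [Place("paris", 48.8566, 2.3522, "flickr"),
             Place("Montmartre", 48.8867, 2.3431, "wikipedia")]
enriched = [fill_from_nearest(p, geonames) for p in extracted]
merged = merge(geonames + enriched)
print([(p.name, p.source, p.timezone) for p in merged])
# → [('Paris', 'geonames', 'Europe/Paris'), ('Montmartre', 'wikipedia', 'Europe/Paris')]
```

The Flickr "paris" record lies about 0.4 km from the Geonames "Paris" element, so it is dropped in favour of the higher-priority source. A dataset of millions of names would need a spatial index (for example a k-d tree) instead of these brute-force nearest-neighbour and duplicate searches.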

License

This work is licensed under a Creative Commons Attribution 3.0 License. The data is provided "as is", without any warranty or any representation of accuracy, timeliness or completeness.

More information is available in the following README.

Download