Basic manipulations of geolocation data using Python

Getting to work

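pip install pandas

import pandas as pd

# Load the Olist geolocation dataset
df = pd.read_csv('olist_geolocation_dataset.csv')
df.head()
df.info()

# Number of records per state
df.geolocation_state.value_counts()

# Collapse repeated coordinates: one row per (lat, lng) pair,
# with its state and the number of records at that position
df_agg = df.groupby(['geolocation_lat', 'geolocation_lng']).agg(
    uf=('geolocation_state', 'min'),
    n_pontos=('geolocation_lat', 'count')
).reset_index()
df_agg.head()
df_agg.shape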

Keeping only data from the state of Alagoas

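pip install shapely

import json
from shapely.geometry import Point, Polygon

# Load the border coordinates of Alagoas
with open('AL.json') as json_file:
    data_geo = json.load(json_file)
df_borders = pd.DataFrame(data_geo['borders'][0])
df_borders.head()

# Build the state polygon; Shapely expects (x, y), that is (lng, lat)
poligono = Polygon(zip(list(df_borders.lng), list(df_borders.lat)))
poligono

def within_polygon(lng, lat, polygon):
    """Return True if the point (lng, lat) lies inside the polygon."""
    point = Point(float(lng), float(lat))
    return point.within(polygon)

# Flag each coordinate pair as inside or outside the polygon
df_agg['localizado_no_poligono'] = df_agg.apply(lambda x: within_polygon(
    x.geolocation_lng, x.geolocation_lat, poligono), axis=1)
df_agg.head()
df_agg.localizado_no_poligono.value_counts()

# Compare the state recorded in the data with the polygon test
pd.crosstab(df_agg.uf, df_agg.localizado_no_poligono).reset_index()

# Keep only the points that fall inside Alagoas
df_al = df_agg[df_agg.localizado_no_poligono]
df_al.shape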
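
To make the point-in-polygon test less of a black box, here is a minimal pure-Python sketch of the classic ray-casting idea: count how many polygon edges a horizontal ray starting at the point crosses, and treat an odd count as "inside". This is not how Shapely implements `within`; it is a swapped-in illustration. The function name `point_in_polygon` and the toy square are hypothetical, and points lying exactly on the border are not handled consistently.

```python
def point_in_polygon(lng, lat, vertices):
    """Ray-casting test. `vertices` is a list of (lng, lat) tuples
    describing a simple polygon (hypothetical helper, not Shapely)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if lng < x_cross:
                inside = not inside
    return inside


# Toy square used only for illustration
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))
print(point_in_polygon(5, 2, square))
```

For real borders with thousands of vertices, as in the state polygon above, the Shapely call remains the practical choice; the sketch only shows the idea behind the predicate.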

Plotting the points on the map of Alagoas

pip install seaborn
pip install geopandas

import os
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the shapefile with the municipalities of Alagoas
shape_path = os.path.join('al_municipios', 'AL_Municipios_2019.shp')
shape_al = gpd.read_file(shape_path)

# Base map only
fig, ax = plt.subplots(figsize=(15,8))
shape_al.plot(ax=ax, color='lightgray')

# Base map with the points on top, sized by the number of records
fig, ax = plt.subplots(figsize=(15,8))
shape_al.plot(ax=ax, color='lightgray')
sns.scatterplot(data=df_al,
                x='geolocation_lng',
                y='geolocation_lat',
                size='n_pontos',
                sizes=(50, 500),
                alpha=0.3,
                ax=ax)
Source: Wikipedia
pip install haversine

from haversine import haversine, Unit, haversine_vector

# Points as (lat, lng) tuples
maceio = (-9.647449, -35.709190)
itamaraca = (-9.345859, -35.865804)

haversine(maceio, itamaraca)  # kilometers by default
haversine(maceio, itamaraca, unit=Unit.METERS)
haversine(maceio, itamaraca, unit=Unit.MILES)
haversine(maceio, itamaraca, Unit.NAUTICAL_MILES)

# Pairwise distance matrix (km) for the first 10 points
df_al_10 = df_al.head(10).reset_index(drop=True)
df_al_10
dij = df_al_10[['geolocation_lat', 'geolocation_lng']]
dij = [tuple(x) for x in dij.to_numpy()]
dij = haversine_vector(dij, dij, Unit.KILOMETERS, comb=True)
pd.DataFrame(dij).head(10)
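
For readers who want to see what the library computes, the haversine formula can be sketched with the standard library alone. This is a hand-written illustration, not the package's source code: the names `haversine_km` and `pairwise_km` are hypothetical, the Earth radius is an assumed mean value, and results may differ slightly from the package depending on the radius it uses.

```python
import math

# Assumed mean Earth radius in kilometers
EARTH_RADIUS_KM = 6371.0088


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine of the central angle between the two points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairwise_km(points):
    """Square matrix (list of lists) with the distance between every pair."""
    return [[haversine_km(p, q) for q in points] for p in points]


maceio = (-9.647449, -35.709190)
itamaraca = (-9.345859, -35.865804)

print(haversine_km(maceio, itamaraca))
print(pairwise_km([maceio, itamaraca]))
```

The nested list comprehension plays the role of the `comb=True` call above: entry (i, j) holds the distance from point i to point j, so the diagonal is zero and the matrix is symmetric.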

Conclusion

Data Scientist, https://acsjunior.com

António C. da Silva Júnior
