{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![](https://raw.githubusercontent.com/rafneta/CienciaDatosPythonCIDE/master/imagenes/banner.png)\n", "\n", "```{contents}\n", ":depth: 4\n", "```\n", "\n", "# Pandas\n", "\n", "## ¿A que nos sirve?\n", "\n", "Con Pandas podemos manejar nuestros datos, **modificandolos**, **transformandolos** y **analizandolos**.\n", "\n", "En particular podemos:\n", "* limpiar los datos presentes en un DataFrame\n", "* Visualizar los datos con la ayuda de Matplotlib\n", "* Almacenar los datos trasformados en otro archivo CSV\n", "\n", "Panda está construido con base en Numpy y normalmente su output preferido es **Scipy** o **Scikit-learn**. \n", "\n", "Antes que todo tendremos que importar los paquetes:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Las dos componentes principales de Pandas son:\n", "* Series: es un vector vertical\n", "* DataFrame: es una Tabla (una matriz)\n", "\n", "Las dos estructuras comparten muchas operaciones, que pueden ser hechas en ambos (como calcular la media).\n", "\n", "## Crear un diccionario\n", "\n", "Tenemos diferentes maneras de construir un diccionario:\n", "* Por medio de un diccionario" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InterJuventus
033
132
207
30106
\n", "
" ], "text/plain": [ " Inter Juventus\n", "0 3 3\n", "1 3 2\n", "2 0 7\n", "3 0 106" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\n", " 'Inter': [3, 3, 0, 0], \n", " 'Juventus': [3, 2, 7, 106]\n", "}\n", "\n", "equipos = pd.DataFrame(data)\n", "\n", "equipos" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide-output" ] }, "source": [ "Cuando creamos un DataFrame de default nos dará un indice numerico, através del cual poemos acceder a las observaciones:" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "scrolled": true, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "Inter 3\n", "Juventus 3\n", "Name: 0, dtype: int64" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "equipos.loc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pero resultaria mucho mas facil por medio de un indice \"autoexplicativo\", para asignarlo es suficiente especificarlo:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InterJuventus
Europa League33
Champions32
Ligas Robadas07
Arbitros comprados0106
\n", "
" ], "text/plain": [ " Inter Juventus\n", "Europa League 3 3\n", "Champions 3 2\n", "Ligas Robadas 0 7\n", "Arbitros comprados 0 106" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams = pd.DataFrame(data, index=['Europa League', 'Champions', 'Ligas Robadas', 'Arbitros comprados'])\n", "teams" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos crearlo también a partir de listas:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012
0123
1456
\n", "
" ], "text/plain": [ " 0 1 2\n", "0 1 2 3\n", "1 4 5 6" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list=[[1,2,3],[4,5,6]]\n", "prueba= pd.DataFrame(list)\n", "prueba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos además importar un database:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
01Aguascalientes1Aguascalientes1AGUASCALIENTES02290100100010229410104...18.39.639.44.73.42.90.00.01.0Bajo
11Aguascalientes1Aguascalientes1AGUASCALIENTES023301001000102331536421...21.99.340.93.47.11.70.51.00.2Bajo
21Aguascalientes1Aguascalientes1AGUASCALIENTES028601001000102863469986...6.81.214.91.01.82.60.40.00.0Bajo
31Aguascalientes1Aguascalientes1AGUASCALIENTES029001001000102901884504...4.60.613.50.91.80.20.00.00.0Bajo
41Aguascalientes1Aguascalientes1AGUASCALIENTES030301001000103032397625...8.63.025.91.93.80.20.20.20.0Bajo
..................................................................
5102932Zacatecas57Trancoso1TRANCOSO04123205700010412122...NDNDNDNDNDNDNDNDNDAlto
5103032Zacatecas57Trancoso1TRANCOSO04313205700010431143...33.30.0100.00.00.00.066.70.00.0Alto
5103132Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ00153205800010015651175...19.412.040.67.01.01.13.40.01.1Bajo
5103232Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ002A320580001002A945251...18.714.337.16.84.51.61.61.21.6Bajo
5103332Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ010432058000101048625...52.032.0100.06.10.04.00.00.00.0Medio
\n", "

51034 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc \\\n", "0 1 Aguascalientes 1 Aguascalientes 1 \n", "1 1 Aguascalientes 1 Aguascalientes 1 \n", "2 1 Aguascalientes 1 Aguascalientes 1 \n", "3 1 Aguascalientes 1 Aguascalientes 1 \n", "4 1 Aguascalientes 1 Aguascalientes 1 \n", "... ... ... ... ... ... \n", "51029 32 Zacatecas 57 Trancoso 1 \n", "51030 32 Zacatecas 57 Trancoso 1 \n", "51031 32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "51032 32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "51033 32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "\n", " loc cve_ageb folio_ageb pob_tot viv_par_hab \\\n", "0 AGUASCALIENTES 0229 0100100010229 410 104 \n", "1 AGUASCALIENTES 0233 0100100010233 1536 421 \n", "2 AGUASCALIENTES 0286 0100100010286 3469 986 \n", "3 AGUASCALIENTES 0290 0100100010290 1884 504 \n", "4 AGUASCALIENTES 0303 0100100010303 2397 625 \n", "... ... ... ... ... ... \n", "51029 TRANCOSO 0412 3205700010412 12 2 \n", "51030 TRANCOSO 0431 3205700010431 14 3 \n", "51031 SANTA MAR�A DE LA PAZ 0015 3205800010015 651 175 \n", "51032 SANTA MAR�A DE LA PAZ 002A 320580001002A 945 251 \n", "51033 SANTA MAR�A DE LA PAZ 0104 3205800010104 86 25 \n", "\n", " ... porc_vivsnlavadora porc_vivsnrefri porc_vivstelefono \\\n", "0 ... 18.3 9.6 39.4 \n", "1 ... 21.9 9.3 40.9 \n", "2 ... 6.8 1.2 14.9 \n", "3 ... 4.6 0.6 13.5 \n", "4 ... 8.6 3.0 25.9 \n", "... ... ... ... ... \n", "51029 ... ND ND ND \n", "51030 ... 33.3 0.0 100.0 \n", "51031 ... 19.4 12.0 40.6 \n", "51032 ... 18.7 14.3 37.1 \n", "51033 ... 52.0 32.0 100.0 \n", "\n", " porc_pob_15_mas_analfa porc_pob6_14_noasiste porc_vivpisotierra \\\n", "0 4.7 3.4 2.9 \n", "1 3.4 7.1 1.7 \n", "2 1.0 1.8 2.6 \n", "3 0.9 1.8 0.2 \n", "4 1.9 3.8 0.2 \n", "... ... ... ... \n", "51029 ND ND ND \n", "51030 0.0 0.0 0.0 \n", "51031 7.0 1.0 1.1 \n", "51032 6.8 4.5 1.6 \n", "51033 6.1 0.0 4.0 \n", "\n", " porc_snaguaent porc_vivsndren porc_vivsnenergia gdo_rezsoc \n", "0 0.0 0.0 1.0 Bajo \n", "1 0.5 1.0 0.2 Bajo \n", "2 0.4 0.0 0.0 Bajo \n", "3 0.0 0.0 0.0 Bajo \n", "4 0.2 0.2 0.0 Bajo \n", "... ... ... ... ... \n", "51029 ND ND ND Alto \n", "51030 66.7 0.0 0.0 Alto \n", "51031 3.4 0.0 1.1 Bajo \n", "51032 1.6 1.2 1.6 Bajo \n", "51033 0.0 0.0 0.0 Medio \n", "\n", "[51034 rows x 25 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('C:\\DAVE2\\CIDE\\Tesis\\Base de Datos\\Ageb_2010_DA_indice_de_rezago_social.csv')\n", "\n", "df" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "Para saber las dimensiones de nuestros DB es suficiente: " ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(51034, 25)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cve_ent\n", "ent\n", "cve_mun\n", "mun\n", "cve_loc\n", "loc\n", "cve_ageb\n", "folio_ageb\n", "pob_tot\n", "viv_par_hab\n", "porc_pob_15_mas_basic_incom\n", "porc_pob_15_24_noasiste\n", "porc_pob_snservsal\n", "porc_vivhacina\n", "porc_vivsnsan\n", "porc_vivsnlavadora\n", "porc_vivsnrefri\n", "porc_vivstelefono\n", "porc_pob_15_mas_analfa\n", "porc_pob6_14_noasiste\n", "porc_vivpisotierra\n", "porc_snaguaent\n", "porc_vivsndren\n", "porc_vivsnenergia\n", "gdo_rezsoc\n" ] } ], "source": [ "for col in df.columns:\n", " print(col)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['cve_ent', 'ent', 'cve_mun', 'mun', 'cve_loc', 'loc', 'cve_ageb',\n", " 'folio_ageb', 'pob_tot', 'viv_par_hab', 'porc_pob_15_mas_basic_incom',\n", " 'porc_pob_15_24_noasiste', 'porc_pob_snservsal', 'porc_vivhacina',\n", " 'porc_vivsnsan', 'porc_vivsnlavadora', 'porc_vivsnrefri',\n", " 'porc_vivstelefono', 'porc_pob_15_mas_analfa', 'porc_pob6_14_noasiste',\n", " 'porc_vivpisotierra', 'porc_snaguaent', 'porc_vivsndren',\n", " 'porc_vivsnenergia', 'gdo_rezsoc'],\n", " dtype='object')" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
entcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_habporc_pob_15_mas_basic_incom...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
cve_ent
1Aguascalientes1Aguascalientes1AGUASCALIENTES0229010010001022941010435.1...18.39.639.44.73.42.90.00.01.0Bajo
1Aguascalientes1Aguascalientes1AGUASCALIENTES02330100100010233153642135.5...21.99.340.93.47.11.70.51.00.2Bajo
1Aguascalientes1Aguascalientes1AGUASCALIENTES02860100100010286346998616.0...6.81.214.91.01.82.60.40.00.0Bajo
1Aguascalientes1Aguascalientes1AGUASCALIENTES02900100100010290188450418.5...4.60.613.50.91.80.20.00.00.0Bajo
1Aguascalientes1Aguascalientes1AGUASCALIENTES03030100100010303239762530.3...8.63.025.91.93.80.20.20.20.0Bajo
..................................................................
32Zacatecas57Trancoso1TRANCOSO04123205700010412122ND...NDNDNDNDNDNDNDNDNDAlto
32Zacatecas57Trancoso1TRANCOSO0431320570001043114342.9...33.30.0100.00.00.00.066.70.00.0Alto
32Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ0015320580001001565117554.6...19.412.040.67.01.01.13.40.01.1Bajo
32Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ002A320580001002A94525155.8...18.714.337.16.84.51.61.61.21.6Bajo
32Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ01043205800010104862542.9...52.032.0100.06.10.04.00.00.00.0Medio
\n", "

51034 rows × 24 columns

\n", "
" ], "text/plain": [ " ent cve_mun mun cve_loc \\\n", "cve_ent \n", "1 Aguascalientes 1 Aguascalientes 1 \n", "1 Aguascalientes 1 Aguascalientes 1 \n", "1 Aguascalientes 1 Aguascalientes 1 \n", "1 Aguascalientes 1 Aguascalientes 1 \n", "1 Aguascalientes 1 Aguascalientes 1 \n", "... ... ... ... ... \n", "32 Zacatecas 57 Trancoso 1 \n", "32 Zacatecas 57 Trancoso 1 \n", "32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "\n", " loc cve_ageb folio_ageb pob_tot viv_par_hab \\\n", "cve_ent \n", "1 AGUASCALIENTES 0229 0100100010229 410 104 \n", "1 AGUASCALIENTES 0233 0100100010233 1536 421 \n", "1 AGUASCALIENTES 0286 0100100010286 3469 986 \n", "1 AGUASCALIENTES 0290 0100100010290 1884 504 \n", "1 AGUASCALIENTES 0303 0100100010303 2397 625 \n", "... ... ... ... ... ... \n", "32 TRANCOSO 0412 3205700010412 12 2 \n", "32 TRANCOSO 0431 3205700010431 14 3 \n", "32 SANTA MAR�A DE LA PAZ 0015 3205800010015 651 175 \n", "32 SANTA MAR�A DE LA PAZ 002A 320580001002A 945 251 \n", "32 SANTA MAR�A DE LA PAZ 0104 3205800010104 86 25 \n", "\n", " porc_pob_15_mas_basic_incom ... porc_vivsnlavadora porc_vivsnrefri \\\n", "cve_ent ... \n", "1 35.1 ... 18.3 9.6 \n", "1 35.5 ... 21.9 9.3 \n", "1 16.0 ... 6.8 1.2 \n", "1 18.5 ... 4.6 0.6 \n", "1 30.3 ... 8.6 3.0 \n", "... ... ... ... ... \n", "32 ND ... ND ND \n", "32 42.9 ... 33.3 0.0 \n", "32 54.6 ... 19.4 12.0 \n", "32 55.8 ... 18.7 14.3 \n", "32 42.9 ... 52.0 32.0 \n", "\n", " porc_vivstelefono porc_pob_15_mas_analfa porc_pob6_14_noasiste \\\n", "cve_ent \n", "1 39.4 4.7 3.4 \n", "1 40.9 3.4 7.1 \n", "1 14.9 1.0 1.8 \n", "1 13.5 0.9 1.8 \n", "1 25.9 1.9 3.8 \n", "... ... ... ... \n", "32 ND ND ND \n", "32 100.0 0.0 0.0 \n", "32 40.6 7.0 1.0 \n", "32 37.1 6.8 4.5 \n", "32 100.0 6.1 0.0 \n", "\n", " porc_vivpisotierra porc_snaguaent porc_vivsndren porc_vivsnenergia \\\n", "cve_ent \n", "1 2.9 0.0 0.0 1.0 \n", "1 1.7 0.5 1.0 0.2 \n", "1 2.6 0.4 0.0 0.0 \n", "1 0.2 0.0 0.0 0.0 \n", "1 0.2 0.2 0.2 0.0 \n", "... ... ... ... ... \n", "32 ND ND ND ND \n", "32 0.0 66.7 0.0 0.0 \n", "32 1.1 3.4 0.0 1.1 \n", "32 1.6 1.6 1.2 1.6 \n", "32 4.0 0.0 0.0 0.0 \n", "\n", " gdo_rezsoc \n", "cve_ent \n", "1 Bajo \n", "1 Bajo \n", "1 Bajo \n", "1 Bajo \n", "1 Bajo \n", "... ... \n", "32 Alto \n", "32 Alto \n", "32 Bajo \n", "32 Bajo \n", "32 Medio \n", "\n", "[51034 rows x 24 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.read_csv('C:\\DAVE2\\CIDE\\Tesis\\Base de Datos\\Ageb_2010_DA_indice_de_rezago_social.csv', index_col=0)\n", "\n", "df2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Con *pd.read_json('purchases.json')* podemos leer file json, de igual forma podemos también leer file *sql*.\n", "\n", "Podemos cambiar el indice escogido" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_habporc_pob_15_mas_basic_incomporc_pob_15_24_noasiste...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
ent
Aguascalientes1Aguascalientes1AGUASCALIENTES0229010010001022941010435.152.5...18.39.639.44.73.42.90.00.01.0Bajo
Aguascalientes1Aguascalientes1AGUASCALIENTES02330100100010233153642135.557.7...21.99.340.93.47.11.70.51.00.2Bajo
Aguascalientes1Aguascalientes1AGUASCALIENTES02860100100010286346998616.032.4...6.81.214.91.01.82.60.40.00.0Bajo
Aguascalientes1Aguascalientes1AGUASCALIENTES02900100100010290188450418.531.8...4.60.613.50.91.80.20.00.00.0Bajo
Aguascalientes1Aguascalientes1AGUASCALIENTES03030100100010303239762530.342.7...8.63.025.91.93.80.20.20.20.0Bajo
..................................................................
Zacatecas57Trancoso1TRANCOSO04123205700010412122NDND...NDNDNDNDNDNDNDNDNDAlto
Zacatecas57Trancoso1TRANCOSO0431320570001043114342.9100.0...33.30.0100.00.00.00.066.70.00.0Alto
Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ0015320580001001565117554.659.0...19.412.040.67.01.01.13.40.01.1Bajo
Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ002A320580001002A94525155.856.6...18.714.337.16.84.51.61.61.21.6Bajo
Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ01043205800010104862542.966.7...52.032.0100.06.10.04.00.00.00.0Medio
\n", "

51034 rows × 23 columns

\n", "
" ], "text/plain": [ " cve_mun mun cve_loc \\\n", "ent \n", "Aguascalientes 1 Aguascalientes 1 \n", "Aguascalientes 1 Aguascalientes 1 \n", "Aguascalientes 1 Aguascalientes 1 \n", "Aguascalientes 1 Aguascalientes 1 \n", "Aguascalientes 1 Aguascalientes 1 \n", "... ... ... ... \n", "Zacatecas 57 Trancoso 1 \n", "Zacatecas 57 Trancoso 1 \n", "Zacatecas 58 Santa Mar�a de la Paz 1 \n", "Zacatecas 58 Santa Mar�a de la Paz 1 \n", "Zacatecas 58 Santa Mar�a de la Paz 1 \n", "\n", " loc cve_ageb folio_ageb pob_tot \\\n", "ent \n", "Aguascalientes AGUASCALIENTES 0229 0100100010229 410 \n", "Aguascalientes AGUASCALIENTES 0233 0100100010233 1536 \n", "Aguascalientes AGUASCALIENTES 0286 0100100010286 3469 \n", "Aguascalientes AGUASCALIENTES 0290 0100100010290 1884 \n", "Aguascalientes AGUASCALIENTES 0303 0100100010303 2397 \n", "... ... ... ... ... \n", "Zacatecas TRANCOSO 0412 3205700010412 12 \n", "Zacatecas TRANCOSO 0431 3205700010431 14 \n", "Zacatecas SANTA MAR�A DE LA PAZ 0015 3205800010015 651 \n", "Zacatecas SANTA MAR�A DE LA PAZ 002A 320580001002A 945 \n", "Zacatecas SANTA MAR�A DE LA PAZ 0104 3205800010104 86 \n", "\n", " viv_par_hab porc_pob_15_mas_basic_incom \\\n", "ent \n", "Aguascalientes 104 35.1 \n", "Aguascalientes 421 35.5 \n", "Aguascalientes 986 16.0 \n", "Aguascalientes 504 18.5 \n", "Aguascalientes 625 30.3 \n", "... ... ... \n", "Zacatecas 2 ND \n", "Zacatecas 3 42.9 \n", "Zacatecas 175 54.6 \n", "Zacatecas 251 55.8 \n", "Zacatecas 25 42.9 \n", "\n", " porc_pob_15_24_noasiste ... porc_vivsnlavadora \\\n", "ent ... \n", "Aguascalientes 52.5 ... 18.3 \n", "Aguascalientes 57.7 ... 21.9 \n", "Aguascalientes 32.4 ... 6.8 \n", "Aguascalientes 31.8 ... 4.6 \n", "Aguascalientes 42.7 ... 8.6 \n", "... ... ... ... \n", "Zacatecas ND ... ND \n", "Zacatecas 100.0 ... 33.3 \n", "Zacatecas 59.0 ... 19.4 \n", "Zacatecas 56.6 ... 18.7 \n", "Zacatecas 66.7 ... 52.0 \n", "\n", " porc_vivsnrefri porc_vivstelefono porc_pob_15_mas_analfa \\\n", "ent \n", "Aguascalientes 9.6 39.4 4.7 \n", "Aguascalientes 9.3 40.9 3.4 \n", "Aguascalientes 1.2 14.9 1.0 \n", "Aguascalientes 0.6 13.5 0.9 \n", "Aguascalientes 3.0 25.9 1.9 \n", "... ... ... ... \n", "Zacatecas ND ND ND \n", "Zacatecas 0.0 100.0 0.0 \n", "Zacatecas 12.0 40.6 7.0 \n", "Zacatecas 14.3 37.1 6.8 \n", "Zacatecas 32.0 100.0 6.1 \n", "\n", " porc_pob6_14_noasiste porc_vivpisotierra porc_snaguaent \\\n", "ent \n", "Aguascalientes 3.4 2.9 0.0 \n", "Aguascalientes 7.1 1.7 0.5 \n", "Aguascalientes 1.8 2.6 0.4 \n", "Aguascalientes 1.8 0.2 0.0 \n", "Aguascalientes 3.8 0.2 0.2 \n", "... ... ... ... \n", "Zacatecas ND ND ND \n", "Zacatecas 0.0 0.0 66.7 \n", "Zacatecas 1.0 1.1 3.4 \n", "Zacatecas 4.5 1.6 1.6 \n", "Zacatecas 0.0 4.0 0.0 \n", "\n", " porc_vivsndren porc_vivsnenergia gdo_rezsoc \n", "ent \n", "Aguascalientes 0.0 1.0 Bajo \n", "Aguascalientes 1.0 0.2 Bajo \n", "Aguascalientes 0.0 0.0 Bajo \n", "Aguascalientes 0.0 0.0 Bajo \n", "Aguascalientes 0.2 0.0 Bajo \n", "... ... ... ... \n", "Zacatecas ND ND Alto \n", "Zacatecas 0.0 0.0 Alto \n", "Zacatecas 0.0 1.1 Bajo \n", "Zacatecas 1.2 1.6 Bajo \n", "Zacatecas 0.0 0.0 Medio \n", "\n", "[51034 rows x 23 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.set_index('ent')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Podemos luego guardar los resultados en un archivo CSV:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "df2.to_csv('nueva_base.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Primero de todo necesitamos usualmente analizar los primeros datos de nuestro DB:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
01Aguascalientes1Aguascalientes1AGUASCALIENTES02290100100010229410104...18.39.639.44.73.42.90.00.01.0Bajo
11Aguascalientes1Aguascalientes1AGUASCALIENTES023301001000102331536421...21.99.340.93.47.11.70.51.00.2Bajo
21Aguascalientes1Aguascalientes1AGUASCALIENTES028601001000102863469986...6.81.214.91.01.82.60.40.00.0Bajo
31Aguascalientes1Aguascalientes1AGUASCALIENTES029001001000102901884504...4.60.613.50.91.80.20.00.00.0Bajo
41Aguascalientes1Aguascalientes1AGUASCALIENTES030301001000103032397625...8.63.025.91.93.80.20.20.20.0Bajo
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc loc \\\n", "0 1 Aguascalientes 1 Aguascalientes 1 AGUASCALIENTES \n", "1 1 Aguascalientes 1 Aguascalientes 1 AGUASCALIENTES \n", "2 1 Aguascalientes 1 Aguascalientes 1 AGUASCALIENTES \n", "3 1 Aguascalientes 1 Aguascalientes 1 AGUASCALIENTES \n", "4 1 Aguascalientes 1 Aguascalientes 1 AGUASCALIENTES \n", "\n", " cve_ageb folio_ageb pob_tot viv_par_hab ... porc_vivsnlavadora \\\n", "0 0229 0100100010229 410 104 ... 18.3 \n", "1 0233 0100100010233 1536 421 ... 21.9 \n", "2 0286 0100100010286 3469 986 ... 6.8 \n", "3 0290 0100100010290 1884 504 ... 4.6 \n", "4 0303 0100100010303 2397 625 ... 8.6 \n", "\n", " porc_vivsnrefri porc_vivstelefono porc_pob_15_mas_analfa \\\n", "0 9.6 39.4 4.7 \n", "1 9.3 40.9 3.4 \n", "2 1.2 14.9 1.0 \n", "3 0.6 13.5 0.9 \n", "4 3.0 25.9 1.9 \n", "\n", " porc_pob6_14_noasiste porc_vivpisotierra porc_snaguaent porc_vivsndren \\\n", "0 3.4 2.9 0.0 0.0 \n", "1 7.1 1.7 0.5 1.0 \n", "2 1.8 2.6 0.4 0.0 \n", "3 1.8 0.2 0.0 0.0 \n", "4 3.8 0.2 0.2 0.2 \n", "\n", " porc_vivsnenergia gdo_rezsoc \n", "0 1.0 Bajo \n", "1 0.2 Bajo \n", "2 0.0 Bajo \n", "3 0.0 Bajo \n", "4 0.0 Bajo \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(5)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
5102732Zacatecas57Trancoso1TRANCOSO03613205700010361205...20.040.0100.020.014.30.00.020.020.0Alto
5102832Zacatecas57Trancoso1TRANCOSO0408320570001040814325...52.056.096.018.38.10.028.048.08.0Medio
5102932Zacatecas57Trancoso1TRANCOSO04123205700010412122...NDNDNDNDNDNDNDNDNDAlto
5103032Zacatecas57Trancoso1TRANCOSO04313205700010431143...33.30.0100.00.00.00.066.70.00.0Alto
5103132Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ00153205800010015651175...19.412.040.67.01.01.13.40.01.1Bajo
5103232Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ002A320580001002A945251...18.714.337.16.84.51.61.61.21.6Bajo
5103332Zacatecas58Santa Mar�a de la Paz1SANTA MAR�A DE LA PAZ010432058000101048625...52.032.0100.06.10.04.00.00.00.0Medio
\n", "

7 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc \\\n", "51027 32 Zacatecas 57 Trancoso 1 \n", "51028 32 Zacatecas 57 Trancoso 1 \n", "51029 32 Zacatecas 57 Trancoso 1 \n", "51030 32 Zacatecas 57 Trancoso 1 \n", "51031 32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "51032 32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "51033 32 Zacatecas 58 Santa Mar�a de la Paz 1 \n", "\n", " loc cve_ageb folio_ageb pob_tot viv_par_hab \\\n", "51027 TRANCOSO 0361 3205700010361 20 5 \n", "51028 TRANCOSO 0408 3205700010408 143 25 \n", "51029 TRANCOSO 0412 3205700010412 12 2 \n", "51030 TRANCOSO 0431 3205700010431 14 3 \n", "51031 SANTA MAR�A DE LA PAZ 0015 3205800010015 651 175 \n", "51032 SANTA MAR�A DE LA PAZ 002A 320580001002A 945 251 \n", "51033 SANTA MAR�A DE LA PAZ 0104 3205800010104 86 25 \n", "\n", " ... porc_vivsnlavadora porc_vivsnrefri porc_vivstelefono \\\n", "51027 ... 20.0 40.0 100.0 \n", "51028 ... 52.0 56.0 96.0 \n", "51029 ... ND ND ND \n", "51030 ... 33.3 0.0 100.0 \n", "51031 ... 19.4 12.0 40.6 \n", "51032 ... 18.7 14.3 37.1 \n", "51033 ... 52.0 32.0 100.0 \n", "\n", " porc_pob_15_mas_analfa porc_pob6_14_noasiste porc_vivpisotierra \\\n", "51027 20.0 14.3 0.0 \n", "51028 18.3 8.1 0.0 \n", "51029 ND ND ND \n", "51030 0.0 0.0 0.0 \n", "51031 7.0 1.0 1.1 \n", "51032 6.8 4.5 1.6 \n", "51033 6.1 0.0 4.0 \n", "\n", " porc_snaguaent porc_vivsndren porc_vivsnenergia gdo_rezsoc \n", "51027 0.0 20.0 20.0 Alto \n", "51028 28.0 48.0 8.0 Medio \n", "51029 ND ND ND Alto \n", "51030 66.7 0.0 0.0 Alto \n", "51031 3.4 0.0 1.1 Bajo \n", "51032 1.6 1.2 1.6 Bajo \n", "51033 0.0 0.0 0.0 Medio \n", "\n", "[7 rows x 25 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail(7)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "df3 = pd.read_csv('C:\\DAVE2\\CIDE\\Tesis\\Base de Datos\\Ageb_2010_DA_indice_de_rezago_social.csv', index_col='ent')\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_entcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_habporc_pob_15_mas_basic_incom...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
ent
Aguascalientes11Aguascalientes1AGUASCALIENTES0229010010001022941010435.1...18.39.639.44.73.42.90.00.01.0Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES02330100100010233153642135.5...21.99.340.93.47.11.70.51.00.2Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES02860100100010286346998616.0...6.81.214.91.01.82.60.40.00.0Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES02900100100010290188450418.5...4.60.613.50.91.80.20.00.00.0Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES03030100100010303239762530.3...8.63.025.91.93.80.20.20.20.0Bajo
\n", "

5 rows × 24 columns

\n", "
" ], "text/plain": [ " cve_ent cve_mun mun cve_loc loc \\\n", "ent \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "\n", " cve_ageb folio_ageb pob_tot viv_par_hab \\\n", "ent \n", "Aguascalientes 0229 0100100010229 410 104 \n", "Aguascalientes 0233 0100100010233 1536 421 \n", "Aguascalientes 0286 0100100010286 3469 986 \n", "Aguascalientes 0290 0100100010290 1884 504 \n", "Aguascalientes 0303 0100100010303 2397 625 \n", "\n", " porc_pob_15_mas_basic_incom ... porc_vivsnlavadora \\\n", "ent ... \n", "Aguascalientes 35.1 ... 18.3 \n", "Aguascalientes 35.5 ... 21.9 \n", "Aguascalientes 16.0 ... 6.8 \n", "Aguascalientes 18.5 ... 4.6 \n", "Aguascalientes 30.3 ... 8.6 \n", "\n", " porc_vivsnrefri porc_vivstelefono porc_pob_15_mas_analfa \\\n", "ent \n", "Aguascalientes 9.6 39.4 4.7 \n", "Aguascalientes 9.3 40.9 3.4 \n", "Aguascalientes 1.2 14.9 1.0 \n", "Aguascalientes 0.6 13.5 0.9 \n", "Aguascalientes 3.0 25.9 1.9 \n", "\n", " porc_pob6_14_noasiste porc_vivpisotierra porc_snaguaent \\\n", "ent \n", "Aguascalientes 3.4 2.9 0.0 \n", "Aguascalientes 7.1 1.7 0.5 \n", "Aguascalientes 1.8 2.6 0.4 \n", "Aguascalientes 1.8 0.2 0.0 \n", "Aguascalientes 3.8 0.2 0.2 \n", "\n", " porc_vivsndren porc_vivsnenergia gdo_rezsoc \n", "ent \n", "Aguascalientes 0.0 1.0 Bajo \n", "Aguascalientes 1.0 0.2 Bajo \n", "Aguascalientes 0.0 0.0 Bajo \n", "Aguascalientes 0.0 0.0 Bajo \n", "Aguascalientes 0.2 0.0 Bajo \n", "\n", "[5 rows x 24 columns]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos obtener información con respecto a nuestro DB:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Index: 51034 entries, Aguascalientes to Zacatecas\n", "Data columns (total 24 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 cve_ent 51034 non-null int64 \n", " 1 cve_mun 51034 non-null int64 \n", " 2 mun 51034 non-null object\n", " 3 cve_loc 51034 non-null int64 \n", " 4 loc 51034 non-null object\n", " 5 cve_ageb 51034 non-null object\n", " 6 folio_ageb 51034 non-null object\n", " 7 pob_tot 51034 non-null int64 \n", " 8 viv_par_hab 51034 non-null int64 \n", " 9 porc_pob_15_mas_basic_incom 51034 non-null object\n", " 10 porc_pob_15_24_noasiste 51034 non-null object\n", " 11 porc_pob_snservsal 51034 non-null object\n", " 12 porc_vivhacina 51034 non-null object\n", " 13 porc_vivsnsan 51034 non-null object\n", " 14 porc_vivsnlavadora 51034 non-null object\n", " 15 porc_vivsnrefri 51034 non-null object\n", " 16 porc_vivstelefono 51034 non-null object\n", " 17 porc_pob_15_mas_analfa 51034 non-null object\n", " 18 porc_pob6_14_noasiste 51034 non-null object\n", " 19 porc_vivpisotierra 51034 non-null object\n", " 20 porc_snaguaent 51034 non-null object\n", " 21 porc_vivsndren 51034 non-null object\n", " 22 porc_vivsnenergia 51034 non-null object\n", " 23 gdo_rezsoc 51034 non-null object\n", "dtypes: int64(5), object(19)\n", "memory usage: 9.7+ MB\n" ] } ], "source": [ "df3.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos mergear dos DB verticalmente con la siguiente función:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(102068, 24)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_df = df3.append(df3)\n", "\n", "temp_df.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Para eliminar duplicados será suficiente hacer:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(51034, 24)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_df = temp_df.drop_duplicates()\n", "\n", "temp_df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Siempre estaremos regresando una copia del DB, si queremos modificarlo directamente tendremos que ocupar la funcion inplace" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(51034, 24)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_df.shape" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(102068, 24)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_df2 = df3.append(df3)\n", "\n", "temp_df2.drop_duplicates()\n", "\n", "temp_df2.shape" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0, 24)" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_df2 = df3.append(df3)\n", "temp_df2.drop_duplicates(keep=False, inplace=True)\n", "\n", "#temp_df2.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Keep especifica cual de los duplicados considerar:\n", "* false dropea todo\n", "* first mantiene el primero\n", "* last mantiene el ultimo" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Para ver los nombres de las columnas podemos utilizar la siguiente función:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['cve_ent', 'cve_mun', 'mun', 'cve_loc', 'loc', 'cve_ageb', 'folio_ageb',\n", " 'pob_tot', 'viv_par_hab', 'porc_pob_15_mas_basic_incom',\n", " 'porc_pob_15_24_noasiste', 'porc_pob_snservsal', 'porc_vivhacina',\n", " 'porc_vivsnsan', 'porc_vivsnlavadora', 'porc_vivsnrefri',\n", " 'porc_vivstelefono', 'porc_pob_15_mas_analfa', 'porc_pob6_14_noasiste',\n", " 'porc_vivpisotierra', 'porc_snaguaent', 'porc_vivsndren',\n", " 'porc_vivsnenergia', 'gdo_rezsoc'],\n", " dtype='object')" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para cambiar el nombre de una columna haremos lo siguiente: " ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_entcve_munmunicipiocve_locloccve_agebfolio_agebpoblacion totalviv_par_habporc_pob_15_mas_basic_incom...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
ent
Aguascalientes11Aguascalientes1AGUASCALIENTES0229010010001022941010435.1...18.39.639.44.73.42.90.00.01.0Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES02330100100010233153642135.5...21.99.340.93.47.11.70.51.00.2Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES02860100100010286346998616.0...6.81.214.91.01.82.60.40.00.0Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES02900100100010290188450418.5...4.60.613.50.91.80.20.00.00.0Bajo
Aguascalientes11Aguascalientes1AGUASCALIENTES03030100100010303239762530.3...8.63.025.91.93.80.20.20.20.0Bajo
\n", "

5 rows × 24 columns

\n", "
" ], "text/plain": [ " cve_ent cve_mun municipio cve_loc loc \\\n", "ent \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "Aguascalientes 1 1 Aguascalientes 1 AGUASCALIENTES \n", "\n", " cve_ageb folio_ageb poblacion total viv_par_hab \\\n", "ent \n", "Aguascalientes 0229 0100100010229 410 104 \n", "Aguascalientes 0233 0100100010233 1536 421 \n", "Aguascalientes 0286 0100100010286 3469 986 \n", "Aguascalientes 0290 0100100010290 1884 504 \n", "Aguascalientes 0303 0100100010303 2397 625 \n", "\n", " porc_pob_15_mas_basic_incom ... porc_vivsnlavadora \\\n", "ent ... \n", "Aguascalientes 35.1 ... 18.3 \n", "Aguascalientes 35.5 ... 21.9 \n", "Aguascalientes 16.0 ... 6.8 \n", "Aguascalientes 18.5 ... 4.6 \n", "Aguascalientes 30.3 ... 8.6 \n", "\n", " porc_vivsnrefri porc_vivstelefono porc_pob_15_mas_analfa \\\n", "ent \n", "Aguascalientes 9.6 39.4 4.7 \n", "Aguascalientes 9.3 40.9 3.4 \n", "Aguascalientes 1.2 14.9 1.0 \n", "Aguascalientes 0.6 13.5 0.9 \n", "Aguascalientes 3.0 25.9 1.9 \n", "\n", " porc_pob6_14_noasiste porc_vivpisotierra porc_snaguaent \\\n", "ent \n", "Aguascalientes 3.4 2.9 0.0 \n", "Aguascalientes 7.1 1.7 0.5 \n", "Aguascalientes 1.8 2.6 0.4 \n", "Aguascalientes 1.8 0.2 0.0 \n", "Aguascalientes 3.8 0.2 0.2 \n", "\n", " porc_vivsndren porc_vivsnenergia gdo_rezsoc \n", "ent \n", "Aguascalientes 0.0 1.0 Bajo \n", "Aguascalientes 1.0 0.2 Bajo \n", "Aguascalientes 0.0 0.0 Bajo \n", "Aguascalientes 0.0 0.0 Bajo \n", "Aguascalientes 0.2 0.0 Bajo \n", "\n", "[5 rows x 24 columns]" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.rename(columns={\n", " 'mun': 'municipio', \n", " 'pob_tot': 'poblacion total'\n", " }, inplace=True)\n", "\n", "df3.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df3.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Podemos tambien asignarlos por medio de una lista:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['inter', 'juventus'], dtype='object')" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams.columns=[col.lower() for col in teams]\n", "teams.columns" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Europa League33
Champions32
Ligas Robadas07
Arbitros comprados0106
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Europa League 3 3\n", "Champions 3 2\n", "Ligas Robadas 0 7\n", "Arbitros comprados 0 106" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams.columns=['Internazionale','Rubentus']\n", "\n", "teams.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Es buena norma utilizar caracteres minuscolos, quitar espacio y utilizar underscores y evitar accentos. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Los valores faltantes\n", "\n", "Cuando analizamos unas bases, en particular las del INEGI ;) , tenemos que estar cuidado con el manejo de los valores faltantes. Esto podrán aparecernos como np.nan (Numpy) o como None (Python). A veces el manejo de los dos diferentes tipos puede ser distinto. \n", "\n", "cuando tenemos valores mancantes tenemos distintas manera con que resolver el problema:\n", "\n", "* Eliminar la observación\n", "* Remplazar el valor nulo por medio de una *imputación*" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 51034 entries, 0 to 51033\n", "Data columns (total 25 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 cve_ent 51034 non-null int64 \n", " 1 ent 51034 non-null object \n", " 2 cve_mun 51025 non-null float64\n", " 3 mun 51025 non-null object \n", " 4 cve_loc 51025 non-null float64\n", " 5 loc 51025 non-null object \n", " 6 cve_ageb 51021 non-null object \n", " 7 folio_ageb 51021 non-null object \n", " 8 pob_tot 51021 non-null float64\n", " 9 viv_par_hab 51013 non-null float64\n", " 10 porc_pob_15_mas_basic_incom 51013 non-null object \n", " 11 porc_pob_15_24_noasiste 51013 non-null object \n", " 12 porc_pob_snservsal 51004 non-null object \n", " 13 porc_vivhacina 51004 non-null object \n", " 14 porc_vivsnsan 51008 non-null object \n", " 15 porc_vivsnlavadora 51008 non-null object \n", " 16 porc_vivsnrefri 51008 non-null object \n", " 17 porc_vivstelefono 51017 non-null object \n", " 18 porc_pob_15_mas_analfa 51017 non-null object \n", " 19 porc_pob6_14_noasiste 51017 non-null object \n", " 20 porc_vivpisotierra 51023 non-null object \n", " 21 porc_snaguaent 51023 non-null object \n", " 22 porc_vivsndren 51023 non-null object \n", " 23 porc_vivsnenergia 51023 non-null object \n", " 24 gdo_rezsoc 51034 non-null object \n", "dtypes: float64(4), int64(1), object(20)\n", "memory usage: 9.7+ MB\n" ] } ], "source": [ "db=pd.read_csv('C:\\DAVE2\\CIDE\\Tesis\\Base de Datos\\Ageb_2010_DA_indice_de_rezago_social2.csv')\n", "\n", "db.info()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
0FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
1FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
2FalseFalseFalseFalseFalseFalseFalseFalseFalseTrue...TrueTrueTrueTrueTrueFalseFalseFalseFalseFalse
3FalseFalseFalseFalseFalseFalseFalseFalseFalseTrue...TrueTrueTrueTrueTrueFalseFalseFalseFalseFalse
4FalseFalseFalseFalseFalseFalseFalseFalseFalseTrue...TrueTrueTrueTrueTrueFalseFalseFalseFalseFalse
..................................................................
51029FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
51030FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
51031FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
51032FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
51033FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse...FalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
\n", "

51034 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc loc cve_ageb folio_ageb \\\n", "0 False False False False False False False False \n", "1 False False False False False False False False \n", "2 False False False False False False False False \n", "3 False False False False False False False False \n", "4 False False False False False False False False \n", "... ... ... ... ... ... ... ... ... \n", "51029 False False False False False False False False \n", "51030 False False False False False False False False \n", "51031 False False False False False False False False \n", "51032 False False False False False False False False \n", "51033 False False False False False False False False \n", "\n", " pob_tot viv_par_hab ... porc_vivsnlavadora porc_vivsnrefri \\\n", "0 False False ... False False \n", "1 False False ... False False \n", "2 False True ... True True \n", "3 False True ... True True \n", "4 False True ... True True \n", "... ... ... ... ... ... \n", "51029 False False ... False False \n", "51030 False False ... False False \n", "51031 False False ... False False \n", "51032 False False ... False False \n", "51033 False False ... False False \n", "\n", " porc_vivstelefono porc_pob_15_mas_analfa porc_pob6_14_noasiste \\\n", "0 False False False \n", "1 False False False \n", "2 True True True \n", "3 True True True \n", "4 True True True \n", "... ... ... ... \n", "51029 False False False \n", "51030 False False False \n", "51031 False False False \n", "51032 False False False \n", "51033 False False False \n", "\n", " porc_vivpisotierra porc_snaguaent porc_vivsndren porc_vivsnenergia \\\n", "0 False False False False \n", "1 False False False False \n", "2 False False False False \n", "3 False False False False \n", "4 False False False False \n", "... ... ... ... ... \n", "51029 False False False False \n", "51030 False False False False \n", "51031 False False False False \n", "51032 False False False False \n", "51033 False False False False \n", "\n", " gdo_rezsoc \n", "0 False \n", "1 False \n", "2 False \n", "3 False \n", "4 False \n", "... ... \n", "51029 False \n", "51030 False \n", "51031 False \n", "51032 False \n", "51033 False \n", "\n", "[51034 rows x 25 columns]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.isnull()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(51034, 25)" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.shape" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cve_ent 0\n", "ent 0\n", "cve_mun 9\n", "mun 9\n", "cve_loc 9\n", "loc 9\n", "cve_ageb 13\n", "folio_ageb 13\n", "pob_tot 13\n", "viv_par_hab 21\n", "porc_pob_15_mas_basic_incom 21\n", "porc_pob_15_24_noasiste 21\n", "porc_pob_snservsal 30\n", "porc_vivhacina 30\n", "porc_vivsnsan 26\n", "porc_vivsnlavadora 26\n", "porc_vivsnrefri 26\n", "porc_vivstelefono 17\n", "porc_pob_15_mas_analfa 17\n", "porc_pob6_14_noasiste 17\n", "porc_vivpisotierra 11\n", "porc_snaguaent 11\n", "porc_vivsndren 11\n", "porc_vivsnenergia 11\n", "gdo_rezsoc 0\n", "dtype: int64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Si queremos eliminar las observaciones con por lo menos un *NaN* utilizaremos la siguiente expresión:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(50993, 25)" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db2=db.dropna()\n", "\n", "db2.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Podemos tambié escoger de eliminar las columnas con valores nulos" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(51034, 3)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db3=db.dropna(axis=1)\n", "\n", "db3.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Para poder imputar nuestro valores mancantes podemos hacer lo siguiente:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "cvep=db['cve_mun']" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "cvep_mean=cvep.mean()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "cvep.fillna(cvep_mean, inplace=True)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cve_ent 0\n", "ent 0\n", "cve_mun 0\n", "mun 9\n", "cve_loc 9\n", "loc 9\n", "cve_ageb 13\n", "folio_ageb 13\n", "pob_tot 13\n", "viv_par_hab 21\n", "porc_pob_15_mas_basic_incom 21\n", "porc_pob_15_24_noasiste 21\n", "porc_pob_snservsal 30\n", "porc_vivhacina 30\n", "porc_vivsnsan 26\n", "porc_vivsnlavadora 26\n", "porc_vivsnrefri 26\n", "porc_vivstelefono 17\n", "porc_pob_15_mas_analfa 17\n", "porc_pob6_14_noasiste 17\n", "porc_vivpisotierra 11\n", "porc_snaguaent 11\n", "porc_vivsndren 11\n", "porc_vivsnenergia 11\n", "gdo_rezsoc 0\n", "dtype: int64" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Entender el DB\n", "\n", "Para mejor comprender la estructura dle DB podmeos utilizar :" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_entcve_muncve_locpob_totviv_par_hab
count51034.00000051034.00000051025.00000051021.00000051013.000000
mean16.87026351.98447821.7638221703.227534434.833297
std8.30698671.438801110.3219501726.229945441.300656
min1.0000001.0000001.0000002.0000001.000000
25%11.00000012.0000001.000000294.00000074.000000
50%15.00000031.0000001.0000001268.000000325.000000
75%24.00000064.0000001.0000002555.000000653.000000
max32.000000570.0000004705.00000022876.0000006160.000000
\n", "
" ], "text/plain": [ " cve_ent cve_mun cve_loc pob_tot viv_par_hab\n", "count 51034.000000 51034.000000 51025.000000 51021.000000 51013.000000\n", "mean 16.870263 51.984478 21.763822 1703.227534 434.833297\n", "std 8.306986 71.438801 110.321950 1726.229945 441.300656\n", "min 1.000000 1.000000 1.000000 2.000000 1.000000\n", "25% 11.000000 12.000000 1.000000 294.000000 74.000000\n", "50% 15.000000 31.000000 1.000000 1268.000000 325.000000\n", "75% 24.000000 64.000000 1.000000 2555.000000 653.000000\n", "max 32.000000 570.000000 4705.000000 22876.000000 6160.000000" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.describe()" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 51034\n", "unique 3\n", "top Bajo\n", "freq 33024\n", "Name: gdo_rezsoc, dtype: object" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['gdo_rezsoc'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ahora bien para contar los valores en una columna haremos:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Bajo 33024\n", "Medio 12305\n", "Alto 5705\n", "Name: gdo_rezsoc, dtype: int64" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db['gdo_rezsoc'].value_counts().head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos también analizar la correlacion entre variables, por medio de una matriz de correlación. Puede ser utili particularmente cuando queremos analizar una variable resultado con respecto a alguna potencial explicativa:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_entcve_muncve_locpob_totviv_par_hab
cve_ent1.0000000.193240-0.079972-0.089065-0.089841
cve_mun0.1932401.000000-0.052669-0.054085-0.063511
cve_loc-0.079972-0.0526691.000000-0.029778-0.029908
pob_tot-0.089065-0.054085-0.0297781.0000000.986179
viv_par_hab-0.089841-0.063511-0.0299080.9861791.000000
\n", "
" ], "text/plain": [ " cve_ent cve_mun cve_loc pob_tot viv_par_hab\n", "cve_ent 1.000000 0.193240 -0.079972 -0.089065 -0.089841\n", "cve_mun 0.193240 1.000000 -0.052669 -0.054085 -0.063511\n", "cve_loc -0.079972 -0.052669 1.000000 -0.029778 -0.029908\n", "pob_tot -0.089065 -0.054085 -0.029778 1.000000 0.986179\n", "viv_par_hab -0.089841 -0.063511 -0.029908 0.986179 1.000000" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Seleccionar y \"slice\"\n", "\n", "Para seleccionar una columna podremos:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pob_col=df['pob_tot']\n", "\n", "type(pob_col)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Si queremos extraer una columna como dataframe tendremos que:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pob_col2=df[['pob_tot']]\n", "\n", "type(pob_col2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Siendo una lista podemos extrer todas las olumnas que queremos:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pob_totcve_entent
04101Aguascalientes
115361Aguascalientes
234691Aguascalientes
318841Aguascalientes
423971Aguascalientes
\n", "
" ], "text/plain": [ " pob_tot cve_ent ent\n", "0 410 1 Aguascalientes\n", "1 1536 1 Aguascalientes\n", "2 3469 1 Aguascalientes\n", "3 1884 1 Aguascalientes\n", "4 2397 1 Aguascalientes" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub=df[['pob_tot','cve_ent','ent']]\n", "sub.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para seleccionar los renglones tenemos dos opciones:\n", "* loc\n", "* iloc" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Internazionale 3\n", "Rubentus 2\n", "Name: Champions, dtype: int64" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prueba = teams.loc['Champions']\n", "\n", "prueba" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Europa League33
Champions32
Ligas Robadas07
Arbitros comprados0106
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Europa League 3 3\n", "Champions 3 2\n", "Ligas Robadas 0 7\n", "Arbitros comprados 0 106" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prueba2=teams.iloc[2,1]\n", "\n", "prueba2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos de igual manera seleccionar más de un renglon:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Europa League33
Champions32
Ligas Robadas07
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Europa League 3 3\n", "Champions 3 2\n", "Ligas Robadas 0 7" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prueba3=teams.iloc[0:3]\n", "\n", "prueba3" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Europa League33
Champions32
Ligas Robadas07
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Europa League 3 3\n", "Champions 3 2\n", "Ligas Robadas 0 7" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prueba4=teams.loc['Europa League':'Ligas Robadas']\n", "\n", "prueba4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos también seleccionar con base en un criterio:" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2000" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "condition = (df['ent'] == 'Guerrero')\n", "\n", "condition.head()\n", "\n", "condition.sum()" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
1494412Guerrero1Acapulco de Ju�rez1ACAPULCO DE JU�REZ003412001000100344049982...41.49.146.07.14.36.52.40.60.2Bajo
1494512Guerrero1Acapulco de Ju�rez1ACAPULCO DE JU�REZ0049120010001004942751017...40.89.644.48.43.95.30.90.00.0Bajo
1494612Guerrero1Acapulco de Ju�rez1ACAPULCO DE JU�REZ009112001000100913080914...22.57.937.91.41.96.54.72.60.3Bajo
1494712Guerrero1Acapulco de Ju�rez1ACAPULCO DE JU�REZ010412001000101042299610...31.65.238.95.81.41.51.50.00.0Bajo
1494812Guerrero1Acapulco de Ju�rez1ACAPULCO DE JU�REZ0231120010001023136091022...29.55.935.31.64.01.51.30.00.5Bajo
..................................................................
1693912Guerrero80Juchit�n1JUCHIT�N012712080000101275311...36.427.390.951.60.018.227.336.49.1Alto
1694012Guerrero81Iliatenco1ILIATENCO001612081000100161492294...81.026.281.011.11.015.022.413.63.4Medio
1694112Guerrero81Iliatenco1ILIATENCO004A120810001004A6113...84.638.592.314.30.015.423.153.815.4Alto
1694212Guerrero81Iliatenco1ILIATENCO00541208100010054111...NDNDNDNDNDNDNDNDNDAlto
1694312Guerrero81Iliatenco1ILIATENCO0069120810001006914326...84.634.669.212.52.37.742.33.80.0Medio
\n", "

2000 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc \\\n", "14944 12 Guerrero 1 Acapulco de Ju�rez 1 \n", "14945 12 Guerrero 1 Acapulco de Ju�rez 1 \n", "14946 12 Guerrero 1 Acapulco de Ju�rez 1 \n", "14947 12 Guerrero 1 Acapulco de Ju�rez 1 \n", "14948 12 Guerrero 1 Acapulco de Ju�rez 1 \n", "... ... ... ... ... ... \n", "16939 12 Guerrero 80 Juchit�n 1 \n", "16940 12 Guerrero 81 Iliatenco 1 \n", "16941 12 Guerrero 81 Iliatenco 1 \n", "16942 12 Guerrero 81 Iliatenco 1 \n", "16943 12 Guerrero 81 Iliatenco 1 \n", "\n", " loc cve_ageb folio_ageb pob_tot viv_par_hab ... \\\n", "14944 ACAPULCO DE JU�REZ 0034 1200100010034 4049 982 ... \n", "14945 ACAPULCO DE JU�REZ 0049 1200100010049 4275 1017 ... \n", "14946 ACAPULCO DE JU�REZ 0091 1200100010091 3080 914 ... \n", "14947 ACAPULCO DE JU�REZ 0104 1200100010104 2299 610 ... \n", "14948 ACAPULCO DE JU�REZ 0231 1200100010231 3609 1022 ... \n", "... ... ... ... ... ... ... \n", "16939 JUCHIT�N 0127 1208000010127 53 11 ... \n", "16940 ILIATENCO 0016 1208100010016 1492 294 ... \n", "16941 ILIATENCO 004A 120810001004A 61 13 ... \n", "16942 ILIATENCO 0054 1208100010054 11 1 ... \n", "16943 ILIATENCO 0069 1208100010069 143 26 ... \n", "\n", " porc_vivsnlavadora porc_vivsnrefri porc_vivstelefono \\\n", "14944 41.4 9.1 46.0 \n", "14945 40.8 9.6 44.4 \n", "14946 22.5 7.9 37.9 \n", "14947 31.6 5.2 38.9 \n", "14948 29.5 5.9 35.3 \n", "... ... ... ... \n", "16939 36.4 27.3 90.9 \n", "16940 81.0 26.2 81.0 \n", "16941 84.6 38.5 92.3 \n", "16942 ND ND ND \n", "16943 84.6 34.6 69.2 \n", "\n", " porc_pob_15_mas_analfa porc_pob6_14_noasiste porc_vivpisotierra \\\n", "14944 7.1 4.3 6.5 \n", "14945 8.4 3.9 5.3 \n", "14946 1.4 1.9 6.5 \n", "14947 5.8 1.4 1.5 \n", "14948 1.6 4.0 1.5 \n", "... ... ... ... \n", "16939 51.6 0.0 18.2 \n", "16940 11.1 1.0 15.0 \n", "16941 14.3 0.0 15.4 \n", "16942 ND ND ND \n", "16943 12.5 2.3 7.7 \n", "\n", " porc_snaguaent porc_vivsndren porc_vivsnenergia gdo_rezsoc \n", "14944 2.4 0.6 0.2 Bajo \n", "14945 0.9 0.0 0.0 Bajo \n", "14946 4.7 2.6 0.3 Bajo \n", "14947 1.5 0.0 0.0 Bajo \n", "14948 1.3 0.0 0.5 Bajo \n", "... ... ... ... ... \n", "16939 27.3 36.4 9.1 Alto \n", "16940 22.4 13.6 3.4 Medio \n", "16941 23.1 53.8 15.4 Alto \n", "16942 ND ND ND Alto \n", "16943 42.3 3.8 0.0 Medio \n", "\n", "[2000 rows x 25 columns]" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['ent'] == 'Guerrero']" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
2329515M�xico39Ixtapaluca1IXTAPALUCA075A150390001075A228766160...13.64.332.30.71.71.10.30.00.0Bajo
2445915M�xico81Tec�mac19OJO DE AGUA11301508100191130201835929...15.24.641.20.31.70.60.10.00.0Bajo
\n", "

2 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc loc cve_ageb \\\n", "23295 15 M�xico 39 Ixtapaluca 1 IXTAPALUCA 075A \n", "24459 15 M�xico 81 Tec�mac 19 OJO DE AGUA 1130 \n", "\n", " folio_ageb pob_tot viv_par_hab ... porc_vivsnlavadora \\\n", "23295 150390001075A 22876 6160 ... 13.6 \n", "24459 1508100191130 20183 5929 ... 15.2 \n", "\n", " porc_vivsnrefri porc_vivstelefono porc_pob_15_mas_analfa \\\n", "23295 4.3 32.3 0.7 \n", "24459 4.6 41.2 0.3 \n", "\n", " porc_pob6_14_noasiste porc_vivpisotierra porc_snaguaent porc_vivsndren \\\n", "23295 1.7 1.1 0.3 0.0 \n", "24459 1.7 0.6 0.1 0.0 \n", "\n", " porc_vivsnenergia gdo_rezsoc \n", "23295 0.0 Bajo \n", "24459 0.0 Bajo \n", "\n", "[2 rows x 25 columns]" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['pob_tot'] >= 20000].head(3)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cve_ententcve_munmuncve_locloccve_agebfolio_agebpob_totviv_par_hab...porc_vivsnlavadoraporc_vivsnrefriporc_vivstelefonoporc_pob_15_mas_analfaporc_pob6_14_noasisteporc_vivpisotierraporc_snaguaentporc_vivsndrenporc_vivsnenergiagdo_rezsoc
2329515M�xico39Ixtapaluca1IXTAPALUCA075A150390001075A228766160...13.64.332.30.71.71.10.30.00.0Bajo
\n", "

1 rows × 25 columns

\n", "
" ], "text/plain": [ " cve_ent ent cve_mun mun cve_loc loc cve_ageb \\\n", "23295 15 M�xico 39 Ixtapaluca 1 IXTAPALUCA 075A \n", "\n", " folio_ageb pob_tot viv_par_hab ... porc_vivsnlavadora \\\n", "23295 150390001075A 22876 6160 ... 13.6 \n", "\n", " porc_vivsnrefri porc_vivstelefono porc_pob_15_mas_analfa \\\n", "23295 4.3 32.3 0.7 \n", "\n", " porc_pob6_14_noasiste porc_vivpisotierra porc_snaguaent porc_vivsndren \\\n", "23295 1.7 1.1 0.3 0.0 \n", "\n", " porc_vivsnenergia gdo_rezsoc \n", "23295 0.0 Bajo \n", "\n", "[1 rows x 25 columns]" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['pob_tot'] >= 20000) & (df['viv_par_hab'] >= 6000)].head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podemos finalmente utilizar funciones junto con Pandas, tanto lambda como unas ya definidas:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Europa League44
Champions43
Ligas Robadas18
Arbitros comprados1107
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Europa League 4 4\n", "Champions 4 3\n", "Ligas Robadas 1 8\n", "Arbitros comprados 1 107" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def add (x):\n", " y=x+1\n", " return y\n", "\n", "teams.apply(add)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "teams.apply(lambda x: x+1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ordenar" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Europa League33
Champions32
Ligas Robadas07
Arbitros comprados0106
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Europa League 3 3\n", "Champions 3 2\n", "Ligas Robadas 0 7\n", "Arbitros comprados 0 106" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams.sort_values(by = 'Internazionale', ascending=False).head()" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InternazionaleRubentus
Champions32
Europa League33
Ligas Robadas07
Arbitros comprados0106
\n", "
" ], "text/plain": [ " Internazionale Rubentus\n", "Champions 3 2\n", "Europa League 3 3\n", "Ligas Robadas 0 7\n", "Arbitros comprados 0 106" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams.sort_values(by = ['Internazionale','Rubentus'], ascending=[False,True]).head()" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }