{ "cells": [ { "cell_type": "markdown", "id": "7091d4c4", "metadata": {}, "source": [ "![](https://raw.githubusercontent.com/rafneta/RNlibro/master/imagenes/banner.png)\n", "\n", "```{contents}\n", ":depth: 4\n", "```\n", "\n", "# Pandas\n" ] }, { "cell_type": "markdown", "id": "30fb7fd8", "metadata": {}, "source": [ "- [Pandas home page](https://pandas.pydata.org/docs/index.html)\n", "- [Documentation](https://pandas.pydata.org/docs/reference/index.html)\n", "\n", "We take a sample from the pandas quick-start guide ([10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html)), plus a couple of additions.\n" ] }, { "cell_type": "markdown", "id": "7f1320ae-981d-46f4-b9be-23cf66170f83", "metadata": {}, "source": [ "## Creating objects\n" ] }, { "cell_type": "code", "execution_count": 70, "id": "a5d10adb-f9d9-4524-b63c-92edfeef6e35", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "id": "b74e8425-02a2-4ded-adc6-9675208ab808", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 3.0\n", "2 5.0\n", "3 NaN\n", "4 6.0\n", "5 8.0\n", "dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series([1, 3, 5, np.nan, 6, 8])\n", "s" ] }, { "cell_type": "code", "execution_count": 6, "id": "3101ecba-961e-4767-a570-cd851da2340c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n", " '2013-01-05', '2013-01-06'],\n", " dtype='datetime64[ns]', freq='D')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates = pd.date_range(\"20130101\", periods=6)\n", "dates" ] }, { "cell_type": "code", "execution_count": 7, "id": "2d830aa9-4414-4bc3-b102-6d6c0aea38b1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
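A minimal sketch of the three constructors above, assuming only NumPy and pandas. It checks two details visible in the outputs: a list of integers that contains `np.nan` becomes a `float64` Series, and `pd.date_range` with `periods=6` uses a daily frequency by default. The seeded `np.random.default_rng(0)` generator is an addition not in the original cell, used only so the random values repeat between runs.

```python
import numpy as np
import pandas as pd

# NaN is a float, so the integers are upcast and the Series is float64
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.dtype)         # float64
print(s.isna().sum())  # 1

# Six consecutive days; the default frequency is 'D' (calendar day)
dates = pd.date_range('20130101', periods=6)
print(dates.freqstr)   # D

# Same shape as df above, but with a seeded generator so the numbers repeat
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))
print(df.shape)        # (6, 4)
```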
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.255445 1.309822 -0.168210 -0.584129\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641\n", "2013-01-04 1.471408 -1.539870 1.272889 1.255482\n", "2013-01-05 -1.197831 -0.139475 1.067496 -0.351955\n", "2013-01-06 0.203198 -0.573590 1.523726 -0.788816" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list(\"ABCD\"))\n", "df" ] }, { "cell_type": "code", "execution_count": 8, "id": "914f5adf-1606-44a2-8445-d7b8cea34627", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
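Because each column of `df2` keeps its own dtype, columns can also be picked by type. A short sketch using `DataFrame.select_dtypes`; it rebuilds `df2` as above so that it runs on its own. The scalars `1.0`, the timestamp and `'foo'` are broadcast to the four rows defined by the other columns.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        'A': 1.0,                                   # scalar, broadcast to 4 rows
        'B': pd.Timestamp('20130102'),
        'C': pd.Series(1, index=list(range(4)), dtype='float32'),
        'D': np.array([3] * 4, dtype='int32'),
        'E': pd.Categorical(['test', 'train', 'test', 'train']),
        'F': 'foo',
    }
)
print(len(df2))  # 4

# Only the numeric columns (float64, float32 and int32)
numeric = df2.select_dtypes(include='number')
print(numeric.columns.tolist())  # ['A', 'C', 'D']

# The categorical column stores each distinct label once
print(df2['E'].cat.categories.tolist())  # ['test', 'train']
```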
\n", "
" ], "text/plain": [ " A B C D E F\n", "0 1.0 2013-01-02 1.0 3 test foo\n", "1 1.0 2013-01-02 1.0 3 train foo\n", "2 1.0 2013-01-02 1.0 3 test foo\n", "3 1.0 2013-01-02 1.0 3 train foo" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame(\n", " {\n", " \"A\": 1.0,\n", " \"B\": pd.Timestamp(\"20130102\"),\n", " \"C\": pd.Series(1, index=list(range(4)), dtype=\"float32\"),\n", " \"D\": np.array([3] * 4, dtype=\"int32\"),\n", " \"E\": pd.Categorical([\"test\", \"train\", \"test\", \"train\"]),\n", " \"F\": \"foo\",\n", " }\n", ")\n", "\n", "df2" ] }, { "cell_type": "code", "execution_count": 9, "id": "36105fd1-9f93-407f-ad7c-f5b440ef49a5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A float64\n", "B datetime64[ns]\n", "C float32\n", "D int32\n", "E category\n", "F object\n", "dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.dtypes" ] }, { "cell_type": "markdown", "id": "7cef5d41-82b2-43f1-8d28-35b0af175a66", "metadata": {}, "source": [ "Depending on the editor, pressing the `Tab` key after an object name (for example `df2.`) displays a list of its methods and attributes.\n", " \n", "Data usually come from a file, either local or remote. We take an example from the [UCI Machine Learning Repository](https://archive-beta.ics.uci.edu/).\n", "\n", "\n", "We use the [Adult](https://archive-beta.ics.uci.edu/ml/datasets/adult) dataset (1996) from the UCI Machine Learning Repository. Its attributes are:\n", "\n", "- age: continuous.\n", "\n", "- workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.\n", "\n", "- fnlwgt: continuous.\n", "\n", "- education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.\n", "\n", "- education-num: continuous.\n", "\n", "- marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.\n", "\n", "- occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.\n", "\n", "- relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.\n", "\n", "- race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.\n", "\n", "- sex: Female, Male.\n", "\n", "- capital-gain: continuous.\n", "\n", "- capital-loss: continuous.\n", "\n", "- hours-per-week: continuous.\n", "\n", "- native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.\n", "\n", "- class: >50K, <=50K.\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "b6b0afff-8444-4fc8-a2af-66b4571ec3f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " 39 State-gov 77516 Bachelors 13 Never-married \\\n", "0 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse \n", "1 38 Private 215646 HS-grad 9 Divorced \n", "2 53 Private 234721 11th 7 Married-civ-spouse \n", "3 28 Private 338409 Bachelors 13 Married-civ-spouse \n", "4 37 Private 284582 Masters 14 Married-civ-spouse \n", "... .. ... ... ... ... ... \n", "32555 27 Private 257302 Assoc-acdm 12 Married-civ-spouse \n", "32556 40 Private 154374 HS-grad 9 Married-civ-spouse \n", "32557 58 Private 151910 HS-grad 9 Widowed \n", "32558 22 Private 201490 HS-grad 9 Never-married \n", "32559 52 Self-emp-inc 287927 HS-grad 9 Married-civ-spouse \n", "\n", " Adm-clerical Not-in-family White Male 2174 0 40 \\\n", "0 Exec-managerial Husband White Male 0 0 13 \n", "1 Handlers-cleaners Not-in-family White Male 0 0 40 \n", "2 Handlers-cleaners Husband Black Male 0 0 40 \n", "3 Prof-specialty Wife Black Female 0 0 40 \n", "4 Exec-managerial Wife White Female 0 0 40 \n", "... ... ... ... ... ... .. ... \n", "32555 Tech-support Wife White Female 0 0 38 \n", "32556 Machine-op-inspct Husband White Male 0 0 40 \n", "32557 Adm-clerical Unmarried White Female 0 0 40 \n", "32558 Adm-clerical Own-child White Male 0 0 20 \n", "32559 Exec-managerial Wife White Female 15024 0 40 \n", "\n", " United-States <=50K \n", "0 United-States <=50K \n", "1 United-States <=50K \n", "2 United-States <=50K \n", "3 Cuba <=50K \n", "4 United-States <=50K \n", "... ... ... 
\n", "32555 United-States <=50K \n", "32556 United-States >50K \n", "32557 United-States <=50K \n", "32558 United-States <=50K \n", "32559 United-States >50K \n", "\n", "[32560 rows x 15 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datos = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data')\n", "datos" ] }, { "cell_type": "code", "execution_count": 13, "id": "d226e42e-b4d1-40df-b387-8d36ca8097f6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
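The two `read_csv` calls differ by one row (32,560 vs 32,561) because, by default (`header='infer'`), pandas treats the first line of the file as column names. The effect can be reproduced on a small in-memory CSV; the three sample lines are taken from the first rows shown above, reduced to two columns, and `io.StringIO` stands in for the remote URL.

```python
import io
import pandas as pd

# First three rows of adult.data, reduced to two columns
csv_text = '39,State-gov\n50,Self-emp-not-inc\n38,Private\n'

# Default header='infer': the first line becomes the header, so one row is lost
with_header = pd.read_csv(io.StringIO(csv_text))
print(with_header.columns.tolist())  # ['39', 'State-gov']
print(len(with_header))              # 2

# header=None: every line is data and the columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.columns.tolist())    # [0, 1]
print(len(no_header))                # 3
```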
" ], "text/plain": [ " 0 1 2 3 4 5 \\\n", "0 39 State-gov 77516 Bachelors 13 Never-married \n", "1 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse \n", "2 38 Private 215646 HS-grad 9 Divorced \n", "3 53 Private 234721 11th 7 Married-civ-spouse \n", "4 28 Private 338409 Bachelors 13 Married-civ-spouse \n", "... .. ... ... ... .. ... \n", "32556 27 Private 257302 Assoc-acdm 12 Married-civ-spouse \n", "32557 40 Private 154374 HS-grad 9 Married-civ-spouse \n", "32558 58 Private 151910 HS-grad 9 Widowed \n", "32559 22 Private 201490 HS-grad 9 Never-married \n", "32560 52 Self-emp-inc 287927 HS-grad 9 Married-civ-spouse \n", "\n", " 6 7 8 9 10 11 12 \\\n", "0 Adm-clerical Not-in-family White Male 2174 0 40 \n", "1 Exec-managerial Husband White Male 0 0 13 \n", "2 Handlers-cleaners Not-in-family White Male 0 0 40 \n", "3 Handlers-cleaners Husband Black Male 0 0 40 \n", "4 Prof-specialty Wife Black Female 0 0 40 \n", "... ... ... ... ... ... .. .. \n", "32556 Tech-support Wife White Female 0 0 38 \n", "32557 Machine-op-inspct Husband White Male 0 0 40 \n", "32558 Adm-clerical Unmarried White Female 0 0 40 \n", "32559 Adm-clerical Own-child White Male 0 0 20 \n", "32560 Exec-managerial Wife White Female 15024 0 40 \n", "\n", " 13 14 \n", "0 United-States <=50K \n", "1 United-States <=50K \n", "2 United-States <=50K \n", "3 United-States <=50K \n", "4 Cuba <=50K \n", "... ... ... \n", "32556 United-States <=50K \n", "32557 United-States >50K \n", "32558 United-States <=50K \n", "32559 United-States <=50K \n", "32560 United-States >50K \n", "\n", "[32561 rows x 15 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datos = pd.read_csv(\n", " 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',\n", " header = None)\n", "datos" ] }, { "cell_type": "code", "execution_count": 23, "id": "851d1d97-941d-48dc-a02d-57b1c3636b53", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
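One detail the display hides: in `adult.data` the fields are separated by a comma followed by a space, so with the calls above every text value keeps a leading space (`' State-gov'`), and a comparison such as `datos['sex'] == 'Male'` matches nothing. The file also marks unknown values with `?`. A sketch of one way to handle both, using the `skipinitialspace` and `na_values` parameters of `read_csv`; two rows copied from the output above replace the remote file.

```python
import io
import pandas as pd

column_names = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
                'marital_status', 'occupation', 'relationship', 'race', 'sex',
                'capital_gain', 'capital_loss', 'hours-per-week', 'native-country', 'class']

# Two rows in the same layout as adult.data: comma plus space, no header line
csv_text = (
    '39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, '
    'Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n'
    '28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, '
    'Wife, Black, Female, 0, 0, 40, Cuba, <=50K\n'
)

raw = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)
print(repr(raw.loc[0, 'workclass']))    # ' State-gov'
print((raw['sex'] == 'Male').sum())     # 0

# skipinitialspace strips the space after each comma;
# na_values='?' turns the file's '?' markers into NaN (none occur in these two rows)
clean = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names,
                    skipinitialspace=True, na_values='?')
print(repr(clean.loc[0, 'workclass']))  # 'State-gov'
print((clean['sex'] == 'Male').sum())   # 1
```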
" ], "text/plain": [ " age workclass fnlwgt education education_num \\\n", "0 39 State-gov 77516 Bachelors 13 \n", "1 50 Self-emp-not-inc 83311 Bachelors 13 \n", "2 38 Private 215646 HS-grad 9 \n", "3 53 Private 234721 11th 7 \n", "4 28 Private 338409 Bachelors 13 \n", "... ... ... ... ... ... \n", "32556 27 Private 257302 Assoc-acdm 12 \n", "32557 40 Private 154374 HS-grad 9 \n", "32558 58 Private 151910 HS-grad 9 \n", "32559 22 Private 201490 HS-grad 9 \n", "32560 52 Self-emp-inc 287927 HS-grad 9 \n", "\n", " marital_status occupation relationship race \\\n", "0 Never-married Adm-clerical Not-in-family White \n", "1 Married-civ-spouse Exec-managerial Husband White \n", "2 Divorced Handlers-cleaners Not-in-family White \n", "3 Married-civ-spouse Handlers-cleaners Husband Black \n", "4 Married-civ-spouse Prof-specialty Wife Black \n", "... ... ... ... ... \n", "32556 Married-civ-spouse Tech-support Wife White \n", "32557 Married-civ-spouse Machine-op-inspct Husband White \n", "32558 Widowed Adm-clerical Unmarried White \n", "32559 Never-married Adm-clerical Own-child White \n", "32560 Married-civ-spouse Exec-managerial Wife White \n", "\n", " sex capital_gain capital_loss hours-per-week native-country \\\n", "0 Male 2174 0 40 United-States \n", "1 Male 0 0 13 United-States \n", "2 Male 0 0 40 United-States \n", "3 Male 0 0 40 United-States \n", "4 Female 0 0 40 Cuba \n", "... ... ... ... ... ... \n", "32556 Female 0 0 38 United-States \n", "32557 Male 0 0 40 United-States \n", "32558 Female 0 0 40 United-States \n", "32559 Male 0 0 20 United-States \n", "32560 Female 15024 0 40 United-States \n", "\n", " class \n", "0 <=50K \n", "1 <=50K \n", "2 <=50K \n", "3 <=50K \n", "4 <=50K \n", "... ... 
\n", "32556 <=50K \n", "32557 >50K \n", "32558 <=50K \n", "32559 <=50K \n", "32560 >50K \n", "\n", "[32561 rows x 15 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columnas = [\"age\", \"workclass\", \"fnlwgt\", \"education\", \"education_num\",\n", " \"marital_status\", \"occupation\",\"relationship\", \"race\", \"sex\",\n", " \"capital_gain\",\"capital_loss\",\"hours-per-week\", \"native-country\", \"class\"]\n", "\n", "datos = pd.read_csv(\n", " 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',\n", " header = None,\n", " names = columnas\n", " )\n", "datos" ] }, { "cell_type": "markdown", "id": "aed87c76-21db-4bfc-a1d7-8ba252b919be", "metadata": {}, "source": [ "## Viewing data\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "id": "98c0df67-8128-47d8-b856-4be584b72181", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.255445 1.309822 -0.168210 -0.584129\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641\n", "2013-01-04 1.471408 -1.539870 1.272889 1.255482\n", "2013-01-05 -1.197831 -0.139475 1.067496 -0.351955" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 25, "id": "fe9ab02b-3a68-4e9f-9184-3a572205ed08", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
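`head` and `tail` return the first and last `n` rows, with `n=5` by default, which is why each of the two outputs above shows five of the six dates. A small sketch on a hypothetical integer frame, including a negative `n`, which keeps everything except the last (for `head`) or first (for `tail`) `|n|` rows:

```python
import numpy as np
import pandas as pd

# Hypothetical 6x4 frame of the integers 0..23
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list('ABCD'))

print(df.head())    # rows 0-4 (n=5 by default)
print(df.tail(2))   # rows 4 and 5
print(df.head(-4))  # all but the last 4 rows: rows 0 and 1
print(df.tail(-4))  # all but the first 4 rows: rows 4 and 5
```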
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641\n", "2013-01-04 1.471408 -1.539870 1.272889 1.255482\n", "2013-01-05 -1.197831 -0.139475 1.067496 -0.351955\n", "2013-01-06 0.203198 -0.573590 1.523726 -0.788816" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "code", "execution_count": 27, "id": "020f5bcf-94b9-4a6d-840c-8312cbb450f1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.25544475, 1.30982242, -0.16820985, -0.58412855],\n", " [ 0.44063178, 0.47391843, 0.09211497, 0.99485562],\n", " [ 1.14075548, -0.28870699, 1.07547046, 1.24564126],\n", " [ 1.47140771, -1.53987018, 1.27288894, 1.25548203],\n", " [-1.19783097, -0.13947515, 1.06749632, -0.35195464],\n", " [ 0.20319828, -0.57359001, 1.52372616, -0.78881627]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.to_numpy()" ] }, { "cell_type": "code", "execution_count": 28, "id": "4b989e3b-be14-480b-b31f-75c919f51610", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],\n", " [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],\n", " [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],\n", " [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],\n", " dtype=object)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.to_numpy()" ] }, { "cell_type": "code", "execution_count": 29, "id": "0c721caa-b8df-4f91-a116-c4bcaeb56820", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
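`describe` summarizes each numeric column: count, mean, sample standard deviation, minimum, quartiles and maximum, where the `50%` row is the median. A short check on a hypothetical frame; `include='all'` also summarizes non-numeric columns with `count`, `unique`, `top` and `freq`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'E': ['test', 'train', 'test', 'train']})

summary = df.describe()           # numeric columns only by default
print(summary.columns.tolist())   # ['A']

# The 50% row is the median and std uses ddof=1 (sample standard deviation)
print(summary.loc['50%', 'A'])    # 2.5
print(np.isclose(summary.loc['std', 'A'], df['A'].std(ddof=1)))  # True

# include='all' adds count/unique/top/freq for the text column
print(df.describe(include='all').loc['unique', 'E'])  # 2
```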
" ], "text/plain": [ " A B C D\n", "count 6.000000 6.000000 6.000000 6.000000\n", "mean 0.300453 -0.126317 0.810581 0.295180\n", "std 0.965425 0.964107 0.683038 0.967690\n", "min -1.197831 -1.539870 -0.168210 -0.788816\n", "25% -0.140784 -0.502369 0.335960 -0.526085\n", "50% 0.321915 -0.214091 1.071483 0.321450\n", "75% 0.965725 0.320570 1.223534 1.182945\n", "max 1.471408 1.309822 1.523726 1.255482" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 30, "id": "5eed9cc0-398c-4fa2-bd2a-34db02435e27", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
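`df.T` swaps rows and columns, so the dates become column labels. A sketch of two properties on hypothetical frames: transposing twice returns the original, and a frame with mixed column types becomes `object` after transposing, for the same reason `df2.to_numpy()` above returned an `object` array (a single NumPy array holds only one dtype).

```python
import numpy as np
import pandas as pd

# Hypothetical frames: one all-float, one mixing float and text
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=['A', 'B'])
mixed = pd.DataFrame({'A': [1.0, 2.0], 'F': ['foo', 'bar']})

print(df.T.shape)         # (2, 3): columns became rows
print(df.T.T.equals(df))  # True

# Homogeneous data keeps float64 after transposing
print(df.T.dtypes.unique())
# Mixed columns cannot share one dtype, so pandas falls back to object
print(mixed.T.dtypes.unique())
print(mixed.to_numpy().dtype)  # object
```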
" ], "text/plain": [ " 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06\n", "A -0.255445 0.440632 1.140755 1.471408 -1.197831 0.203198\n", "B 1.309822 0.473918 -0.288707 -1.539870 -0.139475 -0.573590\n", "C -0.168210 0.092115 1.075470 1.272889 1.067496 1.523726\n", "D -0.584129 0.994856 1.245641 1.255482 -0.351955 -0.788816" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.T" ] }, { "cell_type": "code", "execution_count": 31, "id": "6dafeefe-49c3-4162-afaa-7e69171e0b6b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
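`sort_index` orders by the labels rather than by the values: `axis=0` (the default) sorts the row index and `axis=1` the column names, as in the cell above. It returns a new object and leaves the original untouched. A sketch with a hypothetical frame whose labels are unsorted:

```python
import pandas as pd

# Hypothetical frame with unsorted row and column labels
df = pd.DataFrame({'B': [3, 1, 2], 'A': [30, 10, 20]}, index=['z', 'x', 'y'])

print(df.sort_index().index.tolist())                           # ['x', 'y', 'z']
print(df.sort_index(axis=1).columns.tolist())                   # ['A', 'B']
print(df.sort_index(axis=1, ascending=False).columns.tolist())  # ['B', 'A']

# The original frame keeps its order
print(df.index.tolist())                                         # ['z', 'x', 'y']
```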
" ], "text/plain": [ " D C B A\n", "2013-01-01 -0.584129 -0.168210 1.309822 -0.255445\n", "2013-01-02 0.994856 0.092115 0.473918 0.440632\n", "2013-01-03 1.245641 1.075470 -0.288707 1.140755\n", "2013-01-04 1.255482 1.272889 -1.539870 1.471408\n", "2013-01-05 -0.351955 1.067496 -0.139475 -1.197831\n", "2013-01-06 -0.788816 1.523726 -0.573590 0.203198" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_index(axis = 1, ascending = False)" ] }, { "cell_type": "code", "execution_count": 32, "id": "96a85e30-6a1f-47cb-b8f2-eb004d9d30ea", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
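`sort_values` orders the rows by the values of one or more columns. With several keys, later columns break ties in earlier ones, and `ascending` can take one flag per key; missing values go last unless `na_position='first'` is passed. A sketch on a hypothetical frame with a tie and a `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a tie in B (rows 0 and 3) and a missing value (row 1)
df = pd.DataFrame({'B': [2.0, np.nan, 1.0, 2.0], 'C': [5, 6, 7, 4]})

# One key, ascending; NaN is placed last by default
by_b = df.sort_values(by='B')
print(by_b.index.tolist())    # starts with 2 and ends with 1 (the NaN row)

# Two keys: B ascending, ties broken by C descending
by_b_c = df.sort_values(by=['B', 'C'], ascending=[True, False])
print(by_b_c.index.tolist())  # [2, 0, 3, 1]

# NaN first instead of last
print(df.sort_values(by='B', na_position='first').index[0])  # 1
```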
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-04 1.471408 -1.539870 1.272889 1.255482\n", "2013-01-06 0.203198 -0.573590 1.523726 -0.788816\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641\n", "2013-01-05 -1.197831 -0.139475 1.067496 -0.351955\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-01 -0.255445 1.309822 -0.168210 -0.584129" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by=\"B\")" ] }, { "cell_type": "markdown", "id": "b5d20d65-8a1a-4546-ab1c-2cb36dd50fcb", "metadata": {}, "source": [ "## Selection" ] }, { "cell_type": "code", "execution_count": 33, "id": "6e77123d-ae60-4683-be83-15fbcfd3b83c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2013-01-01 -0.255445\n", "2013-01-02 0.440632\n", "2013-01-03 1.140755\n", "2013-01-04 1.471408\n", "2013-01-05 -1.197831\n", "2013-01-06 0.203198\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"A\"]" ] }, { "cell_type": "code", "execution_count": 34, "id": "9f874e68-c46e-45e4-8434-295ad779e02c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.255445 1.309822 -0.168210 -0.584129\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[0:3]" ] }, { "cell_type": "code", "execution_count": 41, "id": "f01f6ef0-f541-40dd-840c-e9f48fafdeb6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641\n", "2013-01-04 1.471408 -1.539870 1.272889 1.255482" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"20130102\":\"20130104\"]" ] }, { "cell_type": "markdown", "id": "86f5b16d-91dc-4afc-ae84-2db2680df46e", "metadata": {}, "source": [ "### Selección con etiqueta" ] }, { "cell_type": "code", "execution_count": 36, "id": "f106e83f-4d88-4dd6-ab10-ec33d484f2e2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A -0.255445\n", "B 1.309822\n", "C -0.168210\n", "D -0.584129\n", "Name: 2013-01-01 00:00:00, dtype: float64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[dates[0]]" ] }, { "cell_type": "code", "execution_count": 37, "id": "d87bc647-7953-4f8c-bba4-8659f55bd426", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "2013-01-01 -0.255445 1.309822\n", "2013-01-02 0.440632 0.473918\n", "2013-01-03 1.140755 -0.288707\n", "2013-01-04 1.471408 -1.539870\n", "2013-01-05 -1.197831 -0.139475\n", "2013-01-06 0.203198 -0.573590" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, [\"A\", \"B\"]]" ] }, { "cell_type": "code", "execution_count": 38, "id": "5cb37f5b-2a4d-4ed5-96bc-bc52b39d2ea2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "2013-01-02 0.440632 0.473918\n", "2013-01-03 1.140755 -0.288707\n", "2013-01-04 1.471408 -1.539870" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[\"20130102\":\"20130104\", [\"A\", \"B\"]]" ] }, { "cell_type": "code", "execution_count": 39, "id": "bad854c8-77c9-4de9-bedb-2242004c9cee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 0.440632\n", "B 0.473918\n", "Name: 2013-01-02 00:00:00, dtype: float64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[\"20130102\", [\"A\", \"B\"]]" ] }, { "cell_type": "code", "execution_count": 40, "id": "ed635a4c-862b-486f-bce8-5ae276dd99f9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.2554447489300441" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[dates[0], \"A\"]" ] }, { "cell_type": "markdown", "id": "15e40b04-3c37-4b9a-bf08-38a34ad13433", "metadata": {}, "source": [ "### Selección por posición " ] }, { "cell_type": "code", "execution_count": 42, "id": "29813ce7-a710-4bda-a02c-835d4e5209cb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 1.471408\n", "B -1.539870\n", "C 1.272889\n", "D 1.255482\n", "Name: 2013-01-04 00:00:00, dtype: float64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[3]" ] }, { "cell_type": "code", "execution_count": 43, "id": "309bc7b3-330d-43df-b488-6a26841a36f3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "2013-01-04 1.471408 -1.539870\n", "2013-01-05 -1.197831 -0.139475" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[3:5, 0:2]" ] }, { "cell_type": "code", "execution_count": 44, "id": "86ea5e59-c804-42ba-909c-97aae546157a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A C\n", "2013-01-02 0.440632 0.092115\n", "2013-01-03 1.140755 1.075470\n", "2013-01-05 -1.197831 1.067496" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[[1, 2, 4], [0, 2]]" ] }, { "cell_type": "code", "execution_count": 45, "id": "b5ed4a9b-e1df-499e-a17f-31e256ff9568", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[1:3, :]" ] }, { "cell_type": "markdown", "id": "aad559fd-b055-4f5c-af83-f70a0165597e", "metadata": {}, "source": [ "### Indexado lógico" ] }, { "cell_type": "code", "execution_count": 46, "id": "cd27528d-816a-42f5-8455-2006626e85fd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 -0.288707 1.075470 1.245641\n", "2013-01-04 1.471408 -1.539870 1.272889 1.255482\n", "2013-01-06 0.203198 -0.573590 1.523726 -0.788816" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df[\"A\"] > 0]" ] }, { "cell_type": "code", "execution_count": 47, "id": "81365a6a-1db1-4822-8ef9-b0fccbf6db93", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-01 NaN 1.309822 NaN NaN\n", "2013-01-02 0.440632 0.473918 0.092115 0.994856\n", "2013-01-03 1.140755 NaN 1.075470 1.245641\n", "2013-01-04 1.471408 NaN 1.272889 1.255482\n", "2013-01-05 NaN NaN 1.067496 NaN\n", "2013-01-06 0.203198 NaN 1.523726 NaN" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df > 0]" ] }, { "cell_type": "markdown", "id": "44efd26a-5a06-41af-952f-86f62d37d3ec", "metadata": {}, "source": [ "### Asignación" ] }, { "cell_type": "code", "execution_count": 49, "id": "c98a2df9-679f-4373-aa45-6ce6a01312a8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2013-01-02 1\n", "2013-01-03 2\n", "2013-01-04 3\n", "2013-01-05 4\n", "2013-01-06 5\n", "2013-01-07 6\n", "Freq: D, dtype: int64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range(\"20130102\", periods=6))\n", "s1" ] }, { "cell_type": "code", "execution_count": 52, "id": "2be639c9-008a-4e84-85d2-2aca435a4851", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-01 0.000000 1.309822 -0.168210 5\n", "2013-01-02 0.440632 0.473918 0.092115 5\n", "2013-01-03 1.140755 -0.288707 1.075470 5\n", "2013-01-04 1.471408 -1.539870 1.272889 5\n", "2013-01-05 -1.197831 -0.139475 1.067496 5\n", "2013-01-06 0.203198 -0.573590 1.523726 5" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, \"D\"] = np.array([5] * len(df))\n", "df" ] }, { "cell_type": "markdown", "id": "f447ff68-56ce-4017-ad4e-20952c5bedeb", "metadata": {}, "source": [ "## Datos Faltantes" ] }, { "cell_type": "code", "execution_count": 53, "id": "41a99d49-7216-4ef4-b959-cbb092baca54", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 0.000000 1.309822 -0.168210 5 1.0\n", "2013-01-02 0.440632 0.473918 0.092115 5 1.0\n", "2013-01-03 1.140755 -0.288707 1.075470 5 NaN\n", "2013-01-04 1.471408 -1.539870 1.272889 5 NaN" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + [\"E\"])\n", "df1.loc[dates[0] : dates[1], \"E\"] = 1\n", "df1" ] }, { "cell_type": "code", "execution_count": 54, "id": "746dfad9-b955-476b-a676-1978af62c5a5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 0.000000 1.309822 -0.168210 5 1.0\n", "2013-01-02 0.440632 0.473918 0.092115 5 1.0" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.dropna(how=\"any\")" ] }, { "cell_type": "code", "execution_count": 55, "id": "d9ab4ddf-de54-4025-b4d6-53f5153cff1f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 0.000000 1.309822 -0.168210 5 1.0\n", "2013-01-02 0.440632 0.473918 0.092115 5 1.0\n", "2013-01-03 1.140755 -0.288707 1.075470 5 5.0\n", "2013-01-04 1.471408 -1.539870 1.272889 5 5.0" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.fillna(value=5)" ] }, { "cell_type": "code", "execution_count": 56, "id": "c4c1802a-9fca-451c-a80d-3dbfc74c750b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 False False False False False\n", "2013-01-02 False False False False False\n", "2013-01-03 False False False False True\n", "2013-01-04 False False False False True" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.isna(df1)" ] }, { "cell_type": "markdown", "id": "f98c89bf-dd2f-4147-b79c-ab901e3ac577", "metadata": {}, "source": [ "## Operaciones" ] }, { "cell_type": "code", "execution_count": 58, "id": "4e58f4e1-8d2b-483b-bbd8-7b0d2a9822be", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 0.343027\n", "B -0.126317\n", "C 0.810581\n", "D 5.000000\n", "dtype: float64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean()" ] }, { "cell_type": "code", "execution_count": 59, "id": "a14bb1f2-8c8a-42ca-b970-527855bc02ae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2013-01-01 1.535403\n", "2013-01-02 1.501666\n", "2013-01-03 1.731880\n", "2013-01-04 1.551107\n", "2013-01-05 1.182548\n", "2013-01-06 1.538334\n", "Freq: D, dtype: float64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean(1)" ] }, { "cell_type": "code", "execution_count": 60, "id": "c2267b92-7c80-47b5-8571-adb542a67784", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D\n", "2013-01-01 0.000000 1.309822 -0.168210 5\n", "2013-01-02 0.440632 1.783741 -0.076095 10\n", "2013-01-03 1.581387 1.495034 0.999376 15\n", "2013-01-04 3.052795 -0.044836 2.272265 20\n", "2013-01-05 1.854964 -0.184311 3.339761 25\n", "2013-01-06 2.058162 -0.757901 4.863487 30" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.apply(np.cumsum)" ] }, { "cell_type": "markdown", "id": "c968adfe-59fc-4a71-b1b0-fd5a04b9dc0a", "metadata": {}, "source": [ "## Gráficas" ] }, { "cell_type": "code", "execution_count": 64, "id": "787d91a9-1330-4331-a78b-1c8a51868f99", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ts = pd.Series(np.random.randn(1000), index=pd.date_range(\"1/1/2000\", periods=1000))\n", "ts = ts.cumsum()\n", "ts.plot(); # Investigar tarea moral" ] }, { "cell_type": "code", "execution_count": 67, "id": "7f46063e-26b8-4e5f-a7d5-5351fe0d6d9c", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df = pd.DataFrame(\n", "\n", " np.random.randn(1000, 4), index=ts.index, columns=[\"A\", \"B\", \"C\", \"D\"]\n", "\n", ")\n", "\n", "df = df.cumsum()\n", "df.plot();\n" ] }, { "cell_type": "markdown", "id": "86b282c3-263a-4b27-9147-136305dd64fd", "metadata": {}, "source": [ "## Guardar" ] }, { "cell_type": "code", "execution_count": 68, "id": "e45943f3-f14b-4bea-9f85-38db428223b2", "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"foo.csv\")" ] }, { "cell_type": "code", "execution_count": 69, "id": "7a7b9b6f-554e-4f14-9848-f14b3deac0a0", "metadata": {}, "outputs": [], "source": [ "df.to_excel(\"foo.xlsx\", sheet_name=\"Sheet1\")" ] }, { "cell_type": "code", "execution_count": null, "id": "819e4230-e352-4a84-b41d-aa49d96b1bca", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc-showcode": false, "toc-showmarkdowntxt": false, "toc-showtags": true }, "nbformat": 4, "nbformat_minor": 5 }