37 38 37 38 38 40 37 37 38 37 37
What's wrong? Why (look at the data)? How do we fix it?
37 37 37 37 37 37 37 37 37 37 37
or...
pandas
¶http://pandas.pydata.org/index.html
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
Basically, if you have a big spreadsheet of data with mixed types.
C:\Users\Akhlore\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3071: DtypeWarning: Columns (10) have mixed types.Specify dtype option on import or set low_memory=False. has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
Note: This is a large dataset so I am preemptively selecting a subset of columns. Try to avoid loading this data more than once as it may take up all your memory and you'll have to restart the python kernel
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 37.0 | F | 8 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90335341.0 | NaN | $9546.85 |
1 | 37.0 | F | 3 | 146 | Diverticulosis and diverticulitis | 0 | NO PROC | 90335341.0 | NaN | $11462.75 |
2 | 37.0 | M | 1 | 50 | Diabetes mellitus with complications | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $1609.40 |
3 | 37.0 | F | 1 | 154 | Noninfectious gastroenteritis | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $2638.75 |
4 | 37.0 | F | 3 | 124 | Acute and chronic tonsillitis | 0 | NO PROC | 90335341.0 | NaN | $3538.25 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2367278 | 943.0 | F | 1 | 245 | Syncope | 0 | NO PROC | 156102.0 | NaN | $10074.00 |
2367279 | 943.0 | M | 1 | 149 | Biliary tract disease | 0 | NO PROC | 267443.0 | NaN | $21252.00 |
2367280 | 943.0 | F | 1 | 102 | Nonspecific chest pain | 0 | NO PROC | 267443.0 | NaN | $11673.00 |
2367281 | 943.0 | F | 1 | 660 | Alcohol-related disorders | 171 | SUTURE SKIN/SUBCUT TISS | 267443.0 | 249597.0 | $16722.00 |
2367282 | 943.0 | M | 1 | 2 | Septicemia (except in labor) | 0 | NO PROC | 251080.0 | NaN | $11637.00 |
2367283 rows × 10 columns
array([[37.0, 'F', '8', ..., 90335341.0, nan, '$9546.85'], [37.0, 'F', '3', ..., 90335341.0, nan, '$11462.75'], [37.0, 'M', '1', ..., 90335341.0, 167816.0, '$1609.40'], ..., [943.0, 'F', '1', ..., 267443.0, nan, '$11673.00'], [943.0, 'F', '1', ..., 267443.0, 249597.0, '$16722.00'], [943.0, 'M', '1', ..., 251080.0, nan, '$11637.00']], dtype=object)
Columns (and rows) have names that you can use to access them.
Index(['Facility ID', 'Gender', 'Length of Stay', 'CCS Diagnosis Code', 'CCS Diagnosis Description', 'CCS Procedure Code', 'CCS Procedure Description', 'Attending Provider License Number', 'Operating Provider License Number', 'Total Charges'], dtype='object')
0 F 1 F 2 M 3 F 4 F .. 2367278 F 2367279 M 2367280 F 2367281 F 2367282 M Name: Gender, Length: 2367283, dtype: object
0 197 1 146 2 50 Name: CCS Diagnosis Code, dtype: int64
[]
slices by rows, but indexes by column name - must provide range or it interprets the index as a column label.
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 37.0 | F | 8 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90335341.0 | NaN | $9546.85 |
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance) 2645 try: -> 2646 return self._engine.get_loc(key) 2647 except KeyError: pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() KeyError: 0 During handling of the above exception, another exception occurred: KeyError Traceback (most recent call last) <ipython-input-59-3512e1a6c539> in <module> ----> 1 data[0] ~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key) 2798 if self.columns.nlevels > 1: 2799 return self._getitem_multilevel(key) -> 2800 indexer = self.columns.get_loc(key) 2801 if is_integer(indexer): 2802 indexer = [indexer] ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance) 2646 return self._engine.get_loc(key) 2647 except KeyError: -> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key)) 2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance) 2650 if indexer.ndim > 1 or indexer.size > 1: pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item() KeyError: 0
iloc
: Position indexing¶If you want to reference a pandas data frame with position based indexing, use .iloc - work's just like numpy
'F'
Facility ID 37 Gender F Length of Stay 8 CCS Diagnosis Code 197 CCS Diagnosis Description Skin and subcutaneous tissue infections CCS Procedure Code 0 CCS Procedure Description NO PROC Attending Provider License Number 9.03353e+07 Operating Provider License Number NaN Total Charges $9546.85 Name: 0, dtype: object
Pandas uses NaN to indicate missing data
Facility ID 37 Gender F Length of Stay 8 CCS Diagnosis Code 197 CCS Diagnosis Description Skin and subcutaneous tissue infections CCS Procedure Code 0 CCS Procedure Description NO PROC Attending Provider License Number 9.03353e+07 Operating Provider License Number NaN Total Charges 9546.85 Charge per day 1193.36 Name: 0, dtype: object
Facility ID 37 Gender F Length of Stay 8 CCS Diagnosis Code 197 CCS Diagnosis Description Skin and subcutaneous tissue infections CCS Procedure Code 0 CCS Procedure Description NO PROC Attending Provider License Number 9.03353e+07 Operating Provider License Number NaN Total Charges $9546.85 Name: 0, dtype: object
.loc
: Label indexing¶You can also index by the label names. Note the rows are being indexed by their named index.
0 F 1 F 2 M 3 F Name: Gender, dtype: object
CCS Diagnosis Description | CCS Diagnosis Code | |
---|---|---|
0 | Skin and subcutaneous tissue infections | 197 |
3 | Noninfectious gastroenteritis | 154 |
5 | Influenza | 123 |
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
10 | 37.0 | F | 3 | 123 | Influenza | 0 | NO PROC | 90335341.0 | NaN | $4566.15 |
11 | 37.0 | M | 7 | 122 | Pneumonia (except that caused by tuberculosis ... | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $9822.90 |
12 | 37.0 | M | 3 | 122 | Pneumonia (except that caused by tuberculosis ... | 0 | NO PROC | 90335341.0 | NaN | $5063.05 |
13 | 37.0 | M | 2 | 155 | Other gastrointestinal disorders | 0 | NO PROC | 90335341.0 | NaN | $3125.75 |
14 | 37.0 | F | 3 | 122 | Pneumonia (except that caused by tuberculosis ... | 0 | NO PROC | 90335341.0 | NaN | $5055.45 |
15 | 37.0 | F | 6 | 127 | Chronic obstructive pulmonary disease and bron... | 0 | NO PROC | 90335341.0 | NaN | $9734.05 |
16 | 37.0 | F | 4 | 127 | Chronic obstructive pulmonary disease and bron... | 0 | NO PROC | 90335341.0 | NaN | $7168.05 |
17 | 37.0 | F | 2 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90301264.0 | NaN | $2812.85 |
18 | 37.0 | M | 3 | 58 | Other nutritional; endocrine; and metabolic di... | 0 | NO PROC | 90335341.0 | NaN | $3377.50 |
19 | 37.0 | F | 2 | 125 | Acute bronchitis | 0 | NO PROC | 90335341.0 | NaN | $3214.25 |
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-67-4009993f2f6d> in <module> ----> 1 data[10:20].loc[[0,3,5]] ~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key) 1766 1767 maybe_callable = com.apply_if_callable(key, self.obj) -> 1768 return self._getitem_axis(maybe_callable, axis=axis) 1769 1770 def _is_scalar_access(self, key: Tuple): ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis) 1952 raise ValueError("Cannot index with multidimensional key") 1953 -> 1954 return self._getitem_iterable(key, axis=axis) 1955 1956 # nested tuple slicing ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis) 1593 else: 1594 # A collection of keys -> 1595 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False) 1596 return self.obj._reindex_with_indexers( 1597 {axis: [keyarr, indexer]}, copy=True, allow_dups=True ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing) 1550 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr) 1551 -> 1552 self._validate_read_indexer( 1553 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing 1554 ) ~\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing) 1638 if missing == len(indexer): 1639 axis_name = self.obj._get_axis_name(axis) -> 1640 raise KeyError(f"None of [{key}] are in the [{axis_name}]") 1641 1642 # We (temporarily) allow for some missing keys with .loc, except in KeyError: "None of [Int64Index([0, 3, 5], dtype='int64')] are in the [index]"
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
10 | 37.0 | F | 3 | 123 | Influenza | 0 | NO PROC | 90335341.0 | NaN | $4566.15 |
13 | 37.0 | M | 2 | 155 | Other gastrointestinal disorders | 0 | NO PROC | 90335341.0 | NaN | $3125.75 |
15 | 37.0 | F | 6 | 127 | Chronic obstructive pulmonary disease and bron... | 0 | NO PROC | 90335341.0 | NaN | $9734.05 |
Just like numpy
we can index by a boolean array or an array of indices.
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
2 | 37.0 | M | 1 | 50 | Diabetes mellitus with complications | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | $1609.40 |
6 | 37.0 | M | 5 | 122 | Pneumonia (except that caused by tuberculosis ... | 0 | NO PROC | 90335341.0 | NaN | $6148.10 |
7 | 37.0 | M | 3 | 123 | Influenza | 0 | NO PROC | 90335341.0 | NaN | $4204.15 |
Facility ID | CCS Diagnosis Code | CCS Procedure Code | |
---|---|---|---|
0 | 37.0 | 197 | 0 |
1 | 37.0 | 146 | 0 |
2 | 37.0 | 50 | 202 |
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
1426998 | 1439.0 | M | 1 | 108 | Congestive heart failure; nonhypertensive | 0 | NO PROC | 251948.0 | NaN | $0.50 |
780065 | 989.0 | M | 2 | 657 | Mood disorders | 218 | PSYCHO/PSYCHI EVAL/THER | 199003.0 | 154247.0 | $1.00 |
781531 | 989.0 | F | 1 | 661 | Substance-related disorders | 0 | NO PROC | 145418.0 | NaN | $1.00 |
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
77722 | 213.0 | F | 10 | 100 | Acute myocardial infarction | 44 | COR ARTERY BYP GRF-CABG | 224769.0 | 224769.0 | $99999.65 |
451015 | 635.0 | F | 1 | 115 | Aortic; peripheral; and visceral artery aneurysms | 52 | AORTIC RESECTION; REPL | 232988.0 | 232988.0 | $99999.46 |
1579047 | 1456.0 | M | 12 | 2 | Septicemia (except in labor) | 157 | AMPUTATE LOWER EXTRMITY | 258717.0 | 265448.0 | $99998.45 |
Can apply standard string functions to all cells. This returns the changed value; it does not mutate in place.
The above overwrites the previous Total Charges column to be a floating point number instead of a string with a dollar sign.
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | |
---|---|---|---|---|---|---|---|---|---|---|
967254 | 1169.0 | M | 120 + | 63 | Diseases of white blood cells | 64 | BONE MARROW TRANSPLANT | 198304.0 | 229870.0 | 8593455.88 |
1560003 | 1456.0 | F | 120 + | 143 | Abdominal hernia | 86 | OTHER HERNIA REPAIR | 165181.0 | 165181.0 | 6272871.31 |
957685 | 1169.0 | F | 120 + | 6 | Hepatitis | 176 | OT ORGAN TRANSPLANTATN | 236414.0 | 183253.0 | 5745201.42 |
Facility ID | Gender | Length of Stay | CCS Diagnosis Code | CCS Diagnosis Description | CCS Procedure Code | CCS Procedure Description | Attending Provider License Number | Operating Provider License Number | Total Charges | Charge per day | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 37.0 | F | 8.0 | 197 | Skin and subcutaneous tissue infections | 0 | NO PROC | 90335341.0 | NaN | 9546.85 | 1193.356250 |
1 | 37.0 | F | 3.0 | 146 | Diverticulosis and diverticulitis | 0 | NO PROC | 90335341.0 | NaN | 11462.75 | 3820.916667 |
2 | 37.0 | M | 1.0 | 50 | Diabetes mellitus with complications | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | 1609.40 | 1609.400000 |
3 | 37.0 | F | 1.0 | 154 | Noninfectious gastroenteritis | 202 | ELECTROCARDIOGRAM | 90335341.0 | 167816.0 | 2638.75 | 2638.750000 |
4 | 37.0 | F | 3.0 | 124 | Acute and chronic tonsillitis | 0 | NO PROC | 90335341.0 | NaN | 3538.25 | 1179.416667 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2367278 | 943.0 | F | 1.0 | 245 | Syncope | 0 | NO PROC | 156102.0 | NaN | 10074.00 | 10074.000000 |
2367279 | 943.0 | M | 1.0 | 149 | Biliary tract disease | 0 | NO PROC | 267443.0 | NaN | 21252.00 | 21252.000000 |
2367280 | 943.0 | F | 1.0 | 102 | Nonspecific chest pain | 0 | NO PROC | 267443.0 | NaN | 11673.00 | 11673.000000 |
2367281 | 943.0 | F | 1.0 | 660 | Alcohol-related disorders | 171 | SUTURE SKIN/SUBCUT TISS | 267443.0 | 249597.0 | 16722.00 | 16722.000000 |
2367282 | 943.0 | M | 1.0 | 2 | Septicemia (except in labor) | 0 | NO PROC | 251080.0 | NaN | 11637.00 | 11637.000000 |
2367283 rows × 11 columns
nan
'120 +'
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: Unable to parse string "120 +" During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-81-260213ec4baa> in <module> ----> 1 pd.to_numeric('120 +') ~\anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast) 147 coerce_numeric = errors not in ("ignore", "raise") 148 try: --> 149 values = lib.maybe_convert_numeric( 150 values, set(), coerce_numeric=coerce_numeric 151 ) pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric() ValueError: Unable to parse string "120 +" at position 0
Group records that have the same value for a column
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FFD926BBB0>
We can then apply an aggregating function to these groups.
Length of Stay | CCS Diagnosis Code | CCS Procedure Code | Attending Provider License Number | Operating Provider License Number | Total Charges | Charge per day | |
---|---|---|---|---|---|---|---|
Facility ID | |||||||
377.0 | 2.670732 | 121.170732 | 202.439024 | 4.664694e+07 | 4.826434e+07 | 4060.734024 | 1614.759785 |
111.0 | 3.611111 | 159.777778 | 0.000000 | 1.929986e+05 | NaN | 4492.587778 | 1473.003327 |
37.0 | 3.162791 | 123.906977 | 54.779070 | 8.298531e+07 | 7.666251e+06 | 4939.879651 | 1628.579552 |
9250.0 | 2.203456 | 205.840173 | 137.955076 | 1.082429e+07 | 1.264187e+07 | 5618.928860 | 2497.869519 |
165.0 | 4.597884 | 180.206349 | 181.920635 | 2.051072e+05 | 4.744077e+07 | 6376.906455 | 1580.321499 |
... | ... | ... | ... | ... | ... | ... | ... |
563.0 | 6.279619 | 129.367643 | 78.085210 | 2.363745e+05 | 6.581608e+05 | 89385.986450 | 20937.033431 |
1446.0 | 5.633545 | 214.859847 | 157.190406 | 3.497396e+05 | 3.955800e+05 | 124204.521077 | 36290.348526 |
1139.0 | 8.937250 | 205.031446 | 87.044577 | 3.341266e+05 | 3.847957e+05 | 131992.454991 | 17629.688410 |
1138.0 | 38.527005 | 215.072013 | 213.108020 | 1.963923e+05 | 1.966056e+05 | 175997.805859 | 4195.405483 |
1486.0 | 61.821429 | 134.039683 | 194.801587 | 1.863599e+05 | 1.921974e+05 | 195780.272579 | 2486.885826 |
215 rows × 7 columns
Total Charges | |
---|---|
Gender | |
U | 9858.198939 |
F | 36867.585805 |
M | 44419.344760 |
The group by column has become an indexing column. Need to reset_index
to convert index to columns.
Gender | Total Charges | |
---|---|---|
0 | U | 9858.198939 |
1 | F | 36867.585805 |
2 | M | 44419.344760 |
CCS Procedure Description Gender ABDOMINAL PARACENTESIS F 56996.536195 M 58532.559231 ABORTION (TERM OF PREG) F 26512.622947 ALCO/DRUG REHAB/DETOX F 18954.578164 M 18187.073614 ... UPPER GI X-RAY M 31221.460246 URETERAL CATHETERIZATN F 36143.329481 M 34182.121658 VARI VEIN STRIP;LOW LMB F 42519.938571 M 65763.081875 Name: Total Charges, Length: 446, dtype: float64
unstack Pivot a level of the (necessarily hierarchical) index labels.
Gender | F | M | U |
---|---|---|---|
CCS Procedure Description | |||
ABDOMINAL PARACENTESIS | 56996.536195 | 58532.559231 | NaN |
ABORTION (TERM OF PREG) | 26512.622947 | NaN | NaN |
ALCO/DRUG REHAB/DETOX | 18954.578164 | 18187.073614 | NaN |
AMPUTATE LOWER EXTRMITY | 100926.639344 | 92821.328596 | NaN |
AORTIC RESECTION; REPL | 144019.629159 | 114210.484050 | NaN |
... | ... | ... | ... |
UNGROUPABLE | 68101.627500 | 105811.388636 | NaN |
UP GASTRO ENDOSC/BIOPSY | 42625.453678 | 43340.006917 | NaN |
UPPER GI X-RAY | 28339.654074 | 31221.460246 | NaN |
URETERAL CATHETERIZATN | 36143.329481 | 34182.121658 | NaN |
VARI VEIN STRIP;LOW LMB | 42519.938571 | 65763.081875 | NaN |
233 rows × 3 columns
CCS Procedure Description ABDOMINAL PARACENTESIS -1536.023036 ABORTION (TERM OF PREG) NaN ALCO/DRUG REHAB/DETOX 767.504550 AMPUTATE LOWER EXTRMITY 8105.310748 AORTIC RESECTION; REPL 29809.145110 ... UNGROUPABLE -37709.761136 UP GASTRO ENDOSC/BIOPSY -714.553239 UPPER GI X-RAY -2881.806172 URETERAL CATHETERIZATN 1961.207823 VARI VEIN STRIP;LOW LMB -23243.143304 Length: 233, dtype: float64
CCS Procedure Description ABDOMINAL PARACENTESIS -1536.023036 ALCO/DRUG REHAB/DETOX 767.504550 AMPUTATE LOWER EXTRMITY 8105.310748 AORTIC RESECTION; REPL 29809.145110 APPENDECTOMY 209.527573 ... UNGROUPABLE -37709.761136 UP GASTRO ENDOSC/BIOPSY -714.553239 UPPER GI X-RAY -2881.806172 URETERAL CATHETERIZATN 1961.207823 VARI VEIN STRIP;LOW LMB -23243.143304 Length: 202, dtype: float64
CCS Procedure Description EXTRA CIRC AUX OPEN HRT -69453.636115 DX PRCS ON EYE -63462.345439 SWAN-GANZ CATH MONITOR -46738.846837 PRCS ON SPLEEN -46219.301962 CORNEAL TRANSPLANT -45028.501148 dtype: float64
CCS Procedure Description LENS & CATARACT PRCS 35378.240602 OT INTRAOCULAR THER PRC 35907.146236 BONE MARROW TRANSPLANT 37066.859998 DES LES RETINA/CHOROID 52461.621223 CORONARY THROMBOLYSIS 87870.550000 dtype: float64
pd.concat
concatenates rows (i.e., default axis=0) while merging columns with the same name.
A | B | C | D | |
---|---|---|---|---|
0 | A0 | B0 | C0 | D0 |
1 | A1 | B1 | C1 | D1 |
2 | A2 | B2 | C2 | D2 |
3 | A3 | B3 | C3 | D3 |
B | D | F | |
---|---|---|---|
2 | B2 | D2 | F2 |
3 | B3 | D3 | F3 |
6 | B6 | D6 | F6 |
7 | B7 | D7 | F7 |
A | B | C | D | F | |
---|---|---|---|---|---|
0 | A0 | B0 | C0 | D0 | NaN |
1 | A1 | B1 | C1 | D1 | NaN |
2 | A2 | B2 | C2 | D2 | NaN |
3 | A3 | B3 | C3 | D3 | NaN |
2 | NaN | B2 | NaN | D2 | F2 |
3 | NaN | B3 | NaN | D3 | F3 |
6 | NaN | B6 | NaN | D6 | F6 |
7 | NaN | B7 | NaN | D7 | F7 |
You can join two tables on a specific column (or columns). Rows that has the same value (or key) in that column will be combined.
key | A | B | |
---|---|---|---|
0 | K0 | A0 | B0 |
1 | K1 | A1 | B1 |
2 | K2 | A2 | B2 |
3 | K3 | A3 | B3 |
key | C | D | |
---|---|---|---|
0 | K1 | C1 | D1 |
1 | K2 | C2 | D2 |
2 | K4 | C4 | D4 |
key | A | B | C | D | |
---|---|---|---|---|---|
0 | K1 | A1 | B1 | C1 | D1 |
1 | K2 | A2 | B2 | C2 | D2 |
key | A | B | C | D | |
---|---|---|---|---|---|
0 | K0 | A0 | B0 | NaN | NaN |
1 | K1 | A1 | B1 | C1 | D1 |
2 | K2 | A2 | B2 | C2 | D2 |
3 | K3 | A3 | B3 | NaN | NaN |
4 | K4 | NaN | NaN | C4 | D4 |
key | A | B | C | D | |
---|---|---|---|---|---|
0 | K0 | A0 | B0 | NaN | NaN |
1 | K1 | A1 | B1 | C1 | D1 |
2 | K2 | A2 | B2 | C2 | D2 |
3 | K3 | A3 | B3 | NaN | NaN |
Download: https://mscbio2025.csb.pitt.edu/files/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2014.csv
Is the percentage of people who go to these facilities with the most common diagnosis and receive a coronary bypass significantly different between these two groups?
What about knee replacements?