# Binarize the labels
# print(class_names)
# lb = label_binarize(y = y, classes = list(class_names))
# classes.remove('unknown')
# lb.fit(y) #for LabelBinarizer not lable_binerize()
# lb.classes_ #for LabelBinarizer not lable_binerize
# Split the training data for cross validation
(X_train, X_test), (y_train, y_test) = train_test_split(X, y, test_size=0.2,
random_state=0)
df_y_train = pd.DataFrame(y_train, columns=['label']) #,'Date','group_idx'])
print('df_y_train.shape', df_y_train.shape,'X_train', X_train.shape)
##### Dimensionality Reduction ####
Error message:
File "<ipython-input-50-1c94ab12f530>", line 10
(X_train, X_test), (y_train, y_test) = train_test_split(X, y, test_size=0.2,
IndentationError: unexpected indent
Hello fngwira.
I have edited your post for readability. In the future, use Markdown to format your posts, by placing any code in between backticks (`).
To answer your question:
Remove the parentheses around your split output variables.
X_train, X_test, y_train, y_test = ...
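As a rough illustration of why the flat form is the right one: scikit-learn's `train_test_split` returns a flat list of four arrays when given `X` and `y`, so the targets on the left must be flat too. The sketch below uses `fake_split`, a hypothetical stand-in written only for this example (it is not the scikit-learn function), so it runs without any installs. Note that the parenthesised form fails at run time with a `ValueError`, not with an `IndentationError`, so the parentheses alone would not explain the message in the question.

```python
def fake_split(X, y):
    """Hypothetical stand-in for train_test_split: returns a flat list of four items."""
    cut = int(len(X) * 0.8)
    return [X[:cut], X[cut:], y[:cut], y[cut:]]


X = list(range(10))
y = [v % 2 for v in X]

# Flat unpacking matches the flat, four-item return value.
X_train, X_test, y_train, y_test = fake_split(X, y)
print(len(X_train), len(X_test))

# Grouping the targets into two pairs asks for two items, but four are returned.
try:
    (a, b), (c, d) = fake_split(X, y)
except ValueError as err:
    print("ValueError:", err)
```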
Hope this helps
@Sky020 the error message is still there:
File “”, line 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
IndentationError: unexpected indent
This is my code:
def ML_with_CV_feat(cv_feat_file='../data/cv_feat.csv', n_comp=100,
plotting=False):
# Importing the bottleneck features for each image
feat_df = pd.read_csv(cv_feat_file, index_col=0, dtype='unicode')
##-- Dealing with NaN
feat_df.fillna(0, inplace=True)
feat_df['blob_detected'] = feat_df['blob_detected']*1
#['cell_area', 'cell_eccentricity', 'cell_solidity', 'average_blue', 'average_green', 'average_red', 'blob_detected', 'num_of_blobs', 'average_blob_area']
# feat_df = feat_df.sample(frac=0.01)
feat_df.drop(columns=['cell_area', 'cell_eccentricity', 'cell_solidity',
'average_blue', 'average_green', 'average_red'],
inplace=True)
#Removing features that do not seperate populations of cell class
column_names = feat_names = list(feat_df.columns)
print(column_names)
for X in ['label','fn']:
feat_names.remove(x)
# feat_df = feat_df.iloc[0:300,:]
mask = feat_df.loc[:, 'label'].isin(['Infected', 'Uninfected'])
feat_df = feat_df.loc[mask, :].drop_duplicates()
print('Number of features:', len(feat_names))
y = feat_df.loc[:,['label']].values
print(type(y), y.shape)
print('Number of samples for each label \n', feat_df.groupby('label')['label'].count())
# print(feat_df.head())
X = feat_df.loc[:, feat_names].astype(float).values
print('/nColumn feat names after placing into X',
list(feat_df.loc[:, feat_names].columns))
class_names = set(feat_df.loc[:,'label'])
# Binarize the labels
# print(class_names)
# lb = label_binarize(y = y, classes = list(class_names))
# classes.remove('unknown')
# lb.fit(y) #for LabelBinarizer not lable_binerize()
# lb.classes_ #for LabelBinarizer not lable_binerize
# Split the training data for cross validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
df_y_train = pd.DataFrame(y_train, columns=['label']) #,'Date','group_idx'])
print('df_y_train.shape', df_y_train.shape,'X_train', X_train.shape)
##### Dimensionality Reduction ####
Error message: File “”, line 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
IndentationError: unexpected indent
If you want this inside the function ML_with_CV_feat():
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
Then, add however many spaces (indents) you need to this:
class_names = set(feat_df.loc[:,'label'])
So that it is the same level as all the other code inside the function.
If you do not want the split testing data to be defined inside the function, then give it the same indentation as the class_names variable.
In Python, the indentation of your code defines which block each statement belongs to.
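A minimal sketch of that idea, using two made-up functions (`inside` and `outside` are names chosen only for this example): the only difference between them is how far one line is indented, and that changes which block the line belongs to.

```python
def inside():
    total = 0
    for i in range(3):
        total += i  # indented under the for: runs on every pass of the loop
    return total


def outside():
    total = 0
    for i in range(3):
        pass
    total += i  # same level as the for: runs once, after the loop ends
    return total


print(inside(), outside())
```

The same rule decides whether the `train_test_split` line is part of the function body or sits at module level.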
Hope this helps
Use this:
def ML_with_CV_feat(cv_feat_file='../data/cv_feat.csv', n_comp=100, plotting=False):
    feat_df = pd.read_csv(cv_feat_file, index_col=0, dtype='unicode')
    feat_df.fillna(0, inplace=True)
    feat_df['blob_detected'] = feat_df['blob_detected']*1
    #['cell_area', 'cell_eccentricity', 'cell_solidity', 'average_blue', 'average_green', 'average_red', 'blob_detected', 'num_of_blobs', 'average_blob_area']
    #feat_df = feat_df.sample(frac=0.01)
    feat_df.drop(columns=['cell_area', 'cell_eccentricity', 'cell_solidity', 'average_blue', 'average_green', 'average_red'], inplace=True)
    column_names = feat_names = list(feat_df.columns)
    print(column_names)
    for X in ['label','fn']: #! THIS DOES NOT MAKE SENSE
        feat_names.remove(x) #CHOOSE TO USE 'X' OR 'x'...WHAT IS 'x'?
    #feat_df = feat_df.iloc[0:300,:]
    mask = feat_df.loc[:, 'label'].isin(['Infected', 'Uninfected'])
    feat_df = feat_df.loc[mask, :].drop_duplicates()
    print('Number of features:', len(feat_names))
    y = feat_df.loc[:,['label']].values
    print(type(y), y.shape)
    print('Number of samples for each label \n', feat_df.groupby('label')['label'].count())
    X = feat_df.loc[:, feat_names].astype(float).values
    print('/nColumn feat names after placing into X', list(feat_df.loc[:, feat_names].columns))
    class_names = set(feat_df.loc[:,'label'])
    # print(class_names)
    #lb = label_binarize(y = y, classes = list(class_names))
    # classes.remove('unknown')
    # lb.fit(y) #for LabelBinarizer not lable_binerize()
    # lb.classes_ #for LabelBinarizer not lable_binerize
    # Split the training data for cross validation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    df_y_train = pd.DataFrame(y_train, columns=['label']) #,'Date','group_idx'])
    print('df_y_train.shape', df_y_train.shape,'X_train', X_train.shape)
    ##### Dimensionality Reduction ####
Try that. Look out for the comments I added in CAPITAL LETTERS.
As the error message indicates, you have an indentation error. This error occurs when a statement is unnecessarily indented or its indentation does not match that of earlier statements in the same block. Python not only insists on indentation, it insists on consistent indentation. You are free to choose how many spaces to use per level, but you then need to stick with it within a block. If you indent one line by 4 spaces, but then indent the next by 2 (or 5, or 10, or …), you’ll get this error.
Mixing tabs and spaces is still allowed by default in Python 2, but using this “feature” is strongly discouraged. Python 3 rejects indentation that mixes tabs and spaces inconsistently. The recommended approach is to replace each tab with 4 spaces.
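One way to see these errors without touching the original script is to hand small snippets to the built-in `compile()` and look at what it raises. This is only a sketch, assuming Python 3; `check` and the three sample strings are names invented for the example.

```python
def check(source):
    """Return (error name, message) if the source has an indentation problem, else None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:  # TabError is a subclass, so it is caught here too
        return type(err).__name__, err.msg
    return None


# Every line in the function body uses the same four-space indent.
consistent = "def f():\n    a = 1\n    b = 2\n"

# The second line is indented although no block was opened before it.
extra_indent = "a = 1\n    b = 2\n"

# The first body line uses a tab, the second uses eight spaces.
mixed = "if True:\n\ta = 1\n        b = 2\n"

for name, src in [("consistent", consistent),
                  ("extra_indent", extra_indent),
                  ("mixed", mixed)]:
    print(name, "->", check(src))
```

The `extra_indent` case is the same kind of failure as the one reported in the question: a line that sits deeper than the block it is supposed to belong to.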
Hi @fillermark!
This post has not been active for over a year.
Please only reply to newer topics.
Thanks!