== check python ===================================================
python version: 3.7.3
python branch:
python build version: ('default', 'Jun 27 2019 23:31:30')
python compiler version: GCC 5.5.0
python implementation: CPython
== check os platform ===============================================
os: Linux
os kernel version: #1 SMP Tue Jul 30 17:17:50 UTC 2019
os release version: 4.9.184-0.1.ac.235.83.329.metal1.x86_64
os platform: Linux-4.9.184-0.1.ac.235.83.329.metal1.x86_64-x86_64-with-redhat-5.3-Tikanga
linux distribution: ('Red Hat Enterprise Linux Server', '5.3', 'Tikanga')
linux os distribution: ('redhat', '5.3', 'Tikanga')
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Linux', node='dev-dsk-jsahewal-1b-01f89cec.us-east-1.amazon.com', release='4.9.184-0.1.ac.235.83.329.metal1.x86_64', version='#1 SMP Tue Jul 30 17:17:50 UTC 2019', machine='x86_64', processor='x86_64')
architecture: ('64bit', 'ELF')
machine: x86_64
== are we in docker =============================================
== compiler =====================================================
c++ (Homebrew gcc 5.5.0_4) 5.5.0
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== check pips ===================================================
numpy 1.18.1
protobuf 3.9.2
tensorflow-cpu 2.1.0
tensorflow-estimator 2.1.0
== check for virtualenv =========================================
False
== tensorflow import ============================================
tf.version.VERSION = 2.1.0
tf.version.GIT_VERSION = v2.1.0-rc2-17-ge5bf8de
tf.version.COMPILER_VERSION = 7.3.1 20180303
== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset
== nvidia-smi ===================================================
tf_env_collect.sh: line 145: nvidia-smi: command not found
== cuda libs ===================================================
== tensorflow installed from info ==================
== python version ==============================================
(major, minor, micro, releaselevel, serial)
(3, 7, 3, 'final', 0)
== bazel version ===============================================
**Describe the current behavior**
2020-01-27 22:56:12.796098: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2800045000 Hz
2020-01-27 22:56:12.797977: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3bad450 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-27 22:56:12.798009: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-01-27 22:56:12.806071: F tensorflow/core/framework/tensor_shape.cc:353] Check failed: 0 <= new_num_elements (0 vs. -4523975925047186833)
[1] 10698 abort python3 demo.py
**Describe the expected behavior**
Should not abort
**Code to reproduce the issue**
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import feature_column
from tensorflow.keras import layers

# A utility method to create a tf.data dataset from a Pandas DataFrame
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)

# Replicate the DataFrame to make it a bit bigger
temp = dataframe.copy()
for _ in range(10):
    dataframe = pd.concat([dataframe, temp.copy()])

# Replicate the columns to make it a bit bigger
col_lst = dataframe.columns.tolist()
for i in range(10):
    for ind, col in enumerate(col_lst):
        dataframe[f'{i+1}_{ind}'] = dataframe[col]

batch_size = 5  # A small batch size is used for demonstration purposes
dataframe_ds = df_to_dataset(dataframe, batch_size=batch_size)
```
I'm having a similar issue. Everything is fine if I create a dataset from a pd.DataFrame containing 100k samples. However, if I open a pd.DataFrame containing 1k samples and then duplicate it to 2k samples, it crashes. I'm working on TF 2.0.

edit: I have found a strange workaround. In the simple case I add each column's values to a dict that I pass to `tf.data.Dataset`:
```python
input_dict = {}
for c in dataframe.columns:
    input_dict[c] = dataframe[c]
dataset = tf.data.Dataset.from_tensor_slices(input_dict)
```
And it works fine.

In the problematic case I duplicate the dataframe using `dataframe = pd.concat((dataframe, dataframe))` before adding the values as in the first case. The workaround that works is to add the values using:
```python
input_dict = {}
for c in dataframe.columns:
    input_dict[c] = dataframe[c].tolist() * 2
dataset = tf.data.Dataset.from_tensor_slices(input_dict)
```
which works!
Note that trying to add a column cast to a list will fail if the column contains a dict or anything else that cannot be converted into a Tensor. But duplicating the list does not throw this error...

@jayantsahewal This does not completely solve your problem, since you perform more complicated dataframe manipulation, but it could be a starting point.
This trick will sidestep the issue:

```python
# if df is merged from several data sources,
# such as df = pd.concat([df_a, df_b]),
# creating the tensor will crash
tf.data.Dataset.from_tensor_slices((dict(df), target))

# but round-tripping the df through a CSV file fixes the issue
df.to_csv('tmp', index=False)
df1 = pd.read_csv('tmp')
tf.data.Dataset.from_tensor_slices((dict(df1), target))
```
I think this may be caused by the tensor shape computation, although I don't have time to dig deeper.
I also had this issue when doing cross-fold validation and concatenating folds together to form the training set. Concatenated dataframes seem to fail. A simple solution that worked for me was to reset the dataframe index after concatenating:

```python
df = pd.concat([df_a, df_b])
df = df.reset_index()
```
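The index-reset fix can be sketched with plain pandas: `pd.concat` keeps each input frame's original index, so the combined frame ends up with repeated index labels, and `reset_index(drop=True)` rebuilds a unique 0..n-1 index without adding the old index as a column. The tiny frames below are hypothetical stand-ins for `df_a` and `df_b`; whether the duplicate index labels are the actual trigger of the TF abort is an assumption, and this sketch only shows what resetting the index changes.

```python
import pandas as pd

# Hypothetical small frames standing in for df_a and df_b
df_a = pd.DataFrame({'x': [1, 2], 'target': [0, 1]})
df_b = pd.DataFrame({'x': [3, 4], 'target': [1, 0]})

# pd.concat keeps each frame's original index, so labels repeat
df = pd.concat([df_a, df_b])
print(df.index.tolist())   # [0, 1, 0, 1] -- duplicate labels

# reset_index(drop=True) rebuilds a unique RangeIndex
# (drop=True avoids adding the old index as a new column)
df = df.reset_index(drop=True)
print(df.index.tolist())   # [0, 1, 2, 3]
```

`pd.concat([df_a, df_b], ignore_index=True)` achieves the same result in one step.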