Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below):
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:
  • You can collect some of this information using our environment capture script.
    You can also obtain the TensorFlow version with:
    1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
    2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

    == check python ===================================================
    python version: 3.7.3
    python branch:
    python build version: ('default', 'Jun 27 2019 23:31:30')
    python compiler version: GCC 5.5.0
    python implementation: CPython
    == check os platform ===============================================
    os: Linux
    os kernel version: #1 SMP Tue Jul 30 17:17:50 UTC 2019
    os release version: 4.9.184-0.1.ac.235.83.329.metal1.x86_64
    os platform: Linux-4.9.184-0.1.ac.235.83.329.metal1.x86_64-x86_64-with-redhat-5.3-Tikanga
    linux distribution: ('Red Hat Enterprise Linux Server', '5.3', 'Tikanga')
    linux os distribution: ('redhat', '5.3', 'Tikanga')
    mac version: ('', ('', '', ''), '')
    uname: uname_result(system='Linux', node='dev-dsk-jsahewal-1b-01f89cec.us-east-1.amazon.com', release='4.9.184-0.1.ac.235.83.329.metal1.x86_64', version='#1 SMP Tue Jul 30 17:17:50 UTC 2019', machine='x86_64', processor='x86_64')
    architecture: ('64bit', 'ELF')
    machine: x86_64
    == are we in docker =============================================
    == compiler =====================================================
    c++ (Homebrew gcc 5.5.0_4) 5.5.0
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    == check pips ===================================================
    numpy                             1.18.1
    protobuf                          3.9.2
    tensorflow-cpu                    2.1.0
    tensorflow-estimator              2.1.0
    == check for virtualenv =========================================
    False
    == tensorflow import ============================================
    tf.version.VERSION = 2.1.0
    tf.version.GIT_VERSION = v2.1.0-rc2-17-ge5bf8de
    tf.version.COMPILER_VERSION = 7.3.1 20180303
    == env ==========================================================
    LD_LIBRARY_PATH is unset
    DYLD_LIBRARY_PATH is unset
    == nvidia-smi ===================================================
    tf_env_collect.sh: line 145: nvidia-smi: command not found
    == cuda libs  ===================================================
    == tensorflow installed from info ==================
    == python version  ==============================================
    (major, minor, micro, releaselevel, serial)
    (3, 7, 3, 'final', 0)
    == bazel version  ===============================================
    

    Describe the current behavior

    2020-01-27 22:56:12.796098: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2800045000 Hz
    2020-01-27 22:56:12.797977: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3bad450 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-01-27 22:56:12.798009: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2020-01-27 22:56:12.806071: F tensorflow/core/framework/tensor_shape.cc:353] Check failed: 0 <= new_num_elements (0 vs. -4523975925047186833)
    [1]    10698 abort      python3 demo.py
    

    Describe the expected behavior
    Should not abort

    Code to reproduce the issue
    Provide a reproducible test case that is the bare minimum necessary to generate the problem.

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from tensorflow import feature_column
    from tensorflow.keras import layers
    # A utility method to create a tf.data dataset from a pandas DataFrame
    def df_to_dataset(dataframe, shuffle=True, batch_size=32):
        dataframe = dataframe.copy()
        labels = dataframe.pop('target')
        ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=len(dataframe))
        ds = ds.batch(batch_size)
        return ds
    URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
    dataframe = pd.read_csv(URL)
    # Replicate the DataFrame rows to make it a bit bigger
    temp = dataframe.copy()
    for _ in range(10):
        dataframe = pd.concat([dataframe, temp.copy()])
    # Replicate the columns to make it a bit bigger
    col_lst = dataframe.columns.tolist()
    for i in range(10):
        for ind, col in enumerate(col_lst):
            dataframe[f'{i+1}_{ind}'] = dataframe[col]
    batch_size = 5 # A small batch size is used for demonstration purposes
    dataframe_ds = df_to_dataset(dataframe, batch_size=batch_size)
              

    I'm having a similar issue. Everything's ok if I create a dataset from a pd.DataFrame containing 100k samples. However, if I open a pd.DataFrame containing 1k samples, then duplicate it to 2k samples, it crashes.

    I'm working on TF 2.0.

    edit: I have found a strange workaround.

  • In the simple case, I add each column's values to a dict that I pass to tf.data.Dataset:

    input_dict = {}
    for c in dataframe.columns:
        input_dict[c] = dataframe[c]
    dataset = tf.data.Dataset.from_tensor_slices(input_dict)

    And it works fine.

  • In the problematic case, I duplicate the dataframe using dataframe = pd.concat((dataframe, dataframe)) before adding the values as in the first case.
  • The workaround is to skip pd.concat and duplicate the values as plain lists instead:

    input_dict = {}
    for c in dataframe.columns:
        input_dict[c] = dataframe[c].tolist() * 2
    dataset = tf.data.Dataset.from_tensor_slices(input_dict)

    which works!
    Note that trying to add the column cast into a list will fail if the column contains a dict or something that cannot be converted into a Tensor. But duplicating the list does not throw this error...
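
    The list-based workaround above can be wrapped in a small helper. This is only a sketch: `columns_as_lists` and the toy DataFrame are hypothetical names and data introduced for illustration, and it has not been checked against the crash itself.

```python
import pandas as pd


def columns_as_lists(dataframe, repeat=1):
    """Hypothetical helper: map each column name to a plain Python list,
    repeated `repeat` times instead of concatenating DataFrames."""
    return {c: dataframe[c].tolist() * repeat for c in dataframe.columns}


# Toy stand-in for the real data (values are made up for illustration).
dataframe = pd.DataFrame({"age": [63, 67, 41], "target": [1, 0, 0]})
input_dict = columns_as_lists(dataframe, repeat=2)
print(input_dict["age"])  # [63, 67, 41, 63, 67, 41]

# With TensorFlow installed, the dict can then be passed on as in the comment above:
# dataset = tf.data.Dataset.from_tensor_slices(input_dict)
```

    As noted above, `.tolist()` only helps for columns whose values can be converted into a Tensor.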

    @jayantsahewal This does not completely solve your problem since you perform more complicated dataframe manipulation but it could be a starting point.

    This trick works around the issue:

    # If df was merged from several data sources,
    # e.g. df = pd.concat([df_a, df_b]),
    # creating the dataset crashes:
    tf.data.Dataset.from_tensor_slices((dict(df), target))
    # Round-tripping df through a CSV file avoids the crash:
    df.to_csv('tmp', index=False)
    df1 = pd.read_csv('tmp')
    tf.data.Dataset.from_tensor_slices((dict(df1), target))

    I think this may be caused by the tensor shape computation, although I don't have time to dig deeper.

    I also had this issue doing cross-fold validation and concatenating folds together to form the training set. Concatenated dataframes seem to fail. A simple solution that worked for me was to just reset the dataframe index after concatenating.

    df = pd.concat([df_a, df_b])
    df = df.reset_index()
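
    To see what resetting the index changes, here is a pandas-only sketch. The frames `df_a` and `df_b` hold made-up values, and whether the duplicated index labels are the actual trigger of the crash is an assumption based on the comments in this thread, not something confirmed here.

```python
import pandas as pd

# Toy frames standing in for two folds (values are made up for illustration).
df_a = pd.DataFrame({"age": [63, 67], "target": [1, 0]})
df_b = pd.DataFrame({"age": [41, 56], "target": [0, 1]})

# pd.concat takes a list of frames and keeps each frame's own index labels,
# so the result carries duplicated labels.
stacked = pd.concat([df_a, df_b])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# drop=True discards the old index; without it, reset_index() adds the old
# labels as an extra "index" column, which dict(df) would treat as a feature.
clean = stacked.reset_index(drop=True)
print(clean.index.tolist())  # [0, 1, 2, 3]

# Equivalent one-step form.
same = pd.concat([df_a, df_b], ignore_index=True)
```

    Either `clean` or `same` can then be passed to the `df_to_dataset` function from the original report.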