Abstract

Keywords: generative AI; text data mining; TDM; exceptions and limitations; DSM Directive; non-enjoyment


I. Introduction

Text and data mining (TDM) is essential for generative AI models to produce content. For example, The Next Rembrandt is a project in which AI uses machine learning to learn the contrast, colour, brush strokes, and geometric patterns of hundreds of Rembrandt's artworks to create paintings in the Rembrandt style. 1 Google's WaveNet project generates different types of music by learning and analysing existing musical compositions. 2 Large Language Models (LLMs), which are deep learning algorithms capable of performing natural language processing (NLP) tasks, can generate novels, screenplays, computer programs, and more. AI models like ChatGPT are trained on massive datasets. Content produced by these AI models is often indistinguishable from that created by human authors, and some of it has been sold for significant amounts of money. 3

While there has been considerable discussion and research on TDM exceptions and limitations under the EU Digital Single Market (DSM) Directive

TDM involves processing data to identify patterns and trends, thereby transforming it into valuable knowledge. This process can be compared to panning for gold, which requires significant time and effort to find even a small quantity of gold. Similarly, in data mining, if we replace gold with information and panning with algorithms, the concept remains the same. Just as gold mining involves extracting precious nuggets from vast amounts of rock, TDM seeks to extract valuable information from large datasets. 8 However, TDM processes can raise copyright issues if the dataset contains photos, videos, text, or other copyrighted original works. In addition, database producers are protected under copyright law if the selection or arrangement of the data is creative enough for the database itself to qualify as an original work.

The data mining process involves accessing, collecting, storing (copying), transforming, and transmitting original works. From a TDM perspective, the input phases of generative AI can be divided into three steps: data access (step 1), data extraction and reproduction (step 2), and data mining and knowledge discovery (step 3). It is in the second step that legal issues are most likely to arise. 9

The first step, ‘data access’, involves gaining access to the data, which can be either digital or analog data contained within physical objects. This step occurs before the data is reproduced for TDM purposes. It is a pre-processing step that serves as a prerequisite for the actual data collection. In this step, the rights holders may receive economic compensation for the use of their works through access control, which is exercised by the data provider. Where the data provider is also the copyright holder, the copyright holder controls access directly. However, the database producer, who organises the works into a database and makes them available to the user, may also control the user's access to the data, within the limits of the original copyright holder’s rights. 10 There are two primary methods of access control: contractual control and technological protection measures (TPM), such as login and authentication systems. At the ‘access’ step, the data provider has the ability to restrict not only access, but also the availability and scope of TDM activities.

Next, the ‘extraction’ step involves collecting data, such as works, for TDM and preparing it for analysis. This step may entail reproducing large volumes of data through various methods. Examples of this include the primary reproduction of the data during collection, its reproduction as the primary copy is transformed into input data for TDM processing, and additional reproduction for backup or verification purposes. 11 It is in this extraction step that the most direct legal conflicts with the exclusive rights of intellectual property holders often arise, particularly when TDM activities infringe upon these exclusive rights.

While the first two steps involve preparing the input data for TDM purposes, the third step, ‘mining’, is the core of TDM technology, as it involves the substantive analysis of the data to extract meaningful information. At this stage, AI algorithms read the data for analysis, resulting in at least temporary reproduction. In the process of analysing large amounts of data, the data may also be reproduced across multiple servers to facilitate distributed processing. 12
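The three steps described above can be sketched in code to show where reproductions occur. The following is a minimal illustration only: the class name `TdmRun`, the toy corpus, and the word-frequency analysis are all assumptions introduced for the example and do not represent any particular AI system's pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TdmRun:
    """Hypothetical helper that logs each copy made across the three steps."""
    copies: list = field(default_factory=list)

    def access(self, source: dict, credentials_ok: bool) -> dict:
        # Step 1: access control is applied before any reproduction.
        if not credentials_ok:
            raise PermissionError("access denied by the data provider")
        return source

    def extract(self, source: dict) -> dict:
        # Step 2: collecting the works creates a primary copy, and
        # preparing them for analysis creates a transformed copy.
        raw = dict(source)
        self.copies.append("primary copy (collection)")
        prepared = {name: text.lower().split() for name, text in raw.items()}
        self.copies.append("transformed copy (input data)")
        return prepared

    def mine(self, prepared: dict) -> Counter:
        # Step 3: the analysis reads the data (at least a temporary copy)
        # and outputs only aggregate information, here word frequencies.
        self.copies.append("temporary copy (analysis)")
        counts = Counter()
        for tokens in prepared.values():
            counts.update(tokens)
        return counts


run = TdmRun()
corpus = {"work_a": "Gold in the river", "work_b": "The river runs"}
patterns = run.mine(run.extract(run.access(corpus, credentials_ok=True)))
print(patterns.most_common(2))
print(run.copies)
```

In this sketch, two of the three logged copies arise in the extraction step, which is consistent with the observation that legal issues concentrate there, while the output of the mining step contains only frequencies rather than the works themselves.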

2. Korean Cases Regarding Data Access and Extraction

In Korea, there have been cases where the rights of database producers have been challenged because website operators have accessed others’ databases by crawling, then extracting and using the data. In a number of cases over the last few years, Korean courts have issued rulings on whether the act of crawling unreasonably interferes with the interests of database producers.

However, the appellate court found that the defendant's actions, which involved the extensive reproduction of the plaintiff's site content using mirroring techniques and then posting it on its own site for commercial purposes, constituted unauthorised use for its own business in a manner that violated fair trade practices and competitive order. In addition, given the potential for public confusion between the plaintiff's business sign and the defendant's site name, the defendant's actions were found to divert Internet users from the plaintiff's site to its own, thereby harming the plaintiff's economic interests, including a reduction in advertising revenue. Consequently, the court ruled that the defendant's actions violated Article 2(i)(j) of the former Unfair Competition Prevention Act [currently Article 2(i)(m)], and awarded damages to the plaintiff. 16

In the Saram-in case, 17 the defendant (Saram-in) was accused of mass copying the HTML source code of job postings from the plaintiff's website (Job Korea) using a crawling method and then posting this content on its own website for commercial use without authorisation. The district court ruled that these acts of mass copying from the plaintiff's website and using the content for commercial purposes without permission constituted unfair competition under Article 2(i) of the Unfair Competition Prevention Act.

3. The Need for TDM Exceptions and Limitations

Ideally, to avoid copyright infringement, AI developers should obtain licenses from copyright holders to use their works as training data. 21 However, practical issues such as the sheer volume of works, difficulties in identifying copyright holders, and excessive licensing fees make it difficult to obtain copyright licenses for all training data. The problem arises when only data that is readily available or legally licensed is used for training, as this severely limits the variety and volume of data that can be used. Such limitations can lead to the problem of AI bias, where the AI system does not have a sufficiently broad and diverse dataset for training. Training data reflects reality, and just as there are biases in the real world, there can be biases in the data itself. 22 The most effective way to mitigate these inherent biases and discrimination in the data is to increase the size of the training dataset.

1. The EU DSM Directive

The DSM Directive defines TDM as ‘any automated analytical technique aiming to analyse text and data in digital form to generate information, such as patterns, trends, and correlations’. 26 It also includes ‘the automated computational analysis of information in digital form, including text, sounds, images, or data’ facilitated by new technologies (Recital 8). Article 2(2) provides a comprehensive definition that captures the potential of tools capable of autonomous or semi-autonomous analysis of large amounts of data.

The European Union also recognizes a separate sui generis database right in the Database Directive (96/9/EC), 27 which applies to the contents of databases in which significant investment has been made to obtain, verify or present the data. This sui generis database right is usually granted to database producers. Therefore, when datasets and individual works are utilised in the TDM process, both the rights of database producers and the rights of authors may be infringed.

To facilitate the smooth use of copyrighted works and datasets in the data mining process, the EU introduced two mandatory exceptions for TDM within the DSM Directive (Articles 3 and 4), and Member States are under an obligation to implement both of them. Article 3 is particularly imperative as it specifically applies to TDM carried out for scientific research purposes in research and cultural institutions. 28

The objective of Article 3 is to establish a mandatory exception under EU copyright law. This exception permits acts of reproduction and extraction by research organisations and cultural heritage institutions for the purpose of conducting TDM in scientific research. Furthermore, rights holders cannot use contracts to opt out of, or otherwise override, this exception for scientific research purposes. Additionally, Article 7(2) allows for the circumvention of TPM specifically for TDM activities. 29

Article 4 is similar to Article 3 but with notable differences. While Article 4 allows anyone to use copyrighted works for TDM, this permission can be explicitly overridden by rights holders through an ‘opt-out’ or ‘contract-out’ mechanism. In other words, rights holders have the option to exclude their works from being used under this provision. 30

As a result, if rights holders expressly reserve the use of works and other subject matter through an opt-out or contract-out mechanism, entities such as businesses, governments, citizens, journalists, and anyone else who is not part of a research or cultural organisation acting for research purposes will need to obtain specific authorisation from rights holders to develop AI. Conversely, in the absence of an opt-out or contract-out, reproductions and extractions for TDM may be retained for as long as necessary for the purposes of the TDM activity, as set out in Article 4(2).
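For content made publicly available online, a reservation of this kind may be expressed by machine-readable means. The sketch below assumes, purely for illustration, that a rights holder signals the reservation through a robots.txt file and that a TDM crawler identifies itself as `ExampleTdmBot`; both the crawler name and the file contents are hypothetical, and whether robots.txt is a legally sufficient form of reservation is not settled by this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rights holder (an assumption
# for illustration, not an official opt-out format).
ROBOTS_TXT = """\
User-agent: ExampleTdmBot
Disallow: /archive/

User-agent: *
Allow: /
"""


def tdm_reserved(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules exclude this crawler from the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(tdm_reserved(ROBOTS_TXT, "ExampleTdmBot", "https://example.org/archive/1"))
print(tdm_reserved(ROBOTS_TXT, "ExampleTdmBot", "https://example.org/news/1"))
```

A crawler that respects the reservation would skip any URL for which `tdm_reserved` returns True and would seek specific authorisation from the rights holder before using that content.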

In summary, Article 3, which pertains to scientific research conducted by research and cultural institutions, does not permit ‘opt-out’ or ‘contract-out’ arrangements, nor can TPM be used to prevent these organisations from accessing, copying, and extracting copyrighted works. In contrast, for uses other than scientific research, TPM and contracts may override Article 4(2) by allowing rights holders to reserve the rights to their works and other subject matter. The DSM Directive distinguishes its provisions based on the intended purpose of the use, and sets them out in different articles accordingly.

With the enactment of the DSM Directive, EU Member States were required to transpose the Directive into national law by 7 June 2021. While the resulting national amendments closely follow Articles 3 and 4 of the DSM Directive, there are some differences in form and content. This is because the amendments were enacted within the framework of existing national legislation, using the DSM Directive as a baseline.

Article 30-4 allows users to freely exploit works without the rights holders’ permission, provided that such exploitation is not for the purpose of enjoying the ideas or feelings expressed in the copyrighted works. Article 30-4 lists the types of uses that are not for the enjoyment of the ideas or feelings expressed in the works as follows: a) experiments for technological development (Article 30-4(i)), b) TDM (Article 30-4(ii)), and c) other uses that cannot be perceived by human senses (Article 30-4(iii)). 32

Article 30-4(i) permits the use of works necessary for experiments in the development and use of technology. For example, if a company is researching or developing high technology for a movie player, it would be helpful for the company to experimentally copy a cinematographic work in order to evaluate the quality of the technology. Article 30-4(ii) permits the use of all copyrighted works for TDM, in any manner and to the extent deemed necessary, regardless of whether the use is for commercial or non-commercial purposes, unless such exploitation unreasonably prejudices the copyright holder's interests. Article 30-4(iii) permits the exploitation of works that does not involve the perception of expressions in the work through human senses, such as exploitation in the process of computer data processing. 33

Article 43 (Reproduction and Transmission for Information Analysis)

(1) Reproduction or transmission of works is allowed to the extent necessary for the creation of additional information or added value (extraction of information such as rules, structure, tendency, and correlation, etc.) from a large volume of information including a number of works by applying automated analysis technology of computers, if such creation is possible without enjoying the ideas or feelings expressed in such works. Provided, however, that this shall be permitted only if lawful access to the works is available.

(2) Reproductions made under paragraph (1) may be kept to the extent necessary for the analysis of information.

Under the provisions of the amendment bill, data mining activities are permissible provided that lawful access to the works is available, that the works are not used to enjoy the ideas or feelings expressed in them, and that they are used only to the extent necessary. Consequently, the requirements for employing works in data mining under this bill are threefold: ‘lawful access’, ‘non-enjoyment’, and ‘the necessary extent’.

a. Lawful Access

The policy context for ‘lawful access’ is suggested in the ‘Impact Assessment for the establishment of the DSM Directive’, 34 which mentions ‘the removal of legal uncertainty regarding the ability of researchers to perform TDM on lawfully accessible content’. Europe has a long academic and cultural tradition in which large publishers and media organisations accumulate and commercially provide access to large volumes of textual data that serve as valuable sources for TDM. 35

As a result, an important policy issue in Europe has been whether users, who have access to data and are allowed to use it in a permitted manner under the assumption that the data provider controls access, should be required to obtain an additional license for TDM. 36 This is evidenced by the fact that a frequently cited phrase in the discussions about the TDM exception in the DSM Directive has been that ‘the right to read should be the right to mine’. 37

In addition, if enacted, the TDM exception in the bill, which includes the ‘lawful access’ requirement, could significantly impact the protection of database producers’ rights. This could reignite the debate over the current form of regulation of these rights. Such an outcome is an important consideration in the design of the TDM exception.

b. Non-Enjoyment

In the Google case, the court held that Google’s reproduction of copyrighted works to provide the public with ‘factual information’ about the texts, such as how many times a particular search term appeared in the text, was fair use. Similarly, in iParadigms, the court found that iParadigms’ reproduction of student papers for plagiarism detection purposes was fair use. This decision was based on the reasoning that comparing textual similarities between works does not relate to their creative elements.

c. The Necessary Extent

d. Whether or Not Acknowledgement Is Required

1. South Korea

Korea’s fair use clause was introduced during the negotiations for the Korea-US Free Trade Agreement in 2011, adopting the four-factor test from the US fair use doctrine. Unlike the United States, which has a longstanding practice of fair use supported by a large body of case law, Korea's experience with fair use is relatively recent and lacks a substantial body of case law. This is especially pertinent in the context of TDM and generative AI, areas in which the United States has developed significant case law. Consequently, the US fair use precedents, particularly those involving TDM and generative AI, could serve as valuable references for Korea. Therefore, this discussion will examine the fair use provisions as they exist in the United States.

2. The US

The determination of ‘fair use’ requires a case-by-case analysis based on four non-exclusive factors set forth in the statute, taking into account the purposes of copyright. 50 This section examines how these factors apply to the fair use of works for data mining by analysing court decisions that have recognised computational analysis as an example of fair use.

a. Four-Factor Analysis for TDM

The first factor concerns the purpose and character of the use, including whether the use is commercial or for non-profit educational purposes. Courts have generally found that extracting information from searchable databases or search engines constitutes a highly transformative act that often qualifies as fair use. For example, in Authors Guild v HathiTrust, the court identified a transformative work as one that serves a new and different function compared to the original work, rather than serving as a substitute for it. 51 In Perfect 10, Inc. v Amazon.com, Inc., the court found that the use of copyrighted thumbnail images in internet search results was transformative. This decision was based on the understanding that the thumbnails served a function distinct from that of the original copyrighted images. 52

In Kelly v Arriba Soft Corp, the Ninth Circuit ruled that the systematic and institutional copying of images for the transformative purpose of operating a commercial image search service constituted fair use. 53 Similarly, in Authors Guild v Google, the court concluded that Google's systematic and institutional copying of books to enable a full-text search yielding snippets of text containing the search terms was also fair use. 54

Transformative use is characterised by its ability to convey a different meaning or message. It involves the addition of new elements with a different purpose or character, altering the original work by giving it a new expression, rather than simply replacing it. 55 This concept becomes particularly relevant when a secondary user makes unauthorised copies of copyrighted material, often with the intent to profit by copying the original work without compensating the rights holders. 56 In such cases, the degree of transformation imparted by the new work inversely affects the significance of its commercial intent: the more transformative the new work is, the less its commercial nature affects the fair use analysis. 57 As a result, TDM that adds value to the original work without compensating the rights holders may meet the criteria of the first factor for fair use. This is the case when the copyrighted expression in the original is used as raw material that is transformed in the process of creating new information, aesthetics, insights, or understandings. 58 The cases discussed above illustrate this condition. In these cases, the court found that the use of copyrighted works for computational analysis was transformative.

The second factor considers the nature of the copyrighted work. This aspect acknowledges that some works are more central to the core objectives of copyright protection than others. 59 Where a secondary use incorporates copyrighted materials but does not make the original works easily identifiable, courts have often leaned towards considering such use as fair. This factor generally favours the rights holders when the secondary use prominently features the copyrighted material. 60 However, in cases where the original work is not discernible in the secondary use, as in Authors Guild, Inc. v HathiTrust, the use is more likely to fall under fair use. 61

The third factor in the fair use analysis evaluates the amount and substantiality of the portion used in relation to the copyrighted work as a whole. This factor assesses whether the secondary use incorporates more of the copyrighted work than necessary. While copying an entire work generally weighs against fair use, courts have found this factor to be neutral in the context of computational analysis. For TDM to be effective, particularly in research or AI-related sectors, it often requires access to the full text or entire datasets in order to extract meaningful information. This need closely links the third factor with the second: if the original work is not identifiable in the output of TDM, despite the use of the entire copyrighted material, such use may still qualify as fair use.

The fourth factor assesses the impact of the secondary use on the potential market for, or value of, the copyrighted work. This consideration includes not only the potential harm that displaces demand for the original, but also the impact on markets for derivative works. 62 The key focus of this factor is on ‘the harm resulting from the secondary use acting as a substitute for the original work’. 63 However, if TDM as a secondary use is transformative and the original works are not recognizable in it, then that use does not act as a substitute for the originals. Transformative uses generally diminish the importance of the fourth factor. The more the copying is intended to serve a purpose different from that of the original, the less likely it is that the copy will serve as a satisfactory substitute for the original. 64

In summary, data mining that adds value to the original work without compensation and serves a different purpose from the original may be considered transformative. 65 Furthermore, if the output of data mining is transformative and does not resemble the original, despite the use of the entire copyrighted work, it is unlikely to act as a substitute or threaten the potential market for the original works. 66 When all four statutory factors are considered together, TDM often qualifies as fair use. In addition, the circumvention of TPMs for the purpose of TDM does not negatively affect the fair use inquiry under 17 USC § 1201(c).

b. Generative AI’s Output and Fair Use

Generative AI that generates images often makes it easy to identify the original works in the output. As a result, rights holders argue that AI tools are illegally scraping their works for use in training datasets and that, by using copies of images from those datasets, the tools generate derivative digital images and other outputs that do not add anything new. In the past, to obtain images in a particular artist’s ‘style’, one had to commission the artist or acquire a license.

With generative AI, however, it has become possible to create works in the style of a particular artist without any cost, and these creations have begun to compete with original images in the market. Here, the term ‘style’ refers to a manner of creation that can be perceived as similar to what that particular artist would have produced. 67 If AI-generated outputs are considered to be copyright infringement, the question arises as to whether TDM for generative AI can still be considered fair use.

For example, according to the complaint in the Getty Images lawsuit, which is one of the ongoing generative AI cases in the United States and the United Kingdom, plaintiff Getty Images alleges copyright infringement against defendant Stability AI, claiming that Stability AI extensively replicated some images for model training purposes.

In January 2023, artists Sarah Andersen, Kelly McKernan, and Karla Ortiz filed a putative class action lawsuit in California against Stability AI, Midjourney, and DeviantArt. 68 The plaintiffs allege that Stability AI’s generative AI tool, Stable Diffusion, was developed using ‘billions of unauthorized copyrighted images’ to train its software. The lawsuit includes claims of copyright infringement, violation of the right of publicity, unfair competition, and breach of contract against the companies. Both Midjourney and DeviantArt utilise Stable Diffusion’s text-to-image software. However, on 30 October 2023, US District Court Judge William Orrick found the plaintiffs’ claims ‘deficient in numerous respects’. He ruled that only the direct infringement claim related to Stability AI’s alleged ‘scraping, copying, and use of training images to train Stable Diffusion’ could proceed. 69

In these lawsuits, AI model platforms are expected to assert fair use defences based on the previously reviewed cases. However, because the outputs of generative AI sometimes closely resemble the original works, it remains uncertain whether such use will also qualify as fair use.

As demonstrated in the Authors Guild, Inc. v HathiTrust case, the second factor often weighs in favour of fair use when it is challenging to recognize the original work in the information abstracted through data mining processes. However, if the outputs resulting from TDM are clearly recognizable as the original works in their secondary use, it becomes difficult to argue for fair use.

In addition, AI-generated works may be less likely to qualify for fair use if they are used for the same purposes as the original works. This can be inferred from the recent Andy Warhol case, 70 which provides guidance on whether AI-generated works can escape liability for copyright infringement through a fair use defence. 71 In this case, the US Supreme Court ruled that using Warhol’s silkscreen prints as magazine cover images did not constitute fair use, as it served essentially the same purpose as using Goldsmith's black and white portrait. The Andy Warhol case is likely to have a significant impact on copyright infringement decisions involving AI-generated works. 72

V. Conclusion

The increasing emphasis on the importance of data alongside the advancement of AI technology makes it clear that TDM technology will significantly contribute to the enhancement of human knowledge and convenience. However, since TDM involves the reproduction of large amounts of data, it inevitably comes into conflict with copyright law, which grants exclusive rights over works and databases.

In this context, this article presents and compares the recent legislative developments in Korea with the trends in major countries regarding TDM exceptions. Although each country’s legislation has unique characteristics, a general trend towards relaxing non-profit and non-commercial requirements for TDM has been observed.

Notes

[2] A van den Oord et al, ‘WaveNet: A Generative Model for Raw Audio’ arXiv:1609.03499v2 < https://doi.org/10.48550/arXiv.1609.03499 > accessed 20 October 2023.

[5] Microsoft, GitHub, and OpenAI are facing a class action lawsuit that alleges copyright infringement. The lawsuit claims that Copilot, their code-generating AI system, which was trained on billions of lines of publicly available code, reproduces licensed code snippets without proper attribution (DOE 1 et al v GitHub, Inc. et al, 4:2022cv06823); Midjourney and Stability AI, the companies behind popular AI art tools, are facing a lawsuit alleging that they infringed the rights of millions of artists. The suit claims that their tools were trained on images scraped from the web (Sarah Andersen v Midjourney, 3:23-cv-00201); Getty Images took Stability AI to court for using millions of images from its site without permission to train an art-generating AI (Getty Images v Stability AI, 1:23-cv-00135-UNA); Richard Kadrey and other plaintiffs allege that the LLaMA language models are themselves infringing derivative works because the models cannot function without the expressive information extracted from the plaintiffs’ books. In response, Meta has moved to dismiss all claims except the one alleging that the unauthorized copying of the plaintiffs’ books for purposes of training LLaMA constitutes copyright infringement (Kadrey v Meta Platforms, Inc., 23-cv-03417-VC); The Authors Guild in the United States filed a class action lawsuit against the Microsoft-backed OpenAI on 19 September 2023 over its alleged misuse of copyrighted material in the training of its artificial intelligence (AI) models (Authors Guild v OpenAI Inc., 1:23-cv-08292); The New York Times is suing OpenAI and Microsoft, alleging that OpenAI copied millions of Times articles to train the language models that power ChatGPT and Microsoft Copilot (The New York Times v OpenAI and Microsoft, 1:23-cv-11195).

[6] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.

[8] Kibbm Lee and Seok-Jae Moon, ‘Experience Way of Artificial Intelligence PLAY Educational Model for Elementary School Students’ (2020) 12(4) IJIBC, 232-237 < https://doi.org/10.7236/IJIBC.2020.12.4.232 > accessed 10 November 2023.

[10] The structure of this can be observed in the Google Books case. The Google Books project consisted of two main components: the ‘Partner Program’, which was an agreement with publishers and other rights holders, and the ‘Library Project’, in cooperation with libraries. In the latter, Google obtained permission through agreements with libraries to scan and digitally reproduce their collections. Nevertheless, Google faced a copyright infringement lawsuit from the Authors Guild, a rights holder organisation. (See Authors Guild, Inc. v Google, Inc. [2013], 954 F Supp 2d 282, 285-286)

[11] Sag (n 9) 353-357.

[12] This situation arises because AI machine learning and other applications of TDM technology predominantly involve centrally processed data. However, in the case of ‘federated learning’, an emerging area of machine learning that focuses on decentralising training and minimising data exchange between devices, data collection, analysis, and model building occur solely at the local device level. Therefore, there is no additional data reproduction or transmission between devices. See Li Li et al, ‘A review of applications in federated learning’ (2020) 149 Computers & Industrial Engineering, 106854 < https://doi.org/10.1016/j.cie.2020.106854 > accessed 10 November 2023.

[15] Rigvedawiki v Enhawiki Mirror [2015] Seoul Central District Court 2014Gahap44470.

[16] Rigvedawiki v Enhawiki Mirror [2016] Seoul High Court 2015Na2074198.

[17] Saram-in v Job Korea [2017] Seoul High Court 2016Na2019365.

[18] Seoul Central District Court, 9 July 2020 Decision, 2018Gahap528464.

[19] Everytime v Spec-up-ad [2021] Seoul High Court 2020Na2036862.

[20] Yanolja v Yogi-eottae [2022] Supreme Court 2021Do1533 (Criminal case), [2022] Seoul High Court 2021Na2034740 (Civil Case).

[21] Licensing data in the realms of artificial intelligence and machine learning could be developed into a common framework, similar to the licensing models used for open source software . See Misha Benjamin et al, ‘Towards Standardization of Data Licenses: The Montreal Data License’ (2019) arXiv preprint arXiv:1903.12262 < https://doi.org/10.48550/arXiv.1903.12262 > accessed 28 October 2023.

[22] Drew Roselli, Jeanna Matthews and Nisha Talagala, ‘Managing bias in AI’ (Companion Proceedings of the 2019 World Wide Web Conference, 2019) 539-544 < https://doi.org/10.1145/3308560.3317590 > accessed 27 October 2023; Eirini Ntoutsi et al, ‘Bias in data‐driven artificial intelligence systems—An introductory survey’ (2020) 10(3) Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1356 < https://doi.org/10.1002/widm.1356 > accessed 27 October 2023.

[23] Nicolas Binctin, ‘TDM: A Challenge for Artificial Intelligence’ (2019) RIDA–Revue In‐ternationale du Droit d’Auteur, 5-32 < http://rida.ideesculture.fr/sites/default/files/2020-02/262-D1VA.pdf > accessed 30 November 2023; Rossana Ducato and Alain M. Strowel, ‘Ensuring text and data mining: remaining issues with the EU copyright exceptions and possible ways out’ (2021) European Intellectual Property Review < https://ssrn.com/abstract=3829858 > accessed 30 November 2023.

[26] DSM Directive, art 2(2).

[27] Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases.

[28] Thomas Margoni and Martin Kretschmer, ‘A Deeper Look into the EU Text and Data Mining Exceptions: Harmonisation, Data Ownership, and the Future of Technology’ (2022) 71(8) GRUR International, 685-686 < https://doi.org/10.1093/grurint/ikac054 > accessed 15 September 2023.

[30] Ibid.

[35] See European Union, ‘A Report on Policies and Good Practices in the Public Arts and in Cultural Institutions—to Promote Better Access to and Wider Participation in Culture’ (2012) < https://ec.europa.eu/assets/eac/culture/policy/strategic-framework/documents/omc-report-access-to-culture_en.pdf > accessed 2 October 2023.

[36] Geiger et al (n 29) 28-29.

[37] Ibid, 30, fn 123.

[38] Ueno (n 33) 149.

[39] Act No 18871, 10 June 2022.

[40] Act No 19289, 28 March 2023.

[41] Supreme Court, 12 May 2022 Decision, 2021Do1533; Seoul High Court, 25 August 2022 Decision, 2021Na2034740.

[44] Authors Guild v Google Inc. [2015] 804 F 3d 202.

[45] A. V. ex rel. Vanderhye v iParadigms, L.L.C. [2009] 562 F 3d 630.

[46] Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases.

[48] Ryu (n 25) 380-381.

[49] FireSabre Consulting LLC v Sheehy [2013] No 11–cv–4719 (CS), 2013 WL 5420977, 7.

[50] Arrow Productions, Ltd., v The Weinstein Company [2014] 44 F Supp 3d 359; Harper & Row, Publishers, Inc. v Nation Enters [1985] 471 US 539, 549, 105 S Ct 2218, 85 L Ed 2d 588; Bill Graham Archives v Dorling Kindersley Limited [2006] 448 F 3d 605.

[51] Authors Guild v HathiTrust [2014] 755 F 3d 87; Dominick Ranieri v Adirondack Dev. Group LLC [2016] No 1:11-cv-1013-GTS-CFH.

[52] Perfect 10, Inc. v Amazon.com, Inc. [2007] 508 F 3d 1146, 1169; Kelly v Arriba Soft Corp. [2003] 336 F 3d 811, 819.

[53] Kelly v Arriba Soft Corp. [2003] 336 F 3d 811, 822.

[54] Authors Guild v Google [2011] 770 F Supp 2d 666.

[55] Folsom v Marsh [1841] 9 F Cas 342, 348; Harper & Row Publishers Inc. v Nation Enterprises [1985] 471 US 539, 562.

[56] Cariou v Prince [2013] 714 F 3d 694, 708.

[57] Campbell v Acuff-Rose Music, Inc. [1994] 510 US 569, 579.

[58] Castle Rock Entm’t, Inc. v Carol Pub. Grp., Inc. [1998] 150 F 3d 132, 142.

[59] Campbell v Acuff-Rose Music, Inc. [1994] 510 US 569, 586.

[60] Kelly v Arriba Soft Corp. [2003] 336 F 3d 811, 822.

[61] Authors Guild, Inc. v HathiTrust [2014] 755 F 3d 87, 97.

[62] Harper & Row Publishers Inc. v Nation Enterprises [1985] 471 US 539, 568.

[63] Authors Guild, Inc. v HathiTrust [2014] 755 F 3d 87, 99 (citing Campbell v Acuff-Rose Music, Inc. [1994] 510 US 569, 591).

[64] Authors Guild, Inc. v HathiTrust [2014] 755 F 3d 87; Authors Guild, Inc. v Google, Inc. (Google Books) [2015] 804 F 3d 202, 222-223.

[65] Castle Rock Entm’t, Inc. v Carol Pub. Grp., Inc. [1998] 150 F 3d 132, 142.

[66] Authors Guild, Inc. v HathiTrust [2014] 755 F 3d 87, 90.

[67] Cong Hung Mai et al, ‘Learning of art style using AI and its evaluation based on psychological experiments’ (2022) 14(3) International Journal of Arts and Technology, 171-173 < https://doi.org/10.1504/IJART.2022.128444 > accessed 26 September 2023.

[68] Sarah Andersen, et al v Stability AI, et al [2023] 3:23-cv-00201.

[69] Ibid.

[70] Andy Warhol Found. for Visual Arts, Inc. v Goldsmith et al [2023] 598 US ___.

[72] Hugh Davies, ‘The Art of Surveillance: Surveying the Lives and Works of Andy Warhol and Ai Weiwei’ in Surveillance | Society | Culture (Contributions to English and American Literary Studies (CEALS) Book 3) (1st edn, Peter Lang 2020) 153-174; Patrick K. Lin, ‘Retrofitting Fair Use: Art & Generative AI After Warhol’ (SSRN, 2023) 64 Santa Clara Law Review (forthcoming 2024) < https://ssrn.com/abstract=4566945 > or < http://dx.doi.org/10.2139/ssrn.4566945 > accessed 21 October 2023.

Journal of AI Law and Regulation
Volume 1, Issue 1 (2024)
pp. 64 - 76 DOI: https://doi.org/10.21552/aire/2024/1/8